Sudesh P — AI Systems Engineer
Creator of OmniSLM. Building production-ready AI applications with Small Language Models.
Focused on RAG pipelines, local-first LLM platforms, agent architectures, and privacy-first AI infrastructure.
OmniSLM v0.5 is now available
Introducing native agent orchestration, seamless vLLM continuous batching integration, and enhanced memory providers for complex multi-turn workflows.
Latest Insights
Thoughts on building production-grade AI infrastructure and the shift towards Small Language Models.
Why Small Language Models Matter in Production
A deep dive into why enterprise AI is shifting towards specialized, privacy-first Small Language Models over massive generic APIs.
Building Multi-Tenant AI Systems
Architectural patterns for designing AI infrastructure that securely isolates tenant data while maximizing resource utilization.
RAG vs Fine-Tuning: Choosing the Right Strategy
A comprehensive guide on when to use Retrieval-Augmented Generation versus Fine-Tuning for your AI projects.
Engineering Note: Vector Isolation
"Never trust the LLM prompt to filter tenant data. In multi-tenant RAG architectures, isolation must happen at the physical or metadata layer before the retrieved context ever reaches the inference engine."
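As a rough illustration of the note above, the sketch below applies the tenant filter as a hard metadata check before any similarity scoring, so another tenant's chunks can never reach the prompt. The `Chunk` type, the `retrieve` function, and the in-memory `STORE` are hypothetical names used only for this example; a real deployment would push the same filter down into the vector database or use a separate index per tenant.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str                 # metadata used for isolation
    text: str
    embedding: tuple[float, ...]   # toy 2-d vectors for the demo


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(store, tenant_id, query_embedding, k=3):
    """Return the top-k chunks for one tenant only."""
    # Hard filter first: chunks from other tenants are never scored or returned.
    candidates = [c for c in store if c.tenant_id == tenant_id]
    ranked = sorted(
        candidates,
        key=lambda c: cosine(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical data: two tenants with near-identical documents.
STORE = [
    Chunk("acme", "Acme refund policy", (0.9, 0.1)),
    Chunk("acme", "Acme onboarding guide", (0.2, 0.8)),
    Chunk("globex", "Globex refund policy", (0.95, 0.05)),
]

results = retrieve(STORE, "acme", (1.0, 0.0), k=2)
for chunk in results:
    print(chunk.tenant_id, "|", chunk.text)
```

Note that the Globex chunk is the closest match to the query vector, yet it is excluded because filtering happens before ranking rather than being left to instructions in the prompt.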
Engineering Note: Async Inference
"Synchronous HTTP requests to an LLM endpoint will eventually bring down your system. Always decouple the web tier from the inference tier using a robust message queue like RabbitMQ."
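A minimal sketch of the decoupling described above, assuming an in-process `queue.Queue` and a worker thread as stand-ins for a broker such as RabbitMQ and a separate inference service. The names `submit`, `worker`, and `fake_inference` are hypothetical; the point is only that the web tier enqueues a job and returns an ID immediately, while the inference tier drains the queue at its own pace.

```python
import queue
import threading
import uuid

jobs = queue.Queue()            # stand-in for a broker queue
results = {}                    # stand-in for a result store (e.g. a database)
results_lock = threading.Lock()


def fake_inference(prompt):
    """Placeholder for the real model call."""
    return f"echo: {prompt}"


def submit(prompt):
    """Web tier: enqueue the request and return a job ID without waiting."""
    job_id = str(uuid.uuid4())
    jobs.put((job_id, prompt))
    return job_id


def worker():
    """Inference tier: consume jobs one at a time until a None sentinel arrives."""
    while True:
        item = jobs.get()
        try:
            if item is None:
                return
            job_id, prompt = item
            output = fake_inference(prompt)
            with results_lock:
                results[job_id] = output
        finally:
            jobs.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

job_id = submit("Summarize the release notes")

# Demo only: block until the job is done, then stop the worker.
# A real web handler would return job_id and let the client poll or subscribe.
jobs.join()
jobs.put(None)
thread.join()

print(results[job_id])
```

With a real broker, the same shape gives back-pressure and retries for free: a slow model lengthens the queue instead of exhausting web-tier connections.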
Subscribe to AI Engineering Notes
Occasional insights on Small Language Models, RAG architectures, and building production-ready local AI systems. No spam, ever.
Other Engineering Work
Case studies of production architectures, from academic intelligence platforms to Web3 supply chains.
SeedTracking
Blockchain-based seed supply chain platform with ML fraud detection.
Problem
Fraud and opacity in agricultural seed supply chains cost farmers billions annually. Counterfeit seeds reduce crop yields, and there's no reliable way to verify authenticity.
Outcome
Enables end-to-end traceability of seed batches. An ML model flags anomalous distribution patterns that may indicate fraud.
Architecture Highlights
Smart contracts on Ethereum handle state changes, while IPFS is used for decentralized document storage. An ML service scores fraud risk.
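One common way to split state between a chain and off-chain storage is to keep the document itself off-chain and record only its content hash in contract state. The sketch below illustrates that pattern under loud assumptions: plain dictionaries stand in for both IPFS and the contract, SHA-256 hex digests stand in for IPFS content identifiers, and `register_document` / `verify_document` are hypothetical helpers. It is not a description of how SeedTracking is actually implemented.

```python
import hashlib

offchain_store = {}   # stand-in for IPFS: content hash -> document bytes
ledger = {}           # stand-in for contract state: batch ID -> content hash


def register_document(batch_id: str, document: bytes) -> str:
    """Store the document off-chain and anchor its hash against the batch."""
    digest = hashlib.sha256(document).hexdigest()
    offchain_store[digest] = document
    ledger[batch_id] = digest
    return digest


def verify_document(batch_id: str, document: bytes) -> bool:
    """Check that a presented document matches the hash anchored for the batch."""
    return ledger.get(batch_id) == hashlib.sha256(document).hexdigest()


# Hypothetical batch certificate.
certificate = b"batch=SB-001;variety=wheat;certified=true"
register_document("SB-001", certificate)

print(verify_document("SB-001", certificate))
print(verify_document("SB-001", certificate.replace(b"wheat", b"maize")))
```

Because only the hash is anchored, any change to the off-chain document is detectable without storing the document itself in contract state.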
Local LLM Application
Privacy-first local LLM platform built with Java 21, Spring AI, and MongoDB.
Problem
Java enterprise teams need LLM capabilities, but most existing tools are Python-first, creating a skills gap.
Outcome
Bridges the Java-AI gap. Enterprise teams can integrate LLM features using familiar Spring patterns.
Architecture Highlights
A Spring Boot application using WebFlux for reactive endpoints, MongoDB for session storage, and Spring AI for model orchestration.
PaathAI
AI-driven lecture intelligence platform for transcription, summarization, and progress tracking.
Problem
Students miss key points in lectures, and there's no structured way to search, review, or track coverage of syllabus topics across sessions.
Outcome
Transforms passive lecture recordings into structured, searchable knowledge bases with syllabus alignment.
Architecture Highlights
An AI platform that processes lecture audio, maps content to syllabus topics, and provides searchable summaries with progress analytics.
RAG System for Local LLM
Privacy-preserving Retrieval-Augmented Generation pipeline using FAISS and Ollama.
Problem
Organizations with sensitive documents can't use cloud-based AI services due to data privacy and compliance requirements.
Outcome
Enables AI-powered document Q&A for privacy-sensitive organizations. Documents are processed locally, so no data leaves the organization's infrastructure.
Architecture Highlights
A pipeline that ingests documents, chunks them, embeds them locally using SentenceTransformers, and stores them in FAISS. Ollama handles LLM inference.
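The ingest, chunk, embed, and store stages described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: a bag-of-words `Counter` stands in for SentenceTransformers embeddings, a plain list with brute-force search stands in for FAISS, and the Ollama call is omitted (the retrieved chunks would simply be placed in the prompt). All function names and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=40, overlap=10):
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy embedding: lowercase token counts (stand-in for a real encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(documents, size=40, overlap=10):
    """Chunk and embed every document; the list stands in for a vector index."""
    return [
        (chunk, embed(chunk))
        for doc in documents
        for chunk in chunk_text(doc, size, overlap)
    ]


def search(index, question, k=2):
    """Brute-force nearest-neighbour search over the in-memory index."""
    query = embed(question)
    ranked = sorted(index, key=lambda e: similarity(e[1], query), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


docs = [
    "Invoices are archived for seven years in the finance vault.",
    "Employee badges must be returned to security on the last working day.",
]
index = build_index(docs, size=8, overlap=2)
context = search(index, "How long are invoices archived?", k=1)
print(context)
```

Overlapping windows are a simple hedge against a relevant sentence being cut in half at a chunk boundary; the right `size` and `overlap` depend on the embedding model and the documents.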
