Sudesh P (Sudhii)

AI Systems Engineer & Creator of OmniSLM

I build production-ready AI applications that prioritize privacy, security, and unit economics. My current focus is on operationalizing Small Language Models (SLMs) and designing scalable RAG infrastructure.

AI Architecture

Designing decoupled, resilient systems for LLM inference, continuous batching, and agent orchestration.

Vector Search (RAG)

Implementing highly isolated, multi-tenant vector search with FAISS, Qdrant, and Pinecone.

Local Inference

Running optimized SLMs on edge devices and in VPCs using Ollama, llama.cpp, and vLLM.

Published Notes

AI Systems Engineering

Jun 10, 2026 · 2 min read

Why Small Language Models Matter in Production

A deep dive into why enterprise AI is shifting towards specialized, privacy-first Small Language Models over massive generic APIs.

AI Infrastructure

May 22, 2026 · 2 min read

Building Multi-Tenant AI Systems

Architectural patterns for designing AI infrastructure that securely isolates tenant data while maximizing resource utilization.

RAG

Apr 15, 2026 · 2 min read

RAG vs Fine-Tuning: Choosing the Right Strategy

A comprehensive guide on when to use Retrieval-Augmented Generation versus Fine-Tuning for your AI projects.

OmniSLM

Mar 10, 2026 · 1 min read

Lessons Learned Building OmniSLM

The technical hurdles and architectural decisions behind creating an open-source Small Language Model framework.

AI Infrastructure

Feb 5, 2026 · 2 min read

Designing AI Infrastructure for Scale

How to architect backend systems capable of handling thousands of concurrent LLM inference requests without melting down.