
AI Architecture
Designing decoupled, resilient systems for LLM inference, continuous batching, and agent orchestration.
Vector Search (RAG)
Implementing multi-tenant vector search with strict per-tenant isolation, using FAISS, Qdrant, and Pinecone.
Local Inference
Running optimized SLMs on edge devices and in VPCs using Ollama, llama.cpp, and vLLM.
Published Notes
Why Small Language Models Matter in Production
A deep dive into why enterprise AI is shifting toward specialized, privacy-first Small Language Models and away from large general-purpose model APIs.
Building Multi-Tenant AI Systems
Architectural patterns for designing AI infrastructure that securely isolates tenant data while maximizing resource utilization.
RAG vs Fine-Tuning: Choosing the Right Strategy
A guide to choosing between Retrieval-Augmented Generation and fine-tuning for your AI projects.
Lessons Learned Building OmniSLM
The technical hurdles and architectural decisions behind creating an open-source Small Language Model framework.
Designing AI Infrastructure for Scale
How to architect backend systems that handle thousands of concurrent LLM inference requests without melting down.