Architecture
High-level system diagrams and structural documentation for my core engineering projects, with a focus on modularity, local execution, and scalable AI infrastructure.
OmniSLM Architecture
OmniSLM uses a layered architecture designed for extensibility. The Core Orchestrator sits between the API gateway and the underlying inference/memory engines. By abstracting the Vector Store and Inference Engine behind unified interfaces, developers can swap out FAISS for Qdrant, or Ollama for vLLM, without altering their agent logic.
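The swap-without-rewrite idea can be sketched with structural interfaces. This is a minimal illustration, not OmniSLM's actual API: `VectorStore`, `InferenceEngine`, `InMemoryStore`, `EchoEngine`, and `Orchestrator` are all hypothetical names, and the toy classes stand in for real FAISS/Qdrant or Ollama/vLLM adapters.

```python
from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical interface a FAISS or Qdrant adapter would satisfy."""

    def add(self, doc_id: str, vector: list[float]) -> None: ...
    def search(self, vector: list[float], k: int) -> list[str]: ...


class InferenceEngine(Protocol):
    """Hypothetical interface an Ollama or vLLM adapter would satisfy."""

    def generate(self, prompt: str) -> str: ...


class InMemoryStore:
    """Toy backend: brute-force dot-product search over a dict."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def add(self, doc_id: str, vector: list[float]) -> None:
        self._vectors[doc_id] = vector

    def search(self, vector: list[float], k: int) -> list[str]:
        def dot(a: list[float], b: list[float]) -> float:
            return sum(x * y for x, y in zip(a, b))

        ranked = sorted(
            self._vectors,
            key=lambda doc_id: dot(self._vectors[doc_id], vector),
            reverse=True,
        )
        return ranked[:k]


class EchoEngine:
    """Toy backend: returns the prompt instead of calling a model."""

    def generate(self, prompt: str) -> str:
        return f"[echo] {prompt}"


class Orchestrator:
    """Depends only on the two interfaces, never on a concrete backend."""

    def __init__(self, store: VectorStore, engine: InferenceEngine) -> None:
        self.store = store
        self.engine = engine

    def answer(self, question: str, query_vector: list[float], k: int = 2) -> str:
        doc_ids = self.store.search(query_vector, k)
        prompt = f"Context: {', '.join(doc_ids)}\nQuestion: {question}"
        return self.engine.generate(prompt)
```

Under this sketch, replacing a backend means passing a different object to `Orchestrator(...)`; the `answer` logic stays untouched.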
RAG Pipeline Architecture
Ingestion Flow
- 1. Document Loaders (PDF, TXT, MD)
- 2. Semantic Text Splitters (Overlap allowed)
- 3. Local Embedding (SentenceTransformers)
- 4. Vector Indexing (FAISS / Pinecone)
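The overlap in step 2 can be illustrated with a simple sliding window. This sketch is an assumption-laden simplification: it splits on fixed word counts rather than semantic boundaries, and `split_with_overlap` and its parameters are hypothetical names, not part of the pipeline described here.

```python
def split_with_overlap(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word windows where neighbours share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk would then be embedded and indexed; the shared words keep a sentence that straddles a boundary retrievable from either side.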
Retrieval Flow
- 1. Query Expansion & Rewriting
- 2. Hybrid Search (Dense Vector + Sparse Keyword)
- 3. Cross-Encoder Re-ranking
- 4. Context Injection & Prompt Construction
The RAG system uses Hybrid Retrieval to improve accuracy. Relying solely on dense embeddings often misses exact keyword matches (like acronyms or IDs). The pipeline therefore combines dense vector search with sparse retrieval (e.g., BM25) and passes the merged candidates through a cross-encoder for re-ranking, so the context injected into the prompt is ordered by relevance to the query.
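One common way to merge a dense and a sparse result list before re-ranking is reciprocal rank fusion (RRF). The text above does not say which fusion method the pipeline uses, so RRF is an assumption here, and `reciprocal_rank_fusion` is a hypothetical helper name. The constant `k = 60` is the value conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID for determinism.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Illustrative inputs: IDs returned by a dense search and a BM25 search.
dense_hits = ["d1", "d2", "d3"]
sparse_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Documents found by both retrievers (`d1`, `d3`) rise above those found by only one. The fused list would be the candidate set handed to the cross-encoder.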
Agent Runtime Architecture
The agent runtime is built on a ReAct (Reasoning and Acting) paradigm tailored for smaller context windows. Instead of overwhelming an 8B model with 50 tools, it uses a hierarchical routing architecture: a lightweight classifier model selects a specialized sub-agent, which is then provisioned with only the 3 to 5 tools necessary for its specific domain.
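The routing step can be sketched as follows. Everything here is illustrative: the keyword-based `classify` function stands in for the lightweight classifier model, and the sub-agent names and tool functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SubAgent:
    name: str
    tools: dict[str, Callable[[str], str]]  # small, domain-specific tool set


def classify(query: str) -> str:
    """Keyword stand-in for the classifier model (assumption, not the real one)."""
    lowered = query.lower()
    if any(word in lowered for word in ("invoice", "refund", "payment")):
        return "billing"
    return "search"


# Hypothetical registry: each sub-agent sees only its own handful of tools.
SUB_AGENTS: dict[str, SubAgent] = {
    "billing": SubAgent(
        name="billing",
        tools={
            "lookup_invoice": lambda q: f"invoice lookup for: {q}",
            "issue_refund": lambda q: f"refund request for: {q}",
            "payment_status": lambda q: f"payment status for: {q}",
        },
    ),
    "search": SubAgent(
        name="search",
        tools={
            "search_docs": lambda q: f"doc search for: {q}",
            "summarize": lambda q: f"summary of: {q}",
            "cite_sources": lambda q: f"sources for: {q}",
        },
    ),
}


def route(query: str) -> SubAgent:
    """Pick the sub-agent whose tool set will be placed in the model's prompt."""
    return SUB_AGENTS[classify(query)]
```

Only the selected sub-agent's tool descriptions need to enter the prompt, which is what keeps the context small for an 8B model.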
Spring AI Multi-Tenant Platform (Dynamic Routing)
In the Java ecosystem, the Local LLM Platform uses a ThreadLocal-backed context in Spring Boot (or Reactor Context for WebFlux) to attach the tenant ID to every AI request. The tenant ID scopes each vector search to that tenant's data and selects per-tenant model configurations (e.g., Tenant A uses local Llama 3 for privacy, Tenant B uses GPT-4 for complex reasoning).
Blockchain + AI Architecture
Used in SeedTracking, this architecture separates deterministic consensus from probabilistic inference. Smart contracts on Ethereum govern state transitions (e.g., transferring seed ownership), while an off-chain Python microservice listens to contract events. When a transfer occurs, the Python service fetches the IPFS metadata, runs an ML fraud-detection model, and writes a risk score back to the blockchain via an oracle pattern.
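The event-to-score loop can be sketched without any network access. Every component below is an in-memory stand-in and an assumption: `FakeChain` replaces the Ethereum contract, `FAKE_IPFS` replaces IPFS, and the rule-based `score_risk` replaces the ML fraud-detection model. The event fields and metadata keys are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TransferEvent:
    """Hypothetical shape of a decoded seed-transfer contract event."""
    seed_id: int
    sender: str
    recipient: str
    metadata_cid: str


@dataclass
class FakeChain:
    """Stand-in for the contract method the oracle account would call."""
    risk_scores: dict[int, int] = field(default_factory=dict)

    def submit_risk_score(self, seed_id: int, score: int) -> None:
        self.risk_scores[seed_id] = score


# Stand-in for IPFS: content ID -> metadata document.
FAKE_IPFS = {
    "cid-1": {"origin": "farm-12", "certified": True},
    "cid-2": {"origin": "", "certified": False},
}


def fetch_metadata(cid: str) -> dict:
    return FAKE_IPFS[cid]


def score_risk(metadata: dict) -> int:
    """Rule-based placeholder for the ML model; returns an integer 0-100."""
    score = 0
    if not metadata.get("certified"):
        score += 60
    if not metadata.get("origin"):
        score += 40
    return score


def handle_transfer(event: TransferEvent, chain: FakeChain) -> int:
    """Off-chain handler: fetch metadata, score it, write the score back."""
    metadata = fetch_metadata(event.metadata_cid)
    score = score_risk(metadata)
    chain.submit_risk_score(event.seed_id, score)
    return score
```

The score is kept as an integer because contract storage typically works with integer types; the probabilistic part stays entirely off-chain, and only its result is recorded.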