OmniSLM
OmniSLM is an extensible Python framework that provides a unified API for building AI applications powered by Small Language Models (SLMs). It includes RAG pipelines, memory management, agent orchestration, and workflow patterns, all designed to run locally without cloud dependencies.
The Problem
Developers building AI applications with small, local models face fragmented tooling. Every project requires stitching together RAG, memory, inference, and agent patterns from scratch.
The Solution
A framework-level abstraction that unifies these concerns into a single, extensible architecture with plugin support, built-in vector storage, and conversation memory.
Architecture Overview
The architecture consists of a pluggable Runtime Layer, a Vector-backed Memory Engine, a built-in RAG Framework using FAISS, and an Agent SDK for autonomous tool execution.
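The retrieval step of a RAG pipeline like the one described above can be sketched in a few lines. This is a minimal illustration only: `InMemoryIndex`, `embed`, and `build_prompt` are hypothetical names, not OmniSLM's API, and the brute-force bag-of-words search stands in for FAISS and a real embedding model so that the example runs with the standard library alone.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" (assumption): a real pipeline would call
    # an embedding model and return a dense vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class InMemoryIndex:
    """Hypothetical stand-in for a FAISS-backed index (brute-force search)."""
    docs: list[str] = field(default_factory=list)
    vectors: list[Counter] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.docs.append(text)
        self.vectors.append(embed(text))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(
            range(len(self.docs)),
            key=lambda i: cosine(q, self.vectors[i]),
            reverse=True,
        )
        return [self.docs[i] for i in ranked[:k]]


def build_prompt(index: InMemoryIndex, question: str, k: int = 2) -> str:
    # Retrieved passages are placed ahead of the question before inference.
    context = "\n".join(f"- {doc}" for doc in index.search(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


index = InMemoryIndex()
index.add("faiss stores dense vectors for similarity search")
index.add("ollama serves local models")
index.add("memory keeps recent conversation turns")
print(build_prompt(index, "similarity search with faiss", k=1))
```

In a real deployment the index would hold dense vectors and the prompt would be passed to the local model through the runtime layer; only the shape of the retrieve-then-prompt flow is shown here.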
Engineering Decisions
Chose Python for its rich AI ecosystem. Designed a plugin system to allow developers to swap out underlying vector databases or inference engines without rewriting application logic.
Key Tradeoffs
Abstracting the underlying vector databases meant sacrificing some database-specific optimization features for the sake of a unified API.
Core Challenges
The main challenge was managing conversation memory effectively within the strict context window limits of small models.
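A simple way to stay inside a small context window is to keep only the most recent turns that fit a token budget. The sketch below illustrates that sliding-window strategy; it is an assumption for illustration, not a description of how OmniSLM's memory engine works. `Turn`, `count_tokens`, and `fit_to_window` are hypothetical names, and whitespace word counts stand in for a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str
    text: str


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokens.
    # A real tokenizer would give different counts.
    return len(text.split())


def fit_to_window(system: str, history: list[Turn], budget: int) -> list[Turn]:
    """Return the newest turns that fit in `budget` tokens after the system prompt."""
    remaining = budget - count_tokens(system)
    kept: list[Turn] = []
    # Walk from the newest turn backwards and stop at the first one that does not fit.
    for turn in reversed(history):
        cost = count_tokens(turn.text)
        if cost > remaining:
            break
        kept.append(turn)
        remaining -= cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    Turn("user", "one two three"),
    Turn("assistant", "four five"),
    Turn("user", "six"),
]
for turn in fit_to_window("be brief", history, budget=5):
    print(turn.role, turn.text)
```

Dropping the oldest turns is the cheapest policy. Summarizing evicted turns or retrieving them from a vector store are alternatives that trade extra computation for better recall.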
Results & Impact
Created a functional, extensible framework that significantly reduces boilerplate code for local AI applications.
Future Roadmap
Add native support for vLLM, expand the agent tool ecosystem, and implement a visual workflow builder.
Related Projects
RAG System for Local LLM
Privacy-preserving Retrieval-Augmented Generation pipeline using FAISS and Ollama.
Local LLM Application
Privacy-first local LLM platform built with Java 21, Spring AI, and MongoDB.
PaathAI
AI-driven lecture intelligence platform for transcription, summarization, and progress tracking.