Projects

The portfolio is now anchored around AI-heavy, backend-heavy projects that better reflect the direction I am taking professionally: model building, agent orchestration, secure autonomy, and workflow engineering.

2026 Flagship builds

2026 · Decoder-only language model

Phoenix 125M

A LLaMA-style, 125M-parameter model trained from scratch on a single RTX 3080 Ti with a custom tokenizer, data pipeline, and training loop.

The project is an end-to-end exercise in model building: corpus curation, tokenization, training stability, benchmarking, and open-source packaging.

✓ ~2B tokens processed, Apache 2.0 release, WinoGrande 0.507.

PyTorch · Transformers · Tokenization · Benchmarking · Distributed training · MLflow · Weights & Biases · DVC

2026 · Multilingual language models

Sweta-Hi and Sweta-Kn

Hindi and Kannada pretraining efforts built on a LLaMA-style architecture with custom tokenizers and an end-to-end multilingual data pipeline.

This work is focused on underrepresented language coverage, practical training throughput, and evaluation quality ahead of release.

✓ Custom tokenizers, async data loading, released on Hugging Face.

Multilingual NLP · Data engineering · Custom tokenizers · Model evaluation

2026 · Fine-tuning · Text-to-SQL

SQLForge: Mistral 7B QLoRA

A 4-bit QLoRA fine-tune that turns Mistral 7B v0.3 into a reliable text-to-SQL model. It runs on the same 12 GB GPU used for Phoenix 125M, with 3.75 GB of VRAM headroom budgeted up front. The evaluation was rebuilt to be schema-aware after the first WikiSQL run showed the original metric was misleading.

The project is a focused engineering exercise in capability lift on consumer hardware: model selection by VRAM math, LoRA rank tuning, instruction-template-correct loss masking, and an evaluation harness that does true execution-accuracy comparison against table rows rather than string match.

✓ +77.8 percentage point exact-match lift, 97.4 percent valid SQL, ~8.25 GB peak VRAM on a 12 GB card.

QLoRA · Fine-tuning · bitsandbytes · PEFT · Text-to-SQL · Evaluation harnesses · vLLM
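To illustrate the SQLForge point about execution accuracy versus string match, here is a minimal sketch using an in-memory SQLite table. It is not the project's evaluation harness: the table, the queries, and the helper names `run_query` and `execution_match` are hypothetical, chosen only to show why two differently worded queries can still count as equivalent when compared by the rows they return.

```python
import sqlite3
from collections import Counter


def run_query(conn: sqlite3.Connection, sql: str):
    """Return result rows as a multiset, or None if the SQL fails to execute."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """True when both queries run and return the same rows, ignoring row order."""
    gold = run_query(conn, gold_sql)
    pred = run_query(conn, predicted_sql)
    return gold is not None and pred is not None and pred == gold


# Hypothetical toy table standing in for a WikiSQL-style schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("Asha", "Blue", 31), ("Ravi", "Red", 18), ("Meera", "Blue", 25)],
)

gold = "SELECT name FROM players WHERE team = 'Blue'"
pred = "select name from players where points > 20 and team = 'Blue'"

print(pred.strip().lower() == gold.strip().lower())  # False: the strings differ
print(execution_match(conn, pred, gold))             # True: same rows on this table
```

Note that a row-level match on one table only shows equivalence on that data, so a harness of this kind is as trustworthy as the tables it executes against.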

2026 · Agentic content pipeline

LinkedIn Post Swarm

A multi-agent publishing workflow that uses Claude, Ollama, Playwright, and Telegram for draft generation, review, approval, and scheduled publishing.

The system includes critic-revision loops, source aggregation, state management, retries, and escalation paths so autonomy stays controllable.

✓ Human-in-the-loop approvals, resilient retries, scheduled output.

Agent orchestration · Prompt engineering · Playwright · Telegram Bot API · Workflow reliability
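The critic-revision loop with an escalation path in LinkedIn Post Swarm can be pictured roughly as below. This is a sketch under assumptions, not the project's code: the `Review` type, the round budget of three, and the stub critic and reviser are all hypothetical stand-ins for model-backed agents.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    approved: bool
    feedback: str = ""


def critic_revision_loop(
    draft: str,
    critic: Callable[[str], Review],
    revise: Callable[[str, str], str],
    max_rounds: int = 3,
) -> tuple[str, str]:
    """Revise until the critic approves or the round budget is spent.

    Returns (final_draft, status) where status is "approved" or "escalate".
    """
    for _ in range(max_rounds):
        review = critic(draft)
        if review.approved:
            return draft, "approved"
        draft = revise(draft, review.feedback)
    # Budget exhausted: hand the latest draft to a human instead of looping on.
    return draft, "escalate"


# Stub agents for illustration only; real ones would call a model.
def length_critic(text: str) -> Review:
    if len(text) <= 40:
        return Review(approved=True)
    return Review(approved=False, feedback="too long")


def trim_reviser(text: str, feedback: str) -> str:
    return text[: len(text) // 2].rstrip()


final, status = critic_revision_loop("x" * 100, length_critic, trim_reviser)
print(status, len(final))  # approved 25
```

Bounding the loop is the design choice that keeps autonomy controllable: the worst case is a fixed number of model calls followed by a human decision.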

2026 · Autonomous AI security orchestrator

Rudra

A multi-agent offensive security architecture built around strict scope guardrails, sandboxed execution, and auditable event-driven workflows.

The emphasis is on safe autonomy: typed validation, retry budgets, isolation boundaries, and guardrails that make the system usable for serious testing.

✓ Recon + Analyst agents complete. Scope validation, sandbox, and audit trail fully designed.

AI security · Sandbox design · Distributed systems · Event-driven architecture · API integration
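As a rough illustration of what a strict scope guardrail for Rudra could look like, the sketch below validates a target against allowed and excluded networks and fails closed on anything malformed. The `Scope` class, its fields, and the example networks are hypothetical; the project's actual validation layer is not shown in this page.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """Hypothetical engagement scope: allowed networks and explicit exclusions."""

    allowed: tuple
    excluded: tuple = ()

    def permits(self, target: str) -> bool:
        try:
            addr = ipaddress.ip_address(target)
        except ValueError:
            return False  # fail closed on anything that is not a valid IP
        # Exclusions win over the allow list.
        if any(addr in net for net in self.excluded):
            return False
        return any(addr in net for net in self.allowed)


scope = Scope(
    allowed=(ipaddress.ip_network("10.20.0.0/16"),),
    excluded=(ipaddress.ip_network("10.20.5.0/24"),),
)

print(scope.permits("10.20.1.7"))    # True: inside the allowed range
print(scope.permits("10.20.5.9"))    # False: explicitly excluded
print(scope.permits("8.8.8.8"))      # False: outside the allowed range
print(scope.permits("example.com"))  # False: not an IP address, rejected
```

Checking exclusions first and rejecting unparseable input means an agent can only act on a target when the scope positively allows it.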

2026 · AI-powered lead generation workflow

LocalLeads

An end-to-end backend system for business discovery, AI content generation, site assembly, deployment, and personalized outreach.

Operational controls include SQLite state tracking, Telegram approvals, deployment automation, and delivery flows aimed at production-style reliability.

✓ ~25 businesses contacted, live deployment automation, Telegram approval gates.

Backend development · Playwright · SQLite · Workflow engineering · Operational automation
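SQLite state tracking with an approval gate, as described for LocalLeads, might be sketched like this. The stage names, the `leads` table, and the transition map are assumptions made for illustration; the real schema is not given here. The idea is only that outreach cannot happen unless a lead has passed through an explicit approved state.

```python
import sqlite3

# Hypothetical pipeline stages; the allowed transitions act as the approval gate.
TRANSITIONS = {
    "discovered": {"site_built"},
    "site_built": {"awaiting_approval"},
    "awaiting_approval": {"approved", "rejected"},
    "approved": {"contacted"},
}


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS leads ("
        "id INTEGER PRIMARY KEY, "
        "name TEXT NOT NULL, "
        "state TEXT NOT NULL DEFAULT 'discovered')"
    )
    return conn


def add_lead(conn: sqlite3.Connection, name: str) -> int:
    cur = conn.execute("INSERT INTO leads (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


def current_state(conn: sqlite3.Connection, lead_id: int) -> str:
    row = conn.execute("SELECT state FROM leads WHERE id = ?", (lead_id,)).fetchone()
    if row is None:
        raise KeyError(lead_id)
    return row[0]


def advance(conn: sqlite3.Connection, lead_id: int, new_state: str) -> None:
    """Move a lead to new_state only if the transition map allows it."""
    state = current_state(conn, lead_id)
    if new_state not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {new_state}")
    conn.execute("UPDATE leads SET state = ? WHERE id = ?", (new_state, lead_id))
    conn.commit()


conn = open_db()
lead = add_lead(conn, "Corner Bakery")
for step in ("site_built", "awaiting_approval", "approved", "contacted"):
    advance(conn, lead, step)
print(current_state(conn, lead))  # contacted
```

Because the state lives in SQLite rather than in memory, a crashed or restarted run can resume from the last committed stage.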

2026 · Autonomous trading intelligence system

ATIS

A 6-tier autonomous system that ingests research papers and filings, builds a causal knowledge graph, backtests theses with walk-forward and Monte Carlo validation, and generates daily ranked swing trade signals on 600 NSE/BSE stocks.

The system is built on three principles: every signal traces to a validated thesis in the knowledge graph, every LLM reasoning step is verified against Neo4j facts, and the architecture self-improves through Elo-based thesis lifecycle management and agent decision auditing.

✓ 59 agents built, Rust hot path implemented, 85/100 system effectiveness score on free data alone.

LangGraph · Neo4j · Rust · VectorBT · GraphRAG · K3s · Kafka · Qdrant
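For readers unfamiliar with Elo-style rating, the standard update that an Elo-based thesis lifecycle could build on looks like the sketch below. How ATIS actually pairs theses, what counts as a win, and the K-factor are not stated here, so the head-to-head framing and `k=32` are assumptions for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A outperforms B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b).

    score_a is 1.0 if A won the comparison, 0.5 for a draw, 0.0 if A lost.
    The update is zero-sum: whatever A gains, B loses.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two hypothetical theses start level; the first one wins a comparison.
print(elo_update(1500.0, 1500.0, 1.0))  # (1516.0, 1484.0)
```

A lifecycle rule could then, for example, retire theses whose rating falls below a threshold, though the thresholds used in ATIS are not described in this page.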

2026 · Model Context Protocol server

Dhan MCP Server

A Model Context Protocol server that exposes my live DhanHQ broker account and Indian market data to any MCP client, so Claude or Copilot can answer questions about holdings, positions, and option chains directly.

Built on the official DhanHQ SDK v2 with 11 live tools across portfolio, market data, options, and instrument discovery, plus two transports. Order placement is implemented but disabled by default so it cannot touch real money unintentionally.

✓ 11 live MCP tools, official DhanHQ SDK v2, order placement disabled by default for safety.

MCP · FastMCP · DhanHQ API · Python · Options data · API integration
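The "implemented but disabled by default" pattern for order placement can be sketched as an opt-in guard. Everything named here is hypothetical: the environment variable `DHAN_ENABLE_ORDERS`, the `place_order` function, and its return value are illustrative only, and no broker SDK call is made.

```python
import os


class OrdersDisabledError(RuntimeError):
    """Raised when an order is attempted without an explicit opt-in."""


def orders_enabled(env=None) -> bool:
    """Orders are off unless the (hypothetical) flag is set to exactly 'true'."""
    env = os.environ if env is None else env
    return env.get("DHAN_ENABLE_ORDERS", "").strip().lower() == "true"


def place_order(symbol: str, quantity: int, env=None) -> dict:
    if not orders_enabled(env):
        raise OrdersDisabledError("order placement is disabled by default")
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    # A real implementation would call the broker SDK here; omitted on purpose.
    return {"symbol": symbol, "quantity": quantity, "status": "would_submit"}


try:
    place_order("INFY", 1, env={})
except OrdersDisabledError as exc:
    print(f"blocked: {exc}")
```

Treating any value other than an explicit `true` as disabled means a missing or mistyped flag falls on the safe side.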

2026 · Code comprehension tool

CodeAtlas

A code comprehension tool that turns C source into per-function Mermaid flowcharts, with two interchangeable engines: a deterministic AST path and an LLM path, served over a FastAPI backend and a React canvas.

The same backend exposes two diagram modes selected by an environment flag. AST mode parses C with ast-grep and emits Mermaid directly, fast and deterministic with no model in the loop. LLM mode sends each function to a model through LiteLLM, so it works against local Ollama, GitHub Models, OpenAI, or Anthropic without code changes.

✓ Two diagram engines (AST and LLM), four-service Docker stack, defaults to local Ollama so it runs free and offline.

FastAPI · ast-grep · LiteLLM · React · TypeScript · Mermaid · WebSocket · Docker
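To make the "emits Mermaid directly" step in CodeAtlas concrete, here is a small sketch of the final stage only: turning an already extracted set of nodes and edges into Mermaid flowchart text. The parsing with ast-grep is not shown, and the `to_mermaid` helper and the sample function are hypothetical.

```python
def to_mermaid(nodes: dict, edges: list) -> str:
    """Render nodes {id: label} and edges [(src, dst, label)] as a Mermaid flowchart.

    Labels are assumed not to contain double quotes; a real emitter would escape them.
    """
    lines = ["flowchart TD"]
    for node_id, label in nodes.items():
        lines.append(f'    {node_id}["{label}"]')
    for src, dst, label in edges:
        arrow = f"-->|{label}|" if label else "-->"
        lines.append(f"    {src} {arrow} {dst}")
    return "\n".join(lines)


# Hypothetical control flow for a small C function: int abs_val(int x).
nodes = {
    "n0": "abs_val(x)",
    "n1": "x < 0 ?",
    "n2": "return -x",
    "n3": "return x",
}
edges = [
    ("n0", "n1", ""),
    ("n1", "n2", "yes"),
    ("n1", "n3", "no"),
]

print(to_mermaid(nodes, edges))
```

Because the output is a pure function of the extracted graph, the same input always yields the same diagram, which is the property the deterministic AST mode relies on.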

2026 · GraphRAG knowledge graph

GraphMind

A GraphRAG build that ingests trade and research data into a Neo4j knowledge graph and a ChromaDB vector store, then answers questions through a LangGraph ReAct agent that picks between graph traversal, dense retrieval, and hybrid search.

Retrieval is three-pronged: Neo4j Cypher for multi-hop relationship questions (which analysts covered an instrument, how funds connect to trades to sectors), dense vectors for semantic similarity, and BM25 for exact terms like tickers. Dense and sparse results are merged with Reciprocal Rank Fusion so meaning and exact-match both count.

✓ Neo4j + ChromaDB + BM25 fused via Reciprocal Rank Fusion, exposed as tools to a LangGraph ReAct agent.

GraphRAG · Neo4j · Cypher · ChromaDB · BM25 · LangGraph · NetworkX · Reciprocal Rank Fusion
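Reciprocal Rank Fusion, which GraphMind uses to merge dense and sparse results, scores each document by summing 1 / (k + rank) over the rankings it appears in. The sketch below is a generic version, not the project's code: the document IDs are made up, and `k=60` is the constant commonly used for RRF rather than a value confirmed for GraphMind.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k: int = 60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) across the lists that contain it,
    with ranks starting at 1. Ties are broken by document ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense = ["d3", "d1", "d2"]   # hypothetical semantic-similarity ranking
sparse = ["d1", "d4", "d3"]  # hypothetical BM25 ranking

print(reciprocal_rank_fusion([dense, sparse]))  # ['d1', 'd3', 'd4', 'd2']
```

Because only ranks are used, the fusion needs no score normalization between the dense and BM25 legs, which is why it suits retrievers with incomparable score scales.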

2026 · GraphRAG with an LLM firewall

PaperGraph

GraphRAG question-answering over a curated corpus of research papers, fronted by a four-layer LLM firewall, with offline RAGAS evaluation and self-hosted Langfuse tracing, packaged for GCP Cloud Run.

PaperGraph fuses a Neo4j knowledge graph with vector retrieval, then guards the entire path with an LLM firewall built to survive untrusted input. It is the flagship for production RAG, AI safety, and LLM observability.

✓ 100% injection-block rate on a 20-prompt adversarial set, offline RAGAS + Langfuse, 45 tests passing.

GraphRAG · Neo4j · pgvector · Prompt-injection defense · RAGAS · Langfuse · FastAPI · Docker

2026 · Demand-forecasting + inventory copilot

KiranaIQ

A demand-forecasting and inventory copilot for small Indian kirana stores: snap a bill, get per-SKU forecasts, plain-language explanations, reorder quantities, and price experiments.

KiranaIQ bundles five capabilities that small retailers currently have no analytics tooling for: reading a paper bill, forecasting demand per item, explaining the forecast, recommending what and how much to reorder, and testing price changes safely. It is the flagship for classical and forecasting ML, explainability, and applied product engineering.

✓ Measured WAPE of 35.8% vs 62.1% for a seasonal-naive baseline on synthetic retail data, 64 tests passing.

LightGBM · SHAP · Forecasting · A/B testing · OCR · Recommender systems · FastAPI · Telegram Bot API
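The KiranaIQ headline metric compares WAPE (weighted absolute percentage error) against a seasonal-naive baseline. A minimal sketch of both, assuming a weekly season of 7 days and made-up sales numbers rather than the project's data:

```python
def wape(actual, forecast) -> float:
    """Weighted absolute percentage error: sum|a - f| / sum|a|."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    denom = sum(abs(a) for a in actual)
    if denom == 0:
        raise ValueError("WAPE is undefined when all actuals are zero")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / denom


def seasonal_naive(history, horizon: int, season: int = 7):
    """Forecast each future step with the value from one season earlier."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]


# Hypothetical daily unit sales for one SKU: one week of history, three held-out days.
history = [12, 9, 11, 10, 14, 20, 22]
actual_next = [10, 10, 12]

baseline = seasonal_naive(history, horizon=len(actual_next))
print(baseline)                                # [12, 9, 11]
print(f"{wape(actual_next, baseline):.3f}")    # 0.125
```

WAPE divides by total actual volume instead of averaging per-day percentages, so slow-moving SKUs with near-zero days do not blow up the metric.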

2026 · Learning-to-rank retrieval benchmark

hybrid-search-bench

An honest hybrid-retrieval benchmark: BM25, SPLADE, and dense retrieval, fused and then reranked by a LambdaMART learning-to-rank model, measured on a public BEIR dataset.

Three retrieval legs are evaluated on the same qrels, fused with reciprocal-rank fusion, and then reordered by a learning-to-rank reranker trained on the train split. It is the flagship for search, ranking, and the classical learning-to-rank skills that pure-LLM portfolios usually miss.

✓ BEIR SciFact nDCG@10 of 0.778 after reranking vs 0.728 for RRF alone (+6.9%); BM25 at 0.686 matches the published figure.

Learning-to-rank · LambdaMART · BM25 · SPLADE · Dense retrieval · FAISS · BEIR · Information retrieval
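Since every number in hybrid-search-bench is an nDCG@10, a short sketch of the metric may help. This version uses linear gain with a log2 position discount; which exact variant the benchmark's evaluation tooling applies is an assumption here, and the qrels and ranking are invented for illustration.

```python
import math


def dcg_at_k(gains, k: int) -> float:
    """Discounted cumulative gain over the first k positions (linear gain)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, qrels: dict, k: int = 10) -> float:
    """nDCG@k for one query.

    ranked_ids: document IDs in system order.
    qrels: {doc_id: graded relevance}; missing documents count as 0.
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(qrels.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


# Hypothetical query with two relevant documents; the system ranks one of them third.
qrels = {"d1": 1, "d2": 1}
ranking = ["d1", "d9", "d2"]

print(round(ndcg_at_k(ranking, qrels), 4))  # 0.9197
```

A benchmark score is the mean of this per-query value over all test queries, so the reranker's gain over RRF reflects relevant documents moving closer to the top of the list.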

Earlier work

Before the current generation of work, I used smaller ML and web projects to build the habits that still matter now: experimentation, debugging, and shipping complete systems.

2023

Semantic Search Engine

An earlier information retrieval build that combined semantic search ideas with enterprise documentation use cases and set up later work in retrieval-heavy AI systems.

2022

Super Resolution

An image enhancement project built to understand GAN-based vision pipelines and experiment rigor in visual ML work.

2022

Photo to Monet-style art

A CycleGAN style-transfer exploration that taught me a lot about training instability, qualitative evaluation, and visual debugging.

2022

Library Management

A MERN-stack build that sharpened my full-stack fundamentals around CRUD, search, and practical product structure.

On the roadmap

What I am planning next, ahead of building it in the open.

Planned

KiranaIQ forecasting v2

Improve KiranaIQ forecasting accuracy with intermittent-demand handling and a single global model across stores.

View on GitHub →

Open-source and public work

Explore models, code, and experiments

All flagship projects are documented on GitHub. Model weights and cards are published on Hugging Face.

GitHub Hugging Face