2026 · GraphRAG with an LLM firewall

PaperGraph

GraphRAG question-answering over research papers, fronted by a four-layer LLM firewall, with offline RAGAS evaluation and self-hosted Langfuse tracing, packaged for GCP Cloud Run.

Built 2026

Why I built this

Most RAG demos quietly assume the retrieved text is friendly. I wanted the opposite: a GraphRAG pipeline that treats its own corpus as potentially hostile, wrapped in a firewall I could measure against real adversarial prompts, with the evaluation and tracing a production system needs instead of a notebook screenshot.

100% Injection-block rate (20 adversarial prompts)

4-layer LLM firewall

45 Tests passing

Architecture

Four-layer LLM firewall

Input intent-gate · classifies and blocks hostile prompts before retrieval
Chunk sanitizer · defuses indirect prompt-injection hidden in retrieved text
Output guard · checks generations before they reach the user
Rate + cost caps · per-IP rate, token, and cost limits

GraphRAG retrieval and observability

Neo4j knowledge-graph traversal plus vector search over pgvector and Chroma, fused with reciprocal-rank fusion. Offline RAGAS scores faithfulness and context-precision; self-hosted Langfuse traces latency, token cost, and retrieval hits. Dockerized for GCP Cloud Run at roughly zero cost at portfolio scale.

Tech stack

Technologies used

core

PythonFastAPINeo4jpgvectorChroma

infra

DockerGCP Cloud RunOllamaLlama Guard

tools

RAGAS (offline eval)Langfuse (tracing)Reciprocal Rank Fusion

Key highlights

Proof points

01
GraphRAG retrieval: Neo4j knowledge-graph traversal plus vector search (pgvector and Chroma), fused with reciprocal-rank fusion.
02
Four-layer LLM firewall: input intent-gate, retrieved-chunk sanitizer (an indirect prompt-injection defense), output guard, and per-IP rate plus token and cost caps.
03
Measured 100 percent injection-block rate on a 20-prompt adversarial set, with zero false positives on benign questions.
04
Offline RAGAS evaluation (faithfulness and context-precision) and self-hosted Langfuse tracing of latency, token cost, and retrieval hits.
05
Dockerized for GCP Cloud Run at roughly zero cost at portfolio scale, with 45 passing tests.

Focus areas

GraphRAGNeo4jpgvectorPrompt-injection defenseRAGASLangfuseFastAPIDocker

Explore the work

View on GitHub ← All projects