Architecture
System design, schema, deployment, and the decisions behind them.
System Overview
One container, one database, two pipelines. Runs locally against any Postgres 14+ instance.
Reference Layout
Local machine (or any container host)
├── borg-engine
│ ├── FastAPI (main.py)
│ ├── MCP endpoint (:8080/mcp) ← Claude Code, Codex CLI, Kiro, Copilot
│ ├── REST endpoint (:8080/api) ← Manual ingestion, admin
│ ├── Background worker ← Async extraction loop
│ └── Snapshot loop ← 24h hot-tier snapshots
│
├── PostgreSQL 14+ (local, Supabase, Neon, Azure, etc.)
│ ├── pgvector, pgAudit, uuid-ossp
│ └── borg_* tables (15 tables + 1 function)
│
└── OpenAI / Azure OpenAI
├── text-embedding-3-small ← Episode embeddings (1536-dim)
├── gpt-5-mini / gpt-4o-mini ← Entity, fact, procedure extraction
The engine supports both standard OpenAI and Azure OpenAI endpoints. It runs as a single process, with the API server and background worker as async tasks. Any Postgres 14+ instance with pgvector works.
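The single-process layout can be illustrated with a minimal asyncio sketch. It is not the engine's code: `extraction_worker`, `ingest`, and `run_demo` are hypothetical names, an in-memory queue stands in for the processing queue, and the LLM extraction call is replaced by a placeholder.

```python
import asyncio


async def extraction_worker(queue: asyncio.Queue, processed: list) -> None:
    """Hypothetical background loop: drain queued episodes one at a time."""
    while True:
        episode = await queue.get()
        # Placeholder for the LLM extraction step (entities, facts, procedures).
        processed.append(episode.upper())
        queue.task_done()


async def ingest(queue: asyncio.Queue, episode: str) -> str:
    """Hypothetical request handler: enqueue the episode and return at once."""
    await queue.put(episode)
    return "queued"


async def run_demo() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    processed: list = []
    # The worker shares one event loop with the request handlers.
    worker = asyncio.create_task(extraction_worker(queue, processed))
    await ingest(queue, "deployed v2")
    await ingest(queue, "rolled back v2")
    await queue.join()  # wait until the worker has drained the queue
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return processed
```

Calling `asyncio.run(run_demo())` returns the processed episodes in arrival order. The point of the sketch is that `ingest` returns before extraction happens, so slow extraction never delays a request.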
Schema
15 tables + 1 function. The schema keeps canonical data separate from derived serving state; the Type column marks which is which.
| Table | Role | Type | Status |
|---|---|---|---|
| borg_episodes | Immutable evidence layer | Canonical | ✅ |
| borg_entities | Graph nodes (typed, namespaced) | Canonical | ✅ |
| borg_entity_state | Entity serving state (tier, salience, access) | Derived | ✅ |
| borg_facts | Graph edges (temporal, with supersession) | Canonical | ✅ |
| borg_fact_state | Fact serving state (salience, access) | Derived | ✅ |
| borg_procedures | Candidate behavioral patterns | Canonical | ✅ |
| borg_predicates | 24 canonical relationship predicates | Reference | ✅ |
| borg_predicate_candidates | Non-canonical predicates pending review | Queue | ✅ |
| borg_resolution_conflicts | Ambiguous entity matches for review | Queue | ✅ |
| borg_namespace_config | Per-namespace token budgets | Config | ✅ |
| borg_audit_log | Full compilation + extraction traces | Audit | ✅ |
| borg_snapshots | 24h hot-tier state captures | Snapshot | ✅ |
borg_traverse()
A recursive CTE function for 1-2 hop graph traversal. Cycle-safe via path tracking. Scoped to a single namespace. Used by the graph_neighborhood retrieval strategy.
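The path-tracking idea can be sketched in Python over an in-memory edge list. This is an illustration under assumptions, not the SQL function itself: the `traverse` helper and its tuple layout are hypothetical, edges are followed in one direction only, and the edge list is assumed to be filtered to a single namespace already.

```python
from collections import deque


def traverse(edges, start, max_hops=2):
    """Breadth-first walk up to max_hops, skipping nodes already on the path.

    edges: list of (source, predicate, target) tuples (hypothetical layout).
    Returns rows of (entity, predicate, hop_depth, path).
    """
    rows = []
    frontier = deque([(start, [start])])
    while frontier:
        node, path = frontier.popleft()
        depth = len(path) - 1
        if depth == max_hops:
            continue  # hop limit reached, do not expand further
        for source, predicate, target in edges:
            if source != node or target in path:
                continue  # not an outgoing edge, or it would close a cycle
            new_path = path + [target]
            rows.append((target, predicate, depth + 1, new_path))
            frontier.append((target, new_path))
    return rows
```

Because each row carries its own path, a cycle such as a → b → c → a is cut at the point where the next node already appears on that path.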
SELECT * FROM borg_traverse(
p_entity_id := 'a1b2c3...',
p_max_hops := 2,
p_namespace := 'product-engineering'
);
-- Returns: entity_id, entity_name, entity_type,
-- fact_id, predicate, evidence_status,
-- hop_depth, path
API Surface
Three MCP tools, their REST equivalents, and admin endpoints. The OSS release runs locally with no authentication.
Core
| Method | Path | Description |
|---|---|---|
| POST | /mcp | MCP Streamable HTTP (borg_think, borg_learn, borg_recall) |
| POST | /api/think | Compile context (REST equivalent of borg_think) |
| POST | /api/learn | Ingest episode (REST equivalent of borg_learn) |
| POST | /api/recall | Search memory (REST equivalent of borg_recall) |
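As a rough illustration of calling the REST equivalents, the sketch below builds a JSON POST request with the Python standard library. The port comes from the reference layout; the `localhost` host, the `build_request` helper, and the payload field names (`namespace`, `content`) are assumptions, since the request schema is not documented on this page.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed host; port from the reference layout


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a REST endpoint."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Field names are hypothetical; check the API reference for the real schema.
learn_request = build_request(
    "/api/learn",
    {"namespace": "product-engineering", "content": "Rolled back release 2.1"},
)

# To send it against a running engine:
# with urllib.request.urlopen(learn_request) as response:
#     print(response.status, response.read().decode("utf-8"))
```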
Namespace Management
| Method | Path | Description |
|---|---|---|
| GET | /api/namespaces | List all namespaces with budgets |
| GET | /api/namespaces/:ns | Get config + stats (entity/fact/episode/procedure counts) |
| POST | /api/namespaces | Create namespace with configurable budgets |
| PUT | /api/namespaces/:ns | Update budgets / description |
| DELETE | /api/namespaces/:ns | Delete (protects 'default') |
Admin
| Method | Path | Description |
|---|---|---|
| GET | /api/admin/queue | Processing queue depth + failed count |
| GET | /api/admin/entities | List entities (with tier, salience, access count) |
| GET | /api/admin/facts | List current facts (with salience + access tracking) |
| GET | /api/admin/procedures | List procedures (confidence + observation counts) |
| GET | /api/admin/conflicts | Unresolved entity resolution conflicts |
| GET | /api/admin/predicates | Canonical predicates + pending custom candidates |
| POST | /api/admin/process-episode | Manual extraction trigger |
| POST | /api/admin/requeue-failed | Requeue episodes with extraction errors |
| POST | /api/admin/snapshot | Manual hot-tier snapshot |
| GET | /api/admin/cost-summary | LLM token usage and estimated spend by namespace |
| GET | /api/admin/snapshot/latest | Most recent snapshot for a namespace |
Integrations
This page focuses on runtime, schema, and API surface. Client setup for Claude Code, Codex CLI, Kiro, Claude Desktop, and REST ingestion, along with AGENTS.md guidance, steering files, and MCP examples, lives on the dedicated integrations page.
Design Decisions
The constraints are intentional.
Why LLM in the write path?
Borg extracts structured knowledge (entities, typed facts, procedures), not just text blobs, and that requires an LLM. The trade-off is extraction cost and latency, but extraction runs asynchronously in the background worker, so it never blocks queries. The alternative (embedding-only, like Ogham) gives you similarity search but not a queryable knowledge graph.
Why not Neo4j / FalkorDB?
Adding a graph database means syncing between two systems. Sync means drift. PostgreSQL recursive CTEs handle 1-2 hop traversal at the expected scale (hundreds of entities, thousands of facts). A dedicated graph DB is a measured-bottleneck escape hatch, not a starting point.
Why three-pass resolution instead of always-merge?
Collision is catastrophic: merging two different things corrupts every fact attached to either. Fragmentation is recoverable. The 0.92 semantic threshold is deliberately high, and the 0.03 ambiguity gap flags uncertain matches for human review. You can always merge entities manually; you can never safely un-merge them.
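The two thresholds can be read as a small decision rule. The sketch below covers only the semantic step and rests on an assumption about how the gap is applied (as the difference between the top two candidate scores); `resolve_semantic` and its return values are hypothetical names, not the engine's API.

```python
SEMANTIC_THRESHOLD = 0.92  # from the text above
AMBIGUITY_GAP = 0.03       # from the text above


def resolve_semantic(candidates):
    """Decide what to do with a mention given (entity_id, similarity) pairs.

    Returns ("merge", entity_id), ("conflict", [entity_ids]) or ("create", None).
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if not ranked or ranked[0][1] < SEMANTIC_THRESHOLD:
        return ("create", None)  # prefer fragmentation over a wrong merge
    best_id, best_score = ranked[0]
    if len(ranked) > 1 and best_score - ranked[1][1] < AMBIGUITY_GAP:
        # Runner-up is too close to call: queue the pair for human review.
        return ("conflict", [best_id, ranked[1][0]])
    return ("merge", best_id)
```

With scores of 0.95 and 0.93 the rule returns a conflict for review; with 0.97 and 0.90 it merges into the top candidate; with a single 0.91 candidate it creates a new entity.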
Why task-specific memory weights instead of one ranking?
A debug task and a compliance audit need fundamentally different memory. Debug needs episodic recall (what happened?) and procedures (what patterns do I follow?). Compliance needs episodic evidence and semantic facts, but should never surface unverified procedures. One ranking can't serve both.
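One way to picture task-specific weighting is a per-task weight table applied to the same candidates. The numbers and the `rank` helper below are hypothetical; only the qualitative shape (debug favors episodes and procedures, compliance excludes procedures) comes from the text above. For simplicity the sketch drops all procedures for compliance rather than only unverified ones.

```python
# Hypothetical weights; only the qualitative shape comes from the text above.
TASK_WEIGHTS = {
    "debug": {"episodic": 1.0, "semantic": 0.5, "procedural": 0.8},
    "compliance": {"episodic": 1.0, "semantic": 0.8, "procedural": 0.0},
}


def rank(candidates, task):
    """Rank (memory_type, relevance, label) tuples using the task's weights.

    Memory types with weight 0 are dropped rather than ranked last.
    """
    weights = TASK_WEIGHTS[task]
    scored = [
        (relevance * weights[memory_type], label)
        for memory_type, relevance, label in candidates
        if weights[memory_type] > 0
    ]
    return [label for score, label in sorted(scored, reverse=True)]
```

Given the same three candidates, the debug profile can put a procedure first while the compliance profile never returns it, which a single global ranking cannot do.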
Why two output formats, not a universal one?
Claude handles structured XML with metadata attributes well. GPT prefers compact JSON. Sending XML to GPT wastes tokens on tags it doesn't need. Sending flat JSON to Claude loses the metadata Claude can reason about. Two formats, chosen by model parameter.
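The format switch can be sketched as a single function keyed on the model name. The `render_context` helper, the prefix check, and the fact fields (`text`, `status`) are assumptions made for illustration; the engine's real element names and JSON shape are not documented on this page.

```python
import json
import xml.etree.ElementTree as ET


def render_context(facts, model):
    """Render facts as XML for Claude models and compact JSON otherwise."""
    if model.startswith("claude"):
        root = ET.Element("context")
        for fact in facts:
            # Metadata goes into attributes, the statement into element text.
            node = ET.SubElement(root, "fact", status=fact["status"])
            node.text = fact["text"]
        return ET.tostring(root, encoding="unicode")
    # Compact separators avoid spending tokens on whitespace.
    return json.dumps([fact["text"] for fact in facts], separators=(",", ":"))
```

The XML branch keeps the per-fact metadata as attributes, while the JSON branch trades that metadata for a shorter payload.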