Architecture

System design, schema, deployment, and the decisions behind them.


System Overview

One container, one database, two pipelines. Runs locally against any Postgres 14+ instance.

Reference Layout

Local machine (or any container host)
├── borg-engine
│   ├── FastAPI (main.py)
│   ├── MCP endpoint (:8080/mcp)          ← Claude Code, Codex CLI, Kiro, Copilot
│   ├── REST endpoint (:8080/api)          ← Manual ingestion, admin
│   ├── Background worker                  ← Async extraction loop
│   └── Snapshot loop                      ← 24h hot-tier snapshots
│
├── PostgreSQL 14+ (local, Supabase, Neon, Azure, etc.)
│   ├── pgvector, pgAudit, uuid-ossp
│   └── borg_* tables (15 tables + 1 function)
│
└── OpenAI / Azure OpenAI
    ├── text-embedding-3-small             ← Episode embeddings (1536-dim)
    └── gpt-5-mini / gpt-4o-mini           ← Entity, fact, procedure extraction

Supports both standard OpenAI and Azure OpenAI endpoints. The engine runs as a single process with the API server and background worker as async tasks. Any Postgres 14+ instance with pgvector works.
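The single-process layout (API server plus background worker as async tasks) can be sketched with asyncio and a shared queue. This is a minimal illustration, not the engine's code; the function names and the shutdown sentinel are assumptions.

```python
import asyncio

async def background_worker(queue: asyncio.Queue, processed: list) -> None:
    # Stand-in for the async extraction loop: drain episodes until a
    # shutdown sentinel (None) arrives. The real loop runs indefinitely.
    while True:
        episode = await queue.get()
        if episode is None:
            break
        processed.append(episode)  # here the engine would call the extraction LLM

async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    processed: list = []
    worker = asyncio.create_task(background_worker(queue, processed))
    # The API server would run as a sibling task (e.g. a uvicorn server);
    # here we just enqueue episodes the way an /api/learn handler would.
    for ep in ["episode-1", "episode-2"]:
        await queue.put(ep)
    await queue.put(None)  # shut the worker down
    await worker
    return processed

result = asyncio.run(main())
print(result)  # ['episode-1', 'episode-2']
```

Because ingestion only enqueues, the API returns immediately and extraction cost is paid off the request path.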

Schema

15 tables + 1 function. The schema separates canonical data from derived serving state.

| Table | Role | Type |
| --- | --- | --- |
| borg_episodes | Immutable evidence layer | Canonical |
| borg_entities | Graph nodes (typed, namespaced) | Canonical |
| borg_entity_state | Entity serving state (tier, salience, access) | Derived |
| borg_facts | Graph edges (temporal, with supersession) | Canonical |
| borg_fact_state | Fact serving state (salience, access) | Derived |
| borg_procedures | Candidate behavioral patterns | Canonical |
| borg_predicates | 24 canonical relationship predicates | Reference |
| borg_predicate_candidates | Non-canonical predicates pending review | Queue |
| borg_resolution_conflicts | Ambiguous entity matches for review | Queue |
| borg_namespace_config | Per-namespace token budgets | Config |
| borg_audit_log | Full compilation + extraction traces | Audit |
| borg_snapshots | 24h hot-tier state captures | Snapshot |

borg_traverse()

A recursive CTE function for 1-2 hop graph traversal. Cycle-safe via path tracking. Scoped to a single namespace. Used by the graph_neighborhood retrieval strategy.

SELECT * FROM borg_traverse(
  p_entity_id := 'a1b2c3...',
  p_max_hops  := 2,
  p_namespace := 'product-engineering'
);
-- Returns: entity_id, entity_name, entity_type,
--          fact_id, predicate, evidence_status,
--          hop_depth, path

API Surface

Three MCP tools, their REST equivalents, and admin endpoints. The OSS release runs locally with no authentication.

Core

| Method | Path | Description |
| --- | --- | --- |
| POST | /mcp | MCP Streamable HTTP (borg_think, borg_learn, borg_recall) |
| POST | /api/think | Compile context (REST equivalent of borg_think) |
| POST | /api/learn | Ingest episode (REST equivalent of borg_learn) |
| POST | /api/recall | Search memory (REST equivalent of borg_recall) |
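As a sketch of the REST surface, an ingestion call might look like the following. The payload field names (namespace, content) are assumptions, not the documented request schema, and the actual request is left commented out since it requires the engine running on :8080.

```python
import json
import urllib.request

# Hypothetical payload shape; the real /api/learn schema may differ.
payload = {
    "namespace": "default",
    "content": "Deployed v2 behind a feature flag; rollback plan documented.",
}

req = urllib.request.Request(
    "http://localhost:8080/api/learn",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# with urllib.request.urlopen(req) as resp:   # needs borg-engine running locally
#     print(json.loads(resp.read()))
```

With no authentication in the OSS release, no token header is needed; the same shape applies to /api/think and /api/recall.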

Namespace Management

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/namespaces | List all namespaces with budgets |
| GET | /api/namespaces/:ns | Get config + stats (entity/fact/episode/procedure counts) |
| POST | /api/namespaces | Create namespace with configurable budgets |
| PUT | /api/namespaces/:ns | Update budgets / description |
| DELETE | /api/namespaces/:ns | Delete (protects 'default') |

Admin

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/admin/queue | Processing queue depth + failed count |
| GET | /api/admin/entities | List entities (with tier, salience, access count) |
| GET | /api/admin/facts | List current facts (with salience + access tracking) |
| GET | /api/admin/procedures | List procedures (confidence + observation counts) |
| GET | /api/admin/conflicts | Unresolved entity resolution conflicts |
| GET | /api/admin/predicates | Canonical predicates + pending custom candidates |
| POST | /api/admin/process-episode | Manual extraction trigger |
| POST | /api/admin/requeue-failed | Requeue episodes with extraction errors |
| POST | /api/admin/snapshot | Manual hot-tier snapshot |
| GET | /api/admin/cost-summary | LLM token usage and estimated spend by namespace |
| GET | /api/admin/snapshot/latest | Most recent snapshot for a namespace |

Integrations

This page covers runtime, schema, and API surface. Client setup for Claude Code, Codex CLI, Kiro, Claude Desktop, and REST ingestion, along with AGENTS.md guidance, steering files, and MCP examples, lives on the dedicated integrations page.

Design Decisions

The constraints are intentional.

Why LLM in the write path?

Borg extracts structured knowledge (entities, typed facts, procedures) — not just text blobs. This requires an LLM. The trade-off is extraction cost and latency, but it runs offline so it never blocks queries. The alternative (embedding-only, like Ogham) gives you similarity search but not a queryable knowledge graph.

Why not Neo4j / FalkorDB?

Adding a graph database means syncing between two systems. Sync means drift. PostgreSQL recursive CTEs handle 1-2 hop traversal at the expected scale (hundreds of entities, thousands of facts). A dedicated graph DB is a measured-bottleneck escape hatch, not a starting point.

Why three-pass resolution instead of always-merge?

Collision is catastrophic: merging two different entities corrupts every attached fact. Fragmentation is recoverable. The 0.92 semantic threshold is deliberately high. The 0.03 ambiguity gap flags uncertain matches for human review. You can always merge entities manually; you can never safely un-merge them.
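The decision around those two numbers can be sketched as a pure function. The thresholds come from the text; the function shape and return values are illustrative, not the engine's implementation.

```python
def resolve(scores: dict, merge_threshold: float = 0.92, ambiguity_gap: float = 0.03):
    """Decide whether a new mention merges into an existing entity.

    scores maps candidate entity IDs to semantic similarity.
    """
    if not scores:
        return ("create", None)  # no candidates: new entity
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_id, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if best < merge_threshold:
        return ("create", None)      # below 0.92: fragmentation is recoverable
    if best - runner_up < ambiguity_gap:
        return ("review", best_id)   # within 0.03 of a rival: queue for a human
    return ("merge", best_id)        # confident, unambiguous match

print(resolve({"e1": 0.95, "e2": 0.80}))  # ('merge', 'e1')
print(resolve({"e1": 0.94, "e2": 0.93}))  # ('review', 'e1')
print(resolve({"e1": 0.85}))              # ('create', None)
```

The second case is the point: both candidates clear the threshold, so an always-merge policy would pick one and risk silent corruption, while this logic routes it to borg_resolution_conflicts.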

Why task-specific memory weights instead of one ranking?

A debug task and a compliance audit need fundamentally different memory. Debug needs episodic recall (what happened?) and procedures (what patterns do I follow?). Compliance needs episodic evidence and semantic facts, but should never surface unverified procedures. One ranking can't serve both.
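A sketch of why one ranking fails: with hypothetical weight profiles (the names and numbers below are invented for illustration), the same two memories rank differently per task, and the compliance profile zeroes out unverified procedures entirely.

```python
# Illustrative profiles only; the engine's real weights are not shown here.
WEIGHTS = {
    "debug":      {"episodic": 0.4, "procedural": 0.4, "semantic": 0.2},
    "compliance": {"episodic": 0.5, "procedural": 0.0, "semantic": 0.5},
}

def score(item: dict, task: str) -> float:
    return WEIGHTS[task][item["kind"]] * item["relevance"]

items = [
    {"kind": "procedural", "relevance": 0.9},  # an unverified behavioral pattern
    {"kind": "episodic",   "relevance": 0.7},  # what actually happened
]

top_debug = max(items, key=lambda i: score(i, "debug"))["kind"]
top_audit = max(items, key=lambda i: score(i, "compliance"))["kind"]
print(top_debug)  # procedural
print(top_audit)  # episodic
```

Any single weight vector that surfaces the procedure for debugging will also surface it during an audit; only per-task weights can hold the zero.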

Why two output formats, not a universal one?

Claude handles structured XML with metadata attributes well. GPT prefers compact JSON. Sending XML to GPT wastes tokens on tags it doesn't need. Sending flat JSON to Claude loses the metadata Claude can reason about. Two formats, chosen by model parameter.
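The format switch can be sketched as a single branch on the model parameter. The rendering details below are assumptions; only the XML-for-Claude, compact-JSON-for-GPT split comes from the text.

```python
import json

def render(memory: list, model: str) -> str:
    """Pick the output format by model family (illustrative sketch)."""
    if model.startswith("claude"):
        # Structured XML with metadata attributes the model can reason about
        facts = "".join(
            f'<fact salience="{f["salience"]}">{f["text"]}</fact>' for f in memory
        )
        return f"<memory>{facts}</memory>"
    # Compact JSON for GPT-family models: no tag overhead to pay tokens for
    return json.dumps([f["text"] for f in memory], separators=(",", ":"))

mem = [{"text": "API gateway owns auth", "salience": 0.8}]
print(render(mem, "claude-sonnet"))
print(render(mem, "gpt-4o-mini"))
```

The same compiled memory feeds both branches; only the serialization changes, so the trade-off is purely token cost versus metadata richness.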