LLM memory is not a search problem.
It is a knowledge-compilation problem. Borg's architecture is not an aesthetic choice — it is a response to how memory fails in LLM systems, grounded in cognitive science, information retrieval, temporal data management, and database design.
LLMs are stateless inference engines. Every call starts from zero. The memory layer therefore has to do more than retrieve text. It has to produce a trustworthy, budgeted, task-appropriate artifact on demand — then defend every element inside it.
Two approaches that don't hold up.
Both are popular. Both work fine on a demo. Both degrade sharply once conversations accumulate across weeks and projects.
Naive RAG
Vector search over raw conversation logs.
Conversation chunks are noisy, context-dependent, and often stale. A chunk that says "we use Redis" becomes actively harmful once the team moves to PostgreSQL. Without supersession, decay, or trust signals, retrieval becomes contamination — and the model sounds confident while being wrong.
Summarization chains
Periodic summaries of summaries.
Lossy compression compounds. After a few passes you're left with clean, generic prose that sounds helpful but has lost the specific facts, edge cases, and decision rationale that matter during real work. The system gets more fluent and less useful.
Six anchors, six decisions.
Each choice solves a specific failure mode. None are aesthetic.
Evidence-grounded memory
Borg treats conversations as episodic memory and extracted facts as semantic memory. The offline pipeline turns raw episodes into structured facts, procedures, and entities — each with provenance. That gives the system something it can compare, supersede, and rank, instead of treating an old conversation chunk as permanent truth.
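To make the episodic/semantic split concrete, here is a minimal sketch under stated assumptions. The `Episode` and `Fact` types and the `record_fact` helper are hypothetical illustrations, not Borg's schema. The extraction step itself is out of scope, so the (subject, predicate, value) triple is passed in by hand. The point is that every fact carries the ids of the episodes that support it.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Episode:
    """Episodic memory (hypothetical shape): one raw conversation turn, kept verbatim."""
    id: str
    text: str
    at: datetime


@dataclass
class Fact:
    """Semantic memory (hypothetical shape): a structured claim plus its evidence."""
    subject: str
    predicate: str
    value: str
    provenance: list[str] = field(default_factory=list)  # ids of supporting episodes


def record_fact(facts: dict[tuple[str, str, str], Fact], episode: Episode,
                subject: str, predicate: str, value: str) -> Fact:
    """Attach an already-extracted claim to the episode it came from.

    Repeating the same claim adds corroborating provenance. A conflicting
    value becomes a separate fact with its own evidence, so a later
    supersession step can compare the two instead of losing one.
    """
    key = (subject, predicate, value)
    fact = facts.get(key)
    if fact is None:
        fact = Fact(subject, predicate, value)
        facts[key] = fact
    if episode.id not in fact.provenance:
        fact.provenance.append(episode.id)
    return fact
```

Keeping conflicting values as distinct facts, each with its own provenance, is what gives the system "something it can compare, supersede, and rank."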
Prefer fragmentation over collision
When entity resolution is uncertain, Borg keeps two references separate rather than merging the wrong things. A false merge poisons every downstream fact. A temporary split is inconvenient but recoverable. That tradeoff is the right one when the output feeds an LLM that will reason over bad context with confidence.
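The fragmentation-over-collision rule can be sketched as a resolver that merges only on a single, high-confidence match. This is an illustration under assumptions, not Borg's three-pass resolver. `difflib.SequenceMatcher` stands in for whatever similarity signal the real system uses. The `MERGE_THRESHOLD` value and the `Entity` / `resolve` names are hypothetical.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher

MERGE_THRESHOLD = 0.9  # hypothetical; deliberately high so uncertain cases split


@dataclass
class Entity:
    id: int
    name: str
    aliases: set[str] = field(default_factory=set)


def similarity(a: str, b: str) -> float:
    # Stand-in signal: case-insensitive character-sequence similarity in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(mention: str, entities: list[Entity]) -> Entity:
    """Reuse an existing entity only on an unambiguous, confident match."""
    confident = [
        e for e in entities
        if max(similarity(mention, n) for n in {e.name, *e.aliases}) >= MERGE_THRESHOLD
    ]
    if len(confident) == 1:
        match = confident[0]
        match.aliases.add(mention)
        return match
    # Zero candidates, or several plausible ones: keep a separate reference.
    # A wrong merge would poison every downstream fact; a split can be merged later.
    new = Entity(id=len(entities) + 1, name=mention)
    entities.append(new)
    return new
```

The ambiguous case is the important one. When a mention looks like two existing entities equally well, the sketch refuses to pick one and creates a new reference instead.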
Faceted retrieval under budget
Borg doesn't bet everything on one ranked list. It retrieves across entities, facts, procedures, and snapshots, then applies memory-type weights per task and namespace. That makes context selection more robust when the real budget is not a number of documents but a few thousand tokens inside a model window.
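A minimal sketch of faceted selection under a token budget follows. It assumes each facet has already returned scored candidates with precomputed token counts. The `WEIGHTS` table, its (namespace, task) keys, and the greedy packing strategy are hypothetical choices for illustration, not Borg's actual ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    memory_type: str  # "entity" | "fact" | "procedure" | "snapshot"
    text: str
    score: float      # facet-local relevance in [0, 1]
    tokens: int       # precomputed token count


# Hypothetical memory-type weights keyed by (namespace, task).
WEIGHTS = {
    ("infra", "debugging"): {"fact": 1.0, "procedure": 0.9, "entity": 0.6, "snapshot": 0.4},
}


def select_context(facets: dict[str, list[Candidate]], namespace: str,
                   task: str, budget: int) -> list[Candidate]:
    """Merge per-facet lists, reweight by memory type, and pack greedily into the budget."""
    weights = WEIGHTS[(namespace, task)]
    pool = [c for candidates in facets.values() for c in candidates]
    ranked = sorted(pool, key=lambda c: c.score * weights.get(c.memory_type, 0.0),
                    reverse=True)
    chosen, used = [], 0
    for c in ranked:
        # Skip anything that would overflow; a smaller item further down may still fit.
        if used + c.tokens <= budget:
            chosen.append(c)
            used += c.tokens
    return chosen
```

For example, a snapshot with the highest raw score (0.95) but a low weight for debugging (0.4) ranks below a fact (0.9) and a procedure (0.72). At 500 tokens, it would not fit alongside them in a 200-token budget anyway.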
Temporal consistency through supersession
Facts in Borg have a lifecycle. Rather than simply true or false, each fact is observed, current, superseded, or archived, and carries both a valid time and a recording time. That prevents the classic memory failure where a system retrieves both "Python 3.9" and "Python 3.12" with no attempt to resolve which one still applies.
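Here is a minimal sketch of supersession with valid time and recording time. The `TemporalFact` type and the `assert_fact` / `current` helpers are hypothetical. The sketch only models the current-to-superseded transition; the observed and archived states are listed but their transitions are not shown. It also assumes facts arrive in valid-time order.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Status(Enum):
    OBSERVED = "observed"
    CURRENT = "current"
    SUPERSEDED = "superseded"
    ARCHIVED = "archived"


@dataclass
class TemporalFact:
    subject: str
    predicate: str
    value: str
    status: Status
    valid_from: datetime            # when the fact became true in the world
    valid_to: Optional[datetime]    # None means still valid
    recorded_at: datetime           # when the system learned it


def assert_fact(store: list[TemporalFact], subject: str, predicate: str, value: str,
                valid_from: datetime, recorded_at: datetime) -> TemporalFact:
    """Record a new value, superseding any conflicting current fact for the same key."""
    for old in store:
        if (old.subject, old.predicate) != (subject, predicate) or old.status is not Status.CURRENT:
            continue
        if old.value == value:
            return old  # already current; nothing to supersede
        old.status = Status.SUPERSEDED
        old.valid_to = valid_from  # the old claim stopped holding when the new one began
    new = TemporalFact(subject, predicate, value, Status.CURRENT, valid_from, None, recorded_at)
    store.append(new)
    return new


def current(store: list[TemporalFact], subject: str, predicate: str) -> list[TemporalFact]:
    """Only current facts are eligible for retrieval; history stays queryable."""
    return [f for f in store
            if (f.subject, f.predicate) == (subject, predicate) and f.status is Status.CURRENT]
```

Asserting "3.12" after "3.9" leaves one current fact. The older one is kept with an end date instead of being deleted, so questions about what was true at a given time still have an answer.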
Namespace scoping & token budgets
LLMs do worse when useful context is diluted by unrelated material. Borg scopes memory by namespace before retrieval, then applies configurable token budgets per namespace. Context is treated like a scarce runtime resource, not an infinite dump target.
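The order of operations matters here: scope first, then budget. The sketch below illustrates that order under assumptions. The namespace names, the `NAMESPACE_BUDGETS` values, `DEFAULT_BUDGET`, and exact-match scoping are all hypothetical. Hierarchical or prefix-based scoping is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    namespace: str
    text: str
    score: float
    tokens: int


# Hypothetical configuration: per-namespace token budgets with a fallback.
NAMESPACE_BUDGETS = {"infra/prod": 3000, "infra/staging": 1000}
DEFAULT_BUDGET = 500


def scoped_context(memories: list[Memory], namespace: str) -> list[Memory]:
    # 1. Scope before ranking: other namespaces never compete for the budget.
    in_scope = [m for m in memories if m.namespace == namespace]
    # 2. Enforce this namespace's budget, best-scoring memories first.
    budget = NAMESPACE_BUDGETS.get(namespace, DEFAULT_BUDGET)
    out, used = [], 0
    for m in sorted(in_scope, key=lambda m: m.score, reverse=True):
        if used + m.tokens > budget:
            continue
        out.append(m)
        used += m.tokens
    return out
```

A highly scored staging memory cannot leak into a production context, because it is filtered out before any scores are compared.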
PostgreSQL as the source of truth
pgvector covers similarity search. Recursive CTEs cover graph traversal. ACID transactions keep mutations coherent. pgAudit preserves traceability. A separate vector database or graph store adds new consistency boundaries and new failure modes. Borg stays PostgreSQL-native to avoid an entire class of distributed-state bugs.
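Graph traversal over relational tables is a recursive CTE. The sketch below runs the query against Python's built-in `sqlite3` so it executes anywhere. SQLite is only a stand-in here: `WITH RECURSIVE` has the same shape in PostgreSQL, though the parameter placeholder style depends on the driver. The `edges` table and the example nodes are hypothetical, not Borg's schema.

```python
import sqlite3

# In-memory SQLite as a runnable stand-in for PostgreSQL; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL, relation TEXT NOT NULL);
INSERT INTO edges VALUES
  ('checkout-svc', 'orders-db', 'reads_from'),
  ('orders-db', 'pg-cluster-1', 'hosted_on'),
  ('pg-cluster-1', 'vpc-main', 'runs_in'),
  ('billing-svc', 'ledger-db', 'reads_from');
""")

# Breadth of traversal is capped by a depth limit, which also bounds cycles.
REACHABLE = """
WITH RECURSIVE reach(node, depth) AS (
    SELECT ?, 0
    UNION
    SELECT e.dst, r.depth + 1
    FROM edges e JOIN reach r ON e.src = r.node
    WHERE r.depth < ?
)
SELECT node, MIN(depth) AS hops FROM reach GROUP BY node ORDER BY hops, node;
"""


def neighborhood(start: str, max_depth: int = 3) -> list[tuple[str, int]]:
    """Return every node reachable from start within max_depth hops, nearest first."""
    return conn.execute(REACHABLE, (start, max_depth)).fetchall()
```

Because the traversal is plain SQL, it runs inside the same transaction and audit trail as the facts it connects. No second store has to be kept in sync.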
The thesis, on ten tasks.
Cloud-infrastructure engineering workloads. Three conditions. Same evaluator, same ground truth, reproducible seeds.
Across 10 benchmark tasks, Borg-compiled context (C) achieved 10/10 task success versus 8/10 for top-10 vector RAG (B) and 0/10 for no memory (A). Retrieval precision reached 91.3%, with a 78% lower stale-fact rate and 61% less irrelevant content than vector RAG. Knowledge coverage improved by 16% in relative terms (90.8% vs 78.2%, a 12.6-point gain).
The gain comes from what is included, not from using fewer tokens: context token counts are comparable between B and C (2,806 vs 3,026). Full methodology, per-task results, and seeds are on the benchmarks page.
The systems that win at AI memory will treat it as data engineering, not vector search with better marketing.
Borg is built around that assumption from the start.
Compilation, not search.
This is the architectural bet underneath the whole system. The compiler analogy is literal.
RAG treats memory as a search problem: query in, documents out. Borg treats memory as a compilation problem: messy source material flows through a sequence of extraction, validation, resolution, supersession, ranking, and formatting passes before any context reaches a model.
Source code is redundant, inconsistent, and written for humans. A compiler turns it into a smaller, structured artifact machines can use. Conversations are the same — contradictory, local, emotional, full of dead ends. Borg's offline pipeline compiles them into memory that can be trusted enough to retrieve from.
That is why the pipeline has multiple passes. Embeddings, entity extraction, three-pass resolution, fact extraction, supersession, serving-state updates, procedure extraction, and snapshots each preserve a guarantee you lose if you cut the step out. The cost of each is modest; the cost of their absence is not.
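The compiler structure can be shown with a toy pipeline in which each pass is a function from one intermediate representation to the next. Everything here is hypothetical and heavily simplified: the `Unit` type, the three passes, and the `"subject: value"` extraction rule. Borg's real passes (embeddings, entity extraction, three-pass resolution, and so on) are not reproduced. The sketch only shows how dropping one pass drops its guarantee.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Unit:
    """Hypothetical intermediate representation flowing through the passes."""
    raw: list[str]
    facts: list[dict] = field(default_factory=list)
    log: list[str] = field(default_factory=list)


Pass = Callable[[Unit], Unit]


def extract(u: Unit) -> Unit:
    # Toy extraction: lines shaped like "subject: value" become facts; chatter is ignored.
    for line in u.raw:
        if ":" in line:
            subject, value = (s.strip() for s in line.split(":", 1))
            u.facts.append({"subject": subject, "value": value})
    return u


def validate(u: Unit) -> Unit:
    # Guarantee: no fact with an empty value reaches later passes.
    u.facts = [f for f in u.facts if f["value"]]
    return u


def supersede(u: Unit) -> Unit:
    # Guarantee: at most one value per subject; later observations win.
    latest: dict[str, dict] = {}
    for f in u.facts:
        latest[f["subject"]] = f
    u.facts = list(latest.values())
    return u


def compile_memory(raw: list[str], passes: list[Pass]) -> Unit:
    """Run passes in order, recording which ones ran."""
    u = Unit(raw)
    for p in passes:
        u = p(u)
        u.log.append(p.__name__)
    return u
```

Run on `["db: Redis", "chit-chat", "db: PostgreSQL", "owner: "]`, the full pipeline yields a single fact, `db = PostgreSQL`. Leave `supersede` out and both database values survive into the output, which is exactly the contradiction the pass exists to remove.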
The runtime — 15 tables, one function.
Topology, schema, API surface, and the decisions behind each constraint. Where the theory becomes code.
Features, with tradeoffs.
Each capability plus the decision it implies. How supersession, classification, and ranking actually run.