§ /science — a paper, disguised as a page

LLM memory is not
a search problem.

It is a knowledge-compilation problem. Borg's architecture is not an aesthetic choice — it is a response to how memory fails in LLM systems, grounded in cognitive science, information retrieval, temporal data management, and database design.

read · 12 min | written · apr 2026 | cites · 4 primary sources

LLMs are stateless inference engines. Every call starts from zero. The memory layer therefore has to do more than retrieve text. It has to produce a trustworthy, budgeted, task-appropriate artifact on demand — then defend every element inside it.

§ 01 — WHAT FAILS

Two approaches
that don't hold up.

Both are popular. Both work fine on a demo. Both degrade sharply once conversations accumulate across weeks and projects.

common failure

Naive RAG

Vector search over raw conversation logs.

Conversation chunks are noisy, context-dependent, and often stale. A chunk that says "we use Redis" becomes actively harmful once the team moves to PostgreSQL. Without supersession, decay, or trust signals, retrieval becomes contamination — and the model sounds confident while being wrong.

common failure

Summarization chains

Periodic summaries of summaries.

Lossy compression compounds. After a few passes you're left with clean, generic prose that sounds helpful but has lost the specific facts, edge cases, and decision rationale that matter during real work. The system gets more fluent and less useful.

§ 02 — FIRST PRINCIPLES

Six anchors,
six decisions.

Each choice solves a specific failure mode. None are aesthetic.

Scientific anchor: Tulving's episodic + semantic memory model

Evidence-grounded memory

Borg treats conversations as episodic memory and extracted facts as semantic memory. The offline pipeline turns raw episodes into structured facts, procedures, and entities — each with provenance. That gives the system something it can compare, supersede, and rank, instead of treating an old conversation chunk as permanent truth.
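As a minimal sketch of the episodic/semantic split, assuming nothing about Borg's real schema: the `Episode` and `Fact` classes, their field names, and the toy `extract_facts` rule below are all hypothetical. The point is only that each extracted fact carries a pointer back to the episode that supports it.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Episode:
    """A raw conversation turn (episodic memory)."""
    episode_id: str
    recorded_at: datetime
    text: str


@dataclass(frozen=True)
class Fact:
    """An extracted claim (semantic memory) that points back to its evidence."""
    subject: str
    predicate: str
    value: str
    source_episode_id: str  # provenance: which episode supports this fact


def extract_facts(episode: Episode) -> list[Fact]:
    # Hypothetical stand-in for a real extractor: it only recognises
    # three-word sentences of the form "we use X".
    facts = []
    for sentence in episode.text.split("."):
        words = sentence.strip().split()
        if len(words) == 3 and [w.lower() for w in words[:2]] == ["we", "use"]:
            facts.append(Fact("team", "uses", words[2], episode.episode_id))
    return facts


episode = Episode("ep-001", datetime(2026, 1, 5), "We use Redis. It is fast.")
facts = extract_facts(episode)
```

Because every `Fact` names its source episode, a later pass can compare, rank, or retire it without re-reading the whole conversation.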

Scientific anchor: Fellegi–Sunter record linkage

Prefer fragmentation over collision

When entity resolution is uncertain, Borg keeps two references separate rather than merging the wrong things. A false merge poisons every downstream fact. A temporary split is inconvenient but recoverable. That tradeoff is the right one when the output feeds an LLM that will reason over bad context with confidence.
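The tradeoff above can be sketched with Fellegi–Sunter match weights. The classic model sums per-field log-likelihood ratios and uses two thresholds, with a "possible link" zone between them. This sketch assumes a conservative policy that collapses that middle zone into "keep separate". The field names, the m/u probabilities, and the threshold are invented for illustration and are not Borg's actual parameters.

```python
import math

# Hypothetical per-field probabilities:
#   m = P(field agrees | same entity), u = P(field agrees | different entities)
FIELD_PARAMS = {
    "name": (0.95, 0.05),
    "type": (0.90, 0.30),
    "namespace": (0.85, 0.20),
}
MERGE_THRESHOLD = 5.0  # assumed upper threshold; only scores at or above it merge


def match_weight(agreements: dict[str, bool]) -> float:
    """Sum of Fellegi-Sunter log2 likelihood ratios over the compared fields."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PARAMS[field]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total


def decide(agreements: dict[str, bool]) -> str:
    # Conservative policy: anything short of the merge threshold stays split.
    if match_weight(agreements) >= MERGE_THRESHOLD:
        return "merge"
    return "keep_separate"


all_agree = {"name": True, "type": True, "namespace": True}
type_differs = {"name": True, "type": False, "namespace": True}
```

With these assumed numbers, `decide(all_agree)` merges, while a single disagreeing field in `type_differs` is enough to keep the two references apart.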

Scientific anchor: Faceted retrieval, constrained ranking

Faceted retrieval under budget

Borg doesn't bet everything on one ranked list. It retrieves across entities, facts, procedures, and snapshots, then applies memory-type weights per task and namespace. That makes context selection more robust when the real budget is not documents, but a few thousand tokens inside a model window.
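One way to picture weighted, budgeted selection is a greedy pass over candidates drawn from every facet. This is a sketch under stated assumptions: the `Candidate` shape, the `TYPE_WEIGHTS` values, and the greedy strategy are illustrative choices, not a description of Borg's ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    memory_type: str   # "entity", "fact", "procedure", or "snapshot"
    text: str
    relevance: float   # retrieval score within its own facet, 0..1
    tokens: int        # estimated token cost


# Hypothetical weights for one task type; a real system would configure
# these per task and namespace.
TYPE_WEIGHTS = {"fact": 1.0, "procedure": 0.8, "entity": 0.6, "snapshot": 0.4}


def select_context(candidates: list[Candidate], budget: int) -> list[Candidate]:
    """Greedy selection by weighted relevance until the token budget is spent."""
    ranked = sorted(
        candidates,
        key=lambda c: TYPE_WEIGHTS.get(c.memory_type, 0.0) * c.relevance,
        reverse=True,
    )
    chosen, used = [], 0
    for cand in ranked:
        if used + cand.tokens <= budget:
            chosen.append(cand)
            used += cand.tokens
    return chosen


pool = [
    Candidate("snapshot", "Project state as of last week", 0.9, 400),
    Candidate("fact", "Service runs on Python 3.12", 0.8, 20),
    Candidate("procedure", "Rollback steps for failed deploys", 0.7, 120),
    Candidate("entity", "payments-api: owned by platform team", 0.5, 30),
]
selected = select_context(pool, budget=200)
```

Here the snapshot has the highest raw relevance but is skipped: its weight is low and its 400 tokens would not fit the 200-token budget, so the fact, the procedure, and the entity are chosen instead.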

Scientific anchor: Bitemporal data management

Temporal consistency through supersession

Facts in Borg have a lifecycle. Not merely true or false — observed, current, superseded, or archived, with valid time and recording time. That prevents the classic memory failure where a system retrieves both "Python 3.9" and "Python 3.12" with no attempt to resolve which one still applies.
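A small sketch of supersession with two time axes follows. The `FactVersion` fields and the `supersede` helper are hypothetical names chosen for illustration. The status values mirror the lifecycle named above, but the transition rule shown (the newest valid fact closes out the previous current one) is an assumption.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class FactVersion:
    key: str                          # what the fact is about
    value: str
    valid_from: datetime              # valid time: when it became true
    recorded_at: datetime             # recording time: when the system learned it
    valid_to: Optional[datetime] = None
    status: str = "current"           # observed | current | superseded | archived


def supersede(history: list[FactVersion], new: FactVersion) -> list[FactVersion]:
    """Close out the current version for the same key, then append the new one."""
    updated = []
    for fact in history:
        if fact.key == new.key and fact.status == "current":
            fact = replace(fact, status="superseded", valid_to=new.valid_from)
        updated.append(fact)
    updated.append(new)
    return updated


def current_value(history: list[FactVersion], key: str) -> Optional[str]:
    for fact in history:
        if fact.key == key and fact.status == "current":
            return fact.value
    return None


history = [
    FactVersion("python_version", "3.9", datetime(2024, 1, 10), datetime(2024, 1, 12)),
]
history = supersede(
    history,
    FactVersion("python_version", "3.12", datetime(2025, 6, 1), datetime(2025, 6, 3)),
)
```

The old version is kept, not deleted: it stays queryable for "what did we believe then?" while `current_value` returns only the version that still applies.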

Scientific anchor: “Lost in the Middle” · resource-aware retrieval

Namespace scoping & token budgets

LLMs do worse when useful context is diluted by unrelated material. Borg scopes memory by namespace before retrieval, then applies configurable token budgets per namespace. Context is treated like a scarce runtime resource, not an infinite dump target.
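To make the ordering concrete (scope first, then budget), here is a minimal sketch. The namespace names, budget numbers, and item shape are assumptions made for the example, not Borg configuration.

```python
# Hypothetical per-namespace budgets, in tokens.
NAMESPACE_BUDGETS = {"payments": 1500, "infra": 800}
DEFAULT_BUDGET = 500

MEMORY = [
    {"namespace": "payments", "text": "Refund flow uses idempotency keys", "tokens": 900},
    {"namespace": "payments", "text": "Ledger writes are append-only", "tokens": 700},
    {"namespace": "infra", "text": "Staging cluster runs in eu-west", "tokens": 300},
]


def scoped_context(namespace: str, memory: list[dict]) -> list[dict]:
    """Filter to one namespace first, then fill that namespace's budget in order."""
    budget = NAMESPACE_BUDGETS.get(namespace, DEFAULT_BUDGET)
    in_scope = [item for item in memory if item["namespace"] == namespace]
    chosen, used = [], 0
    for item in in_scope:
        if used + item["tokens"] <= budget:
            chosen.append(item)
            used += item["tokens"]
    return chosen


payments_context = scoped_context("payments", MEMORY)
```

Items from other namespaces never compete for the budget at all, which is the dilution guard the paragraph describes.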

Scientific anchor: Consistency over system sprawl

PostgreSQL as the source of truth

pgvector covers similarity search. Recursive CTEs cover graph traversal. ACID transactions keep mutations coherent. pgAudit preserves traceability. A separate vector database or graph store adds new consistency boundaries and new failure modes. Borg stays PostgreSQL-native to avoid an entire class of distributed-state bugs.
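Graph traversal with a recursive CTE can be sketched as follows. To keep the example runnable without a database server, it uses Python's built-in `sqlite3` module as a stand-in. PostgreSQL also supports `WITH RECURSIVE`, though placeholder syntax and parameter typing differ by driver. The `entity_edges` table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity_edges (src TEXT NOT NULL, dst TEXT NOT NULL);
    INSERT INTO entity_edges VALUES
        ('payments-api', 'ledger-db'),
        ('ledger-db', 'backup-job'),
        ('payments-api', 'auth-service'),
        ('search-api', 'index-store');
""")

# Walk outward from a starting entity, capped at a maximum depth so that
# a cycle in the graph cannot recurse forever.
REACHABLE = """
    WITH RECURSIVE reachable(name, depth) AS (
        SELECT ?, 0
        UNION
        SELECT e.dst, r.depth + 1
        FROM entity_edges AS e
        JOIN reachable AS r ON e.src = r.name
        WHERE r.depth < ?
    )
    SELECT name, MIN(depth) FROM reachable
    GROUP BY name
    ORDER BY MIN(depth), name;
"""

rows = conn.execute(REACHABLE, ("payments-api", 3)).fetchall()
```

The traversal runs inside the same engine and transaction scope as the rest of the data, which is the consistency argument made above. `search-api` and `index-store` are unreachable from `payments-api` and so never appear in `rows`.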

§ 03 — MEASURED

The thesis,
on ten tasks.

Cloud-infrastructure engineering workloads. Three conditions. Same evaluator, same ground truth, reproducible seeds.

Across 10 benchmark tasks, Borg-compiled context (C) achieved 10/10 task success versus 8/10 for top-10 vector RAG (B) and 0/10 for no memory (A). Retrieval precision reached 91.3%, with a 78% lower stale-fact rate and 61% less irrelevant content than vector RAG. Knowledge coverage rose from 78.2% to 90.8%, a 16% relative improvement (12.6 percentage points).

The gain comes from what is included, not from using fewer tokens: context token counts are comparable between B and C (2,806 vs 3,026). Full methodology, per-task results, and seeds are on the benchmarks page.

§ bottom line

The systems that win at AI memory will treat it as data engineering, not vector search with better marketing.

Borg is built around that assumption from the start.

§ 04 — COMPILATION

Compilation,
not search.

This is the architectural bet underneath the whole system. The compiler analogy is literal.

RAG treats memory as a search problem: query in, documents out. Borg treats memory as a compilation problem: messy source material flows through a sequence of extraction, validation, resolution, supersession, ranking, and formatting passes before any context reaches a model.

Source code is redundant, inconsistent, and written for humans. A compiler turns it into a smaller, structured artifact machines can use. Conversations are the same — contradictory, local, emotional, full of dead ends. Borg's offline pipeline compiles them into memory that can be trusted enough to retrieve from.

That is why the pipeline has multiple passes. Embeddings, entity extraction, three-pass resolution, fact extraction, supersession, serving-state updates, procedure extraction, and snapshots each preserve a guarantee you lose if you cut the step out. The cost of each is modest; the cost of their absence is not.
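The multi-pass idea can be sketched as a list of functions applied in order, each taking the working state and returning an enriched copy. This is a toy under stated assumptions: the three passes, their names, and the `key=value` message format are invented for illustration and stand in for the much richer passes listed above.

```python
from typing import Callable

Memory = dict  # working state handed from one pass to the next
Pass = Callable[[Memory], Memory]


def extract(state: Memory) -> Memory:
    # Hypothetical extractor: every "key=value" token becomes a candidate fact.
    facts = [
        tuple(tok.split("=", 1))
        for msg in state["messages"]
        for tok in msg.split()
        if "=" in tok
    ]
    return {**state, "facts": facts}


def resolve_supersession(state: Memory) -> Memory:
    # Later statements win: keep only the last value seen for each key.
    latest = {}
    for key, value in state["facts"]:
        latest[key] = value
    return {**state, "facts": sorted(latest.items())}


def render(state: Memory) -> Memory:
    lines = [f"{key}: {value}" for key, value in state["facts"]]
    return {**state, "context": "\n".join(lines)}


def compile_memory(messages: list[str], passes: list[Pass]) -> Memory:
    state: Memory = {"messages": messages}
    for run_pass in passes:
        state = run_pass(state)
    return state


result = compile_memory(
    ["cache=redis python=3.9", "we migrated: cache=postgres", "python=3.12 now"],
    [extract, resolve_supersession, render],
)
```

Dropping `resolve_supersession` from the list still produces output, but the context would then carry both `redis` and `postgres`. That is the sense in which each pass preserves a guarantee that is lost when the step is cut.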