Features

Everything Borg does, how it works under the hood, and the trade-offs behind each decision.

Every feature below ships in the open source release under Apache 2.0.

Knowledge Graph Extraction

An LLM pipeline that runs on every ingested episode. Extracts entities, facts, and procedures into a typed, temporal knowledge graph. No manual tagging.

Entity Extraction

Each episode is processed by gpt-5-mini to extract up to 10 entities with a constrained type taxonomy: person, organization, project, service, technology, pattern, environment, document, metric, decision. The LLM is instructed to use the most specific common name ("Webhook Gateway" not "event delivery system") and to include known aliases.

Generic concepts are excluded — "authentication" is not an entity, but "Webhook Signature Validation Pattern" is. Actions and events are excluded. This constraint keeps the graph clean and mergeable.

Three-Pass Entity Resolution

Design principle: prefer fragmentation over collision. Two separate entities for the same thing can be merged later with a simple UPDATE. Two different things incorrectly merged corrupt every fact attached to both. Fragmentation is recoverable. Collision is not.

| Pass | Method | Confidence | Condition |
| --- | --- | --- | --- |
| 1 | Exact match (name + type + namespace) | 1.0 | Case-insensitive |
| 2 | Alias match (properties→aliases) | 0.95 | Single match only |
| 3 | Semantic similarity (embedding) | > 0.92 | Top-two gap > 0.03, else flag as conflict |

When ambiguous — the top two semantic candidates are within 0.03 of each other — a conflict record is created and a new entity is born. Conflicts are visible via the admin API for manual review.
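
The three passes can be sketched in Python. This is a minimal illustration of the resolution order and thresholds from the table above, not Borg's actual implementation; the `Entity` shape, function names, and the precomputed `similarities` map are assumptions.

```python
from dataclasses import dataclass, field

SEMANTIC_THRESHOLD = 0.92   # pass 3 acceptance threshold
AMBIGUITY_GAP = 0.03        # top-two gap below this -> conflict, new entity

@dataclass
class Entity:
    name: str
    type: str
    namespace: str
    aliases: list = field(default_factory=list)

def resolve(candidate, existing, similarities):
    """Return (matched_entity, confidence), or (None, 0.0) when a new
    entity must be created. `similarities` maps entity name -> cosine
    similarity for pass 3 (assumed precomputed elsewhere)."""
    # Pass 1: exact match on name + type + namespace (case-insensitive)
    for e in existing:
        if (e.name.lower(), e.type, e.namespace) == \
           (candidate.name.lower(), candidate.type, candidate.namespace):
            return e, 1.0
    # Pass 2: alias match, accepted only if exactly one entity matches
    alias_hits = [e for e in existing
                  if candidate.name.lower() in (a.lower() for a in e.aliases)]
    if len(alias_hits) == 1:
        return alias_hits[0], 0.95
    # Pass 3: embedding similarity > 0.92 with a clear top-two gap
    ranked = sorted(existing, key=lambda e: similarities.get(e.name, 0.0),
                    reverse=True)
    if ranked and similarities.get(ranked[0].name, 0.0) > SEMANTIC_THRESHOLD:
        runner_up = similarities.get(ranked[1].name, 0.0) if len(ranked) > 1 else 0.0
        if similarities[ranked[0].name] - runner_up > AMBIGUITY_GAP:
            return ranked[0], similarities[ranked[0].name]
        # ambiguous: a conflict record would be written here
    return None, 0.0  # prefer fragmentation over collision
```

Note how the ambiguous case falls through to `(None, 0.0)`: fragmentation is the deliberate default.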

Canonical Predicate Registry

24 canonical predicates across four categories ensure consistent, queryable relationships. The LLM is given the full list and instructed to use canonical predicates where possible. Non-canonical predicates are flagged as custom and tracked with occurrence counts for promotion review.

Structural

uses/used_by, contains/contained_in, depends_on/dependency_of, implements/implemented_by, integrates_with, authored/authored_by, owns/owned_by

Temporal

replaced/replaced_by

Decisional

decided/decided_by

Operational

deployed_to/hosts, manages/managed_by, configured_with, targets, blocked_by/blocks
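
The registry check reduces to a set membership test plus a counter for promotion review. A minimal sketch, listing each direction of the paired predicates explicitly; the function name and normalization rules are illustrative assumptions:

```python
from collections import Counter

# Canonical predicates from the registry above, both directions spelled out
CANONICAL = {
    "uses", "used_by", "contains", "contained_in", "depends_on",
    "dependency_of", "implements", "implemented_by", "integrates_with",
    "authored", "authored_by", "owns", "owned_by", "replaced", "replaced_by",
    "decided", "decided_by", "deployed_to", "hosts", "manages", "managed_by",
    "configured_with", "targets", "blocked_by", "blocks",
}

custom_counts = Counter()  # occurrence counts drive promotion review

def classify_predicate(predicate):
    """Return 'canonical' or 'custom'; custom predicates are counted."""
    p = predicate.strip().lower().replace(" ", "_")
    if p in CANONICAL:
        return "canonical"
    custom_counts[p] += 1
    return "custom"
```

A custom predicate that keeps recurring accumulates a high count, which is the signal for promoting it into the registry.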

Temporal Facts with Supersession

Facts are not overwritten. They're superseded. The full history is always available.

Every fact carries valid_from and valid_until timestamps. When a new fact contradicts an existing current fact (same subject and predicate, different object), the old fact is marked superseded with valid_until = now(). The old fact is never deleted.

-- March 1: Customer Portal uses Semantic Kernel
fact_id: a1b2... | predicate: uses | valid_until: NULL | status: observed

-- March 10: Customer Portal uses Azure AI Foundry (new decision)
-- Old fact automatically superseded:
fact_id: a1b2... | predicate: uses | valid_until: 2026-03-10 | status: superseded
fact_id: c3d4... | predicate: uses | valid_until: NULL       | status: observed

Seven evidence statuses track the lifecycle of every fact: user_asserted, observed, extracted, inferred, promoted, deprecated, superseded. Authority hierarchy: user-asserted facts outrank LLM-extracted ones. Superseded facts are excluded from compilation but always available for compliance queries.
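
The supersession rule itself is small. A Python sketch of the behavior described above, using plain dicts in place of database rows (the field names mirror the example; the function names are illustrative):

```python
from datetime import datetime, timezone

def ingest_fact(facts, subject, predicate, obj, status="observed"):
    """Supersede any current fact with the same subject + predicate but a
    different object. Nothing is deleted; the old row keeps its history."""
    now = datetime.now(timezone.utc)
    for f in facts:
        if (f["subject"] == subject and f["predicate"] == predicate
                and f["object"] != obj and f["valid_until"] is None):
            f["status"] = "superseded"
            f["valid_until"] = now
    facts.append({"subject": subject, "predicate": predicate, "object": obj,
                  "status": status, "valid_from": now, "valid_until": None})

def current_facts(facts):
    """Superseded facts are excluded from compilation but stay queryable."""
    return [f for f in facts if f["valid_until"] is None]
```

Replaying the March example: after the second ingest, the Semantic Kernel fact is superseded and only the Azure AI Foundry fact is current.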

Task-Specific Context Compilation

Different tasks need different memory. The compiler classifies intent, retrieves from multiple strategies, and weights memory types per task class.

Dual-Profile Classification

Instead of mapping a query to one task class, Borg identifies a primary and optional secondary class. If the confidence gap between them is less than 0.3, both profiles execute retrieval. This eliminates the single-path classification failure mode.
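
The selection rule can be sketched in a few lines of Python. Assumed here: the classifier returns a confidence score per task class; the function name is illustrative.

```python
def select_profiles(scores, gap_threshold=0.3):
    """Pick the primary task class, plus a secondary one when the
    confidence gap between the top two classes is under the threshold."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    selected = [ranked[0][0]]
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < gap_threshold:
        selected.append(ranked[1][0])  # both profiles execute retrieval
    return selected
```

A confident classification runs one profile; a close call runs two, so a borderline query never commits to a single retrieval path.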

| Task | Retrieval Profiles | Episodic | Semantic | Procedural |
| --- | --- | --- | --- | --- |
| debug | Graph + Episode Recall | 1.0 | 0.7 | 0.8 |
| architecture | Fact Lookup + Graph | 0.5 | 1.0 | 0.3 |
| compliance | Episode Recall + Facts | 1.0 | 0.8 | 0.0 |
| writing | Fact Lookup | 0.3 | 1.0 | 0.6 |
| chat | Fact Lookup | 0.4 | 1.0 | 0.3 |

Weight 0.0 = hard exclude. Procedural memory is excluded from compliance tasks because candidate patterns are not authoritative enough for audit trails.

Four-Dimension Ranking

Every candidate is scored on four interpretable dimensions. No opaque composite. All scores are logged in the audit trace.

| Dimension | Weight | How it works |
| --- | --- | --- |
| Relevance | 0.40 | Vector similarity (when available), multiplied by memory-type weight modifier |
| Recency | 0.25 | Linear decay over 90 days from occurred_at |
| Stability | 0.20 | Evidence status score blended with fact_state.salience_score (70/30) |
| Provenance | 0.15 | Retrieval source quality (procedure_assist = 0.9, fact_lookup = 0.8, graph = 0.7, episode = 0.6) |
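
Putting the table together, the composite is a plain weighted sum. A sketch under the stated weights; the candidate dict shape and function names are assumptions, not Borg's schema:

```python
from datetime import datetime, timezone

WEIGHTS = {"relevance": 0.40, "recency": 0.25, "stability": 0.20, "provenance": 0.15}
SOURCE_QUALITY = {"procedure_assist": 0.9, "fact_lookup": 0.8,
                  "graph": 0.7, "episode": 0.6}

def recency(occurred_at, now):
    """Linear decay to zero over 90 days from occurred_at."""
    age_days = (now - occurred_at).total_seconds() / 86400
    return max(0.0, 1.0 - age_days / 90)

def score(candidate, now):
    """Four interpretable dimensions, each loggable in the audit trace."""
    dims = {
        "relevance": candidate["similarity"] * candidate["type_weight"],
        "recency": recency(candidate["occurred_at"], now),
        "stability": 0.7 * candidate["evidence_score"]
                     + 0.3 * candidate["salience_score"],
        "provenance": SOURCE_QUALITY[candidate["source"]],
    }
    return sum(WEIGHTS[d] * v for d, v in dims.items())
```

Because each dimension is computed separately before the weighted sum, the per-item breakdown can be logged as-is rather than reverse-engineered from an opaque composite.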

Two Output Formats

Structured XML for Claude, Claude Code, and Copilot. Tags carry metadata (evidence status, salience scores, dates) that models can reason about. Compact JSON for GPT, Codex CLI, and local models — minimal overhead, no tag parsing required. The model parameter on borg_think selects the format automatically, with a manual override available.
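
The selection logic amounts to a prefix check with an override. A hedged sketch; the exact model identifiers and override values are assumptions, not borg_think's real parameter contract:

```python
def select_format(model, override=None):
    """Pick the output format from the borg_think model parameter,
    honouring a manual override when one is supplied."""
    if override in ("xml", "json"):
        return override
    if model.lower().startswith(("claude", "copilot")):
        return "xml"   # structured XML with metadata-bearing tags
    return "json"      # compact JSON for GPT, Codex CLI, and local models
```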

Specific Facts Extraction

Named resources, IPs, CLI commands, and counts are stored as structured metadata on each episode.

Beyond entity and fact extraction, the offline pipeline runs a dedicated specific-facts pass on each episode. It captures concrete, named details that general extraction tends to generalize away: IP addresses, CLI invocations, resource names, port numbers, version strings, and numeric counts. These are stored in episode metadata as structured key-value pairs.

Specific facts participate in retrieval like any other memory type. When a debug task asks about a particular error or command, the ranker can surface the original CLI invocation or IP address instead of a generic description.
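
The stored shape might look like the following. This is a hypothetical illustration of structured key-value metadata on an episode (the category names and accessor are invented; the real pass is LLM-driven):

```python
# Hypothetical specific-facts metadata attached to one episode
episode_metadata = {
    "specific_facts": {
        "ip_addresses": ["10.0.4.17"],
        "cli_commands": ["kubectl rollout restart deployment/webhook-gateway"],
        "resource_names": ["webhook-gateway"],
        "ports": [8443],
        "versions": ["v2.3.1"],
    }
}

def lookup(metadata, kind):
    """Fetch one category of specific facts; empty if the pass found none."""
    return metadata.get("specific_facts", {}).get(kind, [])
```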

Procedure Extraction

Borg identifies repeatable patterns, workflows, and decision rules from your conversations. Procedures earn trust through observation, not assertion.

The extraction pipeline runs a dedicated LLM prompt to identify candidate procedures in each episode. Categories include workflow, decision_rule, best_practice, convention, and troubleshooting. Each procedure starts with a low confidence score (0.3) and evidence_status = "extracted".

When the same pattern appears again, the existing record is merged: observation_count is incremented, confidence is recalculated as a weighted average, and the source episode is appended. Procedures are not used in compilation until promoted — which requires observation in 3+ distinct episodes and confidence ≥ 0.8.

This is deliberately conservative. A pattern that appears once might be a one-off. A pattern that appears in five conversations over two weeks is probably a real practice. The pipeline captures candidates early and lets evidence accumulate before surfacing them in context.
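
The merge and promotion rules above can be sketched directly. The dict shape and function names are illustrative; the weighted average, the 3-episode threshold, and the 0.8 confidence bar come from the text:

```python
def merge_observation(proc, new_confidence, episode_id):
    """Merge a repeated sighting: bump the count, blend confidence as a
    weighted average, and record the source episode."""
    n = proc["observation_count"]
    proc["confidence"] = (proc["confidence"] * n + new_confidence) / (n + 1)
    proc["observation_count"] = n + 1
    proc["source_episodes"].append(episode_id)

def is_promotable(proc):
    """Promotion requires 3+ distinct episodes and confidence >= 0.8."""
    return len(set(proc["source_episodes"])) >= 3 and proc["confidence"] >= 0.8
```

A candidate seeded at 0.3 needs several confident re-observations before it crosses the bar, which is exactly the conservatism described above.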

Audit Log & Observability

Every compilation decision is traceable. This is the primary mechanism for improving retrieval quality.

Every borg_think call writes a full trace to borg_audit_log: classification results (primary + secondary class with confidences), all retrieval profiles executed, candidates found/selected/rejected with per-item score breakdowns, rejection reasons, compiled token count, output format, and per-stage latency (classify, retrieve, rank, compile).

The offline worker logs extraction metrics for every episode processed: entities extracted/resolved/new, facts extracted, custom predicates encountered, evidence strengths, procedures extracted/merged, and any errors.

Example dashboard queries

-- Retrieval precision over time
SELECT DATE(created_at),
       AVG(candidates_selected::float / NULLIF(candidates_found, 0)) as precision
FROM borg_audit_log WHERE task_class != 'extraction'
GROUP BY 1 ORDER BY 1;

-- Noise detection: most rejected items
SELECT item->>'id', item->>'reason', COUNT(*)
FROM borg_audit_log, jsonb_array_elements(rejected_items) item
WHERE task_class != 'extraction'
GROUP BY 1, 2 ORDER BY 3 DESC LIMIT 20;

-- Entity access hotspots
SELECT e.name, es.access_count, es.last_accessed, es.tier
FROM borg_entities e
JOIN borg_entity_state es ON es.entity_id = e.id
WHERE e.deleted_at IS NULL
ORDER BY es.access_count DESC LIMIT 20;

Cost Tracking

Know what Borg costs before the invoice arrives.

The admin API exposes GET /api/admin/cost-summary, which reports LLM token usage and estimated spend broken down by extraction, embedding, and compilation. The endpoint covers a configurable time window and groups costs by namespace. For a small team (~5 devs), expect roughly $15-30/month in LLM costs.

Supports both standard OpenAI and Azure OpenAI — pricing depends on which provider you configure.

Hybrid Episode Guarantee

Ranking never discards all episodic evidence.

The ranker enforces a minimum of 3 episodes in every compiled context package. If the scored ranking would drop all episodic candidates in favor of higher-scoring facts or procedures, the guarantee forces the top 3 episodes back into the output. This prevents the failure mode where compilation returns only abstract facts with no grounding evidence.
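
A minimal sketch of the guarantee, assuming score-descending candidate lists of dicts with a `kind` field (the shapes and names are illustrative):

```python
def enforce_episode_guarantee(selected, all_ranked, minimum=3):
    """If the scored selection contains fewer than `minimum` episodes,
    force the top-scoring unselected episodes back into the output."""
    episodes_in = [c for c in selected if c["kind"] == "episode"]
    missing = minimum - len(episodes_in)
    if missing <= 0:
        return selected
    chosen = {id(c) for c in selected}
    extra = [c for c in all_ranked
             if c["kind"] == "episode" and id(c) not in chosen][:missing]
    return selected + extra
```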

Namespace Isolation

Hard isolation by default. Intentionally restrictive.

Every entity, fact, episode, and procedure belongs to exactly one namespace. All queries are scoped to one namespace. Cross-namespace retrieval is not supported. If the same real-world entity ("APIM") appears in two projects, it exists as two separate entity records. Hot-tier content is namespace-scoped. There is no global hot tier.

Namespaces are managed via full CRUD endpoints at /api/namespaces. Each namespace has configurable hot_tier_budget (default 500 tokens) and warm_tier_budget (default 3000 tokens). The compiler reads these budgets at query time.

This is restrictive by design. Cross-namespace features can be added later with explicit design, not as an accident of unscoped queries.

Memory Tiers

Hot memory is always injected. Warm memory is retrieved per-query. Cold tier is deferred.

| Tier | Behavior | Promotion | Demotion |
| --- | --- | --- | --- |
| Hot | Always injected. ~500 token budget. | Pinned by user, OR 5+ compilations in 14 days | Unpinned, OR not retrieved in 30 days, OR superseded |
| Warm | Retrieved per-query. Default tier. | Default state for all new memory | Superseded → archived. 90 days no access → archived. |

• New facts always start warm. Never hot by default.

• Superseded facts cannot be hot.

• Procedures cannot be hot until ≥3 episodes, ≥7 days, confidence ≥0.8.

• Hot tier overflow demotes lowest-salience item.
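
A sketch of the hot-tier eligibility check implied by the rules above. The item shape is an assumption, and the procedure rule (≥3 episodes, ≥7 days, confidence ≥0.8) is collapsed into a single `promoted` flag for brevity:

```python
from datetime import datetime, timedelta, timezone

def hot_eligible(item, now):
    """Pinned, or 5+ compilations in 14 days; superseded facts and
    unpromoted procedures are excluded outright."""
    if item["status"] == "superseded":
        return False  # superseded facts can never be hot
    if item["kind"] == "procedure" and not item.get("promoted", False):
        return False  # procedures must earn promotion first
    if item.get("pinned"):
        return True
    window = now - timedelta(days=14)
    recent = [t for t in item.get("compiled_at", []) if t >= window]
    return len(recent) >= 5
```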

Why PostgreSQL Only

One database. No sync. No drift.

Borg uses PostgreSQL for everything: relational data, vector search (pgvector), graph traversal (recursive CTEs), audit logging (pgAudit), and compliance. There is no external graph database. There is no separate vector store.

The advantage is consistency. Delete a fact, and the embedding, the graph edges, and the audit trail all update in one transaction. There is no sync to drift. The MCP server calls stored procedures and passes back results. All the heavy lifting — scoring, graph traversal, hybrid retrieval — runs inside the database.

The trade-off is real: at very high scale, a dedicated graph database or vector store would likely outperform recursive CTEs and pgvector HNSW indexes. That's a future escape hatch, introduced only if a measured bottleneck appears first. For the expected volume (hundreds of entities, thousands of facts), PostgreSQL is the right call.

15 tables + 1 function. Runs on any PostgreSQL 14+ instance.