Ground Truths: Anchoring AI Agents to Reality
> AI agents hallucinate confidently. Ground truths are versioned, scoped facts that anchor agent behavior to reality. Here is how we built and enforce them.
Numbers in this post reflect the system at publication (January 2026). See our team page for current figures.
AI agents are remarkably capable. They can reason, synthesize, and generate. But they have a fundamental weakness: they make things up. Not maliciously, but confidently. An agent might invent API parameters that do not exist, reference configurations that were never defined, or apply patterns from its training data that contradict your actual architecture.
The standard mitigation is “give the agent more context.” But context can be contradictory. Documentation drifts from implementation. Comments lie. Even code can mislead when read without understanding intent.
We needed something more explicit. Something that could not be ignored or misinterpreted. Something that would anchor agents to verifiable reality.
We call them Ground Truths.
What is a Ground Truth?
A ground truth is an explicit, versioned statement of fact that agents must respect. It is not documentation. It is not a comment. It is a first-class entity in the system with:
- A unique identifier (like `GT-MAG-015` or `GT-MAG-036`)
- A lifecycle status (current, tentative, or deprecated)
- A scope (platform-wide, package-specific, or domain-bound)
- Evidence (file paths, URLs, or references that prove the statement)
- Agent guidance (explicit do/avoid instructions)
Here is an example from our Maguyva code intelligence platform:
```yaml
- id: GT-MAG-015
  status: current
  scope: package
  statement: |
    Fuzzy symbol matching is opt-in via `find_similar=true`.
    Default behavior returns empty results for non-existent symbols;
    `exact_match=true` enforces strict matching and disables all fuzzy fallbacks.
  rationale: |
    Deterministic defaults prevent agents from receiving misleading results.
    Typos should fail explicitly rather than silently returning unrelated symbols.
  evidence:
    - "packages/maguyva/server/src/maguyva/services/code_analysis_service.py"
    - "packages/maguyva/server/docs/quick_reference/parameters.md"
  last_verified: "2026-01-25"
  tags:
    - product
    - ai_first
    - principle
```
This is not prose. It is a contract. When an agent encounters this ground truth, it knows:
- The default is deterministic (empty results, not fuzzy guesses)
- There are specific parameters (`find_similar`, `exact_match`) with defined behaviors
- Evidence exists in specific files that can be verified
- The statement was verified on a specific date
The Anatomy of a Ground Truth Registry
Ground truths live in YAML registries at `ai_assets/reference/ground_truths.yaml`. Each package or domain can have its own registry. The structure is:
```yaml
metadata:
  title: "Maguyva Ground Truths"
  summary: "Foundational constraints and principles that guide Maguyva."
  last_updated: "2026-01-26"
  owner: "maguyva"

render:
  include_statuses: [current, tentative]
  show_deprecated: true

groups:
  - title: "Product Principles"
    tags: [product, principle, brand]
  - title: "Architecture & Boundaries"
    tags: [architecture, boundaries, cqrs]

statements:
  - id: GT-MAG-001
    status: current
    scope: package
    statement: "Maguyva is read-only with respect to user repositories..."
  ...
```
The registry includes metadata about the collection itself, render configuration for documentation generation, and the statements themselves. Each statement follows a strict schema validated by Pydantic models:
```python
class GroundTruthStatement(BaseModel):
    id: str
    status: GTStatus  # current, tentative, deprecated
    source: GTSource | None  # claude-code, orkestra, discipline
    scope: GTScope  # platform, package, domain
    statement: str
    rationale: str | None
    evidence: list[str]
    last_verified: str | None
    tags: list[str]
    agent_guidance: AgentGuidance | None
```
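To make the schema's behavior concrete, here is a simplified, standard-library-only stand-in (the real model uses Pydantic; the status values and field names mirror the post, the validation logic is illustrative):

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class GTStatus(str, Enum):
    CURRENT = "current"
    TENTATIVE = "tentative"
    DEPRECATED = "deprecated"


@dataclass
class GroundTruthStatement:
    """Simplified, stdlib-only stand-in for the Pydantic model above."""

    id: str
    status: GTStatus
    statement: str
    scope: str = "package"
    rationale: str | None = None
    evidence: list[str] = field(default_factory=list)
    last_verified: str | None = None
    tags: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject unknown statuses and malformed identifiers up front,
        # the way a Pydantic validator would.
        self.status = GTStatus(self.status)
        if not self.id.startswith("GT-"):
            raise ValueError(f"invalid ground truth id: {self.id!r}")


gt = GroundTruthStatement(
    id="GT-MAG-015",
    status="current",
    statement="Fuzzy symbol matching is opt-in via find_similar=true.",
    evidence=["packages/maguyva/server/src/maguyva/services/code_analysis_service.py"],
)
```

The point of the strict schema is that a malformed entry fails loudly at load time rather than silently polluting agent context.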
How Agents Access Ground Truths
Ground truths are exposed through multiple channels:
1. Rendered Documentation
The orkestra sync command transforms YAML registries into readable markdown:
```bash
uv run orkestra sync
```
This generates GROUND_TRUTHS.md files that are included in agent context. The rendered output groups statements by status and category:
```markdown
## Current

### Product Principles
- Maguyva is read-only with respect to user repositories... (GT-MAG-001)
- Design tool responses for AI agents first... (GT-MAG-009)

### Architecture & Boundaries
- Pipeline and Maguyva boundaries are intentional... (GT-MAG-006)
```
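Conceptually, the grouping step can be sketched in a few lines of Python. This is a hypothetical simplification, not the actual `orkestra sync` implementation, which also handles tag-based groups and render configuration:

```python
def render_markdown(
    statements: list[dict],
    include_statuses: tuple[str, ...] = ("current", "tentative"),
) -> str:
    """Group ground-truth statements by status and emit a minimal markdown body."""
    lines: list[str] = []
    for status in include_statuses:
        matching = [gt for gt in statements if gt["status"] == status]
        if not matching:
            continue
        lines.append(f"## {status.capitalize()}")
        for gt in matching:
            # One bullet per statement, with its stable identifier for citation.
            lines.append(f"- {gt['statement']} ({gt['id']})")
    return "\n".join(lines)


doc = render_markdown([
    {"id": "GT-MAG-001", "status": "current", "statement": "Maguyva is read-only..."},
    {"id": "GT-MAG-044", "status": "tentative", "statement": "get_file may still return metadata..."},
])
```

Because the markdown is generated, the YAML registry stays the single source of truth and the rendered file can never drift from it.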
2. CLI Search
Agents with shell access can search ground truths programmatically:
```bash
uv run orkestra knowledge truths search "embedding"
uv run orkestra knowledge truths search --tag security
uv run orkestra knowledge truths list --status current
```
The search function scores matches across multiple fields with weighted relevance:
```python
def truth_fields(gt: GroundTruthStatement) -> list[FieldSpec]:
    return [
        FieldSpec(name="id", weight=6, values=[gt.id]),
        FieldSpec(name="statement", weight=5, values=[gt.statement]),
        FieldSpec(name="rationale", weight=4, values=[gt.rationale] if gt.rationale else []),
        FieldSpec(name="tags", weight=3, values=gt.tags or []),
        FieldSpec(name="evidence", weight=2, values=gt.evidence or []),
    ]
```
3. Context Composition
When agents are rendered from YAML definitions, their context can reference ground truth registries:
```yaml
context_composition:
  domain_knowledge:
    - packages/maguyva/ai_assets/reference/ground_truths.yaml
```
This ensures relevant ground truths are loaded before the agent begins work.
Categories of Ground Truths
Looking across our registries, ground truths cluster into several patterns:
Product Principles
Constraints on what the product is and is not:
“Maguyva is read-only with respect to user repositories; the only non-rebuildable asset is the paid embeddings cache.” (GT-MAG-001)
Architecture Boundaries
Where responsibilities live and why:
“Pipeline and Maguyva boundaries are intentional: pipeline is reusable, Maguyva holds code-specific logic, and CQRS separates stage writes from server reads.” (GT-MAG-006)
Anti-Hallucination Rules
Explicit mandates that keep tool contracts deterministic instead of inferred:
“Fuzzy symbol matching is opt-in via `find_similar=true`. Default behavior returns empty results for non-existent symbols; `exact_match=true` enforces strict matching and disables all fuzzy fallbacks.” (GT-MAG-015)
Quality Gates
Standards that must be maintained:
“Changes to shared infrastructure (post_filters.py, relationship extractors, shared handlers) MUST be validated against ALL supported languages via full manifest generation before commit. A single-language validation is insufficient for shared code.” (GT-MAG-036)
Code Patterns
Implementation requirements:
“Use `asyncio.to_thread()` for CPU-bound work in async contexts; the deprecated `loop.run_in_executor()` pattern should not be used in new code.” (GT-MAG-018)
The Lifecycle of a Ground Truth
Ground truths are not static. They evolve through a defined lifecycle:
Tentative
A proposed truth under evaluation. The statement is recorded but may change:
```yaml
- id: GT-MAG-044
  status: tentative
  statement: |
    get_file with include_metadata=false may still return metadata in the
    response because middleware may re-inject it for AI agent disambiguation.
```
Current
A verified truth that agents must respect. Evidence has been validated:
```yaml
- id: GT-MAG-017
  status: current
  last_verified: "2026-01-26"
  evidence:
    - "packages/maguyva/server/src/maguyva/services/supabase.py"
```
Deprecated
A truth that no longer applies. Kept for historical reference with a pointer to what replaced it:
```yaml
- id: GT-MAG-099
  status: deprecated
  superseded_by: GT-MAG-015
  notes: "Replaced when we moved to explicit matching behavior"
```
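The lifecycle rules above lend themselves to mechanical checking. As a hypothetical lint pass (not part of orkestra as described here), a checker might flag entries that violate them:

```python
def lifecycle_issues(gt: dict) -> list[str]:
    """Return human-readable lifecycle violations for one ground-truth entry."""
    issues: list[str] = []
    status = gt.get("status")
    # A deprecated truth must point at whatever replaced it.
    if status == "deprecated" and not gt.get("superseded_by"):
        issues.append(f"{gt['id']}: deprecated without a superseded_by pointer")
    # A current truth must carry a verification date and evidence.
    if status == "current" and not gt.get("last_verified"):
        issues.append(f"{gt['id']}: current but never verified")
    if status == "current" and not gt.get("evidence"):
        issues.append(f"{gt['id']}: current but has no evidence paths")
    return issues
```

Running a check like this in CI keeps the registry honest as truths move between states.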
Why Not Just Documentation?
Documentation serves a different purpose. It explains. It teaches. It can be vague; it can use qualifiers like “generally” or “typically.”
Ground truths cannot be vague. They are assertions. They either apply or they do not.
Consider the difference:
Documentation: “The API generally returns empty results when a symbol is not found, though fuzzy matching may be enabled in some configurations.”
Ground Truth: “Default behavior returns empty results for non-existent symbols; exact_match=true enforces strict matching and disables all fuzzy fallbacks.”
The first is helpful for humans learning the system. The second is actionable for agents making decisions.
Agent Guidance: Do and Avoid
Some ground truths include explicit agent guidance:
```yaml
- id: GT-MAG-022
  statement: |
    Accuracy fixes must happen at extraction time via production code,
    never via validator filters.
  agent_guidance:
    do:
      - "Fix extraction bugs in YAML config, handlers, or tree-sitter queries"
      - "Add test cases at the layer where the fix lives"
    avoid:
      - "Adding validator filters to mask production bugs"
      - "Creating test-only workarounds for extraction issues"
```
This removes ambiguity. An agent reading this knows not just what is true, but what actions that truth implies.
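One way to act on that guidance is to fold it into the agent's context verbatim. A minimal, hypothetical renderer might look like this:

```python
def render_guidance(gt_id: str, guidance: dict[str, list[str]]) -> str:
    """Turn do/avoid guidance into explicit instruction lines for an agent prompt."""
    lines = [f"Ground truth {gt_id} implies:"]
    lines += [f"  DO: {action}" for action in guidance.get("do", [])]
    lines += [f"  AVOID: {action}" for action in guidance.get("avoid", [])]
    return "\n".join(lines)


prompt = render_guidance(
    "GT-MAG-022",
    {
        "do": ["Fix extraction bugs in YAML config, handlers, or tree-sitter queries"],
        "avoid": ["Adding validator filters to mask production bugs"],
    },
)
```

The structured do/avoid lists mean this translation is mechanical; no interpretation happens between the registry and the prompt.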
Verification and Maintenance
Ground truths require maintenance. We track:
- `last_verified`: When someone confirmed the statement still holds
- `evidence`: Files that prove the statement (can be checked for existence)
- `source`: Where the truth originated (CLI inspection, architecture review, post-incident learning)
A ground truth with stale verification dates or broken evidence links is a signal to investigate. Either the truth is still valid and needs re-verification, or reality has changed and the truth needs updating.
Real Examples from Production
Security Boundary
```yaml
- id: GT-MAG-014
  statement: |
    Maguyva queries are search patterns, not executable code.
    SQL injection prevention is handled by PostgREST parameterization;
    application-layer SQL keyword blocking must never be added.
  rationale: |
    Blocking SQL keywords breaks legitimate code search. Users search FOR
    code containing patterns like 'DROP TABLE', they don't execute them.
```
This ground truth prevents a class of misguided “security improvements” that would break the product.
Extraction-Time Accuracy
```yaml
- id: GT-MAG-022
  statement: |
    Accuracy fixes must happen at extraction time via production code
    (YAML config, handlers, queries), never via validator filters.
  rationale: |
    Validator filters only run during tests. They can hide extractor bugs
    while production responses remain wrong.
```
This came from painful experience. Agents would patch failing language packs by adding validator-only filters that made the test harness look greener, while the live Maguyva extractor still emitted the wrong edges. The rule forces fixes back into the real path: YAML config, queries, or handlers.
Multi-Tier Filtering
```yaml
- id: GT-MAG-023
  statement: |
    Language engine uses three-tier filtering: external_method_patterns
    (builtins/stdlib), allowlists (legitimate idioms), and expected_call_targets
    (validation-time deduplication). Each tier serves a distinct purpose.
  rationale: |
    Conflating filter purposes leads to either over-filtering (missing real
    relationships) or under-filtering (noise).
```
This prevents agents from adding filters in the wrong place, a common mistake that caused accuracy regressions.
Integration with the Orchestration System
Ground truths are one layer of a broader context system:
- Architectural Decisions (ADRs) - Record why we chose approach A over B
- Ground Truths - State what is definitively true right now
- Domain Patterns - Describe how to do things correctly
- Anti-Patterns - Describe what to avoid and why
An agent working in the system has access to all four. Ground truths provide the factual anchor, while decisions explain history, patterns guide implementation, and anti-patterns warn of pitfalls.
Measuring Impact
Since introducing ground truths, we have observed:
- Fewer “fix the hallucinated fix” cycles
- More confident agent decision-making when facts are clear
- Better PR reviews because expectations are explicit
- Reduced onboarding time for new agents (and humans)
The investment in maintaining ground truths pays off in reduced debugging and clearer system boundaries.
Getting Started
To add a ground truth to your system:
- Create a `ground_truths.yaml` in your package’s `ai_assets/reference/` directory
- Define metadata and render configuration
- Add statements following the schema
- Run `uv run orkestra sync` to generate documentation
- Include the registry in agent context composition
Start with the facts that cause the most confusion or the constraints that get violated most often. Those are your highest-value ground truths.
Conclusion
AI agents will hallucinate. That is their nature. But we can create environments where hallucination is constrained, where certain facts are non-negotiable, where agents can check their assumptions against verified reality.
Ground truths are not a complete solution. They require maintenance. They can become stale. They add overhead to the development process.
But they provide something valuable: a shared vocabulary of facts that both humans and agents can trust. In a world where agents increasingly participate in software development, that shared foundation becomes essential.
The alternative is endless cycles of agents making confident mistakes and humans correcting them. Ground truths break that cycle by making the corrections explicit and durable.
Your agents deserve to know what is true. Tell them.
Related reading
More from the Maguyva build log
Agent Observability: Hooks, Alloy, and Grafana
We wired Claude Code and Codex into one Grafana stack with OpenTelemetry and Alloy, then used traces and logs to find and fix agent behavior issues at the source.
Mining the Loop: How Changes Become Institutional Memory
Git commits become structured changelog entries and architectural decision records, then feed back into AI agents as queryable institutional memory.
Skill Mining: From 3,500 Candidates to 466 Capabilities
We screened 3,500 skill candidates and adopted 466. A systematic mining and ingestion loop for building a coherent AI agent skill library at scale.