Ground Truths: Anchoring AI Agents to Reality
> AI agents hallucinate confidently. Ground truths are versioned, scoped facts that anchor agent behavior to reality. Here is how we built and enforce them.
Numbers in this post reflect the system at publication (January 2026). See our team page for current figures.
AI agents are remarkably capable. They can reason, synthesize, and generate. But they have a fundamental weakness: they make things up. Not maliciously, but confidently. An agent might invent API parameters that do not exist, reference configurations that were never defined, or apply patterns from its training data that contradict your actual architecture.
The standard mitigation is “give the agent more context.” But context can be contradictory. Documentation drifts from implementation. Comments lie. Even code can mislead when read without understanding intent.
We needed something more explicit. Something that could not be ignored or misinterpreted. Something that would anchor agents to verifiable reality.
We call them Ground Truths.
What is a Ground Truth?
A ground truth is an explicit, versioned statement of fact that agents must respect. It is not documentation. It is not a comment. It is a first-class entity in the system with:
- A unique identifier (like `GT-MAG-015` or `GT-MAG-036`)
- A lifecycle status (current, tentative, or deprecated)
- A scope (platform-wide, package-specific, or domain-bound)
- Evidence (file paths, URLs, or references that prove the statement)
- Agent guidance (explicit do/avoid instructions)
Here is an example from our Maguyva code intelligence platform:
```yaml
- id: GT-MAG-015
  status: current
  scope: package
  statement: |
    Fuzzy symbol matching is opt-in via `find_similar=true`.
    Default behavior returns empty results for non-existent symbols;
    `exact_match=true` enforces strict matching and disables all fuzzy fallbacks.
  rationale: |
    Deterministic defaults prevent agents from receiving misleading results.
    Typos should fail explicitly rather than silently returning unrelated symbols.
  evidence:
    - "packages/maguyva/server/src/maguyva/services/code_analysis_service.py"
    - "packages/maguyva/server/docs/quick_reference/parameters.md"
  last_verified: "2026-01-25"
  tags:
    - product
    - ai_first
    - principle
```
This is not prose. It is a contract. When an agent encounters this ground truth, it knows:
- The default is deterministic (empty results, not fuzzy guesses)
- There are specific parameters (`find_similar`, `exact_match`) with defined behaviors
- Evidence exists in specific files that can be verified
- The statement was verified on a specific date
The Anatomy of a Ground Truth Registry
Ground truths live in YAML registries at `ai_assets/reference/ground_truths.yaml`. Each package or domain can have its own registry. The structure is:
```yaml
metadata:
  title: "Maguyva Ground Truths"
  summary: "Foundational constraints and principles that guide Maguyva."
  last_updated: "2026-01-26"
  owner: "maguyva"

render:
  include_statuses: [current, tentative]
  show_deprecated: true

groups:
  - title: "Product Principles"
    tags: [product, principle, brand]
  - title: "Architecture & Boundaries"
    tags: [architecture, boundaries, cqrs]

statements:
  - id: GT-MAG-001
    status: current
    scope: package
    statement: "Maguyva is read-only with respect to user repositories..."
  ...
```
The registry includes metadata about the collection itself, render configuration for documentation generation, and the statements themselves. Each statement follows a strict schema validated by Pydantic models:
```python
class GroundTruthStatement(BaseModel):
    id: str
    status: GTStatus  # current, tentative, deprecated
    source: GTSource | None  # claude-code, orkestra, discipline
    scope: GTScope  # platform, package, domain
    statement: str
    rationale: str | None
    evidence: list[str]
    last_verified: str | None
    tags: list[str]
    agent_guidance: AgentGuidance | None
```
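To make the schema's behavior concrete, here is a simplified, standard-library-only stand-in (the real model uses Pydantic; the status values and field names mirror the post, the validation logic is illustrative):

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class GTStatus(str, Enum):
    CURRENT = "current"
    TENTATIVE = "tentative"
    DEPRECATED = "deprecated"


@dataclass
class GroundTruthStatement:
    """Simplified, stdlib-only stand-in for the Pydantic model above."""

    id: str
    status: GTStatus
    statement: str
    scope: str = "package"
    rationale: str | None = None
    evidence: list[str] = field(default_factory=list)
    last_verified: str | None = None
    tags: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject unknown statuses and malformed identifiers up front,
        # the way a Pydantic validator would.
        self.status = GTStatus(self.status)
        if not self.id.startswith("GT-"):
            raise ValueError(f"invalid ground truth id: {self.id!r}")


gt = GroundTruthStatement(
    id="GT-MAG-015",
    status="current",
    statement="Fuzzy symbol matching is opt-in via find_similar=true.",
    evidence=["packages/maguyva/server/src/maguyva/services/code_analysis_service.py"],
)
```

The point of the strict schema is that a malformed entry fails loudly at load time rather than silently polluting agent context.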
How Agents Access Ground Truths
Ground truths are exposed through multiple channels:
1. Rendered Documentation
The orkestra sync command transforms YAML registries into readable markdown:
```bash
uv run orkestra sync
```
This generates GROUND_TRUTHS.md files that are included in agent context. The rendered output groups statements by status and category:
```markdown
## Current

### Product Principles
- Maguyva is read-only with respect to user repositories... (GT-MAG-001)
- Design tool responses for AI agents first... (GT-MAG-009)

### Architecture & Boundaries
- Pipeline and Maguyva boundaries are intentional... (GT-MAG-006)
```
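Conceptually, the grouping step can be sketched in a few lines of Python. This is a hypothetical simplification, not the actual `orkestra sync` implementation, which also handles tag-based groups and render configuration:

```python
def render_markdown(
    statements: list[dict],
    include_statuses: tuple[str, ...] = ("current", "tentative"),
) -> str:
    """Group ground-truth statements by status and emit a minimal markdown body."""
    lines: list[str] = []
    for status in include_statuses:
        matching = [gt for gt in statements if gt["status"] == status]
        if not matching:
            continue
        lines.append(f"## {status.capitalize()}")
        for gt in matching:
            # One bullet per statement, with its stable identifier for citation.
            lines.append(f"- {gt['statement']} ({gt['id']})")
    return "\n".join(lines)


doc = render_markdown([
    {"id": "GT-MAG-001", "status": "current", "statement": "Maguyva is read-only..."},
    {"id": "GT-MAG-044", "status": "tentative", "statement": "get_file may still return metadata..."},
])
```

Because the markdown is generated, the YAML registry stays the single source of truth and the rendered file can never drift from it.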
2. CLI Search
Agents with shell access can search ground truths programmatically:
```bash
uv run orkestra knowledge truths search "embedding"
uv run orkestra knowledge truths search --tag security
uv run orkestra knowledge truths list --status current
```
The search function scores matches across multiple fields with weighted relevance:
```python
def truth_fields(gt: GroundTruthStatement) -> list[FieldSpec]:
    return [
        FieldSpec(name="id", weight=6, values=[gt.id]),
        FieldSpec(name="statement", weight=5, values=[gt.statement]),
        FieldSpec(name="rationale", weight=4, values=[gt.rationale] if gt.rationale else []),
        FieldSpec(name="tags", weight=3, values=gt.tags or []),
        FieldSpec(name="evidence", weight=2, values=gt.evidence or []),
    ]
```
3. Context Composition
When agents are rendered from YAML definitions, their context can reference ground truth registries:
```yaml
context_composition:
  domain_knowledge:
    - packages/maguyva/ai_assets/reference/ground_truths.yaml
```
This ensures relevant ground truths are loaded before the agent begins work.
Categories of Ground Truths
Looking across our registries, ground truths cluster into several patterns:
Product Principles
Constraints on what the product is and is not:
“Maguyva is read-only with respect to user repositories; the only non-rebuildable asset is the paid embeddings cache.” (GT-MAG-001)
Architecture Boundaries
Where responsibilities live and why:
“Pipeline and Maguyva boundaries are intentional: pipeline is reusable, Maguyva holds code-specific logic, and CQRS separates stage writes from server reads.” (GT-MAG-006)
Anti-Hallucination Rules
Explicit mandates that keep tool contracts deterministic instead of inferred:
“Fuzzy symbol matching is opt-in via `find_similar=true`. Default behavior returns empty results for non-existent symbols; `exact_match=true` enforces strict matching and disables all fuzzy fallbacks.” (GT-MAG-015)
Quality Gates
Standards that must be maintained:
“Changes to shared infrastructure (post_filters.py, relationship extractors, shared handlers) MUST be validated against ALL supported languages via full manifest generation before commit. A single-language validation is insufficient for shared code.” (GT-MAG-036)
Code Patterns
Implementation requirements:
“Use `asyncio.to_thread()` for CPU-bound work in async contexts; the deprecated `loop.run_in_executor()` pattern should not be used in new code.” (GT-MAG-018)
The Lifecycle of a Ground Truth
Ground truths are not static. They evolve through a defined lifecycle:
Tentative
A proposed truth under evaluation. The statement is recorded but may change:
```yaml
- id: GT-MAG-044
  status: tentative
  statement: |
    get_file with include_metadata=false may still return metadata in the
    response because middleware may re-inject it for AI agent disambiguation.
```
Current
A verified truth that agents must respect. Evidence has been validated:
```yaml
- id: GT-MAG-017
  status: current
  last_verified: "2026-01-26"
  evidence:
    - "packages/maguyva/server/src/maguyva/services/supabase.py"
```
Deprecated
A truth that no longer applies. Kept for historical reference with a pointer to what replaced it:
```yaml
- id: GT-MAG-099
  status: deprecated
  superseded_by: GT-MAG-015
  notes: "Replaced when we moved to explicit matching behavior"
```
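The lifecycle rules above lend themselves to mechanical checking. As a hypothetical lint pass (not part of orkestra as described here), a checker might flag entries that violate them:

```python
def lifecycle_issues(gt: dict) -> list[str]:
    """Return human-readable lifecycle violations for one ground-truth entry."""
    issues: list[str] = []
    status = gt.get("status")
    # A deprecated truth must point at whatever replaced it.
    if status == "deprecated" and not gt.get("superseded_by"):
        issues.append(f"{gt['id']}: deprecated without a superseded_by pointer")
    # A current truth must carry a verification date and evidence.
    if status == "current" and not gt.get("last_verified"):
        issues.append(f"{gt['id']}: current but never verified")
    if status == "current" and not gt.get("evidence"):
        issues.append(f"{gt['id']}: current but has no evidence paths")
    return issues
```

Running a check like this in CI keeps the registry honest as truths move between states.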
Why Not Just Documentation?
Documentation serves a different purpose. It explains. It teaches. It can be vague; it can use qualifiers like “generally” or “typically.”
Ground truths cannot be vague. They are assertions. They either apply or they do not.
Consider the difference:
Documentation: “The API generally returns empty results when a symbol is not found, though fuzzy matching may be enabled in some configurations.”
Ground Truth: “Default behavior returns empty results for non-existent symbols; exact_match=true enforces strict matching and disables all fuzzy fallbacks.”
The first is helpful for humans learning the system. The second is actionable for agents making decisions.
Agent Guidance: Do and Avoid
Some ground truths include explicit agent guidance:
```yaml
- id: GT-MAG-022
  statement: |
    Accuracy fixes must happen at extraction time via production code,
    never via validator filters.
  agent_guidance:
    do:
      - "Fix extraction bugs in YAML config, handlers, or tree-sitter queries"
      - "Add test cases at the layer where the fix lives"
    avoid:
      - "Adding validator filters to mask production bugs"
      - "Creating test-only workarounds for extraction issues"
```
This removes ambiguity. An agent reading this knows not just what is true, but what actions that truth implies.
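One way to act on that guidance is to fold it into the agent's context verbatim. A minimal, hypothetical renderer might look like this:

```python
def render_guidance(gt_id: str, guidance: dict[str, list[str]]) -> str:
    """Turn do/avoid guidance into explicit instruction lines for an agent prompt."""
    lines = [f"Ground truth {gt_id} implies:"]
    lines += [f"  DO: {action}" for action in guidance.get("do", [])]
    lines += [f"  AVOID: {action}" for action in guidance.get("avoid", [])]
    return "\n".join(lines)


prompt = render_guidance(
    "GT-MAG-022",
    {
        "do": ["Fix extraction bugs in YAML config, handlers, or tree-sitter queries"],
        "avoid": ["Adding validator filters to mask production bugs"],
    },
)
```

The structured do/avoid lists mean this translation is mechanical; no interpretation happens between the registry and the prompt.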
Verification and Maintenance
Ground truths require maintenance. We track:
- `last_verified`: When someone confirmed the statement still holds
- `evidence`: Files that prove the statement (can be checked for existence)
- `source`: Where the truth originated (CLI inspection, architecture review, post-incident learning)
A ground truth with stale verification dates or broken evidence links is a signal to investigate. Either the truth is still valid and needs re-verification, or reality has changed and the truth needs updating.
Real Examples from Production
Security Boundary
```yaml
- id: GT-MAG-014
  statement: |
    Maguyva queries are search patterns, not executable code.
    SQL injection prevention is handled by PostgREST parameterization;
    application-layer SQL keyword blocking must never be added.
  rationale: |
    Blocking SQL keywords breaks legitimate code search. Users search FOR
    code containing patterns like 'DROP TABLE', they don't execute them.
```
This ground truth prevents a class of misguided “security improvements” that would break the product.
Extraction-Time Accuracy
```yaml
- id: GT-MAG-022
  statement: |
    Accuracy fixes must happen at extraction time via production code
    (YAML config, handlers, queries), never via validator filters.
  rationale: |
    Validator filters only run during tests. They can hide extractor bugs
    while production responses remain wrong.
```
This came from painful experience. Agents would patch failing language packs by adding validator-only filters that made the test harness look greener, while the live Maguyva extractor still emitted the wrong edges. The rule forces fixes back into the real path: YAML config, queries, or handlers.
Multi-Tier Filtering
```yaml
- id: GT-MAG-023
  statement: |
    Language engine uses three-tier filtering: external_method_patterns
    (builtins/stdlib), allowlists (legitimate idioms), and expected_call_targets
    (validation-time deduplication). Each tier serves a distinct purpose.
  rationale: |
    Conflating filter purposes leads to either over-filtering (missing real
    relationships) or under-filtering (noise).
```
This prevents agents from adding filters in the wrong place, a common mistake that caused accuracy regressions.
Integration with the Orchestration System
Ground truths are one layer of a broader context system:
- Architectural Decisions (ADRs) - Record why we chose approach A over B
- Ground Truths - State what is definitively true right now
- Domain Patterns - Describe how to do things correctly
- Anti-Patterns - Describe what to avoid and why
An agent working in the system has access to all four. Ground truths provide the factual anchor, while decisions explain history, patterns guide implementation, and anti-patterns warn of pitfalls.
Measuring Impact
Since introducing ground truths, we have observed:
- Fewer “fix the hallucinated fix” cycles
- More confident agent decision-making when facts are clear
- Better PR reviews because expectations are explicit
- Reduced onboarding time for new agents (and humans)
The investment in maintaining ground truths pays off in reduced debugging and clearer system boundaries.
Getting Started
To add a ground truth to your system:
- Create a `ground_truths.yaml` in your package’s `ai_assets/reference/` directory
- Define metadata and render configuration
- Add statements following the schema
- Run `uv run orkestra sync` to generate documentation
- Include the registry in agent context composition
Start with the facts that cause the most confusion or the constraints that get violated most often. Those are your highest-value ground truths.
Conclusion
AI agents will hallucinate. That is their nature. But we can create environments where hallucination is constrained, where certain facts are non-negotiable, where agents can check their assumptions against verified reality.
Ground truths are not a complete solution. They require maintenance. They can become stale. They add overhead to the development process.
But they provide something valuable: a shared vocabulary of facts that both humans and agents can trust. In a world where agents increasingly participate in software development, that shared foundation becomes essential.
The alternative is endless cycles of agents making confident mistakes and humans correcting them. Ground truths break that cycle by making the corrections explicit and durable.
Your agents deserve to know what is true. Tell them.
Related reading
More from the Maguyva build log
Agent Observability: Hooks, Alloy, and Grafana
We wired Claude Code and Codex into one Grafana stack with OpenTelemetry and Alloy, then used traces and logs to find and fix agent behavior issues at the source.
Mining the Loop: How Changes Become Institutional Memory
Git commits become structured changelog entries and architectural decision records, then feed back into AI agents as queryable institutional memory.
Skill Mining: From 3,500 Candidates to 466 Capabilities
We screened 3,500 skill candidates and adopted 466. A systematic mining and ingestion loop for building a coherent AI agent skill library at scale.