January 6, 2026 · Maguyva Team · 8 min read

Orkestra: Orchestrating AI Agents at Scale

[Architecture][Open Source]

> One orchestrator routes work to specialist AI agents, each with distinct skills and memory. How Orkestra coordinates 46 agents and 466 skills in production.

Numbers in this post reflect the system at publication (January 2026). See our team page for current figures.

When we started building with Claude Code, we ran into a problem that every team using AI coding assistants eventually faces: a single agent cannot do everything well.

You can prompt an agent to be a database specialist. Or a security auditor. Or a frontend engineer. But the moment you ask it to be all three at once, quality suffers. Context gets diluted. Instructions conflict. The agent becomes a generalist that is mediocre at everything.

So we built Orkestra.

What is Orkestra?

Orkestra is an agent orchestration system for Claude Code and similar AI coding tools. It coordinates multiple specialized agents, each with distinct expertise, under a single orchestrator that routes work to the right specialist.

Think of it as a staffing agency for AI agents. The orchestrator receives a task, identifies which specialist should handle it, and delegates with the right context. When the work is done, results flow back to the orchestrator for synthesis.
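To make the delegate step concrete, here is a minimal routing sketch. It assumes keyword overlap as the selection rule, which is a deliberately naive stand-in: the post does not say how the orchestrator actually picks a specialist. The `Specialist` type, its `keywords` field, and the `security-auditor` name are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Specialist:
    name: str
    keywords: frozenset[str]  # hypothetical routing hints, not Orkestra's schema


def route(task: str, specialists: list[Specialist]) -> Specialist | None:
    """Pick the specialist whose keywords overlap the task the most."""
    words = set(task.lower().split())
    best, best_score = None, 0
    for specialist in specialists:
        score = len(words & specialist.keywords)
        if score > best_score:
            best, best_score = specialist, score
    return best  # None means no specialist matched and the orchestrator keeps the task


SPECIALISTS = [
    Specialist("database-architect", frozenset({"schema", "migration", "database"})),
    Specialist("security-auditor", frozenset({"auth", "token", "vulnerability"})),
]

chosen = route("plan a schema migration for the orders database", SPECIALISTS)
```

A model-driven router would replace the scoring loop, but the shape stays the same: one task in, at most one specialist out, and a fallback when nothing matches.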

The numbers tell the story:

Component	Count
Specialist agents	46
Reusable skills	466
Identity archetypes	27
Mindsets	11
Communication styles	10
Knowledge domains	21

The Character System: D&D for Agents

The core insight behind Orkestra is that agent behavior emerges from three composable primitives:

Identity defines what the agent is. An architect designs system structures. A debugger traces failures to root causes. A guardian enforces compliance and security boundaries. We have 27 identity archetypes that can be mixed.

Mindset defines how the agent thinks. An analytical mindset grounds assertions in evidence and quantifies uncertainty. A skeptical mindset questions assumptions and seeks disconfirming evidence. An exploratory mindset embraces ambiguity and tries multiple approaches.

Style defines how the agent communicates. A technical style includes exact values and references specific files. A concise style cuts fluff and leads with the answer. A diplomatic style balances honesty with tact.

An agent combines these primitives:

# architecture-advisor.yaml
identity:
  - knowledge-architect
  - architect
  - strategist
mindset: analytical
style: concise

This composition creates an agent that designs systems (architect), connects knowledge across domains (knowledge-architect), sets strategic direction (strategist), thinks in evidence and data (analytical), and communicates without fluff (concise).

The power is in combinatorial explosion. 27 identities times 11 mindsets times 10 styles yields 2,970 possible personalities with a single identity each, and far more once identities are mixed. But you only define the combinations that matter for your work.
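The arithmetic is easy to check. The sketch below uses the counts from the table earlier in this post and assumes that the order of identities within an agent does not matter, which the post does not specify. The `personalities` helper is illustrative only.

```python
from math import comb

IDENTITIES, MINDSETS, STYLES = 27, 11, 10  # counts from the component table


def personalities(identities_per_agent: int) -> int:
    """Distinct compositions when each agent mixes a fixed number of identities.

    Assumes identity order is irrelevant, so identities are chosen as a set.
    """
    return comb(IDENTITIES, identities_per_agent) * MINDSETS * STYLES


print(personalities(1))  # 2970
print(personalities(3))  # 321750
```

With three identities per agent, as in architecture-advisor.yaml, the space grows to hundreds of thousands of combinations, which is why only the useful ones get defined.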

Skills: Reusable Capability Modules

Skills are the knowledge and workflows that agents can invoke. They follow a tiered system based on scope:

Tier	Name	Scope	Example
K0	Foundations	Universal methodology	Test-first discipline, evidence-based completion
K1	Identities	Role-based workflows	CLI interface standards, performance playbook
K2	Domains	Domain-specific knowledge	Database migration patterns, authentication validation
K3	Stacks	Technology-specific	Cloudflare deployment, Supabase operations
K4	Project	This codebase only	Project-specific workflows and conventions

Skills are lazy-loaded. An agent sees skill names and descriptions at startup, but full skill content only loads when triggered. This keeps context lean while making hundreds of skills discoverable.
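Lazy loading of this kind can be sketched in a few lines. This is an illustration under assumptions, not Orkestra's implementation: the `Skill` class, its `loader` callback, and the placeholder body text are all hypothetical. Only the skill name and trigger phrase come from this post.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Skill:
    name: str
    description: str
    loader: Callable[[], str]  # fetches the full body on demand
    _body: str | None = field(default=None, repr=False)

    def summary(self) -> str:
        """What the agent sees at startup: name and description only."""
        return f"{self.name}: {self.description}"

    def body(self) -> str:
        """Load the full content on first use, then serve it from the cache."""
        if self._body is None:
            self._body = self.loader()
        return self._body


loads: list[str] = []  # records each real load so the laziness is observable


def load_migration_skill() -> str:
    loads.append("schema-migration-workflow")
    return "Placeholder body: the step-by-step guidance would live here."


skill = Skill(
    "schema-migration-workflow",
    "Use when migrating database schemas",
    load_migration_skill,
)

startup_context = skill.summary()  # nothing has been loaded yet
full_text = skill.body()           # first use triggers the load
skill.body()                       # second use hits the cache
```

The context cost at startup is one short line per skill; the full body is paid for only by the agent that triggers it.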

Each skill includes:

Clear trigger conditions (“Use when migrating database schemas”)
Step-by-step guidance
Allowed tools for the workflow
Success criteria and failure recovery paths

The 466 skills in our registry cover everything from git worktree isolation to web research workflows to deployment health validation.

Why Orchestration Matters

Single-agent architectures hit walls quickly:

Context dilution. A 200k-token context window sounds large until you load database schemas, API docs, test fixtures, and domain knowledge. Specialists can work with targeted context.

Instruction conflicts. Telling an agent to “be thorough but fast” and “verify everything but don’t over-engineer” creates tension. Specialists resolve that tension by having a clear scope.

Expertise depth. A generalist agent knows a little about everything. A specialist agent, composed with the right identity and skills, knows its domain deeply.

Orkestra implements flat orchestration: one orchestrator coordinates multiple specialists. Specialists cannot spawn sub-specialists. This prevents complexity explosion while enabling parallel work.
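One way to picture flat orchestration is a single fan-out layer with a concurrency cap. The sketch below uses asyncio and assumes a hypothetical `run_specialist` coroutine in place of a real subagent call. The cap of 10 matches the concurrent subagent figure quoted in this post.

```python
from __future__ import annotations

import asyncio

MAX_CONCURRENT = 10  # concurrent subagent limit quoted in this post


async def run_specialist(name: str, task: str) -> str:
    """Stand-in for a subagent call. It receives no handle for spawning others."""
    await asyncio.sleep(0)  # placeholder for model latency
    return f"{name} finished: {task}"


async def orchestrate(assignments: dict[str, str]) -> list[str]:
    """Fan out exactly one level deep and gather results for synthesis."""
    gate = asyncio.Semaphore(MAX_CONCURRENT)

    async def guarded(name: str, task: str) -> str:
        async with gate:  # never more than MAX_CONCURRENT specialists at once
            return await run_specialist(name, task)

    return await asyncio.gather(*(guarded(n, t) for n, t in assignments.items()))


results = asyncio.run(orchestrate({
    "database-architect": "review the migration plan",
    "architecture-advisor": "check service boundaries",
}))
```

Because specialists only return strings and never receive the orchestrator, the call tree cannot grow deeper than one level.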

The orchestrator has access to 2.2 million tokens of effective capacity: its own 200k window plus 10 concurrent subagents with 200k each. Work that would exhaust a single agent runs comfortably across the fleet.
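The 2.2 million figure follows directly from the two numbers in that sentence. A quick check, with `effective_capacity` as an illustrative helper name:

```python
WINDOW_TOKENS = 200_000      # context window per agent, from this post
CONCURRENT_SUBAGENTS = 10    # subagents running alongside the orchestrator


def effective_capacity(window: int = WINDOW_TOKENS,
                       subagents: int = CONCURRENT_SUBAGENTS) -> int:
    """Aggregate tokens across the orchestrator plus its subagents."""
    return window * (1 + subagents)


print(effective_capacity())  # 2200000
```

This is aggregate capacity across separate windows rather than one shared context, so it only pays off when work can be split into pieces that fit in 200k tokens each.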

The Rendering Pipeline

Agent definitions live in YAML. Claude Code reads Markdown. Orkestra bridges this gap with a deterministic rendering pipeline:

YAML Registries → Jinja Templates → .claude/agents/*.md

Operators edit the YAML source and run orkestra sync. Rendered Markdown appears in .claude/agents/, and Claude Code picks up the changes.

This separation serves different audiences:

YAML source includes lifecycle metadata, tags, validation rules, and deprecation notes for tooling
Rendered Markdown includes only what the model needs: description, tools, skills, and behavioral guidance

The pipeline composes identities, mindsets, styles, and skills into a single coherent prompt. An architect-analytical-concise agent gets a very different system prompt than a debugger-skeptical-technical agent, even if they share some underlying skills.
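The composition step can be sketched with the standard library alone. This example swaps Jinja for `string.Template` to stay dependency-free, and uses a plain dict in place of a parsed YAML registry. The field names, the fragment texts (paraphrased from this post), and the output layout are assumptions, not Orkestra's actual templates.

```python
from string import Template

# Stand-in for one parsed YAML agent definition.
AGENT = {
    "name": "architecture-advisor",
    "identity": ["knowledge-architect", "architect", "strategist"],
    "mindset": "analytical",
    "style": "concise",
}

# Stand-in for the identity, mindset, and style registries.
FRAGMENTS = {
    "knowledge-architect": "Connects knowledge across domains.",
    "architect": "Designs system structures.",
    "strategist": "Sets strategic direction.",
    "analytical": "Grounds assertions in evidence and quantifies uncertainty.",
    "concise": "Cuts fluff and leads with the answer.",
}

PAGE = Template(
    "# $name\n\n## Identity\n$identity\n\n## Mindset\n$mindset\n\n## Style\n$style\n"
)


def render(agent: dict) -> str:
    """Compose the referenced fragments into one Markdown prompt."""
    identity = "\n".join(f"- {FRAGMENTS[key]}" for key in agent["identity"])
    return PAGE.substitute(
        name=agent["name"],
        identity=identity,
        mindset=FRAGMENTS[agent["mindset"]],
        style=FRAGMENTS[agent["style"]],
    )


markdown = render(AGENT)
```

The function has no hidden inputs, so the same definition always renders to the same Markdown, which is what makes generated files safe to diff and review.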

Domain Knowledge: The Four-File Pattern

Every knowledge domain follows a consistent structure:

domain-name/
  decisions.md      # Key choices, rationale, consequences
  patterns.md       # Step-by-step guidance and examples
  anti-patterns.md  # Failure modes and remediation
  evolution.md      # Dated log of changes

This structure serves agent context loading. An agent working on authentication loads authentication/patterns.md for guidance and authentication/anti-patterns.md to avoid known pitfalls. The files are sized for efficient context loading: focused enough to be useful, comprehensive enough to be authoritative.
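A loader for this layout might look like the following sketch. The `load_domain` function and its default selection of files are hypothetical. Only the four file names and the authentication example come from this post.

```python
from __future__ import annotations

from pathlib import Path

DOMAIN_FILES = ("decisions.md", "patterns.md", "anti-patterns.md", "evolution.md")


def load_domain(
    root: Path,
    domain: str,
    wanted: tuple[str, ...] = ("patterns.md", "anti-patterns.md"),
) -> dict[str, str]:
    """Read only the requested files of a domain, in the order requested."""
    unknown = set(wanted) - set(DOMAIN_FILES)
    if unknown:
        raise ValueError(f"not part of the four-file pattern: {sorted(unknown)}")
    return {
        name: (root / domain / name).read_text(encoding="utf-8")
        for name in wanted
    }
```

Calling `load_domain(Path("knowledge"), "authentication")` would pull guidance and known pitfalls while leaving the decision log and change history out of context, assuming the domains live under a `knowledge/` directory.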

We maintain 21 top-level domains including analytics, authentication, data science, infrastructure, machine learning, performance, security, and more. Each domain can have sub-domains for finer granularity.

Values: The Operating System

All agents share a base layer of values that define how they operate:

Simplicity first. Use the simplest solution that works. Add complexity only when justified.

Fix root causes. Never patch around failures. If a pipeline fails, debug the pipeline. If a test fails, fix the code or the test.

Evidence-based. Label claims as “verified” (with benchmarks) or “estimated” (with assumptions). Pattern detected does not equal problem confirmed.

Context economics. MCP tools cost 0.1% of context. File reads cost 2% each. Apply domain expertise before exploring code.
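Those percentages translate into a simple budget. The sketch below assumes the figures are per call and per read, measured against the 200k window quoted earlier. The post does not state either detail, so treat the token values as illustrative.

```python
WINDOW_TOKENS = 200_000                      # window size quoted earlier in this post
MCP_CALL_TOKENS = WINDOW_TOKENS // 1000      # 0.1% of context, assumed per call
FILE_READ_TOKENS = WINDOW_TOKENS * 2 // 100  # 2% of context, per file read


def tokens_spent(mcp_calls: int, file_reads: int) -> int:
    """Tokens consumed by a mix of MCP tool calls and file reads."""
    return mcp_calls * MCP_CALL_TOKENS + file_reads * FILE_READ_TOKENS


print(tokens_spent(mcp_calls=20, file_reads=0))  # 4000
print(tokens_spent(mcp_calls=0, file_reads=1))   # 4000
print(tokens_spent(mcp_calls=0, file_reads=25))  # 100000
```

Under these assumptions one file read costs as much as twenty tool calls, and twenty-five reads consume half the window.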

These values propagate to every specialist through the rendering pipeline. An agent cannot bypass them by composition.

CLI: The Control Plane

Orkestra ships with a CLI for managing the agent ecosystem:

# Discovery
orkestra agents search "database"
orkestra agents info database-architect

# Validation
orkestra validate --show-warnings

# Rendering
orkestra sync --dry-run
orkestra sync

# Skills
orkestra skills list
orkestra skills info schema-migration-workflow

# Decisions
orkestra decisions search "authentication"

The CLI is the source of truth for what agents exist, what skills they have, and whether the system is healthy. It runs validation before sync to catch problems early.
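The validate-then-sync ordering can be sketched as a gate in front of the write step. This is not the Orkestra CLI's code. The `validate` and `sync` functions, the single rule they check (every referenced primitive must exist), and the `write` callback are assumptions for illustration. The primitive names are the ones mentioned in this post.

```python
from __future__ import annotations

from typing import Callable

KNOWN = {
    "identity": {"architect", "knowledge-architect", "strategist", "debugger", "guardian"},
    "mindset": {"analytical", "skeptical", "exploratory"},
    "style": {"technical", "concise", "diplomatic"},
}


def validate(name: str, agent: dict) -> list[str]:
    """Return one message per reference to an unknown primitive."""
    errors = []
    for identity in agent.get("identity", []):
        if identity not in KNOWN["identity"]:
            errors.append(f"{name}: unknown identity '{identity}'")
    for key in ("mindset", "style"):
        if agent.get(key) not in KNOWN[key]:
            errors.append(f"{name}: unknown {key} '{agent.get(key)}'")
    return errors


def sync(
    agents: dict[str, dict],
    write: Callable[[str], None] | None = None,
) -> list[str]:
    """Validate every agent first; write targets only if all of them pass."""
    errors = [msg for name, agent in agents.items() for msg in validate(name, agent)]
    if errors:
        raise ValueError("; ".join(errors))
    targets = [f".claude/agents/{name}.md" for name in agents]
    if write is not None:  # omit `write` to get dry-run behavior
        for target in targets:
            write(target)
    return targets
```

Because the check covers the whole registry before the first write, one bad definition blocks the entire sync instead of leaving a partially updated .claude/agents/ directory.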

Open Source Considerations

We built Orkestra to solve our own problem: coordinating AI agents at scale for a complex codebase. The patterns we discovered are not specific to our domain.

The character composition system (identity + mindset + style) applies to any team defining agent personalities.

The skill tier system (K0-K4) provides a mental model for organizing reusable capabilities by scope.

The rendering pipeline pattern (YAML source + templates + generated artifacts) separates concerns between tooling and model consumption.

The flat orchestration model (one coordinator, many specialists) avoids complexity while enabling parallelism.

Whether Orkestra becomes open source depends on whether these patterns have value for others building with Claude Code. If you are hitting the walls we described, the architecture might help.

What We Learned

Building Orkestra taught us that orchestration is not about making agents smarter. It is about making them more focused.

A single agent with perfect instructions still runs out of context. A single agent with all the skills still gets confused about which to apply. A single agent trying to be everything produces mediocre results everywhere.

Forty-six specialists, each excellent in their domain, coordinated by an orchestrator that knows when to delegate: that is how we ship.

The numbers matter less than the architecture. You might need five agents or fifty. The principle remains: composition over capability, specialization over generalization, coordination over individual heroics.

Orkestra powers the agent ecosystem behind Maguyva, our code intelligence platform. Want to learn more? Reach out to the team.
