Orkestra: Orchestrating AI Agents at Scale
> One orchestrator routes work to specialist AI agents, each with distinct skills and memory. How Orkestra coordinates 46 agents and 466 skills in production.
Numbers in this post reflect the system at publication (January 2026). See our team page for current figures.
When we started building with Claude Code, we ran into a problem that every team using AI coding assistants eventually faces: a single agent cannot do everything well.
You can prompt an agent to be a database specialist. Or a security auditor. Or a frontend engineer. But the moment you ask it to be all three at once, quality suffers. Context gets diluted. Instructions conflict. The agent becomes a generalist that is mediocre at everything.
So we built Orkestra.
What is Orkestra?
Orkestra is an agent orchestration system for Claude Code and similar AI coding tools. It coordinates multiple specialized agents, each with distinct expertise, under a single orchestrator that routes work to the right specialist.
Think of it as a staffing agency for AI agents. The orchestrator receives a task, identifies which specialist should handle it, and delegates with the right context. When the work is done, results flow back to the orchestrator for synthesis.
The numbers tell the story:
| Component | Count |
|---|---|
| Specialist agents | 46 |
| Reusable skills | 466 |
| Identity archetypes | 27 |
| Mindsets | 11 |
| Communication styles | 10 |
| Knowledge domains | 21 |
The Character System: D&D for Agents
The core insight behind Orkestra is that agent behavior emerges from three composable primitives:
Identity defines what the agent is. An architect designs system structures. A debugger traces failures to root causes. A guardian enforces compliance and security boundaries. We have 27 identity archetypes, and a single agent can combine several of them.
Mindset defines how the agent thinks. An analytical mindset grounds assertions in evidence and quantifies uncertainty. A skeptical mindset questions assumptions and seeks disconfirming evidence. An exploratory mindset embraces ambiguity and tries multiple approaches.
Style defines how the agent communicates. A technical style includes exact values and references specific files. A concise style cuts fluff and leads with the answer. A diplomatic style balances honesty with tact.
An agent combines these primitives:
# architecture-advisor.yaml
identity:
- knowledge-architect
- architect
- strategist
mindset: analytical
style: concise
This composition creates an agent that designs systems (architect), connects knowledge across domains (knowledge-architect), sets strategic direction (strategist), thinks in evidence and data (analytical), and communicates without fluff (concise).
The power is in combinatorial explosion. 27 identities times 11 mindsets times 10 styles yields 2,970 possible agent personalities, and that counts only agents with a single identity. Mixing identities pushes the number far higher. But you only define the combinations that matter for your work.
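To make the arithmetic concrete, here is a minimal Python sketch of the composition model. The `AgentProfile` class and the placeholder registry names are assumptions for illustration, not Orkestra's actual data model; only the counts and the `architecture-advisor` composition come from this post.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class AgentProfile:
    """One composed personality: identities plus a mindset and a style."""
    identities: tuple[str, ...]
    mindset: str
    style: str


# Placeholder names standing in for the real registries (hypothetical).
IDENTITIES = [f"identity-{i}" for i in range(27)]
MINDSETS = [f"mindset-{i}" for i in range(11)]
STYLES = [f"style-{i}" for i in range(10)]

# Every combination that uses exactly one identity.
single_identity = [
    AgentProfile((identity,), mindset, style)
    for identity, mindset, style in product(IDENTITIES, MINDSETS, STYLES)
]
print(len(single_identity))  # 2970

# The architecture-advisor composition, which mixes three identities.
advisor = AgentProfile(
    identities=("knowledge-architect", "architect", "strategist"),
    mindset="analytical",
    style="concise",
)
```

Freezing the dataclass makes profiles hashable, which is convenient if you want to deduplicate or index the handful of combinations you actually define.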
Skills: Reusable Capability Modules
Skills are the knowledge and workflows that agents can invoke. They follow a tiered system based on scope:
| Tier | Name | Scope | Example |
|---|---|---|---|
| K0 | Foundations | Universal methodology | Test-first discipline, evidence-based completion |
| K1 | Identities | Role-based workflows | CLI interface standards, performance playbook |
| K2 | Domains | Domain-specific knowledge | Database migration patterns, authentication validation |
| K3 | Stacks | Technology-specific | Cloudflare deployment, Supabase operations |
| K4 | Project | This codebase only | Project-specific workflows and conventions |
Skills are lazy-loaded. An agent sees skill names and descriptions at startup, but full skill content only loads when triggered. This keeps context lean while making hundreds of skills discoverable.
Each skill includes:
- Clear trigger conditions (“Use when migrating database schemas”)
- Step-by-step guidance
- Allowed tools for the workflow
- Success criteria and failure recovery paths
The 466 skills in our registry cover everything from git worktree isolation to web research workflows to deployment health validation.
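Lazy loading can be sketched in a few lines of Python. Everything below is an assumption made for illustration: the `Skill` and `SkillRegistry` classes, the `loader` callback, and the example description and steps are hypothetical and say nothing about how Orkestra stores skills internally.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class Tier(Enum):
    K0 = "Foundations"
    K1 = "Identities"
    K2 = "Domains"
    K3 = "Stacks"
    K4 = "Project"


@dataclass
class Skill:
    name: str
    description: str            # visible to the agent at startup
    tier: Tier
    trigger: str                # e.g. "Use when migrating database schemas"
    loader: Callable[[], str]   # fetches the full body on demand
    _content: Optional[str] = field(default=None, init=False, repr=False)

    @property
    def loaded(self) -> bool:
        return self._content is not None

    def content(self) -> str:
        """Load the full skill body the first time it is requested."""
        if self._content is None:
            self._content = self.loader()
        return self._content


class SkillRegistry:
    def __init__(self) -> None:
        self._skills: dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def catalog(self) -> list[tuple[str, str]]:
        """Names and descriptions only: what an agent sees at startup."""
        return [(s.name, s.description) for s in self._skills.values()]

    def invoke(self, name: str) -> str:
        """Triggering a skill is what pulls its full content into context."""
        return self._skills[name].content()


# Illustrative entry; the description and steps are made up for this example.
skill = Skill(
    name="schema-migration-workflow",
    description="Migrate database schemas safely",
    tier=Tier.K2,
    trigger="Use when migrating database schemas",
    loader=lambda: "1. Snapshot the schema\n2. Apply the migration\n3. Verify",
)
registry = SkillRegistry()
registry.register(skill)
```

Calling `registry.catalog()` costs only a name and a description per skill. The body is fetched once, on the first `registry.invoke(...)`, and cached after that.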
Why Orchestration Matters
Single-agent architectures hit walls quickly:
Context dilution. A 200k token context window sounds large until you load database schemas, API docs, test fixtures, and domain knowledge. Specialists can work with targeted context.
Instruction conflicts. Telling an agent to “be thorough but fast” and “verify everything but don’t over-engineer” creates tension. Specialists resolve these by having clear scope.
Expertise depth. A generalist agent knows a little about everything. A specialist agent, composed with the right identity and skills, knows its domain deeply.
Orkestra implements flat orchestration: one orchestrator coordinates multiple specialists. Specialists cannot spawn sub-specialists. This prevents complexity explosion while enabling parallel work.
The orchestrator has access to 2.2 million tokens of effective capacity: its own 200k window plus 10 concurrent subagents with 200k each. Work that would exhaust a single agent runs comfortably across the fleet.
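The capacity figure and the flat fan-out can be sketched together. This is a rough Python illustration, assuming an asyncio-style dispatcher; `run_specialist`, `orchestrate`, and the `security-auditor` agent name are hypothetical stand-ins rather than Orkestra's real interfaces.

```python
import asyncio

CONTEXT_WINDOW = 200_000        # tokens per agent, from the post
MAX_CONCURRENT_SUBAGENTS = 10   # from the post

# Orchestrator window plus ten subagent windows.
effective_capacity = CONTEXT_WINDOW * (1 + MAX_CONCURRENT_SUBAGENTS)


async def run_specialist(name: str, task: str) -> str:
    """Stand-in for a specialist run. It has no handle for spawning others."""
    await asyncio.sleep(0)  # placeholder for actual work
    return f"{name}: done ({task})"


async def orchestrate(assignments: dict[str, str]) -> list[str]:
    """Fan tasks out to specialists, at most ten at a time, and gather results."""
    slots = asyncio.Semaphore(MAX_CONCURRENT_SUBAGENTS)

    async def delegate(name: str, task: str) -> str:
        async with slots:
            return await run_specialist(name, task)

    # gather returns results in the same order the tasks were submitted.
    return await asyncio.gather(*(delegate(n, t) for n, t in assignments.items()))


results = asyncio.run(orchestrate({
    "database-architect": "review the migration plan",
    "security-auditor": "audit the auth flow",
}))
print(effective_capacity)  # 2200000
```

The flatness lives in the shape of the code: only `orchestrate` ever calls `delegate`, so the call tree is exactly one level deep.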
The Rendering Pipeline
Agent definitions live in YAML. Claude Code reads Markdown. Orkestra bridges this gap with a deterministic rendering pipeline:
YAML Registries → Jinja Templates → .claude/agents/*.md
Operators edit YAML source. Run orkestra sync. Rendered Markdown appears in .claude/agents/. Claude Code picks up the changes.
This separation serves different audiences:
- YAML source includes lifecycle metadata, tags, validation rules, and deprecation notes for tooling
- Rendered Markdown includes only what the model needs: description, tools, skills, and behavioral guidance
The pipeline composes identities, mindsets, styles, and skills into a single coherent prompt. An architect-analytical-concise agent gets a very different system prompt than a debugger-skeptical-technical agent, even if they share some underlying skills.
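A deterministic render step might look roughly like the sketch below. To keep it dependency-free, it swaps Jinja for Python's standard-library `string.Template` and uses a plain dict in place of parsed YAML. The template layout, the `render_agent` function, and the skill assignment are all assumptions for illustration.

```python
from string import Template

# Stand-in for a Jinja template; the section layout is hypothetical.
AGENT_TEMPLATE = Template(
    "# $name\n\n"
    "Identities: $identities\n"
    "Mindset: $mindset\n"
    "Style: $style\n"
    "Skills: $skills\n"
)


def render_agent(name: str, definition: dict) -> str:
    """Render one agent definition to Markdown; same input, same output."""
    return AGENT_TEMPLATE.substitute(
        name=name,
        identities=", ".join(definition["identity"]),
        mindset=definition["mindset"],
        style=definition["style"],
        # Sorting keeps the output stable regardless of registry order.
        skills=", ".join(sorted(definition.get("skills", []))),
    )


# The architecture-advisor definition from earlier, as it might look once parsed.
definition = {
    "identity": ["knowledge-architect", "architect", "strategist"],
    "mindset": "analytical",
    "style": "concise",
    "skills": ["schema-migration-workflow"],  # hypothetical assignment
}

markdown = render_agent("architecture-advisor", definition)
# A sync step would then write this to .claude/agents/architecture-advisor.md
```

Because the function is pure and sorts anything unordered, re-running it produces byte-identical output, which keeps diffs of the generated files meaningful.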
Domain Knowledge: The Four-File Pattern
Every knowledge domain follows a consistent structure:
domain-name/
    decisions.md      # Key choices, rationale, consequences
    patterns.md       # Step-by-step guidance and examples
    anti-patterns.md  # Failure modes and remediation
    evolution.md      # Dated log of changes
This structure serves agent context loading. An agent working on authentication loads authentication/patterns.md for guidance and authentication/anti-patterns.md to avoid known pitfalls. The files are sized for efficient context loading: focused enough to be useful, comprehensive enough to be authoritative.
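Loading a domain's guidance could be as simple as the following Python sketch. The `load_domain_context` helper, its default file selection, and the `## domain/file` header it emits are assumptions for illustration; only the four file names come from the pattern above.

```python
from pathlib import Path

DOMAIN_FILES = ("decisions.md", "patterns.md", "anti-patterns.md", "evolution.md")


def load_domain_context(
    root: Path,
    domain: str,
    files: tuple[str, ...] = ("patterns.md", "anti-patterns.md"),
) -> str:
    """Concatenate the requested files for one domain, skipping absent ones."""
    sections = []
    for filename in files:
        if filename not in DOMAIN_FILES:
            raise ValueError(f"not part of the four-file pattern: {filename}")
        path = root / domain / filename
        if path.is_file():
            text = path.read_text(encoding="utf-8")
            sections.append(f"## {domain}/{filename}\n{text}")
    return "\n\n".join(sections)
```

For example, `load_domain_context(Path("knowledge"), "authentication")` would return the patterns and anti-patterns for authentication and leave the decision log and changelog out of context (the `knowledge` root directory is a hypothetical name).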
We maintain 21 top-level domains including analytics, authentication, data science, infrastructure, machine learning, performance, security, and more. Each domain can have sub-domains for finer granularity.
Values: The Operating System
All agents share a base layer of values that define how they operate:
Simplicity first. Use the simplest solution that works. Add complexity only when justified.
Fix root causes. Never patch around failures. If a pipeline fails, debug the pipeline. If a test fails, fix the code or the test.
Evidence-based. Label claims as “verified” (with benchmarks) or “estimated” (with assumptions). Pattern detected does not equal problem confirmed.
Context economics. MCP tools cost 0.1% of context. File reads cost 2% each. Apply domain expertise before exploring code.
These values propagate to every specialist through the rendering pipeline. An agent cannot bypass them by composition.
CLI: The Control Plane
Orkestra ships with a CLI for managing the agent ecosystem:
# Discovery
orkestra agents search "database"
orkestra agents info database-architect
# Validation
orkestra validate --show-warnings
# Rendering
orkestra sync --dry-run
orkestra sync
# Skills
orkestra skills list
orkestra skills info schema-migration-workflow
# Decisions
orkestra decisions search "authentication"
The CLI is the source of truth for what agents exist, what skills they have, and whether the system is healthy. It runs validation before sync to catch problems early.
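The validate-before-sync ordering can be sketched as a guard in front of the render step. This post does not describe which checks the CLI runs, so the check below (every referenced identity, mindset, and style must exist in a registry) is a hypothetical example, as are the `validate` and `sync` function names and their signatures.

```python
from typing import Callable


def validate(agents: dict[str, dict], registry: dict[str, set[str]]) -> list[str]:
    """Return one message per reference to a primitive that is not registered."""
    errors = []
    for name, definition in sorted(agents.items()):
        for identity in definition.get("identity", []):
            if identity not in registry["identities"]:
                errors.append(f"{name}: unknown identity '{identity}'")
        for key, pool in (("mindset", "mindsets"), ("style", "styles")):
            value = definition.get(key)
            if value not in registry[pool]:
                errors.append(f"{name}: unknown {key} '{value}'")
    return errors


def sync(
    agents: dict[str, dict],
    registry: dict[str, set[str]],
    render: Callable[[str, dict], str],
) -> dict[str, str]:
    """Render every agent, but only if validation comes back clean."""
    errors = validate(agents, registry)
    if errors:
        raise ValueError("validation failed:\n" + "\n".join(errors))
    return {
        f".claude/agents/{name}.md": render(name, definition)
        for name, definition in agents.items()
    }
```

Returning a mapping of path to content, rather than writing files directly, is one way to support a dry run: the caller can print the mapping instead of touching disk.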
Open Source Considerations
We built Orkestra to solve our own problem: coordinating AI agents at scale for a complex codebase. The patterns we discovered are not specific to our domain.
The character composition system (identity + mindset + style) applies to any team defining agent personalities.
The skill tier system (K0-K4) provides a mental model for organizing reusable capabilities by scope.
The rendering pipeline pattern (YAML source + templates + generated artifacts) separates concerns between tooling and model consumption.
The flat orchestration model (one coordinator, many specialists) avoids complexity while enabling parallelism.
Whether Orkestra becomes open source depends on whether these patterns have value for others building with Claude Code. If you are hitting the walls we described, the architecture might help.
What We Learned
Building Orkestra taught us that orchestration is not about making agents smarter. It is about making them more focused.
A single agent with perfect instructions still runs out of context. A single agent with all the skills still gets confused about which to apply. A single agent trying to be everything produces mediocre results everywhere.
Forty-six specialists, each excellent in their domain, coordinated by an orchestrator that knows when to delegate: that is how we ship.
The numbers matter less than the architecture. You might need five agents or fifty. The principle remains: composition over capability, specialization over generalization, coordination over individual heroics.
Orkestra powers the agent ecosystem behind Maguyva, our code intelligence platform. Want to learn more? Reach out to the team.