
Skill Mining: From 3,500 Candidates to 466 Capabilities

[Architecture] [Skills]

> We screened 3,500 skill candidates and adopted 466. A systematic mining and ingestion loop for building a coherent AI agent skill library at scale.

Numbers in this post reflect the system at publication (January 2026). See our team page for current figures.

The 3,500 Skill Problem

When we started building an agent orchestration system, we faced an interesting challenge: there are thousands of potential skills scattered across the AI ecosystem. GitHub repositories, vendor documentation, community projects, internal patterns - skills exist everywhere. But which ones matter? Which ones work? And how do you maintain a coherent skill library that agents can actually use?

Our answer: a systematic mining and ingestion loop.

The Numbers Today

Just over three months after Anthropic launched Agent Skills on October 16, 2025, here’s where we stood when this post was published on January 27, 2026:

| Metric | Count |
| --- | --- |
| Candidates identified | 3,500+ |
| Skills adopted | 466 |
| Vendor skills | 373 |
| Internal skills | 93 |
| Active vendors | 25+ |
| Average tokens per skill | 2,834 |
| Tools referenced | 89 |
| Unique tags | 635 |

We’ve reviewed over 3,500 skill candidates. We’ve adopted 466. That’s a 13% adoption rate - and that selectivity is by design.

The Knowledge Tier System

Not all skills are created equal. We organize them into five knowledge layers (K0-K4), each representing a different scope of applicability:

K0: Foundations (Universal)

Skills every agent should have. These represent “good thinker” capabilities that work anywhere.

foundations/
├── test-first-discipline       # TDD: Red-Green-Refactor
├── evidence-based-completion   # Verify before claiming done
├── systematic-debugging        # Root cause methodology
├── structured-planning         # Break work into tasks
└── context-budget-awareness    # Manage token consumption

K0 skills are portable to any project, any domain, any stack. They encode universal cognitive patterns.

K1: Identities (Discipline)

“Good engineer” or “good researcher” skills that apply across projects within a discipline.

identities/
├── research-workflows          # Multi-source research
├── web-extraction-playbook     # Content extraction
├── code-review                 # PR review patterns
└── cli-interface-standards     # CLI design patterns

K2: Domains (Subject Expertise)

“Good database expert” or “good security engineer” skills portable within a field.

domains/
├── schema-migration-workflow   # Safe migration patterns
├── rpc-validation-checklist    # RPC health checks
├── auth-validation-checklist   # JWT/OAuth patterns
└── secrets-audit-checklist     # Credential scanning

K3: Stacks (Technology)

“Good Supabase user” or “good Cloudflare developer” skills for specific technology stacks.

stacks/
├── maguyva-quickstart          # Our semantic search patterns
├── cloudflare-deployment       # Workers/Pages deployment
└── mcp-tool-best-practices     # MCP tool selection

K4: Project (Organization)

Skills specific to our organization and workflows.

project/
├── agent-creation-workflow     # How we build agents
├── skill-authoring-workflow    # How we write skills
├── mining-session-workflow     # This very process
└── vendor-skill-evaluation     # Evaluation rubrics

The Mining Loop

Phase 1: Discovery

Skills come from everywhere:

Vendor Repositories: AWS, Anthropic, Cloudflare, Supabase, and community contributors publish skill collections. We track 25+ vendor roots.

Community Projects: GitHub is full of Claude Code templates, agent patterns, and workflow definitions.

Internal Patterns: As our team solves problems, patterns emerge. These get formalized into skills.

Documentation Mining: Technical documentation often contains implicit skills - procedures, checklists, decision trees.

Discovery is continuous. We use a skill backlog to track candidates before formal evaluation.

Phase 2: Evaluation

Every candidate goes through the same rubric:

adoption_criteria:
  - fills_real_gap: true      # We lack this capability
  - well_structured: true     # Progressive disclosure
  - actively_maintained: true # Commits in last 6 months
  - portable: true            # Not hyper-specific
  - tested: true              # Evidence of usage

All five product criteria must pass. This is why 87% of candidates are rejected.
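Mechanically, the gate is a strict conjunction. A minimal sketch of how such a check might look - the criterion names mirror the rubric above, but the function itself is illustrative, not orkestra's actual code:

```python
# Sketch of the all-or-nothing adoption gate. Criterion names mirror the
# rubric above; the function is illustrative, not orkestra's implementation.
ADOPTION_CRITERIA = (
    "fills_real_gap",
    "well_structured",
    "actively_maintained",
    "portable",
    "tested",
)

def passes_adoption(candidate: dict) -> bool:
    """Adopt only when every criterion is explicitly true - one miss rejects."""
    return all(candidate.get(criterion) is True for criterion in ADOPTION_CRITERIA)
```

A candidate that misses even one criterion, or leaves it unset, is rejected - that strictness is what produces the 87% rejection rate.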

Then every candidate goes through a separate trust review. We do not treat an official vendor repository, a well-known community maintainer, and a random GitHub repo as equivalent sources of truth.

trust_review:
  vendor_credibility:
    - ownership_verified       # Official vendor, known maintainer, or internal source
    - maintenance_signal       # Recent commits, issue response, release history
    - adoption_signal          # Evidence of real use, stars alone are not enough
    - provenance_clear         # We can trace where the skill came from
  prompt_injection_scan:
    - hidden_instruction_check # Buried "ignore previous instructions" patterns
    - exfiltration_check       # Prompts that try to leak files, secrets, or context
    - authority_check          # Claims of priority over system or developer rules
  script_audit:
    - inspect_scripts          # Read shell/python/js helpers before adoption
    - network_and_exec_review  # curl|bash, remote downloads, subprocess execution
    - file_and_secret_review   # Env vars, credential access, broad file writes
    - destructive_action_check # rm, reset, overwrite, or unsafe automation

Trusted vendors get a lighter provenance review, but not a free pass. Untrusted or unknown sources get a deeper manual audit, and we do not execute bundled scripts until they have been read, scoped, and classified as safe.
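The prompt-injection scan can be approximated as a first-pass pattern match. A sketch with illustrative red-flag patterns only - a real review uses a broader pattern set and always ends with a human reading the skill text:

```python
import re

# Illustrative red-flag patterns for a first-pass scan only. A real review
# is broader and always ends with a human reading the skill and its scripts.
SUSPICIOUS_PATTERNS = [
    (re.compile(r"ignore (all )?(previous|prior) instructions", re.I), "hidden_instruction"),
    (re.compile(r"\b(cat|read|print)\b.{0,40}(\.env|id_rsa|secrets?)", re.I), "exfiltration"),
    (re.compile(r"(overrides?|takes? priority over).{0,40}(system|developer)", re.I), "authority_claim"),
    (re.compile(r"curl[^\n]*\|\s*(ba)?sh", re.I), "remote_exec"),
]

def scan_skill_text(text: str) -> list[str]:
    """Return labels for any suspicious patterns found in a skill's text."""
    return [label for pattern, label in SUSPICIOUS_PATTERNS if pattern.search(text)]
```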

Gap Analysis: Before adopting, we search our registry:

uv run orkestra skills search "<capability>"

If we already have it, we don’t need it. If we have something close, we might merge rather than adopt.

Depth and Standards Scoring: We also score how fully a candidate uses the agent-skill model. A lone SKILL.md can still be useful, but deeper skills are more valuable when they separate instructions from references, scripts, and assets in the way agentskills.io encourages.

skill_depth:
  - level_1: SKILL.md only                     # Single instruction file
  - level_2: SKILL.md + strong description     # Clear triggers and scope
  - level_3: adds references/                  # Load docs only when needed
  - level_4: adds atomic scripts/              # Small, reviewable helpers
  - level_5: adds assets/examples/templates    # Full progressive disclosure
depth_signals:
  - standards_adherence        # Structure aligns with agentskills.io conventions
  - reference_quality          # Curated references, not giant context dumps
  - script_atomicity           # Focused helpers, not opaque monoliths
  - tool_boundary_clarity      # Clear limits on what the skill can execute
  - community_signal           # Stars/forks/users help, but only as a weak boost

GitHub stars can lift a credibility score slightly, but they never rescue a shallow or unsafe skill. A high-star repo with one vague SKILL.md and opaque scripts scores below a smaller repo with a precise description, curated references/, and atomic helpers that exercise the full skill model.

Structural Analysis: We check skill quality:

wc -l vendor/<repo>/<skill>/SKILL.md  # Size check
ls vendor/<repo>/<skill>/scripts/     # Supporting files
ls vendor/<repo>/<skill>/references/  # Bundled docs

If a candidate includes scripts, that review gets more stringent. A good skill is not just useful; it has to be legible, bounded, and safe to hand to an agent. That safety filter alone disqualifies a meaningful slice of otherwise interesting candidates.

Phase 3: Ingestion

When a skill passes evaluation, it enters the registry. But skills are never adopted unchanged. They’re modified to fit our system.

Modifications on Adoption:

  1. Metadata Normalization: Every skill gets our frontmatter schema
  2. K-Tier Assignment: Skills are placed in the appropriate knowledge layer
  3. Tag Enrichment: Tags are added for discovery
  4. Tool Declaration: Allowed tools are explicitly declared
  5. Section Alignment: Content is restructured to match our template

A typical skill YAML after ingestion:

metadata:
  identifier: vendor-skill-evaluation
  name: vendor-skill-evaluation
  description: Systematic evaluation of vendor skills for adoption.
  type: workflow
  layer: K4
  semantic_folder: project
  source: core
  last_updated: '2026-01-17'

frontmatter:
  tags:
    - agents
    - meta
    - skill-adoption
    - vendor
  allowed_tools:
    - Bash
    - Read
    - Write
    - Edit
    - Grep
    - Glob
    - Task

Phase 4: Scope Assignment

Skills are assigned to scopes - categories that determine which agents load which skills:

scopes:
  database:
    primary_skills:
      - domains/schema-migration-workflow
      - domains/rpc-validation-checklist
      - vendor/supabase/supabase-database
      - vendor/supabase/supabase-auth

  research:
    primary_skills:
      - identities/research-workflows
      - identities/web-extraction-playbook
      - identities/dataset-discovery-quickstart

Agents declare their scopes, and skills are automatically assigned:

# Agent definition
scopes: [database, research]
# Gets: all database skills + all research skills + universal skills
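Scope resolution is effectively a deduplicated union over the declared scopes plus the always-loaded universal set. A minimal sketch using abbreviated scope data from the examples above - this is not orkestra's actual resolver:

```python
# Abbreviated scope map from the examples above; "universal" stands in for
# the always-loaded K0 set. Sketch only, not orkestra's actual resolver.
SCOPES = {
    "universal": ["foundations/test-first-discipline", "foundations/evidence-based-completion"],
    "database": ["domains/schema-migration-workflow", "domains/rpc-validation-checklist"],
    "research": ["identities/research-workflows", "identities/web-extraction-playbook"],
}

def skills_for_agent(agent_scopes: list[str]) -> list[str]:
    """Union the universal set with each declared scope, preserving order."""
    seen: set[str] = set()
    result: list[str] = []
    for scope in ["universal", *agent_scopes]:
        for skill in SCOPES.get(scope, []):
            if skill not in seen:
                seen.add(skill)
                result.append(skill)
    return result
```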

Phase 5: Materialization

Skills don’t live as YAML in production. They’re rendered into SKILL.md files that Claude Code can load:

uv run orkestra sync

This command:

  1. Reads all skill YAML definitions
  2. Renders them through Jinja templates
  3. Writes SKILL.md files to .claude/skills/
  4. Organizes by K-tier (foundations/, identities/, domains/, stacks/, project/)

The final output structure:

.claude/skills/
├── foundations/     # K0: Universal
├── identities/      # K1: Discipline
├── domains/         # K2: Subject
├── stacks/          # K3: Technology
├── project/         # K4: Organization
└── vendor/          # External skills

The Dormant Scope Pattern

One of our most powerful patterns is “adopted-but-not-loaded” skills. We call these dormant scopes.

Consider scientific computing skills from the k-dense-scientific repository. We’ve adopted 120+ skills covering bioinformatics, chemistry, quantum computing, and clinical informatics. But most of our agents don’t need molecular docking or gene expression analysis.

Instead of loading all 120 skills into every agent (bloating context windows), we:

  1. Adopt the skills with a specific scope (e.g., bioinformatics)
  2. Keep them dormant - registered but not loaded
  3. Enable them only when an agent declares that scope

# In scopes.yaml - dormant scope
bioinformatics:
  description: "Bioinformatics and genomics"
  primary_skills: []  # Empty - skills exist but aren't loaded

# When an agent needs bioinformatics:
# Agent YAML
scopes: [research, bioinformatics]  # Now loads bioinformatics skills

This pattern lets us have 466 skills in the registry while typical agents load only 40-60 relevant ones.
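The arithmetic behind that claim is worth making explicit. Using the 2,834-token registry average from the stats above:

```python
AVG_TOKENS_PER_SKILL = 2_834  # registry average from the stats above

def context_cost(num_skills: int) -> int:
    """Rough token cost of loading a given number of skills."""
    return num_skills * AVG_TOKENS_PER_SKILL

full_registry = context_cost(466)   # the whole registry, if naively loaded
typical_agent = context_cost(50)    # a typical 40-60 skill agent, midpoint
```

Loading all 466 skills would cost roughly 1.32 million tokens; a typical agent's 40-60 skills cost around 142k. That order-of-magnitude gap is exactly what dormancy buys.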

Skill Types

Skills come in three cognitive patterns:

Workflow

Ordered procedural steps: “1. Do X, 2. Then Y, 3. Finally Z”

type: workflow
# Examples: schema-migration-workflow, mining-session-workflow

Discipline

Behavioral guardrails: “Always X”, “Never Y”, “Prefer Z”

type: discipline
# Examples: test-first-discipline, evidence-based-completion

Checklist

Verification criteria: “Confirm X”, “Verify Y”, “Check Z”

type: checklist
# Examples: auth-validation-checklist, secrets-audit-checklist

Quality Gates

Every skill must pass validation before it ships:

validation:
  file_exists: true           # Skill file at declared path
  frontmatter_valid: true     # Frontmatter parses correctly
  sections_complete: true     # Expected sections present
  tools_registered: true      # Declared tools exist in registry

Descriptions must be 50-400 characters with trigger phrases (“Use when…”, “When you need…”) so Claude Code knows when to suggest them.
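That description rule is easy to check mechanically. A sketch of such a validator - the trigger-phrase list is abbreviated to the two examples in the text:

```python
import re

# Trigger phrases abbreviated to the two examples in the text above.
TRIGGER_PHRASES = re.compile(r"\b(use when|when you need)\b", re.I)

def valid_description(description: str) -> bool:
    """Enforce the 50-400 character window and require a trigger phrase."""
    return 50 <= len(description) <= 400 and bool(TRIGGER_PHRASES.search(description))
```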

We validate continuously:

uv run orkestra validate --show-warnings

The Vendor Ecosystem

Our 373 vendor skills come from:

| Provider | Skills | Domain |
| --- | --- | --- |
| AWS Agent | 19 | Cloud services |
| Anthropic | 12 | Document generation |
| Cloudflare | 8 | Edge computing |
| Supabase | 5 | Database |
| k-dense | 100+ | Scientific computing |
| silvainfm | 4 | Data science |
| Java Developer Kit | 45+ | Spring/Java |
| Vercel | 1 | Browser automation |

Each vendor root is declared in metadata.yaml:

vendor_roots:
  - path: vendor/aws-agent-skills
    provider: aws
  - path: vendor/k-dense-scientific/scientific-skills
    provider: k-dense
  - path: vendor/supabase-skills
    provider: supabase

When orkestra sync runs, vendor skills are symlinked into .claude/skills/vendor/ with their provider namespace.

What We’ve Learned

Selectivity pays off. It’s tempting to adopt everything that looks useful. But each skill costs tokens. With 2,834 average tokens per skill, bloat hurts fast. Our 13% adoption rate keeps agents lean.

Structure enables discovery. The K-tier system isn’t just organization - it’s about portability. K0 skills can be reused anywhere. K4 skills are intentionally project-specific. This clarity helps both humans and agents find what they need.

Dormant scopes scale. You can adopt hundreds of skills without loading them all. Scopes let you build a comprehensive registry while keeping individual agent context windows manageable.

Modification on adoption is essential. Raw vendor skills rarely fit your system. The ingestion process - adding metadata, assigning tiers, enriching tags - makes external skills work internally.

Mining is continuous. The 3,500 number keeps growing. New vendor repos appear. Community patterns emerge. Internal workflows solidify. The loop never stops.

What’s Next

We’re working on several improvements:

  1. Automated gap detection: Alert when common agent failures could be addressed by an unadopted skill
  2. Skill deprecation workflows: Formal process for retiring skills that are superseded or unused
  3. Cross-skill dependencies: Explicit declaration of skill prerequisites
  4. Usage analytics: Track which skills agents actually invoke vs. just load

The skill mining loop is infrastructure. It’s not glamorous. But it’s what makes 41 agents work coherently with 466 capabilities while staying within context limits.

That’s the story of how 3,500 becomes 466. Not by ignoring 3,000 - by evaluating them systematically and adopting only what works.


Want to see the skill system in action? Run uv run orkestra skills list to explore our current registry.

Related reading

More from the Maguyva build log
