Claude Academy
Expert · 17 min

Designing Agentic Systems

Learning Objectives

  • Architect multi-agent systems with proper boundaries
  • Choose between agents, skills, and commands for different needs
  • Design error handling and fallback strategies
  • Implement reliable autonomous workflows

Thinking in Systems

Building one agent is straightforward. Building a system of agents that work together reliably is an engineering challenge. This lesson covers the architecture patterns, communication protocols, and error handling strategies that make multi-agent systems work.

The Decision Framework

Before reaching for agents, ask:

Do I need an agent, a skill, or a command?

Is it a recurring specialized ROLE with its own context needs?

→ Agent (security reviewer, test writer)

Is it specialized KNOWLEDGE that enhances the current session?

→ Skill (migration patterns, API conventions)

Is it a repeatable TASK sequence?

→ Command (deploy checklist, code review)

Is it a one-off request?

→ Just ask Claude directly

How many agents do I need?

Task touches 1-3 files → Single agent (or no agent — just ask Claude)

Task touches 5-15 files independently → /batch with parallel agents

Task has 2-3 distinct concerns → 2-3 specialist agents

Task needs planning + implementation → Supervisor + worker agents

Task is a full project → Agent team

The rule of thumb: use the minimum number of agents that achieves the goal. Each agent adds complexity and token cost.
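If it helps to see the heuristics in one place, they can be written as a tiny dispatch function. This is purely illustrative: the thresholds mirror the rules of thumb above, and the return strings are made up for this sketch.

```python
def choose_orchestration(files_touched: int,
                         distinct_concerns: int = 1,
                         needs_planning: bool = False) -> str:
    """Map the rules of thumb onto an orchestration choice (illustrative only)."""
    if needs_planning:
        return "supervisor + workers"
    if distinct_concerns >= 2:
        return f"{distinct_concerns} specialist agents"
    if files_touched >= 5:
        return "parallel batch agents"
    return "single agent (or just ask Claude)"
```

The ordering matters: coordination needs (planning, multiple concerns) outrank raw file count, which is why they are checked first.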

Architecture Patterns

Pattern 1: Specialist Pool

Multiple independent specialists, each handling their domain:

┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   Security   │  │ Performance  │  │   Testing    │
│   Reviewer   │  │   Analyst    │  │    Writer    │
│    (Opus)    │  │   (Sonnet)   │  │   (Sonnet)   │
└──────┬───────┘  └──────┬───────┘  └──────┬───────┘
       │                 │                 │
       └─────────────────┴─────────────────┘
                         │
                   Your Session

Each specialist works independently on its concern. Results are aggregated in your session. No coordination between specialists needed.

Best for: Multi-dimensional code review, where each dimension is independent.

Pattern 2: Supervisor + Workers

A lead agent coordinates worker agents:

        ┌─────────────────────┐
        │  Supervisor (Opus)  │
        │  Plans, assigns,    │
        │  reviews, integrates│
        └──────────┬──────────┘
            ┌──────┼──────┐
            ▼      ▼      ▼
        ┌──────┐┌──────┐┌──────┐
        │Work 1││Work 2││Work 3│  (Sonnet)
        └──────┘└──────┘└──────┘

The supervisor decomposes the task, assigns subtasks, reviews results, and integrates. Workers implement their assigned subtask.

Best for: Large tasks with interdependencies that need coordinated planning.
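The decompose/assign/review/integrate loop can be sketched in a few lines; `worker` here is a hypothetical callable standing in for a worker agent, and the decomposition into three parts is arbitrary.

```python
def supervise(task, worker):
    """Supervisor loop: plan subtasks, assign to workers, review, integrate."""
    subtasks = [f"{task} (part {i})" for i in (1, 2, 3)]  # plan: decompose
    results = [worker(sub) for sub in subtasks]           # assign: run workers
    reviewed = [r for r in results if r]                  # review: drop empty results
    return "; ".join(reviewed)                            # integrate: merge
```

In a real system the review step would send unsatisfactory results back to a worker rather than silently dropping them.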

Pattern 3: Pipeline

Agents process work sequentially, each adding value:

Code → [Implementer] → [Reviewer] → [Tester] → [Documenter] → Done
        (Write code)   (Review it)  (Test it)  (Document it)

Each agent's output is the next agent's input. The pipeline ensures every change goes through a consistent quality process.

Best for: Workflows with sequential quality gates.

Pattern 4: Mapper + Reducer

Distribute work across many agents, then aggregate:

[Mapper] → Agent 1 → Result 1 ─┐
         → Agent 2 → Result 2 ─┤
         → Agent 3 → Result 3 ─┤→ [Reducer] → Final Report
         → Agent N → Result N ─┘

The mapper distributes (each agent analyzes one module). The reducer aggregates (combines all analyses into a single report).

Best for: Codebase-wide analysis where each module is independent.
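The two halves map directly onto a classic map-reduce sketch. Here `ThreadPoolExecutor` stands in for parallel agent sessions, and `analyze_module` is a hypothetical per-module agent:

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_module(module):
    # Map step: in a real system, one agent analyzes one module.
    return f"{module}: 0 issues"

def reduce_report(results):
    # Reduce step: combine per-module results into a single report.
    return "Final report:\n" + "\n".join(sorted(results))

def map_reduce(modules):
    """Distribute analysis across modules, then aggregate into one report."""
    with ThreadPoolExecutor() as pool:
        return reduce_report(list(pool.map(analyze_module, modules)))
```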

Communication Protocols

Direct Messages

Agent-to-agent communication for specific coordination:

Security Agent → Lead: "Found a critical auth bypass in payment.ts.
                        This blocks the performance optimization
                        because the fix changes the API contract."

Broadcasts

Messages to all agents for shared context:

Lead → All: "The database schema has changed. All agents working
             on repository files need to regenerate their Prisma client."

Status Reports

Periodic updates from workers to supervisor:

Worker 1 → Lead: "Status: 3/5 files converted. On track."
Worker 2 → Lead: "Status: Blocked. Type error in shared utility."
Worker 3 → Lead: "Status: Complete. All tests passing."
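One way to model all three message kinds is a single message type plus a router. The `AgentMessage` fields and the `"all"` recipient convention here are illustrative, not part of any real API:

```python
from dataclasses import dataclass

@dataclass
class AgentMessage:
    sender: str
    recipient: str  # a specific agent, or "all" for broadcasts
    kind: str       # "direct", "broadcast", or "status"
    body: str

def route(message, agents):
    """Deliver a message to one agent, or to every agent for broadcasts."""
    targets = agents if message.recipient == "all" else [message.recipient]
    return {agent: message.body for agent in targets}
```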

Error Handling Strategies

Retry

For transient failures:

# Agent configuration
maxTurns: 25

# In the agent's instructions:
If a tool use fails, retry once. If it fails again, report the error
to the lead agent instead of retrying indefinitely.
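The retry-once instruction amounts to a small wrapper; `tool_call` and `escalate` are hypothetical stand-ins for a tool invocation and a report to the lead agent:

```python
def with_retry(tool_call, escalate):
    """Try once, retry once, then escalate instead of looping forever."""
    last_error = None
    for attempt in (1, 2):
        try:
            return tool_call()
        except Exception as err:
            last_error = err
    escalate(f"tool failed twice: {last_error}")
    return None
```

Capping retries at one keeps transient failures cheap while guaranteeing a persistent failure surfaces to the lead within two attempts.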

Escalation

When an agent can't solve the problem:

Worker: "I can't fix the type error in shared-utils.ts because
         it requires changes to the interface that other workers
         are using."

Lead: "Understood. I'll coordinate the interface change with
       all affected workers and provide the updated interface."

Graceful Degradation

When part of the system fails, the rest continues:

Batch of 10 agents:
  Agent 1: ✓ Success
  Agent 2: ✓ Success
  Agent 3: ✗ Failed (test timeout)
  Agent 4: ✓ Success
  ...
Result: 9 successful, 1 failed. Failed unit queued for retry.
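A minimal sketch of this batch behavior, with a plain function standing in for each agent's unit of work: failures are captured and queued rather than allowed to abort the batch.

```python
def run_batch(units, run):
    """Run each unit independently; one failure never stops the batch."""
    done, retry_queue = [], []
    for unit in units:
        try:
            done.append((unit, run(unit)))
        except Exception:
            retry_queue.append(unit)  # queue the failed unit for a later retry
    return done, retry_queue
```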

Circuit Breaker

If too many failures occur, stop and report:

If 3 or more out of 10 agents fail:
→ Stop remaining agents
→ Report to human: "30% failure rate. Likely a systemic issue
  (shared dependency, infrastructure problem). Human review needed."
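A sketch of the check, assuming the 30% threshold above; the return strings are illustrative:

```python
def circuit_breaker(results, threshold=0.3):
    """Stop the batch when the failure rate suggests a systemic issue."""
    failure_rate = sum(1 for ok in results if not ok) / len(results)
    if failure_rate >= threshold:
        return f"stop: {failure_rate:.0%} failure rate, human review needed"
    return "continue"
```

The point of the breaker is to distinguish isolated failures (retry them) from correlated ones (a shared dependency broke, so retrying every unit wastes tokens).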

Isolation Boundaries

Every agent needs clear boundaries:

File Boundaries

# Agent A: only touches auth files
permissionMode: acceptEdits

You may ONLY read and write files in src/services/auth/ and tests/auth/.
Do not touch any other directory.

Tool Boundaries

{
  "permissions": {
    "allow": ["Read(src/auth/)", "Write(src/auth/)", "Bash(pnpm test auth)"],
    "deny": ["Write(src/payments/*)", "Bash(pnpm deploy)"]
  }
}

Context Boundaries

Each agent gets only the context it needs:

  • Security agent: gets security-related files and instructions
  • Performance agent: gets performance-related files and metrics
  • Test agent: gets source files and test conventions

Don't give every agent the full codebase context.

Design Principles

1. Start simple: One agent first. Add more only when complexity demands it.

2. Clear boundaries: Each agent knows exactly what it can and can't do.

3. Minimal communication: Agents that need to communicate constantly should probably be one agent.

4. Fail gracefully: Plan for agent failures from the start.

5. Human in the loop: Keep a human checkpoint for critical decisions.

6. Match model to role: Opus for reasoning, Sonnet for implementation, Haiku for exploration.

7. Isolate experiments: Use worktrees for agents making changes.

Key Takeaway

Agentic systems succeed through clear isolation boundaries, appropriate pattern selection (specialist pool, supervisor+workers, pipeline, or mapper+reducer), and robust error handling (retry, escalate, degrade gracefully). Before adding agents, ask: do I need an agent, a skill, or a command? Use the minimum number of agents that achieves the goal. Each agent should have well-defined scope, restricted tools, and clear communication interfaces. Start simple, add complexity only when justified.