Multi-Agent Systems

Ch 10 · Multi-Agent Systems 55 min

Role boundaries · Sequential handoff · Critic-revision loop

Hands-on: MESHFLOW_MOCK=1 python3 hands_on/05_multi_agent_team.py

Lesson 10: Multi-Agent Systems

Lesson Goal

This lesson teaches how to design multi-agent systems in a beginner-friendly way. A multi-agent system uses more than one agent to solve a task. Each agent should have a clear role, limited tools, clear inputs, and explicit outputs.

The main lesson:

Multiple agents can improve work only when their responsibilities are clear.
Without structure, they create more confusion, cost, and risk.

1. What Is A Multi-Agent System?

A multi-agent system is an AI application in which two or more agents collaborate, debate, divide work, review each other's output, or hand off tasks.

Example:

researcher_agent -> writer_agent -> reviewer_agent -> approval_gate

Each agent uses an LLM, but each has a different responsibility.

2. When Multiple Agents Help

Use multiple agents when different responsibilities benefit from separation.

Good reasons:

Research and writing require different behavior.
One agent should critique another agent's output.
Different tools should be isolated by role.
Parallel work can save time.
A specialist role improves quality.
You need debate before a decision.

Weak reasons:

"More agents sounds advanced."
The same agent role is copied three times.
Agents chat without producing artifacts.
No one knows when the conversation should stop.
Every agent has every tool.

Beginner rule: add a second agent only when you can explain what it does that the first agent should not do.

3. Common Multi-Agent Roles

| Role | Responsibility | Typical Tools |
| --- | --- | --- |
| Planner | Breaks goal into steps | None or read-only project context |
| Researcher | Finds and summarizes evidence | Search, retrieval, file read |
| Analyst | Compares options and tradeoffs | Calculator, data query |
| Writer | Drafts final content | Usually no external tools |
| Reviewer | Checks quality and policy | Rubric, validator |
| Safety guard | Looks for risk | Policy checker |
| Executor | Performs approved actions | Restricted action tools |

Keep risky tools away from exploratory agents.

4. Coordination Patterns

Sequential Handoff

One agent produces an artifact for the next.

researcher -> writer -> reviewer

Best for beginner workflows because it is easy to inspect.
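A minimal sketch of sequential handoff, assuming each agent is a plain Python function standing in for an LLM call. The function names and dictionary-shaped artifacts are illustrative and do not come from any particular framework.

```python
# Each "agent" is a stub function standing in for an LLM call (assumption).
def researcher(goal: str) -> dict:
    return {"artifact": "research_brief", "content": f"Evidence about {goal}"}

def writer(brief: dict) -> dict:
    return {"artifact": "draft_answer", "content": f"Draft based on: {brief['content']}"}

def reviewer(draft: dict) -> dict:
    # Stub check: the draft must be grounded in the research evidence.
    approved = "Evidence" in draft["content"]
    return {"artifact": "review_findings", "approved": approved}

def run_pipeline(goal: str) -> list:
    """Run the agents in order and keep every artifact for inspection."""
    artifacts = []
    current = goal
    for step in (researcher, writer, reviewer):
        current = step(current)  # the output of one agent is the input of the next
        artifacts.append(current)
    return artifacts

trail = run_pipeline("vector databases")
for item in trail:
    print(item["artifact"])
```

Because every intermediate artifact is kept in `trail`, you can inspect exactly what each step handed to the next.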

Parallel Specialists

Several agents work independently, then a merge step combines results.

technical_researcher
market_researcher
risk_researcher
  -> synthesizer

Best when subtopics are independent.
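One possible way to sketch parallel specialists is with the standard-library `ThreadPoolExecutor`. The three specialist functions are hypothetical stubs; in a real system each would call an LLM or a tool.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stub specialists; each works without seeing the others' output.
def technical_researcher(topic: str) -> str:
    return f"technical notes on {topic}"

def market_researcher(topic: str) -> str:
    return f"market notes on {topic}"

def risk_researcher(topic: str) -> str:
    return f"risk notes on {topic}"

def synthesizer(notes: list) -> str:
    # Merge step: combine the independent results into a single artifact.
    return " | ".join(notes)

def run_parallel(topic: str) -> str:
    specialists = [technical_researcher, market_researcher, risk_researcher]
    with ThreadPoolExecutor(max_workers=len(specialists)) as pool:
        # map() returns results in input order, so the merge is deterministic.
        notes = list(pool.map(lambda agent: agent(topic), specialists))
    return synthesizer(notes)

summary = run_parallel("edge caching")
print(summary)
```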

Debate

Agents argue different positions before a judge decides.

pro_agent -> con_agent -> judge_agent

Useful for decisions, but needs a strict stopping rule.
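The sketch below shows one way to enforce a strict stopping rule in a debate: a fixed round limit. The agents are stubs, and `MAX_ROUNDS` and the shape of the decision artifact are assumptions made for illustration.

```python
MAX_ROUNDS = 3  # strict stopping rule: the debate cannot run longer than this

def pro_agent(topic: str, round_no: int) -> str:
    return f"pro argument {round_no} for {topic}"

def con_agent(topic: str, round_no: int) -> str:
    return f"con argument {round_no} against {topic}"

def judge_agent(transcript: list) -> dict:
    # Stub judge: a real judge would weigh the arguments with an LLM.
    return {
        "artifact": "decision",
        "arguments_heard": len(transcript),
        "verdict": "escalate to human review",
    }

def run_debate(topic: str, max_rounds: int = MAX_ROUNDS) -> dict:
    transcript = []
    for round_no in range(1, max_rounds + 1):  # bounded loop, never open-ended
        transcript.append(pro_agent(topic, round_no))
        transcript.append(con_agent(topic, round_no))
    return judge_agent(transcript)

decision = run_debate("adopting a manager-worker design")
print(decision)
```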

Manager-Worker

A manager agent delegates tasks to worker agents.

manager -> researcher
manager -> writer
manager -> reviewer

Powerful, but harder to debug. Use after you understand sequential handoff.
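A rough sketch of manager-worker delegation, assuming the manager returns a fixed plan. In practice the manager would ask an LLM which worker to call next, which is what makes this pattern harder to debug. All names here are hypothetical.

```python
# Hypothetical stub workers keyed by the task type they handle.
WORKERS = {
    "research": lambda task: f"research_brief for {task}",
    "write": lambda task: f"draft_answer for {task}",
    "review": lambda task: f"review_findings for {task}",
}

def manager(goal: str) -> list:
    """Stub plan: a real manager would choose workers dynamically."""
    return [("research", goal), ("write", goal), ("review", goal)]

def run_manager(goal: str) -> list:
    trace = []
    for worker_name, task in manager(goal):
        if worker_name not in WORKERS:
            # Fail loudly if the manager delegates to a worker that does not exist.
            raise ValueError(f"unknown worker: {worker_name}")
        result = WORKERS[worker_name](task)
        # Record every delegation so the run can be debugged afterwards.
        trace.append({"worker": worker_name, "task": task, "result": result})
    return trace

trace = run_manager("lesson on caching")
for entry in trace:
    print(entry["worker"], "->", entry["result"])
```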

Critic-Revision Loop

A critic reviews a draft, then a writer revises.

writer -> critic -> writer -> approval

Always set a maximum number of revisions.
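A minimal sketch of a critic-revision loop with a hard revision cap. The writer and critic are stubs, and the "must mention an example" rubric is an assumption chosen only to make the loop observable.

```python
MAX_REVISIONS = 2  # hard cap so the loop always terminates

def writer(topic: str, feedback: str = "") -> str:
    if not feedback:
        return f"Draft about {topic}."
    # Stub revision: a real writer would use the feedback in its prompt.
    return f"Draft about {topic}. Includes an example."

def critic(draft: str) -> dict:
    # Stub rubric: the draft passes only if it mentions an example.
    if "example" in draft:
        return {"approved": True, "feedback": ""}
    return {"approved": False, "feedback": "Add an example."}

def revise_until_approved(topic: str, max_revisions: int = MAX_REVISIONS) -> dict:
    draft = writer(topic)
    review = critic(draft)
    revisions = 0
    while not review["approved"] and revisions < max_revisions:
        draft = writer(topic, review["feedback"])
        review = critic(draft)
        revisions += 1
    return {"draft": draft, "approved": review["approved"], "revisions": revisions}

result = revise_until_approved("caching")
print(result)
```

If the cap is reached without approval, the function still returns, with `approved` set to `False`, so the caller can escalate instead of looping forever.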

5. Context Boundaries

Not every agent needs the same context.

Researcher context:

Goal.
Research questions.
Search tool instructions.
Source limits.

Writer context:

Goal.
Evidence summary.
Required style.
Output format.

Reviewer context:

Goal.
Draft answer.
Quality rubric.
Policy requirements.

This separation reduces confusion and limits unnecessary exposure to data.
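One way to express these boundaries in code is a per-role allowlist of context keys. The key names and sample values below are illustrative assumptions that mirror the three lists above.

```python
# Full workflow state; no single agent should receive all of it.
STATE = {
    "goal": "Explain caching to beginners",
    "research_questions": ["What is a cache?"],
    "search_tool_instructions": "Use at most 3 queries",
    "source_limits": "Official docs only",
    "evidence_summary": "Caches store recent results.",
    "required_style": "Plain language",
    "output_format": "Short lesson",
    "draft_answer": "A cache keeps recent results close.",
    "quality_rubric": "Accurate, clear, complete",
    "policy_requirements": "No private data",
}

# Which keys each role is allowed to see.
CONTEXT_KEYS = {
    "researcher": ["goal", "research_questions", "search_tool_instructions", "source_limits"],
    "writer": ["goal", "evidence_summary", "required_style", "output_format"],
    "reviewer": ["goal", "draft_answer", "quality_rubric", "policy_requirements"],
}

def build_context(role: str, state: dict) -> dict:
    """Return only the slice of state that the given role needs."""
    return {key: state[key] for key in CONTEXT_KEYS[role]}

writer_context = build_context("writer", STATE)
print(sorted(writer_context))
```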

6. Memory Boundaries

Multi-agent memory needs clear rules.

Decide:

Which agents can read memory?
Which agents can write memory?
Which memories are shared?
Which memories are private to one agent?
Which memories need human approval before storing?

Simple design:

shared project memory: approved facts and decisions
agent scratchpad: temporary, not stored
final memory write: only after approval

Do not store every agent message as long-term memory. Store useful approved summaries.
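The simple design above could be sketched as a small class that accepts writes only from permitted agents and only for approved content. `ProjectMemory` and its methods are hypothetical names, not part of any library.

```python
class ProjectMemory:
    """Shared memory that stores only approved facts from permitted writers."""

    def __init__(self, allowed_writers):
        self._allowed_writers = set(allowed_writers)
        self._facts = []

    def write(self, agent: str, fact: str, approved: bool) -> bool:
        if agent not in self._allowed_writers:
            raise PermissionError(f"{agent} may not write shared memory")
        if not approved:
            return False  # unapproved content never reaches long-term memory
        self._facts.append(fact)
        return True

    def read(self) -> list:
        return list(self._facts)  # return a copy so readers cannot mutate the store

memory = ProjectMemory(allowed_writers={"reviewer"})
scratchpad = ["half-formed idea", "rejected outline"]  # temporary, never stored

stored = memory.write("reviewer", "Lesson covers five coordination patterns.", approved=True)
skipped = memory.write("reviewer", "Unverified claim.", approved=False)
print(memory.read())
```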

7. Tool Boundaries

Tools should match roles.

Example:

| Agent | Allowed Tools | Blocked Tools |
| --- | --- | --- |
| Researcher | search, read docs | send email, publish |
| Writer | style guide lookup | database write |
| Reviewer | rubric scorer, policy checker | publish |
| Executor | publish action | search everything |

This is least-privilege design for agents.
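A minimal sketch of enforcing tool boundaries with a per-role allowlist that is checked before every tool call. The tool names follow the table above; the registry and `call_tool` helper are assumptions for illustration.

```python
# Per-role allowlist: anything not listed is blocked by default.
ALLOWED_TOOLS = {
    "researcher": {"search", "read_docs"},
    "writer": {"style_guide_lookup"},
    "reviewer": {"rubric_scorer", "policy_checker"},
    "executor": {"publish"},
}

# Hypothetical stub tools.
TOOL_REGISTRY = {
    "search": lambda query: f"results for {query}",
    "publish": lambda text: f"published: {text}",
}

def call_tool(agent: str, tool: str, argument: str) -> str:
    """Check the allowlist first, then dispatch to the tool."""
    if tool not in ALLOWED_TOOLS.get(agent, set()):
        raise PermissionError(f"{agent} is not allowed to use {tool}")
    return TOOL_REGISTRY[tool](argument)

print(call_tool("researcher", "search", "agent patterns"))
try:
    call_tool("researcher", "publish", "draft")
except PermissionError as error:
    print(error)
```

Using an allowlist rather than a blocklist means a newly added tool is unavailable to every agent until someone grants it explicitly.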

8. Artifacts In Multi-Agent Systems

Agents should communicate through named artifacts, not only through hidden chat messages.

Examples:

research_brief
technical_notes
risk_notes
draft_answer
review_findings
revision_plan
approval_record

Artifacts make collaboration inspectable.
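As a sketch, artifacts can be modeled as small immutable records kept in a store that any human or tool can inspect. The `Artifact` and `ArtifactStore` classes are hypothetical; the artifact names come from the list above.

```python
from dataclasses import dataclass

KNOWN_ARTIFACTS = {
    "research_brief", "technical_notes", "risk_notes", "draft_answer",
    "review_findings", "revision_plan", "approval_record",
}

@dataclass(frozen=True)  # frozen: an artifact cannot be edited after handoff
class Artifact:
    name: str
    producer: str
    content: str

class ArtifactStore:
    def __init__(self):
        self._items = []

    def put(self, artifact: Artifact) -> None:
        if artifact.name not in KNOWN_ARTIFACTS:
            raise ValueError(f"unexpected artifact: {artifact.name}")
        self._items.append(artifact)

    def latest(self, name: str):
        """Return the most recent artifact with this name, or None."""
        for artifact in reversed(self._items):
            if artifact.name == name:
                return artifact
        return None

    def names(self) -> list:
        return [artifact.name for artifact in self._items]

store = ArtifactStore()
store.put(Artifact("research_brief", "researcher", "Three sources on caching."))
store.put(Artifact("draft_answer", "writer", "A cache keeps recent results close."))
print(store.names())
```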

9. Stopping Rules

Multi-agent systems need strong stopping rules because conversations can drift.

Examples:

Stop after each agent produces its required artifact.
Stop after 3 debate rounds.
Stop when reviewer score is at least 0.8.
Stop when budget is reached.
Stop when a human rejects the output.
Stop when required evidence is missing.

A system without a stop rule is not production-ready.
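Several of the example rules can be combined into one check that runs after every step. This is a sketch under assumed thresholds: the 3-round limit and 0.8 score come from the examples above, while the `RunState` fields and the 1.00 budget are illustrative.

```python
from dataclasses import dataclass

@dataclass
class RunState:
    rounds: int = 0
    reviewer_score: float = 0.0
    cost_usd: float = 0.0
    human_rejected: bool = False
    evidence_missing: bool = False

def stop_reason(state: RunState, max_rounds: int = 3,
                min_score: float = 0.8, budget_usd: float = 1.00):
    """Return the name of the first stop rule that fires, or None to continue."""
    if state.human_rejected:
        return "human_rejected"
    if state.evidence_missing:
        return "evidence_missing"
    if state.cost_usd >= budget_usd:
        return "budget_reached"
    if state.reviewer_score >= min_score:
        return "score_reached"
    if state.rounds >= max_rounds:
        return "max_rounds"
    return None

print(stop_reason(RunState(rounds=1, reviewer_score=0.5)))
print(stop_reason(RunState(rounds=2, reviewer_score=0.85)))
```

Returning the name of the rule, rather than a bare boolean, makes it easy to log why a run ended.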

10. Safety And Governance

Add governance wherever agents can:

Call tools.
Use private data.
Produce external messages.
Make recommendations with real consequences.
Trigger actions.
Store memory.
Spend money.

Useful controls:

Tool schemas.
Input validation.
Role-specific permissions.
Gates.
Rate limits.
Cost budgets.
Trace logs.
Human review.
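Two of these controls, gates and trace logs, can be sketched together: an action runs only after an approver accepts the artifact, and every decision is recorded. The approver callbacks stand in for a human reviewer, and all names are hypothetical.

```python
trace = []  # simple in-memory trace log

def approval_gate(artifact: dict, approver) -> bool:
    """Ask the approver for a decision and record it in the trace."""
    approved = bool(approver(artifact))
    trace.append({"step": "approval_gate", "artifact": artifact["name"], "approved": approved})
    return approved

def publish(artifact: dict, approver) -> str:
    if not approval_gate(artifact, approver):
        return "blocked"  # denied: the action never runs
    trace.append({"step": "publish", "artifact": artifact["name"]})
    return "published"

draft = {"name": "draft_answer", "content": "A cache keeps recent results close."}

# Stand-ins for a human reviewer's decision.
print(publish(draft, approver=lambda artifact: False))
print(publish(draft, approver=lambda artifact: True))
print(len(trace))
```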

11. Hands-On: Multi-Agent Debate Example

Run:

python3 -m src.mini_meshflow run examples/07_multi_agent_debate.json

Then inspect:

What each agent produces.
Whether the debate has a clear final artifact.
Whether the workflow has a stopping point.
What you would add before using it in production.

12. Design Exercise

Design a multi-agent lesson-builder:

planner_agent
  -> researcher_agent
  -> writer_agent
  -> reviewer_agent
  -> approval_gate
  -> final_lesson

For each agent, define:

Role.
Goal.
Allowed tools.
Input artifacts.
Output artifact.
Stop condition.
Failure behavior.
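If it helps, the seven fields can be captured in a small template like the one below. It is only a starting sketch: `AgentSpec` is a hypothetical name, and the planner values are sample answers, not the expected solution.

```python
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    role: str
    goal: str
    allowed_tools: list = field(default_factory=list)
    input_artifacts: list = field(default_factory=list)
    output_artifact: str = ""
    stop_condition: str = ""
    failure_behavior: str = ""

    def missing_fields(self) -> list:
        """List required fields that are still empty (the two lists may be empty)."""
        optional = {"allowed_tools", "input_artifacts"}
        return [name for name, value in vars(self).items()
                if name not in optional and not value]

# Sample answer for one agent; the values are illustrative assumptions.
planner = AgentSpec(
    role="planner",
    goal="Break the lesson topic into ordered steps",
    allowed_tools=[],
    input_artifacts=["lesson_topic"],
    output_artifact="lesson_plan",
    stop_condition="Stop after producing one lesson_plan",
    failure_behavior="Return an error artifact and halt the workflow",
)

print(planner.missing_fields())
print(AgentSpec(role="writer", goal="Draft the lesson").missing_fields())
```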

13. Common Beginner Mistakes

Mistake 1: Too many agents.

Correction: Start with two or three roles.

Mistake 2: Every agent sees everything.

Correction: Give each agent only the context it needs.

Mistake 3: Every agent has every tool.

Correction: Match tools to roles.

Mistake 4: Agents talk but do not produce artifacts.

Correction: Require named outputs.

Mistake 5: No final authority.

Correction: Add a judge, reviewer, gate, or workflow rule.

Mistake 6: No stopping rule.

Correction: Add turn limits, artifact requirements, and budget limits.

14. Multi-Agent Design Checklist

Before building, answer:

Why do we need more than one agent?
What is each agent's role?
What artifact does each agent produce?
Which tools can each agent use?
What context does each agent receive?
What memory can each agent read or write?
How do agents coordinate?
What happens if agents disagree?
What stops the system?
What is logged in the trace?
What requires human approval?

15. Summary

Multi-agent systems are powerful when they divide responsibility clearly. The safe beginner pattern is:

specialized roles + limited tools + explicit artifacts + gates + traces

Do not measure sophistication by the number of agents. Measure it by clarity, control, and quality.

Exercises

Exercise 1: Run The Debate

python3 -m src.mini_meshflow run examples/07_multi_agent_debate.json

Write down each agent-like step and the artifact it produces.

Exercise 2: Role Boundaries

Design three agents for a training-course builder:

Researcher.
Writer.
Reviewer.

For each, list allowed tools and blocked tools.

Exercise 3: Context Boundary

Write the exact context each agent should receive. Keep each context package short and role-specific.

Exercise 4: Add A Gate

Choose one action that should require approval. Explain what artifact the gate checks and what happens if approval is denied.
