Cross-Run Learning with CORAL

Ch 17 · Advanced Systems · 50 min

Task fingerprint · Cosine similarity · Strategy recommendation

Hands-on: MESHFLOW_MOCK=1 python3 hands_on/17_cross_run_learning.py

Lesson 17: Cross-Run Learning With CORAL

Lesson Goal

By the end of this lesson, you should be able to:

Explain what CORAL learns and how it stores run patterns.
Enable cross-run learning and configure the pattern database.
Read strategy recommendations from run results.
Understand what task fingerprinting and cosine similarity mean.
Connect CORAL to cost and carbon optimization.

Estimated time: 40 to 55 minutes.

1. The Problem CORAL Solves

You have a pipeline that runs thousands of similar tasks. Each time, it uses the same agent configuration — the same roles, model, and number of steps — regardless of whether a cheaper configuration would have produced equally good results for that task type.

CORAL (Cross-run Optimized Reasoning and Learning) observes your runs over time, stores performance patterns, and recommends the most efficient agent strategy for new tasks that resemble past ones.

The result: after enough runs, CORAL can steer the pipeline toward configurations that cost less, emit less carbon, and finish faster — without sacrificing quality.

2. How CORAL Works

CORAL runs as part of the MeshFlow governance stack. When a run completes:

CORAL extracts a task fingerprint: a compact representation of the task content (word vectors or token hashes).

It stores the fingerprint alongside the run's cost, tokens, carbon, duration, agent roles, quality score (if available), and whether the run succeeded.

On the next run, CORAL searches the pattern store for past fingerprints that are similar to the current task (using cosine similarity).

If similar past runs are found, CORAL ranks them by cost-efficiency and recommends the agent configuration that produced the best outcome at the lowest cost.

The recommendation is advisory: you can read it from the run result and decide whether to follow it. CORAL does not automatically change the pipeline — you stay in control.
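The store, search, and rank steps above can be sketched in a few lines of Python. This is a minimal illustration, not MeshFlow's implementation: the RunRecord class, the recommend function, the 0.8 similarity threshold, and the ranking rule (lowest average cost among successful similar runs) are all assumptions made for the example.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One stored pattern (hypothetical shape, not the real schema)."""
    fingerprint: list[float]  # task fingerprint vector
    strategy: str             # label for the agent configuration
    cost_usd: float
    succeeded: bool


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(store: list[RunRecord], query: list[float], min_similarity: float = 0.8):
    """Return (strategy, avg_cost) for the cheapest strategy among
    successful past runs similar to `query`, or None without history."""
    costs = defaultdict(list)
    for rec in store:
        if rec.succeeded and cosine(rec.fingerprint, query) >= min_similarity:
            costs[rec.strategy].append(rec.cost_usd)
    if not costs:
        return None
    averages = {name: sum(c) / len(c) for name, c in costs.items()}
    best = min(averages, key=averages.get)
    return best, averages[best]


store = [
    RunRecord([1.0, 0.0, 1.0], "efficient", 0.003, True),
    RunRecord([1.0, 0.1, 0.9], "thorough", 0.021, True),
    RunRecord([0.0, 1.0, 0.0], "thorough", 0.020, True),  # unrelated task type
]
print(recommend(store, [1.0, 0.0, 0.9]))
```

The function only returns a suggestion. Nothing in it changes which agents run, which mirrors the advisory behavior described above.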

3. Enabling CORAL

policy = Policy(
    enable_cross_run_learning=True,
    # The pattern store defaults to in-memory (:memory:) and is lost on restart.
    # To persist it to disk, pass cross_run_db when calling run() (see below).
)

Pass cross_run_db to persist across process restarts:

result = await Mesh(
    agents=agents,
    policy=policy,
).run(task, cross_run_db="coral.db")

After the run, any recommendations appear in the run result:

# result.context may contain CORAL recommendation keys
recommended = result.context.get("coral_recommendation")
if recommended:
    print(f"CORAL recommends: {recommended}")

4. Task Fingerprinting In Plain Language

A task fingerprint answers: "What kind of task is this?"

Two tasks are similar if they have the same kind of content — both ask for contract summaries, both request technical analysis of code, both ask for financial risk assessment. The fingerprint does not care about specific names or numbers; it cares about the shape of the request.

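In practice, CORAL computes a vector embedding of the task text and stores it. Cosine similarity measures the angle between two such vectors, which serves as a proxy for how close the tasks are in meaning. A similarity of 1.0 means the vectors point in the same direction (the tasks look identical to CORAL); a similarity near 0.0 means the tasks have essentially nothing in common.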
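To make the idea concrete, here is a small sketch of a token-hash fingerprint compared with cosine similarity. It is an assumption-laden illustration, not CORAL's actual fingerprinting: the fingerprint function, the 256-bucket size, and the choice to drop digits are all hypothetical. Note that names still contribute a little to the vector in this sketch; the shared wording of the request is what dominates the score.

```python
import hashlib
import math
import re

DIM = 256  # fingerprint length; an arbitrary choice for this sketch


def fingerprint(text: str) -> list[float]:
    """Hash each alphabetic word into one of DIM buckets and count hits.
    Digits are dropped, so specific numbers do not affect the result."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z]+", text.lower()):
        # hashlib gives a stable hash; the built-in hash() is salted per process
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


a = fingerprint("Summarize the contract between Acme and Globex dated 2021")
b = fingerprint("Summarize the contract between Initech and Umbrella dated 2019")
c = fingerprint("Find memory leaks in this C++ allocator")

print("same task:       ", round(cosine(a, a), 3))
print("same kind of task:", round(cosine(a, b), 3))
print("different task:   ", round(cosine(a, c), 3))
```

The two contract requests share most of their words and therefore score much higher against each other than either does against the code-analysis request.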

You do not need to understand the mathematics. What matters is:

CORAL improves with more runs. The first 10 runs are learning; by run 50 you start seeing useful recommendations.

CORAL is only as good as the variety of past runs. If all past runs used the same agent configuration, CORAL has nothing to compare.

5. Strategy Recommendation Format

A CORAL recommendation looks like this:

CORAL recommendation:
  Based on 12 similar past tasks:
  - Efficient strategy (3 agents, researcher+executor+critic) → avg cost $0.003, avg carbon 0.4gCO₂
  - Thorough strategy  (5 agents, full pipeline)              → avg cost $0.021, avg carbon 1.2gCO₂
  Recommendation: use efficient strategy (saves 86% cost and 67% carbon)

The recommendation shows you the options, their historical performance, and which one CORAL thinks is best for the current task. You decide.
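The savings percentages in the sample output follow directly from the two averages. As a quick check of the arithmetic (the percent_saved helper is made up for this example):

```python
def percent_saved(baseline: float, alternative: float) -> float:
    """Percentage reduction when moving from baseline to alternative."""
    return (baseline - alternative) / baseline * 100


cost_saving = percent_saved(0.021, 0.003)  # thorough vs. efficient, USD
carbon_saving = percent_saved(1.2, 0.4)    # thorough vs. efficient, gCO2
print(f"{cost_saving:.0f}% cost, {carbon_saving:.0f}% carbon")
```

This prints 86% cost, 67% carbon, matching the recommendation text.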

6. CORAL And Carbon Budgets

CORAL is most powerful when combined with carbon budgets. If a run has a tight carbon budget, CORAL will prioritize strategies that historically ran within similar budgets:

policy = Policy(
    enable_cross_run_learning=True,
    enable_environmental=True,
    carbon_budget_g=50.0,
)

Over time, CORAL learns which agent configurations can consistently complete similar tasks within 50g CO₂ and recommends those preferentially.
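The lesson does not specify how CORAL weighs the budget internally. One plausible reading of "consistently complete similar tasks within the budget" is sketched below: keep only strategies whose every recorded run stayed within the budget, then order them by average cost. The within_budget function and the history layout are assumptions for illustration only.

```python
from statistics import mean


def within_budget(history: dict[str, list[tuple[float, float]]], budget_g: float) -> list[str]:
    """history maps strategy -> [(cost_usd, carbon_g), ...] for similar past runs.
    Returns strategies that never exceeded budget_g, cheapest first."""
    candidates = {}
    for strategy, runs in history.items():
        if max(carbon for _, carbon in runs) <= budget_g:
            candidates[strategy] = mean(cost for cost, _ in runs)
    return sorted(candidates, key=candidates.get)


history = {
    "efficient": [(0.003, 0.4), (0.004, 0.5)],
    "thorough": [(0.021, 1.2), (0.019, 1.1)],
}
print(within_budget(history, budget_g=1.0))   # only "efficient" qualifies
print(within_budget(history, budget_g=50.0))  # both qualify, cheapest first
```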

7. What CORAL Cannot Do

It cannot improve the output quality of a cheaper configuration. If the efficient strategy produces worse results for your task, CORAL will keep recommending it until you provide quality score feedback that teaches it otherwise.

It does not automatically select the cheaper configuration. You must read and act on the recommendation.

The pattern store requires persistence (cross_run_db pointing to a database file on disk rather than the in-memory default) to be useful across restarts.

With fewer than ~10 similar past runs, recommendations are not statistically meaningful.

8. Hands-On Lab

MESHFLOW_MOCK=1 python3 hands_on/17_cross_run_learning.py

Observe:

The first run produces no recommendation (no history yet).
After several runs, CORAL begins producing strategy comparisons.
Running the efficient agent repeatedly shifts CORAL's recommendation toward it.
Inspect coral.db with sqlite3 to see the stored pattern table.

9. Summary

CORAL learns from past runs by storing task fingerprints and performance metrics. It uses cosine similarity to find similar past tasks and recommends the most efficient agent strategy. Enable it with enable_cross_run_learning=True and provide a cross_run_db path for persistence. Recommendations improve with volume. Combine with enable_environmental=True and carbon_budget_g to steer toward carbon-efficient strategies over time.

Exercises

Exercise 1: Run the Script and Find the First Recommendation

Goal: Observe CORAL's behavior on a cold start (no prior runs) and on a warm start (with prior run data).

Instructions:

Delete the CORAL database file if it exists from a previous session:

   rm -f coral.db

Run the hands-on script for the first time:

   MESHFLOW_MOCK=1 python3 hands_on/17_cross_run_learning.py

Read the output and find the CORAL recommendation section. On a cold start (no prior runs), CORAL has nothing to compare against. Record:

- What recommendation (if any) CORAL returns when the pattern store is empty
- What message or indicator CORAL uses to signal "no prior data"
- Which agent strategy was actually chosen and used for this first run

Run the script a second time without deleting coral.db:

   MESHFLOW_MOCK=1 python3 hands_on/17_cross_run_learning.py

Now CORAL has one prior run to compare against. Read the output again and find:

- The cosine similarity score between the current task fingerprint and the first run's fingerprint
- The recommended strategy (should match the first run's strategy if similarity is high)
- Whether the run followed the recommendation or used a different strategy

Record the exact text of the recommendation message. What fields does it contain (e.g., recommended agent, expected cost, expected tokens, confidence score)?

Expected output: On the first run, a "no prior data" message and a default strategy choice. On the second run, a recommendation based on the first run's data, with a cosine similarity score and recommended agent configuration.

Exercise 2: Run the Same Task 5 Times and Observe Recommendation Drift

Goal: Watch CORAL's recommendation improve and potentially change as the pattern store accumulates more runs.

Instructions:

Ensure coral.db is empty (delete it):

   rm -f coral.db

Run the script five times in sequence, keeping coral.db between runs:

   for i in 1 2 3 4 5; do
       echo "=== Run $i ===" && MESHFLOW_MOCK=1 python3 hands_on/17_cross_run_learning.py
   done

For each run, record in a table:

- Run number
- Number of past runs in the pattern store at recommendation time
- Cosine similarity score of the best match
- Recommended strategy (which agent or configuration was recommended)
- Actual strategy used (did CORAL's recommendation change the decision?)
- Estimated cost from the recommendation vs. actual cost after the run

After all five runs, answer:

- Did the recommended strategy change between runs 2 and 5? If so, at which run did it change?
- Did the confidence of the recommendation increase as more data accumulated?
- Was there any run where CORAL recommended a strategy that performed worse than the default?

If the script supports different agent configurations (e.g., a "fast but expensive" agent and a "slow but cheap" agent), alternate between them in runs 3, 4, and 5. Observe whether CORAL learns to recommend the cheaper agent for similar future tasks.

Expected output: A 5-row table showing recommendation drift over time, with a clear narrative of how CORAL's confidence and accuracy changed as the pattern store grew.

Exercise 3: Inspect coral.db with sqlite3

Goal: Understand the internal schema of the CORAL pattern store by reading it directly.

Instructions:

Run the script at least twice to populate coral.db with at least two runs.
Open the database:

   sqlite3 coral.db

List all tables:

   .tables

Inspect the schema of each table:

   .schema

Query the main pattern store table (the name may vary — look for a table related to "patterns", "runs", or "fingerprints"):

   .mode column
   .headers on
   SELECT * FROM coral_patterns ORDER BY created_at DESC LIMIT 10;

Find and record:

- The fingerprint column: what data type is it stored as? (text, blob, JSON?)
- The strategy column: what does a strategy entry look like? What fields does it contain?
- The outcome column: what metrics are recorded? (cost, tokens, carbon, duration, quality score?)
- The created_at column: is this a Unix timestamp, ISO8601, or another format?

Manually assess whether two fingerprints in the database are similar: copy the fingerprint values and describe (in plain language) what they represent. Are they embeddings (float vectors)? Hash strings? Structured JSON?
Run a query to find the run with the lowest actual cost:

   SELECT run_id, strategy, actual_cost_usd
   FROM coral_patterns
   ORDER BY actual_cost_usd ASC
   LIMIT 1;

Exit sqlite3 and write three observations about the schema design. What would you change to make this production-ready (e.g., indexing, normalization, migration strategy)?

Expected output: A description of every table and column in coral.db, the results of at least three SQL queries, and three schema improvement observations.
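If you prefer to script the inspection, the sketch below lists every table and its columns without assuming any table or column names. It uses only Python's standard sqlite3 module; the describe_db helper is written for this exercise and is not part of MeshFlow.

```python
import sqlite3
from pathlib import Path


def describe_db(path: str) -> dict[str, list[tuple[str, str]]]:
    """Map each table name to its (column name, declared type) pairs."""
    conn = sqlite3.connect(path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )]
        schema = {}
        for table in tables:
            # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
            cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [(col[1], col[2]) for col in cols]
        return schema
    finally:
        conn.close()


if __name__ == "__main__":
    db_path = "coral.db"
    if not Path(db_path).exists():
        # sqlite3.connect would silently create an empty file, so check first
        print(f"{db_path} not found; run the hands-on script first")
    else:
        for table, columns in describe_db(db_path).items():
            print(table)
            for name, col_type in columns:
                print(f"  {name}: {col_type}")
```

The output gives you the real table and column names to substitute into the queries above if they differ from coral_patterns, created_at, or actual_cost_usd.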

Exercise 4: Combine CORAL with carbon_budget_g

Goal: See how CORAL and the carbon budget guard interact when optimizing for both cost and emissions.

Instructions:

Modify the script (or find the configuration option) to set a carbon budget in grams of CO2-equivalent:

   policy = Policy(
       enable_cross_run_learning=True,
       enable_environmental=True,
       carbon_budget_g=5.0,  # 5 grams CO2e per run maximum
   )
   result = await Mesh(agents=agents, policy=policy).run(task, cross_run_db="coral.db")

Populate coral.db with at least 3 runs using a high-carbon agent strategy (one that exceeds 5g CO2e per run). This simulates a history where past runs were expensive in carbon terms.
Now set the carbon budget to 2.0 grams and run the script again. Observe:

- Does CORAL recommend the same high-carbon strategy as before?
- Does the carbon budget guard override CORAL's recommendation?
- What message does the system emit when a recommendation violates the budget?

If CORAL's recommended strategy would exceed the carbon budget, what happens? Does the pipeline fall back to a lower-carbon strategy, halt with an error, or proceed anyway with a warning?
Find the lowest-carbon past run in the pattern store:

   SELECT run_id, strategy, carbon_g FROM coral_patterns ORDER BY carbon_g ASC LIMIT 1;

Does CORAL recommend this run's strategy when the carbon budget is tight? Why or why not (consider the similarity score)?

Write a paragraph describing the interaction between CORAL's similarity-based optimization and the carbon budget constraint. Is this a "soft" constraint (CORAL prefers low-carbon but can exceed the budget) or a "hard" constraint (the budget is never exceeded regardless of what CORAL recommends)?

Expected output: Evidence of the carbon budget overriding a CORAL recommendation, the fallback strategy that was used, and a clear description of whether the carbon budget is a hard or soft constraint.

Exercise 5: Design a Strategy Comparison for Your Own Use Case

Goal: Apply CORAL's recommendation mechanism to a real problem from your own domain by designing a strategy comparison experiment.

Instructions:

Think of a repetitive task from your own work or domain that could be performed by an LLM agent. Examples:

- Summarizing customer support tickets
- Classifying incoming emails into categories
- Generating first-draft marketing copy for product listings
- Extracting structured data from PDF invoices

Define at least three distinct agent strategies for this task. Each strategy should differ in at least one of: model size (fast/cheap vs. slow/powerful), prompt style (zero-shot vs. few-shot vs. chain-of-thought), or pipeline structure (single agent vs. multi-agent with a validator). Write a brief description of each strategy, for example:

- Strategy A: GPT-4o-mini, zero-shot prompt, single agent. Expected: fast and cheap, medium quality.
- Strategy B: Claude Sonnet, few-shot prompt with 3 examples, single agent. Expected: slower, higher quality.
- Strategy C: Claude Sonnet planner + Claude Haiku writer + Claude Sonnet reviewer, multi-agent. Expected: highest quality, highest cost and carbon.

Design a CORAL experiment:

- What task fingerprint features would you use? (e.g., input length, topic category, urgency flag, customer tier)
- What outcome metrics would you track? (e.g., quality score from a human rater, cost, latency, carbon_g)
- How many runs per strategy would you need before CORAL's recommendations converge?
- What similarity threshold would you use to consider two tasks "similar enough" for a recommendation?

Write out your experiment plan as a structured document with: Task Description, Strategy Definitions, Fingerprint Features, Outcome Metrics, Convergence Criterion, and Expected CORAL Behavior After 20 Runs.
Optionally, implement your experiment using the hands-on script as a template. Run it with mock agents (functions that return fake outputs and fake costs) to verify the experimental design before committing to real API calls.

Expected output: A written experiment plan (minimum one page) covering all six sections, with clear justification for the fingerprint features and outcome metrics chosen.
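For the optional implementation step, mock agents can be as simple as functions that return canned output and a noisy fake cost. The sketch below is one way to set that up without touching MeshFlow at all; every name, cost, and quality value in it is invented for illustration and should be replaced with the figures from your own plan.

```python
import random
from collections import defaultdict
from statistics import mean


def make_mock_strategy(name: str, base_cost: float, quality: float, seed: int):
    """Return a fake agent: no API calls, just canned output and a noisy cost."""
    rng = random.Random(seed)  # seeded so repeated experiments are reproducible

    def run(task: str) -> dict:
        return {
            "strategy": name,
            "output": f"[{name}] draft for: {task}",
            "cost_usd": base_cost * rng.uniform(0.9, 1.1),  # roughly +/-10% noise
            "quality": quality,
        }

    return run


def summarize(results: list[dict]) -> dict[str, dict]:
    """Average cost and quality per strategy."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r["strategy"]].append(r)
    return {
        name: {
            "runs": len(runs),
            "avg_cost_usd": mean(r["cost_usd"] for r in runs),
            "avg_quality": mean(r["quality"] for r in runs),
        }
        for name, runs in grouped.items()
    }


# Hypothetical strategies with made-up costs and quality scores.
strategies = [
    make_mock_strategy("A", base_cost=0.001, quality=0.70, seed=1),
    make_mock_strategy("B", base_cost=0.004, quality=0.85, seed=2),
    make_mock_strategy("C", base_cost=0.020, quality=0.92, seed=3),
]

results = [run("Summarize support ticket") for run in strategies for _ in range(20)]
for name, stats in summarize(results).items():
    print(name, stats)
```

Comparing the per-strategy averages against your convergence criterion tells you whether 20 runs per strategy is enough before you spend money on real API calls.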
