Lesson 17: Cross-Run Learning With CORAL
Lesson Goal
By the end of this lesson, you should be able to:
- Explain what CORAL learns and how it stores run patterns.
- Enable cross-run learning and configure the pattern database.
- Read strategy recommendations from run results.
- Understand what task fingerprinting and cosine similarity mean.
- Connect CORAL to cost and carbon optimization.
Estimated time: 40 to 55 minutes.
1. The Problem CORAL Solves
You have a pipeline that runs thousands of similar tasks. Each time, it uses the same agent configuration — the same roles, model, and number of steps — regardless of whether a cheaper configuration would have produced equally good results for that task type.
CORAL (Cross-run Optimized Reasoning and Learning) observes your runs over time, stores performance patterns, and recommends the most efficient agent strategy for new tasks that resemble past ones.
The result: after enough runs, CORAL can point you toward configurations that cost less, emit less carbon, and finish faster while matching the quality of past runs on similar tasks, to the extent that quality scores are recorded.
2. How CORAL Works
CORAL runs as part of the MeshFlow governance stack. When a run completes:
- CORAL extracts a task fingerprint: a compact representation of the task content (word vectors or token hashes).
- It stores the fingerprint alongside the run's cost, tokens, carbon, duration, agent roles, quality score (if available), and whether the run succeeded.
- On the next run, CORAL searches the pattern store for past fingerprints that are similar to the current task (using cosine similarity).
- If similar past runs are found, CORAL ranks them by cost-efficiency and recommends the agent configuration that produced the best outcome at the lowest cost.
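The four steps above can be sketched in plain Python. This is a minimal illustration, not CORAL's implementation: the RunPattern record, the bag-of-words fingerprint, the 0.5 similarity cutoff, and ranking by average cost among successful runs are all assumptions made for the example.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunPattern:
    """Hypothetical stored record; CORAL's real schema may differ."""
    fingerprint: Counter
    strategy: str
    cost_usd: float
    carbon_g: float
    succeeded: bool


def fingerprint(task: str) -> Counter:
    """Bag-of-words fingerprint: lowercase word counts (a stand-in for CORAL's vectors)."""
    return Counter(re.findall(r"[a-z]+", task.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(task: str, store: list[RunPattern], min_similarity: float = 0.5) -> Optional[str]:
    """Return the cheapest strategy (by average cost) among similar successful runs."""
    fp = fingerprint(task)
    similar = [p for p in store if p.succeeded and cosine(fp, p.fingerprint) >= min_similarity]
    if not similar:
        return None  # cold start, or no comparable history
    costs: dict[str, list[float]] = {}
    for p in similar:
        costs.setdefault(p.strategy, []).append(p.cost_usd)
    return min(costs, key=lambda s: sum(costs[s]) / len(costs[s]))


store = [
    RunPattern(fingerprint("Summarize this contract for risks"), "efficient", 0.003, 0.4, True),
    RunPattern(fingerprint("Summarize this contract for obligations"), "thorough", 0.021, 1.2, True),
]
print(recommend("Summarize this contract for termination clauses", store))  # efficient
print(recommend("Translate a poem into French", store))  # None
```

A real ranking would also weigh quality scores and success rates; averaging cost alone keeps the sketch focused on the fingerprint, compare, and rank loop.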
The recommendation is advisory: you can read it from the run result and decide whether to follow it. CORAL does not automatically change the pipeline — you stay in control.
3. Enabling CORAL
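```python
policy = Policy(
    enable_cross_run_learning=True,
)
```
Pass cross_run_db to run() to persist the pattern store across process restarts. Because run() is awaited, make the call from inside an async function (for example, one started with asyncio.run()):
```python
result = await Mesh(
    agents=agents,
    policy=policy,
).run(
    task,
    # Persist the pattern store to disk across restarts.
    # Defaults to in-memory (:memory:) if not set.
    cross_run_db="coral.db",
)
```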
After the run, any recommendations appear in the run result:
```python
# result.context may contain CORAL recommendation keys.
recommended = result.context.get("coral_recommendation")
if recommended:
    print(f"CORAL recommends: {recommended}")
```
4. Task Fingerprinting In Plain Language
A task fingerprint answers: "What kind of task is this?"
Two tasks are similar if they have the same kind of content — both ask for contract summaries, both request technical analysis of code, both ask for financial risk assessment. The fingerprint does not care about specific names or numbers; it cares about the shape of the request.
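In practice, CORAL computes a vector embedding of the task text and stores it. Cosine similarity measures how closely two such vectors point in the same direction: it is their dot product divided by the product of their lengths. A similarity of 1.0 means the vectors point the same way (the tasks look alike); 0.0 means they have no features in common.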
You do not need to understand the mathematics. What matters is:
- CORAL improves with more runs. As a rule of thumb, the first 10 runs are a learning phase; by around run 50 you start seeing useful recommendations.
- CORAL is only as good as the variety of past runs. If all past runs used the same agent configuration, CORAL has nothing to compare.
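If you do want to see the arithmetic behind cosine similarity, the toy example below applies the dot-product-over-lengths formula to word-count vectors over a four-word vocabulary. The vocabulary and vectors are invented for illustration; real fingerprints are much larger, but the calculation is the same.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Word counts over the vocabulary [summarize, contract, code, review].
summarize_contract = [1, 1, 0, 0]
summarize_contract_twice = [2, 2, 0, 0]  # same direction, twice the length
summarize_code = [1, 0, 1, 0]
review_code = [0, 0, 1, 1]

print(cosine_similarity(summarize_contract, summarize_contract_twice))  # ≈ 1.0
print(cosine_similarity(summarize_contract, summarize_code))            # ≈ 0.5
print(cosine_similarity(summarize_contract, review_code))               # 0.0
```

Because only direction matters, a request that repeats the same words twice still scores about 1.0 against the original; length alone does not make two tasks look different.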
5. Strategy Recommendation Format
A CORAL recommendation looks like this:
CORAL recommendation:
Based on 12 similar past tasks:
- Efficient strategy (3 agents, researcher+executor+critic) → avg cost $0.003, avg carbon 0.4gCO₂
- Thorough strategy (5 agents, full pipeline) → avg cost $0.021, avg carbon 1.2gCO₂
Recommendation: use efficient strategy (saves 86% cost and 67% carbon)
The recommendation shows you the options, their historical performance, and which one CORAL thinks is best for the current task. You decide.
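The savings figures in the sample follow directly from the averages it lists: each is one minus the ratio of the efficient value to the thorough value. A quick check of the sample numbers:

```python
def percent_saved(efficient: float, thorough: float) -> float:
    """Percentage reduction when switching from the thorough to the efficient strategy."""
    return (1 - efficient / thorough) * 100


print(f"cost saved:   {percent_saved(0.003, 0.021):.0f}%")  # cost saved:   86%
print(f"carbon saved: {percent_saved(0.4, 1.2):.0f}%")      # carbon saved: 67%
```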
6. CORAL And Carbon Budgets
CORAL is most powerful when combined with carbon budgets. If a run has a tight carbon budget, CORAL will prioritize strategies that historically ran within similar budgets:
```python
policy = Policy(
    enable_cross_run_learning=True,
    enable_environmental=True,
    carbon_budget_g=50.0,
)
```
Over time, CORAL learns which agent configurations can consistently complete similar tasks within 50g CO₂ and recommends those preferentially.
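One way to read "consistently complete within 50g" is: keep only strategies whose every similar past run stayed under the budget, then pick the cheapest of those. The sketch below assumes that rule and a hypothetical (strategy, carbon_g, cost_usd) history format; CORAL's actual weighting may differ.

```python
from statistics import mean


def pick_within_budget(history, carbon_budget_g):
    """history: (strategy, carbon_g, cost_usd) tuples from similar past runs (hypothetical format)."""
    by_strategy = {}
    for strategy, carbon_g, cost_usd in history:
        by_strategy.setdefault(strategy, []).append((carbon_g, cost_usd))
    # "Consistently" = every recorded run of the strategy stayed within the budget.
    eligible = {
        name: runs
        for name, runs in by_strategy.items()
        if all(carbon_g <= carbon_budget_g for carbon_g, _ in runs)
    }
    if not eligible:
        return None  # no strategy has a clean record under this budget
    return min(eligible, key=lambda name: mean(cost for _, cost in eligible[name]))


history = [
    ("batch", 48.0, 0.002),
    ("batch", 55.0, 0.002),      # cheapest, but once exceeded 50 g
    ("efficient", 12.0, 0.003),
    ("efficient", 15.0, 0.004),
    ("thorough", 30.0, 0.021),
]
print(pick_within_budget(history, 50.0))  # efficient
print(pick_within_budget(history, 60.0))  # batch
```

Requiring every run to fit is strict; averaging carbon instead would let an occasionally over-budget strategy through, which is a real design choice when budgets are tight.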
7. What CORAL Cannot Do
- It cannot improve the output quality of a cheaper configuration. If the efficient strategy produces worse results for your task, CORAL may keep recommending it until you provide quality score feedback that teaches it otherwise.
- It does not automatically select the cheaper configuration; you must read and act on the recommendation.
- The pattern store must be persisted (cross_run_db pointing to a real database file rather than the in-memory default) to be useful across restarts.
- With fewer than ~10 similar past runs, recommendations are not statistically meaningful.
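The first and last limitations can be made concrete with a ranking function that refuses to answer below a minimum number of similar runs and ignores strategies whose average quality score falls below a floor. The threshold of 10 comes from the rule of thumb above; the 0-to-1 quality scale, the 0.7 floor, and the (strategy, cost_usd, quality) tuple format are assumptions for illustration.

```python
from statistics import mean

MIN_SIMILAR_RUNS = 10  # the lesson's rule of thumb, not a MeshFlow constant


def rank_with_quality(similar_runs, min_runs=MIN_SIMILAR_RUNS, min_quality=0.7):
    """similar_runs: (strategy, cost_usd, quality) tuples; quality on a 0-1 scale (assumed)."""
    if len(similar_runs) < min_runs:
        return None  # too little history for a meaningful recommendation
    by_strategy = {}
    for strategy, cost_usd, quality in similar_runs:
        by_strategy.setdefault(strategy, []).append((cost_usd, quality))
    # Quality feedback removes strategies that are cheap but not good enough.
    acceptable = {
        name: runs
        for name, runs in by_strategy.items()
        if mean(q for _, q in runs) >= min_quality
    }
    if not acceptable:
        return None
    return min(acceptable, key=lambda name: mean(c for c, _ in acceptable[name]))


runs = [("efficient", 0.003, 0.6)] * 6 + [("thorough", 0.021, 0.9)] * 6
print(rank_with_quality(runs[:9]))  # None: only 9 similar runs
print(rank_with_quality(runs))      # thorough: efficient is cheaper but scores 0.6
```

If the efficient strategy's recorded quality rose to 0.8, the same function would switch to recommending it, which is the feedback loop the first bullet describes.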
8. Hands-On Lab
MESHFLOW_MOCK=1 python3 hands_on/17_cross_run_learning.py
Observe:
- The first run produces no recommendation (no history yet).
- After several runs, CORAL begins producing strategy comparisons.
- Running the efficient agent repeatedly shifts CORAL's recommendation toward it.
- Inspect coral.db with sqlite3 to see the stored pattern table.
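If you prefer Python to the sqlite3 shell, the standard-library sqlite3 module can list every table and its CREATE statement. This sketch assumes only that coral.db is an ordinary SQLite file; it opens the file read-only, so a mistyped path raises an error instead of silently creating an empty database.

```python
import sqlite3


def describe_database(path: str) -> dict:
    """Return {table_name: CREATE TABLE statement} for every table in a SQLite file."""
    # mode=ro: open read-only and fail if the file does not exist.
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)
```

After a few lab runs, calling describe_database("coral.db") and printing each name and statement shows the same information as the shell's .tables and .schema commands.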
9. Summary
CORAL learns from past runs by storing task fingerprints and performance metrics. It uses cosine similarity to find similar past tasks and recommends the most efficient agent strategy. Enable it with enable_cross_run_learning=True and provide a cross_run_db path for persistence. Recommendations improve with volume. Combine with enable_environmental=True and carbon_budget_g to steer toward carbon-efficient strategies over time.
Exercises
Exercise 1: Run the Script and Find the First Recommendation
Goal: Observe CORAL's behavior on a cold start (no prior runs) and on a warm start (with prior run data).
Instructions:
- Delete the CORAL database file if it exists from a previous session:
rm -f coral.db
- Run the hands-on script for the first time:
python hands_on/17_cross_run_learning.py
- Read the output and find the CORAL recommendation section. On a cold start (no prior runs), CORAL has nothing to compare against. Record:
  - What recommendation (if any) CORAL returns when the pattern store is empty
  - What message or indicator CORAL uses to signal "no prior data"
  - Which agent strategy was actually chosen and used for this first run
- Run the script a second time without deleting coral.db:
python hands_on/17_cross_run_learning.py
- Now CORAL has one prior run to compare against. Read the output again and find:
  - The cosine similarity score between the current task fingerprint and the first run's fingerprint
  - The recommended strategy (should match the first run's strategy if similarity is high)
  - Whether the recommendation was accepted or whether CORAL chose a different strategy
- Record the exact text of the recommendation message. What fields does it contain (e.g., recommended agent, expected cost, expected tokens, confidence score)?
Expected output: On the first run, a "no prior data" message and a default strategy choice. On the second run, a recommendation based on the first run's data, with a cosine similarity score and recommended agent configuration.
Exercise 2: Run the Same Task 5 Times and Observe Recommendation Drift
Goal: Watch CORAL's recommendation improve and potentially change as the pattern store accumulates more runs.
Instructions:
- Ensure coral.db is empty (delete it):
rm -f coral.db
- Run the script five times in sequence, keeping coral.db between runs:
for i in 1 2 3 4 5; do
echo "=== Run $i ===" && python hands_on/17_cross_run_learning.py
done
- For each run, record in a table:
  - Run number
  - Number of past runs in the pattern store at recommendation time
  - Cosine similarity score of the best match
  - Recommended strategy (which agent or configuration was recommended)
  - Actual strategy used (did CORAL's recommendation change the decision?)
  - Estimated cost from the recommendation vs. actual cost after the run
- After all five runs, answer:
  - Did the recommended strategy change between runs 2 and 5? If so, at which run did it change?
  - Did the confidence of the recommendation increase as more data accumulated?
  - Was there any run where CORAL recommended a strategy that performed worse than the default?
- If the script supports different agent configurations (e.g., a "fast but expensive" agent and a "slow but cheap" agent), alternate between them in runs 3, 4, and 5. Observe whether CORAL learns to recommend the cheaper agent for similar future tasks.
Expected output: A 5-row table showing recommendation drift over time, with a clear narrative of how CORAL's confidence and accuracy changed as the pattern store grew.
Exercise 3: Inspect coral.db with sqlite3
Goal: Understand the internal schema of the CORAL pattern store by reading it directly.
Instructions:
- Run the script at least twice to populate coral.db with at least two runs.
- Open the database:
sqlite3 coral.db
- List all tables:
.tables
- Inspect the schema of each table:
.schema
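- Query the main pattern store table. The name may vary: look for a table related to "patterns", "runs", or "fingerprints". The queries in this exercise assume a table named coral_patterns with columns such as created_at, run_id, strategy, actual_cost_usd, and carbon_g; substitute the names your .schema output shows: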
.mode column
.headers on
SELECT * FROM coral_patterns ORDER BY created_at DESC LIMIT 10;
- Find and record:
  - The fingerprint column: what data type is it stored as? (text, blob, JSON?)
  - The strategy column: what does a strategy entry look like? What fields does it contain?
  - The outcome column: what metrics are recorded? (cost, tokens, carbon, duration, quality score?)
  - The created_at column: is this a Unix timestamp, ISO8601, or another format?
- Compare two fingerprints from the database by hand: copy their values and describe in plain language what they represent. Are they embeddings (float vectors)? Hash strings? Structured JSON? If they are numeric vectors, estimate how similar the two tasks are.
- Run a query to find the run with the lowest actual cost:
SELECT run_id, strategy, actual_cost_usd
FROM coral_patterns
ORDER BY actual_cost_usd ASC
LIMIT 1;
- Exit sqlite3 and write three observations about the schema design. What would you change to make this production-ready (e.g., indexing, normalization, migration strategy)?
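For the fingerprint comparison step in this exercise, the sketch below classifies a raw value copied out of coral.db and, when two values turn out to be JSON arrays of numbers, computes their cosine similarity. The formats it checks for (blob, JSON, opaque hash string) are guesses taken from the options the exercise lists, not a documented CORAL encoding.

```python
import json
import math


def interpret_fingerprint(raw):
    """Classify a raw fingerprint value; return (kind, vector_or_None)."""
    if isinstance(raw, (bytes, bytearray)):
        return "blob", None  # binary: needs the writer's encoding to decode
    try:
        value = json.loads(raw)
    except (TypeError, ValueError):
        return "opaque string (possibly a hash)", None
    if isinstance(value, list) and value and all(
        isinstance(x, (int, float)) and not isinstance(x, bool) for x in value
    ):
        return "numeric vector", [float(x) for x in value]
    if isinstance(value, (list, dict)):
        return "structured JSON", None
    return "opaque string (possibly a hash)", None


def compare(raw_a, raw_b):
    """Cosine similarity of two fingerprints, or None if they are not comparable vectors."""
    _, a = interpret_fingerprint(raw_a)
    _, b = interpret_fingerprint(raw_b)
    if a is None or b is None or len(a) != len(b):
        return None
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0
```

If compare returns None, the fingerprints are hashes or structured data, and similarity must be judged some other way (for example, by exact match or by comparing JSON fields).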
Expected output: A description of every table and column in coral.db, the results of at least three SQL queries, and three schema improvement observations.
Exercise 4: Combine CORAL with carbon_budget_g
Goal: See how CORAL and the carbon budget guard interact when optimizing for both cost and emissions.
Instructions:
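- Modify the script (or find the configuration option) to set a carbon budget in grams of CO2-equivalent. With the Policy and Mesh API used earlier in this lesson, that looks like:
```python
policy = Policy(
    enable_cross_run_learning=True,
    enable_environmental=True,
    carbon_budget_g=5.0,  # 5 grams CO2e per run maximum
)
result = await Mesh(agents=agents, policy=policy).run(task, cross_run_db="coral.db")
```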
- Populate coral.db with at least 3 runs using a high-carbon agent strategy (one that exceeds 5g CO2e per run). This simulates a history where past runs were expensive in carbon terms.
- Now set the carbon budget to 2.0 grams and run the script again. Observe:
  - Does CORAL recommend the same high-carbon strategy as before?
  - Does the carbon budget guard override CORAL's recommendation?
  - What message does the system emit when a recommendation violates the budget?
- If CORAL's recommended strategy would exceed the carbon budget, what happens? Does the pipeline fall back to a lower-carbon strategy, halt with an error, or proceed anyway with a warning?
- Find the lowest-carbon past run in the pattern store:
SELECT run_id, strategy, carbon_g FROM coral_patterns ORDER BY carbon_g ASC LIMIT 1;
Does CORAL recommend this run's strategy when the carbon budget is tight? Why or why not (consider the similarity score)?
- Write a paragraph describing the interaction between CORAL's similarity-based optimization and the carbon budget constraint. Is this a "soft" constraint (CORAL prefers low-carbon but can exceed the budget) or a "hard" constraint (the budget is never exceeded regardless of what CORAL recommends)?
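To frame your answer, here is what the two policies would look like as code. Neither function describes MeshFlow's actual behavior, which is what the exercise asks you to discover; the names, the per-strategy expected_carbon_g mapping, and the single-fallback rule are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


class CarbonBudgetExceeded(Exception):
    """Raised when no allowed strategy fits under a hard budget."""


@dataclass
class Decision:
    strategy: str
    warning: Optional[str] = None


def apply_budget(
    recommended: str,
    fallback: str,
    expected_carbon_g: dict,
    budget_g: float,
    hard: bool,
) -> Decision:
    if expected_carbon_g[recommended] <= budget_g:
        return Decision(recommended)
    if not hard:
        # Soft constraint: follow the recommendation but flag the overrun.
        return Decision(recommended, warning=f"{recommended} expected to exceed {budget_g} g")
    # Hard constraint: never start a run expected to exceed the budget.
    if expected_carbon_g[fallback] <= budget_g:
        return Decision(fallback, warning=f"fell back from {recommended} to {fallback}")
    raise CarbonBudgetExceeded(f"no strategy fits within {budget_g} g")


carbon = {"thorough": 6.0, "efficient": 1.5}
print(apply_budget("thorough", "efficient", carbon, 2.0, hard=True))
print(apply_budget("thorough", "efficient", carbon, 2.0, hard=False))
```

Matching what you observe in the script's output to one of these branches (fallback, warning, or error) answers the hard-versus-soft question.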
Expected output: Evidence of the carbon budget overriding a CORAL recommendation, the fallback strategy that was used, and a clear description of whether the carbon budget is a hard or soft constraint.
Exercise 5: Design a Strategy Comparison for Your Own Use Case
Goal: Apply CORAL's recommendation mechanism to a real problem from your own domain by designing a strategy comparison experiment.
Instructions:
- Think of a repetitive task from your own work or domain that could be performed by an LLM agent. Examples:
  - Summarizing customer support tickets
  - Classifying incoming emails into categories
  - Generating first-draft marketing copy for product listings
  - Extracting structured data from PDF invoices
- Define at least three distinct agent strategies for this task. Each strategy should differ in at least one of: model size (fast/cheap vs. slow/powerful), prompt style (zero-shot vs. few-shot vs. chain-of-thought), or pipeline structure (single agent vs. multi-agent with a validator). Write a brief description of each strategy, for example:
  - Strategy A: GPT-4o-mini, zero-shot prompt, single agent. Expected: fast and cheap, medium quality.
  - Strategy B: Claude Sonnet, few-shot prompt with 3 examples, single agent. Expected: slower, higher quality.
  - Strategy C: Claude Sonnet planner + Claude Haiku writer + Claude Sonnet reviewer, multi-agent. Expected: highest quality, highest cost and carbon.
- Design a CORAL experiment:
  - What task fingerprint features would you use? (e.g., input length, topic category, urgency flag, customer tier)
  - What outcome metrics would you track? (e.g., quality score from a human rater, cost, latency, carbon_g)
  - How many runs per strategy would you need before CORAL's recommendations converge?
  - What similarity threshold would you use to consider two tasks "similar enough" for a recommendation?
- Write out your experiment plan as a structured document with: Task Description, Strategy Definitions, Fingerprint Features, Outcome Metrics, Convergence Criterion, and Expected CORAL Behavior After 20 Runs.
- Optionally, implement your experiment using the hands-on script as a template. Run it with mock agents (functions that return fake outputs and fake costs) to verify the experimental design before committing to real API calls.
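Before spending on real API calls, the mock run suggested in the last step could look like the sketch below. The three profiles stand in for Strategies A to C; their costs, carbon figures, and quality ranges are invented placeholders, not measurements, and a seeded random generator keeps the results repeatable.

```python
import random
from statistics import mean

# Invented placeholder figures: (cost_usd, carbon_g, quality range) per strategy.
MOCK_PROFILES = {
    "A": (0.001, 0.2, (0.5, 0.7)),   # small model, zero-shot, single agent
    "B": (0.008, 0.6, (0.7, 0.85)),  # larger model, few-shot, single agent
    "C": (0.020, 1.5, (0.8, 0.95)),  # planner + writer + reviewer
}


def mock_agent(strategy, task, rng):
    """Return a fake output plus fake cost, carbon, and quality for one task."""
    cost_usd, carbon_g, (q_low, q_high) = MOCK_PROFILES[strategy]
    return f"[{strategy}] {task}", cost_usd, carbon_g, rng.uniform(q_low, q_high)


def run_experiment(tasks, runs_per_strategy, seed=0):
    """Run every strategy on the task list and average each metric per strategy."""
    rng = random.Random(seed)
    summary = {}
    for strategy in MOCK_PROFILES:
        results = [
            mock_agent(strategy, tasks[i % len(tasks)], rng)[1:]
            for i in range(runs_per_strategy)
        ]
        summary[strategy] = {
            "avg_cost_usd": mean(r[0] for r in results),
            "avg_carbon_g": mean(r[1] for r in results),
            "avg_quality": mean(r[2] for r in results),
        }
    return summary


for name, stats in run_experiment(["ticket 1", "ticket 2", "ticket 3"], runs_per_strategy=20).items():
    print(name, {k: round(v, 4) for k, v in stats.items()})
```

Replacing mock_agent with real calls later leaves the averaging and comparison untouched, which is the point of validating the design first.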
Expected output: A written experiment plan (minimum one page) covering all six sections, with clear justification for the fingerprint features and outcome metrics chosen.