Environmental Cost and Carbon Budgets

Ch 13 · Production Engineering · 45 min
MARLIN · carbon_budget_g · ESG reporting · Carbon-aware routing
Hands-on: MESHFLOW_MOCK=1 python3 hands_on/13_environmental.py

Lesson 13: Environmental Cost And Carbon Budgets

Lesson Goal

By the end of this lesson, you should be able to:

  • Explain why AI energy consumption is a measurable and reportable cost.
  • Read carbon footprint data from run results and per-agent states.
  • Set a carbon budget and understand what happens when a run exceeds it.
  • Describe MARLIN and its estimation approach at a conceptual level.
  • Connect environmental tracking to responsible AI and ESG reporting.

Estimated time: 35 to 50 minutes.

1. Why Track AI Carbon Cost?

Running an LLM call consumes electricity. That electricity produces CO₂ depending on the energy mix of the data center region. At small scale — one developer, a few hundred calls per day — the footprint is negligible. At production scale — millions of calls per day across a global fleet — it becomes material.

More importantly, it is now expected. Investors, regulators, and customers increasingly ask organizations to report the carbon footprint of their technology operations. AI workloads are one of the fastest-growing contributors to data center energy use. If you cannot measure it, you cannot report it.

Environmental tracking in MeshFlow serves three practical purposes:

  1. Visibility: see how much carbon each run and each agent consumes.
  2. Budget enforcement: stop a run before it exceeds a carbon cap.
  3. Optimization signal: choose cheaper models or routes when carbon matters.

2. MARLIN: MeshFlow's Environmental Cost Component

MARLIN (the MeshFlow environmental cost module) estimates carbon and water consumption per run based on:

  • Token count: more tokens burned → more compute → more energy.
  • Model family: larger models consume more energy per token.
  • Cloud region: the energy mix (renewable vs. fossil) varies by region.

MARLIN produces:

  • gCO₂ (grams of CO₂) per agent and per run.
  • Water intensity estimates in milliliters per 1,000 tokens.

These are estimates, not measurements. MARLIN uses published energy intensity figures from cloud providers and academic research. They will not exactly match your provider's actual consumption, but they are accurate enough for trending, budgeting, and relative comparison between model choices.
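
The three inputs above combine multiplicatively: tokens determine energy, and the region's grid determines how much CO₂ that energy produces. The following is a minimal sketch of that idea, not MARLIN's implementation. The function name, the model family labels, the region labels, and every coefficient are hypothetical placeholders chosen only to make the arithmetic visible.

```python
# Illustrative only: every coefficient below is a made-up placeholder,
# not a MARLIN value or a published measurement.
ENERGY_WH_PER_1K_TOKENS = {   # hypothetical energy use per 1,000 tokens
    "small": 0.3,
    "large": 3.0,
}
GRID_G_CO2_PER_KWH = {        # hypothetical grid carbon intensity
    "region-low-carbon": 50.0,
    "region-high-carbon": 500.0,
}


def estimate_carbon_g(tokens: int, model_family: str, region: str) -> float:
    """Estimate gCO2 as tokens x energy per token x grid intensity."""
    energy_kwh = tokens / 1000 * ENERGY_WH_PER_1K_TOKENS[model_family] / 1000
    return energy_kwh * GRID_G_CO2_PER_KWH[region]


# Compare the same 10,000-token workload across model families and regions.
for family in ("small", "large"):
    for region in GRID_G_CO2_PER_KWH:
        grams = estimate_carbon_g(10_000, family, region)
        print(f"{family:5s} {region:20s} {grams:.4f} gCO2")
```

Because the factors multiply, changing the model family or the region scales the estimate by the same ratio regardless of token count, which is why relative comparisons stay useful even when the absolute figures are uncertain.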

3. Enabling Environmental Tracking

policy = Policy(
    enable_environmental=True,
    carbon_budget_g=500.0,   # hard cap: 500 grams CO₂ per run
)

After the run:

result = await mesh.run(task="Analyze the quarterly report")

print(result.total_carbon_g)          # total gCO₂ for the run
print(result.total_cost_usd)          # total USD cost

for state in result.agent_states:
    print(state.agent_id, state.carbon_g)  # per-agent carbon
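
To make the per-agent breakdown easier to interpret, it can help to rank agents and show each one's share of the run total. The sketch below is self-contained and uses stand-in dataclasses rather than the real result objects. Only the field names total_carbon_g, agent_states, agent_id, and carbon_g come from this lesson. The classes, the sample values, and the assumption that the run total equals the sum of the per-agent values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:            # stand-in, not the MeshFlow class
    agent_id: str
    carbon_g: float


@dataclass
class RunResult:             # stand-in, not the MeshFlow class
    agent_states: list = field(default_factory=list)

    @property
    def total_carbon_g(self) -> float:
        # Assumption: the run total is the sum of the per-agent values.
        return sum(s.carbon_g for s in self.agent_states)


result = RunResult([
    AgentState("researcher", 1.2),   # sample values, invented
    AgentState("summarizer", 0.3),
])

# Rank agents by carbon and show each agent's share of the run total.
ranked = sorted(result.agent_states, key=lambda s: s.carbon_g, reverse=True)
for state in ranked:
    share = state.carbon_g / result.total_carbon_g
    print(f"{state.agent_id}: {state.carbon_g:.2f} g ({share:.0%})")
```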

4. Carbon Budget Enforcement

When carbon_budget_g is set, the runtime tracks cumulative carbon across agents. If the budget is exceeded mid-run, the remaining agents are blocked and the run status is set to aborted with the reason carbon_budget_exceeded.

policy = Policy(
    enable_environmental=True,
    carbon_budget_g=0.001,   # very small — will trigger on any real run
)
result = await mesh.run(task="...")
# result.status == RunStatus.ABORTED if budget exceeded

Setting carbon_budget_g=None (the default) disables budget enforcement while still tracking and reporting carbon.
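
The enforcement behavior described above can be sketched as a running total that is checked after each agent finishes. This is a simplified simulation under stated assumptions, not the MeshFlow runtime: the function run_with_budget, the dictionary it returns, and the per-agent costs are hypothetical, and the real runtime may check the budget at a different point.

```python
from typing import Optional


def run_with_budget(agent_costs_g: dict, carbon_budget_g: Optional[float]) -> dict:
    """Run agents in order; block the rest once the budget is exceeded."""
    total_g = 0.0
    completed = []
    agents = list(agent_costs_g.items())
    for index, (agent_id, cost_g) in enumerate(agents):
        total_g += cost_g            # carbon is known only after the agent ran
        completed.append(agent_id)
        if carbon_budget_g is not None and total_g > carbon_budget_g:
            blocked = [name for name, _ in agents[index + 1:]]
            return {"status": "aborted", "reason": "carbon_budget_exceeded",
                    "total_carbon_g": total_g,
                    "completed": completed, "blocked": blocked}
    return {"status": "completed", "reason": None, "total_carbon_g": total_g,
            "completed": completed, "blocked": []}


costs = {"planner": 0.2, "researcher": 0.9, "writer": 0.4}  # invented values
print(run_with_budget(costs, carbon_budget_g=1.0))   # aborts after researcher
print(run_with_budget(costs, carbon_budget_g=None))  # tracking only, no cap
```

Note that in this sketch the agent that crosses the cap still completes, so the reported total can end up slightly above the budget. Only the agents after it are blocked.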

5. Using Carbon Data In Routing Decisions

Carbon tracking is most powerful when combined with conditional routing. For example, if an early step has already consumed a large share of the carbon you are willing to spend, you can route the rest of the workflow to a lighter path:

# Check carbon after a research step
if context.get("research_carbon_g", 0) > 100:
    # Route to summarization-only mode
    pass

Or, for tasks that do not require the most capable model, you can decide up front and pair a carbon-efficient model with a tighter carbon budget:

# Tasks that need fast, cheap, low-carbon output
lite_policy = Policy(enable_environmental=True, carbon_budget_g=50.0)
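
The threshold check shown earlier in this section can be wrapped in a small routing function so the decision is testable on its own. This is a minimal sketch: the context key research_carbon_g and the 100 g threshold come from the snippet above, while the function name choose_route and the two route names are hypothetical.

```python
def choose_route(context: dict, threshold_g: float = 100.0) -> str:
    """Pick the next step based on carbon recorded by the research step."""
    if context.get("research_carbon_g", 0) > threshold_g:
        return "summarize_only"    # hypothetical lighter route
    return "full_analysis"         # hypothetical default route


print(choose_route({"research_carbon_g": 140.0}))  # research was expensive
print(choose_route({"research_carbon_g": 12.5}))   # research was cheap
print(choose_route({}))                            # no carbon data recorded
```

Defaulting a missing value to 0 means a run without carbon data takes the default route. If you would rather fail safe, treat missing data as over the threshold instead.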

6. Water Intensity

In addition to CO₂, MARLIN estimates water consumption. Data centers use water for cooling. Regions with low renewable energy fractions often have higher water intensity per compute unit. The total_water_ml field in the run result reports the estimated water consumption in milliliters.

Water intensity is directional: you cannot measure it exactly without access to your provider's infrastructure data. Use it as a relative comparison tool, not an absolute reporting metric.

7. ESG Reporting Integration

If your organization reports under the GHG Protocol, emissions from AI inference purchased from a cloud or API provider typically fall under Scope 3, Category 1 (purchased goods and services). MARLIN's per-run carbon estimates can feed into your internal ESG reporting pipeline:

  • Export run data with ledger.export_run_csv(run_id)
  • Sum total_carbon_g across all production runs per reporting period
  • Convert grams to metric tons for GHG reports (1 metric ton = 1,000,000 g)

This does not replace a full carbon accounting system, but it provides the data provenance needed to make AI emissions visible.
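
The aggregation steps above can be sketched with the standard library alone. The example reads CSV text, sums total_carbon_g for runs inside a reporting period, and converts grams to metric tons. The column names run_id and finished_at, the sample rows, and the function carbon_tonnes are assumptions made for illustration. The actual layout produced by ledger.export_run_csv may differ.

```python
import csv
import io

# Stand-in for exported run data; the column names are assumptions.
SAMPLE_CSV = """run_id,finished_at,total_carbon_g
run-001,2025-01-14,412.5
run-002,2025-02-03,380.0
run-003,2025-04-21,95.25
"""


def carbon_tonnes(csv_text: str, start: str, end: str) -> float:
    """Sum total_carbon_g for runs in [start, end] and convert to tonnes."""
    total_g = 0.0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if start <= row["finished_at"] <= end:   # ISO dates sort as strings
            total_g += float(row["total_carbon_g"])
    return total_g / 1_000_000                   # 1 metric ton = 1,000,000 g


q1 = carbon_tonnes(SAMPLE_CSV, "2025-01-01", "2025-03-31")
print(f"Q1 emissions: {q1:.6f} t CO2")
```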

8. Hands-On Lab

MESHFLOW_MOCK=1 python3 hands_on/13_environmental.py

Observe:

  • The total_carbon_g difference between a light and a heavy pipeline
  • How the carbon budget triggers an early stop
  • The per-agent carbon breakdown in result.agent_states
  • The streaming event output showing live carbon tracking

Try modifying carbon_budget_g to different values and observe when the budget triggers.

9. Summary

MARLIN tracks estimated gCO₂ and water consumption per agent and per run. You set enable_environmental=True to activate it and carbon_budget_g to enforce a hard cap. Carbon data flows through result.total_carbon_g and agent_state.carbon_g. Use it for visibility, budget enforcement, routing decisions, and ESG reporting. All figures are estimates — use them for trends and relative comparisons, not for precise emission reporting.


Exercises

Exercise 1: Run the Environmental Script and Read Carbon Metrics

Goal: Observe end-to-end environmental tracking output for a multi-agent workflow.

Instructions:

  1. Run the hands-on script:
   python hands_on/13_environmental.py
  2. Read through the output and locate the environmental summary block (look for labels like total_carbon_g, water_ml, or ESG Report).
  3. Record the following values from the output:
     • total_carbon_g for the entire run.
     • The agent with the highest carbon_g (per-agent breakdown).
     • The agent with the lowest carbon_g.
     • The total water usage in milliliters (if shown).
  4. Answer: Which agent consumed the most carbon? Does this make intuitive sense given what that agent does (e.g., a large LLM call vs. a simple data lookup)?

Expected output: A labeled summary block showing per-agent and total carbon values, plus water intensity metrics. All values should be non-negative floats.


Exercise 2: Set a Carbon Budget and Trigger a Budget Exceeded Event

Goal: Experience how carbon_budget_g limits workflow execution.

Instructions:

  1. Open hands_on/13_environmental.py (or create a short test script) and set a very low carbon budget to force an early stop:
   from meshflow import MeshFlow

   app = MeshFlow(
       enable_environmental=True,
       carbon_budget_g=0.001  # 1 milligram — almost certainly exceeded immediately
   )
  2. Run the script and observe what happens:
     • Does the workflow raise an exception, return a partial result, or log a warning?
     • At which step did the budget get exceeded?
     • What is the exact exception or event type raised?
  3. Now set carbon_budget_g=1000.0 (a generous budget) and run again. Confirm the workflow completes normally and the total carbon used is well under budget.
  4. Find the threshold where the workflow "just fits": gradually lower the budget until the workflow starts failing, then set it just slightly above the last successful value.

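Expected output: With a tiny budget, the workflow stops at or near step 1. The lesson describes this as a run status of aborted with the reason carbon_budget_exceeded; depending on how the script surfaces it, you may instead see a CarbonBudgetExceededError (or similar). With a large budget, it completes normally.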
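
The threshold search in the last step of this exercise can be done by bisection instead of lowering the budget by hand. The sketch below is generic: find_min_budget and fake_run_ok are hypothetical names, and fake_run_ok stands in for "run the workflow with this budget and report whether it completed". Here it pretends the workflow emits 3.7 g in total.

```python
def find_min_budget(run_ok, low: float, high: float, tol: float = 0.01) -> float:
    """Bisect for the smallest budget (within tol) at which run_ok is True.

    Requires run_ok(low) to be False and run_ok(high) to be True.
    """
    while high - low > tol:
        mid = (low + high) / 2
        if run_ok(mid):
            high = mid      # still completes: try a smaller budget
        else:
            low = mid       # aborted: the budget was too small
    return high


# Hypothetical stand-in for running the workflow and checking that it
# completed. The pretend workflow emits 3.7 g in total.
def fake_run_ok(budget_g: float) -> bool:
    return 3.7 <= budget_g


budget = find_min_budget(fake_run_ok, low=0.001, high=1000.0)
print(f"Smallest passing budget is about {budget:.2f} g")
```

Each probe corresponds to one full workflow run, so bisection keeps the number of runs small. If carbon estimates vary between runs, leave some headroom above the value it finds.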


Exercise 3: Compare Carbon Across Model Sizes

Goal: Understand how model selection affects carbon emissions.

Instructions:

  1. The hands-on script (or a modified version) should demonstrate running the same task with different model sizes (e.g., a small 7B model vs. a large 70B model). If the script does not include this, write a short comparison:
   from meshflow import MeshFlow

   for model in ["small-7b", "medium-13b", "large-70b"]:
       app = MeshFlow(enable_environmental=True, default_model=model)
       result = app.run({"task": "summarize this paragraph in one sentence"})
       env = app.last_run.environmental_summary()
       print(f"{model}: {env['total_carbon_g']:.4f} g CO2e, "
             f"water: {env['water_ml']:.2f} mL")
  2. Record the carbon and water values for each model.
  3. Calculate the carbon ratio between the largest and smallest model. Is it roughly proportional to the parameter count ratio, or is it different?
  4. Write two to three sentences reflecting on the trade-off between model capability and environmental impact.

Expected output: A three-row comparison table showing model name, carbon grams, and water milliliters. Larger models should show higher emissions.
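
The ratio comparison in this exercise is simple arithmetic once you have recorded your values. A minimal sketch follows. The carbon readings are invented placeholders to be replaced with your own measurements, and the parameter counts are read off the illustrative model names used in this exercise.

```python
# Invented sample readings; replace carbon_g with the values you recorded.
readings = {
    "small-7b":   {"params_b": 7,  "carbon_g": 0.020},
    "medium-13b": {"params_b": 13, "carbon_g": 0.041},
    "large-70b":  {"params_b": 70, "carbon_g": 0.260},
}

small, large = readings["small-7b"], readings["large-70b"]
param_ratio = large["params_b"] / small["params_b"]
carbon_ratio = large["carbon_g"] / small["carbon_g"]

print(f"Parameter ratio: {param_ratio:.1f}x")
print(f"Carbon ratio:    {carbon_ratio:.1f}x")
# A value near 1.0 here means carbon scales roughly with parameter count.
print(f"Carbon ratio / parameter ratio: {carbon_ratio / param_ratio:.2f}")
```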


Exercise 4: Generate an ESG Report

Goal: Produce a machine-readable ESG report from environmental tracking data.

Instructions:

  1. Run the hands-on script and capture the run ID printed in the output.
  2. In a Python REPL, generate an ESG report:
   import json

   from meshflow import MeshFlow
   from meshflow.environmental import ESGReporter

   reporter = ESGReporter()
   report = reporter.generate(run_id="<your_run_id>", format="json")
   # Assumes generate() returns a JSON-serializable object such as a dict.
   print(json.dumps(report, indent=2))
  3. Examine the JSON structure. Identify:
     • The GHG Protocol scope classification (Scope 1, 2, or 3) applied to AI compute emissions.
     • The carbon intensity of the compute region used (grams CO2e per kWh).
     • Any water stress classification for the data center region.
  4. If format="csv" is supported, generate that version too and compare the level of detail between JSON and CSV formats.

Expected output: A structured JSON report with fields like total_emissions_g_co2e, energy_kwh, carbon_intensity_g_kwh, water_withdrawn_ml, reporting_period, and GHG Protocol metadata.


Exercise 5: Observe Carbon-Aware Routing in Action

Goal: See how MeshFlow selects a compute region based on real-time carbon intensity.

Instructions:

  1. If the hands-on script includes a carbon-aware routing demo, run it and note which region was selected and why.
  2. To explore the routing logic directly:
   from meshflow.environmental import CarbonAwareRouter

   router = CarbonAwareRouter()
   candidates = ["us-east-1", "eu-west-1", "ap-southeast-1", "us-west-2"]
   selected = router.select_region(candidates, strategy="lowest_carbon")
   print(f"Selected region: {selected.region}")
   print(f"Current carbon intensity: {selected.carbon_intensity_g_kwh} gCO2e/kWh")
   print(f"Forecast valid until: {selected.forecast_valid_until}")
  3. Call router.select_region with strategy="lowest_water" and compare the selection.
  4. Call it with strategy="balanced" and observe how it trades off carbon vs. latency.
  5. Write a short paragraph (3–4 sentences) explaining when you would choose lowest_carbon vs. balanced routing in a real production system.

Expected output: Region selection output with the chosen region name, current carbon intensity value, and the strategy reasoning. Different strategies should sometimes select different regions.
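
To build intuition for why strategies can disagree, here is a self-contained sketch of two selection strategies over a static snapshot. It is not the CarbonAwareRouter implementation. The intensity and latency figures are invented, the select_region function and its carbon_weight parameter are hypothetical, and real routing would use live data.

```python
# Invented snapshot; real routing would use live intensity and latency data.
REGIONS = {
    "us-east-1":      {"carbon_g_kwh": 400.0, "latency_ms": 40.0},
    "eu-west-1":      {"carbon_g_kwh": 250.0, "latency_ms": 110.0},
    "ap-southeast-1": {"carbon_g_kwh": 500.0, "latency_ms": 220.0},
    "us-west-2":      {"carbon_g_kwh": 120.0, "latency_ms": 300.0},
}


def select_region(candidates, strategy="lowest_carbon", carbon_weight=0.5):
    """Return the candidate with the lowest score under the given strategy."""
    if strategy == "lowest_carbon":
        return min(candidates, key=lambda r: REGIONS[r]["carbon_g_kwh"])
    if strategy == "balanced":
        max_c = max(REGIONS[r]["carbon_g_kwh"] for r in candidates)
        max_l = max(REGIONS[r]["latency_ms"] for r in candidates)

        def score(r):
            # Normalize both metrics to 0..1 so they can be weighted together.
            return (carbon_weight * REGIONS[r]["carbon_g_kwh"] / max_c
                    + (1 - carbon_weight) * REGIONS[r]["latency_ms"] / max_l)

        return min(candidates, key=score)
    raise ValueError(f"unknown strategy: {strategy}")


candidates = list(REGIONS)
print("lowest_carbon:", select_region(candidates, "lowest_carbon"))
print("balanced:     ", select_region(candidates, "balanced"))
```

With these invented numbers the cleanest region is also the slowest, so the balanced strategy settles on a middle option. Raising carbon_weight toward 1.0 moves the balanced choice toward the lowest_carbon result.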