OpenTelemetry and Observability
MESHFLOW_MOCK=1 python3 hands_on/18_opentelemetry.py
Lesson 18: OpenTelemetry And Production Observability
Lesson Goal
By the end of this lesson, you should be able to:
- Explain the difference between logs, metrics, and distributed traces.
- Describe what OpenTelemetry spans are and how they relate to MeshFlow runs.
- Enable console telemetry for local development.
- Configure an OTLP exporter to send spans to Jaeger, Grafana, or Datadog.
- Read trace_id from a run result and look up the trace in your backend.
- Identify what MeshFlow attributes appear on each span type.
Estimated time: 45 to 60 minutes.
1. Logs, Metrics, And Traces
Production observability is built on three pillars:
Logs answer: what happened? They are event records with a timestamp and a message, often unstructured. Easy to write, hard to correlate across services.
Metrics answer: how much? They are numeric measurements over time — request rate, error rate, latency percentiles, cost per hour. Good for alerting and dashboards.
Distributed traces answer: how did this specific request travel through the system? A trace is a tree of spans, one per operation, each with start time, end time, and attributes. Traces are the most useful tool for understanding the behavior of a specific AI run.
MeshFlow generates distributed traces. Every run is a root span. Every agent execution, gate evaluation, and tool call is a child span.
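The run, agent, gate, and tool hierarchy described above can be pictured as a small tree. The sketch below is a hypothetical data model, not MeshFlow's internal representation: the Span class, the meshflow.tool span name, and all timings are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical, simplified span: a named, timed operation with children."""
    name: str
    start_ms: int
    end_ms: int
    children: list["Span"] = field(default_factory=list)

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def render(span: Span, depth: int = 0) -> list[str]:
    """Return one indented line per span, parents before children."""
    lines = [f"{'  ' * depth}{span.name} ({span.duration_ms} ms)"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


# Illustrative timings only. "meshflow.tool" is an assumed span name.
run = Span("meshflow.run", 0, 1243, [
    Span("meshflow.agent", 5, 900, [Span("meshflow.tool", 100, 400)]),
    Span("meshflow.gate", 905, 1200),
])

print("\n".join(render(run)))
```

The root span covers the whole run, so its duration bounds every child span beneath it.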
2. OpenTelemetry
OpenTelemetry (OTEL) is the open standard for generating, collecting, and exporting telemetry data. It works with any backend: Jaeger, Grafana Tempo, Datadog, Honeycomb, New Relic, and others.
Key concepts:
- Span: a single timed operation with a name, start/end times, status, and key-value attributes.
- Trace: a tree of spans representing one complete operation.
- trace_id: the unique identifier shared by all spans in one trace.
- Exporter: sends spans to a backend (OTLP over HTTP or gRPC).
- Collector: an intermediate service that receives spans and routes them.
3. Console Telemetry For Local Development
The simplest setup requires no backend:
policy = Policy(
telemetry_console=True, # print spans to stdout
)
result = await mesh.run(task)
Console output for each span looks like:
[SPAN] meshflow.run
trace_id : 4a8f2b1c...
run_id : run_abc123
duration_ms : 1243
cost_usd : 0.0024
total_tokens: 312
status : OK
[SPAN] meshflow.agent
trace_id : 4a8f2b1c... ← same trace
agent_id : researcher-agent
role : researcher
tokens : 148
cost_usd : 0.0012
uncertainty : 0.18
verdict : allowed
Console telemetry is ideal for development and debugging. For production, use an OTLP exporter.
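If you want to work with console spans programmatically during development, a small parser is usually enough. The sketch below assumes the console format matches the sample shown above exactly (a "[SPAN] name" line followed by "key : value" lines); the real output may differ, and parse_console_spans is a hypothetical helper, not part of MeshFlow.

```python
def parse_console_spans(text: str) -> list[dict[str, str]]:
    """Parse '[SPAN] name' blocks followed by 'key : value' lines."""
    spans: list[dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[SPAN]"):
            # A new span block starts; remember its name.
            spans.append({"name": line[len("[SPAN]"):].strip()})
        elif ":" in line and spans:
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            spans[-1][key.strip()] = value.strip()
    return spans


# Sample text copied from the format shown in this lesson.
SAMPLE = """\
[SPAN] meshflow.run
trace_id : 4a8f2b1c...
run_id : run_abc123
duration_ms : 1243
status : OK
[SPAN] meshflow.agent
trace_id : 4a8f2b1c...
agent_id : researcher-agent
verdict : allowed
"""

spans = parse_console_spans(SAMPLE)
for span in spans:
    print(span["name"], span["trace_id"])
```

All values stay strings here; convert duration_ms or cost_usd to numbers only when you need to aggregate them.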
4. OTLP Exporter
policy = Policy(
telemetry_otlp_endpoint="http://localhost:4318",
telemetry_otlp_protocol="http/protobuf", # or "grpc"
)
Or via environment variable:
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
MESHFLOW_MOCK=1 python3 hands_on/18_opentelemetry.py
The OTLP endpoint can point to:
- Jaeger:
docker run -d -p 16686:16686 -p 4318:4318 jaegertracing/all-in-one
- Grafana Tempo:
docker run -d -p 3200:3200 -p 4318:4318 grafana/tempo
- OTEL Collector: a collector that fans out to multiple backends
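A common source of confusion is whether the endpoint should include the /v1/traces path. In the OpenTelemetry specification for OTLP over HTTP, OTEL_EXPORTER_OTLP_ENDPOINT is a base URL to which the exporter appends v1/traces, while OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is used as given. The sketch below illustrates that rule only; resolve_traces_url is a hypothetical helper, and whether MeshFlow's telemetry_otlp_endpoint parameter expects the base URL or the full path is not stated in this lesson, so check it against your installation.

```python
import os


def resolve_traces_url(env: dict[str, str]) -> str:
    """Resolve the OTLP/HTTP traces URL following the OpenTelemetry env var rules."""
    # A signal-specific endpoint is used exactly as given.
    specific = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if specific:
        return specific
    # The generic endpoint is a base URL; the traces path is appended to it.
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318")
    return base.rstrip("/") + "/v1/traces"


print(resolve_traces_url(dict(os.environ)))
```

Port 4318 is the conventional OTLP HTTP port; gRPC exporters conventionally use 4317 and do not append a path.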
5. MeshFlow Span Attributes
Every MeshFlow span carries attributes that identify the run and the step:
| Attribute | Span type | Description |
|---|---|---|
| meshflow.run_id | all | Unique run identifier |
| meshflow.trace_id | all | Links to the OTEL trace |
| meshflow.agent_id | agent | Agent identifier |
| meshflow.role | agent | AgentRole value |
| meshflow.cost_usd | agent, run | Cost in USD |
| meshflow.tokens | agent, run | Token count |
| meshflow.uncertainty | agent | Confidence score (0-1) |
| meshflow.verdict | agent | allowed or blocked |
| meshflow.carbon_g | agent, run | Carbon footprint |
| meshflow.gate_result | gate | passed or blocked |
| meshflow.compliance | run | Active compliance mode |
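The table can double as a checklist when you inspect exported spans. The sketch below transcribes it into a lookup and reports which documented attributes a span is missing; attributes_for and missing_attributes are hypothetical helpers, and the span type labels (run, agent, gate) are taken from the table rather than from any MeshFlow API.

```python
# Attribute -> span types that carry it, transcribed from the table above.
SPAN_ATTRIBUTES: dict[str, set[str]] = {
    "meshflow.run_id": {"all"},
    "meshflow.trace_id": {"all"},
    "meshflow.agent_id": {"agent"},
    "meshflow.role": {"agent"},
    "meshflow.cost_usd": {"agent", "run"},
    "meshflow.tokens": {"agent", "run"},
    "meshflow.uncertainty": {"agent"},
    "meshflow.verdict": {"agent"},
    "meshflow.carbon_g": {"agent", "run"},
    "meshflow.gate_result": {"gate"},
    "meshflow.compliance": {"run"},
}


def attributes_for(span_type: str) -> list[str]:
    """List the attributes the table documents for one span type."""
    return sorted(
        name for name, types in SPAN_ATTRIBUTES.items()
        if "all" in types or span_type in types
    )


def missing_attributes(span_type: str, attributes: dict[str, object]) -> list[str]:
    """Return documented attributes that are absent from a span's attributes."""
    return [name for name in attributes_for(span_type) if name not in attributes]


print(attributes_for("gate"))
print(missing_attributes("gate", {"meshflow.run_id": "run_abc123"}))
```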
6. Linking trace_id To Your Backend
After a run:
result = await mesh.run(task)
print(result.trace_id) # e.g. "4a8f2b1c9d3e7f2a..."
Use this trace_id in your backend to look up all spans for the run:
In Jaeger: http://localhost:16686/trace/4a8f2b1c9d3e7f2a
In Grafana: search by meshflow.run_id or meshflow.trace_id in the trace explorer.
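When you log run results, it can help to log a clickable link next to the trace_id. The sketch below builds the Jaeger URL in the form shown above; jaeger_trace_url is a hypothetical helper, and the default base URL assumes the local Jaeger instance used in this lesson.

```python
from urllib.parse import quote


def jaeger_trace_url(trace_id: str, base_url: str = "http://localhost:16686") -> str:
    """Build a Jaeger UI link of the form <base>/trace/<trace_id>."""
    return f"{base_url.rstrip('/')}/trace/{quote(trace_id, safe='')}"


# Example trace_id in the 32-character hexadecimal form.
print(jaeger_trace_url("4bf92f3577b34da6a3ce929d0e0e4736"))
```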
7. Filtering By Blocked Spans
One of the most useful queries in production: find all runs where a guardian blocked an agent. In Jaeger or Grafana, filter by:
meshflow.verdict = blocked
This shows you every span where the safety guardian rejected an agent output — invaluable for monitoring what kinds of content your pipeline is blocking and whether you need to tune the guardian.
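The same filter can be applied offline to spans you have already exported, for example from a JSON dump. This is a minimal sketch over hypothetical span dictionaries; the dictionary layout and the writer-agent identifier are assumptions, and only the meshflow.verdict and meshflow.agent_id attribute names come from this lesson.

```python
from collections import Counter

# Hypothetical exported spans, reduced to a name plus attributes.
spans = [
    {"name": "meshflow.agent",
     "attributes": {"meshflow.agent_id": "researcher-agent", "meshflow.verdict": "allowed"}},
    {"name": "meshflow.agent",
     "attributes": {"meshflow.agent_id": "writer-agent", "meshflow.verdict": "blocked"}},
    {"name": "meshflow.gate",
     "attributes": {"meshflow.gate_result": "passed"}},
    {"name": "meshflow.agent",
     "attributes": {"meshflow.agent_id": "writer-agent", "meshflow.verdict": "blocked"}},
]


def blocked_spans(spans: list[dict]) -> list[dict]:
    """Keep only spans whose meshflow.verdict attribute equals 'blocked'."""
    return [s for s in spans if s["attributes"].get("meshflow.verdict") == "blocked"]


# Count blocks per agent to see which agent is rejected most often.
blocked_by_agent = Counter(
    s["attributes"]["meshflow.agent_id"] for s in blocked_spans(spans)
)
print(blocked_by_agent.most_common())
```

Grouping by agent_id is one choice; grouping by role or by workflow version works the same way.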
8. Cost And Latency Analysis
With spans exported to a backend, you can aggregate:
- p95 latency per agent role: which role has the highest 95th-percentile latency, meaning the duration that 95% of its executions stay under?
- Total cost per pipeline run: are costs growing over time?
- Token usage by workflow version: did a prompt change increase token usage?
- Carbon per pipeline variant: which configuration is most efficient?
These aggregations require a backend with query capability (Grafana, Datadog, Honeycomb). Console telemetry alone cannot answer them.
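To make the p95 aggregation concrete, here is a small sketch of what a backend computes. It uses the nearest-rank definition of a percentile, which is one of several in common use, so real backends may report slightly different numbers. The span dictionaries, roles, and durations are made up for illustration.

```python
import math
from collections import defaultdict


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p95_latency_by_role(spans: list[dict]) -> dict[str, float]:
    """Group span durations by role and take the 95th percentile of each group."""
    by_role: dict[str, list[float]] = defaultdict(list)
    for span in spans:
        by_role[span["role"]].append(span["duration_ms"])
    return {role: percentile(durations, 95) for role, durations in by_role.items()}


# Hypothetical agent spans: 20 researcher executions and 4 writer executions.
agent_spans = (
    [{"role": "researcher", "duration_ms": d} for d in range(100, 2100, 100)]
    + [{"role": "writer", "duration_ms": d} for d in (300, 320, 310, 5000)]
)

print(p95_latency_by_role(agent_spans))
```

With only four writer samples, a single slow execution determines the p95, which is why percentile dashboards need enough data per group to be meaningful.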
9. Hands-On Lab
MESHFLOW_MOCK=1 python3 hands_on/18_opentelemetry.py
Observe the console span output. For each span, note:
- The trace_id (same across all spans in one run)
- The agent_id and role
- The verdict (allowed vs. blocked)
- The cost and token count
For the full Jaeger experience:
docker run -d -p 16686:16686 -p 4318:4318 jaegertracing/all-in-one
MESHFLOW_MOCK=1 python3 hands_on/18_opentelemetry.py --jaeger
open http://localhost:16686
10. Summary
OpenTelemetry transforms MeshFlow runs into queryable distributed traces. Each run produces a root span; each agent, gate, and tool produces a child span with attributes for cost, tokens, role, verdict, and carbon. Enable console output with telemetry_console=True for local development. Export to a backend with telemetry_otlp_endpoint for production monitoring. Use result.trace_id to look up the full trace. Filter by meshflow.verdict=blocked to monitor safety events.
Exercises
Exercise 1: Run with Console Telemetry and Find All Span Types
Goal: Identify every distinct span type that MeshFlow emits by reading the console telemetry output.
Instructions:
- Run the hands-on script with console telemetry enabled:
python hands_on/18_opentelemetry.py
The script configures telemetry_console=True, which prints span data to stdout in a human-readable format.
- Read the complete output. Every line (or block) that begins with [SPAN] or similar is one span. Copy the output into your notes.
- Go through every span and record its name field (this is the span type). Common span types you should find include:
  - meshflow.run: the root span covering the entire pipeline execution
  - meshflow.node: one per node execution
  - meshflow.gate: for each gate evaluation (HITL, content policy, etc.)
  - meshflow.ledger_write: for each ledger record written
  - meshflow.policy_check: for each policy evaluation
- For each span type, record:
  - The span name
  - Whether it has a parent span (and which span is the parent)
  - The key attributes attached to it (e.g., run_id, node_id, cost_usd, verdict)
  - The approximate duration in milliseconds
- How many total spans were emitted for a single pipeline run? Does this number match the number of nodes in the pipeline, or are there additional spans for ledger writes, policy checks, and gate evaluations?
Expected output: A complete inventory of span types with their names, parent relationships, key attributes, and durations — covering every span emitted during the run.
Exercise 2: Identify the trace_id and Confirm It Is the Same Across All Spans
Goal: Verify that all spans in a single run share one trace_id, confirming that they belong to the same logical trace.
Instructions:
- Run the script and capture the full telemetry output.
- Find the trace_id field in the root meshflow.run span. It should be a 32-character hexadecimal string, for example: 4bf92f3577b34da6a3ce929d0e0e4736.
- Now search every other span in the output for its trace_id field. Use a simple text search:
python hands_on/18_opentelemetry.py 2>&1 | grep trace_id
- Verify that every span has the same trace_id. Record the trace_id and the count of spans that share it.
- Now look at the span_id field. Every span should have a unique span_id. The root span's span_id should appear as the parent_span_id of all direct child spans.
- Draw a tree in your notes showing the parent-child span relationships. Use indentation to show nesting depth. The root span should be at the top; direct children one indent level below; grandchildren two levels below.
- If you run the script twice, do the two runs share the same trace_id? Why or why not?
Expected output: Confirmation that all N spans share one trace_id, a mapping of span_id to parent_span_id, and a hand-drawn span tree showing the nesting structure.
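If you would like to check your manual findings with a script, the sketch below validates the two properties this exercise asks about: one shared trace_id in the 32-character hexadecimal form, and unique span_id values linked through parent_span_id. The span dictionaries and ID values are hypothetical; check_trace is not part of MeshFlow.

```python
import re

TRACE_ID_RE = re.compile(r"[0-9a-f]{32}")


def check_trace(spans: list[dict]) -> tuple[str, dict]:
    """Return (trace_id, parent -> child span_ids) or raise ValueError."""
    trace_ids = {s["trace_id"] for s in spans}
    if len(trace_ids) != 1:
        raise ValueError(f"expected one trace_id, found {len(trace_ids)}")
    trace_id = trace_ids.pop()
    if not TRACE_ID_RE.fullmatch(trace_id):
        raise ValueError("trace_id is not 32 lowercase hex characters")
    span_ids = [s["span_id"] for s in spans]
    if len(set(span_ids)) != len(span_ids):
        raise ValueError("duplicate span_id found")
    # Map each parent to its children; the root span has parent None.
    children: dict = {}
    for s in spans:
        children.setdefault(s.get("parent_span_id"), []).append(s["span_id"])
    return trace_id, children


# Hypothetical spans: one root with two direct children.
TRACE = "4bf92f3577b34da6a3ce929d0e0e4736"
spans = [
    {"trace_id": TRACE, "span_id": "aaaaaaaaaaaaaaa1", "parent_span_id": None},
    {"trace_id": TRACE, "span_id": "aaaaaaaaaaaaaaa2", "parent_span_id": "aaaaaaaaaaaaaaa1"},
    {"trace_id": TRACE, "span_id": "aaaaaaaaaaaaaaa3", "parent_span_id": "aaaaaaaaaaaaaaa1"},
]

trace_id, children = check_trace(spans)
print(trace_id, children)
```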
Exercise 3: Find the Blocked Span in the Output
Goal: Locate the span with verdict=blocked and understand what it represents.
Instructions:
- The hands-on script includes a pipeline stage that triggers a policy block — for example, an agent that produces content flagged by a content classifier, or a node whose output exceeds a cost threshold. Run the script:
python hands_on/18_opentelemetry.py
- Search the output for the word "blocked":
python hands_on/18_opentelemetry.py 2>&1 | grep -i blocked
- Find the complete span that contains verdict=blocked. Record all of its attributes: span_id, parent_span_id, trace_id, node_id, agent_id, role, verdict, cost_usd, tokens, uncertainty, carbon_g, gate_result, and any others present.
- Answer the following questions about the blocked span:
  - Which node generated the blocked span? What was this node's role in the pipeline?
  - What attribute triggered the block? Was it the content, the cost, the uncertainty level, or something else?
  - Did the pipeline halt after the block, or did it continue on an alternate path?
  - Is there a subsequent span showing what happened after the block (e.g., a fallback node, a HITL escalation, or a rejection terminal)?
- In a production monitoring system (Jaeger, Grafana), you would set up an alert that fires whenever a span with verdict=blocked appears. Write the query you would use, either as a Jaeger UI search or as a PromQL expression, to find all blocked spans from the last hour:
# Jaeger search (conceptual)
service: meshflow
tag: verdict=blocked
lookback: 1h
Expected output: The complete attribute list of the blocked span, answers to all four questions about the block context, and a written alert query for a monitoring system.
Exercise 4: Design a Trace Query to Find All Runs Over $0.01
Goal: Write and explain a trace query that finds expensive runs using span attribute filtering.
Instructions:
- The meshflow.run root span has a cost_usd attribute recording the total cost of the entire pipeline run. You want to find all runs that cost more than $0.01.
- In Jaeger's UI (or the query language of your chosen backend), write the query:
service: meshflow
operation: meshflow.run
tag: cost_usd > 0.01
lookback: 24h
Note: Jaeger's tag filter syntax may require exact key-value matching. For range queries, you may need to use a backend that supports them (Grafana Tempo with TraceQL, or Honeycomb with their query builder).
- Write the equivalent query in three different backends:
  - Jaeger search UI: tag filter approach
  - Grafana Tempo TraceQL:
{ span.meshflow.cost_usd > 0.01 }
  - Honeycomb:
{"column": "meshflow.cost_usd", "op": ">", "value": 0.01}
- For each query, explain:
  - At what span level is cost_usd available? (root run span only, or also on individual node spans?)
  - Would the query return individual node spans or the root run span?
  - How would you drill down from the root span to find which node was the most expensive?
- Design an alert rule (in Prometheus alert syntax or plain English) that fires if more than 3 runs in the last 10 minutes exceeded $0.01:
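alert: HighCostRunRate
expr: count(max_over_time(meshflow_run_cost_usd[10m]) > 0.01) > 3
for: 0m
labels:
  severity: warning
annotations:
  summary: "More than 3 runs exceeded $0.01 in the last 10 minutes"
This expression assumes a meshflow_run_cost_usd gauge with one time series per run, derived from the span attribute; adapt it to however your backend turns spans into metrics.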
Expected output: Written queries in three backends, explanations of span-level attribute availability, a drill-down strategy for finding the most expensive node, and a complete alert rule definition.
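If you prefer to reason about the alert in plain code before writing a rule, the sketch below expresses the same condition: more than 3 runs above $0.01 within the last 10 minutes. The run records, field names, and the high_cost_alert function are hypothetical and only illustrate the logic.

```python
from datetime import datetime, timedelta, timezone


def high_cost_alert(
    runs: list[dict],
    now: datetime,
    threshold_usd: float = 0.01,
    window: timedelta = timedelta(minutes=10),
    max_runs: int = 3,
) -> bool:
    """Fire when more than max_runs runs in the window cost more than threshold_usd."""
    cutoff = now - window
    expensive = [
        r for r in runs
        if r["finished_at"] >= cutoff and r["cost_usd"] > threshold_usd
    ]
    return len(expensive) > max_runs


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
# Hypothetical runs: four expensive and recent, one cheap, one expensive but old.
runs = [
    {"finished_at": now - timedelta(minutes=m), "cost_usd": c}
    for m, c in [(1, 0.02), (3, 0.015), (5, 0.03), (9, 0.011), (2, 0.002), (30, 0.05)]
]

print(high_cost_alert(runs, now))
```

The run that finished 30 minutes ago is ignored because it falls outside the window, and the cheap run never counts toward the limit.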
Exercise 5: Set Up Jaeger with Docker and Search for a Specific Run
Goal: Export real OTEL spans to a local Jaeger instance and use the Jaeger UI to find a specific run by trace_id.
Instructions:
- Start Jaeger using Docker (the all-in-one image includes the collector, query service, and UI):
docker run -d --name jaeger \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
jaegertracing/all-in-one:latest
  - Port 16686: Jaeger UI
  - Port 4317: OTLP gRPC receiver
  - Port 4318: OTLP HTTP receiver
- Configure the hands-on script to send spans to Jaeger's OTLP HTTP endpoint:
app = MeshFlow(
telemetry_otlp_endpoint="http://localhost:4318/v1/traces"
)
Or set the environment variable:
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
- Run the pipeline:
python hands_on/18_opentelemetry.py
- Note the trace_id printed in the console output (or the run_id from the pipeline output; they may differ, so use the trace_id from the OTEL data).
- Open the Jaeger UI in your browser:
http://localhost:16686
- In the Jaeger UI:
  - Select service: meshflow (or whatever service name the script uses)
  - Click "Find Traces"
  - Click on the most recent trace to open the span waterfall view
- In the span waterfall, answer the following:
  - How many spans are shown for this trace?
  - Which span is the root (longest bar)?
  - Which spans ran in parallel (overlapping bars)?
  - Find the span with verdict=blocked (if present). What color does Jaeger use to highlight error spans?
- Use "Search by Tag" in Jaeger to find a run by a specific attribute:
  - Search for run_id=<the run_id from the output>
  - Confirm that exactly one trace appears
- Stop and remove the Jaeger container when finished:
docker stop jaeger && docker rm jaeger
Expected output: A description of the Jaeger UI span waterfall view for your run, answers to the four waterfall questions, confirmation that the run_id search returns exactly one trace, and a note about the parallel spans' visual representation.