Ledger Audit and Tamper Detection

Ch 11 · Production Engineering 50 min
SHA-256 chain · verify_chain · Time-travel · GDPR anonymize
Hands-on: MESHFLOW_MOCK=1 python3 hands_on/11_ledger_audit.py

Lesson 11: Ledger Audit And Tamper Detection

Lesson Goal

By the end of this lesson, you should be able to:

  • Explain why a tamper-evident ledger is essential for governed AI.
  • Describe how SHA-256 hash chaining makes tampering detectable.
  • Use the ReplayLedger API to read, verify, diff, and export runs.
  • Travel back to any step in a run and fork a new run from that point.
  • Anonymize a run to comply with GDPR right-to-erasure requirements.

Estimated time: 40 to 55 minutes.

1. Why Audit Matters In AI Systems

An AI workflow can produce wrong, biased, or harmful output. When that happens, you need to answer three questions:

  1. What exactly did the system do?
  2. Why did it do that?
  3. Has anyone tampered with the record since the run completed?

A standard application log answers the first two questions but not the third. Logs can be edited silently. In regulated environments — healthcare, finance, legal — silent editing is unacceptable. You need a ledger that makes tampering visible.

2. SHA-256 Hash Chaining

MeshFlow records every workflow step as a StepRecord. After writing each record, it computes a SHA-256 hash of the record content combined with the hash of the previous record. This creates a chain:

step_0:  SHA-256(content_0)                  = hash_0
step_1:  SHA-256(content_1 + hash_0)         = hash_1
step_2:  SHA-256(content_2 + hash_1)         = hash_2
...
step_N:  SHA-256(content_N + hash_{N-1})     = hash_N

To verify the chain, the verifier recomputes every hash from the stored record contents and compares it with the stored hash. If any record was modified, even by a single character, the recomputed hash no longer matches at that step, and the verifier reports that step as the point of tampering.
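
The chaining and verification described above can be sketched with the standard library alone. This is a minimal illustration, not MeshFlow's implementation: the record layout, the JSON serialization, the empty-string predecessor for the first step, and the function names hash_step, build_chain, and first_broken_step are all assumptions made for the example.

```python
from __future__ import annotations

import hashlib
import json


def hash_step(content: dict, prev_hash: str) -> str:
    """Hash one record's canonical JSON together with the previous hash."""
    payload = json.dumps(content, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(contents: list[dict]) -> list[dict]:
    """Attach a chained hash to every record, in order."""
    records, prev_hash = [], ""  # assumption: the first record has no predecessor
    for content in contents:
        prev_hash = hash_step(content, prev_hash)
        records.append({"content": content, "hash": prev_hash})
    return records


def first_broken_step(records: list[dict]) -> int | None:
    """Return the index of the first record whose stored hash does not match."""
    prev_hash = ""
    for index, record in enumerate(records):
        if hash_step(record["content"], prev_hash) != record["hash"]:
            return index
        prev_hash = record["hash"]
    return None


chain = build_chain([{"output": "draft"}, {"output": "review"}, {"output": "final"}])
print(first_broken_step(chain))  # None

chain[1]["content"]["output"] = "reviev"  # change a single character
print(first_broken_step(chain))  # 1
```

Serializing with sort_keys=True gives every record one canonical byte representation, so the same content always produces the same hash regardless of key order.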

What SHA-256 hash chaining proves:

  • Completeness: no steps were deleted from the middle.
  • Integrity: no step content was changed after it was written.
  • Order: steps cannot be reordered without breaking the chain.
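
These properties can be checked on a toy chain. The sketch below assumes step contents are plain strings and that the first step is hashed with an empty predecessor; chain_hashes and is_consistent are hypothetical helper names, not part of MeshFlow.

```python
import hashlib


def chain_hashes(contents):
    """Return the chained SHA-256 hash for each content string, in order."""
    hashes, prev_hash = [], ""
    for content in contents:
        prev_hash = hashlib.sha256((content + prev_hash).encode("utf-8")).hexdigest()
        hashes.append(prev_hash)
    return hashes


def is_consistent(contents, stored_hashes):
    """True if the stored hashes are exactly what the contents produce."""
    return chain_hashes(contents) == stored_hashes


contents = ["load", "summarize", "review", "publish"]
hashes = chain_hashes(contents)

# Deleting a middle step (content and hash together) breaks the next link.
deleted = is_consistent(contents[:1] + contents[2:], hashes[:1] + hashes[2:])

# Swapping two steps (content and hash together) breaks the chain as well.
swapped = is_consistent(
    [contents[0], contents[2], contents[1], contents[3]],
    [hashes[0], hashes[2], hashes[1], hashes[3]],
)

# Dropping steps from the end leaves a shorter chain that still verifies.
truncated = is_consistent(contents[:2], hashes[:2])

print(deleted, swapped, truncated)  # False False True
```

The last case shows why the completeness guarantee is worded as "from the middle": a plain hash chain cannot by itself reveal that trailing steps were removed, which is one reason to keep the final hash somewhere outside the ledger.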

What it does NOT prove:

  • The original record was correct (garbage in, garbage out).
  • The system was running the right code version.
  • Access was properly controlled before the record was written.

3. The ReplayLedger API

from meshflow.core.ledger import ReplayLedger

ledger = ReplayLedger("my_pipeline.db")

# List all run IDs stored in this ledger
run_ids = ledger.list_runs()

# Read all step records for a run
steps = ledger.get_run(run_id)

# Aggregated metrics for a run
summary = ledger.run_summary(run_id)
# summary contains: total_cost_usd, total_tokens, total_carbon_g, duration_s, step_count

# Verify the hash chain — returns True or raises on tamper
ok = ledger.verify_chain(run_id)

# Export to JSON string
json_text = ledger.export_run(run_id)

# Export to CSV string
csv_text = ledger.export_run_csv(run_id)
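
As one way to combine these calls, the sketch below totals the summary metrics across every run. It is written against any object that exposes list_runs() and run_summary(), and it assumes that list_runs() yields run IDs and that run_summary() returns a mapping keyed by the field names listed above; neither return type is confirmed by this lesson. FakeLedger is a hypothetical stand-in so the example runs without a database.

```python
def summarize_ledger(ledger):
    """Total the per-run metrics for every run in a ledger-like object."""
    totals = {"runs": 0, "step_count": 0, "total_tokens": 0, "total_cost_usd": 0.0}
    for run_id in ledger.list_runs():
        summary = ledger.run_summary(run_id)  # assumed to behave like a dict
        totals["runs"] += 1
        for key in ("step_count", "total_tokens", "total_cost_usd"):
            totals[key] += summary[key]
    return totals


class FakeLedger:
    """Stand-in used only to exercise the function; not the real ReplayLedger."""

    def __init__(self, summaries):
        self._summaries = summaries

    def list_runs(self):
        return list(self._summaries)

    def run_summary(self, run_id):
        return self._summaries[run_id]


demo = FakeLedger({
    "run_a": {"step_count": 4, "total_tokens": 900, "total_cost_usd": 0.5},
    "run_b": {"step_count": 2, "total_tokens": 300, "total_cost_usd": 0.25},
})
print(summarize_ledger(demo))
# {'runs': 2, 'step_count': 6, 'total_tokens': 1200, 'total_cost_usd': 0.75}
```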

4. Time-Travel And Forking

The ledger stores every intermediate state. You can reconstruct the workflow context at any step — this is called time-travel:

# Load the state as it was after step 3
state_at_step_3 = ledger.load_state(run_id, step=3)

# Fork: start a new run that branches from the state after step 3 (for example, to continue under a different policy)
new_run_id = ledger.fork(run_id, step=3, new_run_id="fork_run_001")

Time-travel is useful for:

  • Debugging: reproduce the exact state that caused a failure.
  • Counterfactual analysis: what would have happened with a different policy?
  • Incremental correction: fix a bad step without re-running the whole pipeline.
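
To make the idea concrete, here is a toy in-memory model of snapshot-based time-travel and forking. It is a conceptual sketch only: SnapshotStore is a hypothetical class, steps are assumed to be numbered from 0, and nothing here reflects how ReplayLedger actually stores state.

```python
import copy


class SnapshotStore:
    """Toy store: one state snapshot per completed step, per run."""

    def __init__(self):
        self._runs = {}  # run_id -> list of state snapshots

    def record(self, run_id, state):
        # Deep-copy so later mutations of `state` cannot alter history.
        self._runs.setdefault(run_id, []).append(copy.deepcopy(state))

    def load_state(self, run_id, step):
        """Return a copy of the state as it was after `step`."""
        return copy.deepcopy(self._runs[run_id][step])

    def fork(self, run_id, step, new_run_id):
        """Start a new run whose history is the source run up to `step`."""
        self._runs[new_run_id] = copy.deepcopy(self._runs[run_id][: step + 1])
        return new_run_id


store = SnapshotStore()
state = {"docs": []}
for name in ["a", "b", "c", "d"]:
    state["docs"].append(name)
    store.record("run_1", state)

print(store.load_state("run_1", step=1))  # {'docs': ['a', 'b']}

store.fork("run_1", step=1, new_run_id="fork_run_001")
store.record("fork_run_001", {"docs": ["a", "b", "x"]})
print(store.load_state("fork_run_001", step=2))  # {'docs': ['a', 'b', 'x']}
print(store.load_state("run_1", step=2))         # {'docs': ['a', 'b', 'c']}
```

The fork shares history with the original run up to the chosen step and diverges afterwards, while the original run is left untouched.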

5. Comparing Runs With diff

After changing a prompt or policy, you can compare two runs step by step:

delta = ledger.diff(run_id_before, run_id_after)
# Returns a list of per-step differences in output, cost, tokens, and carbon

This is particularly useful for regression testing: run a workflow before and after a change, then confirm the outputs changed only where expected.
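
A step-by-step comparison of this kind can be sketched as follows. The step records are assumed to be plain dicts, and the function name diff_runs, the field names, and the shape of the result are illustrative choices rather than the actual return format of ledger.diff.

```python
def diff_runs(before, after, fields=("output", "cost_usd", "tokens")):
    """Compare two runs step by step and report only the fields that changed."""
    delta = []
    for index in range(max(len(before), len(after))):
        old = before[index] if index < len(before) else None
        new = after[index] if index < len(after) else None
        if old is None:
            delta.append({"step": index, "status": "added"})
        elif new is None:
            delta.append({"step": index, "status": "removed"})
        else:
            changed = {
                f: (old.get(f), new.get(f)) for f in fields if old.get(f) != new.get(f)
            }
            if changed:
                delta.append({"step": index, "status": "changed", "fields": changed})
    return delta


run_before = [{"output": "hello", "tokens": 12}, {"output": "summary v1", "tokens": 40}]
run_after = [{"output": "hello", "tokens": 12}, {"output": "summary v2", "tokens": 43}]

for entry in diff_runs(run_before, run_after):
    print(entry)
# {'step': 1, 'status': 'changed', 'fields': {'output': ('summary v1', 'summary v2'), 'tokens': (40, 43)}}
```

In a regression test, the list of changed step indexes can be asserted against the steps you expected the change to affect.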

6. GDPR Anonymization

Under GDPR's right to erasure, you may need to remove personally identifiable information (PII) from audit records without breaking the chain. MeshFlow provides anonymize_run():

ledger.anonymize_run(run_id)
# Overwrites PII fields with [REDACTED] markers
# Recomputes the chain hashes so verify_chain still passes

Important: anonymization is a destructive operation. The original PII cannot be recovered after anonymization. Run it only when legally required.
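
The redact-then-rehash behaviour described above can be illustrated on a toy chain. This is a sketch under assumptions: the PII field names, the record layout, and the helpers rehash, verifies, and anonymize are hypothetical and do not show how anonymize_run is implemented.

```python
import hashlib
import json

PII_FIELDS = {"name", "email"}  # hypothetical; a real ledger defines its own PII fields


def rehash(records):
    """Recompute every chained hash in place from the current contents."""
    prev_hash = ""
    for record in records:
        payload = json.dumps(record["content"], sort_keys=True) + prev_hash
        prev_hash = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        record["hash"] = prev_hash
    return records


def verifies(records):
    """True if each stored hash matches a fresh recomputation."""
    stored = [r["hash"] for r in records]
    recomputed = [r["hash"] for r in rehash([dict(r) for r in records])]
    return stored == recomputed


def anonymize(records):
    """Overwrite PII values with a marker, then rebuild the chain."""
    for record in records:
        for field in record["content"].keys() & PII_FIELDS:
            record["content"][field] = "[REDACTED]"
    return rehash(records)


run = rehash([
    {"content": {"email": "ada@example.com", "output": "ok"}},
    {"content": {"output": "done"}},
])
final_hash_before = run[-1]["hash"]

anonymize(run)
print(run[0]["content"]["email"])            # [REDACTED]
print(verifies(run))                         # True
print(run[-1]["hash"] == final_hash_before)  # False
```

Because the hashes are recomputed, the chain verifies again, but any hash recorded before anonymization no longer matches. A copy of the final hash kept outside the ledger would need to be refreshed after a legitimate anonymization.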

7. Tamper Detection In Practice

When you call verify_chain, the ledger walks every step in the run and recomputes each hash. If it finds a mismatch it raises an exception identifying the tampered step:

TamperDetectedError: step 4 hash mismatch
  expected: a3f2b1...
  found:    d7e9c3...

In production, run verify_chain on a schedule (for example, after every batch completes) so you detect tampering quickly. Also store each run's final chain hash externally (for example, in a write-once object store). Someone who rewrites records and recomputes every hash produces a chain that still verifies internally, and only the external copy exposes the change.
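
A scheduled check along these lines might look like the sketch below. It is written against any object that provides verify_chain(run_id) and get_run(run_id). The function name sweep, the anchors mapping of run ID to externally stored final hash, and the assumption that each step record is a mapping with a "hash" key are all illustrative. It catches a broad Exception only because the import path of TamperDetectedError is not shown in this lesson.

```python
def sweep(ledger, anchors):
    """Verify every anchored run; return {run_id: problem} for those that fail.

    `anchors` maps run_id to the final hash saved externally when the run ended.
    """
    problems = {}
    for run_id, anchored_hash in anchors.items():
        try:
            ok = ledger.verify_chain(run_id)
        except Exception as exc:  # stands in for TamperDetectedError
            problems[run_id] = f"chain broken: {exc}"
            continue
        if ok is False:  # covers a verifier that reports failure by return value
            problems[run_id] = "chain broken"
            continue
        final_hash = ledger.get_run(run_id)[-1]["hash"]  # assumed record shape
        if final_hash != anchored_hash:
            problems[run_id] = "chain verifies but differs from the external anchor"
    return problems
```

An empty result means every run passed both the internal chain check and the comparison with its external anchor; anything else is worth alerting on.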

8. Hands-On Lab

Run the ledger audit demo:

MESHFLOW_MOCK=1 python3 hands_on/11_ledger_audit.py

Observe:

  • How many runs are listed by ledger.list_runs()
  • The step count and cost summary from run_summary
  • The verification result from verify_chain
  • The JSON export structure from export_run
  • What the diff shows between two runs

Then open one of the audit JSON files in the repository root:

cat audit_run_480c.json | python3 -m json.tool | head -40

Identify the step_records array and read the hash field on each record.
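
If you prefer to read the hashes programmatically, a short script can pull them out of the export. The sketch assumes the file has a top-level step_records array whose entries carry a hash field, as described above; step_hashes is a hypothetical helper, and the sample JSON is made up for the demonstration.

```python
import json


def step_hashes(json_text):
    """Return (position, hash) pairs from an exported run."""
    run = json.loads(json_text)
    # Assumed keys: "step_records" and "hash"; adjust if your export differs.
    return [(i, rec.get("hash")) for i, rec in enumerate(run.get("step_records", []))]


sample = '{"run_id": "demo", "step_records": [{"hash": "a3f2"}, {"hash": "d7e9"}]}'
for position, digest in step_hashes(sample):
    print(position, digest)
# 0 a3f2
# 1 d7e9
```

For a real file, pass its contents instead of the sample string, for example pathlib.Path("audit_run_480c.json").read_text().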

9. Summary

The ReplayLedger records every workflow step with a SHA-256 hash chain that makes tampering visible. You can read, verify, export, diff, time-travel, fork, and anonymize runs. In regulated environments, the ledger is not optional — it is the proof that the system did what it claims to have done.

Key operations:

  • get_run → read all steps
  • verify_chain → detect tampering
  • load_state → reconstruct context at any step
  • fork → branch from a past state
  • diff → compare two runs
  • anonymize_run → GDPR-compliant redaction

Exercises

Exercise 1: Run the Script and Read the Output

Goal: Familiarize yourself with the full output of the ledger audit hands-on script.

Instructions:

  1. Open a terminal and navigate to the meshflow_tutorial project root.
  2. Run the hands-on script:
   MESHFLOW_MOCK=1 python3 hands_on/11_ledger_audit.py
  3. Read through every line of output carefully. The script creates several agent runs, writes them to the ledger, and then calls various ledger API methods.
  4. Answer the following questions in a notepad or comment block:

     • How many runs were created during the script execution?
     • What was the SHA-256 hash of the first step in the first run? (Look for a field like hash or step_hash in the printed output.)
     • Did verify_chain() return True or False on the first check?
     • Which run ID was used for the time-travel load_state demo?

Expected output: The script should print a chain-verification result of True, a run summary table, and at least one exported JSON block. No Python tracebacks should appear.


Exercise 2: Open an Audit JSON File and Read the Hash Field

Goal: Understand the raw structure of a ledger record on disk.

Instructions:

  1. After running the script from Exercise 1, locate the exported audit JSON file. The script writes it to a path printed in the output (look for a line like Exported run to: ...).
  2. Open the file in any text editor, or pretty-print it with python3 -m json.tool <filename>.
  3. Find and record:

     • The top-level run_id field.
     • The steps array (section 8 refers to it as step_records). How many steps are present?
     • The hash field on the first step. As section 2 describes, this is the SHA-256 of the step content combined with the previous step's hash.
     • The hash field on the second step. Notice that it incorporates the first step's hash.

  4. Check the chain by hand: copy the first step's hash and confirm that it is part of the input used to compute the second step's hash (section 2 gives the concatenation formula).

Expected output: A clear view of the nested hash values and a conceptual understanding that each hash depends on all prior hashes.


Exercise 3: Call verify_chain and Then Manually Edit the DB to Test Tamper Detection

Goal: Experience how hash chaining detects tampering.

Instructions:

  1. Run the script and note a run_id that verify_chain() confirms as valid (True).
  2. Locate the ledger SQLite database file (the script prints its path, or check ~/.meshflow/ledger.db by default).
  3. Open the database with the SQLite CLI:
   sqlite3 ~/.meshflow/ledger.db
  4. Inspect the steps table:
   SELECT * FROM steps LIMIT 5;
  5. Pick any step and update a field. For example, change the output column of step 1 for your chosen run:
   UPDATE steps SET output = '{"tampered": true}' WHERE run_id = '<your_run_id>' AND step_index = 1;
  6. Exit SQLite (.quit) and run verify_chain(run_id) again in a Python script or REPL:
   from meshflow.core.ledger import ReplayLedger
   ledger = ReplayLedger()
   print(ledger.verify_chain("<your_run_id>"))
  7. Confirm that verification now fails. Section 7 describes verify_chain as raising TamperDetectedError on a mismatch; if your build returns False instead, note that. Record which step index is reported as the first tampered step.

Expected output: verify_chain no longer returns True and identifies step 1 (or the step you edited) as the integrity violation.

Clean-up: Restore the original value or re-run the hands-on script to generate a fresh run.
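
The tamper-and-restore round trip can also be scripted with Python's built-in sqlite3 module, which makes the clean-up step harder to forget. The sketch takes the table and column names (steps, run_id, step_index, output) from the SQL in this exercise; the real schema may differ, and swap_output is a hypothetical helper. The demo runs against a throwaway in-memory table rather than a real ledger file.

```python
import sqlite3


def swap_output(conn, run_id, step_index, new_output):
    """Overwrite one step's output and return the previous value for later restore."""
    row = conn.execute(
        "SELECT output FROM steps WHERE run_id = ? AND step_index = ?",
        (run_id, step_index),
    ).fetchone()
    if row is None:
        raise LookupError(f"no step {step_index} in run {run_id!r}")
    conn.execute(
        "UPDATE steps SET output = ? WHERE run_id = ? AND step_index = ?",
        (new_output, run_id, step_index),
    )
    conn.commit()
    return row[0]


# Demo on an in-memory table that mimics the columns used in this exercise.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE steps (run_id TEXT, step_index INTEGER, output TEXT)")
conn.execute("INSERT INTO steps VALUES (?, ?, ?)", ("run_1", 1, '{"ok": true}'))

original = swap_output(conn, "run_1", 1, '{"tampered": true}')  # tamper
tampered = swap_output(conn, "run_1", 1, original)              # restore
print(original)  # {"ok": true}
```

Against a real ledger you would call verify_chain between the two swap_output calls, then confirm that verification passes again after the restore.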


Exercise 4: Compare Two Runs with diff

Goal: Use diff to understand how two runs diverged.

Instructions:

  1. Run 11_ledger_audit.py twice (or find two existing runs in your ledger with list_runs()).
  2. Record the run_id values of both runs. They should be runs of the same workflow but may have different inputs or outputs.
  3. In a Python REPL or short script, call:
   from meshflow.core.ledger import ReplayLedger
   ledger = ReplayLedger()
   runs = ledger.list_runs(limit=5)
   # If list_runs() returns plain run IDs, as section 3 describes, use runs[0] and runs[1] directly.
   run_a = runs[0]["run_id"]
   run_b = runs[1]["run_id"]
   delta = ledger.diff(run_a, run_b)
   print(delta)
  4. Examine the diff output. Identify:

     • Which steps are present in run A but not run B (or vice versa).
     • Which steps have the same name but different outputs.
     • Any changes in timing or token counts.

  5. Write two to three sentences explaining what the diff tells you about how the workflow behaved differently between the two runs.

Expected output: A structured diff (section 5 describes it as a list of per-step differences) showing added, removed, and changed steps between the two runs.


Exercise 5: Anonymize a Run and Verify the Chain Still Passes

Goal: Confirm that GDPR anonymization does not break ledger integrity.

Instructions:

  1. Pick a run ID from your ledger (use list_runs() to find one).
  2. Call anonymize_run:
   from meshflow.core.ledger import ReplayLedger
   ledger = ReplayLedger()
   run_id = "<your_run_id>"
   ledger.anonymize_run(run_id)
  3. Immediately call verify_chain on the same run:
   result = ledger.verify_chain(run_id)
   print("Chain valid after anonymization:", result)
  4. Inspect the exported JSON with export_run(run_id) to confirm that PII fields (names, emails, IP addresses, or any fields marked as personal data) have been replaced with placeholder values (e.g., "[REDACTED]" or null).
  5. Manually edit a non-anonymized field (repeat the tamper test from Exercise 3), then call verify_chain once more to confirm that tampering is still detectable after anonymization.

Expected output: verify_chain returns True after anonymization and fails after the manual tamper (section 7 describes it as raising TamperDetectedError). The export shows redacted PII fields while preserving structural fields like step_index, agent_id, and timestamp.

Reflection question: Why does verify_chain still pass after anonymization? Recall from section 6 that anonymize_run recomputes the chain hashes after redacting, and consider what that means for any chain hash you stored externally before the run was anonymized.