Part 4 · TLM & Analysis · Intermediate

Triage Playbook: Step-by-Step Missing Transaction Debug

A deterministic playbook for diagnosing missing transactions from source generation through TLM transport and scoreboard consumption.

Playbook philosophy

Missing-transaction debug should be hypothesis-driven and stage-isolated. The fastest path is to prove or disprove each stage boundary with one deterministic sample.

Use this playbook whenever expected traffic does not reach a checker, especially in monitor -> analysis -> scoreboard flows.

diagram

[UVM][TLM] boundary model

Stage 1: generation      (sequence/driver produced item?)
Stage 2: observation     (monitor captured item?)
Stage 3: publication     (analysis write executed?)
Stage 4: transport       (connections + types valid?)
Stage 5: consumption     (scoreboard write invoked?)
Stage 6: checking        (item used by checker path?)

first failing stage = root-cause zone

diagram

[UVM] success criteria for triage run

- single deterministic seed
- one or few known transactions
- fixed verbosity profile
- connection map snapshot
- stage counters enabled

Diagnose one boundary at a time instead of scanning full logs blindly.
Start with deterministic minimal stimulus before scaling to random stress.
Define the root-cause zone as the first stage that violates expected evidence.
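The boundary model can be reduced to a handful of counters, with the first counter that disagrees with the expected count naming the root-cause zone. The following is a minimal sketch under stated assumptions: the package `tlm_triage_pkg`, the class `tlm_stage_counters`, and its field names are hypothetical and not part of UVM. Transport is not counted directly; a publish/receive mismatch is reported as the combined transport/consumption boundary.

```systemverilog
// Hypothetical stage-counter container (illustrative names, not a UVM class).
package tlm_triage_pkg;
  class tlm_stage_counters;
    int unsigned generated;  // Stage 1: items produced by sequence/driver
    int unsigned observed;   // Stage 2: items captured by the monitor
    int unsigned published;  // Stage 3: analysis write() calls issued
    int unsigned received;   // Stage 4/5: scoreboard write() invocations
    int unsigned checked;    // Stage 6: items that reached the checker

    // Returns the first stage whose counter differs from the expected count.
    function string first_fail_stage(int unsigned expected);
      if (generated != expected) return "generation";
      if (observed  != expected) return "observation";
      if (published != expected) return "publication";
      if (received  != expected) return "transport/consumption";
      if (checked   != expected) return "checking";
      return "none";
    endfunction
  endclass
endpackage

module tlm_stage_counters_demo;
  import tlm_triage_pkg::*;
  initial begin
    tlm_stage_counters c;
    c = new();
    // One transaction was generated, observed and published,
    // but the scoreboard never received it.
    c.generated = 1;
    c.observed  = 1;
    c.published = 1;
    $display("first_fail_stage=%s", c.first_fail_stage(1));
    // prints "first_fail_stage=transport/consumption"
  end
endmodule
```

Passing the expected count explicitly keeps the check deterministic: with one known transaction, any counter other than 1 is evidence of a failure at that boundary.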

Step-by-step execution recipe

This sequence is intentionally strict. Skipping steps usually leads to circular debug, where teams blame multiple components without proof.

diagram

[TLM] triage recipe

Step 0: isolate
  - run one seed, one testcase, minimal traffic

Step 1: structural proof
  - print connections for all relevant endpoints (e.g. debug_connected_to() on each port)
  - verify mandatory links/fanout

Step 2: source proof
  - log transaction creation at sequence/driver/monitor

Step 3: transport proof
  - log publish + consume IDs
  - verify type names and cast outcomes

Step 4: sink/check proof
  - confirm scoreboard write and checker invocation

Step 5: summarize
  - report first mismatch stage and candidate fixes

systemverilog

task run_tlm_triage();
  // Step 1: structural
  env.audit_connections("triage");

  // Step 2/3/4: enable focused traces
  env.mon.set_report_id_verbosity("TLM_FLOW", UVM_HIGH);
  env.sb.set_report_id_verbosity("TLM_FLOW", UVM_HIGH);
  env.sb.set_report_id_verbosity("TLM_TYPE", UVM_HIGH);

  // Execute minimal scenario
  run_single_txn_scenario();
endtask
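The `env.audit_connections("triage")` call above is an environment utility, not a UVM built-in. One possible shape for the per-port check behind it is sketched here, assuming UVM 1.x: `tlm_audit` and `check_fanout` are hypothetical names, while `size()`, `get_full_name()`, and `debug_connected_to()` are methods of the UVM port base class. `size()` is only meaningful once bindings are resolved, so the check is intended for `end_of_elaboration_phase` or later.

```systemverilog
import uvm_pkg::*;
`include "uvm_macros.svh"

// Hypothetical audit helper, parameterized by the transaction type of the port.
class tlm_audit #(type T = uvm_sequence_item);

  // Returns 1 if the analysis port drives at least min_fanout subscribers.
  static function bit check_fanout(uvm_analysis_port #(T) ap,
                                   int unsigned min_fanout,
                                   string ctx);
    int unsigned n;
    n = ap.size();  // number of connected implementations (valid after elaboration)
    if (n < min_fanout) begin
      `uvm_error("TLM_AUDIT",
        $sformatf("[%s] %s fanout=%0d, required>=%0d",
                  ctx, ap.get_full_name(), n, min_fanout))
      return 0;
    end
    `uvm_info("TLM_AUDIT",
      $sformatf("[%s] %s fanout=%0d ok", ctx, ap.get_full_name(), n), UVM_LOW)
    ap.debug_connected_to();  // print the resolved connection tree for the snapshot
    return 1;
  endfunction

endclass

// Usage sketch inside an env (my_txn and mon.ap are assumed names):
//   function void end_of_elaboration_phase(uvm_phase phase);
//     void'(tlm_audit #(my_txn)::check_fanout(mon.ap, 1, "triage"));
//   endfunction
```

Reporting under a single `TLM_AUDIT` ID keeps structural failures separable from run-time losses in the log.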

systemverilog

function void report_phase(uvm_phase phase);
  // A zero publish count would otherwise pass both comparisons below.
  if (counters.mon_published == 0)
    `uvm_error("TLM_TRIAGE", "nothing published by monitor")
  else if (counters.mon_published != counters.sb_received)
    `uvm_error("TLM_TRIAGE", "loss between publish and receive")
  else if (counters.sb_received != counters.sb_checked)
    `uvm_error("TLM_TRIAGE", "loss between receive and check")
  else
    `uvm_info("TLM_TRIAGE", "pipeline intact", UVM_LOW)
endfunction

Keep the triage task scripted so every failure is investigated the same way.
Use the same report IDs in all components so evidence lines up across logs.
Use counters to separate transport loss from checker-logic loss.
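The consume side of Steps 3 and 4 can be instrumented as in the following sketch, which logs the consumed ID, reports the cast outcome, and increments the two counters compared in `report_phase`. It assumes a scoreboard that receives the base `uvm_sequence_item` type and casts it; `triage_txn`, `txn_id`, `triage_scoreboard`, and `check_txn` are hypothetical names introduced for illustration.

```systemverilog
import uvm_pkg::*;
`include "uvm_macros.svh"

// Hypothetical transaction carrying an ID for publish/consume correlation.
class triage_txn extends uvm_sequence_item;
  int unsigned txn_id;
  `uvm_object_utils(triage_txn)
  function new(string name = "triage_txn");
    super.new(name);
  endfunction
endclass

// Hypothetical scoreboard with receive and check counters.
class triage_scoreboard extends uvm_component;
  `uvm_component_utils(triage_scoreboard)

  uvm_analysis_imp #(uvm_sequence_item, triage_scoreboard) imp;
  int unsigned sb_received;  // write() invocations
  int unsigned sb_checked;   // items that reached the checker path

  function new(string name, uvm_component parent);
    super.new(name, parent);
    imp = new("imp", this);
  endfunction

  // Called by the analysis port for every published item.
  function void write(uvm_sequence_item t);
    triage_txn txn;
    sb_received++;
    if (!$cast(txn, t)) begin
      // Type contract violated: the item arrived but cannot be checked.
      `uvm_error("TLM_TYPE",
        $sformatf("cast to triage_txn failed, got %s", t.get_type_name()))
      return;
    end
    `uvm_info("TLM_FLOW", $sformatf("consume id=%0d", txn.txn_id), UVM_HIGH)
    check_txn(txn);
  endfunction

  function void check_txn(triage_txn txn);
    sb_checked++;
    // Comparison against the reference model would go here.
  endfunction
endclass
```

Counting before the cast means a type mismatch shows up as `sb_received != sb_checked` together with a `TLM_TYPE` error, rather than as a silent drop.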

Decision tree for common outcomes

Map observations to likely causes quickly using a decision tree. This reduces repeated dead-end experiments.

diagram

Legend: [UVM] [TLM] [SB]

[UVM][TLM] missing transaction decision tree

Q1: monitor captured expected transaction?
  no  -> generation/monitor bug (sequence, driver, protocol decode)
  yes -> Q2

Q2: monitor.ap fanout >= required?
  no  -> unconnected-port issue (connect_phase/audit)
  yes -> Q3

Q3: sink write() invoked?
  no  -> type mismatch or wrong endpoint binding
  yes -> Q4

Q4: checker consumed transaction?
  no  -> scoreboard queue/routing logic bug
  yes -> compare policy/model mismatch

diagram

[UVM] evidence-to-cause matrix

evidence                                      probable cause
-----------------------------------------------------------------------
mon logs present, sb logs absent              analysis path broken/type mismatch
sb write called, check never called           queue routing bug in scoreboard
cast warnings present                         transaction type contract mismatch
audit fails before run                        structural connect bug
all traces present, compare fails             model/reference mismatch

systemverilog

// Call only after a failure has been detected: the final branch assumes
// all stage evidence is present and the comparison itself failed.
function void classify_failure();
  if (!obs.monitor_seen)
    `uvm_error("TLM_CLASS", "stage=observation")
  else if (!obs.path_connected)
    `uvm_error("TLM_CLASS", "stage=transport-connection")
  else if (!obs.sink_seen)
    `uvm_error("TLM_CLASS", "stage=transport-type")
  else if (!obs.check_seen)
    `uvm_error("TLM_CLASS", "stage=checker-routing")
  else
    `uvm_error("TLM_CLASS", "stage=model-compare")
endfunction

Use decision-tree checkpoints to prevent random debugging order.
Classify failures by first missing evidence, not by intuition.
Turn classification into report IDs for regression analytics.
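Once failures are classified under fixed report IDs, the per-ID counts can be read back from the UVM report server at the end of the run. The sketch below assumes UVM 1.x and uses `uvm_report_server::get_server()` and `get_id_count()`; the component name `tlm_id_summary` and the `TLM_ID_SUMMARY` ID are hypothetical. Messages suppressed by verbosity settings are not expected to be counted, so error-severity IDs are the more reliable signal.

```systemverilog
import uvm_pkg::*;
`include "uvm_macros.svh"

// Hypothetical component that prints one line per triage report ID.
class tlm_id_summary extends uvm_component;
  `uvm_component_utils(tlm_id_summary)

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  // final_phase runs after report_phase, so IDs emitted there are included.
  function void final_phase(uvm_phase phase);
    uvm_report_server srv;
    string ids[] = '{"TLM_AUDIT", "TLM_TRIAGE", "TLM_CLASS"};
    srv = uvm_report_server::get_server();
    foreach (ids[i])
      `uvm_info("TLM_ID_SUMMARY",
        $sformatf("id=%s count=%0d", ids[i], srv.get_id_count(ids[i])),
        UVM_LOW)
  endfunction
endclass
```

A fixed `id=... count=...` line format gives log parsers one stable pattern to extract per run.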

Operationalizing the playbook in regressions

A playbook only helps when it is embedded in the daily flow. Encode it into env utilities, smoke tests, and CI parsing rules.

diagram

[UVM][TLM] regression integration plan

smoke suite:
  - always run audit_connections
  - always print stage counters

debug suite:
  - enable TLM_FLOW/TLM_TYPE high verbosity
  - run deterministic single-transaction tests

CI:
  - fail on TLM_AUDIT/TLM_DROP/TLM_TRIAGE/TLM_CLASS errors
  - archive connection maps and counter summaries

diagram

[UVM] team workflow suggestions

on first failure:
  1) run playbook test locally
  2) attach stage-classification logs to issue
  3) link fix to specific boundary stage

on fix review:
  - verify added/updated audit checks
  - verify no new silent-drop paths

systemverilog

// Example summary emitted at end of triage test
`uvm_info("TLM_PLAYBOOK_SUMMARY",
  $sformatf("first_fail_stage=%s published=%0d received=%0d checked=%0d",
    first_fail_stage, cnt_pub, cnt_rcv, cnt_chk),
  UVM_LOW)

Key takeaways

A deterministic triage playbook turns missing-transaction debug into a repeatable process.
Always identify the first failing boundary stage before proposing fixes.
Decision-tree classification accelerates team-wide root-cause alignment.
Embedding the playbook into CI prevents repeated regressions.

Common pitfalls

Running high-random stress first instead of minimal deterministic reproduction.
Collecting logs without stage counters or connection snapshots.
Fixing suspected causes without proving the first failing boundary.
Treating the playbook as optional documentation instead of an executable workflow.
