Part 7 · Environment & Tests · Intermediate

End-of-Test Debug Hangs: Symptom-First Triage Workflow

Debug stuck simulations with a deterministic triage sequence: objection trace, queue health, handshake counters, readiness loops, and timeout evidence.

Symptom map for end-of-test failures

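Most end-of-test hangs can be diagnosed quickly if you map each symptom to the closure boundary it implicates: objection accounting, readiness and drain behavior, checker queues, or the protocol traffic path. Start with control-plane evidence (objection counts, counters, timeout logs) before digging through waveforms.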

diagram
[UVM][ENV] symptom map

hang with no timeout:
  timeout not configured or set too high

timeout with active objections:
  missing drop path

timeout with zero objections:
  readiness loop or stuck non-objection thread

instant pass:
  objection never raised
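
For the "timeout with active objections" row, the quickest evidence is a list of who still holds the objection. The following is a minimal sketch, assuming UVM 1.2 and a caller that already has the run-phase `phase` handle (for example inside `run_phase`). The helper name `report_objection_holders` and the `OBJ_PROBE` message ID are hypothetical.

```systemverilog
import uvm_pkg::*;
`include "uvm_macros.svh"

// Hypothetical helper: reports every object that still holds an objection on
// the given phase and returns how many such objects were found.
function automatic int report_objection_holders(uvm_phase phase);
  uvm_objection objection = phase.get_objection();
  uvm_object    holders[$];

  if (objection == null) begin
    `uvm_warning("OBJ_PROBE", {"phase ", phase.get_name(), " has no objection object"})
    return 0;
  end

  // get_objectors() fills the list with objects that raised directly.
  objection.get_objectors(holders);
  foreach (holders[i])
    `uvm_info("OBJ_PROBE",
      $sformatf("%s holds %0d objection(s)",
        holders[i].get_full_name(),
        objection.get_objection_count(holders[i])),
      UVM_LOW)
  return holders.size();
endfunction
```

Calling this shortly before the expected timeout turns "missing drop path" into a named component, which is usually enough to find the unbalanced raise.
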
diagram
[TEST] boundary-first triage

Boundary 1: objection accounting
Boundary 2: readiness/drain behavior
Boundary 3: checker queues/counters
Boundary 4: protocol deadlock in traffic path
  • Use objective counters to locate the first broken boundary.

  • Do not start by browsing waveforms at random.

  • Preserve the failing seed and the exact plusargs before changing anything.
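
Preserving the seed and plusargs is easier when the simulation prints them itself. This is a minimal sketch, assuming a simulator that supports the IEEE 1800-2012 system function `$get_initial_random_seed`; the helper name `log_repro_context` and the `REPRO` message ID are hypothetical.

```systemverilog
import uvm_pkg::*;
`include "uvm_macros.svh"

// Hypothetical helper: prints the initial seed and all plusargs once.
// UVM_NONE is used so the line survives any verbosity setting.
function automatic void log_repro_context();
  uvm_cmdline_processor clp = uvm_cmdline_processor::get_inst();
  string plusargs[$];
  string joined;

  clp.get_plusargs(plusargs);
  foreach (plusargs[i]) joined = {joined, " ", plusargs[i]};

  `uvm_info("REPRO",
    $sformatf("seed=%0d plusargs:%s", $get_initial_random_seed(), joined),
    UVM_NONE)
endfunction
```

Calling it from the base test at the start of simulation means every failing log carries its own reproduction recipe.
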


Instrumentation and quick probes

systemverilog
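task eot_hang_probe(my_env env);
  uvm_phase     run_ph;
  uvm_objection run_obj;
  int           obj_total;

  // uvm_run_phase::get() is the phase implementation singleton and does not
  // own the run-phase objection, so look up the scheduled run-phase node.
  run_ph = uvm_domain::get_common_domain().find(uvm_run_phase::get());
  if (run_ph != null) run_obj = run_ph.get_objection();
  obj_total = (run_obj != null) ? run_obj.get_objection_total() : -1;  // -1: unavailable

  `uvm_info("HANG_PROBE",
    $sformatf("obj=%0d issued=%0d done=%0d mon=%0d cmp=%0d exp_q=%0d act_q=%0d",
      obj_total,
      env.drv.issued_cnt,
      env.drv.done_cnt,
      env.mon.publish_cnt,
      env.sb.compare_cnt,
      env.sb.pending_expected(),
      env.sb.pending_actual()),
    UVM_LOW)
endtask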
systemverilog
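task periodic_probe();
  forever begin
    #50000ns;  // fixed 50 us interval keeps probe deltas comparable
    eot_hang_probe(env);
  end
endtask

task run_phase(uvm_phase phase);
  phase.raise_objection(this, "main scenario");
  fork
    periodic_probe();  // background thread; killed when the run phase ends
  join_none
  run_main_scenario();  // blocking, so the objection spans the whole scenario
  phase.drop_objection(this, "main scenario");
endtask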
bash
simv +UVM_TESTNAME=hang_seed \
     +UVM_OBJECTION_TRACE \
     +UVM_TIMEOUT=2000000,YES \
     +UVM_VERBOSITY=UVM_LOW
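
A plusarg only helps when someone remembers to pass it, so a default timeout in the base test guards against the "hang with no timeout" symptom. This is a minimal sketch, assuming UVM 1.2; the class name `hang_safe_base_test` and the 2 ms budget are hypothetical.

```systemverilog
import uvm_pkg::*;
`include "uvm_macros.svh"

// Hypothetical base test: installs a default global timeout so that no
// derived test can run forever when +UVM_TIMEOUT is not supplied.
class hang_safe_base_test extends uvm_test;
  `uvm_component_utils(hang_safe_base_test)

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    // 2ms is an assumed budget. The second argument marks this setting as
    // overridable by a later set_timeout() call.
    uvm_root::get().set_timeout(2ms, 1);
  endfunction
endclass
```

Whether this call or a `+UVM_TIMEOUT` plusarg takes effect depends on the overridable flags and on the order in which the two are applied, so confirm the effective value in the log rather than assuming it.
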
diagram
[ENV] probe interpretation

issued grows, done static:
  driver blocked waiting DUT readiness

done grows, mon static:
  monitor sampling/gating issue

mon grows, compare static:
  analysis wiring or scoreboard ingest issue

queues never drain:
  matcher deadlock or missing expected stream
  • Probe the full sequence -> driver -> monitor -> checker chain.

  • Compare the deltas between consecutive probes, not just the absolute counts.

  • Keep probes at UVM_LOW and always on, so failing reruns print them without extra verbosity flags.
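
Comparing deltas by eye across log lines is error-prone, so the interpretation table above can be encoded directly. This is a minimal sketch in plain SystemVerilog with no UVM dependency; `probe_delta_tracker` and `first_stalled_stage` are hypothetical names, and the counters are assumed to be monotonically increasing.

```systemverilog
// Hypothetical helper: remembers the previous sample of each named counter.
class probe_delta_tracker;
  protected int unsigned prev[string];

  // Stores the new value and returns how far the counter moved since the
  // last sample. The first sample of a counter returns its full value.
  function int unsigned sample(string name, int unsigned value);
    int unsigned last = prev.exists(name) ? prev[name] : 0;
    prev[name] = value;
    return value - last;
  endfunction
endclass

// Returns the first stage whose counter froze while its upstream stage kept
// moving, following the probe interpretation table.
function automatic string first_stalled_stage(
    int unsigned d_issued, d_done, d_mon, d_cmp);
  if (d_issued > 0 && d_done == 0) return "driver";
  if (d_done   > 0 && d_mon  == 0) return "monitor";
  if (d_mon    > 0 && d_cmp  == 0) return "scoreboard";
  return "none";
endfunction
```

Feeding the four deltas from consecutive `HANG_PROBE` samples into `first_stalled_stage` gives one word per probe interval, which is easy to scan or grep in long logs.
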


Deterministic hang triage playbook

Ordered triage steps

  1. Confirm the timeout configuration and the time at which it fires.

  2. Enable objection tracing and identify the last raise without a matching drop.

  3. Capture the boundary counters periodically at a fixed interval.

  4. Check that readiness hooks have a bounded exit and log when they time out.

  5. Inspect the scoreboard's queue-pairing logic for starvation.

  6. Only then move to protocol waveforms to root-cause a deadlock.
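
Step 4 asks for readiness hooks with a bounded exit and a timeout log. One way this could look is sketched below, assuming UVM 1.2 and a scoreboard that keeps a `pending` count current. The class `bounded_drain_guard`, its fields, and the 10 us budget are all hypothetical.

```systemverilog
import uvm_pkg::*;
`include "uvm_macros.svh"

// Hypothetical component: delays run-phase closure until the scoreboard
// drains, but never for longer than drain_budget.
class bounded_drain_guard extends uvm_component;
  `uvm_component_utils(bounded_drain_guard)

  time drain_budget = 10us;  // assumed budget; tune per project
  int  pending;              // assumed to be kept current by the scoreboard
  bit  drain_timed_out;      // set once so the wait is never re-armed

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  virtual function void phase_ready_to_end(uvm_phase phase);
    if (phase.get_name() != "run") return;
    if (pending == 0 || drain_timed_out) return;
    phase.raise_objection(this, "waiting for scoreboard drain");
    fork
      begin
        wait_drained_or_timeout();
        phase.drop_objection(this, "drain wait finished");
      end
    join_none
  endfunction

  virtual task wait_drained_or_timeout();
    fork
      begin  // isolation wrapper: disable fork only reaches the two threads below
        fork
          wait (pending == 0);
          #(drain_budget);
        join_any
        disable fork;
      end
    join
    if (pending != 0) begin
      drain_timed_out = 1;
      `uvm_error("DRAIN_TIMEOUT",
        $sformatf("gave up after %0t with %0d item(s) pending", drain_budget, pending))
    end
  endtask
endclass
```

The `drain_timed_out` flag matters because `phase_ready_to_end` is called again each time all objections drop. Without it, a scoreboard that never drains would re-arm the wait repeatedly instead of failing once with a clear message.
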

diagram
[UVM][ENV] fast decision tree

if objection_total > 0 at timeout:
  objection leak class
else if queues non-empty forever:
  checker convergence class
else if driver stalled:
  DUT/protocol deadlock class
else:
  readiness or phase control bug
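
The decision tree above can be written as a small pure function so that a watchdog or post-processing check reports the hang class directly. This is a minimal sketch; the enum `hang_class_e`, the function `classify_hang`, and its three inputs are hypothetical names for values the probes already provide.

```systemverilog
typedef enum {
  OBJECTION_LEAK,
  CHECKER_CONVERGENCE,
  PROTOCOL_DEADLOCK,
  READINESS_OR_PHASE_CONTROL
} hang_class_e;

// Mirrors the decision tree: the first matching condition wins.
function automatic hang_class_e classify_hang(
    int objection_total,  // objections outstanding at timeout
    bit queues_stuck,     // scoreboard queues non-empty and not draining
    bit driver_stalled);  // issued count advanced but done count did not
  if (objection_total > 0) return OBJECTION_LEAK;
  if (queues_stuck)        return CHECKER_CONVERGENCE;
  if (driver_stalled)      return PROTOCOL_DEADLOCK;
  return READINESS_OR_PHASE_CONTROL;
endfunction
```

Because the checks are ordered, an objection leak is reported even when the queues are also stuck, which matches the boundary-first order of the playbook.
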
diagram
[TEST] closure debug exit criteria

triage is complete when:
  root boundary identified
  minimal reproducer seed captured
  owning component and fix plan assigned

Key takeaways

  • A fixed triage order reduces mean-time-to-root-cause for hangs.

  • Counters and objection traces localize issues before waveform deep dives.

  • Most hangs are closure-control bugs, not random simulation instability.

  • Keep reproducible probes in base infrastructure for future failures.

Common pitfalls

  • Changing many knobs before preserving the baseline failing evidence.

  • Assuming a DUT deadlock without first checking objection and readiness state.

  • Running without periodic probes, which forces blind post-timeout reasoning.

  • Closing the ticket after raising the timeout instead of fixing the root cause.