Part 11 · Senior Prep · Intermediate
Night Triage Interview Q&A
Model answers on nightly failure bucketing, seed reproduction, log mining, owner assignment, and escalation when closure is at risk.
Triage process questions
Q: Regression red at 2 AM — triage process?
[INT][SENIOR][UVM] MODEL ANSWER
Q: Nightly red — triage?
A:
MECHANISM: Classify failure bucket: hang, mismatch, config, stimulus, predict.
MOTIVATION: Scrolling through logs at random wastes farm time and human attention.
PROCESS: 1) Identify failing test+seed from manifest.
2) Bucket symptom (objection / UVM_ERROR / config fatal).
3) Repro locally with same seed+plusargs.
4) Assign owner by component (RAL, scb, agent, test).
5) Block closure-sensitive merges until root cause known.
PITFALL: Re-running entire regression to 'see if green' without repro.
EXAMPLE: axi_rand fails on seed 4099: objection leak in new vseq fork.
Q: Hang vs mismatch, first diagnostic move for each?
[INT][SENIOR][UVM] MODEL ANSWER
Q: Hang vs mismatch first move?
A:
HANG: display_objections() → driver get_next_item/item_done → vif check.
MISMATCH: Identify first UVM_ERROR timestamp → scoreboard predict path →
compare expected/actual item fields → seed stability rerun.
MOTIVATION: Different buckets — different tools; mixing them wastes time.
PITFALL: Opening waves on hang before objection trace.
EXAMPLE: Hang: vseq fork never joined. Mismatch: byte order wrong on wide beat.
// hang triage: interviewers expect you to name these
// (UVM 1.1 idiom; newer UVM releases expose the phase objection through phase.get_objection())
phase.phase_done.display_objections();
// mismatch triage: enable targeted compare
`uvm_info("SCB", $sformatf("exp=%0p act=%0p", exp, act), UVM_LOW)
Key takeaways
Night triage: bucket → repro seed → assign owner — not rerun farm blindly.
Hang = objections first; mismatch = first error timestamp + field compare.
Block risky merges until P0 failure root-caused.
Common pitfalls
Assigning triage to one person — seniors distribute buckets fast.
Rerunning without same seed — cannot confirm fix.
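The bucket step can be sketched as a small shell helper. This is a minimal sketch, not a definitive classifier: the function name classify_log is hypothetical, the patterns assume the standard UVM report format ("UVM_ERROR file(line) @ time: ..."), and the NOVIF message ID is an assumed testbench convention. Adjust the patterns to your simulator and environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map one simulation log to a triage bucket.
# "[^:]" skips the report-summary lines such as "UVM_ERROR :    0".
classify_log() {
  local log="$1"
  if grep -qE 'UVM_FATAL [^:].*PH_TIMEOUT' "$log"; then
    # Phase timeout: treat as a hang and go to the objection trace first.
    echo "hang"
  elif grep -qE 'UVM_FATAL [^:].*(config_db|NOVIF)' "$log"; then
    # Assumed config-failure message IDs; replace with your own.
    echo "config"
  elif grep -qE 'UVM_ERROR [^:]' "$log"; then
    # Any real UVM_ERROR message: start at the first error timestamp.
    echo "mismatch"
  else
    echo "unknown"
  fi
}
```

Usage: run classify_log against each failing log named in the manifest and hand every bucket to its owner. The check order matters, because a hung test can also print errors before it times out.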
Log mining and escalation
Q: How do you mine a 50 GB nightly log without drowning?
[INT][SENIOR][UVM] MODEL ANSWER
Q: Large log triage?
A:
MECHANISM: Manifest gives test+seed+exit code; grep UVM_ERROR/FATAL first;
sim.log tail for objection summary; bisect if multi-commit window.
MOTIVATION: Full log read is impossible at scale — structured extraction first.
COMMANDS: grep -E "UVM_ERROR|UVM_FATAL|FATAL" sim.log | head
grep "display_objections" -A20 sim.log
PITFALL: Opening a waveform for every failure. Reserve waves for failures with a confirmed repro.
EXAMPLE: 12 failures all same test: one root cause, not 12 independent bugs.
Q: New failure after merge, bisect strategy?
[INT][SENIOR][UVM] MODEL ANSWER
Q: Bisect regression failure?
A:
MECHANISM: Identify last green SHA, binary search commits with same test+seed.
MOTIVATION: Nightly may span 20 commits — bisect finds culprit fast.
WHEN: Failure not obvious from manifest diff (new test vs new RTL).
PITFALL: Blaming random seed when failure is deterministic on that seed.
EXAMPLE: axi_rand seed 8821 fails from commit abc123; bisect lands on scb fix.
# Repro one failing test+seed locally and leave a handoff record
TEST=axi_rand_test
SEED=4099
LOG="triage_${TEST}_${SEED}.log"
# Same seed and plusargs as the nightly run
simv +UVM_TESTNAME="${TEST}" +ntb_random_seed="${SEED}" \
  +UVM_VERBOSITY=UVM_MEDIUM -l "${LOG}"
# First 40 error, fatal, or objection lines
grep -E "UVM_ERROR|UVM_FATAL|objection" "${LOG}" | head -40
# Record test+seed for the morning handoff
echo "TEST=${TEST} SEED=${SEED}" > "${LOG}.meta"
Q: When do you escalate to leads during night triage?
[INT][SENIOR][UVM] MODEL ANSWER
Q: Escalation criteria?
A:
MECHANISM: Escalate when P0 test broken, closure DB regresses >5%, flake rate
spikes, or suspected RTL regression blocks morning commit window.
MOTIVATION: Seniors shield team sleep but not schedule risk.
ESCALATE: P0 smoke red, widespread multi-test failure, cov drop on merge.
DON'T: Single directed test fail with known owner and morning fix plan.
EXAMPLE: All DMA tests fail after RTL sync: escalate. One closure test fails: queue.
[INT][SENIOR][UVM] triage owner routing
SYMPTOM          OWNER            FIRST TOOL
objection leak   test/seq         display_objections
scb mismatch     scoreboard       exp/act compare
RAL mirror fail  RAL/predictor    mirror.check + adapter
config fatal     env/integration  config_db print
driver stall     agent/driver     handshake probe
Key takeaways
Structured log mining — manifest, grep errors, objection summary.
Bisect when culprit commit unclear — same test+seed throughout.
Escalate on P0 smoke or widespread red — not every single failure.
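The bisect takeaway can be illustrated with a plain binary search over an ordered commit list. This is a sketch under stated assumptions: first_bad_commit is a hypothetical name, the list is ordered oldest first, the first entry is the last green SHA, the last entry is known bad, and the check command (also hypothetical) exits 0 when the same test+seed passes at a given commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: find the first failing commit by binary search.
# $1   : command that exits 0 if test+seed passes at the commit given as its argument
# $2.. : commits, oldest first; first is known good, last is known bad
first_bad_commit() {
  local check="$1"; shift
  local commits=("$@")
  local lo=0
  local hi=$(( ${#commits[@]} - 1 ))
  local mid
  # Invariant: commits[lo] passes, commits[hi] fails.
  while (( hi - lo > 1 )); do
    mid=$(( (lo + hi) / 2 ))
    if "$check" "${commits[mid]}"; then
      lo=$mid
    else
      hi=$mid
    fi
  done
  echo "${commits[hi]}"
}
```

In a git checkout the same search is available as git bisect start <bad> <good> followed by git bisect run with a repro script. The essential point is that the script runs the identical test and seed at every step, so a pass or fail reflects the commit and not the stimulus.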
Common pitfalls
Telling PM 'all red' without bucket count — causes panic.
Closing triage without .meta repro file — morning handoff fails.
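A bucket count for the status report can be produced with a few lines of shell. This is only a sketch: bucket_counts is a hypothetical name, and the input is an assumed CSV of "test,seed,bucket" lines, which is not a format the nightly manifest is known to use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count failures per bucket.
# Reads assumed CSV lines "test,seed,bucket" on stdin, prints "bucket count".
bucket_counts() {
  awk -F, '{ count[$3]++ }
           END { for (b in count) printf "%s %d\n", b, count[b] }' | LC_ALL=C sort
}
```

Usage: pipe the triaged failure list into bucket_counts and paste the result into the handoff note, so the report reads as, for example, "2 hang, 1 mismatch" instead of "all red".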