Part 3 · Constraint Randomization · Intermediate
The Randomization Debug Checklist
The consolidated step-by-step checklist — the interview-ready answer — plus a fully annotated debugging session on a realistic multi-constraint failure.
The consolidated checklist
Everything from the previous five lessons compresses into one ordered procedure. This is the literal answer to the interview question “randomize() fails — what do you do?”, and it is also the procedure to actually run at your desk. The ordering matters: each step either resolves the problem or narrows the next step's search space.
1. Confirm the failure is visible: is the return value checked? If not, add an assert/fatal wrapper first; the "failure" you were told about may be a stale-value symptom of an earlier unchecked failure.
2. Classify the symptom: return 0 (contradiction) vs. return 1 with wrong values (skip / skew) vs. intermittent (seed). The branches share no tools.
3. Lock the repro: capture the seed, replay the exact (test, seed, build) triple, and confirm a bit-identical failure. No constraint debugging on a moving target.
4. Dump the solver's inputs: print the object with %p at the failure site (most simulators expand a class handle into its members, but the output format for handles is simulator-dependent). You want the state variables, the current rand values, and (via the no-argument getters) the rand_mode/constraint_mode status.
5. For return 0, isolate: binary-search constraint blocks with constraint_mode(0); test suspect value sets with randomize(null); remember inline 'with' is ANDed and state variables are constants.
6. For return 0, corner it: enable the vendor solver-debug dump on the reduced case; extract the UNSAT core; build a 20-line standalone repro that fails identically.
7. For wrong values, check the silent trio: null nested handles, rand_mode leftovers, fields that were never rand; then histogram the distribution over 10k calls before blaming the solver.
8. For skew, apply solver semantics: count solutions per branch (implication skew), check solve...before ordering, look for eaten soft constraints and width/sign truncation.
9. Fix at the right layer: state bug -> fix the test; contradiction between legality and test intent -> soft/disable/subclass, never delete the legality constraint; tool bug (rare) -> you already have the repro to file.
10. Harden: leave behind a randomize_or_fatal wrapper, a post_randomize validity check, and a covergroup on the affected rand fields so this class of failure can never be silent again.
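The "count solutions per branch" check and the 10k-call histogram are easiest to see on a toy case. The sketch below uses a hypothetical two-field class (skew_demo, en, val, and imp_c are made-up names, not from the lessons). It assumes the solver picks uniformly over legal value combinations when no solve...before or dist is present, which is what the SystemVerilog standard specifies. Under that assumption en=1 has one solution and en=0 has sixteen, so en should be set on roughly 1 call in 17, not on half of them.

```systemverilog
// Hypothetical demo class: all names here are illustrative.
class skew_demo;
  rand bit       en;
  rand bit [3:0] val;
  // en=1 admits exactly 1 solution (val==0); en=0 admits 16.
  // Total solution space: 17 combinations.
  constraint imp_c { en -> val == 4'd0; }
endclass

module skew_histogram;
  localparam int N = 10_000;

  initial begin
    skew_demo    d;
    int unsigned hits;

    d    = new();
    hits = 0;

    for (int i = 0; i < N; i++) begin
      // Check the return value on every call (step 1 of the checklist).
      if (!d.randomize())
        $fatal(1, "randomize() failed at call %0d", i);
      if (d.en) hits++;
    end

    // With a uniform pick over 17 solutions this should land near 5.9%.
    $display("en==1 on %0d of %0d calls (%.1f%%)",
             hits, N, 100.0 * hits / N);
    $finish;
  end
endmodule
```

Adding `solve en before val;` to imp_c should move the split toward 50/50 without changing the set of legal solutions, which is the ordering check named in the same step.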
THE 10-STEP FLOW AT A GLANCE
   1 visible? -> 2 classify -> 3 lock seed -> 4 dump %p
                          |
        +-----------------+------------------+
        |                 |                  |
     return 0        wrong values       intermittent
        |                 |                  |
    5 isolate        7 silent trio      (3 again:
    (constraint_     null / rand_mode    seed capture
    mode binary      / not-rand          + replay,
    search,          then HISTOGRAM      stability
    randomize(null))      |              rules)
        |            8 semantics
    6 UNSAT core     (count solutions,
    + 20-line repro  solve-before,
        |            soft, width)
        +--------+--------+
                 |
       9 fix at the right layer
                 |
       10 harden (wrapper, post_randomize
          check, covergroup) - exit criteria

The scenario for the worked session
A realistic compound failure, of the kind that actually burns an afternoon: a DMA descriptor class assembled from a base class, a project mixin, and a test-level inline constraint — and a randomize() that fails only in one test, only sometimes.
class dma_desc;
  rand bit [31:0] src_addr, dst_addr;
  rand bit [15:0] nbytes;
  rand bit [1:0]  burst;            // 0:1-beat 1:4-beat 2:8-beat 3:rsvd
  rand bit        ring_wrap;
  bit [15:0]      ring_bytes_left;  // STATE: set by sequence
  constraint legal_burst_c { burst != 2'd3; }
  constraint nbytes_c      { nbytes inside {[1:16'd8192]}; }
  constraint align_c       { burst == 2'd1 -> src_addr[3:0] == 0;
                             burst == 2'd2 -> src_addr[4:0] == 0; }
  constraint ring_c        { ring_wrap -> nbytes <= ring_bytes_left; }
endclass

// In the failing test sequence:
//   desc.ring_wrap defaults random; ring_bytes_left set per iteration
//   assert(desc.randomize() with { nbytes >= 16'd4096; burst == 2'd2; });
// Symptom: fails on ~3% of regression seeds, always mid-test.

Annotated debugging session
STEP 1-2  Return value was checked (assert) - good, failure is
          real and immediate. Classify: returns 0 -> contradiction
          branch. Intermittent across seeds -> ALSO a state/seed
          dependency. Suspicion: a state variable makes it UNSAT
          only sometimes.
STEP 3    Grep regression DB: fails on seeds churning the ring.
          Replay (VCS): simv +ntb_random_seed=8841 -> fails at the
          same iteration, every time. Repro locked.
STEP 4    Add at failure site:
            $display("DESC %p", desc);
          Replay prints (abridged):
            ring_wrap:1(prev?) nbytes:7300(prev) burst:1(prev)
            ring_bytes_left:1492   <- STATE
          Note: rand fields show PREVIOUS values (failed call
          assigns nothing). The state value is the live clue:
          ring_bytes_left = 1492.
STEP 5    Hand-solve before touching the simulator:
            inline: nbytes >= 4096 (hard)
            ring_c: ring_wrap -> nbytes <= 1492
          If solver may pick ring_wrap=0, ring_c is vacuous and
          all is well... so why fail? Re-read the sequence: AH - it
          pinned ring_wrap three lines up:
            desc.ring_wrap = 1; desc.ring_wrap.rand_mode(0);
          (left over from the wrap-stress phase - silent trio!)
          So the ring_wrap:1 in the dump was pinned, not stale.
          Now ring_wrap is STATE = 1, so:
            nbytes >= 4096 AND nbytes <= 1492 -> UNSAT.
          Confirm by experiment, not faith:
            desc.ring_c.constraint_mode(0);
            assert(desc.randomize() with { ... }); // SUCCEEDS
            desc.ring_c.constraint_mode(1); // restore
          Also: desc.randomize(null) with planted nbytes=4096,
          ring_wrap=1 -> returns 0. Core confirmed: ring_c +
          inline nbytes + pinned ring_wrap + small ring_bytes_left.
STEP 6    20-line repro written (3 constraints, 1 state var,
          1 inline) - fails on every seed. Kept for the ticket.
STEP 9    Fix at the right layer: the BUG is the leftover
          rand_mode(0) from the previous phase, not ring_c
          (legality) and not the inline (test intent). Fix:
            desc.ring_wrap.rand_mode(1); // at phase boundary
          Also taught the sequence to skip wrap descriptors when
          ring_bytes_left < requested minimum - the 3% seeds were
          exactly those that drained the ring before this point.
STEP 10   Hardening left behind:
          - randomize_or_fatal(desc, "dma seq iter N") with %p dump
          - post_randomize: if (ring_wrap && nbytes > ring_bytes_left)
            $fatal - solver output cross-check
          - phase-boundary audit: ring_wrap.rand_mode() getter check
          - covergroup bin on ring_wrap so a future flatline shows
            up in coverage, not in a 3am regression triage.

Notice how the compound failure dissolved into two lessons' worth of primitives: a silent-trio leftover (rand_mode) converted a soluble system into a state-dependent contradiction, and seed-dependence was just the ring level crossing a threshold on some seeds. Neither half was exotic; the discipline was running the steps in order instead of guessing. Also note step 4's subtlety: the %p dump shows previous rand values after a failed call, and knowing that prevented a wild-goose chase after nbytes:7300.
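The core found in steps 5 and 6 can be sketched as a standalone file. This is an illustrative reduction, not the repro written in the session: dma_desc_min and the module name are hypothetical, and only the constraints suspected to be in the core are kept. It replays the three experiments in order: the pinned failure, the randomize(null) check, and the constraint_mode(0) confirmation, then shows the effect of restoring rand_mode.

```systemverilog
// Hypothetical reduction of dma_desc: only the suspected core remains.
class dma_desc_min;
  rand bit [15:0] nbytes;
  rand bit        ring_wrap;
  bit      [15:0] ring_bytes_left;   // state variable, set by the caller

  constraint nbytes_c { nbytes inside {[16'd1:16'd8192]}; }
  constraint ring_c   { ring_wrap -> nbytes <= ring_bytes_left; }
endclass

module ring_repro;
  initial begin
    dma_desc_min desc;
    int          ok;

    desc = new();
    desc.ring_bytes_left = 16'd1492;  // value seen in the %p dump
    desc.ring_wrap       = 1'b1;      // leftover pin from the earlier phase
    desc.ring_wrap.rand_mode(0);      // ring_wrap now behaves as state

    // Expected 0: nbytes >= 4096 and nbytes <= 1492 cannot both hold.
    ok = desc.randomize() with { nbytes >= 16'd4096; };
    $display("pinned ring_wrap   : randomize() returned %0d", ok);

    // Checker mode: plant a value and ask whether the constraints hold.
    // Expected 0: ring_c is violated by nbytes=4096 with ring_wrap=1.
    desc.nbytes = 16'd4096;
    ok = desc.randomize(null);
    $display("randomize(null)    : returned %0d", ok);

    // Disable the suspect block: the call should now succeed.
    desc.ring_c.constraint_mode(0);
    ok = desc.randomize() with { nbytes >= 16'd4096; };
    $display("ring_c disabled    : randomize() returned %0d", ok);
    desc.ring_c.constraint_mode(1);   // restore

    // The fix: hand ring_wrap back to the solver. The only legal
    // choice with this state is ring_wrap=0, so the call succeeds.
    desc.ring_wrap.rand_mode(1);
    ok = desc.randomize() with { nbytes >= 16'd4096; };
    $display("rand_mode restored : returned %0d, ring_wrap=%0b",
             ok, desc.ring_wrap);
    $finish;
  end
endmodule
```

Because the contradiction depends only on state, not on the random stream, a reduction of this shape should fail the same way on every seed, which is what makes it suitable for a ticket. Most simulators also print their own constraint-failure message on the failing calls.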
Delivering this in an interview
When asked the question live, do not recite ten numbered steps like a poem. Compress to the four-beat structure: verify and classify (return value checked? 0 vs wrong-values vs intermittent), lock the repro (seed replay, %p state dump), isolate with the right tool (constraint_mode binary search and randomize(null) for UNSAT; silent trio then histogram for wrong values), and fix at the right layer and harden (never delete legality constraints; leave a wrapper, a post_randomize check, and coverage behind). Then offer one concrete war story — the session above is a perfect template — because a specific debugged failure with real mechanics is worth more than any amount of enumerated theory.
Interview angle
This checklist is the interview answer, so the angle here is delivery: lead with classification (it proves you know the failure modes are distinct), name the exact tools (constraint_mode(0) binary search, randomize(null), solver-debug dump, %p dump, seed replay, srandom) because vague “I'd debug the constraints” answers fail, and close with hardening — interviewers specifically listen for whether your process ends at “it works now” or at “this failure class can no longer hide.” If pressed for time, the one-sentence version: classify the failure, lock the seed, shrink the constraint set until the contradiction is obvious, fix the right layer, and leave defenses behind.
Key takeaways
The ordered checklist: visible -> classify -> lock seed -> dump state -> isolate -> repro -> fix right layer -> harden.
After a failed randomize, %p shows previous rand values but live state values — the state is the clue.
Compound failures decompose into the primitive modes; run the steps instead of guessing.
Fix the actual layer: state bugs in the test, intent conflicts via soft/disable/subclass, never by deleting legality.
The exit criterion is hardening (wrapper, post_randomize check, coverage), not just a passing rerun.
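The hardening items could look roughly like the sketch below. Everything in it is an assumption made for illustration: the macro name RANDOMIZE_OR_FATAL, the class dma_desc_hardened, and the covergroup ring_cg are hypothetical, and the class is cut down to the two fields involved in the worked session. The wrapper is a macro because randomize() is a built-in method of each class, so one ordinary function cannot accept arbitrary object types. It uses if/$fatal instead of assert so the call still runs when assertions are switched off.

```systemverilog
// Hypothetical wrapper: fails loudly and dumps the object.
// The %p output for a class handle is simulator-dependent.
`define RANDOMIZE_OR_FATAL(OBJ, CTX) \
  do begin \
    if (!OBJ.randomize()) \
      $fatal(1, "randomize() failed [%s]: %p", CTX, OBJ); \
  end while (0)

class dma_desc_hardened;
  rand bit [15:0] nbytes;
  rand bit        ring_wrap;
  bit      [15:0] ring_bytes_left;   // state variable

  constraint nbytes_c { nbytes inside {[16'd1:16'd8192]}; }
  constraint ring_c   { ring_wrap -> nbytes <= ring_bytes_left; }

  // Coverage on the field that was silently pinned in the session.
  covergroup ring_cg;
    wrap_cp: coverpoint ring_wrap;
  endgroup

  function new();
    ring_cg = new();   // embedded covergroups are built in the constructor
  endfunction

  // Runs only after a successful randomize(): cross-check, then sample.
  function void post_randomize();
    if (ring_wrap && nbytes > ring_bytes_left)
      $fatal(1, "ring_c violated: nbytes=%0d ring_bytes_left=%0d",
             nbytes, ring_bytes_left);
    ring_cg.sample();
  endfunction
endclass

module hardening_demo;
  initial begin
    dma_desc_hardened desc;

    desc = new();
    desc.ring_bytes_left = 16'd4000;

    // Phase-boundary audit: a leftover rand_mode(0) is caught here.
    if (desc.ring_wrap.rand_mode() == 0)
      $fatal(1, "ring_wrap is still pinned at phase boundary");

    repeat (100) `RANDOMIZE_OR_FATAL(desc, "dma seq");

    $display("ring_wrap coverage: %0.1f%%", desc.ring_cg.get_coverage());
    $finish;
  end
endmodule
```

If ring_wrap were pinned again, the audit would stop the run at the phase boundary, and even without the audit the wrap_cp coverpoint would show only one of its two bins hit.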
Common pitfalls
Skipping classification and diving into constraints — wrong-values and UNSAT problems share no tools.
Debugging an intermittent failure without locking the seed — the target moves under you.
Misreading the %p dump after failure — rand fields show stale values; only state values are current.
Declaring victory when the rerun passes — without hardening the same class of failure returns silently.
Reciting tools in an interview without the classification framework — sounds memorized, not practiced.
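The seed-locking pitfall can be checked at small scale with srandom(). The sketch below is a hypothetical example (the class pkt and its fields are made up): it seeds one object's random stream, randomizes, reseeds with the same value, and confirms the second call reproduces the first. It assumes the same simulator and build for both calls, which is the (test, seed, build) triple from the checklist in miniature. The seed value is arbitrary.

```systemverilog
// Hypothetical class used only to demonstrate seed replay.
class pkt;
  rand bit [31:0] addr;
  rand bit [7:0]  len;
endclass

module seed_replay;
  initial begin
    pkt        p;
    bit [31:0] first_addr;
    bit [7:0]  first_len;

    p = new();

    // First run: seed the object's own random stream, then randomize.
    p.srandom(8841);
    if (!p.randomize()) $fatal(1, "first randomize() failed");
    first_addr = p.addr;
    first_len  = p.len;

    // Replay: same seed on the same object, same call sequence.
    p.srandom(8841);
    if (!p.randomize()) $fatal(1, "replay randomize() failed");

    if (p.addr != first_addr || p.len != first_len)
      $fatal(1, "replay diverged: target is moving");
    else
      $display("replay identical: addr=%h len=%0d", p.addr, p.len);
    $finish;
  end
endmodule
```

A full regression replay normally relies on the simulator's seed option instead of per-object srandom() calls; the per-object form is mainly useful inside a reduced repro, where it pins the stream regardless of how the surrounding testbench changes.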