Part 3 · Constraint Randomization · Intermediate

The Randomization Debug Checklist

The consolidated step-by-step checklist — the interview-ready answer — plus a fully annotated debugging session on a realistic multi-constraint failure.

The consolidated checklist

Everything from the previous five lessons compresses into one ordered procedure. This is the literal answer to the interview question “randomize() fails — what do you do?”, and it is also the procedure to actually run at your desk. The ordering matters: each step either resolves the problem or narrows the next step's search space.

  1. Confirm the failure is visible: is the return value checked? If not, add an assert/fatal wrapper first: the 'failure' you were told about may be a stale-value symptom of an earlier unchecked failure.

  2. Classify the symptom: return 0 (contradiction) vs return 1 with wrong values (skip / skew) vs intermittent (seed). The branches diverge from step 5 onward and use different isolation tools.

  3. Lock the repro: capture the seed, replay the exact (test, seed, build) triple, confirm bit-identical failure. No constraint debugging on a moving target.

  4. Dump the solver's inputs: print the object with %p at the failure site — state variables, current rand values, and (via getters) rand_mode/constraint_mode status.

  5. For return 0, isolate: binary-search constraint blocks with constraint_mode(0); test suspect value sets with randomize(null), which only checks the current values against the active constraints; remember that an inline 'with' is ANDed with the class constraints and that state variables are constants to the solver.

  6. For return 0, corner it: enable the vendor solver-debug dump on the reduced case; extract the UNSAT core; build a 20-line standalone repro that fails identically.

  7. For wrong values, check the silent trio: null nested handles, rand_mode leftovers, fields that were never rand; then histogram the distribution over 10k calls before blaming the solver.

  8. For skew, apply solver semantics: count solutions per branch (implication skew), check solve...before ordering, look for soft constraints that were silently dropped because they conflicted with another constraint, and check for width/sign truncation.

  9. Fix at the right layer: state bug -> fix the test; contradiction between legality and test intent -> soft/disable/subclass, never delete the legality constraint; tool bug (rare) -> you already have the repro to file.

  10. Harden: leave behind a randomize_or_fatal wrapper, a post_randomize validity check, and a covergroup on the affected rand fields so this class of failure can never be silent again.
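
Steps 1, 4, and 10 all lean on the same small piece of plumbing, so a minimal sketch may help. The macro name `RANDOMIZE_OR_FATAL`, its argument names, and the `pkt` class are hypothetical, invented for illustration. A macro is used rather than a function so that an inline `with` block can be passed through; an explicit `if` is used rather than `assert` because in some tool flows a disabled assertion can take the randomize() call with it.

```systemverilog
// Hypothetical wrapper macro (name and message format are assumptions).
// No comments inside the macro body: they would swallow the line continuations.
`define RANDOMIZE_OR_FATAL(OBJ, TAG, WITH_C) \
  begin \
    int ok__; \
    ok__ = OBJ.randomize() with WITH_C; \
    if (!ok__) \
      $fatal(1, "randomize failed [%s], object at failure: %p", TAG, OBJ); \
  end

class pkt;                          // hypothetical class for illustration
  rand bit [7:0] len;
  rand bit       is_short;
  constraint len_c { is_short -> len < 8'd16; }
endclass

module tb;
  initial begin
    pkt p;
    p = new();

    // Step 1 / step 10: the return value can no longer be ignored.
    `RANDOMIZE_OR_FATAL(p, "iter 0", { len > 8'd4; })

    // Step 4: the getter forms return 1 (active) or 0 (disabled).
    $display("len.rand_mode=%0d is_short.rand_mode=%0d len_c.constraint_mode=%0d",
             p.len.rand_mode(), p.is_short.rand_mode(),
             p.len_c.constraint_mode());
  end
endmodule
```

Printing the rand_mode and constraint_mode status next to the `%p` dump is what lets step 4 tell a stale rand value apart from a pinned one.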

diagram
THE 10-STEP FLOW AT A GLANCE

  1 visible?  ->  2 classify  ->  3 lock seed  ->  4 dump %p
                       |
        +--------------+-----------------+
        |              |                 |
     return 0      wrong values     intermittent
        |              |                 |
   5 isolate      7 silent trio     (3 again:
   (constraint_     null / rand_mode   seed capture
    mode binary     / not-rand          + replay,
    search,         then HISTOGRAM      stability
    randomize(null))     |              rules)
        |              8 semantics
   6 UNSAT core        (count solutions,
   + 20-line repro      solve-before,
        |               soft, width)
        +------+-------+
               |
        9 fix at the right layer
               |
        10 harden (wrapper, post_randomize
           check, covergroup) - exit criteria
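
The wrong-values branch (steps 7 and 8) is not exercised by the worked session, so here is a small sketch of the histogram step. The class `skew_demo` and its constraint are hypothetical; they reproduce the textbook implication skew, where one branch of the implication has far fewer solutions than the other.

```systemverilog
class skew_demo;                    // hypothetical class for illustration
  rand bit       mode;
  rand bit [3:0] val;
  // mode=1 forces val to 0 (1 solution); mode=0 leaves val free (16 solutions).
  constraint c { mode -> val == 4'd0; }
endclass

module tb;
  initial begin
    skew_demo    d;
    int unsigned hist [2];          // counts of mode==0 and mode==1

    d = new();
    repeat (10_000) begin
      if (!d.randomize()) $fatal(1, "randomize failed");
      hist[d.mode]++;
    end
    $display("mode=0: %0d   mode=1: %0d", hist[0], hist[1]);
  end
endmodule
```

With 17 legal (mode, val) combinations and a solver that picks uniformly among them, mode=1 should appear in roughly 1 of every 17 calls rather than half of them. If the histogram matches the solution count, the solver is doing what was asked; adding `solve mode before val;` to the constraint is one way to ask for an even split on mode instead.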

The scenario for the worked session

A realistic compound failure, of the kind that actually burns an afternoon: a DMA descriptor class assembled from a base class, a project mixin, and a test-level inline constraint — and a randomize() that fails only in one test, only sometimes.

systemverilog
class dma_desc;
  rand bit [31:0] src_addr, dst_addr;
  rand bit [15:0] nbytes;
  rand bit [1:0]  burst;          // 0:1-beat 1:4-beat 2:8-beat 3:rsvd
  rand bit        ring_wrap;
  bit  [15:0]     ring_bytes_left;     // STATE: set by sequence

  constraint legal_burst_c { burst != 2'd3; }
  constraint nbytes_c      { nbytes inside {[1:16'd8192]}; }
  constraint align_c       { burst == 2'd1 -> src_addr[3:0] == 0;
                             burst == 2'd2 -> src_addr[4:0] == 0; }
  constraint ring_c        { ring_wrap -> nbytes <= ring_bytes_left; }
endclass

// In the failing test sequence:
//   desc.ring_wrap defaults random; ring_bytes_left set per iteration
//   assert(desc.randomize() with { nbytes >= 16'd4096; burst == 2'd2; });
// Symptom: fails on ~3% of regression seeds, always mid-test.

Annotated debugging session

diagram
STEP 1-2  Return value was checked (assert) - good, failure is
          real and immediate. Classify: returns 0 -> contradiction
          branch. Intermittent across seeds -> ALSO a state/seed
          dependency. Suspicion: a state variable makes it UNSAT
          only sometimes.

STEP 3    Grep regression DB: fails on seeds churning the ring.
          Replay: simv +ntb_random_seed=8841 -> fails at the same
          iteration, every time. Repro locked.

STEP 4    Add at failure site:
            $display("DESC %p", desc);
          Replay prints (abridged):
            ring_wrap:1(prev?) nbytes:7300(prev) burst:1(prev)
            ring_bytes_left:1492            <- STATE
          Note: rand fields show PREVIOUS values (failed call
          assigns nothing). The state value is the live clue:
          ring_bytes_left = 1492.

STEP 5    Hand-solve before touching the simulator:
            inline: nbytes >= 4096        (hard)
            ring_c: ring_wrap -> nbytes <= 1492
          If the solver may pick ring_wrap=0, ring_c is vacuous
          and all is well... so why fail? Re-read the SEQUENCE,
          not just the class: it pinned ring_wrap three lines up:
            desc.ring_wrap = 1; desc.ring_wrap.rand_mode(0);
          (left over from the wrap-stress phase - silent trio!)
          Now ring_wrap is STATE = 1, so:
            nbytes >= 4096  AND  nbytes <= 1492  -> UNSAT.
          Confirm by experiment, not faith:
            desc.ring_c.constraint_mode(0);
            assert(desc.randomize() with { ... });  // SUCCEEDS
            desc.ring_c.constraint_mode(1);         // restore
          Also: desc.randomize(null) with planted nbytes=4096,
          ring_wrap=1 -> returns 0. Core confirmed: ring_c +
          inline nbytes + pinned ring_wrap + small ring_bytes_left.

STEP 6    20-line repro written (3 constraints, 1 state var,
          1 inline) - fails on every seed. Kept for the ticket.

STEP 9    Fix at the right layer: the BUG is the leftover
          rand_mode(0) from the previous phase, not ring_c
          (legality) and not the inline (test intent). Fix:
            desc.ring_wrap.rand_mode(1);  // at phase boundary
          Also taught the sequence to skip wrap descriptors when
          ring_bytes_left < requested minimum - the 3% seeds were
          exactly those that drained the ring before this point.

STEP 10   Hardening left behind:
          - randomize_or_fatal(desc, "dma seq iter N") with %p dump
          - post_randomize: if (ring_wrap && nbytes > ring_bytes_left)
              $fatal - solver output cross-check
          - phase-boundary audit: ring_wrap.rand_mode() getter check
          - covergroup bin on ring_wrap so a future flatline shows
            up in coverage, not in a 3am regression triage.
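
Step 6 mentions a standalone repro without showing it. A sketch of what it might look like follows, trimmed further than the session's version to just the suspected core. The class name `dma_desc_min` and module name `repro` are made up for this example; the values come from the session above.

```systemverilog
// Standalone repro sketch: only the constraints in the suspected core.
class dma_desc_min;                 // hypothetical trimmed copy of dma_desc
  rand bit [15:0] nbytes;
  rand bit        ring_wrap;
  bit  [15:0]     ring_bytes_left;  // state

  constraint nbytes_c { nbytes inside {[1:16'd8192]}; }
  constraint ring_c   { ring_wrap -> nbytes <= ring_bytes_left; }
endclass

module repro;
  initial begin
    dma_desc_min d;
    d = new();

    d.ring_bytes_left = 16'd1492;   // state value from the %p dump
    d.ring_wrap       = 1'b1;       // leftover pin from the earlier phase
    d.ring_wrap.rand_mode(0);

    // Same inline constraint as the failing test, minus the burst term.
    if (!d.randomize() with { nbytes >= 16'd4096; })
      $display("REPRO: randomize returned 0");
    else
      $display("no failure: nbytes=%0d", d.nbytes);

    // Candidate fix: hand ring_wrap back to the solver and retry.
    d.ring_wrap.rand_mode(1);
    if (d.randomize() with { nbytes >= 16'd4096; })
      $display("after rand_mode(1): ring_wrap=%0b nbytes=%0d",
               d.ring_wrap, d.nbytes);
  end
endmodule
```

Because the failure depends only on pinned state, this version does not need a particular seed. Once ring_wrap is random again, the only way to satisfy both the inline constraint and ring_c is for the solver to choose ring_wrap=0.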

Notice how the compound failure dissolved into two lessons' worth of primitives: a silent-trio leftover (rand_mode) converted a soluble system into a state-dependent contradiction, and seed-dependence was just the ring level crossing a threshold on some seeds. Neither half was exotic; the discipline was running the steps in order instead of guessing. Also note step 4's subtlety — the %p dump shows previous rand values after a failed call, and knowing that prevented a wild-goose chase after nbytes:7300.
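
The step 10 hardening items could look roughly like the sketch below. It is an illustration under assumptions, not the project's actual code: the class name `dma_desc_hardened`, the covergroup name `wrap_cg`, and the `audit_rand_modes` method are all hypothetical, and the class is trimmed to the fields involved in the session.

```systemverilog
class dma_desc_hardened;            // hypothetical trimmed variant of dma_desc
  rand bit [15:0] nbytes;
  rand bit        ring_wrap;
  bit  [15:0]     ring_bytes_left;  // state, set by the sequence

  constraint nbytes_c { nbytes inside {[1:16'd8192]}; }
  constraint ring_c   { ring_wrap -> nbytes <= ring_bytes_left; }

  // A flatlined ring_wrap shows up as an unhit bin.
  covergroup wrap_cg;
    cp_wrap: coverpoint ring_wrap;  // automatic bins for 0 and 1
  endgroup

  function new();
    wrap_cg = new();
  endfunction

  // Runs automatically after every successful randomize().
  function void post_randomize();
    // Cross-check: still fires if ring_c was disabled somewhere upstream.
    if (ring_wrap && nbytes > ring_bytes_left)
      $fatal(1, "ring overrun: nbytes=%0d ring_bytes_left=%0d",
             nbytes, ring_bytes_left);
    wrap_cg.sample();
  endfunction

  // Phase-boundary audit: flag a rand field that is still pinned.
  function void audit_rand_modes(string where);
    if (!ring_wrap.rand_mode())
      $error("%s: ring_wrap still has rand_mode(0)", where);
  endfunction
endclass

module tb;
  initial begin
    dma_desc_hardened d;
    d = new();
    d.ring_bytes_left = 16'd4096;
    d.audit_rand_modes("start of bulk phase");
    if (!d.randomize()) $fatal(1, "randomize failed");
  end
endmodule
```

The post_randomize check is redundant while ring_c is active, and that redundancy is the point: it earns its keep on the day someone disables the constraint or changes the state variable after the solve.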


Delivering this in an interview

When asked the question live, do not recite ten numbered steps like a poem. Compress to the four-beat structure: verify and classify (return value checked? 0 vs wrong-values vs intermittent), lock the repro (seed replay, %p state dump), isolate with the right tool (constraint_mode binary search and randomize(null) for UNSAT; silent trio then histogram for wrong values), and fix at the right layer and harden (never delete legality constraints; leave a wrapper, a post_randomize check, and coverage behind). Then offer one concrete war story — the session above is a perfect template — because a specific debugged failure with real mechanics is worth more than any amount of enumerated theory.

Interview angle

This checklist is the interview answer, so the angle here is delivery: lead with classification (it proves you know the failure modes are distinct), name the exact tools (constraint_mode(0) binary search, randomize(null), solver-debug dump, %p dump, seed replay, srandom) because vague “I'd debug the constraints” answers fail, and close with hardening — interviewers specifically listen for whether your process ends at “it works now” or at “this failure class can no longer hide.” If pressed for time, the one-sentence version: classify the failure, lock the seed, shrink the constraint set until the contradiction is obvious, fix the right layer, and leave defenses behind.

Key takeaways

  • The ordered checklist: visible -> classify -> lock seed -> dump state -> isolate -> repro -> fix right layer -> harden.

  • After a failed randomize, %p shows previous rand values but live state values — the state is the clue.

  • Compound failures decompose into the primitive modes; run the steps instead of guessing.

  • Fix the actual layer: state bugs in the test, intent conflicts via soft/disable/subclass, never by deleting legality.

  • The exit criterion is hardening (wrapper, post_randomize check, coverage), not just a passing rerun.

Common pitfalls

  • Skipping classification and diving into constraints: wrong-values and UNSAT problems need different isolation tools.

  • Debugging an intermittent failure without locking the seed: the target moves under you.

  • Misreading the %p dump after failure: rand fields show stale values; only state values (including fields pinned with rand_mode(0)) are current.

  • Declaring victory when the rerun passes: without hardening, the same class of failure returns silently.

  • Reciting tools in an interview without the classification framework: it sounds memorized, not practiced.