Part 3 · Constraint Randomization · Intermediate

Silent Failures & Defensive Idioms

Null nested handles, leftover rand_mode, randc state surprises, the randomize_or_fatal wrapper, and post_randomize validity checks.

Failures that don't fail

The worst randomization bugs return 1. The solver did exactly what the language requires — but what the language requires is not what you assumed. Three rules of the LRM produce most silent failures: a null nested handle is silently skipped (not an error), rand_mode(0) persists on the object until explicitly re-enabled (not per-call), and randc cycles are per-object state that interacts with constraints in ways that surprise after a state change. Each one yields a healthy return value and a quietly wrong transaction.


Silent failure 1: the null nested handle

systemverilog
class payload;
  rand bit [7:0] data[];
  constraint sz_c { data.size() inside {[1:64]}; }
endclass

class frame;
  rand bit [7:0] hdr;
  rand payload   pl;          // rand object handle

  function new();
    // BUG: forgot  pl = new();
  endfunction
endclass

frame f = new();
assert(f.randomize());        // PASSES. Returns 1.
// hdr is randomized; pl is null and the solver SKIPPED it silently.
// The LRM says: randomize() recursively randomizes rand sub-objects
// THAT ARE NON-NULL. Null handles are not an error - they are simply
// not part of the problem.
// Downstream: null deref crash in the driver, 500ns and three
// components away from the actual cause.

The fix is two-layered. Construct nested rand objects in the parent's constructor so the default state is always solvable, and add a validity check (last section) that fails loudly if a required handle is null at randomize time. Note the same skip rule applies to elements of an array of handles: any null element is ignored while non-null elements randomize — a partially-constructed array gives you a partially-randomized scenario with no warning.
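
The array case can be guarded the same way. The following is a minimal sketch, assuming a hypothetical burst class that holds a dynamic array of payload handles; the class name, the array name beats, and the message text are illustrative and not part of the original bench. The payload class is repeated so the block stands alone.

```systemverilog
// Hypothetical example: burst and beats are illustrative names.
class payload;
  rand bit [7:0] data[];
  constraint sz_c { data.size() inside {[1:64]}; }
endclass

class burst;
  rand payload beats[];                      // rand array of object handles

  function new(int unsigned n);
    beats = new[n];
    foreach (beats[i]) beats[i] = new();     // construct every element up front
  endfunction

  // Runs before the solver: turns the silent skip into a loud failure.
  function void pre_randomize();
    foreach (beats[i])
      if (beats[i] == null)
        $fatal(1, "%m: beats[%0d] is null at randomize time", i);
  endfunction
endclass

module tb;
  initial begin
    burst b;
    b = new(4);
    if (!b.randomize()) $fatal(1, "randomize failed");
    foreach (b.beats[i])
      $display("beat %0d: %0d bytes", i, b.beats[i].data.size());
  end
endmodule
```

Because the array size is not constrained here, the solver leaves it at the constructed size and randomizes each element, so the guard only has to check for nulls.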


Silent failure 2: rand_mode left off

systemverilog
class txn;
  rand bit [31:0] addr;
  rand bit [7:0]  len;
endclass

txn t = new();

// Phase 1 of the test: lock addr while tuning len behavior
t.addr = 32'h4000_0000;
t.addr.rand_mode(0);              // addr is now a STATE variable
repeat (100) assert(t.randomize());

// ... 800 lines later, phase 2 begins ...
// Author believes addr is random again. It is not: rand_mode is
// sticky per-object until re-enabled.
repeat (1000) assert(t.randomize());   // returns 1 every time
// addr == 32'h4000_0000 for all 1000 transactions.
// Coverage on addr quietly flatlines; no error anywhere.

// FIX at phase boundary:
t.addr.rand_mode(1);
// DEFENSIVE: audit before a critical phase
if (t.addr.rand_mode() == 0)
  $warning("addr rand_mode is OFF entering phase 2");

Two properties make this lethal: rand_mode is per-object (a fresh object is unaffected, so the bug hides in reused objects), and a rand_mode(0) variable still participates in constraints as a state variable — so constraints referencing addr still hold, masking the symptom further. The audit idiom — calling rand_mode() with no argument as a getter at phase boundaries — costs one line and catches the whole class of leftovers.
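
A complementary way to avoid leftovers is to make the lock temporary by construction. The sketch below shows two options, under the assumption that the lock is only needed for a bounded set of calls: an inline with constraint, which applies to a single randomize() call and leaves no state behind, and a helper that saves and restores rand_mode around the call. The helper name randomize_locked_addr is hypothetical; the txn class is repeated so the block stands alone.

```systemverilog
class txn;
  rand bit [31:0] addr;
  rand bit [7:0]  len;

  // Hypothetical helper: pin addr for ONE call, then restore the previous
  // rand_mode so the caller's setting is never silently changed.
  function bit randomize_locked_addr(bit [31:0] fixed_addr);
    int saved_mode = addr.rand_mode();   // getter form: 1 = active, 0 = off
    bit ok;
    addr = fixed_addr;
    addr.rand_mode(0);
    ok = this.randomize();
    addr.rand_mode(saved_mode);          // restore on success or failure
    return ok;
  endfunction
endclass

module tb;
  initial begin
    txn t;
    t = new();

    // Option A: inline constraint, scoped to this call only.
    repeat (3)
      if (!t.randomize() with { addr == 32'h4000_0000; })
        $fatal(1, "randomize failed");

    // Option B: rand_mode with automatic restore.
    if (!t.randomize_locked_addr(32'h4000_0000))
      $fatal(1, "randomize failed");

    // Either way, addr is still an active random variable afterwards.
    if (t.addr.rand_mode() != 1)
      $fatal(1, "addr rand_mode leaked");
  end
endmodule
```

With option A, addr stays a random variable that happens to be constrained to one value; with option B it is briefly a state variable. The result is the same value in both cases, but only option B touches persistent object state, which is why it restores it.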


Silent failure 3: randc surprises after state change

systemverilog
class arb_req;
  randc bit [2:0] grant;          // cycle through 0..7 before repeating
  bit [7:0]       active_mask;    // state: which requesters exist
  constraint legal_c { active_mask[grant] == 1'b1; }
endclass

arb_req r = new();
r.active_mask = 8'b0000_1111;     // requesters 0..3
repeat (4) assert(r.randomize()); // grants 0..3 in some permutation - OK

r.active_mask = 8'b1111_0000;     // reconfigure: requesters 4..7
assert(r.randomize());
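// SURPRISE ZONE: the randc cycle was mid-permutation over the OLD
// legal set. Per the LRM, the permutation is recomputed when the
// constraints on the variable change or when none of the remaining
// values can satisfy them, so this call succeeds with a draw from
// the NEW legal set. What is lost is the "cycle through everything
// before repeating" guarantee: where the new cycle starts is not
// something users predict well.
// randc can still make randomize() FAIL where rand would not, for
// a different reason: randc variables are solved before rand
// variables, so a constraint tying grant to a rand field can turn
// unsatisfiable once grant has already been picked.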
// RULES OF THUMB:
//  - randc cycle state is per-object; new() gives a fresh cycle.
//  - constraints filter the cycle; changing state mid-cycle
//    invalidates your mental model of "what's left".
//  - never rely on randc ordering across a constraint/state change;
//    if exact permutation semantics matter, build an explicit
//    shuffled queue instead.

randc is a convenience for cycling coverage, not a guaranteed scheduling primitive. The defensive position: treat any change to state variables referenced by constraints on a randc field as invalidating the cycle, and if the test logic depends on exact exhaustive-before-repeat behavior, implement it explicitly with a queue and shuffle() where the semantics are in your hands and visible in the code.
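
The explicit alternative can be sketched as follows. This is one possible shape, not a canonical one: the class name grant_cycler and its methods set_mask and next_grant are hypothetical. It keeps the not-yet-issued values in a queue, refills and shuffles the queue when it runs empty, and restarts the cycle whenever the mask changes, so the restart point is visible in the code.

```systemverilog
// Hypothetical helper class: names and API are illustrative.
class grant_cycler;
  local bit [2:0] pending[$];     // values not yet issued in this cycle
  local bit [7:0] active_mask;    // which requesters exist

  function new(bit [7:0] mask);
    set_mask(mask);
  endfunction

  // Changing the legal set explicitly restarts the cycle.
  function void set_mask(bit [7:0] mask);
    active_mask = mask;
    pending.delete();
  endfunction

  // Returns 0 when no requester is active; otherwise writes the next grant.
  function bit next_grant(output bit [2:0] grant);
    if (pending.size() == 0) begin
      for (int i = 0; i < 8; i++)
        if (active_mask[i]) pending.push_back(3'(i));
      if (pending.size() == 0) return 0;
      pending.shuffle();          // built-in array ordering method
    end
    grant = pending.pop_front();
    return 1;
  endfunction
endclass

module tb;
  initial begin
    grant_cycler c;
    bit [2:0] g;
    c = new(8'b0000_1111);
    repeat (4) if (c.next_grant(g)) $display("grant %0d", g);  // 0..3, each once
    c.set_mask(8'b1111_0000);                                   // explicit restart
    repeat (4) if (c.next_grant(g)) $display("grant %0d", g);  // 4..7, each once
  end
endmodule
```

The trade-off is that the grant is no longer a solver variable, so other constraints cannot refer to it; if they need to, assign the returned value to a state variable before calling randomize().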


Defensive idioms: the wrapper and the validity check

systemverilog
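// Guards live in the base class so every call site inherits them.
// payload is the class from the null-handle example above.
class base_txn;
  rand bit [31:0] addr;
  rand payload    pl;

  // The rule that post_randomize re-checks below.
  constraint align_c { addr[1:0] == 2'b00; }

  function new();
    pl = new();               // never leave a rand handle null
  endfunction

  function void pre_randomize();
    // Guard the null-handle skip rule
    if (pl == null)
      $fatal(1, "%m: pl handle is null at randomize time");
  endfunction

  // IDIOM 2: post_randomize validity check - catch impossible
  // output even when the solver said yes (wrong constraints,
  // disabled blocks, rand_mode leftovers all surface here).
  function void post_randomize();
    if (addr[1:0] != 2'b00)
      $fatal(1, "%m: unaligned addr %h escaped constraints", addr);
  endfunction
endclass

// IDIOM 1: randomize_or_fatal - never call randomize bare again.
// Wrapper lives in the bench utility package. It is a function rather
// than a task so that it can also be called from other functions.
function automatic void randomize_or_fatal(base_txn t, string who = "?");
  if (t == null)
    $fatal(1, "randomize_or_fatal called with null handle (%s)", who);
  if (!t.randomize()) begin
    $display("---- RANDOMIZE FAILURE (%s) ----", who);
    $display("object state dump: %p", t);          // field dump (format is tool-dependent)
    $display("addr rand_mode=%0d", t.addr.rand_mode());
    $fatal(1, "randomize failed for %s", who);
  end
endfunction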

// Usage everywhere:
//   randomize_or_fatal(txn, "seq body item 12");
// On failure you get: who called, the full object state (the inputs
// to the solver!), and rand_mode status - the first three things
// you would have asked for anyway.

The %p object dump is the underrated piece: whether randomize succeeds depends on the object's current state-variable values, so dumping the object at failure time captures the solver's inputs. The LRM leaves the %p rendering of a class handle to the implementation, so confirm that your simulator prints the fields, and check how it shows nested handles. Combined with the seed, that is a complete repro kit in the log. The post_randomize validity check is the safety net for the entire silent-failure family: it asserts properties of the output, so it catches disabled constraints, rand_mode leftovers, and constraint bugs alike, regardless of which mechanism caused them.
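
For the seed half of the repro kit, one option is to capture the object's RNG state right before the call. The sketch below uses the built-in get_randstate() and set_randstate() object methods to rewind and replay a single randomize() call. The class name item is hypothetical, and the state string has an implementation-defined format, so it may not be human-readable in a log and is only meaningful to the simulator that produced it.

```systemverilog
// Hypothetical example class; only the randstate calls are the point.
class item;
  rand bit [31:0] addr;
  constraint align_c { addr[1:0] == 2'b00; }
endclass

module tb;
  initial begin
    item       it;
    string     rs;
    bit [31:0] first;

    it = new();
    rs = it.get_randstate();          // RNG state BEFORE the call
    if (!it.randomize())
      $fatal(1, "randomize failed, randstate=%s", rs);
    first = it.addr;

    it.set_randstate(rs);             // rewind this object's RNG
    if (!it.randomize())
      $fatal(1, "replay failed");

    // Same RNG state, same constraints, same state variables:
    // random stability says the replay should match.
    if (it.addr !== first)
      $fatal(1, "replay mismatch: %h vs %h", it.addr, first);
    $display("replayed addr %h", it.addr);
  end
endmodule
```

In a wrapper such as randomize_or_fatal, the same capture could be taken before the randomize() call and printed only on failure; whether that is worth the extra string per call is a bench-level decision.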

Interview angle

Silent-failure questions probe LRM literacy: “randomize returns 1 but the field didn't change — name three causes.” Answer: a null nested handle (skipped per the LRM, not an error), rand_mode(0) left set from earlier (sticky per object, so the variable is now a state variable), and a field that was never declared rand. If the symptom is a surprising value rather than an unchanged one, add a randc cycle interacting with changed state. Follow with the defenses unprompted: construct nested objects in new(), audit rand_mode at phase boundaries, wrap randomize in a fatal-with-dump helper, and assert output validity in post_randomize. Interviewers read the defensive idioms as evidence you have been burned in production, which is exactly the experience they are hiring for.

Key takeaways

  • Null rand handles are silently skipped by randomize() — construct sub-objects in new() and guard in pre_randomize.

  • rand_mode(0) is sticky per object; the variable becomes a state variable until rand_mode(1).

  • randc cycle state is per-object, and the permutation is recomputed when constraints change, so exhaustive-before-repeat does not hold across a mid-cycle constraint/state change.

  • A randomize_or_fatal wrapper with a %p object dump turns every failure into a self-contained repro report.

  • post_randomize validity checks catch wrong output regardless of which silent mechanism produced it.

Common pitfalls

  • Forgetting pl = new() in the parent constructor — solver skips the null, driver crashes much later.

  • Treating rand_mode(0) as per-call — it persists across every later randomize on that object.

  • Relying on randc exhaustive-before-repeat semantics across a state change — use an explicit shuffled queue.

  • Wrapper that prints only 'randomize failed' — without the %p state dump the report is useless.

  • Putting validity checks in the test instead of post_randomize — only one call site is protected.