Part 3 · Constraint Randomization · Intermediate

Silent Failures & Defensive Idioms

Null nested handles, leftover rand_mode, randc state surprises, the randomize_or_fatal wrapper, and post_randomize validity checks.

Failures that don't fail

The worst randomization bugs return 1. The solver did exactly what the language requires — but what the language requires is not what you assumed. Three rules of the LRM produce most silent failures: a null nested handle is silently skipped (not an error), rand_mode(0) persists on the object until explicitly re-enabled (not per-call), and randc cycles are per-object state that interacts with constraints in ways that surprise after a state change. Each one yields a healthy return value and a quietly wrong transaction.

Silent failure 1: the null nested handle

systemverilog

class payload;
  rand bit [7:0] data[];
  constraint sz_c { data.size() inside {[1:64]}; }
endclass

class frame;
  rand bit [7:0] hdr;
  rand payload   pl;          // rand object handle

  function new();
    // BUG: forgot  pl = new();
  endfunction
endclass

frame f = new();
assert(f.randomize());        // PASSES. Returns 1.
// hdr is randomized; pl is null and the solver SKIPPED it silently.
// The LRM says: randomize() recursively randomizes rand sub-objects
// THAT ARE NON-NULL. Null handles are not an error - they are simply
// not part of the problem.
// Downstream: null deref crash in the driver, 500ns and three
// components away from the actual cause.

The fix is two-layered. Construct nested rand objects in the parent's constructor so the default state is always solvable, and add a guard (the pre_randomize check in the defensive idioms section) that fails loudly if a required handle is null at randomize time. The same skip rule applies to elements of an array of handles: any null element is ignored while non-null elements randomize, so a partially constructed array gives you a partially randomized scenario with no warning.
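As a minimal sketch of both layers, the class below constructs every nested handle in new() and exposes a check that reports any null handle before the solver has a chance to skip it. The names frame_safe, burst, and all_handles_live() are hypothetical and exist only for this illustration; payload is repeated so the block compiles on its own.

```systemverilog
// Sketch only: frame_safe, burst, and all_handles_live() are
// hypothetical names, not part of the lesson's code.
class payload;
  rand bit [7:0] data[];
  constraint sz_c { data.size() inside {[1:64]}; }
endclass

class frame_safe;
  rand bit [7:0] hdr;
  rand payload   pl;
  rand payload   burst[4];        // array of rand handles

  function new();
    pl = new();                   // layer 1: default state is solvable
    foreach (burst[i]) burst[i] = new();
  endfunction

  // Layer 2: returns 0 if any required handle is null.
  function bit all_handles_live();
    if (pl == null) return 0;
    foreach (burst[i])
      if (burst[i] == null) return 0;
    return 1;
  endfunction
endclass

module tb;
  initial begin
    frame_safe f;
    f = new();
    f.burst[2] = null;            // simulate a partially built array
    if (!f.all_handles_live())
      $display("null rand handle found: randomize() would skip it");

    f.burst[2] = new();           // repair, then randomize for real
    if (f.all_handles_live() && f.randomize())
      $display("all sub-objects randomized, pl.data.size=%0d",
               f.pl.data.size());
  end
endmodule
```

Keeping the check as a separate function makes it callable from pre_randomize as well as from test code, so the same rule is enforced at every call site.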

Silent failure 2: rand_mode left off

systemverilog

class txn;
  rand bit [31:0] addr;
  rand bit [7:0]  len;
endclass

txn t = new();

// Phase 1 of the test: lock addr while tuning len behavior
t.addr = 32'h4000_0000;
t.addr.rand_mode(0);              // addr is now a STATE variable
repeat (100) assert(t.randomize());

// ... 800 lines later, phase 2 begins ...
// Author believes addr is random again. It is not: rand_mode is
// sticky per-object until re-enabled.
repeat (1000) assert(t.randomize());   // returns 1 every time
// addr == 32'h4000_0000 for all 1000 transactions.
// Coverage on addr quietly flatlines; no error anywhere.

// FIX at phase boundary:
t.addr.rand_mode(1);
// DEFENSIVE: audit before a critical phase
if (t.addr.rand_mode() == 0)
  $warning("addr rand_mode is OFF entering phase 2");

Two properties make this lethal: rand_mode is per-object (a fresh object is unaffected, so the bug hides in reused objects), and a rand_mode(0) variable still participates in constraints as a state variable — so constraints referencing addr still hold, masking the symptom further. The audit idiom — calling rand_mode() with no argument as a getter at phase boundaries — costs one line and catches the whole class of leftovers.
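The audit and reset can be sketched as a phase-boundary routine. The version below assumes a txn with a hypothetical align_c constraint (added only to show that a locked variable is still checked against constraints) and uses the object-level task form rand_mode(1), which re-enables every rand variable in the object in one call.

```systemverilog
// Sketch only: align_c and the phase structure are assumptions
// made for illustration.
class txn;
  rand bit [31:0] addr;
  rand bit [7:0]  len;
  constraint align_c { addr[1:0] == 2'b00; }
endclass

module tb;
  initial begin
    txn t;
    t = new();

    // Phase 1: lock addr. It is now a state variable, but align_c
    // is still checked against its fixed value.
    t.addr = 32'h4000_0000;
    t.addr.rand_mode(0);
    if (!t.randomize()) $fatal(1, "phase 1 randomize failed");
    $display("phase 1: addr=%h rand_mode=%0d", t.addr, t.addr.rand_mode());

    // Phase boundary: audit each field with the getter form...
    if (t.addr.rand_mode() == 0) $warning("addr rand_mode is OFF");
    if (t.len.rand_mode()  == 0) $warning("len rand_mode is OFF");
    // ...then re-enable every rand variable in the object at once.
    t.rand_mode(1);

    // Phase 2: addr is random again.
    if (!t.randomize()) $fatal(1, "phase 2 randomize failed");
    $display("phase 2: addr=%h rand_mode=%0d", t.addr, t.addr.rand_mode());
  end
endmodule
```

Note that the object-level call changes only the variables declared in that object; rand variables inside nested objects keep whatever mode they had, so nested handles need their own reset.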

Silent failure 3: randc surprises after state change

systemverilog

class arb_req;
  randc bit [2:0] grant;          // cycle through 0..7 before repeating
  bit [7:0]       active_mask;    // state: which requesters exist
  constraint legal_c { active_mask[grant] == 1'b1; }
endclass

arb_req r = new();
r.active_mask = 8'b0000_1111;     // requesters 0..3
repeat (4) assert(r.randomize()); // grants 0..3 in some permutation - OK

r.active_mask = 8'b1111_0000;     // reconfigure: requesters 4..7
assert(r.randomize());
// SURPRISE ZONE: the randc cycle was built over the OLD legal set.
// Per the LRM, the permutation is recomputed when the constraints
// on the variable change or when no remaining value satisfies them,
// so this call succeeds with a draw from the NEW legal set, but the
// "cycle through everything before repeating" guarantee restarts
// here, not where you expected. Separately, randc variables are
// solved before rand variables, so with tighter constraints randc
// can make randomize FAIL where rand would not.
// RULES OF THUMB:
//  - randc cycle state is per-object; new() gives a fresh cycle.
//  - constraints filter the cycle; changing state mid-cycle
//    invalidates your mental model of "what's left".
//  - never rely on randc ordering across a constraint/state change;
//    if exact permutation semantics matter, build an explicit
//    shuffled queue instead.

randc is a convenience for cycling coverage, not a guaranteed scheduling primitive. The defensive position: treat any change to state variables referenced by constraints on a randc field as invalidating the cycle, and if the test logic depends on exact exhaustive-before-repeat behavior, implement it explicitly with a queue and shuffle() where the semantics are in your hands and visible in the code.
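One way to make the exhaustive-before-repeat behavior explicit is a small helper that owns the pending values. This is a sketch, not the lesson's code: grant_cycler, set_mask(), refill(), and next_grant() are hypothetical names, and the only built-ins used are the queue methods and shuffle().

```systemverilog
// Sketch only: grant_cycler and its methods are hypothetical.
class grant_cycler;
  bit [7:0]    active_mask;
  int unsigned pending[$];     // values still to be issued this cycle

  // Rebuild the cycle from the current mask in a shuffled order.
  function void refill();
    pending.delete();
    for (int i = 0; i < 8; i++)
      if (active_mask[i]) pending.push_back(i);
    pending.shuffle();
  endfunction

  // A state change restarts the cycle explicitly and visibly.
  function void set_mask(bit [7:0] m);
    active_mask = m;
    refill();
  endfunction

  // Returns the next grant; refills when the cycle is exhausted.
  function int unsigned next_grant();
    if (pending.size() == 0) refill();
    if (pending.size() == 0) $fatal(1, "no active requesters");
    return pending.pop_front();
  endfunction
endclass

module tb;
  initial begin
    grant_cycler c;
    c = new();
    c.set_mask(8'b0000_1111);    // requesters 0..3
    repeat (4) $display("grant=%0d", c.next_grant());
    c.set_mask(8'b1111_0000);    // reconfigure: requesters 4..7
    repeat (4) $display("grant=%0d", c.next_grant());
  end
endmodule
```

Each group of four calls issues every active requester exactly once, in an order that depends on the seed. The restart on reconfiguration is a line of code you can read and change, instead of solver behavior you have to infer.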

Defensive idioms: the wrapper and the validity check

systemverilog

// Base class: null-handle guard in pre_randomize, IDIOM 2 in
// post_randomize. IDIOM 1 (the wrapper) follows the class.
class base_txn;
  rand bit [31:0] addr;
  rand payload    pl;

  function void pre_randomize();
    // Guard the null-handle skip rule
    if (pl == null)
      $fatal(1, "%m: pl handle is null at randomize time");
  endfunction

  // IDIOM 2: post_randomize validity check - catch impossible
  // output even when the solver said yes (wrong constraints,
  // disabled blocks, rand_mode leftovers surface here whenever
  // they produce an illegal value).
  function void post_randomize();
    if (addr[1:0] != 2'b00)
      $fatal(1, "%m: unaligned addr %h escaped constraints", addr);
  endfunction
endclass

// IDIOM 1: randomize_or_fatal - never call randomize bare again.
// Wrapper lives in the bench utility package. It is a function
// (not a task) because it consumes no time, so it can also be
// called from other functions.
function automatic void randomize_or_fatal(base_txn t, string who = "?");
  if (!t.randomize()) begin
    $display("---- RANDOMIZE FAILURE (%s) ----", who);
    // Field dump; the %p format for class handles is tool-dependent
    $display("object state dump: %p", t);
    $display("addr rand_mode=%0d", t.addr.rand_mode());
    $fatal(1, "randomize failed for %s", who);
  end
endfunction

// Usage everywhere:
//   randomize_or_fatal(txn, "seq body item 12");
// On failure you get: who called, the full object state (the inputs
// to the solver!), and rand_mode status - the first three things
// you would have asked for anyway.

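The %p object dump is the underrated piece: a randomize failure is a function of the object's current state-variable values, so dumping the object at failure time captures the solver's inputs exactly. The %p format for a class handle is tool-dependent, so confirm that your simulator prints the fields rather than only a handle identifier. Combined with the seed, that is a complete repro kit in the log. The post_randomize validity check is the safety net for the entire silent-failure family: it asserts properties of the output, so it catches disabled constraints, rand_mode leftovers, and constraint bugs alike whenever they produce an illegal value, regardless of which mechanism caused them. It cannot flag a value that is legal but stuck, which is why the rand_mode audit at phase boundaries is still needed.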
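For the seed half of the repro kit, one option is to capture the object's RNG state just before the call. The sketch below assumes a hypothetical package repro_pkg with a simple_txn class and a randomize_with_repro() helper; get_randstate() and set_randstate() are the built-in object methods, and the string they exchange has a tool-defined format that may not be human-readable.

```systemverilog
// Sketch only: repro_pkg, simple_txn, and randomize_with_repro()
// are hypothetical names for this illustration.
package repro_pkg;
  class simple_txn;
    rand bit [31:0] addr;
    constraint align_c { addr[1:0] == 2'b00; }
  endclass

  function automatic void randomize_with_repro(simple_txn t,
                                               string who = "?");
    string rs;
    rs = t.get_randstate();      // RNG state BEFORE the solver runs
    if (!t.randomize()) begin
      $display("randstate before failed call (%s): %s", who, rs);
      $fatal(1, "randomize failed for %s", who);
    end
  endfunction
endpackage

module tb;
  import repro_pkg::*;
  initial begin
    simple_txn t;
    string     rs;
    bit [31:0] first;

    t  = new();
    rs = t.get_randstate();
    randomize_with_repro(t, "tb initial");
    first = t.addr;

    // Restore the saved state and solve again with the same inputs.
    t.set_randstate(rs);
    randomize_with_repro(t, "tb replay");
    if (t.addr == first)
      $display("replay reproduced addr=%h", t.addr);
  end
endmodule
```

The replay is expected to match only on the same simulator and version with the same state-variable values, which is why the sketch reports a match instead of treating a mismatch as an error.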

Interview angle

Silent-failure questions probe LRM literacy: “randomize returns 1 but the field didn't change — name three causes.” Answer: null nested handle (skipped per LRM, not an error), rand_mode(0) left set from earlier (sticky per object, variable becomes state), and the field was never rand (or a randc cycle/state interaction). Follow with the defenses unprompted — construct nested objects in new(), audit rand_mode at phase boundaries, wrap randomize in a fatal-with-dump helper, and assert output validity in post_randomize. Interviewers read the defensive idioms as evidence you have been burned in production, which is exactly the experience they are hiring for.

Key takeaways

Null rand handles are silently skipped by randomize() — construct sub-objects in new() and guard in pre_randomize.
rand_mode(0) is sticky per object; the variable becomes a state variable until rand_mode(1).
randc cycle state is per-object and interacts unpredictably with mid-cycle constraint/state changes.
A randomize_or_fatal wrapper with a %p object dump turns every failure into a self-contained repro report.
post_randomize validity checks catch wrong output regardless of which silent mechanism produced it.

Common pitfalls

Forgetting pl = new() in the parent constructor — solver skips the null, driver crashes much later.
Treating rand_mode(0) as per-call — it persists across every later randomize on that object.
Relying on randc exhaustive-before-repeat semantics across a state change — use an explicit shuffled queue.
Wrapper that prints only 'randomize failed' — without the %p state dump the report is useless.
Putting validity checks in the test instead of post_randomize — only one call site is protected.
