Part 3 · Constraint Randomization · Intermediate

Solver Performance

What makes constraints expensive — multiplication, division, wide vectors, cross-object coupling — how slow solves present, and restructuring fixes that work.

Why some constraints are expensive

Constraint solving is NP-hard in general; simulators ship clever engines (interval propagation, SAT/BDD hybrids) that handle typical verification constraints in microseconds — until a constraint lands outside the engine's easy cases. The expensive shapes are predictable: nonlinear arithmetic (multiplication or division between two random variables ), modulo by a random variable , very wide vectors in arithmetic relations, large unique sets , and cross-object coupling that fuses hundreds of variables into one giant problem. Multiplication by a constant is cheap; a * b == c with all three random is factoring — genuinely hard.

systemverilog
class costly;
  rand bit [31:0] a, b, c;
  rand bit [63:0] wide;

  constraint c_hard {
    a * b == c;            // EXPENSIVE: rand*rand (factoring)
    wide % a == 3;         // EXPENSIVE: modulo by a rand var
    c / b  inside {[2:9]}; // EXPENSIVE: rand/rand division
  }

  constraint c_cheap {
    a * 4 == c;            // cheap: constant multiply = shift
    wide[1:0] == 2'b11;    // cheap: bit-slice equality
    b inside {[1:100]};    // cheap: interval
  }
endclass
diagram
CONSTRAINT COST LADDER (typical engines)

  cheap   | equality, inside ranges, bit-slices,
          | add/sub, multiply/divide by CONSTANT
  --------+---------------------------------------------
  medium  | implications, foreach over moderate arrays,
          | sum() aggregates, small unique sets
  --------+---------------------------------------------
  costly  | rand*rand multiply, rand modulo/divide,
          | wide-vector (>64b) arithmetic relations,
          | unique over large tightly-ranged sets
  --------+---------------------------------------------
  blowup  | cross-object webs (100s of coupled vars),
          | nested implications forming case explosions,
          | aggregate equations over huge arrays

  Cost is also MULTIPLICATIVE: a medium constraint
  inside a foreach over 1000 elements is 1000 mediums
  fused into one problem.

Recognizing a slow solve

Symptoms: simulation wall-clock time grows while simulated time barely advances; profiles show time inside randomize() rather than evaluation or PLI; a specific test hangs at one sequence's generation loop; or memory climbs during randomization (BDD blowup). Measure before fixing — wrap suspicious calls with a realtime delta and log outliers:

systemverilog
class timed_gen;
  rand my_txn t;

  task gen_one();
    realtime t0, t1;
    t0 = $realtime;
    if (!t.randomize()) $error("randomize failed");
    t1 = $realtime;
    // $realtime measures SIM time; for wall-clock use vendor
    // profiling (e.g. simulator's solver-profile switches) or
    // $system("date +%s%N") bracketing in extreme cases.
    if (t1 - t0 > 0) $display("solver consumed sim-time?!");
  endtask
endclass

// Practical measurement: every major simulator has a solver
// profile mode that reports per-constraint-block solve time and
// failure/retry counts. Run it FIRST — guessing the hot
// constraint by eye fails routinely. Typical switches live under
// names like "solver profile" / "constraint profile" in the
// vendor docs (kept generic here deliberately).

The single most useful vendor-neutral signal is the retry/failure count : a constraint set that frequently fails and re-solves (e.g. randc interactions, soft-constraint backtracking) costs multiples of its nominal solve time.


Restructuring fixes that actually work

1. Precompute in pre_randomize / move work out of the solver

systemverilog
class fix_precompute;
  rand bit [31:0] addr;
  bit [31:0] base;             // state var — frozen input
  bit [31:0] region_tbl[8];    // lookup table, filled procedurally

  function void pre_randomize();
    base = select_base_region();      // arbitrary procedural code
  endfunction

  constraint c {
    // solver sees base as a CONSTANT — cheap interval math
    addr inside {[base : base + 32'h0FFF]};
  }
endclass

Anything computable without the solver — region selection, weight tables, derived parameters — belongs in pre_randomize() or plain test code, entering constraints as frozen state. Lookup tables replace arithmetic: instead of constraining size * stride == span (rand*rand), pick an index and read both legal values from a precomputed table of valid pairs.

2. Narrow fields and replace nonlinear forms

systemverilog
class fix_shapes;
  rand bit [7:0]  pages;       // was bit [31:0] — narrow to need
  rand bit [2:0]  stride_sel;
  bit [15:0] stride_tbl[8] = '{1,2,4,8,16,32,64,128};

  constraint c {
    // was: stride inside {1,2,4,...} && total == pages * stride
    // now: stride is a table lookup; multiply is gone from
    //      the solver entirely
    pages inside {[1:200]};
  }
  function bit [15:0] stride(); return stride_tbl[stride_sel]; endfunction
endclass

3. Split one giant randomize into stages

When a container couples hundreds of variables but the real dependencies are layered, split the solve: randomize configuration first, then randomize payloads with the config frozen as state. You lose only constraints that genuinely span the cut — and those you re-express on the second call via with clauses referencing the now-frozen stage-1 results. Use randomize(subset_var) argument lists to scope a call to specific variables when stages share a class.


Interview angle

What interviewers probe

  • "Which constraints are expensive?" — expected: rand*rand multiply/divide/modulo, wide-vector arithmetic, big tight unique sets, cross-object webs; constant arithmetic is cheap.

  • "A test's generation loop got 100x slower after a constraint edit — first step?" — expected: run the simulator's constraint/solver profile, look at per-block time and retry counts; do not guess.

  • "How do you fix a slow randomize?" — expected: precompute to state vars in pre_randomize, replace nonlinear relations with lookup tables, narrow field widths, split staged randomize calls.

  • "Why is a*b==c hard but a*4==c easy?" — expected: constant multiply is a linear/shift relation propagated by intervals; rand*rand is factoring — search, not propagation.

Mentioning retry counts as a profiling signal — not just solve time — is a senior tell: it shows you know failed-and-retried solves (soft constraints, randc interplay) are a hidden multiplier.

Key takeaways

  • Expensive shapes are predictable: rand*rand nonlinear arithmetic, rand modulo, wide vectors, big tight unique sets, cross-object coupling.

  • Profile before fixing — vendor solver-profile modes report per-block time and retry counts; guessing fails.

  • Move computable work to pre_randomize state vars and lookup tables — the solver should see constants, not derivations.

  • Narrow fields to the width the spec needs; width inflates every arithmetic relation's cost.

  • Split layered problems into staged randomize calls with earlier results frozen as state.

Common pitfalls

  • Constraining products/quotients of two rand variables when a lookup table of legal pairs would do.

  • 32- or 64-bit fields for quantities that fit in 8 — every relation pays the width tax.

  • One mega-randomize over a deep object tree when dependencies are actually layered — fuse only what truly couples.

  • Optimizing by intuition without the solver profile — the hot constraint is rarely the one you suspect.

  • Ignoring retry counts — heavy soft-constraint or randc backtracking multiplies cost invisibly.