Part 7 · Advanced & Integration · Intermediate
Testbench Hotspots
Polling loops, hot-path string building, transaction copy storms, covergroup sample storms, and solver-heavy randomize — with before/after fixes.
Hotspot #1: polling loops instead of event waits
The single most expensive testbench sin is a loop that wakes up every cycle (or worse, every time step) to check a condition. Each wake-up forces the scheduler to resume your process, evaluate the condition, and suspend it again, millions of times per test. The cure is to wait on the event itself, so the scheduler wakes you only when the condition can actually have become true.
// BEFORE — polling: wakes every cycle for the whole sim
task wait_for_done_polling();
  while (vif.done !== 1'b1) begin
    @(posedge vif.clk); // scheduler wake-up EVERY cycle
  end
endtask
// AFTER — event wait: scheduler wakes this process ONCE
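task wait_for_done_event();
  // wait() is level-sensitive: it returns at once if done is already high,
  // where a bare @(posedge vif.done) would block until the NEXT rising edge.
  wait (vif.done === 1'b1); // value-change sensitivity, zero polling
endtask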
// AFTER (level condition) — wait() blocks until expression is true
task wait_for_fifo_space();
  wait (vif.fifo_count < FIFO_DEPTH); // no loop, no per-cycle wake-up
endtask
Why it is so expensive
A 10M-cycle sim with one polling loop means 10M scheduler wake-ups, almost all of which learn nothing new.
Five polling monitors = five processes thrashing the scheduler every cycle.
@(posedge sig) and wait(expr) cost almost nothing while the condition is false: the kernel owns the sensitivity and resumes the process only when a signal in the expression changes.
Symptom in a profile: a monitor or scoreboard run task near the top with huge call counts.
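To make the cost concrete, here is a minimal self-checking sketch that counts how often each style of waiter is resumed. The module name tb_wakeups, the 1000-cycle length, and the two counters are illustrative assumptions, not taken from any real environment.

```systemverilog
// Hypothetical demo: tb_wakeups, N_CYCLES and the counters are illustrative.
module tb_wakeups;
  localparam int N_CYCLES = 1000;   // assumed test length in clock cycles

  logic clk  = 1'b0;
  logic done = 1'b0;
  int   poll_wakeups  = 0;          // times the polling process resumed
  int   event_wakeups = 0;          // times the event-driven process resumed

  always #5 clk = ~clk;             // free-running clock

  // Polling waiter: resumes on every rising clock edge until done is seen high.
  initial begin
    while (done !== 1'b1) begin
      @(posedge clk);
      poll_wakeups++;
    end
  end

  // Event waiter: stays suspended until done actually becomes 1.
  initial begin
    wait (done === 1'b1);
    event_wakeups++;
  end

  // Stimulus: raise done after N_CYCLES rising edges, then check the counters.
  initial begin
    repeat (N_CYCLES) @(posedge clk);
    done <= 1'b1;
    @(posedge clk);
    #1;                             // let both waiters finish this time step
    $display("polling wake-ups = %0d, event wake-ups = %0d",
             poll_wakeups, event_wakeups);
    if (event_wakeups != 1)
      $fatal(1, "event waiter should resume exactly once");
    if (poll_wakeups < N_CYCLES)
      $fatal(1, "polling waiter should resume at least once per cycle");
    $finish;
  end
endmodule
```

The polling counter grows with the length of the simulation, while the event counter stays at one regardless of how long the wait lasts.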
Hotspot #2: string building and copying in hot paths
String formatting is invisible until it is in a per-transaction or per-cycle path. $sformatf allocates and formats on every call — even when the resulting message is filtered out by verbosity and never printed. Similarly, deep-copying a large transaction (payload arrays, nested objects) on every hop through the environment multiplies allocation cost.
// BEFORE — formats the string even when nothing is printed
task monitor_loop();
  string msg;
  forever begin
    collect_txn(tr);
    msg = $sformatf("Saw txn %s at %0t", tr.convert2string(), $time);
    if (verbosity >= V_HIGH) $display("%s", msg); // usually false!
  end
endtask
// AFTER — guard BEFORE formatting; format only when it will print
task monitor_loop_fixed();
  forever begin
    collect_txn(tr);
    if (verbosity >= V_HIGH)
      $display("Saw txn %s at %0t", tr.convert2string(), $time);
  end
endtask
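The same guard can live inside a logging macro, so call sites cannot forget it. The following is a minimal sketch under stated assumptions: LOG_HIGH, verbosity_e, and expensive_to_string are hypothetical names, and the call counter exists only to make the effect of the guard observable.

```systemverilog
// Hypothetical sketch: LOG_HIGH, verbosity_e and expensive_to_string are
// illustrative names, not part of any library.

// The verbosity test runs first, so MSG is evaluated only when it will print.
`define LOG_HIGH(MSG) \
  begin if (verbosity >= V_HIGH) $display("%0t: %s", $time, MSG); end

module tb_log_guard;
  typedef enum int { V_LOW, V_MEDIUM, V_HIGH } verbosity_e;
  verbosity_e verbosity = V_LOW;
  int format_calls = 0;  // how many times formatting actually ran

  // Stand-in for tr.convert2string(); the counter makes the guard observable.
  function automatic string expensive_to_string(int id);
    format_calls++;
    return $sformatf("txn #%0d", id);
  endfunction

  initial begin
    // Filtered out: the guard fails, so no string is ever built.
    for (int i = 0; i < 1000; i++)
      `LOG_HIGH($sformatf("Saw %s", expensive_to_string(i)))
    if (format_calls != 0)
      $fatal(1, "formatted despite filtered verbosity");

    // Enabled: the message is formatted exactly once and printed.
    verbosity = V_HIGH;
    `LOG_HIGH($sformatf("Saw %s", expensive_to_string(1000)))
    if (format_calls != 1)
      $fatal(1, "expected exactly one format call");
    $finish;
  end
endmodule
```

Because the macro argument is substituted textually inside the if statement, the $sformatf call and everything it evaluates are skipped whenever the verbosity check fails.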
// BEFORE — full deep copy of a 4KB-payload txn on every hop
mon_ap_txn = new tr; // shallow copy: nested payload objects stay shared, so teams switch to copy() ...
scb.push(tr.copy()); // ... but per-hop deep copies add up
// AFTER — copy ONCE at the observation point; pass the handle after
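collect_txn(tr);
tr_frozen = tr.copy(); // single defensive copy; assumes a user-defined copy() that returns a new handle
                       // (with UVM objects, use $cast(tr_frozen, tr.clone()) instead)
ap.write(tr_frozen); // everyone downstream shares the handle
// (downstream must treat it as read-only)
Hotspot #3: sample storms and solver abuse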
Covergroup sample storms
A covergroup sampled @(posedge clk) fires every cycle whether or not anything interesting happened. Sample per transaction from the monitor instead — same coverage information, a tiny fraction of the calls.
// BEFORE — fires every clock, ~10M samples per test
covergroup cg @(posedge vif.clk);
  cp_addr : coverpoint vif.addr;
endgroup
// AFTER — fires once per completed transaction, ~50K samples
covergroup cg with function sample(bus_txn t);
  cp_addr : coverpoint t.addr { bins lo = {[0:255]}; bins hi = {[256:$]}; }
endgroup
// monitor calls cg.sample(tr) when a txn completes
Solver-heavy randomize in tight loops
// BEFORE — full constraint solve per loop iteration, heavy class
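forever begin
  big_txn = new();
  // if/$fatal rather than assert(): a randomize() call wrapped in assert
  // can be skipped entirely when assertions are disabled
  if (!big_txn.randomize())        // solver visits ALL constraints,
    $fatal(1, "randomize failed"); // including complex cross-field ones
  drive(big_txn);
end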
// AFTER — randomize only what varies; reuse the object
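big_txn = new();
if (!big_txn.randomize()) $fatal(1, "randomize failed"); // full solve once
forever begin
  // Only addr and len are re-randomized; the other fields keep their
  // values, but every active constraint must still hold.
  if (!big_txn.randomize(addr, len)) $fatal(1, "randomize failed");
  drive(big_txn); // drive() must be done with big_txn before the next iteration
end
// Also consider: std::randomize(addr, len) for trivial fields,
// or pre-generating a pool of solved txns outside the hot loop.
HOTSPOT CHECKLIST: symptoms in a profile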
Symptom                             Likely sin                Fix
──────────────────────────────────  ────────────────────────  ───────────────────────
monitor/scb task, huge call count   polling loop              @(edge) / wait(expr)
$sformatf high in profile           hot-path formatting       guard before format
memory mgr / new() high             per-hop deep copies       copy once, share handle
covergroup sample high              per-clock sampling        per-txn sample(t)
solver time high                    randomize in tight loop   narrow randomize / pool
Key takeaways
Replace every polling loop with @(edge) or wait(expr) — the #1 fix by impact.
Guard verbosity before $sformatf; never format strings that will not print.
Copy transactions once at the observation point; pass handles downstream as read-only.
Sample covergroups per transaction, and keep full randomize() out of tight loops.
Common pitfalls
while (!done) @(posedge clk) in five monitors — the scheduler spends the sim babysitting your TB.
$sformatf inside a logging macro that is verbosity-filtered after formatting.
Deep-copying a 4KB payload at monitor, scoreboard, coverage, and logger — four copies per txn.
randomize() on a 40-constraint class per loop iteration when only two fields vary.
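The pre-generated pool mentioned under solver-heavy randomize can be sketched as follows. This is one possible shape, not a definitive implementation: bus_item, its constraints, POOL_SIZE, and N_DRIVES are hypothetical, and the commented-out drive(t) stands in for whatever the real driver does.

```systemverilog
// Hypothetical sketch: bus_item, POOL_SIZE and N_DRIVES are illustrative names.
module tb_txn_pool;
  class bus_item;
    rand bit [31:0] addr;
    rand bit [7:0]  len;
    rand bit [3:0]  burst;
    // Stand-ins for the "complex cross-field" constraints of a real class.
    constraint c_align { addr[1:0] == 2'b00; }
    constraint c_len   { len inside {[1:64]}; (burst == 0) -> (len <= 16); }
  endclass

  localparam int POOL_SIZE = 64;      // assumed pool depth
  localparam int N_DRIVES  = 10_000;  // assumed hot-loop iteration count

  bus_item pool[$];

  initial begin
    bus_item t;

    // Pay the solver cost up front, outside the hot loop.
    repeat (POOL_SIZE) begin
      t = new();
      if (!t.randomize()) $fatal(1, "randomize failed");
      pool.push_back(t);
    end

    // Hot loop: pick a pre-solved item instead of calling the solver.
    for (int i = 0; i < N_DRIVES; i++) begin
      t = pool[$urandom_range(POOL_SIZE - 1)];
      // Every pooled item already satisfies the constraints.
      if (t.addr[1:0] != 2'b00 || !(t.len inside {[1:64]}))
        $fatal(1, "pooled item violates constraints");
      // drive(t) would go here; the driver must treat t as read-only.
    end

    $display("drove %0d items from a pool of %0d", N_DRIVES, POOL_SIZE);
    $finish;
  end
endmodule
```

The trade-off is stimulus diversity: the hot loop only ever sees POOL_SIZE distinct items, so a real environment would likely refill or reshuffle the pool periodically rather than solve once for the whole test.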