Where Simulation Time Goes
Cost breakdown of RTL activity, TB overhead, assertions, coverage, and waves — and how to profile before optimizing.
The cost breakdown
Simulation wall-clock time splits across several consumers, and the split shifts dramatically with design size and TB style. In a class-based environment on a mid-size block, it is common for the testbench plus instrumentation to cost more than the RTL itself. You cannot fix what you have not measured, so the first skill is knowing the categories and their typical ranking.
WHERE WALL-CLOCK TIME GOES (typical class-based TB, mid-size block)
┌────────────────────────────────────────────────────────────┐
│ RTL event evaluation       ████████████    ~30-40%         │
│ TB procedural code         ████████        ~20-30%         │
│   (drivers, monitors,                                      │
│   scoreboard, randomize)                                   │
│ Wave dumping (full hier)   ██████          ~15-25%         │
│ Coverage sampling          ███             ~5-10%          │
│ Assertions (SVA)           ██              ~5-10%          │
│ Logging / file I/O         ██              ~2-10%          │
└────────────────────────────────────────────────────────────┘
The ranking INVERTS for badly written TBs:
a polling loop or per-cycle $sformatf can push
"TB procedural code" past 60% on its own.
Why the split matters
RTL cost scales with design activity — you mostly cannot reduce it from the TB side.
TB cost scales with how your code is written — fully under your control.
Waves and coverage are optional per run — the easiest wins when speed matters.
Assertions are usually worth their cost — disable only as a last resort, and never silently.
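To make "TB cost scales with how your code is written" concrete, here is a minimal sketch contrasting a per-clock polling loop with a blocking wait on the same condition. All names (poll_vs_wait_sketch, done, the counters) are hypothetical, and the 1000-clock delay is an arbitrary stand-in for DUT latency. The counters only show how often each process wakes up; they are not a measurement of CPU time.

```systemverilog
// Hypothetical sketch: every identifier here is illustrative, not taken from a real TB.
module poll_vs_wait_sketch;
  logic        clk  = 0;
  logic        done = 0;
  int unsigned n_wakeups_poll = 0;  // times the polling process was scheduled
  int unsigned n_wakeups_wait = 0;  // times the blocking process was scheduled

  always #5 clk = ~clk;

  // DUT stand-in: raises 'done' after an arbitrary 1000 clocks.
  initial begin
    repeat (1000) @(posedge clk);
    done <= 1;
  end

  // Costly style: wakes on every clock just to look at 'done'.
  initial begin
    while (!done) begin
      @(posedge clk);
      n_wakeups_poll++;
    end
  end

  // Cheap style: the process sleeps until 'done' becomes true.
  initial begin
    wait (done);
    n_wakeups_wait++;
  end

  // Report shortly after completion and end the run.
  initial begin
    wait (done);
    #1;
    $display("polling wakeups = %0d, blocking wakeups = %0d",
             n_wakeups_poll, n_wakeups_wait);
    $finish;
  end
endmodule
```

The polling process is scheduled once per clock for the whole wait, while the blocking process is scheduled once. In a real scoreboard or monitor the same pattern usually means waiting on an event, a mailbox get, or a TLM FIFO instead of checking a flag every cycle.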
Profiling: measure before you touch anything
Every major simulator ships a profiler that attributes CPU time to modules, processes, and source lines (e.g. VCS -simprofile, Questa -profile, Xcelium -profile). The exact flags differ; the workflow does not: run a representative test with profiling on, read the top-of-list report, and fix the single biggest consumer first.
PROFILING WORKFLOW
1. Pick a representative test (not the shortest smoke test)
2. Run once with the vendor profiler enabled
3. Read the report TOP-DOWN:
   rank  consumer                          %CPU
   ────  ───────────────────────────────   ─────
   1     tb_top.env.scb (run_phase loop)   38%   ← fix THIS first
   2     dut.u_core (RTL)                  22%
   3     wave dump engine                  18%
   4     tb_top.env.cov.cg sample          9%
4. Fix rank 1 only. Re-run profiler. Repeat.
Anti-pattern: "optimizing" items 3 and 4 while a
scoreboard polling loop burns 38% unexamined.
A cheap first-pass measurement without a profiler
// Crude but effective: compare wall-clock time across configuration runs.
// $realtime returns simulation time, not wall-clock time; for wall-clock
// stamps inside the sim use $system or vendor PLI, or simply compare
// 'time ./simv' across runs:
//
// run A: everything on -> 42 min
// run B: waves off -> 31 min (waves cost ~11 min)
// run C: waves off, coverage off -> 27 min (coverage ~4 min)
// run D: C + suspect monitor off -> 14 min (monitor is the story)
//
// Differential runs localize cost with zero tool knowledge.
module tb_top;
  initial begin
    // Dump waves unless the run is launched with +NO_WAVES.
    if (!$test$plusargs("NO_WAVES")) begin
      $dumpfile("waves.vcd");
      $dumpvars(0, tb_top);  // level 0 = tb_top and everything below it
    end
  end
endmodule
Typical cost ranking and what it tells you
Reading your own environment
RTL dominant (>50%) — TB is healthy; speedups come from compile/optimization flags or running fewer cycles.
TB dominant — look for polling loops, hot-path string building, and copy storms (next sub-lesson).
Waves dominant — you are dumping too much hierarchy or running waves in regression (see Waves & Logs).
Coverage dominant — sample storms: a covergroup sampled per clock instead of per transaction.
File I/O dominant — verbose logging at scale; a million UVM_MEDIUM lines is real CPU and disk time.
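The "sample storm" case above can be sketched as follows: the covergroup is sampled explicitly when a transaction completes, not on every clock edge. This is a minimal illustration, not a recommended coverage model. The names (cov_sample_sketch, kind_cg, valid, kind) and the roughly 10% transaction rate are assumptions made for the example.

```systemverilog
// Hypothetical sketch: identifiers and the traffic pattern are illustrative only.
module cov_sample_sketch;
  logic        clk   = 0;
  logic        valid = 0;
  logic [1:0]  kind  = 0;
  int unsigned n_clocks  = 0;
  int unsigned n_samples = 0;

  always #5 clk = ~clk;

  // Sample-storm style (avoided here) would be:
  //   covergroup kind_cg @(posedge clk); ... endgroup
  // which samples on every clock, whether or not a transaction occurred.

  // Per-transaction style: sampled only when sample() is called.
  covergroup kind_cg with function sample(logic [1:0] k);
    cp_kind: coverpoint k;
  endgroup

  kind_cg cg = new();

  // Monitor stand-in: sample only on cycles that carry a transaction.
  always @(posedge clk) begin
    n_clocks++;
    if (valid) begin
      cg.sample(kind);
      n_samples++;
    end
  end

  // Stimulus stand-in: roughly one cycle in ten carries a transaction.
  initial begin
    repeat (1000) begin
      @(negedge clk);
      valid = ($urandom_range(0, 9) == 0);
      kind  = 2'($urandom_range(0, 3));
    end
    @(negedge clk);
    $display("clocks = %0d, covergroup samples = %0d", n_clocks, n_samples);
    $finish;
  end
endmodule
```

The covergroup collects the same bins either way; the difference is that the sample count tracks transactions and not clocks, which is where the saving comes from on low-utilization interfaces.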
Key takeaways
Profile first — differential runs or the vendor profiler; never optimize on instinct.
Fix the single largest consumer, re-measure, repeat — one fix at a time.
TB code and visibility instrumentation are the costs you control; RTL evaluation largely is not.
A healthy environment is RTL-dominant; a TB-dominant profile is a bug report on your testbench.
Common pitfalls
Optimizing without measuring — hours spent shaving coverage cost while a polling loop burns 40%.
Profiling a 100-cycle smoke test — startup cost dominates and the report misleads.
Assuming the simulator is slow — three vendors, same TB sin, same slow result.
Disabling assertions for speed without an owner sign-off — silent loss of checking.
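One way to avoid the silent-disable pitfall is to make the switch explicit and noisy. The sketch below uses the standard $assertoff system task behind a plusarg and prints a warning whenever it fires. The plusarg name +NO_ASSERTS, the module name, and the req/ack check are hypothetical, and a real flow would still need whatever sign-off process the team has agreed on.

```systemverilog
// Hypothetical sketch: +NO_ASSERTS and all signal names are illustrative assumptions.
module assert_gate_sketch;
  logic clk = 0;
  logic req = 0;
  logic ack = 0;

  always #5 clk = ~clk;

  // Example check: every req is acknowledged within 1 to 3 clocks.
  a_req_ack: assert property (@(posedge clk) req |-> ##[1:3] ack)
    else $error("req not acknowledged within 3 clocks");

  // Disable assertions only on explicit request, and say so in the log.
  initial begin
    if ($test$plusargs("NO_ASSERTS")) begin
      $assertoff(0, assert_gate_sketch);  // 0 = this scope and everything below it
      $warning("assertions disabled via +NO_ASSERTS; this run is not fully checked");
    end
  end

  // Stimulus stand-in: one request, acknowledged on the following clock.
  initial begin
    repeat (2) @(negedge clk);
    req = 1;
    @(negedge clk);
    req = 0;
    ack = 1;
    @(negedge clk);
    ack = 0;
    repeat (5) @(negedge clk);
    $finish;
  end
endmodule
```

Because the warning lands in the log of every run that uses the switch, a regression script can search for it and flag or reject results that were produced with checking turned off.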