Part 7 · Advanced & Integration · Intermediate

Where Simulation Time Goes

Cost breakdown of RTL activity, TB overhead, assertions, coverage, and waves — and how to profile before optimizing.

The cost breakdown

Simulation wall-clock time splits across several consumers, and the split shifts dramatically with design size and TB style. In a class-based environment on a mid-size block, it is common for the testbench plus instrumentation to cost more than the RTL itself. You cannot fix what you have not measured, so the first skill is knowing the categories and their typical ranking.

diagram
WHERE WALL-CLOCK TIME GOES (typical class-based TB, mid-size block)

  ┌────────────────────────────────────────────────────────────┐
  │ RTL event evaluation        ████████████          ~30-40%  │
  │ TB procedural code          ████████              ~20-30%  │
  │   (drivers, monitors,                                      │
  │    scoreboard, randomize)                                  │
  │ Wave dumping (full hier)    ██████                ~15-25%  │
  │ Coverage sampling           ███                   ~5-10%   │
  │ Assertions (SVA)            ██                    ~5-10%   │
  │ Logging / file I/O          ██                    ~2-10%   │
  └────────────────────────────────────────────────────────────┘

  The ranking INVERTS for badly written TBs:
  a polling loop or per-cycle $sformatf can push
  "TB procedural code" past 60% on its own.
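The polling-loop problem is easiest to see by counting wakeups. The sketch below assumes a simple producer that publishes five items 1000 time units apart. One consumer wakes on every time unit to check for new work; the other sleeps on an event until the producer signals. The module and signal names are hypothetical, and wakeup counts are only a proxy for TB procedural cost, not a timing measurement.

```systemverilog
// Sketch (assumption): hypothetical producer/consumer pair used only to count
// how often each consumer style wakes up for the same amount of real work.
module poll_vs_event;
  int   produced      = 0;  // items published so far
  event item_ev;            // notification for the event-driven consumer
  int   poll_wakeups  = 0;  // times the polling loop ran
  int   poll_consumed = 0;
  int   ev_wakeups    = 0;  // times the event-driven consumer ran
  int   ev_consumed   = 0;

  // Producer: 5 items, one every 1000 time units.
  initial repeat (5) begin
    #1000;
    produced <= produced + 1;   // nonblocking: the poller sees it on the next tick
    -> item_ev;
  end

  // Anti-pattern: wake on every time unit and ask "anything new?"
  initial while (poll_consumed < 5) begin
    #1;
    poll_wakeups++;
    if (poll_consumed < produced) poll_consumed++;
  end

  // Preferred: sleep until the producer signals.
  initial repeat (5) begin
    @(item_ev);
    ev_wakeups++;
    ev_consumed++;
  end

  initial begin
    #6000;
    $display("polling consumer: %0d wakeups for %0d items", poll_wakeups, poll_consumed);
    $display("event consumer:   %0d wakeups for %0d items", ev_wakeups, ev_consumed);
  end
endmodule
```

The polling consumer wakes 5001 times to collect the same 5 items that the event-driven consumer collects in 5 wakeups. In a class-based TB the same pattern usually appears as a forever loop around `#1` or `@(posedge clk)` where a blocking mailbox `get()` or an event wait would do.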

Why the split matters

  • RTL cost scales with design activity — you mostly cannot reduce it from the TB side.

  • TB cost scales with how your code is written — fully under your control.

  • Waves and coverage are optional per run — the easiest wins when speed matters.

  • Assertions are usually worth their cost — disable only as a last resort, and never silently.
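One way to keep waves and coverage optional per run, without ever disabling checking silently, is to decode all instrumentation switches in one place and print a banner for anything turned off. The sketch below is a minimal sketch under stated assumptions: the plusarg names `NO_WAVES`, `NO_COV`, and `NO_SVA` and the module `run_config` are hypothetical, and a real environment would route these flags into its wave, coverage, and assertion-control code.

```systemverilog
// Sketch (assumptions): plusarg names and the run_config module are hypothetical.
// Default is "everything on"; a plusarg turns one item off for this run only.
module run_config;
  bit          waves_on, cov_on, sva_on;
  int unsigned n_disabled = 0;

  initial begin
    waves_on = !$test$plusargs("NO_WAVES");
    cov_on   = !$test$plusargs("NO_COV");
    sva_on   = !$test$plusargs("NO_SVA");

    if (!waves_on) begin
      n_disabled++;
      $display("NOTE: wave dumping OFF (+NO_WAVES)");
    end
    if (!cov_on) begin
      n_disabled++;
      $display("NOTE: coverage sampling OFF (+NO_COV); this run contributes no coverage");
    end
    if (!sva_on) begin
      n_disabled++;
      // A full simulator would call the standard $assertoff task here;
      // the point of this sketch is the loud banner, not the mechanism.
      $display("WARNING: assertions OFF (+NO_SVA); this run is NOT checking protocol");
    end

    if (n_disabled == 0)
      $display("run_config: all instrumentation enabled");
  end
endmodule
```

Keeping the decisions in one module makes the run log self-describing: anyone reading it can see which checks and which visibility were active.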


Profiling: measure before you touch anything

Every major simulator ships a profiler that attributes CPU time to modules, processes, and source lines (e.g. VCS -simprofile, Questa -profile, Xcelium -profile). The exact flags differ; the workflow does not: run a representative test with profiling on, read the top-of-list report, and fix the single biggest consumer first.

diagram
PROFILING WORKFLOW

  1. Pick a representative test (not the shortest smoke test)
  2. Run once with the vendor profiler enabled
  3. Read the report TOP-DOWN:
        rank  consumer                         %CPU
        ────  ───────────────────────────────  ─────
        1     tb_top.env.scb (run_phase loop)  38%   ← fix THIS first
        2     dut.u_core (RTL)                 22%
        3     wave dump engine                 18%
        4     tb_top.env.cov.cg sample         9%
  4. Fix rank 1 only. Re-run profiler. Repeat.

  Anti-pattern: "optimizing" items 3 and 4 while a
  scoreboard polling loop burns 38% unexamined.

A cheap first-pass measurement without a profiler

systemverilog
// Crude but effective: time whole runs from the shell and change one switch per run.
// Note that $realtime reports simulation time, not wall-clock time; in-sim
// wall-clock stamps need $system("date") or a vendor/PLI call. Simpler is to
// compare 'time ./simv' across configuration runs:
//
//   run A: everything on            -> 42 min
//   run B: waves off                -> 31 min   (waves cost ~11 min)
//   run C: waves off, coverage off  -> 27 min   (coverage ~4 min)
//   run D: C + suspect monitor off  -> 14 min   (monitor ~13 min: the real story)
//
// Differential runs localize cost with zero tool knowledge.
module tb_top;
  // Gate wave setup on a plusarg so run B is the same binary run with +NO_WAVES.
  initial begin
    if (!$test$plusargs("NO_WAVES")) begin
      $dumpfile("waves.vcd");
      $dumpvars(0, tb_top);
    end
  end
endmodule

Typical cost ranking and what it tells you

Reading your own environment

  1. RTL dominant (>50%) — TB is healthy; speedups come from compile/optimization flags or running fewer cycles.

  2. TB dominant — look for polling loops, hot-path string building, and copy storms (next sub-lesson).

  3. Waves dominant — you are dumping too much hierarchy or running waves in regression (see Waves & Logs).

  4. Coverage dominant — sample storms: a covergroup sampled per clock instead of per transaction.

  5. File I/O dominant — verbose logging at scale; a million UVM_MEDIUM lines is real CPU and disk time.
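Two of the TB-side fixes above, sampling per transaction instead of per clock and building strings only when someone will read them, fit in one small monitor. This is a sketch under assumptions: `per_clock_samples` and `per_txn_samples` are hypothetical counters standing in for `cg.sample()` calls (so the example runs without covergroup support), and `verbose` stands in for a verbosity check.

```systemverilog
// Sketch (assumptions): counters stand in for covergroup sample() calls and
// 'verbose' stands in for a UVM-style verbosity check; names are hypothetical.
module sample_gating;
  bit          clk = 0;
  bit          valid = 0;
  bit          verbose = 0;          // would come from a plusarg or verbosity setting
  int unsigned drive_cycle = 0;
  int unsigned per_clock_samples = 0;
  int unsigned per_txn_samples = 0;
  int unsigned strings_built = 0;
  string       msg;

  // 100 clock cycles, then the clock stops and the simulation drains.
  initial repeat (200) #5 clk = ~clk;

  // Stimulus: a transaction on every 4th cycle, driven on negedge to avoid races.
  initial forever begin
    @(negedge clk);
    drive_cycle++;
    valid = (drive_cycle % 4 == 0);
  end

  // Monitor: compare per-clock sampling with per-transaction sampling.
  initial repeat (100) begin
    @(posedge clk);
    per_clock_samples++;             // anti-pattern: cg.sample() on every clock
    if (valid) begin
      per_txn_samples++;             // preferred: cg.sample() only on a real transaction
      if (verbose) begin             // build the string only if it will be printed
        msg = $sformatf("txn seen after drive cycle %0d", drive_cycle);
        strings_built++;
      end
    end
  end
endmodule
```

Over 100 clocks the per-clock counter fires on every edge, while the per-transaction counter fires on roughly a quarter of them; with `verbose` off, no `$sformatf` call runs at all.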

Key takeaways

  • Profile first — differential runs or the vendor profiler; never optimize on instinct.

  • Fix the single largest consumer, re-measure, repeat — one fix at a time.

  • TB code and visibility instrumentation are the costs you control; RTL evaluation largely is not.

  • A healthy environment is RTL-dominant; a TB-dominant profile is a bug report on your testbench.

Common pitfalls

  • Optimizing without measuring — hours spent shaving coverage cost while a polling loop burns 40%.

  • Profiling a 100-cycle smoke test — startup cost dominates and the report misleads.

  • Assuming the simulator is slow — switching vendors does not help; the same TB inefficiency is just as slow on all three.

  • Disabling assertions for speed without an owner sign-off — silent loss of checking.