Part 7 · Advanced & Integration · Intermediate
Clocking Blocks as the Cure
Input/output skew semantics, the same race re-walked with a clocking block, default clocking, and full driver/monitor usage in an interface.
Skew semantics, precisely
A clocking block declares, once, when the testbench samples inputs and when it drives outputs relative to its clocking event. The two defaults are chosen specifically to kill the boundary race: input #1step samples the value the signal held immediately before the edge (the Preponed region — before any process at this edge has run), and output #0 (or any output skew) drives after the Active/NBA update machinery (in the Re-NBA region), so the RTL at this edge never sees the new drive early.
interface bus_if (input logic clk);
  logic       valid, ready;
  logic [7:0] data;
  clocking cb @(posedge clk);
    default input #1step output #2ns;
    input  ready;         // TB samples: value just BEFORE the edge
    output valid, data;   // TB drives: 2ns AFTER the edge
  endclocking
  default clocking cb;        // makes ##N mean "N cb clock cycles"
  modport tb (clocking cb);   // TB sees ONLY the skewed view
endinterface
Reading the skews
input #1step — sample in Preponed: the stable pre-edge value, identical to what an always_ff at this edge reads. No ordering involved.
output #0 — drive in Re-NBA of the same time step: after RTL has read its inputs for this edge; the DUT sees the new value at the NEXT edge.
output #2ns — same region semantics, but the pin physically changes 2ns after the edge: visually clean waves, mimics real setup margin.
1step is not 1ns — it is 'one simulation precision unit before the edge', i.e. the last value of the previous time slot.
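The input side of these skews can be observed in a small standalone testbench. The following is a minimal sketch, not part of the original lesson: the module name skew_demo and the signal sig are hypothetical, and the signal is deliberately toggled with a blocking assignment at the edge so that a raw read is order-dependent while the cb read is not.

```systemverilog
// Sketch: compare a raw read with a #1step-sampled read at the same edge.
// skew_demo and sig are illustrative names, not from the lesson.
module skew_demo;
  timeunit 1ns; timeprecision 1ps;

  logic clk = 0;
  logic sig = 0;
  always #5 clk = ~clk;

  // Changes sig in the Active region of every posedge (intentionally racy).
  always @(posedge clk) sig = ~sig;

  clocking cb @(posedge clk);
    default input #1step output #0;
    input sig;              // sampled in Preponed, before the toggle above
  endclocking

  initial begin
    repeat (3) begin
      @(cb);
      // 'sig' may show the old or new value depending on process order;
      // 'cb.sig' always shows the value held just before the edge.
      $display("t=%0t raw=%b sampled=%b", $time, sig, cb.sig);
    end
    $finish;
  end
endmodule
```

The raw column is the one that can differ between simulators; the sampled column should not, because the sample was taken before any process at that edge ran.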
The same race, re-walked with a clocking block
SAME SCENARIO, NOW THROUGH cb — time step at posedge clk
PREPONED
┌────────────────────────────────────────────────────────┐
│ cb.ready sampled = value from BEFORE the edge │
│ (no process has run yet — ordering is impossible) │
└────────────────────────────────────────────────────────┘
│
ACTIVE / NBA ▼
┌────────────────────────────────────────────────────────┐
│ RTL always_ff runs: reads data_in (old value), │
│ schedules data_q <= old value in NBA. The TB has │
│ touched NOTHING in this region. No race possible. │
└────────────────────────────────────────────────────────┘
│
RE-NBA (TB drive lands) ▼
┌────────────────────────────────────────────────────────┐
│ cb.data <= 8'hA5 takes effect HERE — after all RTL │
│ reads for this edge are done. RTL consumes A5 at the │
│ NEXT posedge. Deterministic in every simulator. │
└────────────────────────────────────────────────────────┘
ORDER A and ORDER B from the race lesson now produce
IDENTICAL results: ordering no longer matters.
This is the key insight: the clocking block does not make the simulator pick a friendlier order. It moves the TB's sample point and drive point into regions where no RTL process is competing, so every legal order produces the same result.
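The walk-through above can be reproduced with a single flop standing in for the RTL. This is a sketch under assumptions: the module name cb_race_demo is hypothetical, data_in and data_q follow the names in the diagram, and the output skew is #0 so the drive lands in Re-NBA of the same time step.

```systemverilog
// Sketch: drive a flop input through a clocking block with output #0.
// cb_race_demo is an illustrative name; the flop stands in for the RTL.
module cb_race_demo;
  timeunit 1ns; timeprecision 1ps;

  logic       clk = 0;
  logic [7:0] data_in = 8'h00;
  logic [7:0] data_q;
  always #5 clk = ~clk;

  // RTL side: reads data_in in Active, updates data_q in NBA.
  always_ff @(posedge clk) data_q <= data_in;

  clocking cb @(posedge clk);
    default input #1step output #0;
    output data_in;         // TB drive point: Re-NBA
    input  data_q;          // TB sample point: Preponed
  endclocking

  initial begin
    @(cb);
    cb.data_in <= 8'hA5;    // takes effect after the flop has read 8'h00
    @(cb);                  // the flop captures A5 at this edge
    @(cb);                  // cb.data_q now holds the captured value
    $display("sampled data_q = %h", cb.data_q);
    $finish;
  end
endmodule
```

Nothing in the initial block depends on whether it wakes before or after the always_ff at a given edge, which is the property the diagram describes.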
Full driver and monitor usage
// DRIVER: all drives through cb, all timing via @(cb) / ##N
task automatic drive_txn(virtual bus_if.tb vif, input logic [7:0] d);
  @(vif.cb);                 // synchronize to the clocking event
  vif.cb.valid <= 1'b1;      // lands in Re-NBA, race-free
  vif.cb.data  <= d;
  do @(vif.cb); while (vif.cb.ready !== 1'b1); // SAMPLED pre-edge value
  vif.cb.valid <= 1'b0;
endtask
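// MONITOR: all samples through clocking-block inputs.
// cb declares valid and data as outputs, and reading an output clockvar is
// illegal, so this task assumes a second clocking block, mon_cb, that declares
// valid, ready, and data as inputs and is exposed through a 'mon' modport.
task automatic collect_txn(virtual bus_if.mon vif, output logic [7:0] d);
  do @(vif.mon_cb);
  while (!(vif.mon_cb.valid === 1'b1 && vif.mon_cb.ready === 1'b1));
  d = vif.mon_cb.data;       // the value the DUT actually clocked
endtask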
// With 'default clocking cb;' in scope, cycle delays read naturally:
// ##1;       // procedural: wait one clocking-event cycle
// ##[1:4]    // range form, legal only inside sequences/properties: 1 to 4 cycles
Usage rules that keep it race-free
Drive cb outputs only with <= through the clocking block (vif.cb.sig <= val) — never assign the raw interface signal from the TB.
Read cb inputs only as vif.cb.sig — reading the raw signal reintroduces the same-region sample race.
Synchronize with @(vif.cb) or ##N, not @(posedge clk) — one clock authority per boundary.
Put the clocking block in the interface and expose it via a tb modport, so the TB physically cannot touch raw pins.
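A passive monitor needs to sample valid and data, but a clocking block cannot read back its own outputs. One common arrangement is a second, all-input clocking block behind its own modport. The sketch below extends bus_if that way; mon_cb and the mon modport are assumed names that do not appear in the original interface.

```systemverilog
// Sketch only: bus_if extended with a monitor view.
// mon_cb and the mon modport are assumed names, not part of the original.
interface bus_if (input logic clk);
  logic       valid, ready;
  logic [7:0] data;

  // Driver view: samples ready, drives valid and data.
  clocking cb @(posedge clk);
    default input #1step output #2ns;
    input  ready;
    output valid, data;
  endclocking

  // Monitor view: every signal is a sampled input, nothing is driven.
  clocking mon_cb @(posedge clk);
    default input #1step;
    input valid, ready, data;
  endclocking

  default clocking cb;

  modport tb  (clocking cb);       // driver handle: virtual bus_if.tb
  modport mon (clocking mon_cb);   // monitor handle: virtual bus_if.mon
endinterface
```

Both clocking blocks share the same clocking event, so the driver and the monitor still agree on a single clock authority while each sees only the directions it is allowed to use.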
Key takeaways
input #1step samples the pre-edge value in Preponed — the same value the RTL flops see.
Output drives land in Re-NBA, after all RTL reads for the edge — the race is structurally gone.
The cb does not change scheduler ordering; it makes ordering irrelevant.
default clocking enables ##N cycle delays and gives the TB one clock authority.
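As a small illustration of the last point, the sketch below uses ##N in procedural code. It assumes a standalone module (default_clocking_demo and req are hypothetical names) because ##N resolves against the default clocking of the scope in which it is written.

```systemverilog
// Sketch: procedural ##N delays counted in cycles of the default clocking.
// default_clocking_demo and req are illustrative names.
module default_clocking_demo;
  timeunit 1ns; timeprecision 1ps;

  logic clk = 0;
  logic req = 0;
  always #5 clk = ~clk;

  clocking cb @(posedge clk);
    default input #1step output #0;
    output req;
  endclocking
  default clocking cb;      // ##N below now counts cb clocking events

  initial begin
    ##1;                    // wait one cb cycle
    cb.req <= 1'b1;
    ##2;                    // hold for two more cycles
    cb.req <= 1'b0;
    ##1;
    $finish;
  end
endmodule
```

Code that lives outside this scope, such as a class-based driver holding a virtual interface, would typically use repeat (N) @(vif.cb) instead, since no default clocking is visible there.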
Common pitfalls
Mixing cb drives with raw-signal assignments to the same pin — two drivers, X contention or silent override.
Reading vif.data instead of vif.cb.data in a monitor — the sample race returns through the back door.
Assuming #1step equals 1ns — it is one precision unit before the edge, not a time delay you tune.
Waiting on @(posedge clk) in some tasks and @(vif.cb) in others — two clock authorities, off-by-one bugs.