Part 7 · Advanced & Integration · Intermediate

Clocking Blocks as the Cure

Input/output skew semantics, the same race re-walked with a clocking block, default clocking, and full driver/monitor usage in an interface.

Skew semantics, precisely

A clocking block declares, once, when the testbench samples inputs and when it drives outputs relative to its clocking event. The two default skews are chosen specifically to kill the boundary race. input #1step samples the value the signal held immediately before the edge (the Preponed region, before any process at this edge has run). output #0 drives in the Re-NBA region of the same time step, after the Active and NBA regions have finished; a nonzero output skew defers the drive to the Re-NBA region of the time step that many time units later. Either way, the RTL at this edge never sees the new drive early.

systemverilog
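interface bus_if (input logic clk);
  logic        valid, ready;
  logic [7:0]  data;

  clocking cb @(posedge clk);          // driver view
    default input #1step output #2ns;
    input  ready;            // TB samples: value just BEFORE the edge
    output valid, data;      // TB drives: 2ns AFTER the edge
  endclocking

  clocking mon_cb @(posedge clk);      // monitor view: cb outputs cannot be read back
    default input #1step;
    input valid, ready, data;          // everything sampled just BEFORE the edge
  endclocking

  default clocking cb;       // ##N inside this interface counts cb clocking events

  modport tb  (clocking cb);      // driver sees ONLY the skewed view
  modport mon (clocking mon_cb);  // monitor can sample but never drive
endinterface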

Reading the skews

  • input #1step — sample in Preponed: the stable pre-edge value, identical to what an always_ff at this edge reads. No ordering involved.

  • output #0 — drive in Re-NBA of the same time step: after RTL has read its inputs for this edge; the DUT sees the new value at the NEXT edge.

  • output #2ns: same region semantics, but the pin changes 2ns after the edge, which gives visually clean waves and mimics a real clock-to-output delay.

  • 1step is not 1ns — it is 'one simulation precision unit before the edge', i.e. the last value of the previous time slot.
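
The input-skew bullets above can be checked with a small experiment. The following is a minimal sketch, not part of the original lesson: the module name skew_demo and the signal sig are hypothetical, and sig is deliberately changed with a blocking assignment in the same time step as the edge. In a simulator that follows the IEEE 1800 scheduling model, the raw read is expected to show the already-updated value while cb.sig still reports the pre-edge one.

```systemverilog
// Sketch under stated assumptions: skew_demo and sig are illustrative names.
module skew_demo;
  timeunit 1ns; timeprecision 1ps;

  logic       clk = 0;
  logic [7:0] sig = 8'h00;

  always #5 clk = ~clk;

  // Sample sig with the default input skew (Preponed region of the edge)
  clocking cb @(posedge clk);
    default input #1step;
    input sig;
  endclocking

  // Changes sig in the SAME time step as the edge (blocking assignment)
  always @(posedge clk) sig = sig + 8'h01;

  initial begin
    repeat (3) begin
      @(cb);
      // Raw read sees whatever has already run at this edge;
      // cb.sig is the value captured before any process ran.
      $display("t=%0t  raw sig=%h  cb.sig=%h", $time, sig, cb.sig);
    end
    $finish;
  end
endmodule
```

Comparing the two printed columns shows why the sampled view does not depend on process ordering: it is fixed before the edge is processed.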


The same race, re-walked with a clocking block

diagram
SAME SCENARIO, NOW THROUGH cb — time step at posedge clk

  PREPONED
  ┌────────────────────────────────────────────────────────┐
  │ cb.ready sampled = value from BEFORE the edge          │
  │ (no process has run yet — ordering is impossible)      │
  └────────────────────────────────────────────────────────┘
                          │
  ACTIVE / NBA            ▼
  ┌────────────────────────────────────────────────────────┐
  │ RTL always_ff runs: reads data_in (old value),         │
  │ schedules data_q <= old value in NBA. The TB has       │
  │ touched NOTHING in this region. No race possible.      │
  └────────────────────────────────────────────────────────┘
                          │
  RE-NBA (TB drive lands) ▼
  ┌────────────────────────────────────────────────────────┐
  │ cb.data <= 8'hA5 takes effect HERE — after all RTL     │
  │ reads for this edge are done. RTL consumes A5 at the   │
  │ NEXT posedge. Deterministic in every simulator.        │
  └────────────────────────────────────────────────────────┘

  ORDER A and ORDER B from the race lesson now produce
  IDENTICAL results — ordering no longer matters.

This is the key insight: the clocking block does not make the simulator pick a friendlier order. It moves the TB's sample point and drive point into regions where no RTL process is competing, so every legal order produces the same result.
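
The walk-through above can be reproduced in a few lines. This is a minimal sketch under assumptions: cb_race_demo is a hypothetical module, a single always_ff stage stands in for the RTL, and output #0 is used so the drive lands in the Re-NBA region of the same time step as the edge. The two assertions check that the register does not capture 8'hA5 at the edge where it is driven, and does capture it one edge later.

```systemverilog
// Sketch only: cb_race_demo is an illustrative name; the single flop is a stand-in DUT.
module cb_race_demo;
  timeunit 1ns; timeprecision 1ps;

  logic       clk = 0;
  logic [7:0] data_in;     // driven only through cb
  logic [7:0] data_q;

  always #5 clk = ~clk;

  // Stand-in RTL: one register stage
  always_ff @(posedge clk) data_q <= data_in;

  clocking cb @(posedge clk);
    default input #1step output #0;
    output data_in;        // TB drive point: Re-NBA
    input  data_q;         // TB sample point: Preponed
  endclocking

  initial begin
    @(cb);                 // edge 1: RTL reads the old data_in
    cb.data_in <= 8'hA5;   // takes effect after the RTL has read its inputs
    @(cb);                 // edge 2: RTL captures A5 now
    assert (cb.data_q !== 8'hA5)
      else $error("A5 was captured one edge too early");
    @(cb);                 // edge 3: pre-edge sample shows the captured value
    assert (cb.data_q === 8'hA5)
      $display("A5 captured exactly one edge after the drive");
      else $error("A5 was not captured at the expected edge");
    $finish;
  end
endmodule
```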


Full driver and monitor usage

systemverilog
// DRIVER — all drives through cb, all timing via @(cb) / ##N
task automatic drive_txn(virtual bus_if.tb vif, input logic [7:0] d);
  @(vif.cb);                      // synchronize to the clocking event
  vif.cb.valid <= 1'b1;           // lands in Re-NBA — race-free
  vif.cb.data  <= d;
  do @(vif.cb); while (vif.cb.ready !== 1'b1);   // SAMPLED pre-edge value
  vif.cb.valid <= 1'b0;
endtask

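// MONITOR: samples through mon_cb, a clocking block in bus_if that declares
// valid, ready, and data as inputs. cb cannot serve here, because valid and
// data are cb outputs and clocking-block outputs are write-only.
task automatic collect_txn(virtual bus_if.mon vif, output logic [7:0] d);
  do @(vif.mon_cb);
  while (!(vif.mon_cb.valid === 1'b1 && vif.mon_cb.ready === 1'b1));
  d = vif.mon_cb.data;            // the value the DUT actually clocked
endtask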

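// With 'default clocking cb;' in effect in the enclosing scope, cycle delays read naturally:
//   ##1;        // one clocking-event cycle
//   ##[1:4];    // in assertions/sequences: 1 to 4 cycles
// Code in a scope with no default clocking (for example a class in a package)
// uses repeat (N) @(vif.cb); instead.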
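
To make the cycle-delay notation concrete, here is a minimal sketch with hypothetical names (cycle_delay_demo, t_hash, t_repeat). It declares a default clocking in a module and checks that ##3 and repeat (3) @(cb) finish at the same simulation time when both start aligned to a clocking event. The repeat form is the one to reach for in scopes that have no default clocking.

```systemverilog
// Sketch only: cycle_delay_demo and its variables are illustrative names.
module cycle_delay_demo;
  timeunit 1ns; timeprecision 1ps;

  logic clk = 0;
  always #5 clk = ~clk;

  // An empty clocking block is enough to define what "one cycle" means
  default clocking cb @(posedge clk);
  endclocking

  time t_hash, t_repeat;

  initial begin
    @(cb);                                         // align to a clocking event
    fork
      begin ##3;              t_hash   = $time; end  // needs the default clocking
      begin repeat (3) @(cb); t_repeat = $time; end  // works in any scope
    join
    assert (t_hash == t_repeat)
      $display("##3 and repeat (3) @(cb) both ended at t=%0t", t_hash);
      else $error("cycle delays disagree: %0t vs %0t", t_hash, t_repeat);
    $finish;
  end
endmodule
```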

Usage rules that keep it race-free

  1. Drive cb outputs only with <= through the clocking block (vif.cb.sig <= val) — never assign the raw interface signal from the TB.

  2. Read cb inputs only as vif.cb.sig — reading the raw signal reintroduces the same-region sample race.

  3. Synchronize with @(vif.cb) or ##N, not @(posedge clk) — one clock authority per boundary.

  4. Put the clocking block in the interface and expose it via a tb modport, so the TB physically cannot touch raw pins.
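
The four rules can be seen working together in a small end-to-end harness. This is a sketch under stated assumptions, not the lesson's own testbench: bus_if_demo, sink_dut, tb_top, mon_cb, and got are hypothetical names, the DUT is a trivial stand-in that raises ready after the first edge, and the monitor uses a separate inputs-only clocking block because clocking-block outputs cannot be sampled.

```systemverilog
// Sketch only: every name in this block is illustrative.
interface bus_if_demo (input logic clk);
  timeunit 1ns; timeprecision 1ps;
  logic       valid, ready;
  logic [7:0] data;

  clocking cb @(posedge clk);            // driver view
    default input #1step output #2ns;
    input  ready;
    output valid, data;
  endclocking

  clocking mon_cb @(posedge clk);        // monitor view: inputs only
    default input #1step;
    input valid, ready, data;
  endclocking

  modport tb  (clocking cb);             // rule 4: no raw pins exposed
  modport mon (clocking mon_cb);
endinterface

// Stand-in DUT (assumption): ready after the first edge, latches data on a handshake
module sink_dut (input  logic clk, valid, input logic [7:0] data,
                 output logic ready, output logic [7:0] got);
  timeunit 1ns; timeprecision 1ps;
  always_ff @(posedge clk) begin
    ready <= 1'b1;
    if (valid === 1'b1 && ready === 1'b1) got <= data;
  end
endmodule

module tb_top;
  timeunit 1ns; timeprecision 1ps;
  logic       clk = 0;
  logic [7:0] got, seen;
  always #5 clk = ~clk;

  bus_if_demo bif (clk);
  sink_dut dut (.clk(clk), .valid(bif.valid), .data(bif.data),
                .ready(bif.ready), .got(got));

  virtual bus_if_demo.tb  drv_vif = bif;
  virtual bus_if_demo.mon mon_vif = bif;

  task automatic drive_txn(virtual bus_if_demo.tb vif, input logic [7:0] d);
    @(vif.cb);                                     // rule 3: sync on the cb
    vif.cb.valid <= 1'b1;                          // rule 1: drive via cb only
    vif.cb.data  <= d;
    do @(vif.cb); while (vif.cb.ready !== 1'b1);   // rule 2: sampled value
    vif.cb.valid <= 1'b0;
  endtask

  task automatic collect_txn(virtual bus_if_demo.mon vif, output logic [7:0] d);
    do @(vif.mon_cb);
    while (!(vif.mon_cb.valid === 1'b1 && vif.mon_cb.ready === 1'b1));
    d = vif.mon_cb.data;
  endtask

  initial begin
    fork
      drive_txn(drv_vif, 8'hA5);
      collect_txn(mon_vif, seen);
    join
    @(mon_vif.mon_cb);                   // let one more cycle pass before checking
    assert (seen === 8'hA5 && got === 8'hA5)
      $display("monitor and DUT agree on %h", seen);
      else $error("mismatch: monitor saw %h, DUT latched %h", seen, got);
    $finish;
  end
endmodule
```

Keeping the driver and monitor on separate modports means the monitor has no way to drive the bus, and neither task ever names a raw interface signal.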

Key takeaways

  • input #1step samples the pre-edge value in Preponed — the same value the RTL flops see.

  • Output drives land in Re-NBA, after all RTL reads for the edge — the race is structurally gone.

  • The cb does not change scheduler ordering; it makes ordering irrelevant.

  • default clocking enables ##N cycle delays and gives the TB one clock authority.

Common pitfalls

  • Mixing cb drives with raw-signal assignments to the same pin — two drivers, X contention or silent override.

  • Reading the raw vif.data in a monitor instead of sampling it through a clocking-block input: the sample race returns through the back door.

  • Assuming #1step equals 1ns — it is one precision unit before the edge, not a time delay you tune.

  • Waiting on @(posedge clk) in some tasks and @(vif.cb) in others — two clock authorities, off-by-one bugs.