Part 7 · Advanced & Integration · Intermediate

Anatomy of a TB/RTL Race

A same-edge drive/sample race walked cycle by cycle, why ordering is nondeterministic, and the classic works-in-one-tool symptom.

The setup: two processes, one edge

A race needs only two ingredients: a testbench process that drives a signal with a blocking assignment, or samples it with an immediate read, at a clock edge, and an RTL process triggered by the same edge. The LRM does not define which process in the same scheduling region runs first. That ordering is the simulator's choice, and it is exactly where determinism dies.

systemverilog

// RTL — a registered pipeline stage
always_ff @(posedge clk)
  data_q <= data_in;          // nonblocking: reads data_in in Active,
                              // updates data_q in NBA

// TESTBENCH — naive driver, same edge, BLOCKING assignment
initial begin
  @(posedge clk);
  data_in = 8'hA5;            // blocking: updates data_in IMMEDIATELY
end

// TESTBENCH — naive monitor, same edge, immediate read
always @(posedge clk)
  captured = data_in;          // which value of data_in does this see?

Why blocking at the boundary is the trigger

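RTL written with nonblocking assignments is internally race-free: every right-hand side is read in the Active region and every update lands later in the NBA region, so all reads happen before any updates.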
A TB blocking assignment updates the signal mid-Active-region — while RTL processes at the same edge may or may not have read it yet.
A TB immediate read in the same region may execute before or after the TB drive — same ambiguity, sampling side.
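
To try the three fragments above in a simulator, they can be wrapped in one module. The following is a minimal sketch, not a reference testbench: the module name race_demo, the 10-unit clock period, and the 8'h00 starting value standing in for OLD are assumptions added for illustration. It prints what the flop and the monitor actually saw at the first edge, so the result can be compared across tools.

```systemverilog
// Minimal reproducer sketch. Module name, clock period, and the 8'h00
// "OLD" value are illustrative assumptions, not part of the lesson's code.
module race_demo;
  logic       clk     = 1'b0;
  logic [7:0] data_in = 8'h00;   // OLD value, before the driver runs
  logic [7:0] data_q;
  logic [7:0] captured;

  // Free-running clock: first posedge at t=5, first negedge at t=10
  always #5 clk = ~clk;

  // RTL: registered pipeline stage (same as the lesson's fragment)
  always_ff @(posedge clk)
    data_q <= data_in;

  // TESTBENCH: naive driver, blocking assignment at the same edge
  initial begin
    @(posedge clk);
    data_in = 8'hA5;
  end

  // TESTBENCH: naive monitor, immediate read at the same edge
  always @(posedge clk)
    captured = data_in;

  // Reporter: waits until the following negedge, when every region of
  // the posedge time step has finished, then reports what happened.
  initial begin
    @(posedge clk);
    @(negedge clk);
    if (data_q == 8'hA5)
      $display("data_q   = %h : driver ran before always_ff", data_q);
    else
      $display("data_q   = %h : always_ff ran before driver", data_q);
    if (captured == 8'hA5)
      $display("captured = %h : driver ran before monitor", captured);
    else
      $display("captured = %h : monitor ran before driver", captured);
    $finish;
  end
endmodule
```

No expected output is given on purpose: either message on each line is a legal result, and which one appears depends on the simulator and how it was invoked.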

The race, cycle by cycle

diagram

SAME-EDGE DRIVE/SAMPLE RACE — time step at posedge clk

  ACTIVE REGION (process order is SIMULATOR'S CHOICE)
  ┌──────────────────────────────────────────────────────────┐
  │  runnable: { always_ff (RTL), driver, monitor }          │
  │                                                          │
  │  ORDER A (driver first):                                 │
  │    1. driver:    data_in = 8'hA5      (blocking, now)    │
  │    2. always_ff: reads data_in = A5 → schedules NBA      │
  │    3. monitor:   captured = A5                           │
  │                                                          │
  │  ORDER B (RTL first):                                    │
  │    1. always_ff: reads data_in = OLD  → schedules NBA    │
  │    2. driver:    data_in = 8'hA5                         │
  │    3. monitor:   captured = A5  (or OLD if monitor ran   │
  │                                  before the driver!)     │
  └──────────────────────────────────────────────────────────┘
                          │
                          ▼
  NBA REGION
  ┌──────────────────────────────────────────────────────────┐
  │  data_q <= (whatever always_ff READ above)               │
  │  ORDER A: data_q becomes A5   — DUT saw the new value    │
  │  ORDER B: data_q becomes OLD  — DUT saw the old value    │
  └──────────────────────────────────────────────────────────┘

  Same source code. Two legal outcomes. THAT is the race.

Both orders are legal per the LRM. The simulator may pick based on declaration order, elaboration order, optimization level, or internal heuristics, all of which can change when you add a signal, change a flag, or upgrade the tool.
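
The two orders in the diagram can also be replayed by hand. The sketch below is an illustration under stated assumptions, not simulator behavior: it flattens the driver's write and the always_ff read into ordinary sequential statements inside one initial block, where execution order is fixed, so each outcome can be seen in isolation. The module name and the OLD_VAL/NEW_VAL constants are hypothetical.

```systemverilog
// Hand-sequenced replay of ORDER A and ORDER B from the diagram.
// Inside a single begin/end block statements run top to bottom, so
// nothing here races; it only shows what each order would produce.
module order_replay;
  localparam logic [7:0] OLD_VAL = 8'h00;   // assumed previous value
  localparam logic [7:0] NEW_VAL = 8'hA5;   // value the driver writes

  logic [7:0] data_in;
  logic [7:0] read_a;   // what always_ff would read under ORDER A
  logic [7:0] read_b;   // what always_ff would read under ORDER B

  initial begin
    // ORDER A: driver first, then the RTL read
    data_in = OLD_VAL;
    data_in = NEW_VAL;    // driver's blocking assignment
    read_a  = data_in;    // RTL read sees the new value (A5)

    // ORDER B: RTL read first, then the driver
    data_in = OLD_VAL;
    read_b  = data_in;    // RTL read sees the old value (00)
    data_in = NEW_VAL;    // driver's blocking assignment, too late

    $display("ORDER A: data_q would become %h", read_a);
    $display("ORDER B: data_q would become %h", read_b);
  end
endmodule
```

In the real testbench these statements live in separate processes, so the simulator chooses between the two sequences.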

Why it is nondeterministic — and the classic symptom

Sources of order variation

Different simulators make different (equally legal) ordering choices — VCS vs Questa vs Xcelium.
The same simulator can reorder after an optimization flag change (-O levels, partition options).
Unrelated code edits shift elaboration order, silently flipping which process runs first.
Even a passing test is not evidence of correctness — it is evidence you won the coin flip today.

diagram

THE CLASSIC SYMPTOM

  Monday:    test passes on Simulator X            ← order A
  Tuesday:   same test fails on Simulator Y        ← order B
  Wednesday: "fixed" by adding a #1 delay           ← hides the race
  Thursday:  fails again after -O3 recompile        ← race is back
  Friday:    vendor support ticket: "your tool is broken"
             vendor reply: "both behaviors are LRM-legal"

  Root cause was never the tool. It was a blocking
  drive/sample at the same edge the RTL uses.

How to recognize one in the wild

A value is off by exactly one cycle, intermittently or per-tool — the signature of edge-order ambiguity.
Behavior changes with simulator, optimization flags, or unrelated edits — code is constant, ordering is not.
There is a TB blocking assignment or immediate read synchronized to the same edge as RTL always_ff blocks.
A #0 or #1 'fix' is in the history — someone treated the symptom before.
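
The delay patches mentioned in the last item usually look like the sketch below. It is shown only so the pattern is easy to spot in review or in version history, not as a recommendation. The module name, the `timescale` value, and the data_in0/data_q0 names are assumptions made for illustration.

```systemverilog
`timescale 1ns/1ps   // assumed; "#1" below means one of these time units

// Pattern to look for: a delay inserted between the edge and the drive.
module delay_patch_demo;
  logic       clk      = 1'b0;
  logic [7:0] data_in  = 8'h00;
  logic [7:0] data_in0 = 8'h00;
  logic [7:0] data_q, data_q0;

  always #5 clk = ~clk;

  always_ff @(posedge clk) data_q  <= data_in;
  always_ff @(posedge clk) data_q0 <= data_in0;

  // "#1 fix": the drive lands one time unit after the edge. How much
  // margin that is depends on the timescale and on any other delays
  // on the clock or data path.
  initial begin
    @(posedge clk);
    #1 data_in = 8'hA5;
  end

  // "#0 fix": the driver is suspended and resumes from the Inactive
  // region of the same time step. Any other process that also uses #0
  // is back in the same unordered pool.
  initial begin
    @(posedge clk);
    #0 data_in0 = 8'hA5;
  end

  initial begin
    @(posedge clk);
    @(negedge clk);
    $display("after #1 patch: data_q  = %h", data_q);
    $display("after #0 patch: data_q0 = %h", data_q0);
    $finish;
  end
endmodule
```

In both variants the order is decided by a delay value rather than by the structure of the testbench, which is why a patch like this in the history is a reason to look for the original same-edge blocking drive.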

Key takeaways

A race is two same-region processes whose LRM-legal orderings produce different results.
TB blocking drives and immediate reads at the RTL's active clock edge are the canonical cause.
Works-in-one-tool, fails-in-another is the fingerprint of a race, not a tool bug.
Delays like #1 hide races; they never fix them — the cure is region separation (next lesson).

Common pitfalls

Driving DUT inputs with blocking assignments directly at @(posedge clk).
Sampling DUT signals with an immediate read at the same clock edge that updates them, so the value seen depends on where the read lands relative to the update.
Declaring victory because the test passes on your simulator — ordering luck is not correctness.
Patching with #0/#1 delays, which shift the ambiguity instead of removing it.
