Part 1 · Language Foundations · Intermediate
Blocking vs Nonblocking Assignments
Scheduling semantics, the two-register swap classic, why NBA models flip-flops, and RTL vs testbench usage with race diagrams.
Two assignments, two schedules
A blocking assignment a = b evaluates the right side and updates the left side immediately, before the next statement runs, like assignment in any software language. A nonblocking assignment a <= b splits the operation in two: the right side is evaluated now (in the Active region), but the left side is updated later, in the NBA region of the same time slot, after every process triggered by this event has finished evaluating. That split is not a convenience; it is a faithful model of a flip-flop, which samples D at the clock edge and changes Q after the edge, never feeding the new value back into logic clocked by the same edge.
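A minimal sketch of that split, assuming a standalone simulation-only module (the name sched_demo and the variables x and y are illustrative, not from the text above). It prints the variables once right after the assignments and once after time has advanced, so the deferred NBA update becomes visible.

```systemverilog
// Hypothetical demo module: names are illustrative only.
module sched_demo;
  logic [7:0] x, y;

  initial begin
    x = 8'd1;   // blocking: x is 1 before the next statement runs
    y = x;      // blocking: reads the already-updated x, so y is 1
    x <= 8'd7;  // nonblocking: 7 is evaluated now, committed in the NBA region

    // Still in the Active region: the NBA update has not landed yet.
    $display("before NBA commit: x=%0d y=%0d", x, y);  // prints x=1 y=1

    #1;         // move to a later time slot; the NBA update is done by now
    $display("after NBA commit:  x=%0d y=%0d", x, y);  // prints x=7 y=1
  end
endmodule
```

The second $display could also be replaced by a $strobe in the first time slot, since $strobe prints at the end of the slot, after the NBA region.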
ONE CLOCK EDGE, TWO PROCESSES: WHY NBA HAS NO RACE
posedge clk fires both always_ff blocks

ACTIVE region                      NBA region
─────────────────────────────      ─────────────────────
proc1: eval b (old value) ──┐
proc2: eval a (old value) ──┤  ┌─► a updated to old b
(order of proc1/proc2       ├──┤
 DOESN'T MATTER: both       │  └─► b updated to old a
 read OLD values)         ──┘
result: clean swap

WITH BLOCKING (=) INSTEAD:
  if proc1 runs first: a = b; then proc2 reads the NEW a
    → b = a gives b its own old value back → NO swap
  if proc2 runs first → opposite corruption
  → result depends on simulator process ordering = RACE

The diagram is the whole story of simulation races: with blocking assignments across communicating clocked processes, the result depends on which process the simulator happens to run first, and any order is legal because the ordering is unspecified. With nonblocking assignments every process reads pre-edge values and all updates commit together, so process order is irrelevant.
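The two-process case in the diagram can be sketched as a small self-checking testbench. Everything here is an assumption made for illustration: the module name two_proc_swap_tb, the load signal used to give a and b known starting values, and the constants 8'h11 and 8'h22.

```systemverilog
// Hypothetical testbench: proc1 and proc2 are separate always_ff blocks.
module two_proc_swap_tb;
  logic       clk  = 1'b0;
  logic       load = 1'b1;   // assumed helper: synchronous preload of a and b
  logic [7:0] a, b;

  always #5 clk = ~clk;

  // proc1 and proc2: each reads the other's register on the same edge.
  always_ff @(posedge clk) a <= load ? 8'h11 : b;  // proc1
  always_ff @(posedge clk) b <= load ? 8'h22 : a;  // proc2

  initial begin
    @(posedge clk);          // edge 1: a and b are preloaded
    load <= 1'b0;            // nonblocking, so edge 1 still sees load = 1

    @(posedge clk); #1;      // edge 2: first swap
    assert (a == 8'h22 && b == 8'h11)
      else $error("first swap failed: a=%h b=%h", a, b);

    @(posedge clk); #1;      // edge 3: swapped back
    assert (a == 8'h11 && b == 8'h22)
      else $error("second swap failed: a=%h b=%h", a, b);

    $display("swap is independent of process order");
    $finish;
  end
endmodule
```

Because both right-hand sides are evaluated in the Active region and both updates land in the NBA region, the checks hold whichever always_ff block the simulator happens to schedule first.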
The two-register swap classic
The canonical interview demonstration: swap two registers on a clock edge. With <= it works in two lines with no temporary, because both right-hand sides are sampled before either update lands. With = the first assignment destroys the value the second one needs. If an interviewer asks you to “swap without a temp variable in RTL”, nonblocking assignment is the answer they are fishing for.
// CORRECT swap: RHS of both lines sampled before any update
always_ff @(posedge clk) begin
  a <= b;
  b <= a;  // reads the OLD a: a and b exchange every cycle
end

// BROKEN swap: blocking destroys the needed value
always_ff @(posedge clk) begin
  a = b;  // a immediately becomes b
  b = a;  // reads the NEW a → b = old b. Both now equal old b.
end

// Same principle, shift register form:
always_ff @(posedge clk) begin
  q1 <= d;  // with <= : a real 3-stage pipeline
  q2 <= q1;
  q3 <= q2;
end
// With =, d would ripple through q1, q2, q3 in ONE cycle:
// all three registers load d on the same edge, so the intended
// 3-stage pipeline behaves like a single stage.
Usage rules for RTL and testbench
The discipline that follows from the semantics: use nonblocking for every clocked state element (always_ff), use blocking inside combinational blocks (always_comb) so intermediate values propagate through the block in order, and never mix the two for the same variable. In testbench procedural code (initial blocks, tasks, class methods) blocking is the default because you want software-like sequencing. When the TB drives DUT inputs synchronously to a clock, however, drive with nonblocking (or better, through a clocking block) so the DUT samples old values exactly like a real upstream flop would.
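The clocking-block option mentioned above might look like the following sketch. The module name cb_drive_tb, the one-register stand-in for the DUT, the clocking block name cb, and the skew values are all assumptions chosen for illustration; a real testbench would normally place the clocking block in an interface.

```systemverilog
// Hypothetical testbench: a clocking block drives data_in after the edge.
module cb_drive_tb;
  logic       clk = 1'b0;
  logic [7:0] data_in;
  logic [7:0] q;

  always #5 clk = ~clk;

  // Stand-in DUT (assumed): a single input register.
  always_ff @(posedge clk) q <= data_in;

  // Outputs change 1 time unit after posedge clk, so the DUT flop
  // always samples the value driven in the previous cycle.
  clocking cb @(posedge clk);
    default input #1step output #1;
    output data_in;
  endclocking

  initial begin
    @(cb);
    cb.data_in <= 8'hA5;     // synchronous drive, applied after this edge

    @(cb);                   // DUT samples A5 on this edge
    cb.data_in <= 8'h5A;     // next value, again applied after the edge
    #2;
    assert (q == 8'hA5) else $error("expected A5, got %h", q);

    @(cb); #2;               // DUT samples 5A on this edge
    assert (q == 8'h5A) else $error("expected 5A, got %h", q);

    $display("clocking-block drive OK");
    $finish;
  end
endmodule
```

The output skew moves every drive away from the sampling edge, which removes the TB-vs-DUT race without relying on each individual assignment being written with <=.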
// Combinational: blocking, so 'sum' flows into 'parity' in-order
always_comb begin
  sum = a + b;
  parity = ^sum;  // uses THIS slot's sum: correct with =
end

// TB driving a DUT input at the clock edge:
initial begin
  @(posedge clk);
  data_in = 8'hA5;   // RACE: does the DUT flop see A5 or the old value?
  @(posedge clk);
  data_in <= 8'h5A;  // SAFE: DUT samples old value; A5→5A like real HW
end
Key takeaways
Blocking updates immediately in statement order; nonblocking samples now, commits in the NBA region.
NBA exists to model flip-flops: all clocked processes read pre-edge values regardless of process order.
Rules: <= in always_ff, = in always_comb and TB sequencing, never both on one variable.
The two-register swap and the 1-stage-vs-3-stage shift register are the interview standards — know both.
Common pitfalls
Blocking assignments in clocked blocks — code works until process ordering changes, then races appear.
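Nonblocking in combinational blocks: later statements read stale values. Under always @* the block re-triggers and settles only after extra delta cycles; under always_comb, which is not sensitive to variables the block itself writes, the output can simply stay stale.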
Driving DUT inputs with = right at @(posedge clk) from the TB — a textbook TB-vs-DUT race.
Mixing = and <= on the same variable — undefined results and a guaranteed code-review rejection.
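The first pitfall, blocking assignments in a clocked block, can be reproduced with a small self-checking sketch that runs the shift register both ways side by side. The module name shift_compare_tb and the signal names n1..n3 (nonblocking chain) and b1..b3 (blocking chain) are assumptions made for this illustration.

```systemverilog
// Hypothetical testbench comparing a nonblocking and a blocking chain.
module shift_compare_tb;
  logic clk = 1'b0;
  logic d   = 1'b0;
  logic n1, n2, n3;   // nonblocking chain: intended 3-stage pipeline
  logic b1, b2, b3;   // blocking chain: collapses to one stage

  always #5 clk = ~clk;

  always_ff @(posedge clk) begin
    n1 <= d;
    n2 <= n1;
    n3 <= n2;
  end

  // Plain always is used here because many lint tools flag
  // blocking assignments inside always_ff.
  always @(posedge clk) begin
    b1 = d;
    b2 = b1;
    b3 = b2;
  end

  initial begin
    @(negedge clk); d = 1'b1;       // change d away from the sampling edge

    @(posedge clk); #1;             // one edge after d went high
    assert (b3 === 1'b1) else $error("blocking chain: b3=%b", b3);
    assert (n3 !== 1'b1) else $error("nonblocking chain too fast: n3=%b", n3);

    repeat (2) @(posedge clk);      // two more edges, three in total
    #1;
    assert (n3 === 1'b1) else $error("nonblocking chain: n3=%b", n3);

    $display("blocking chain: 1 edge, nonblocking chain: 3 edges");
    $finish;
  end
endmodule
```

With the blocking chain, d reaches b3 after a single clock edge, while the nonblocking chain needs three edges, which is the 1-stage-vs-3-stage difference named in the takeaways.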