Part 1 · Language Foundations · Intermediate

Blocking vs Nonblocking Assignments

Scheduling semantics, the two-register swap classic, why NBA models flip-flops, and RTL vs testbench usage with race diagrams.

Two assignments, two schedules

A blocking assignment a = b evaluates the right side and updates the left side immediately, before the next statement runs, like assignment in any software language. A nonblocking assignment (NBA) a <= b splits the operation in two: the right side is evaluated now (in the Active region), but the left side is updated later, in the NBA region of the same time slot, after every process triggered by this event has finished evaluating. That split is not a convenience: it is a faithful model of a flip-flop, which samples D at the clock edge and changes Q after the edge, never feeding the new value back into logic clocked by the same edge.
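
The split is easy to observe in a simulator. The following is a minimal sketch, not taken from any particular codebase: the module name nba_demo and the values are illustrative. It relies on $strobe, which prints at the end of the time slot, after the NBA updates have landed, whereas $display prints immediately.

systemverilog
module nba_demo;
  logic [7:0] a, b;

  initial begin
    a = 8'd1;      // blocking: a is 1 before the next line runs
    b = 8'd2;
    a <= b;        // RHS (2) sampled now; the update to a waits for the NBA region
    $display("same slot, after a <= b: a=%0d", a);       // a is still 1 here
    b = a;         // blocking: reads the not-yet-updated a, so b becomes 1
    $strobe("end of slot, after NBA: a=%0d b=%0d", a, b); // a=2, b=1
  end
endmodule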

diagram
ONE CLOCK EDGE, TWO PROCESSES — WHY NBA HAS NO RACE

  posedge clk fires both always_ff blocks

  ACTIVE region                        NBA region
  ─────────────────────────────       ─────────────────────
  proc1: eval  b  (old value) ──┐
  proc2: eval  a  (old value) ──┤  ┌─► a updated to old b
        (order of proc1/proc2   ├──┤
         DOESN'T MATTER: both   │  └─► b updated to old a
         read OLD values)     ──┘
                                      result: clean swap

  WITH BLOCKING (=) INSTEAD:
  if proc1 runs first: a = b;  then proc2 reads the NEW a
    → b = a gives b its own old value back → NO swap
  if proc2 runs first → opposite corruption (a keeps its old value)
    → result depends on simulator process ordering = RACE

The diagram is the whole story of simulation races: with blocking assignments across communicating clocked processes, the result depends on which process the simulator happens to run first — legal, unspecified ordering. With nonblocking assignments every process reads pre-edge values and all updates commit together, so process order is irrelevant.
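
The two-process situation in the diagram can be sketched as a small self-contained simulation. The module name swap_two_procs, the rst signal, and the reset values 8'hAA and 8'h55 are assumptions added for illustration; they are not part of the diagram.

systemverilog
module swap_two_procs;
  logic       clk = 1'b0;
  logic       rst = 1'b1;
  logic [7:0] a, b;

  always #5 clk = ~clk;

  // proc1 and proc2: separate clocked processes that read each other's register.
  // Both right-hand sides are evaluated in the Active region (old values),
  // both updates land in the NBA region, so execution order does not matter.
  always_ff @(posedge clk) a <= rst ? 8'hAA : b;   // proc1
  always_ff @(posedge clk) b <= rst ? 8'h55 : a;   // proc2

  initial begin
    @(posedge clk);          // first edge loads the reset values
    rst <= 1'b0;             // nonblocking, so the edge above still sees rst = 1
    repeat (3) begin
      @(posedge clk);
      // $strobe prints after the NBA updates: a and b trade values on each edge
      $strobe("t=%0t a=%h b=%h", $time, a, b);
    end
    #1 $finish;
  end
endmodule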


The two-register swap classic

The canonical interview demonstration: swap two registers on a clock edge. With <= it works in two lines with no temporary, because both right-hand sides are sampled before either update lands. With = the first assignment destroys the value the second one needs. If an interviewer asks you to “swap without a temp variable in RTL”, nonblocking assignment is the answer they are fishing for.

systemverilog
// CORRECT swap — RHS of both lines sampled before any update
always_ff @(posedge clk) begin
  a <= b;
  b <= a;     // reads the OLD a: a and b exchange every cycle
end

// BROKEN swap — blocking destroys the needed value
always_ff @(posedge clk) begin
  a = b;      // a immediately becomes b
  b = a;      // reads the NEW a → b = old b. Both now equal old b.
end

// Same principle, shift register form:
always_ff @(posedge clk) begin
  q1 <= d;    // with <= : a real 3-stage pipeline
  q2 <= q1;
  q3 <= q2;
end
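// With = in this order, d would ripple through q1, q2, q3 in ONE cycle:
// all three registers load d, so the 3-stage pipeline collapses to 1 stage.
// Split the stages across separate always blocks and = makes simulation
// order-dependent while synthesis still chains 3 flops: classic mismatch.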
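
Blocking assignments are sensitive to statement order, which a short sketch makes concrete. The module below (shift3_blocking_reversed is a hypothetical name, as are its ports) lists the stages last-to-first, so each register is read before a later statement overwrites it. It behaves as a 3-stage pipeline, but only because of that ordering. Any other clocked block reading these registers on the same edge would still race, which is why the nonblocking form remains the recommended one.

systemverilog
module shift3_blocking_reversed (
  input  logic clk,
  input  logic d,
  output logic q3
);
  logic q1, q2;

  // Fragile: correctness depends entirely on the statement order below.
  always_ff @(posedge clk) begin
    q3 = q2;    // reads q2 before the next line overwrites it
    q2 = q1;    // reads q1 before the next line overwrites it
    q1 = d;
  end
endmodule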

Usage rules for RTL and testbench

The discipline that follows from the semantics: use nonblocking for every clocked state element (always_ff), use blocking inside combinational blocks (always_comb) so intermediate values propagate through the block in order, and never mix the two for the same variable. In testbench procedural code (initial blocks, tasks, class methods) blocking is the default because you want software-like sequencing. When the TB drives DUT inputs synchronously to a clock, however, drive with nonblocking (or better, through a clocking block) so the DUT samples old values exactly like a real upstream flop would.

systemverilog
// Combinational: blocking, so 'sum' flows into 'parity' in-order
always_comb begin
  sum    = a + b;
  parity = ^sum;          // uses THIS slot's sum — correct with =
end

// TB driving a DUT input at the clock edge:
initial begin
  @(posedge clk);
  data_in = 8'hA5;    // RACE: does the DUT flop see A5 or the old value?
  @(posedge clk);
  data_in <= 8'h5A;   // SAFE: DUT samples old value; A5→5A like real HW
end
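
The clocking-block alternative mentioned above can be sketched as follows. Everything here is an assumption made for illustration: the module name tb_cb_sketch, the one-register stand-in for the DUT, the data_q signal, and the skew values. The idea is that the clocking block fixes when inputs are sampled and outputs are driven relative to the clock, so the testbench cannot race the DUT at the edge.

systemverilog
module tb_cb_sketch;
  logic       clk = 1'b0;
  logic [7:0] data_in;
  logic [7:0] data_q;

  always #5 clk = ~clk;

  // Stand-in DUT (hypothetical): a single register stage
  always_ff @(posedge clk) data_q <= data_in;

  // Sample DUT outputs just before the edge, drive DUT inputs 1 time unit after it
  clocking cb @(posedge clk);
    default input #1step output #1;
    output data_in;
    input  data_q;
  endclocking

  initial begin
    @(cb);
    cb.data_in <= 8'hA5;    // lands after the edge, so the DUT saw the old value
    @(cb);
    cb.data_in <= 8'h5A;
    @(cb);
    // cb.data_q holds the value sampled just before this edge
    $display("sampled data_q=%h", cb.data_q);
    #1 $finish;
  end
endmodule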

Key takeaways

  • Blocking updates immediately in statement order; nonblocking samples now, commits in the NBA region.

  • NBA exists to model flip-flops: all clocked processes read pre-edge values regardless of process order.

  • Rules: <= in always_ff, = in always_comb and TB sequencing, never both on one variable.

  • The two-register swap and the 1-stage-vs-3-stage shift register are the interview standards — know both.

Common pitfalls

  • Blocking assignments in clocked blocks: code works until process ordering changes, then races appear.

  • Nonblocking in combinational blocks: later statements read stale values, so outputs need extra delta-cycle passes to settle, or stay stale if the block is not re-triggered.

  • Driving DUT inputs with = right at @(posedge clk) from the TB: a textbook TB-vs-DUT race.

  • Mixing = and <= on the same variable: behavior that is hard to predict, commonly flagged by lint and synthesis tools, and a guaranteed code-review rejection.