Part 6 · Testbench Architecture · Intermediate

Pipelined & Backpressured Driving

Multi-outstanding transactions with id tagging, respecting DUT backpressure, split request/response processes, and semaphore-guarded shared buses.

Why one-at-a-time driving is not enough

A blocking driver that waits for each response before issuing the next request can never create more than one outstanding transaction — so the DUT's pipeline, reorder logic, and full-FIFO paths are never exercised. Pipelined protocols (AXI-like) require the driver to split request issue from response collection and tag each request with an id so responses can be matched back, possibly out of order.

diagram

BLOCKING vs PIPELINED DRIVING

  BLOCKING (1 outstanding)
  req1 ──► wait resp1 ──► req2 ──► wait resp2
  DUT pipeline mostly empty; reorder logic never tested

  PIPELINED (N outstanding, id-tagged)
  req1(id0) req2(id1) req3(id2) ──────────►   issue process
        ◄── resp(id1) resp(id0) resp(id2)    response process
                 ▲ out-of-order completion exercised

  pending[id] = txn  on issue
  pending.delete(id) on response  ── matches resp to request
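
The throughput gap in the diagram can be made concrete with a purely behavioral sketch. Nothing here touches a real interface: the module name, `N`, and `LATENCY` are illustrative assumptions, and plain delays stand in for "issue" and "wait for response".

```systemverilog
module blocking_vs_pipelined_demo;
  localparam int N       = 4;   // requests to send (demo value)
  localparam int LATENCY = 5;   // assumed DUT response latency, in time units

  time t0;

  initial begin
    // Blocking: issue (1 unit), then wait for the response before the next issue.
    t0 = $time;
    for (int i = 0; i < N; i++) begin
      #1;          // issue slot
      #LATENCY;    // wait for this request's response
    end
    $display("blocking : %0d time units", $time - t0);   // 4 * (1 + 5) = 24

    // Pipelined: issue one request per time unit; each response wait runs
    // in its own forked process, so the waits overlap.
    t0 = $time;
    for (int i = 0; i < N; i++) begin
      #1;          // issue slot
      fork
        #LATENCY;  // response wait, concurrent with later issues
      join_none
    end
    wait fork;     // all responses collected
    $display("pipelined: %0d time units", $time - t0);   // 4 + 5 = 9
  end
endmodule
```

With these numbers the blocking loop never has more than one request in flight, while the pipelined loop reaches four, which is the condition needed to exercise reorder and full-FIFO paths.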

Split-process driver with id tagging

systemverilog

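class pipelined_driver;
  mailbox #(bus_txn) in;
  // Must be assigned before run(). The interface needs arid/rid,
  // which standard AXI4-Lite does not define.
  virtual axi_lite_if vif;

  bus_txn pending [bit [3:0]];     // id → in-flight txn
  semaphore id_pool;                // limits outstanding count
  bit [3:0] next_id;

  function new(mailbox #(bus_txn) in, int max_outstanding = 8);
    this.in = in;
    // a 4-bit id can name at most 16 in-flight transactions
    if (max_outstanding < 1 || max_outstanding > 16)
      $fatal(1, "max_outstanding must be 1..16, got %0d", max_outstanding);
    id_pool = new(max_outstanding); // one key per outstanding slot (default 8)
  endfunction

  task run();
    fork
      issue_loop();
      response_loop();
    join_none
  endtask

  task issue_loop();
    forever begin
      bus_txn t;
      in.get(t);
      id_pool.get(1);              // blocks at max outstanding
      // Skip ids whose responses have not returned yet. Holding a key
      // guarantees at least one id is free, so this loop terminates.
      while (pending.exists(next_id)) next_id++;
      t.id_tag = next_id++;
      pending[t.id_tag] = t;

      @(vif.drv_cb);
      vif.drv_cb.arvalid <= 1;
      vif.drv_cb.arid    <= t.id_tag;
      vif.drv_cb.araddr  <= t.addr;
      // BACKPRESSURE: hold valid and payload until DUT asserts ready
      do @(vif.drv_cb); while (vif.drv_cb.arready !== 1);
      vif.drv_cb.arvalid <= 0;
    end
  endtask

  task response_loop();
    // The driver owns rready: drive it, do not sample it.
    @(vif.drv_cb);
    vif.drv_cb.rready <= 1;        // always ready to accept responses
    forever begin
      @(vif.drv_cb);
      // rready is held high, so rvalid alone marks a completed transfer
      if (vif.drv_cb.rvalid === 1) begin
        bit [3:0] rid = vif.drv_cb.rid;
        if (!pending.exists(rid))
          $error("DRV-ERR: response for unknown id %0d", rid);
        else begin
          pending[rid].rdata = vif.drv_cb.rdata;
          pending.delete(rid);
          id_pool.put(1);          // free a slot for the issue loop
        end
      end
    end
  endtask
endclass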

Code walkthrough

Two forever processes: issue_loop pushes requests as fast as ids and arready allow; response_loop independently collects completions.
The semaphore is the outstanding-count throttle — get(1) per issue, put(1) per response; 8 keys means at most 8 in flight.
pending is an associative array keyed by id — out-of-order responses match back to their requests by lookup, not by position.
Backpressure is respected purely by the arready wait — the driver never deasserts or mutates a stalled request.
A response with an id not in pending is a DUT bug (or TB bug) and is flagged immediately at the driver.
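
The interplay between the semaphore and the pending array can be tried without any interface at all. The following is a minimal behavioral sketch, not the lesson's driver: the module name, `MAX_OUTSTANDING`, the `latency` table, and `complete_later` are all hypothetical stand-ins, and an `int` payload replaces `bus_txn`.

```systemverilog
module outstanding_throttle_demo;
  localparam int MAX_OUTSTANDING = 2;     // demo limit (assumption)

  semaphore id_pool = new(MAX_OUTSTANDING);
  int       pending [bit [3:0]];          // id -> payload (an int stands in for bus_txn)
  bit [3:0] next_id;
  int       max_seen;                     // highest in-flight count observed
  int       latency [4] = '{9, 2, 2, 2};  // first request is the slowest

  // Stand-in for the DUT: answers each id after its own latency,
  // so completions can arrive out of issue order.
  task automatic complete_later(bit [3:0] id, int lat);
    #lat;
    if (!pending.exists(id)) $error("response for unknown id %0d", id);
    else begin
      $display("%0t: resp  id=%0d", $time, id);
      pending.delete(id);
      id_pool.put(1);                     // free a slot for the issue loop
    end
  endtask

  initial begin
    for (int i = 0; i < 4; i++) begin
      id_pool.get(1);                     // blocks while MAX_OUTSTANDING are in flight
      while (pending.exists(next_id)) next_id++;   // skip ids still in flight
      pending[next_id] = i;
      if (pending.num() > max_seen) max_seen = pending.num();
      $display("%0t: issue id=%0d", $time, next_id);
      fork
        // automatic copies are captured when the fork is entered
        automatic bit [3:0] id  = next_id;
        automatic int       lat = latency[i];
        complete_later(id, lat);
      join_none
      next_id++;
      #1;
    end
    wait fork;
    $display("max in flight = %0d", max_seen);
  end
endmodule
```

Responses come back in the order 1, 2, 3, 0, yet each one finds its entry by id, and the in-flight count never exceeds the two keys in the semaphore.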

Semaphore-guarded shared bus

When two stimulus sources (say, a CPU-port driver and a DMA-port driver) share one physical bus, each transaction's pin sequence must be atomic — interleaving two half-driven requests corrupts both. A semaphore with one key is the mutex; whoever holds the key owns the bus for one complete transaction.

systemverilog

semaphore bus_lock = new(1);      // one key = mutual exclusion

task drive_locked(bus_txn t);
  bus_lock.get(1);                  // acquire the bus
  drive_one(t);                     // full atomic pin sequence
  bus_lock.put(1);                  // release for the other master
endtask

// cpu_driver and dma_driver both call drive_locked();
// blocked callers are served in arrival (FIFO) order, so pin sequences
// never interleave; callers arriving in the same time step have no
// guaranteed order
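
A self-contained way to see the mutex at work is to replace the pin sequence with a delay and count overlaps. This is only a sketch: `drive_one` here is a hypothetical stand-in that occupies the bus for three time units, and the `owner` and `violations` variables exist purely for the demo.

```systemverilog
module shared_bus_lock_demo;
  semaphore bus_lock = new(1);   // one key = mutual exclusion
  string    owner = "";          // current bus holder; "" means free
  int       violations;          // times a transaction started on a busy bus

  // Hypothetical stand-in for drive_one(): holds the bus for 3 time units.
  task automatic drive_one(string who, int n);
    if (owner != "") violations++;   // another master is mid-transaction
    owner = who;
    $display("%0t: %s txn %0d start", $time, who, n);
    #3;
    $display("%0t: %s txn %0d end", $time, who, n);
    owner = "";
  endtask

  task automatic drive_locked(string who, int n);
    bus_lock.get(1);               // acquire the bus
    drive_one(who, n);             // full atomic sequence
    bus_lock.put(1);               // release for the other master
  endtask

  initial begin
    fork
      for (int n = 0; n < 3; n++) drive_locked("cpu", n);
      for (int n = 0; n < 3; n++) begin #1; drive_locked("dma", n); end
    join
    $display("overlapping transactions: %0d", violations);
  end
endmodule
```

With the lock in place the overlap count stays at zero. Calling `drive_one` directly from both loops instead would let the DMA transaction start while the CPU transaction is still in progress.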

Interview angle

“How do you drive N outstanding transactions?” — split issue/response processes, id tagging, associative-array matching, semaphore throttle. Name all four.
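“What happens when the DUT stalls you?” Hold valid and payload stable until ready. While the issue loop is blocked, either on arready or on the semaphore at the outstanding limit, it stops draining the input mailbox; if that mailbox is bounded, the generator's put() blocks in turn and the stall propagates upstream.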
“Two drivers, one bus?” — one-key semaphore around the atomic pin sequence; mention that the semaphore queues fairly in FIFO order.
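
The stall propagation in the second answer depends on the mailbox being constructed with a bound. A minimal sketch of that behavior, with a bound of 2 and a 10-unit consumer delay chosen arbitrarily for illustration:

```systemverilog
module mailbox_backpressure_demo;
  mailbox #(int) mbx = new(2);   // bounded: put() blocks once 2 items are queued

  initial begin
    fork
      // Generator: runs freely until the mailbox is full.
      for (int i = 0; i < 6; i++) begin
        mbx.put(i);
        $display("%0t: gen put %0d", $time, i);
      end
      // Stalled driver: only drains one item every 10 time units.
      for (int i = 0; i < 6; i++) begin
        int v;
        #10;
        mbx.get(v);
        $display("%0t: drv got %0d", $time, v);
      end
    join
  end
endmodule
```

The generator places two items at time 0 and then advances only as fast as the slow consumer frees space. An unbounded mailbox, created with `new()`, would accept all six items immediately and hide the stall from the generator.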

Key takeaways

Pipelined driving = separate issue and response processes joined by an id-keyed pending array.
A semaphore with N keys is the cleanest outstanding-transaction throttle: get on issue, put on response.
Backpressure means holding the request stable until ready — never retracting or rewriting it mid-stall.
One-key semaphores serialize multiple stimulus sources onto a shared bus atomically.

Common pitfalls

Matching responses by issue order on a protocol that permits reordering — scoreboard chaos that looks like DUT bugs.
Forgetting id_pool.put(1) on the response path: no key is ever returned, so the issue loop hangs after exactly max_outstanding requests.
Deasserting valid when stalled: a protocol violation that many DUTs tolerate in simulation but that can fail in silicon.
Reusing an id that is still in pending: two in-flight transactions become indistinguishable. Size the id space to at least max outstanding, and skip any id that is still in flight, because a wrapping counter can land on an id whose response has not yet returned.
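
The "deasserting valid when stalled" pitfall is easy to catch automatically with a concurrent assertion. The checker below is a sketch under assumptions: the module name, the port widths, and the active-low `rst_n` are illustrative, not taken from the lesson's interface.

```systemverilog
module ar_stall_checker (
  input logic        clk,
  input logic        rst_n,     // assumed active-low reset
  input logic        arvalid,
  input logic        arready,
  input logic [3:0]  arid,      // width assumed to match the 4-bit id tag
  input logic [31:0] araddr     // width is an assumption
);
  // If a request is stalled (valid without ready), the next cycle must
  // still show valid high with the same id and address.
  property p_hold_when_stalled;
    @(posedge clk) disable iff (!rst_n)
      (arvalid && !arready) |=> (arvalid && $stable(arid) && $stable(araddr));
  endproperty

  a_hold_when_stalled: assert property (p_hold_when_stalled)
    else $error("AR request retracted or changed while stalled");
endmodule
```

Instantiating or binding a checker like this next to the interface flags a misbehaving driver in the cycle it retracts or rewrites a stalled request, rather than leaving the violation to surface later as a scoreboard mismatch.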
