Part 6 · Testbench Architecture · Intermediate

Reference Models

Fidelity spectrum, writing a transaction-level functional model class, keeping model and RTL spec-synced, and DPI-C preview.

The fidelity spectrum

A reference model answers one question — "given these inputs, what must the DUT output?" — and the right level of detail in that answer is a design decision. Model too little and bugs slip through; model too much and you have re-implemented the RTL, doubled the maintenance bill, and likely cloned the same misreading of the spec.

```
REFERENCE MODEL FIDELITY SPECTRUM

  txn-level functional            cycle-approximate           cycle-accurate
  ────────────────────            ─────────────────           ──────────────
  "out = f(in)"                   "+ latency windows,         "exact per-cycle
  no timing at all                 throughput bounds"          pin behavior"

  cost:    LOW                    MEDIUM                      VERY HIGH
  checks:  data integrity,        + perf envelopes,           + timing bugs
           ordering, flags          arb fairness
  risk:    misses timing bugs     some timing escapes         becomes 2nd RTL;
                                                              spec bugs cloned

  ◄── DEFAULT: start here ────────────────────────── only if spec demands ──►
```

Transaction-level functional: pure input-to-output mapping. Right default for 90% of scoreboards.
Cycle-approximate: adds latency min/max windows or throughput checks on top of functional output — usually as separate assertions, not inside the model.
Cycle-accurate: justified only when the spec itself is cycle-exact (e.g., a fixed-latency DSP pipe) — and even then prefer an independently-written model (different author, different language).
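A cycle-approximate check can live entirely outside the model, as a standalone assertion. The sketch below assumes a hypothetical valid-only handshake (in_valid marks an accepted input, out_valid marks a produced result) and latency bounds MIN_LAT/MAX_LAT taken from the spec; the module, signal and counter names are illustrative only.

```systemverilog
// Hypothetical latency-window checker, bound alongside the DUT.
// MIN_LAT/MAX_LAT are assumed to come from the spec's latency table.
module alu_latency_checker #(parameter int MIN_LAT = 2,
                             parameter int MAX_LAT = 5)
  (input logic clk, rst_n, in_valid, out_valid);

  int unsigned violations = 0;   // read by the env for end-of-test reporting

  // Every accepted input must be followed by a result within
  // [MIN_LAT, MAX_LAT] clock cycles; no check while reset is asserted.
  property p_latency_window;
    @(posedge clk) disable iff (!rst_n)
      in_valid |-> ##[MIN_LAT:MAX_LAT] out_valid;
  endproperty

  a_latency_window: assert property (p_latency_window)
    else begin
      violations++;
      $error("no result within [%0d:%0d] cycles of an accepted input",
             MIN_LAT, MAX_LAT);
    end
endmodule
```

This property only proves that some result appears inside the window after each input. With several transactions in flight, one transaction's output can satisfy another's window, and nothing forbids out_valid from also pulsing before MIN_LAT; a stricter checker tags each transaction and tracks it individually. The functional model above stays untouched either way.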

A functional model class

The model is an ordinary class with the interface shape the scoreboard expects: transactions in, predicted transactions out. Here is one for a FIFO-with-ALU DUT: each input transaction carries an opcode and two operands, and the DUT computes the result and queues it.

```systemverilog
typedef enum logic [1:0] { OP_ADD, OP_SUB, OP_AND, OP_XOR } op_e;

class alu_txn;
  op_e         op;
  logic [31:0] a, b;
  logic [31:0] result;     // filled by DUT (actual) or model (expected)
  logic        overflow;

  // Fresh transaction carrying only the stimulus fields (op, a, b)
  function alu_txn copy_inputs();
    alu_txn c = new();
    c.op = op;
    c.a  = a;
    c.b  = b;
    return c;
  endfunction

  function string convert2string();
    return $sformatf("op=%s a=0x%08h b=0x%08h result=0x%08h ovf=%0b",
                     op.name(), a, b, result, overflow);
  endfunction
endclass

class alu_ref_model;
  mailbox #(alu_txn) mbx_in;    // from input monitor
  mailbox #(alu_txn) mbx_exp;   // to scoreboard (expected stream)

  function new(mailbox #(alu_txn) mbx_in, mailbox #(alu_txn) mbx_exp);
    this.mbx_in  = mbx_in;
    this.mbx_exp = mbx_exp;
  endfunction

  // Spec section 3.2: result and overflow semantics
  function void predict(alu_txn t);
    logic [32:0] wide;
    case (t.op)
      OP_ADD: begin
        wide       = {1'b0, t.a} + {1'b0, t.b};
        t.result   = wide[31:0];
        t.overflow = wide[32];        // carry-out
      end
      OP_SUB: begin
        wide       = {1'b0, t.a} - {1'b0, t.b};
        t.result   = wide[31:0];
        t.overflow = wide[32];        // borrow, per spec 3.2.1
      end
      OP_AND: begin t.result = t.a & t.b; t.overflow = 1'b0; end
      OP_XOR: begin t.result = t.a ^ t.b; t.overflow = 1'b0; end
      default: begin                  // X/Z opcode: no specified result, predict X
        t.result   = 'x;
        t.overflow = 1'bx;
      end
    endcase
  endfunction

  task run();
    forever begin
      alu_txn obs, exp_t;
      mbx_in.get(obs);
      exp_t = obs.copy_inputs();   // never write into the monitor's object
      predict(exp_t);              // fill expected fields on the copy
      mbx_exp.put(exp_t);
    end
  endtask
endclass
```
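To see where the expected stream ends up, here is a minimal in-order scoreboard sketch. It assumes the alu_txn class above, a hypothetical output monitor that puts its own alu_txn objects (with result and overflow filled) into a second mailbox, and a DUT that preserves transaction order, as a FIFO does. The class name alu_scoreboard and its counters are illustrative, not part of any library.

```systemverilog
// Hypothetical in-order scoreboard: pairs each actual result with the
// oldest outstanding prediction. Assumes the DUT never reorders.
class alu_scoreboard;
  mailbox #(alu_txn) mbx_exp;   // from alu_ref_model
  mailbox #(alu_txn) mbx_act;   // from the output monitor
  int unsigned n_match, n_mismatch;

  function new(mailbox #(alu_txn) mbx_exp, mailbox #(alu_txn) mbx_act);
    this.mbx_exp = mbx_exp;
    this.mbx_act = mbx_act;
  endfunction

  task run();
    forever begin
      alu_txn exp_t, act_t;
      mbx_act.get(act_t);   // wait for the DUT to produce a result
      mbx_exp.get(exp_t);   // oldest outstanding prediction
      // 4-state compare (!==) so X/Z on DUT outputs counts as a mismatch
      if (act_t.result !== exp_t.result || act_t.overflow !== exp_t.overflow) begin
        n_mismatch++;
        $error("MISMATCH\n  exp: %s\n  act: %s",
               exp_t.convert2string(), act_t.convert2string());
      end
      else
        n_match++;
    end
  endtask
endclass
```

The model and scoreboard share only a mailbox, so either side can be replaced (for example, by a DPI-backed model) without touching the other.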

What makes this model good

It consumes the input-monitor stream, so it predicts from what the DUT actually received.
predict() reads only the transaction's op, a and b fields; with no clocks or pins involved, it is trivially unit-testable.
Spec section numbers appear as comments at each behavioral decision — auditable against the document.
Model state (none here; a FIFO model would hold a queue) lives in the class and is cleared by a reset() method whenever the env signals reset.
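Because predict() needs neither a clock nor a DUT, it can be exercised with a plain module and immediate assertions before the environment exists. This is a sketch assuming the op_e, alu_txn and alu_ref_model definitions above are compiled first; the expected values follow the unsigned carry/borrow semantics the model implements for spec section 3.2. Passing null mailboxes is acceptable here because only predict() is called.

```systemverilog
// Hypothetical standalone unit test for alu_ref_model::predict().
module alu_ref_model_unit_test;
  initial begin
    alu_ref_model m;
    alu_txn       t;
    m = new(null, null);   // mailboxes unused: we call predict() directly
    t = new();

    // ADD with carry-out: 0xFFFF_FFFF + 1 wraps to 0 and sets overflow
    t.op = OP_ADD; t.a = 32'hFFFF_FFFF; t.b = 32'h1;
    m.predict(t);
    assert (t.result == 32'h0 && t.overflow == 1'b1)
      else $error("ADD carry: %s", t.convert2string());

    // SUB with borrow: 1 - 2 wraps to 0xFFFF_FFFF and sets overflow (3.2.1)
    t.op = OP_SUB; t.a = 32'h1; t.b = 32'h2;
    m.predict(t);
    assert (t.result == 32'hFFFF_FFFF && t.overflow == 1'b1)
      else $error("SUB borrow: %s", t.convert2string());

    // XOR never overflows
    t.op = OP_XOR; t.a = 32'hF0F0_F0F0; t.b = 32'hFFFF_0000;
    m.predict(t);
    assert (t.result == 32'h0F0F_F0F0 && t.overflow == 1'b0)
      else $error("XOR: %s", t.convert2string());

    $display("predict() unit tests done");
  end
endmodule
```

Running a handful of such corner cases on every model change catches model bugs before they show up as confusing scoreboard mismatches.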

Keeping the model honest

The model encodes your reading of the spec, and specs change. Treat the model as a spec artifact, not a TB convenience: every model behavior should carry a spec reference, and every spec revision should trigger a model diff review. When DUT and model disagree, the triage question is always "which one matches the spec?" — about a third of scoreboard mismatches in practice turn out to be model bugs or spec ambiguities, and those spec ambiguities are valuable findings in their own right.

Tag each predict() branch with the governing spec section — disagreements become document lookups, not archaeology.
Keep the model independently authored where possible; if the RTL designer also writes the model, a shared misreading makes DUT and model agree on the same wrong answer, the scoreboard stays silent, and the bug escapes.
For algorithm-heavy DUTs (crypto, compression, DSP) the golden model often already exists in C/C++. Import it through DPI-C rather than re-implementing — covered in the DPI-C lessons later in this course.
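As a brief preview, the SystemVerilog side of such an import can be very small. In the sketch below, the C function golden_alu, its argument order, and its 2-state int unsigned/bit argument types are all hypothetical; a real golden model dictates its own signature, and how the C object is compiled and linked is simulator-specific. It assumes the op_e and alu_txn definitions above.

```systemverilog
// Hypothetical C golden model, compiled and linked with the simulator:
//   void golden_alu(unsigned op, unsigned a, unsigned b,
//                   unsigned *result, svBit *overflow);
import "DPI-C" function void golden_alu(input  int unsigned op,
                                        input  int unsigned a,
                                        input  int unsigned b,
                                        output int unsigned result,
                                        output bit          overflow);

// Drop-in replacement for the body of predict(): SV only marshals fields,
// the C model owns the arithmetic.
function automatic void predict_via_dpi(alu_txn t);
  int unsigned r;
  bit          ovf;
  golden_alu(t.op, t.a, t.b, r, ovf);   // 4-state args convert to 2-state here
  t.result   = r;
  t.overflow = ovf;
endfunction
```

Note that passing 4-state logic fields into 2-state DPI arguments silently turns X/Z bits into 0, so a model fed from a monitor may want to check for unknowns before calling out.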

Interview angle

Common probes: "How accurate should a reference model be?" (transaction-level by default; cycle accuracy only when the spec is cycle-exact, because the model must stay cheaper and more trustworthy than the RTL), and "What do you do when model and RTL disagree?" (check both against the spec — never auto-trust either side).

Key takeaways

Default to a transaction-level functional model; add timing checks as assertions, not model detail.
Feed the model from the input monitor so predictions reflect what the DUT really received.
Annotate model code with spec sections — mismatch triage becomes a spec lookup.
Reuse existing C/C++ golden models via DPI-C instead of re-implementing them in SV.

Common pitfalls

Cycle-accurate model by default — you maintain two RTLs and clone the same spec misreadings.
Predicting from generator intent instead of observed input — legal DUT behavior becomes false mismatches.
Model state never reset on mid-test reset — every post-reset compare fails in a cascade.
Assuming the model is right in every mismatch — roughly a third of the time the model or spec is the bug.
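The reset pitfall is easiest to see in a stateful model. A minimal sketch, assuming a plain 32-bit data FIFO with a hypothetical depth of 16 and an assumed drop-on-full rule; the class and method names are illustrative:

```systemverilog
// Hypothetical stateful model for a plain data FIFO. Its queue mirrors the
// DUT's storage, so it must be cleared whenever the DUT is reset.
class fifo_ref_model;
  localparam int DEPTH = 16;   // assumed depth from the spec
  logic [31:0] q[$];           // model state lives in the class

  // Called by the env on every reset, including mid-test resets
  function void reset();
    q.delete();
  endfunction

  // Assumed spec rule: a push to a full FIFO is dropped
  function void push(logic [31:0] d);
    if (q.size() < DEPTH) q.push_back(d);
  endfunction

  // Returns 0 if the model believes the FIFO is empty, else the next value in d
  function bit pop(output logic [31:0] d);
    if (q.size() == 0) return 0;
    d = q.pop_front();
    return 1;
  endfunction
endclass
```

The env would call reset() from a process such as `forever @(negedge vif.rst_n) model.reset();`, assuming an active-low rst_n on a hypothetical virtual interface vif, and should flush any expected transactions the scoreboard is still holding at the same time; otherwise the first post-reset compare pairs new data with stale predictions.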
