Part 6 · Testbench Architecture · Intermediate

Reference Models

Fidelity spectrum, writing a transaction-level functional model class, keeping model and RTL spec-synced, and DPI-C preview.

The fidelity spectrum

A reference model answers one question — "given these inputs, what must the DUT output?" — and the right level of detail in that answer is a design decision. Model too little and bugs slip through; model too much and you have re-implemented the RTL, doubled the maintenance bill, and likely cloned the same misreading of the spec.

diagram
REFERENCE MODEL FIDELITY SPECTRUM

  txn-level functional            cycle-approximate           cycle-accurate
  ────────────────────            ─────────────────           ──────────────
  "out = f(in)"                   "+ latency windows,         "exact per-cycle
  no timing at all                 throughput bounds"          pin behavior"

  cost:    LOW                    MEDIUM                      VERY HIGH
  checks:  data integrity,        + perf envelopes,           + timing bugs
           ordering, flags          arb fairness
  risk:    misses timing bugs     some timing escapes         becomes 2nd RTL;
                                                              spec bugs cloned

  ◄── DEFAULT: start here ────────────────────────── only if spec demands ──►
  • Transaction-level functional: pure input-to-output mapping. Right default for 90% of scoreboards.

  • Cycle-approximate: adds latency min/max windows or throughput checks on top of functional output — usually as separate assertions, not inside the model.

  • Cycle-accurate: justified only when the spec itself is cycle-exact (e.g., a fixed-latency DSP pipe) — and even then prefer an independently-written model (different author, different language).


A functional model class

The model is an ordinary class with the same interface shape as the scoreboard expects: transactions in, predicted transactions out. Here is one for a FIFO-with-ALU DUT: each input transaction carries an opcode and two operands, the DUT computes and queues the result.

systemverilog
typedef enum logic [1:0] { OP_ADD, OP_SUB, OP_AND, OP_XOR } op_e;

class alu_txn;
  op_e         op;
  logic [31:0] a, b;
  logic [31:0] result;     // filled by DUT (actual) or model (expected)
  logic        overflow;

  function string convert2string();
    return $sformatf("op=%s a=0x%08h b=0x%08h result=0x%08h ovf=%0b",
                     op.name(), a, b, result, overflow);
  endfunction
endclass

class alu_ref_model;
  mailbox #(alu_txn) mbx_in;    // from input monitor
  mailbox #(alu_txn) mbx_exp;   // to scoreboard (expected stream)

  function new(mailbox #(alu_txn) mbx_in, mailbox #(alu_txn) mbx_exp);
    this.mbx_in  = mbx_in;
    this.mbx_exp = mbx_exp;
  endfunction

  // Spec section 3.2: result and overflow semantics
  function void predict(alu_txn t);
    logic [32:0] wide;
    case (t.op)
      OP_ADD: begin
        wide       = {1'b0, t.a} + {1'b0, t.b};
        t.result   = wide[31:0];
        t.overflow = wide[32];
      end
      OP_SUB: begin
        wide       = {1'b0, t.a} - {1'b0, t.b};
        t.result   = wide[31:0];
        t.overflow = wide[32];        // borrow, per spec 3.2.1
      end
      OP_AND: begin t.result = t.a & t.b; t.overflow = 1'b0; end
      OP_XOR: begin t.result = t.a ^ t.b; t.overflow = 1'b0; end
    endcase
  endfunction

  task run();
    forever begin
      alu_txn t;
      mbx_in.get(t);
      predict(t);          // fill expected fields in place
      mbx_exp.put(t);
    end
  endtask
endclass

What makes this model good

  1. It consumes the input-monitor stream, so it predicts from what the DUT actually received.

  2. predict() is a pure function of the transaction — no clocks, no pins, trivially unit-testable.

  3. Spec section numbers appear as comments at each behavioral decision — auditable against the document.

  4. State (none here; a FIFO model would hold a queue) lives in the class, reset by a reset() method when the env signals reset.


Keeping the model honest

The model encodes your reading of the spec, and specs change. Treat the model as a spec artifact, not a TB convenience: every model behavior should carry a spec reference, and every spec revision should trigger a model diff review. When DUT and model disagree, the triage question is always "which one matches the spec?" — about a third of scoreboard mismatches in practice turn out to be model bugs or spec ambiguities, and those spec ambiguities are valuable findings in their own right.

  • Tag each predict() branch with the governing spec section — disagreements become document lookups, not archaeology.

  • Keep the model independently authored where possible; if the RTL designer also writes the model, shared misreadings cancel out and bugs escape.

  • For algorithm-heavy DUTs (crypto, compression, DSP) the golden model often already exists in C/C++. Import it through DPI-C rather than re-implementing — covered in the DPI-C lessons later in this course.

Interview angle

Common probes: "How accurate should a reference model be?" (transaction-level by default; cycle accuracy only when the spec is cycle-exact, because the model must stay cheaper and more trustworthy than the RTL), and "What do you do when model and RTL disagree?" (check both against the spec — never auto-trust either side).

Key takeaways

  • Default to a transaction-level functional model; add timing checks as assertions, not model detail.

  • Feed the model from the input monitor so predictions reflect what the DUT really received.

  • Annotate model code with spec sections — mismatch triage becomes a spec lookup.

  • Reuse existing C/C++ golden models via DPI-C instead of re-implementing them in SV.

Common pitfalls

  • Cycle-accurate model by default — you maintain two RTLs and clone the same spec misreadings.

  • Predicting from generator intent instead of observed input — legal DUT behavior becomes false mismatches.

  • Model state never reset on mid-test reset — every post-reset compare fails in a cascade.

  • Assuming the model is right in every mismatch — roughly a third of the time the model or spec is the bug.