Part 6 · Testbench Architecture · Intermediate
Reference Models
Fidelity spectrum, writing a transaction-level functional model class, keeping model and RTL spec-synced, and DPI-C preview.
The fidelity spectrum
A reference model answers one question — "given these inputs, what must the DUT output?" — and the right level of detail in that answer is a design decision. Model too little and bugs slip through; model too much and you have re-implemented the RTL, doubled the maintenance bill, and likely cloned the same misreading of the spec.
REFERENCE MODEL FIDELITY SPECTRUM
txn-level functional cycle-approximate cycle-accurate
──────────────────── ───────────────── ──────────────
"out = f(in)" "+ latency windows, "exact per-cycle
no timing at all throughput bounds" pin behavior"
cost: LOW MEDIUM VERY HIGH
checks: data integrity, + perf envelopes, + timing bugs
ordering, flags arb fairness
risk: misses timing bugs some timing escapes becomes 2nd RTL;
spec bugs cloned
◄── DEFAULT: start here ────────────────────────── only if spec demands ──►Transaction-level functional: pure input-to-output mapping. Right default for 90% of scoreboards.
Cycle-approximate: adds latency min/max windows or throughput checks on top of functional output — usually as separate assertions, not inside the model.
Cycle-accurate: justified only when the spec itself is cycle-exact (e.g., a fixed-latency DSP pipe) — and even then prefer an independently-written model (different author, different language).
A functional model class
The model is an ordinary class with the same interface shape as the scoreboard expects: transactions in, predicted transactions out. Here is one for a FIFO-with-ALU DUT: each input transaction carries an opcode and two operands, the DUT computes and queues the result.
typedef enum logic [1:0] { OP_ADD, OP_SUB, OP_AND, OP_XOR } op_e;
class alu_txn;
op_e op;
logic [31:0] a, b;
logic [31:0] result; // filled by DUT (actual) or model (expected)
logic overflow;
function string convert2string();
return $sformatf("op=%s a=0x%08h b=0x%08h result=0x%08h ovf=%0b",
op.name(), a, b, result, overflow);
endfunction
endclass
class alu_ref_model;
mailbox #(alu_txn) mbx_in; // from input monitor
mailbox #(alu_txn) mbx_exp; // to scoreboard (expected stream)
function new(mailbox #(alu_txn) mbx_in, mailbox #(alu_txn) mbx_exp);
this.mbx_in = mbx_in;
this.mbx_exp = mbx_exp;
endfunction
// Spec section 3.2: result and overflow semantics
function void predict(alu_txn t);
logic [32:0] wide;
case (t.op)
OP_ADD: begin
wide = {1'b0, t.a} + {1'b0, t.b};
t.result = wide[31:0];
t.overflow = wide[32];
end
OP_SUB: begin
wide = {1'b0, t.a} - {1'b0, t.b};
t.result = wide[31:0];
t.overflow = wide[32]; // borrow, per spec 3.2.1
end
OP_AND: begin t.result = t.a & t.b; t.overflow = 1'b0; end
OP_XOR: begin t.result = t.a ^ t.b; t.overflow = 1'b0; end
endcase
endfunction
task run();
forever begin
alu_txn t;
mbx_in.get(t);
predict(t); // fill expected fields in place
mbx_exp.put(t);
end
endtask
endclassWhat makes this model good
It consumes the input-monitor stream, so it predicts from what the DUT actually received.
predict() is a pure function of the transaction — no clocks, no pins, trivially unit-testable.
Spec section numbers appear as comments at each behavioral decision — auditable against the document.
State (none here; a FIFO model would hold a queue) lives in the class, reset by a reset() method when the env signals reset.
Keeping the model honest
The model encodes your reading of the spec, and specs change. Treat the model as a spec artifact, not a TB convenience: every model behavior should carry a spec reference, and every spec revision should trigger a model diff review. When DUT and model disagree, the triage question is always "which one matches the spec?" — about a third of scoreboard mismatches in practice turn out to be model bugs or spec ambiguities, and those spec ambiguities are valuable findings in their own right.
Tag each predict() branch with the governing spec section — disagreements become document lookups, not archaeology.
Keep the model independently authored where possible; if the RTL designer also writes the model, shared misreadings cancel out and bugs escape.
For algorithm-heavy DUTs (crypto, compression, DSP) the golden model often already exists in C/C++. Import it through DPI-C rather than re-implementing — covered in the DPI-C lessons later in this course.
Interview angle
Common probes: "How accurate should a reference model be?" (transaction-level by default; cycle accuracy only when the spec is cycle-exact, because the model must stay cheaper and more trustworthy than the RTL), and "What do you do when model and RTL disagree?" (check both against the spec — never auto-trust either side).
Key takeaways
Default to a transaction-level functional model; add timing checks as assertions, not model detail.
Feed the model from the input monitor so predictions reflect what the DUT really received.
Annotate model code with spec sections — mismatch triage becomes a spec lookup.
Reuse existing C/C++ golden models via DPI-C instead of re-implementing them in SV.
Common pitfalls
Cycle-accurate model by default — you maintain two RTLs and clone the same spec misreadings.
Predicting from generator intent instead of observed input — legal DUT behavior becomes false mismatches.
Model state never reset on mid-test reset — every post-reset compare fails in a cascade.
Assuming the model is right in every mismatch — roughly a third of the time the model or spec is the bug.