Part 3 · Constraint Randomization · Intermediate
Error Injection Architecture
Error knobs with dist-controlled rate, selective named-block disable, dedicated error subclasses, scoreboard handshake, and a CRC/length-error example at 5%.
Negative testing needs architecture too
Error injection inverts the usual contract: the generator must produce deliberately illegal stimulus (bad CRC, wrong length, reserved opcodes) at a controlled rate, while the rest of the transaction stays legal and the scoreboard knows which transactions are intentionally broken. Three architectural problems follow: how to violate legality selectively (not wholesale), how to control the rate, and how to tell the checkers. Ad hoc answers (disable valid_c and hope; hand-corrupt after randomize with no flag) produce the classic negative-testing disasters: stimulus that is illegal in unintended ways, and scoreboards that either fail on intended errors or excuse real bugs.
ERROR INJECTION DATA FLOW

                 err_pct knob (config)
                          |
                          v
          inj_err dist { 1 := err_pct, ... }     <- RATE control
                          |
          +---------------+---- inj_err == 0 ----+
          |                                      |
          v                                      v
  err_kind dist (CRC / LEN / ...)         fully legal txn
          |                                      |
          v                                      |
  TARGETED violation:                            |
  named legality sub-block for THAT              |
  field disabled or implication-gated;           |
  all OTHER legality still enforced              |
          |                                      |
          +-------------------+------------------+
                              v
          txn.inj_err / txn.err_kind   FLAGS TRAVEL WITH TXN
                              |
                 +------------+------------+
                 v                         v
          driver drives           scoreboard reads flags:
          corrupted fields        inj_err -> EXPECT reject/
                                  error response; clean txn
                                  mishandled -> real DUT bug

Mechanism 1: partitioned legality + gated violation
The named-block strategy from the layering lesson pays off here: split legality so each independently violable rule has its own block (or gate every rule with the error flags inside one block). Then error injection disables — or implication-bypasses — exactly the rule under attack, and nothing else.
typedef enum bit [1:0] { ERR_NONE, ERR_CRC, ERR_LEN } ekind_e;

class eth_txn;
  rand bit [13:0] length;      // claimed length field
  rand bit [13:0] actual_len;  // payload actually generated
  rand bit [31:0] crc;         // overwritten in post_randomize
  rand bit        inj_err;     // error flags travel WITH the txn
  rand ekind_e    err_kind;

  // knobs
  int unsigned err_pct = 0;    // percent, 0..100 (100 - err_pct wraps above 100)

  // ---- rate ----
  constraint rate_c {
    inj_err dist { 1 := err_pct, 0 := (100 - err_pct) };
    // The kind mix applies only to error txns. An unconditional dist that
    // also listed ERR_NONE would couple to inj_err and skew the rate.
    if (inj_err) err_kind dist { ERR_CRC := 1, ERR_LEN := 1 };
    else         err_kind == ERR_NONE;
    solve inj_err before err_kind;
  }

  // ---- legality, partitioned per violable rule ----
  constraint len_legal_c {
    (err_kind != ERR_LEN) -> (length == actual_len);  // gated rule
    actual_len inside {[64:1518]};                    // never violated
    solve err_kind before length, actual_len;
  }

  constraint len_err_c {
    (err_kind == ERR_LEN) -> (length != actual_len &&
                              length inside {[64:1518]});  // wrong but plausible
  }

  // CRC handled post-solve: correct CRC is computed, then corrupted
  function void post_randomize();
    crc = compute_crc(actual_len);                       // golden CRC
    if (err_kind == ERR_CRC) crc = crc ^ 32'h0000_0001;  // 1-bit corruption
  endfunction

  function bit [31:0] compute_crc(bit [13:0] n);
    return {18'h0, n} * 32'h9E37_79B9;  // stand-in for real CRC32
  endfunction
endclass

Note the division of labor. Field-relationship violations (length mismatch) are done in constraints, gated by err_kind: the solver produces a violation that is wrong in exactly one dimension and plausible in all others (length stays in the legal range, so the DUT exercises its mismatch check, not its range check). Computed-field violations (CRC) are done in post_randomize: you cannot reasonably constrain a CRC, so compute the golden value and corrupt it deterministically. The single-bit flip is deliberate: it is guaranteed to differ from the golden value and tests CRC checking specifically, while a random garbage CRC might accidentally collide with the valid value or trip unrelated decode logic.
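The compute_crc stand-in above can be swapped for a real CRC once the payload bytes exist. The following is a minimal sketch, assuming the payload is available as a queue of bytes; the package name crc_sketch_pkg and the functions crc32_ieee and flip_one_bit are hypothetical names introduced here, not part of the original class. It uses the bitwise reflected CRC-32 (polynomial 32'hEDB8_8320, initial value and final XOR of all ones) and ignores on-the-wire bit ordering of the Ethernet FCS.

```systemverilog
package crc_sketch_pkg;

  // Bitwise reflected CRC-32: init 32'hFFFF_FFFF, poly 32'hEDB8_8320,
  // final inversion. One byte is folded in per outer iteration.
  function automatic bit [31:0] crc32_ieee(byte unsigned payload[$]);
    bit [31:0] crc = 32'hFFFF_FFFF;
    foreach (payload[i]) begin
      crc ^= {24'h0, payload[i]};
      repeat (8)
        crc = crc[0] ? ((crc >> 1) ^ 32'hEDB8_8320) : (crc >> 1);
    end
    return ~crc;
  endfunction

  // Golden-then-flip: the result always differs from the golden value.
  function automatic bit [31:0] flip_one_bit(bit [31:0] golden);
    return golden ^ 32'h0000_0001;
  endfunction

endpackage

module crc_sketch_tb;
  import crc_sketch_pkg::*;
  initial begin
    byte unsigned msg[$];
    bit [31:0]    golden;
    // ASCII "123456789", the conventional CRC check string
    msg = {8'h31, 8'h32, 8'h33, 8'h34, 8'h35, 8'h36, 8'h37, 8'h38, 8'h39};
    golden = crc32_ieee(msg);
    if (golden !== 32'hCBF4_3926)
      $fatal(1, "CRC-32 check value mismatch: %h", golden);
    if (flip_one_bit(golden) === golden)
      $fatal(1, "corrupted CRC equals golden CRC");
    $display("golden=%h corrupted=%h", golden, flip_one_bit(golden));
  end
endmodule
```

In the transaction class, post_randomize would call crc32_ieee on the generated payload and apply flip_one_bit only when err_kind is ERR_CRC, which keeps the golden value available for the scoreboard.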
Mechanism 2: the dedicated error subclass
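// When error logic outgrows gating (many kinds, multi-field
// corruption, its own knobs), promote it to a subclass:
class eth_err_txn extends eth_txn;
  rand bit [3:0] corrupt_byte;  // extra error-only rand state
  // Same name as the base constraint, so this REPLACES the knob-fed
  // rate_c: every item is an error and err_pct is ignored. Merely
  // adding inj_err == 1 would contradict the inherited dist while
  // err_pct is 0 (a zero-weight value is typically never chosen).
  constraint rate_c  { inj_err == 1; }
  constraint kinds_c { err_kind dist { ERR_CRC := 7, ERR_LEN := 3 }; }
endclass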
// Sequence mixes populations explicitly:
// 90 clean eth_txn + 10 eth_err_txn per 100 (or via factory
// override in UVM to swap ALL items to the error type for a
// dedicated negative test).
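// A minimal sketch of the explicit mix for a plain (non-UVM) generator.
// next_txn and err_weight are hypothetical names, not from the original
// bench; err_weight is assumed to be in 0..100. randcase picks a branch
// with probability proportional to its weight, so the 90/10 split holds
// on average rather than exactly per 100 items.
function automatic eth_txn next_txn(int unsigned err_weight = 10);
  eth_txn     clean;
  eth_err_txn bad;
  randcase
    err_weight         : begin
                           bad = new();
                           // Required if the subclass inherits the knob-fed
                           // rate dist; harmless if it overrides rate_c.
                           bad.err_pct = 100;
                           next_txn = bad;
                         end
    (100 - err_weight) : begin
                           clean = new();  // err_pct stays 0: always legal
                           next_txn = clean;
                         end
  endcase
  // randomize() is virtual, so the subclass constraints apply to bad.
  if (!next_txn.randomize()) $fatal(1, "next_txn: randomize failed");
endfunction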
// Comparison vs gated single class:
//   gated:    one class, rate dial, error mixed into normal traffic
//             -> best for soak / background error rate
//   subclass: error logic isolated, reviewable, factory-swappable
//             -> best for dedicated negative tests and complex
//                corruption (multi-field, stateful)
// Most benches use BOTH: gated class for ambient errors, subclass
// for targeted negative campaigns.

Mechanism 3: the scoreboard handshake
Injected errors are only useful if checkers expect them. The handshake is simple but must be explicit: the error flags are fields of the transaction, the monitor-side transaction carries the observed outcome, and the scoreboard's compare function branches on the flags. Without it you get the two failure modes — scoreboard red on every intended error (test unusable) or a blanket “ignore errors” rule that also excuses real corruptions.
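The scoreboard fragment in this section reads got.dut_flagged_err, got.payload_hash and sent.payload_hash, none of which are declared in the earlier listings. A minimal sketch of what the monitor-side record might look like follows, assuming the monitor can observe a single error/drop indication and the payload bytes at the DUT output. Apart from those two field names, everything here (the constructor arguments and the payload_digest helper) is hypothetical, and the rotate-xor digest is only a placeholder for whatever payload comparison the bench really uses.

```systemverilog
// Hypothetical monitor-side record: what the DUT was OBSERVED to do.
class eth_mon_txn;
  bit        dut_flagged_err;  // DUT raised its error/drop indication
  bit [31:0] payload_hash;     // digest of the payload seen at the output

  function new(bit flagged = 0, bit [31:0] hash = '0);
    dut_flagged_err = flagged;
    payload_hash    = hash;
  endfunction
endclass

// Hypothetical digest helper (rotate-left-by-1 then XOR each byte; NOT a
// real hash). The stimulus side would store the same digest in a
// payload_hash field added to eth_txn so the two can be compared.
function automatic bit [31:0] payload_digest(byte unsigned payload[$]);
  bit [31:0] h = 32'h0;
  foreach (payload[i])
    h = {h[30:0], h[31]} ^ {24'h0, payload[i]};
  return h;
endfunction
```

Keeping the observed outcome in its own class means the scoreboard never has to infer intent from the output side: intent comes only from the sent transaction's flags, outcome only from the monitor record.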
// Scoreboard fragment - flags drive EXPECTATION, not exemption
function void check_pair(eth_txn sent, eth_mon_txn got);
  if (sent.inj_err) begin
    // EXPECT rejection: DUT must flag/drop this frame
    if (!got.dut_flagged_err)
      $error("DUT ACCEPTED injected %s error (len=%0d)",
             sent.err_kind.name(), sent.length);
    else
      err_caught_count++;  // negative-path coverage
  end
  else begin
    // Clean txn: any DUT error flag is a real bug
    if (got.dut_flagged_err)
      $error("DUT flagged error on clean frame");
    else if (got.payload_hash != sent.payload_hash)
      $error("payload corrupted through DUT");
  end
endfunction
// Plus a covergroup on (inj_err, err_kind, dut_flagged_err) so the
// negative path has closure criteria like any other feature.

Putting it together: 5% CRC/length errors
module demo;
  initial begin
    eth_txn t;
    int     n_err;
    t     = new();
    n_err = 0;
    t.err_pct = 5;  // the entire campaign: one knob
    repeat (10000) begin
      // Not assert(t.randomize()): disabling assertions would remove the call.
      if (!t.randomize()) $fatal(1, "randomize failed");
      n_err += t.inj_err;
      // local sanity: claimed vs actual mismatch ONLY when ERR_LEN
      if ((t.length != t.actual_len) != (t.err_kind == ERR_LEN))
        $fatal(1, "error gating broken");
    end
    $display("injected %0d/10000 (expect ~500)", n_err);
  end
endmodule

The histogram check matters here even more than usual: rate knobs feeding dist are exactly the construct that silently skews (see distribution debugging) if an implication or another constraint couples to inj_err. Measure ~500/10000 once at bring-up, and keep the (inj_err, err_kind, dut_flagged_err) cross in coverage so the rate stays honest across the project.
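The cross named above can be sketched as a small coverage class. This is a minimal sketch, assuming the scoreboard calls sample once per checked pair; the package name neg_cov_pkg, the class neg_path_cov, and the bin names are hypothetical, and the enum is redeclared inside the package only so the example stands alone.

```systemverilog
package neg_cov_pkg;

  typedef enum bit [1:0] { ERR_NONE, ERR_CRC, ERR_LEN } ekind_e;

  class neg_path_cov;
    // Sampled explicitly with the sent flags and the observed DUT outcome.
    covergroup cg with function sample(bit inj_err, ekind_e err_kind, bit flagged);
      cp_inj  : coverpoint inj_err;
      cp_kind : coverpoint err_kind;
      cp_flag : coverpoint flagged;
      x_all   : cross cp_inj, cp_kind, cp_flag {
        // Combinations the generator can never produce by construction.
        ignore_bins clean_with_kind  = binsof(cp_inj) intersect {0} &&
                                       !binsof(cp_kind) intersect {ERR_NONE};
        ignore_bins err_without_kind = binsof(cp_inj) intersect {1} &&
                                       binsof(cp_kind) intersect {ERR_NONE};
      }
    endgroup

    function new();
      cg = new();
    endfunction
  endclass

endpackage

module neg_cov_demo;
  import neg_cov_pkg::*;
  neg_path_cov cov;
  initial begin
    cov = new();
    cov.cg.sample(1'b1, ERR_CRC,  1'b1);  // injected CRC error, DUT caught it
    cov.cg.sample(1'b1, ERR_LEN,  1'b1);  // injected length error, DUT caught it
    cov.cg.sample(1'b0, ERR_NONE, 1'b0);  // clean frame, no DUT error
    $display("negative-path coverage: %0.1f%%", cov.cg.get_inst_coverage());
  end
endmodule
```

The closure goal is the set of cross bins with inj_err = 1, each error kind, and flagged = 1. The remaining bins (an injected error the DUT missed, or a clean frame the DUT flagged) correspond to scoreboard failures, so a bench may prefer to track or exclude them separately rather than count them toward closure.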
Interview angle
The interview form is “how do you inject protocol errors cleanly?” and the answer has four named parts: rate via a knob-fed dist on an inj_err flag; targeting via partitioned/gated legality so exactly one rule is violated while everything else stays legal (plausibly wrong beats randomly wrong); computed fields like CRC corrupted in post_randomize from the golden value; and the scoreboard handshake — flags travel in the transaction and drive expectation of rejection, never blanket exemption. Volunteer the two classic failures (scoreboard red on intended errors; ignore-rule excusing real bugs) and the subclass-vs-gated trade-off, and mention negative-path coverage closure — most candidates forget errors need coverage too.
Key takeaways
Rate, targeting, and checker-awareness are three separate mechanisms — design all three explicitly.
Violate one rule at a time via gated/partitioned legality — plausibly-wrong stimulus tests the intended check.
Computed fields (CRC) corrupt in post_randomize from the golden value; single-bit flips target the checker.
Error flags are transaction fields driving scoreboard expectations — expect rejection, never exempt.
Gated class for ambient error rate; dedicated subclass (factory-swappable) for negative campaigns.
Common pitfalls
Disabling all of valid_c to inject one error — stimulus illegal in unintended dimensions, DUT response meaningless.
Corrupting after randomize without setting a flag — scoreboard cannot distinguish intent from bug.
Blanket 'ignore errored transactions' in the scoreboard — real corruption rides out under the exemption.
Random-garbage CRC instead of golden-then-flip — may collide to valid or trip unrelated decode paths.
No coverage on the negative path — error injection runs for months without ever proving the DUT catches each kind.