Part 3 · Constraint Randomization · Intermediate

Error Injection Architecture

Error knobs with dist-controlled rate, selective named-block disable, dedicated error subclasses, scoreboard handshake, and a CRC/length-error example at 5%.

Negative testing needs architecture too

Error injection inverts the usual contract: the generator must produce deliberately illegal stimulus (bad CRC, wrong length, reserved opcodes) at a controlled rate, while the rest of the transaction stays legal and the scoreboard knows which transactions are intentionally broken. Three architectural problems follow: how to violate legality selectively (not wholesale), how to control the rate, and how to tell the checkers. Ad-hoc answers (disable valid_c and hope; hand-corrupt after randomize with no flag) produce the classic negative-testing disasters: stimulus that is illegal in unintended ways, and scoreboards that either fail on intended errors or excuse real bugs.

diagram

ERROR INJECTION DATA FLOW

  err_pct knob (config)
       |
       v
  inj_err dist { 1 := err_pct, ... }     <- RATE control
       |
       +-- inj_err == 0 -------------------------+
       |                                         |
       v                                         v
  err_kind dist (CRC / LEN / ...)          fully legal txn
       |                                         |
       v                                         |
  TARGETED violation:                            |
    named legality sub-block for THAT            |
    field disabled or implication-gated;         |
    all OTHER legality still enforced            |
       |                                         |
       +-------------------+---------------------+
                           v
              txn.inj_err / txn.err_kind  FLAGS TRAVEL WITH TXN
                           |
              +------------+------------+
              v                         v
          driver drives           scoreboard reads flags:
          corrupted fields          inj_err -> EXPECT reject/
                                    error response; clean txn
                                    mishandled -> real DUT bug

Mechanism 1: partitioned legality + gated violation

The named-block strategy from the layering lesson pays off here: split legality so each independently violable rule has its own block (or gate every rule with the error flags inside one block). Then error injection disables — or implication-bypasses — exactly the rule under attack, and nothing else.
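Here is a minimal sketch of the disable route. It uses a cut-down, hypothetical frame_t class holding only the two length rules, not the full eth_txn. The built-in constraint_mode(0) switches off exactly the rule under attack, and an inline randomize() with supplies the violation. The range block stays active, so the wrong length is still plausible.

```systemverilog
// Hypothetical minimal frame: each violable rule lives in its own named block.
class frame_t;
  rand bit [13:0] length;       // claimed length
  rand bit [13:0] actual_len;   // generated payload length

  constraint range_c     { actual_len inside {[64:1518]};
                           length     inside {[64:1518]}; }
  constraint len_match_c { length == actual_len; }   // the rule under attack
endclass

module named_block_disable_demo;
  initial begin
    frame_t f;
    f = new();

    // Clean stimulus: every block active.
    if (!f.randomize()) $fatal(1, "clean randomize failed");
    if (f.length != f.actual_len) $fatal(1, "clean frame mismatched");

    // Targeted violation: switch off ONLY len_match_c and force the mismatch
    // inline; range_c still keeps both lengths in the legal range.
    f.len_match_c.constraint_mode(0);
    if (!f.randomize() with { length != actual_len; })
      $fatal(1, "error randomize failed");
    if (f.length == f.actual_len || !(f.length inside {[64:1518]}))
      $fatal(1, "violation not targeted");

    // Restore, or every later randomize() on this handle stays broken.
    f.len_match_c.constraint_mode(1);
    $display("length=%0d actual_len=%0d", f.length, f.actual_len);
  end
endmodule
```

The restore step matters. constraint_mode is sticky per object, so a forgotten re-enable silently turns all later traffic from that handle into errors. This is one reason the gated form below keeps the flag inside the constraint instead.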

systemverilog

typedef enum bit [1:0] { ERR_NONE, ERR_CRC, ERR_LEN } ekind_e;

class eth_txn;
  rand bit [13:0] length;        // claimed length field
  rand bit [13:0] actual_len;    // payload actually generated
  bit      [31:0] crc;           // computed in post_randomize, never solved
  rand bit        inj_err;       // error flags travel WITH the txn
  rand ekind_e    err_kind;

  // knobs
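  int unsigned err_pct = 0;      // percent, 0..100 (100 - err_pct must not wrap)

  // ---- rate ----
  constraint rate_c {
    inj_err dist { 1 := err_pct, 0 := (100 - err_pct) };
    (inj_err == 0) -> err_kind == ERR_NONE;
    // Kind is drawn only among real errors. A second dist that also weighted
    // ERR_NONE would compete with the inj_err dist and skew the rate.
    (inj_err == 1) -> err_kind dist { ERR_CRC := 1, ERR_LEN := 1 };
    solve inj_err before err_kind;   // rate first, kind second
  }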

  // ---- legality, partitioned per violable rule ----
  constraint len_legal_c {
    (err_kind != ERR_LEN) -> (length == actual_len);   // gated rule
    actual_len inside {[64:1518]};                     // never violated
  }
  constraint len_err_c {
    (err_kind == ERR_LEN) -> (length != actual_len &&
                              length inside {[64:1518]}); // wrong but plausible
  }
  // CRC handled post-solve: correct CRC is computed, then corrupted
  function void post_randomize();
    crc = compute_crc(actual_len);          // golden CRC
    if (err_kind == ERR_CRC) crc = crc ^ 32'h0000_0001;  // 1-bit corruption
  endfunction

  function bit [31:0] compute_crc(bit [13:0] n);
    return {18'h0, n} * 32'h9E37_79B9;      // stand-in for real CRC32
  endfunction
endclass

Note the division of labor. Field-relationship violations (length mismatch) are done in constraints, gated by err_kind — the solver produces a violation that is wrong in exactly one dimension and plausible in all others (length still in the legal range, so the DUT exercises its mismatch check, not its range check). Computed-field violations (CRC) are done in post_randomize — you cannot reasonably constrain a CRC, so compute the golden value and corrupt it deterministically. The single-bit flip is deliberate: it tests CRC checking specifically, while a random garbage CRC might accidentally collide or trip unrelated decode logic.
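The following sketch swaps the stand-in compute_crc for a real CRC so the golden-then-flip idea can be checked concretely. It assumes the common reflected CRC-32 (polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF) computed bit-serially over a byte array. The helper name crc32_bytes and its byte-array interface are illustrative, not part of the lesson's classes. The receiver recomputes the CRC from the untouched payload, so golden ^ 1 can never match it. A random 32-bit value, by contrast, matches by chance about once in 2^32 draws.

```systemverilog
// Reflected CRC-32 (poly 0xEDB88320, init/xorout 0xFFFFFFFF), bit-serial.
// crc32_bytes is a hypothetical helper, not part of eth_txn.
function automatic bit [31:0] crc32_bytes(byte unsigned data[]);
  bit [31:0] c = 32'hFFFF_FFFF;
  foreach (data[i]) begin
    c ^= data[i];                                   // fold in next byte, LSB first
    repeat (8) c = c[0] ? (c >> 1) ^ 32'hEDB8_8320 : (c >> 1);
  end
  return ~c;
endfunction

module crc_flip_demo;
  initial begin
    string        s;
    byte unsigned msg[];
    bit [31:0]    golden, sent;

    s   = "123456789";
    msg = new[s.len()];
    foreach (msg[i]) msg[i] = s[i];

    golden = crc32_bytes(msg);                      // standard check value: CBF43926
    sent   = golden ^ 32'h0000_0001;                // golden-then-flip corruption

    // The checker recomputes from the unchanged payload: a 1-bit flip of the
    // transmitted CRC cannot match.
    if (crc32_bytes(msg) == sent) $fatal(1, "flip went undetected");
    $display("golden=%h sent=%h", golden, sent);
  end
endmodule
```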

Mechanism 2: the dedicated error subclass

systemverilog

// When error logic outgrows gating - many kinds, multi-field
// corruption, its own knobs - promote it to a subclass:
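class eth_err_txn extends eth_txn;
  rand bit [3:0] corrupt_byte;       // extra error-only rand state

  // Same name as the base block, so this REPLACES eth_txn::rate_c. Otherwise
  // the default err_pct = 0 gives inj_err == 1 a zero dist weight (treated as
  // a hard constraint) and always_err_c could never be satisfied; it also
  // removes the base err_kind dist that would compete with kinds_c.
  constraint rate_c       { }
  constraint always_err_c { inj_err == 1; }
  constraint kinds_c      { err_kind dist { ERR_CRC := 7, ERR_LEN := 3 }; }
endclass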

// Sequence mixes populations explicitly:
//   90 clean eth_txn  +  10 eth_err_txn  per 100 (or via factory
//   override in UVM to swap ALL items to the error type for a
//   dedicated negative test).
// Comparison vs gated single class:
//   gated:    one class, rate dial, error mixed into normal traffic
//             -> best for soak / background error rate
//   subclass: error logic isolated, reviewable, factory-swappable
//             -> best for dedicated negative tests and complex
//                corruption (multi-field, stateful)
// Most benches use BOTH: gated class for ambient errors, subclass
// for targeted negative campaigns.
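The "90 clean + 10 error per 100" mix can be driven without UVM by randomizing through a base-class handle, because randomize() is virtual and applies the constraints of the actual object. The sketch uses hypothetical trimmed stand-ins (txn_t, err_txn_t) so it compiles on its own. Its subclass replaces the base rate block by name. With the knob at its default of 0, the zero weight on inj_err == 1 acts as a hard exclusion, so keeping the base block would make every error item unsolvable.

```systemverilog
typedef enum bit [1:0] { ERR_NONE, ERR_CRC, ERR_LEN } ekind_e;

// Hypothetical trimmed stand-ins for eth_txn / eth_err_txn.
class txn_t;
  rand bit     inj_err;
  rand ekind_e err_kind;
  int unsigned err_pct = 0;                  // ambient rate knob, percent
  constraint rate_c {
    inj_err dist { 1 := err_pct, 0 := (100 - err_pct) };
    (inj_err == 0) -> err_kind == ERR_NONE;
    (inj_err == 1) -> err_kind != ERR_NONE;
  }
endclass

class err_txn_t extends txn_t;
  constraint rate_c { }                      // replaces the knob-driven block
  constraint err_c  { inj_err == 1;
                      err_kind dist { ERR_CRC := 7, ERR_LEN := 3 }; }
endclass

module mix_demo;
  initial begin
    txn_t     item;                          // base handle for both populations
    err_txn_t e;
    int       n_err;
    n_err = 0;
    for (int i = 0; i < 100; i++) begin
      if (i % 10 == 9) begin                 // every 10th slot: error population
        e    = new();
        item = e;
      end
      else
        item = new();                        // clean population (err_pct = 0)
      if (!item.randomize()) $fatal(1, "randomize failed");
      n_err += item.inj_err;
    end
    $display("error items: %0d/100", n_err); // 10 by construction
  end
endmodule
```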

Mechanism 3: the scoreboard handshake

Injected errors are only useful if checkers expect them. The handshake is simple but must be explicit: the error flags are fields of the transaction, the monitor-side transaction carries the observed outcome, and the scoreboard's compare function branches on the flags. Without it you get the two failure modes — scoreboard red on every intended error (test unusable) or a blanket “ignore errors” rule that also excuses real corruptions.

systemverilog

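// Scoreboard fragment - flags drive EXPECTATION, not exemption.
// Assumed scoreboard-side names: eth_mon_txn (monitor output carrying
// dut_flagged_err and payload_hash), a matching payload_hash on the sent
// side, and an err_caught_count counter member.
function void check_pair(eth_txn sent, eth_mon_txn got);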
  if (sent.inj_err) begin
    // EXPECT rejection: DUT must flag/drop this frame
    if (!got.dut_flagged_err)
      $error("DUT ACCEPTED injected %s error (len=%0d)",
             sent.err_kind.name(), sent.length);
    else
      err_caught_count++;                  // negative-path coverage
  end
  else begin
    // Clean txn: any DUT error flag is a real bug
    if (got.dut_flagged_err)
      $error("DUT flagged error on clean frame");
    else if (got.payload_hash != sent.payload_hash)
      $error("payload corrupted through DUT");
  end
endfunction
// Plus a covergroup on (inj_err, err_kind, dut_flagged_err) so the
// negative path has closure criteria like any other feature.
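The covergroup named in that closing comment could look like the sketch below. It assumes a covergroup sample() override with arguments (IEEE 1800-2009 and later). The class err_cov and its bin names are hypothetical. It also assumes the DUT-bug outcomes (error missed, clean frame flagged) are ignored in coverage because the scoreboard already reports them as errors. After the impossible and bug combinations are removed, exactly three cross bins remain as closure goals: clean and unflagged, CRC caught, and LEN caught.

```systemverilog
typedef enum bit [1:0] { ERR_NONE, ERR_CRC, ERR_LEN } ekind_e;

// Hypothetical negative-path coverage, sampled once per scoreboard compare.
class err_cov;
  covergroup cg with function sample(bit inj_err, ekind_e kind, bit flagged);
    option.per_instance = 1;
    cp_inj  : coverpoint inj_err;
    cp_kind : coverpoint kind;
    cp_flag : coverpoint flagged;
    x_outcome : cross cp_inj, cp_kind, cp_flag {
      // combinations the generator cannot produce
      ignore_bins clean_with_kind = binsof(cp_inj)  intersect {0} &&
                                    binsof(cp_kind) intersect {ERR_CRC, ERR_LEN};
      ignore_bins err_no_kind     = binsof(cp_inj)  intersect {1} &&
                                    binsof(cp_kind) intersect {ERR_NONE};
      // DUT-bug outcomes: reported by check_pair, not coverage goals
      ignore_bins missed_err      = binsof(cp_inj)  intersect {1} &&
                                    binsof(cp_flag) intersect {0};
      ignore_bins false_flag      = binsof(cp_inj)  intersect {0} &&
                                    binsof(cp_flag) intersect {1};
    }
  endgroup

  function new();
    cg = new();
  endfunction
endclass
```

In check_pair, one call per compared pair, such as cov.cg.sample(sent.inj_err, sent.err_kind, got.dut_flagged_err), would feed it.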

Putting it together: 5% CRC/length errors

systemverilog

module demo;
  initial begin
    eth_txn t = new();
    int n_err = 0;
    t.err_pct = 5;                          // the entire campaign: one knob
    repeat (10000) begin
      // not assert(t.randomize()): with assertions disabled the call never runs
      if (!t.randomize()) $fatal(1, "randomize failed");
      n_err += t.inj_err;
      // local sanity: claimed vs actual mismatch ONLY when ERR_LEN
      if ((t.length != t.actual_len) != (t.err_kind == ERR_LEN))
        $fatal(1, "error gating broken");
    end
    $display("injected %0d/10000 (expect ~500)", n_err);
  end
endmodule

The histogram check matters here even more than usual: rate knobs feeding dist are exactly the construct that silently skews (see distribution debugging) if an implication or another constraint couples to inj_err. Measure ~500/10000 once at bring-up, and keep the (inj_err, err_kind, dut_flagged_err) cross in coverage so the rate stays honest across the project.
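The bring-up measurement can be made self-checking instead of eyeballed. The sketch assumes a hypothetical trimmed class rate_txn that keeps only the rate machinery, with the kind drawn only among real errors and inj_err solved before err_kind. It also assumes a simulator whose solver honors dist weights. It histograms err_kind over 10000 draws and fails when the injected count leaves 400..600. For n = 10000 and p = 0.05, the binomial standard deviation is sqrt(475), about 21.8, so that band is roughly 4.6 sigma on each side. A correct setup should essentially never trip it, while a coupled constraint that halves or doubles the rate will.

```systemverilog
typedef enum bit [1:0] { ERR_NONE, ERR_CRC, ERR_LEN } ekind_e;

// Hypothetical trimmed class: only the rate machinery from eth_txn.
class rate_txn;
  rand bit     inj_err;
  rand ekind_e err_kind;
  int unsigned err_pct = 5;
  constraint rate_c {
    inj_err dist { 1 := err_pct, 0 := (100 - err_pct) };
    (inj_err == 0) -> err_kind == ERR_NONE;
    (inj_err == 1) -> err_kind dist { ERR_CRC := 1, ERR_LEN := 1 };
    solve inj_err before err_kind;
  }
endclass

module rate_histogram;
  initial begin
    rate_txn     t;
    int unsigned n_none, n_crc, n_len;
    t = new();
    n_none = 0; n_crc = 0; n_len = 0;
    repeat (10000) begin
      if (!t.randomize()) $fatal(1, "randomize failed");
      case (t.err_kind)                      // per-kind histogram
        ERR_NONE: n_none++;
        ERR_CRC:  n_crc++;
        ERR_LEN:  n_len++;
      endcase
    end
    $display("none=%0d crc=%0d len=%0d", n_none, n_crc, n_len);
    // expect ~500 injected; +/-100 is about 4.6 sigma for n=10000, p=0.05
    if (n_crc + n_len < 400 || n_crc + n_len > 600)
      $fatal(1, "injected %0d/10000: rate is skewed", n_crc + n_len);
    if (n_crc == 0 || n_len == 0)
      $fatal(1, "an error kind was never generated");
  end
endmodule
```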

Interview angle

The interview form is “how do you inject protocol errors cleanly?” and the answer has four named parts: rate via a knob-fed dist on an inj_err flag; targeting via partitioned/gated legality so exactly one rule is violated while everything else stays legal (plausibly wrong beats randomly wrong); computed fields like CRC corrupted in post_randomize from the golden value; and the scoreboard handshake — flags travel in the transaction and drive expectation of rejection, never blanket exemption. Volunteer the two classic failures (scoreboard red on intended errors; ignore-rule excusing real bugs) and the subclass-vs-gated trade-off, and mention negative-path coverage closure — most candidates forget errors need coverage too.

Key takeaways

Rate, targeting, and checker-awareness are three separate mechanisms — design all three explicitly.
Violate one rule at a time via gated/partitioned legality — plausibly-wrong stimulus tests the intended check.
Computed fields (CRC) corrupt in post_randomize from the golden value; single-bit flips target the checker.
Error flags are transaction fields driving scoreboard expectations — expect rejection, never exempt.
Gated class for ambient error rate; dedicated subclass (factory-swappable) for negative campaigns.

Common pitfalls

Disabling all of valid_c to inject one error — stimulus illegal in unintended dimensions, DUT response meaningless.
Corrupting after randomize without setting a flag — scoreboard cannot distinguish intent from bug.
Blanket 'ignore errored transactions' in the scoreboard — real corruption rides out under the exemption.
Random-garbage CRC instead of golden-then-flip — may collide to valid or trip unrelated decode paths.
No coverage on the negative path — error injection runs for months without ever proving the DUT catches each kind.
