Part 8 · Senior & Interview Prep · Intermediate
Q&A: Scenario & Design Questions
Verify a FIFO end to end, TB for an arbiter, verifying a CDC path, debugging a hung test, and the tapeout-eve bug report.
Q: How would you verify a FIFO end to end? (the full senior answer)
Structure the answer as plan → environment → checks → coverage → closure. Plan: extract features (ordering, flags at exact boundaries, backpressure, reset mid-traffic), risk-rank (full/empty corners and reset highest), log spec ambiguities (simultaneous push+pop at full?). Environment: interface with clocking blocks; generator → mailbox → driver; independent input/output monitors; queue-based reference model in a scoreboard. Checks: scoreboard compares in-order data; bound assertion module for no-push-at-full, no-pop-at-empty, count consistency, flag definitions. Coverage: levels including exact boundaries, flag transitions, operation×state crosses, backpressure scenarios. Closure: seed sweep, merged coverage, hole analysis, bug-rate convergence, sign-off review.
gen ─mb─► drv ─┐ ┌─► in_mon ─────────────────────────► ┌────────────┐
             fifo_if ──► DUT ──► fifo_if ─► out_mon ─► │ scoreboard │
  assertions bound to DUT ▲                            │ (queue mdl)│
  (full/empty/count/data) │                            └────────────┘
  coverage: boundaries, crosses, backpressure
Follow-up: "Which bug class does each checker catch that the others miss?" The scoreboard catches data corruption and ordering errors. Assertions catch cycle-accurate protocol violations (such as a push accepted at full) that the scoreboard only sees later as corruption. Coverage catches nothing; it proves what was exercised. You need all three, or you have holes.
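The bound assertion module from the answer can be sketched as follows. This is a minimal sketch, not a reference implementation: the module name `fifo_sva`, the port names, and `DEPTH` are assumptions, `push` and `pop` are taken to mean accepted operations, and a push at full is treated as illegal even with a simultaneous pop (exactly the kind of spec ambiguity the plan step should log and resolve).

```systemverilog
// Hypothetical checker module: names, widths, and DEPTH are assumptions.
module fifo_sva #(parameter int DEPTH = 16) (
  input logic                   clk,
  input logic                   rst_n,
  input logic                   push,   // accepted push
  input logic                   pop,    // accepted pop
  input logic                   full,
  input logic                   empty,
  input logic [$clog2(DEPTH):0] count   // occupancy, 0..DEPTH
);
  // Protocol: no push accepted at full, no pop accepted at empty.
  a_no_push_full: assert property (@(posedge clk) disable iff (!rst_n)
    full |-> !push);
  a_no_pop_empty: assert property (@(posedge clk) disable iff (!rst_n)
    empty |-> !pop);

  // Flag definitions, tied to the occupancy count at the exact boundaries.
  a_full_def: assert property (@(posedge clk) disable iff (!rst_n)
    full == (count == DEPTH));
  a_empty_def: assert property (@(posedge clk) disable iff (!rst_n)
    empty == (count == 0));

  // Count consistency: next count = count + push - pop.
  a_count: assert property (@(posedge clk) disable iff (!rst_n)
    1'b1 |=> count == $past(count) + $past(push) - $past(pop));
endmodule

// Attached without editing the RTL; the target module name "fifo" and the
// matching signal names inside it are assumed:
//   bind fifo fifo_sva #(.DEPTH(16)) u_sva (.*);
```

Keeping these checks in a separate bound module means they fire in every test that instantiates the DUT, not only in the FIFO block-level bench.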
Junior vs senior: a junior describes driving pushes and pops. A senior gives the plan-first lifecycle, the three-checker separation of concerns, and names the specific corner inventory (boundaries, simultaneous ops, reset mid-traffic).
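The queue-based reference model is small enough to show in full. The sketch below is one possible shape, with hypothetical class and method names; it assumes the input monitor calls `write_in` on each accepted push and the output monitor calls `write_out` on each accepted pop.

```systemverilog
// Minimal in-order scoreboard sketch; all names here are hypothetical.
class fifo_scoreboard #(type T = bit [7:0]);
  T            expected[$];   // reference model: a SystemVerilog queue
  int unsigned n_match;
  int unsigned n_error;

  // Input monitor: one call per accepted push.
  function void write_in(T data);
    expected.push_back(data);
  endfunction

  // Output monitor: one call per accepted pop; compares in order.
  function void write_out(T data);
    T want;
    if (expected.size() == 0) begin
      n_error++;
      $error("DUT popped %0h but the reference model is empty", data);
      return;
    end
    want = expected.pop_front();
    if (data !== want) begin
      n_error++;
      $error("mismatch: got %0h, expected %0h", data, want);
    end
    else n_match++;
  endfunction

  // Reset mid-traffic: the model must drop its contents with the DUT.
  function void flush();
    expected.delete();
  endfunction

  // End of test: anything left over was pushed but never seen at the output.
  function void check_drained();
    if (expected.size() != 0) begin
      n_error++;
      $error("%0d item(s) never popped", expected.size());
    end
  endfunction
endclass
```

The `flush` and `check_drained` hooks are where the reset-mid-traffic and lost-data corners become checkable; without them the scoreboard only proves that whatever came out was in order.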
Q: Design a testbench for a 4-master arbiter.
The structural answer is four active agents, one passive checker: one driver/monitor pair per master port (each with its own vif), a grant-side monitor, and a scoreboard checking three property families: exclusivity (at most one grant at a time), legality (grant only to a requester), and fairness/starvation (every continuous requester granted within N cycles, per the arbitration policy). Stimulus must create contention deliberately: all-request bursts, staggered arrivals, one master hogging, randomized priorities if programmable.
// the three property families, as assertions:
a_excl:  assert property (@(posedge clk) disable iff (!rst_n)
           $onehot0(gnt));                            // at most one grant
a_legal: assert property (@(posedge clk) disable iff (!rst_n)
           (gnt & ~req) == '0);                       // every grant has its req
a_starv: assert property (@(posedge clk) disable iff (!rst_n)
           (req[0] && !gnt[0]) |-> ##[1:64] gnt[0]);  // bound per policy; repeat per master
// + scoreboard models the documented policy (round-robin pointer, etc.)
// + coverage: contention patterns (1,2,3,4 simultaneous requesters),
//   back-to-back grants, pointer wraps
Follow-up: "How do you verify round-robin specifically?" Model the pointer in the scoreboard and predict the exact winner each cycle; check that the grant matches the prediction. Pure properties can check starvation bounds, but exact rotation needs a reference model.
Junior vs senior: a junior builds four drivers. A senior leads with the three property families, knows fairness needs a policy model not just assertions, and designs stimulus for contention rather than hoping randomness finds it.
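A policy reference model for round-robin might look like the sketch below. It is an illustration under stated assumptions, not the behavior of any particular arbiter: the class name `rr_model` is hypothetical, the pointer is assumed to move to one past the last winner, and the grant is assumed to be decided from the request vector sampled in the same cycle. A real design's documented policy (registered grants, parked pointer, and so on) would change the details.

```systemverilog
// Hypothetical round-robin predictor for an N-master arbiter.
class rr_model #(int N = 4);
  int unsigned ptr = 0;   // index that has highest priority this cycle

  // Returns the expected one-hot grant for this request vector and
  // advances the pointer past the winner. Returns 0 when nobody requests.
  function bit [N-1:0] predict(bit [N-1:0] req);
    bit [N-1:0] gnt = '0;
    for (int i = 0; i < N; i++) begin
      int idx = (ptr + i) % N;      // search starts at the pointer and wraps
      if (req[idx]) begin
        gnt[idx] = 1'b1;
        ptr = (idx + 1) % N;        // winner drops to lowest priority
        break;
      end
    end
    return gnt;
  endfunction
endclass
```

The scoreboard would call `predict` once per sampled cycle and compare the result with the observed `gnt`, for example `if (gnt !== model.predict(req)) $error(...)`. With the pointer starting at 0 and all four masters requesting continuously, the predicted winners rotate 0, 1, 2, 3 and wrap, which is also the pointer-wrap case the coverage plan calls for.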
Q: How would you verify a CDC path?
Three layers. Structural: CDC lint (Spyglass-class tools) proving every crossing has a legal synchronizer structure — this is the layer simulation cannot replace. Protocol: assertions on the crossing contract — single-bit: source value stable long enough to be captured; handshake: req held until ack, no new req before ack falls; gray-coded pointers: exactly one bit changes per transition. Dynamic: simulate with truly asynchronous clock ratios (including near-same and extreme), and run metastability injection (randomized synchronizer delay models) so the design's tolerance is actually exercised — plus formal on the handshake FSMs where feasible.
// gray-code contract on an async FIFO pointer crossing:
a_gray: assert property (@(posedge wr_clk) disable iff (!rst_n)
          $countones(wr_ptr_gray ^ $past(wr_ptr_gray)) <= 1);
// handshake contract:
a_hold: assert property (@(posedge src_clk) disable iff (!rst_n)
          (req && !ack_sync) |=> req);  // req held until ack seen
Follow-up: "Why can't normal RTL simulation find CDC bugs?" RTL sim has no metastability: a violated setup window still resolves to a clean value, in zero time, deterministically. The bug class physically does not exist in plain RTL simulation; hence lint for structure, injection for resilience.
Junior vs senior: a junior says "add two-flop synchronizers and test it." A senior gives the three layers and explains why simulation alone is structurally blind here — the single most senior-flavored sentence in this bank.
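Metastability injection can be approximated with a simulation-only synchronizer model that randomly delays a changing input by one destination cycle. The sketch below is a simplification under stated assumptions: the module and port names are hypothetical, and any change between two destination clock edges is treated as a possible late capture, whereas dedicated CDC tools typically restrict injection to a window around the clock edge.

```systemverilog
// Hypothetical simulation-only 2-flop synchronizer with random late capture.
module sync2_inject #(parameter bit INJECT = 1'b1) (
  input  logic dst_clk,
  input  logic rst_n,
  input  logic d_async,   // signal arriving from the source clock domain
  output logic q_sync
);
  logic d_prev;   // input as seen at the previous destination edge
  logic ff1, ff2; // the two synchronizer stages

  always_ff @(posedge dst_clk or negedge rst_n) begin
    if (!rst_n) begin
      d_prev <= 1'b0;
      ff1    <= 1'b0;
      ff2    <= 1'b0;
    end else begin
      d_prev <= d_async;
      // If the input changed since the last edge, the first stage may
      // resolve to either the old or the new value: model it as a coin flip.
      if (INJECT && (d_async != d_prev) && $urandom_range(0, 1))
        ff1 <= d_prev;    // late capture: new value lands one cycle later
      else
        ff1 <= d_async;   // clean capture
      ff2 <= ff1;
    end
  end

  assign q_sync = ff2;
endmodule
```

With injection enabled, each bit of a multi-bit crossing can arrive a cycle apart from its neighbours, so a design that wrongly synchronizes a binary (non-gray) pointer bit by bit will eventually hand the destination an incoherent value, which plain RTL simulation would never show.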
Q: A test hangs and the simulation never finishes — walk me through the debug.
First locate the hang, then classify it. Is sim time advancing? If wall clock burns but sim time is frozen, it is a zero-delay loop (always_comb feedback, or a loop whose wait(expr) is already true and therefore never yields). If sim time advances forever, something blocks eternally: get() on an empty mailbox, @(ev) on a never-fired event (the missed-trigger race), a fork join waiting on a stuck branch, a semaphore key leaked by a killed process, or a DUT handshake that never completes. Then find who: pause in the interactive debugger and inspect process states, or bisect with timestamped prints at each component boundary; the last component that printed is upstream of the block.
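The missed-trigger race in that list is easy to reproduce in isolation. The sketch below is illustrative only (module and flag names are made up): both waiters are deferred with `#0` so they arrive after the trigger within the same time step.

```systemverilog
// Illustrative only: module and variable names are hypothetical.
module missed_trigger_demo;
  event ev;
  bit   saw_edge;
  bit   saw_level;

  initial begin
    fork
      // Producer: fires the event at time 0.
      -> ev;
      // Waiter A arrives after the trigger (the #0 defers it within the
      // same time step), so @(ev) waits for a trigger that never recurs.
      begin #0; @(ev); saw_edge = 1; end
      // Waiter B arrives equally late, but ev.triggered stays true for
      // the rest of the time step, so the wait completes.
      begin #0; wait (ev.triggered); saw_level = 1; end
    join_none
    #10;
    $display("saw_edge=%0b saw_level=%0b", saw_edge, saw_level);
    $finish;
  end
endmodule
```

Waiter A is the hang: in a real bench it is a component blocked forever while sim time keeps advancing. The display should report saw_edge=0 and saw_level=1, since `wait (ev.triggered)` tolerates the ordering that `@(ev)` does not.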
HANG TRIAGE TREE
sim time frozen?
├─ YES → zero-delay loop: comb feedback / wait() that never yields
│        → simulator profile or pause shows the spinning process
└─ NO  → blocking wait:
         ├─ mailbox.get(): producer dead? (gen finished early?)
         ├─ @(event): same-timestep missed trigger?
         ├─ join: which branch is stuck? (print per branch)
         ├─ semaphore: key leaked by disable fork?
         └─ DUT handshake: req with no ack (RTL bug or bad stimulus)
always-on defense: global watchdog
  initial begin #1ms; $fatal(1, "global timeout"); end
  + objection/activity tracking to report WHO was still busy
Follow-up: "How do you make hangs debuggable before they happen?" A global watchdog with a fatal and a status dump (queue sizes, outstanding transaction counts, per-component heartbeats), so a hang becomes a report naming the stuck component instead of a 4-hour silent regression slot.
Junior vs senior: a junior adds prints at random. A senior splits sim-time-frozen vs advancing first, walks the blocking-primitive checklist, and has the watchdog-plus-status-dump pattern already in the bench.
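The watchdog-plus-status-dump pattern could be sketched as below. The module name and the counters are assumptions: in a real bench the generator, driver, and scoreboard would update them (or the watchdog would query those components directly), and the timeout would be sized to the longest legitimate test.

```systemverilog
// Hypothetical bench-level watchdog; counter names are assumptions.
module tb_watchdog;
  timeunit 1ns; timeprecision 1ps;

  parameter time TIMEOUT = 1ms;   // global budget

  int unsigned gen_sent;          // items the generator produced
  int unsigned drv_sent;          // items the driver put on the bus
  int unsigned sb_checked;        // items the scoreboard compared
  bit          test_done;         // set by the test on clean completion

  initial begin
    fork
      begin
        #(TIMEOUT);
        // Status dump: the first counter that stopped moving points at
        // the stuck component.
        $display("[%0t] WATCHDOG gen=%0d drv=%0d sb=%0d",
                 $time, gen_sent, drv_sent, sb_checked);
        $fatal(1, "global timeout");
      end
      wait (test_done);
    join_any
    disable fork;                 // clean finish: cancel the timer
  end
endmodule
```

Reading the dump is the bisection done in advance: if `gen_sent` is ahead of `drv_sent`, the driver or the DUT handshake stalled; if `drv_sent` is ahead of `sb_checked`, the output side never delivered.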
Q: One day to tapeout, a bug report lands — what do you check?
Triage for decision-grade facts in hours, not a fix: (1) Reproduce it with the exact seed, test, and RTL version; does it reproduce on the tapeout candidate? (2) Is it a real DUT bug or a testbench/checker artifact? (3) Severity and reach: which feature, what traffic pattern triggers it, how likely in the field, is there a software workaround or a register/strap mitigation? (4) Scope the RTL delta if fixed: one line in one block, or a re-verification cascade? (5) Hand management a decision package: fix-and-slip vs ship-with-erratum vs respin-risk, each with evidence. The verification deliverable on tapeout eve is the risk assessment, not the patch.
TAPEOUT-EVE TRIAGE CHECKLIST
□ reproducible on the tapeout RTL? (seed, test, version pinned)
□ DUT bug vs TB artifact? (waveform to root cause class)
□ trigger conditions (how reachable in real use?)
□ workaround exists? (sw sequence / register / strap)
□ fix blast radius (re-verify what, for how long?)
□ decision package up (options + risks + recommendation)
the answer they want: a decision process under pressure,
not "I'd fix it fast"
Follow-up: "The bug is real but the fix needs a week of re-verification. What do you recommend?" It depends on the triage facts: field likelihood × failure cost vs slip cost. If a software workaround fully contains it, ship with a documented erratum; if it corrupts data on a common path, the slip is cheaper than a respin. Recommend with the evidence; the project decides.
Junior vs senior: a junior promises a fast fix. A senior runs the triage checklist, separates DUT-bug from TB-artifact early, and frames the output as a decision package with options and risk — which is what the question is really probing.
Key takeaways
FIFO answer = lifecycle + three-checker separation (scoreboard/assertions/coverage) + corner inventory.
Arbiter = exclusivity, legality, starvation properties + a policy reference model + engineered contention.
CDC = lint for structure, assertions for contract, async-ratio sim + metastability injection for resilience.
Hang debug = sim-time frozen vs advancing first, then the blocking-primitive checklist; watchdog by default.
Tapeout-eve bug = triage to a decision package: reproduce, classify, scope, recommend.
Common pitfalls
Jumping to testbench components before stating the plan — the answer's frame matters.
Claiming simulation verifies CDC — interviewers wait for exactly this mistake.
Debugging hangs with random prints instead of classifying the hang type first.
Answering the tapeout question with heroics instead of a triage process.