Part 6 · Testbench Architecture · Intermediate
The Regression Flow
Test-list and seed matrix, parallel dispatch, result aggregation, triage buckets, rerun loop, and nightly vs per-commit tiers.
From test list to verdict board
A regression is a matrix: a test list (which tests, with which knobs) crossed with seeds (how many random universes per test). The dispatcher expands the matrix into individual simulation jobs, farms them out in parallel, then scrapes each log for its verdict banner and exit code. Everything earlier in this topic — banners, seeds in headers, knobs, grep-able logs — exists so this flow can run without humans in the loop.
REGRESSION FLOW
test list      seeds
─────────      ─────
smoke_test     x1      random per run,
burst_test     x20     recorded in log
err_inject     x10
soak_test      x5
│
▼
┌──────────────────────────────────────────────┐
│ DISPATCH: N jobs in parallel (farm / cores) │
│ job = (test, seed, knobs) → sim → run.log │
└──────────────────────────────────────────────┘
│
▼
AGGREGATE per job:
PASS banner + exit 0 → pass
FAIL banner / exit != 0 → fail + first-error signature
no banner / timeout → infra-error (counts as fail!)
│
▼
TRIAGE: bucket fails by first-error signature
bucket A (14 fails): "SCB txn mismatch addr=0x1040" → one bug
bucket B (2 fails): "timeout waiting for grant" → another bug
│
▼
RERUN failures with logged seeds → reproduce → debug → fix
└──────────── repeat until green ────────────────┘
Aggregation and triage buckets
Thirty failing runs rarely mean thirty bugs. The aggregator extracts each run's first-error signature — the first ERROR line, with volatile fields like timestamps and data values masked — and clusters identical signatures into buckets. Each bucket is one suspected bug; engineers debug one representative run per bucket, not all thirty.
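# Aggregate: one verdict line per run ("PASS <log>" or "FAIL <log> <signature>")
# Banner check only: the exit code is not visible from the log itself.
for log in results/*/run.log; do
  if grep -q "TEST PASSED" "$log"; then
    echo "PASS $log"
  else
    # First-error signature, with hex values and #-prefixed numbers masked for clustering
    sig=$(grep -m1 "ERROR" "$log" | sed 's/0x[0-9a-fA-F]*/0xN/g; s/#[0-9]*/#N/g')
    # No ERROR line at all: crash, kill, or timeout before any banner was printed
    if [ -z "$sig" ]; then sig="NO_BANNER_OR_CRASH"; fi
    echo "FAIL $log $sig"
  fi
done | tee summary.txt
# Triage buckets: cluster failures by signature.
# The signature starts at field 3 (assumes log paths contain no spaces).
grep "^FAIL" summary.txt | cut -d' ' -f3- | sort | uniq -c | sort -rn
The rerun-failures loop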
Aggregate the overnight run; cluster failures into signature buckets.
Pick one representative (test, seed) per bucket — the seed is in the log header.
Replay it with +VERBOSITY=DEBUG and waves on; debug; fix RTL or TB.
Rerun just the failing (test, seed) pairs to confirm each fix.
Rerun the full matrix — fixes can unmask new failures downstream of the old one.
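The first three steps of the loop can be sketched as follows. The sketch reads a summary.txt with lines of the form "FAIL <log> <signature>", as produced by the aggregation snippet earlier in this section. The log header lines "TEST: <name>" and "SEED: <n>", the ./run_sim wrapper, and the +TESTNAME, +SEED and +WAVES plusargs are hypothetical names chosen for illustration; only +VERBOSITY=DEBUG comes from the text above.

```shell
#!/bin/sh
# Sketch only: header format, ./run_sim, and most plusargs are assumptions.

# representatives SUMMARY: one line per bucket, largest bucket first, as
# "<count><TAB><first log seen><TAB><signature>"
representatives() {
  grep '^FAIL ' "$1" | awk '{
    path = $2
    sig = $0
    sub(/^FAIL [^ ]+ /, "", sig)         # strip verdict and path, keep signature
    count[sig]++
    if (!(sig in first)) first[sig] = path
  } END {
    for (sig in first) printf "%d\t%s\t%s\n", count[sig], first[sig], sig
  }' | sort -rn
}

# rerun_cmd LOG: print a replay command for the (test, seed) recorded in
# the header of LOG, with debug verbosity and waveform dumping enabled
rerun_cmd() {
  test_name=$(sed -n 's/^TEST: *//p' "$1" | head -n 1)
  seed=$(sed -n 's/^SEED: *//p' "$1" | head -n 1)
  echo "./run_sim +TESTNAME=$test_name +SEED=$seed +VERBOSITY=DEBUG +WAVES=1"
}
```

Piping the second column of `representatives summary.txt` through rerun_cmd yields one replay command per bucket. Replaying with the logged seed, rather than a fresh one, is what makes the failure reproducible.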
Nightly vs per-commit pyramids
Not every change deserves the full matrix. Healthy projects layer regressions like a pyramid: tiny and fast at the commit gate, broad and slow at night, exhaustive before tape-out milestones.
REGRESSION PYRAMID
┌────────────┐
│ WEEKLY / │ full matrix, soak tests, long seeds,
│ MILESTONE │ coverage merge → closure report
├────────────┤
│ NIGHTLY │ all tests x many seeds (hours, farm)
│ │ triage board updated every morning
├────────────┤
│ PER-COMMIT │ smoke list x 1-2 seeds (minutes)
│ (gate) │ blocks merge on failure
└────────────┘
wide base = run constantly, must be fast and rock-stable
narrow top = run rarely, allowed to be slow
Per-commit smoke must be fast and flake-free: a flaky gate trains people to ignore it.
Nightly catches the cross-test, multi-seed interactions a smoke list cannot.
Track pass-rate trends per bucket over days — a slowly growing bucket is a real bug, not noise.
Coverage merge belongs to the nightly/weekly tiers — single-run coverage means little.
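Trend tracking per bucket can be as simple as comparing fail counts between nights. The sketch below assumes a hypothetical history file to which each nightly run appends one line per bucket in the form "<YYYY-MM-DD><TAB><fail count><TAB><signature>". Neither the file nor the function name growing_buckets comes from the text above, and it tracks raw fail counts rather than pass rates.

```shell
#!/bin/sh
# Sketch only: the history file and its format are assumptions.

# growing_buckets HISTORY: list signatures whose fail count rose between
# the two most recent dates, as "<signature><TAB><old> -> <new>".
# A bucket that is absent on a date counts as 0 for that date.
growing_buckets() {
  LC_ALL=C sort "$1" | awk -F '\t' '
    {
      # Input is sorted, so dates only move forward
      if ($1 != last) { prev = last; last = $1 }
      n[$1, $3] = $2 + 0
      sigs[$3] = 1
    }
    END {
      for (s in sigs) {
        was = ((prev, s) in n) ? n[prev, s] : 0
        now = ((last, s) in n) ? n[last, s] : 0
        if (now > was) printf "%s\t%d -> %d\n", s, was, now
      }
    }' | LC_ALL=C sort
}
```

Comparing only two nights is the simplest possible trend. A real triage board would likely look at a longer window, but the principle is the same: a bucket whose count keeps rising deserves attention before it dominates the board.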
Interview angle
"Walk me through your regression flow" — matrix → parallel dispatch → banner/exit-code scrape → signature buckets → seeded rerun loop.
"Thirty failures overnight, where do you start?" — bucket by first-error signature; debug one representative per bucket.
"What runs on every commit vs nightly?" — fast deterministic smoke at the gate; broad seed sweeps and coverage merge at night.
Key takeaways
A regression is a (test x seed) matrix dispatched in parallel and scraped mechanically.
Cluster failures by first-error signature — buckets map to bugs, runs do not.
No-banner and timeout runs count as failures — infrastructure errors must not pass silently.
Layer the pyramid: fast smoke per commit, broad nightly sweeps, exhaustive milestone runs.
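The rule that no-banner and timeout runs count as failures can be made mechanical with a three-way classifier. This sketch assumes each run directory holds run.log plus an exit_code file written by the dispatcher, and that the failure banner reads "TEST FAILED". Both are assumptions for illustration; only the "TEST PASSED" banner appears in the aggregation snippet earlier in this section.

```shell
#!/bin/sh
# Sketch only: the exit_code file and the "TEST FAILED" banner are assumptions.

# classify RUN_DIR: print pass, fail, or infra-error for one run directory
classify() {
  log="$1/run.log"
  # A missing exit_code file leaves code empty, which never equals "0"
  code=$(cat "$1/exit_code" 2>/dev/null)
  if [ ! -f "$log" ] || ! grep -q -e "TEST PASSED" -e "TEST FAILED" "$log"; then
    # No log or no banner at all: crash, kill, or timeout
    echo "infra-error"
  elif [ "$code" = "0" ] && ! grep -q "TEST FAILED" "$log"; then
    # PASS banner and a clean exit are both required
    echo "pass"
  else
    # FAIL banner, or a PASS banner contradicted by a nonzero exit code
    echo "fail"
  fi
}
```

Anything other than "pass" should be counted against the regression, so that a run which never reached its banner shows up on the board instead of disappearing from it.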
Common pitfalls
Treating each failing run as a separate bug — thirty runs in one bucket are one bug.
Rerunning failures with fresh seeds: a pass on a different seed does not show the bug was fixed, only that the new seed missed it.
Counting timed-out or crashed runs as neither pass nor fail — they silently vanish from reports.
A slow, flaky commit gate — developers learn to bypass it, and the gate is worse than none.