Reading Results
The scoreboard, finding status pills, and how to translate a Battle Mode outcome into a merge decision.
Overview
A Battle Mode run produces a lot of output: the scoreboard, per-finding statuses, exploit code, fix diffs, retest results, and a verified PR. This page is about how to read all of that — and more importantly, how to translate it into a decision about whether to merge, deploy, or go back to the drawing board.
The scoreboard
The top of the battle results page shows three counts. These are the headline numbers.
Exploited
Findings the Red Team successfully attacked in a deployed environment. These are proven real.
Fixed
Patches that were applied AND that survived the retest phase. These are verified closures.
Bypassed
Patches that initially looked like fixes but failed when Red Team came back with a new exploit.
How the numbers relate
| Relation | What it tells you |
|---|---|
| Every Fixed implies a corresponding Exploited | You can't fix what wasn't exploited |
| Exploited − Fixed − Bypassed | Findings with no surviving fix at all ("Unfixed" tier) |
| Exploited = Fixed, Bypassed = 0 | Ideal outcome — every real bug got closed and every closure held |
| High Bypassed count | First-pass fixes were superficial; underlying class of vuln not addressed |
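The arithmetic in the table above can be sketched as a small triage helper. This is illustrative only — the field names are assumptions, not the product's API:

```python
from dataclasses import dataclass

@dataclass
class Scoreboard:
    """Headline counts from the top of the battle results page."""
    exploited: int
    fixed: int
    bypassed: int

    @property
    def unfixed(self) -> int:
        # Exploited − Fixed − Bypassed: findings with no surviving fix at all
        return self.exploited - self.fixed - self.bypassed

    @property
    def fully_closed(self) -> bool:
        # Ideal outcome: every real bug got closed and every closure held
        return self.exploited == self.fixed and self.bypassed == 0

board = Scoreboard(exploited=6, fixed=3, bypassed=2)
print(board.unfixed)       # → 1
print(board.fully_closed)  # → False
```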
Per-finding status pills
Each finding in the battle has three pills indicating where it ended up across the six phases.
| Pill | Color | Phase | Meaning |
|---|---|---|---|
| Exploited | Red | 2 or 3 | Red Team landed an exploit. Finding is proven real. |
| Fix applied | Blue | 4 | Blue Team produced a candidate patch that closed the original exploit |
| Defended or Bypassed | Green / Red | 5 | Whether the fix survived a fresh Red Team attack |
A fully-closed finding shows: Red (exploited) → Blue (fix applied) → Defended (survived retest). Anything short of that is a finding that needs human attention before you merge or deploy.
The finding card
Click any finding to expand its card. Inside you'll see:
| Field | What it shows |
|---|---|
| Exploit code | The actual test Red Team wrote — usually a Foundry test showing how the vulnerability was triggered |
| Fix diff | The code change Blue Team proposed, as a diff against the original |
| Retest outcome | Defended or Bypassed. If Bypassed, the new exploit that got around the fix |
| Finding context | The original CARA finding that seeded this battle |
For a Defended finding, the finding card is effectively a complete remediation record: the bug, the exploit, the fix, the proof the fix holds. That's the audit-trail artifact you'd want if someone asked "how do you know this vulnerability is actually closed?"
The verified PR
Battle Mode produces a pull request on the target repo containing every Defended fix. Bypassed and Unfixed findings are not in the PR — deliberately, because the point of the PR is "these are the changes we've proven survive attack."
Read the PR the way you'd read any PR, with two adjustments:
- Cross-reference the PR description against the scoreboard. The PR should list exactly the Defended findings. If there's a mismatch, something's off — probably a Fix that should have been excluded.
- Don't skip review on trust. Battle Mode verifies that a fix closes the exploit, but you still want a human engineer to confirm the fix is idiomatic, doesn't change public interfaces, and doesn't introduce regressions in tests. The `forge test` run the sandbox performed is necessary, not sufficient.
Once reviewed, the PR merges like any other. Merging moves the corresponding findings to Resolved in the normal finding lifecycle.
Handling Bypassed and Unfixed
These are the findings the automated loop couldn't close. They deserve the most human attention.
| Status | What it means | What to do |
|---|---|---|
| Bypassed | A fix that looked like it worked didn't hold up | Read the new exploit. Often the right move is a deeper refactor rather than another patch attempt |
| Unfixed | Blue Team tried 3 times and couldn't close the exploit | Usually architectural. Route to human engineering |
Do not deploy with open Exploited-but-Unfixed or Bypassed findings without an explicit accept-the-risk decision. Battle Mode produced a working exploit against live contracts; that exploit will work against mainnet unless something changed.
Merge and deploy decisions
A pragmatic rule of thumb:
| Result shape | Merge? | Deploy? |
|---|---|---|
| All Exploited findings are Fixed (Defended) | Review and merge the PR | Ready |
| Some findings Bypassed | Review verified PR separately | Not ready |
| Findings Unfixed | Merge verified portion if helpful | Not ready |
| Discovery phase found new Criticals | Regardless of other outcomes | Not ready |
The verified PR can often be merged even when the overall battle result is mixed, because merging a genuinely Defended fix is never wrong. But deployment is a stricter bar: you want zero open Exploited findings, not "most of them are closed."
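The rule of thumb above can be expressed as a gate function. This is a sketch, not a documented interface: `new_criticals` stands in for the Discovery-phase Critical count and is an assumed input.

```python
def battle_gate(exploited: int, fixed: int, bypassed: int,
                new_criticals: int) -> tuple[bool, bool]:
    """Return (merge_verified_pr, ready_to_deploy) from headline counts.

    Merging a Defended fix is always reasonable; deployment requires
    zero open findings of any kind and no new Criticals from Discovery.
    """
    unfixed = exploited - fixed - bypassed
    merge_ok = fixed > 0
    deploy_ok = bypassed == 0 and unfixed == 0 and new_criticals == 0
    return merge_ok, deploy_ok

# All Exploited findings Fixed and Defended: merge and deploy
print(battle_gate(4, 4, 0, 0))  # → (True, True)

# Mixed result: merge the verified portion, but do not deploy
print(battle_gate(6, 3, 2, 2))  # → (True, False)
```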
CSV export
For teams with audit-trail or compliance requirements, Battle Mode results can be exported as CSV from the battle page. The export includes finding ID, severity, Exploited / Fixed / Bypassed status, retest outcome, and links to the exploit and fix artifacts.
Useful for:
- External auditor hand-off ("here's what we ran, here's what we found, here's what we fixed").
- Internal governance records — showing that due diligence occurred before a deploy.
- Tracking remediation velocity over time by rolling up multiple battles into a single historical view.
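A roll-up like the ones above can start from the export with nothing more than the standard library. The column names below are illustrative — check the actual export header before relying on them:

```python
import csv
import io

# Hypothetical excerpt of a Battle Mode CSV export (column names assumed)
SAMPLE = """finding_id,severity,status,retest
F-101,High,Fixed,Defended
F-102,High,Fixed,Defended
F-103,Medium,Bypassed,Bypassed
"""

# Anything whose retest outcome is not Defended still needs human attention
open_findings = [
    row["finding_id"]
    for row in csv.DictReader(io.StringIO(SAMPLE))
    if row["retest"] != "Defended"
]
print(open_findings)  # → ['F-103']
```

In practice you would read the downloaded file with `open(...)` instead of the inline sample, and roll several battles' exports into one view for velocity tracking.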
Two sample read-outs
Ship it
A realistic result for a well-audited protocol, just before mainnet deploy:
| Metric | Value |
|---|---|
| Exploited | 4 (two Highs, two Mediums from the original audit; zero from Discovery) |
| Fixed | 4 |
| Bypassed | 0 |
| Verified PR | 4 commits, one per finding |
Human reviews the PR, merges, and the deploy can proceed.
Not yet
A problematic version of the same battle:
| Metric | Value |
|---|---|
| Exploited | 6 (four from above + two new Criticals from Discovery) |
| Fixed | 3 |
| Bypassed | 2 |
| Unfixed | 1 |
| Verified PR | 3 commits |
Read: the audit under-counted the attack surface; two fixes looked good but failed under fresh attack; one issue is architectural. Merge the 3-commit PR if clean, but don't deploy until the Bypassed and Unfixed items are resolved by humans.