Review Queue
The Review Queue closes the loop between automated evaluation and human judgment. Flagged eval runs, sampled passed runs, and manually submitted items all flow into a single queue where humans render verdicts. Those verdicts feed back into the detection rules, improving accuracy over time. This is the learning flywheel: flag, review, improve, repeat.

Verdict Types
When reviewing a queued item, the reviewer assigns one of five verdicts:

| Verdict | Meaning | Effect |
|---|---|---|
| True Positive | Real issue, correctly flagged | Confirms the detection rule is working. No rule change needed. |
| False Positive | No real issue, over-flagged | Detection rule is too sensitive. Feeds into threshold tuning. |
| True Negative | Correctly passed, no issue | Confirms the passing criteria are sound. Validates the eval. |
| False Negative | Missed a real issue | The eval system failed to catch a problem. Triggers new rule creation. |
| Defect Found | New bug category discovered | Neither the eval nor the human expected this. Creates a new detection category. |
Defect Found is the most valuable verdict. It means the review process surfaced something entirely new — a failure mode that no existing rule covers. These verdicts drive the most impactful rule improvements.
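The five verdicts and their follow-up actions can be sketched as a simple mapping. This is an illustrative sketch, not the system's actual data model; the enum values and action names are assumptions.

```python
from enum import Enum

class Verdict(Enum):
    TRUE_POSITIVE = "true_positive"
    FALSE_POSITIVE = "false_positive"
    TRUE_NEGATIVE = "true_negative"
    FALSE_NEGATIVE = "false_negative"
    DEFECT_FOUND = "defect_found"

# Hypothetical follow-up action per verdict, mirroring the Effect
# column above (action names are assumptions, not the real API).
RULE_ACTIONS = {
    Verdict.TRUE_POSITIVE: "no_change",        # rule is working
    Verdict.FALSE_POSITIVE: "tune_threshold",  # rule too sensitive
    Verdict.TRUE_NEGATIVE: "no_change",        # passing criteria sound
    Verdict.FALSE_NEGATIVE: "create_rule",     # eval missed a problem
    Verdict.DEFECT_FOUND: "create_category",   # new failure mode
}

def action_for(verdict: Verdict) -> str:
    """Return the ruleset follow-up implied by a verdict."""
    return RULE_ACTIONS[verdict]
```

Note that both "no change" verdicts still carry value: they increase the `verdict_basis` count behind a ruleset version's TPR/FPR metrics.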
Review Sources
Items enter the Review Queue from three sources:

Deterministic Flag
When an eval run exceeds a configured threshold (e.g., shift magnitude > 0.5 on a critical scenario), it is automatically flagged and added to the queue. These are the highest-priority items — the automated system detected a potential problem.

Random Sample
A percentage of passed runs are sampled into the queue for human review. This catches false negatives — cases where the eval system said everything was fine, but a human might disagree.

Manual Submission
Humans can manually submit items to the queue at any time. This is useful when an owner notices concerning agent behavior during normal operation that was not captured by an eval run.

| Source | Priority | Purpose |
|---|---|---|
| DETERMINISTIC_FLAG | High | Automated detection caught something |
| RANDOM_SAMPLE | Medium | Spot-check to catch false negatives |
| MANUAL | Varies | Human-initiated review |
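A queue item's default priority follows from its source, as in this sketch. The dataclass fields and the fallback priority for unknown sources are assumptions for illustration; MANUAL items take whatever priority the submitter assigns.

```python
from dataclasses import dataclass, field

# Default priority per source, per the table above (hypothetical values).
SOURCE_PRIORITY = {
    "DETERMINISTIC_FLAG": "high",
    "RANDOM_SAMPLE": "medium",
}

@dataclass
class QueueItem:
    run_id: str
    source: str
    priority: str = ""  # MANUAL items set this explicitly

    def __post_init__(self):
        if not self.priority:
            # Assumed fallback: treat unknown sources as medium priority.
            self.priority = SOURCE_PRIORITY.get(self.source, "medium")
```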
Passed-Run Sampling
Passed-run sampling is the mechanism that catches false negatives. Without it, the eval system could silently miss real problems, and you would never know.

How It Works
- After each eval batch completes, the system identifies runs that passed (no flag triggered)
- A random 5% of passed runs are selected for human review
- Selected runs are added to the Review Queue with source RANDOM_SAMPLE
- The reviewer examines the run and assigns a verdict
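The sampling step above can be sketched as follows. The function name and the rule of always sampling at least one run from a non-empty batch are assumptions; only the 5% rate comes from the text.

```python
import random

def sample_passed_runs(passed_run_ids, rate=0.05, seed=None):
    """Select a random fraction of passed runs for human review.

    Assumption: small batches still get at least one spot-check
    rather than rounding down to zero.
    """
    if not passed_run_ids:
        return []
    rng = random.Random(seed)  # seed only for reproducible examples
    k = max(1, round(len(passed_run_ids) * rate))
    return rng.sample(list(passed_run_ids), k)
```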
Why This Matters
If a reviewer assigns False Negative to a sampled run, it means the eval system missed a real problem. This directly triggers:

- A new detection rule covering the missed case
- A review of similar passed runs to estimate the scope
- An update to the relevant scenario’s expected behavior
Ruleset Versioning
Verdicts feed into ruleset improvements. Each time detection rules are updated based on review outcomes, a new ruleset version is created.

Version Tracking
| Field | Description |
|---|---|
| version | Monotonically increasing version number |
| created_at | When this version was created |
| changes | Description of what changed |
| tpr | True Positive Rate (sensitivity) |
| fpr | False Positive Rate |
| verdict_basis | Number of verdicts that informed this version |
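The tpr and fpr fields follow from the standard definitions over verdict counts: TPR = TP / (TP + FN) and FPR = FP / (FP + TN). A minimal sketch, assuming verdicts arrive as simple counts:

```python
def ruleset_metrics(tp, fp, tn, fn):
    """Compute TPR (sensitivity) and FPR from verdict counts.

    Returns 0.0 for a rate whose denominator is zero (an assumed
    convention; the real system may handle this differently).
    """
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"tpr": tpr, "fpr": fpr, "verdict_basis": tp + fp + tn + fn}
```

Note that Defect Found verdicts fall outside this 2x2 confusion matrix; they create new detection categories rather than tuning existing rates.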
Workflow
The day-to-day workflow for the Review Queue is straightforward:

Triage pending items
Open the Review Queue. Items are sorted by priority: deterministic flags first, then manual submissions, then random samples. Each item shows the eval scenario, the agent’s output, and the expected behavior.
Examine the run
Review the agent’s actual output compared to the expected behavior. For flagged runs, check what threshold was exceeded. For sampled runs, assess whether the pass was legitimate.
Submit verdict
Assign one of the five verdict types. Add notes explaining your reasoning — these notes are invaluable for future rule refinement.
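The triage ordering from step one — deterministic flags first, then manual submissions, then random samples — can be sketched as a sort key. The dict-shaped items and field names are assumptions for illustration.

```python
# Sort order from the triage step; unknown sources sort last (assumed).
TRIAGE_ORDER = {"DETERMINISTIC_FLAG": 0, "MANUAL": 1, "RANDOM_SAMPLE": 2}

def triage(items):
    """Order pending queue items for review.

    `items` is a list of dicts with a 'source' key (hypothetical shape).
    """
    return sorted(items, key=lambda item: TRIAGE_ORDER.get(item["source"], 3))
```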
Submitting a Verdict
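A verdict is submitted by POSTing to the endpoint listed in the API Reference below. This sketch only builds the request; the payload field names (`verdict`, `notes`) are assumptions, and only the method and path come from the documented API.

```python
import json

def build_verdict_request(item_id, verdict, notes=""):
    """Build the HTTP call for submitting a verdict on a queued item.

    Returns (method, path, body). Body field names are assumptions.
    """
    path = f"/api/v1/review/queue/{item_id}/verdicts"
    body = json.dumps({"verdict": verdict, "notes": notes})
    return "POST", path, body
```

Including notes in every submission pays off later: reviewer reasoning is what turns a bare False Negative count into a concrete new detection rule.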
Filtering the Queue
Filter items by status (pending, reviewed, dismissed), source, risk level, or date range to focus your review sessions.
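Filters map onto query parameters of the queue endpoint. Of these, only `status=pending` appears in the API Reference below; the other parameter names are assumptions.

```python
from urllib.parse import urlencode

def queue_url(status=None, source=None, risk=None):
    """Build a filtered Review Queue URL.

    Parameter names other than `status` are assumed, not documented.
    """
    params = {k: v for k, v in
              {"status": status, "source": source, "risk": risk}.items()
              if v is not None}
    query = urlencode(params)
    return "/api/v1/review/queue" + (f"?{query}" if query else "")
```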
API Reference
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/review/queue?status=pending | List pending review items with filters |
| POST | /api/v1/review/queue/{id}/verdicts | Submit a verdict for a queued item |
| POST | /api/v1/review/sample-passed-runs | Trigger passed-run sampling for a batch |
| GET | /api/v1/review/rulesets | List ruleset versions with TPR/FPR metrics |