Review Queue

The Review Queue closes the loop between automated evaluation and human judgment. Flagged eval runs, sampled passed runs, and manually submitted items all flow into a single queue where humans render verdicts. Those verdicts feed back into the detection rules, improving accuracy over time. This is the learning flywheel: flag, review, improve, repeat.

Verdict Types

When reviewing a queued item, the reviewer assigns one of five verdicts:
| Verdict | Meaning | Effect |
| --- | --- | --- |
| True Positive | Real issue, correctly flagged | Confirms the detection rule is working. No rule change needed. |
| False Positive | No real issue, over-flagged | Detection rule is too sensitive. Feeds into threshold tuning. |
| True Negative | Correctly passed, no issue | Confirms the passing criteria are sound. Validates the eval. |
| False Negative | Missed a real issue | The eval system failed to catch a problem. Triggers new rule creation. |
| Defect Found | New bug category discovered | Neither the eval nor the human expected this. Creates a new detection category. |
Each verdict is recorded with the reviewer’s identity, timestamp, and optional notes. Verdicts are immutable — once submitted, they become part of the audit trail.
Defect Found is the most valuable verdict. It means the review process surfaced something entirely new — a failure mode that no existing rule covers. These verdicts drive the most impactful rule improvements.
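The five verdicts and the follow-up action each one implies can be sketched as a small enum plus a lookup table. This is an illustrative representation, not the platform's actual schema; the names `Verdict` and `RULE_ACTION` are assumptions:

```python
from enum import Enum

class Verdict(Enum):
    TRUE_POSITIVE = "true_positive"    # real issue, correctly flagged
    FALSE_POSITIVE = "false_positive"  # no real issue, over-flagged
    TRUE_NEGATIVE = "true_negative"    # correctly passed, no issue
    FALSE_NEGATIVE = "false_negative"  # missed a real issue
    DEFECT_FOUND = "defect_found"      # new bug category discovered

# Illustrative mapping from each verdict to the rule effect described above.
RULE_ACTION = {
    Verdict.TRUE_POSITIVE: "none",             # rule is working
    Verdict.FALSE_POSITIVE: "tune_threshold",  # rule too sensitive
    Verdict.TRUE_NEGATIVE: "none",             # passing criteria sound
    Verdict.FALSE_NEGATIVE: "create_rule",     # eval missed a problem
    Verdict.DEFECT_FOUND: "create_category",   # entirely new failure mode
}
```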

Review Sources

Items enter the Review Queue from three sources:

Deterministic Flag

When an eval run exceeds a configured threshold (e.g., shift magnitude > 0.5 on a critical scenario), it is automatically flagged and added to the queue. These are the highest-priority items — the automated system detected a potential problem.
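The threshold check can be sketched as a single predicate. The field names (`risk_level`, `shift_magnitude`) and the helper itself are illustrative; the 0.5 threshold mirrors the example above:

```python
def should_flag(run: dict, threshold: float = 0.5) -> bool:
    """Flag a run when its shift magnitude exceeds the configured
    threshold on a critical scenario (values mirror the example above)."""
    return run["risk_level"] == "critical" and run["shift_magnitude"] > threshold

# A critical-risk run with shift magnitude 0.62 exceeds the 0.5 threshold,
# so it is flagged and added to the queue as DETERMINISTIC_FLAG.
should_flag({"risk_level": "critical", "shift_magnitude": 0.62})
```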

Random Sample

A percentage of passed runs are sampled into the queue for human review. This catches false negatives — cases where the eval system said everything was fine, but a human might disagree.

Manual Submission

Humans can manually submit items to the queue at any time. This is useful when an owner notices concerning agent behavior during normal operation that was not captured by an eval run.
| Source | Priority | Purpose |
| --- | --- | --- |
| DETERMINISTIC_FLAG | High | Automated detection caught something |
| RANDOM_SAMPLE | Medium | Spot-check to catch false negatives |
| MANUAL | Varies | Human-initiated review |

Passed-Run Sampling

Passed-run sampling is the mechanism that catches false negatives. Without it, the eval system could silently miss real problems, and you would never know.

How It Works

  1. After each eval batch completes, the system identifies runs that passed (no flag triggered)
  2. A random 5% of passed runs are selected for human review
  3. Selected runs are added to the Review Queue with source RANDOM_SAMPLE
  4. The reviewer examines the run and assigns a verdict
POST /api/v1/review/sample-passed-runs
Authorization: Bearer <jwt>
Content-Type: application/json

{
  "batch_id": "batch_uuid",
  "sample_rate": 0.05,
  "max_samples": 10
}
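The selection logic behind this endpoint can be sketched as a pure function. This is a minimal sketch under stated assumptions, not the server's implementation; the 5% rate and cap of 10 mirror the request body above:

```python
import random

def sample_passed_runs(passed_runs, sample_rate=0.05, max_samples=10, seed=None):
    """Select a random subset of passed runs for human review,
    capped at max_samples per batch."""
    rng = random.Random(seed)
    k = min(max_samples, round(len(passed_runs) * sample_rate))
    return rng.sample(passed_runs, k)

passed = [f"run-{i}" for i in range(400)]
picked = sample_passed_runs(passed, sample_rate=0.05, max_samples=10)
len(picked)  # 5% of 400 is 20, but the cap limits it to 10
```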

Why This Matters

If a reviewer assigns False Negative to a sampled run, it means the eval system missed a real problem. This directly triggers:
  • A new detection rule covering the missed case
  • A review of similar passed runs to estimate the scope
  • An update to the relevant scenario’s expected behavior
The 5% sample rate and max 10 per batch are defaults. For high-risk agents or immediately after rule changes, consider temporarily increasing the sample rate to 10-20% to validate the new rules faster.
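Because the sample is random, False Negative verdicts on sampled runs also give a rough estimate of how often the eval system misses problems overall. A hypothetical helper (the function name and verdict strings are illustrative):

```python
def estimated_miss_rate(sample_verdicts):
    """Fraction of sampled passed runs judged False Negative.
    Since sampling is random, this estimates the rate at which
    the eval system passes runs it should have flagged."""
    fn = sum(1 for v in sample_verdicts if v == "false_negative")
    return fn / len(sample_verdicts)

# 2 misses in 40 sampled passed runs -> roughly a 5% miss rate
estimated_miss_rate(["true_negative"] * 38 + ["false_negative"] * 2)
```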

Ruleset Versioning

Verdicts feed into ruleset improvements. Each time detection rules are updated based on review outcomes, a new ruleset version is created.

Version Tracking

| Field | Description |
| --- | --- |
| version | Monotonically increasing version number |
| created_at | When this version was created |
| changes | Description of what changed |
| tpr | True Positive Rate (sensitivity) |
| fpr | False Positive Rate |
| verdict_basis | Number of verdicts that informed this version |
GET /api/v1/review/rulesets
Authorization: Bearer <jwt>
Response:
{
  "rulesets": [
    {
      "version": 3,
      "created_at": "2026-03-19T10:00:00Z",
      "changes": "Added authority bias detection for deployment scenarios",
      "tpr": 0.94,
      "fpr": 0.08,
      "verdict_basis": 142
    },
    {
      "version": 2,
      "created_at": "2026-03-12T14:30:00Z",
      "changes": "Lowered shift threshold for critical-risk scenarios from 0.5 to 0.4",
      "tpr": 0.91,
      "fpr": 0.12,
      "verdict_basis": 87
    }
  ]
}
Each version tracks its True Positive Rate and False Positive Rate so you can verify that rule changes actually improve detection quality rather than just shifting the trade-off.
A ruleset version with improving TPR but worsening FPR means you are catching more real issues but also generating more noise. Monitor both metrics together.
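TPR and FPR follow directly from the confusion-matrix counts that verdicts accumulate. A worked sketch, with counts invented for illustration (chosen to reproduce version 3's 0.94 / 0.08 from the response above):

```python
def tpr_fpr(tp, fp, tn, fn):
    """True Positive Rate (sensitivity) and False Positive Rate
    from verdict counts."""
    tpr = tp / (tp + fn)  # confirmed flags / all real issues
    fpr = fp / (fp + tn)  # over-flags / all non-issues
    return tpr, fpr

# 47 True Positives, 4 False Positives, 46 True Negatives, 3 False Negatives
tpr, fpr = tpr_fpr(47, 4, 46, 3)  # -> (0.94, 0.08)
```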

Workflow

The day-to-day workflow for the Review Queue is straightforward:
1. Triage pending items. Open the Review Queue. Items are sorted by priority: deterministic flags first, then manual submissions, then random samples. Each item shows the eval scenario, the agent’s output, and the expected behavior.

2. Examine the run. Review the agent’s actual output compared to the expected behavior. For flagged runs, check which threshold was exceeded. For sampled runs, assess whether the pass was legitimate.

3. Submit a verdict. Assign one of the five verdict types. Add notes explaining your reasoning — these notes are invaluable for future rule refinement.

4. Dismiss resolved items. After submitting a verdict, the item moves from pending to reviewed. Periodically archive reviewed items to keep the queue focused.
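The triage order in step 1 can be expressed as a sort key over the three sources. The field names and priority map are illustrative assumptions, not the platform's schema:

```python
# Lower number = reviewed first, matching the priority order in step 1.
PRIORITY = {"DETERMINISTIC_FLAG": 0, "MANUAL": 1, "RANDOM_SAMPLE": 2}

def triage_order(items):
    """Sort pending items: automated flags first, then manual
    submissions, then random samples; oldest first within a source."""
    return sorted(items, key=lambda i: (PRIORITY[i["source"]], i["created_at"]))

items = [
    {"source": "RANDOM_SAMPLE", "created_at": "2026-03-19T09:00:00Z"},
    {"source": "DETERMINISTIC_FLAG", "created_at": "2026-03-19T10:00:00Z"},
    {"source": "MANUAL", "created_at": "2026-03-19T08:00:00Z"},
]
[i["source"] for i in triage_order(items)]
# -> ["DETERMINISTIC_FLAG", "MANUAL", "RANDOM_SAMPLE"]
```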

Submitting a Verdict

POST /api/v1/review/queue/{item_id}/verdicts
Authorization: Bearer <jwt>
Content-Type: application/json

{
  "verdict": "false_positive",
  "notes": "Agent correctly prioritized user safety over speed. The shift was appropriate given the context.",
  "severity_override": null
}

Filtering the Queue

GET /api/v1/review/queue?status=pending&source=DETERMINISTIC_FLAG&risk_level=critical
Authorization: Bearer <jwt>
Filter by status (pending, reviewed, dismissed), source, risk level, or date range to focus your review sessions.

API Reference

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/v1/review/queue?status=pending | List pending review items with filters |
| POST | /api/v1/review/queue/{id}/verdicts | Submit a verdict for a queued item |
| POST | /api/v1/review/sample-passed-runs | Trigger passed-run sampling for a batch |
| GET | /api/v1/review/rulesets | List ruleset versions with TPR/FPR metrics |