Public REST API — v1
Programmatic access to the same QA primitives the ArtificialQA UI exposes. Trigger test executions from CI/CD, integrate evaluation into an internal agent platform, or build dashboards on top of your test runs.
Versioning. Breaking changes ship under a new path prefix, /api/v2/public/. Additive changes (new endpoints, new fields, new optional params) ship without a version bump. Current revision: v1.6.4.
Base URLs
- Production: https://app.artificialqa.com/api/v1/public
- Local dev: http://localhost:7878/api/v1/public
OpenAPI spec: /openapi.yaml — feed it to any OpenAPI tool (Postman, Stoplight, Speakeasy, openapi-generator, etc.) to generate a typed client.
Chat-friendly surface? The same primitives are exposed over the Model Context Protocol at /api/v1/mcp. Use that route when integrating with Claude Desktop, Cursor, Continue, or any custom MCP-aware agent.
1. Authentication
Every request needs an API key in the Authorization header:
Authorization: Bearer aqa_live_xxxxxxxxxxxxxxxxxxxxxxxxx
Generating a key
- Sign in to ArtificialQA → click your avatar → API Keys & MCP → API Keys tab → New API Key.
- Pick a project the key will be bound to. The key can only read/write data within that single project.
- Pick a scope:
  - read: GET only. Use for dashboards, reporting, read-only integrations.
  - write: GET + POST. Required for triggering executions / evaluations.
- Pick an expiration date (v1.6.3 — required). Min 1 day, max 12 months from now. Default 6 months. There is no "never expires" option — set a calendar reminder to rotate.
- Copy the plaintext key — it's shown once. Store it in a secret manager.
Who can mint keys:
- org_admin of the key's org: yes, any project.
- super_admin: yes (bypass).
- project_admin: no (was allowed in v1.6.2, removed in v1.6.3). Project admins can still see their org's keys listed, but cannot create new ones; static keys are an org-tier service credential.
- tester / auditor / member: no.
Keys are project-scoped (v1.6.2). They never expose data from another project, whether that project belongs to another org or to the same org. If you need access to multiple projects, create one key per project. Lost a key? Revoke it from the same UI and generate a new one.
Audit semantics — keys act as a service (v1.6.3)
When an API key creates or modifies data via this REST API, the audit log records the actor as service:<key prefix> rather than the email of the human who minted the key. The minting human is captured separately as details.createdByUserId for traceability. At a glance the audit reader sees "this was a machine, here's which key" instead of misleadingly attributing thousands of CI operations to one engineer.
Legacy audit entries (pre-v1.6.3) keep their original apikey:<prefix> actor string — the discriminator on the api_keys table (actorKind) tells the audit writer which format to emit.
Project binding (v1.6.2)
Every API key is bound to exactly one project. Behavior:
- Endpoints that read or write project-scoped resources (TestCases, TestSuites, TestPlans, AgentConnections, Executions, EvaluationRuns, etc.) automatically filter by the key's project. You don't need to pass projectId for it to work.
- Endpoints that accept a projectId field in the body (e.g. POST /test-cases, POST /test-plans) treat the field as optional:
  - Omitted → uses the key's project binding implicitly.
  - Matches the key's binding → fine, accepted.
  - Mismatches → 400 validation_failed with both ids in the detail field.
- Cross-project reads on a single resource id → 404 not_found. The row exists in the DB but the key can't see it.
- Idempotency keys are project-scoped: two projects in the same org can reuse the same idempotency-key string without colliding on each other's cached results.
- Org-level resources (Evaluators, both global and org-scoped custom; Subscription; AI providers) stay org-scoped. Your project-bound key still sees them.
OAuth (MCP only)
A second authentication path — oat_* Bearer tokens — is available exclusively on the MCP endpoint (/api/v1/mcp). It's the path used by Claude Desktop, Cursor, and other human-driven MCP clients: instead of pasting a static key, the user signs in to ArtificialQA in a browser and picks a project from a consent screen. The REST API (/api/v1/public) keeps using aqa_* static keys only — OAuth tokens are not accepted on REST routes. See Connect via MCP for setup.
Errors
| Status | Code | When |
|---|---|---|
| 401 | missing_token | No Authorization header. |
| 401 | invalid_token | Key not found or malformed. |
| 401 | token_revoked | Key revoked from the UI. |
| 401 | token_expired | Key past its expiresAt. |
| 403 | insufficient_scope | read key tried a POST. |
2. Conventions
Async + polling
Every state-changing endpoint is asynchronous. The POST returns 202 Accepted immediately with an executionId / evaluationId and a statusUrl. Clients poll the matching GET until status reaches a terminal state.
| Endpoint | Terminal states |
|---|---|
| /executions/{id} | completed, failed, cancelled |
| /evaluations/{id} | completed, failed |
There are no webhooks in v1.
Polling cadence
A reasonable client polls with exponential backoff, capped at 10 seconds:
delay = min(10, 1.5 ** attempt) seconds
Most executions complete in under 5 minutes; very long Playwright-based browser executions can run for 15+ minutes.
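The cadence above can be sketched as a small polling helper. This is a minimal sketch rather than an official client: `backoff_delay`, `poll_until_terminal`, and the injected `fetch_status` callable are hypothetical names, and the 30-minute budget is an assumption borrowed from the execution timeout listed under Limits and edge cases.

```python
import time
from typing import Callable

# Terminal states for /executions/{id}; /evaluations/{id} uses a subset of these.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def backoff_delay(attempt: int, cap: float = 10.0) -> float:
    """Seconds to wait after poll number `attempt` (0-based): min(10, 1.5 ** attempt)."""
    return min(cap, 1.5 ** attempt)


def poll_until_terminal(
    fetch_status: Callable[[], str],
    sleep: Callable[[float], None] = time.sleep,
    max_wait_s: float = 30 * 60,  # assumed budget, not part of the API contract
) -> str:
    """Call fetch_status() until it returns a terminal state or the budget runs out."""
    waited = 0.0
    attempt = 0
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= max_wait_s:
            raise TimeoutError(f"still {status!r} after {waited:.0f}s")
        delay = backoff_delay(attempt)
        sleep(delay)
        waited += delay
        attempt += 1
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; in production the default `time.sleep` applies.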
Errors — RFC 7807 problem+json
Every error response has content-type: application/problem+json and the body:
{
"type": "https://docs.artificialqa.com/errors/<code>",
"title": "Short human-readable summary",
"status": 404,
"code": "not_found",
"detail": "Optional verbose explanation."
}
Switch on code, not on title or detail. Titles may be reworded between releases; codes are part of the contract.
The complete catalog:
| Status | Code | Meaning |
|---|---|---|
| 400 | missing_path_param | Malformed URL — usually a missing UUID. |
| 400 | validation_failed | Body, query, or header failed validation. |
| 401 | missing_token / invalid_token / token_revoked / token_expired | Auth (see section 1). |
| 402 | quota_exceeded | Monthly plan quota hit (executions, evaluations, or tokens). |
| 403 | insufficient_scope | Key scope is read but route requires write. |
| 403 | mcp_disabled | Org has the MCP feature flag off (MCP route only). |
| 404 | not_found | Resource doesn't exist in this org / project. |
| 409 | invalid_state | Resource is in a state that blocks the operation. |
| 409 | empty_plan | Plan has zero active test cases. |
| 409 | evaluation_in_progress | A prior evaluation on this execution is still pending or running. |
| 409 | no_evaluators | Org has no active evaluators configured. |
| 409 | duplicate_membership | Membership row already exists (suite item or plan-suite link). |
| 429 | rate_limit_exceeded | Per-key rate limit hit. See below. |
| 500 | internal_error | Unexpected server error — file a ticket. |
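One way to act on the advice to switch on code is sketched below. `Problem`, `parse_problem`, and `next_action` are hypothetical names, and the grouping of codes into actions (for example treating evaluation_in_progress as retryable) is an assumption of this sketch, not something the catalog prescribes.

```python
import json
from dataclasses import dataclass
from typing import Optional

AUTH_CODES = {"missing_token", "invalid_token", "token_revoked", "token_expired"}
# Assumption: these are worth retrying after a pause; the catalog does not say so.
RETRY_LATER_CODES = {"rate_limit_exceeded", "evaluation_in_progress"}


@dataclass
class Problem:
    status: int
    code: str
    title: str = ""
    detail: Optional[str] = None


def parse_problem(status: int, content_type: str, body: str) -> Problem:
    """Turn an error response into a Problem, tolerating non-problem bodies."""
    if not content_type.lower().startswith("application/problem+json"):
        return Problem(status=status, code="unknown", detail=body[:200])
    doc = json.loads(body)
    return Problem(
        status=doc.get("status", status),
        code=doc.get("code", "unknown"),
        title=doc.get("title", ""),
        detail=doc.get("detail"),
    )


def next_action(problem: Problem) -> str:
    """Branch on the stable `code`, never on `title` or `detail`."""
    if problem.code in AUTH_CODES:
        return "fix_credentials"
    if problem.code in RETRY_LATER_CODES:
        return "retry_later"
    if problem.code == "quota_exceeded":
        return "stop_until_quota_resets"
    return "fail"
```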
Idempotency
POST endpoints accept an Idempotency-Key header (1-255 chars). Reusing the same key within the project returns the existing row with idempotent: true and HTTP 200 instead of creating a duplicate and returning 202.
POST /api/v1/public/test-plans/<id>/executions
Authorization: Bearer aqa_live_...
Content-Type: application/json
Idempotency-Key: ci-build-4815162342
{ "agentConnectionId": "..." }
Idempotency keys are scoped per-project (v1.6.2), persisted indefinitely. A typical pattern is to use the CI build ID, commit SHA, or a UUIDv4 generated per attempt. For POST /test-cases/import the cache TTL is 24h.
Deprecated: the request body also accepts an idempotencyKey field. If both are sent, the header wins. Migrate to the header.
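A minimal sketch of the header-based pattern, assuming the key is derived from a CI build ID. `trigger_headers` and `is_replay` are hypothetical helper names; the 200-versus-202 distinction and the idempotent flag come from the description above.

```python
from typing import Any, Dict


def trigger_headers(api_key: str, build_id: str) -> Dict[str, str]:
    """Headers for a POST that should be safe to retry for one CI build."""
    idempotency_key = f"ci-build-{build_id}"
    if not 1 <= len(idempotency_key) <= 255:
        raise ValueError("Idempotency-Key must be 1-255 characters")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Idempotency-Key": idempotency_key,
    }


def is_replay(status_code: int, body: Dict[str, Any]) -> bool:
    """True when the server returned an existing row instead of creating one."""
    return status_code == 200 and body.get("idempotent") is True
```

Retrying the same POST with the same headers after a network failure should then yield either the original 202 or a 200 replay, never a second execution.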
Rate limits
Each key is rate-limited independently. On every response:
X-RateLimit-Limit: <max-per-window>
X-RateLimit-Remaining: <left-in-window>
X-RateLimit-Reset: <unix-seconds-when-window-resets>
A 429 adds Retry-After: <seconds>. The body is a standard problem+json with code: "rate_limit_exceeded". The rate-limit pool is shared between aqa_* static keys and oat_* OAuth tokens — both keyed by prefix.
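The headers above are enough to compute a client-side pause. The sketch below assumes the headers arrive in a mapping with exactly the casing shown and that Retry-After is a number of seconds; `seconds_to_wait` is a hypothetical helper name.

```python
from typing import Mapping


def seconds_to_wait(status: int, headers: Mapping[str, str], now: float) -> float:
    """How long to pause before the next request; `now` is a Unix timestamp."""
    # A 429 carries an explicit Retry-After, which takes priority.
    if status == 429 and "Retry-After" in headers:
        return float(headers["Retry-After"])
    # Otherwise only pause when the current window is exhausted.
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)
```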
Quotas
Executions, evaluations, and LLM tokens count against the org's monthly plan quota. Hitting any returns 402 quota_exceeded. The response body spells out which quota:
{
"code": "quota_exceeded",
"detail": "Monthly execution limit reached (250/250)."
}
3. The flow
The standard CI-integration flow is 4 calls:
1. GET /test-plans: discover the plan id
2. GET /agent-connections: discover the connection id
3. POST /test-plans/{id}/executions: trigger the execution (optionally with evaluate: true)
4. GET /executions/{id} (poll): wait for a terminal status
If evaluate: true was set in step 3, the evaluation runs automatically afterwards using every evaluator allowed by the org's billing tier, and the evaluation field of the GET /executions/{id} response carries the summary. Otherwise:
5. (optional) GET /evaluators: discover evaluator ids
6. POST /executions/{id}/evaluations: trigger the evaluation explicitly (optionally with evaluatorIds)
7. GET /evaluations/{id} (poll): wait for a terminal status
The shortest happy path (evaluate: true) is 2 calls + polling.
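The flow can be sketched end to end with the transport injected as a callable, so the logic can be exercised without a live server. `make_http`, `run_plan`, and `_pick` are hypothetical names, looking up resources by name is an assumption of this sketch, and the urllib transport is untested against the real service.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Optional, Tuple

Http = Callable[[str, str, Optional[dict]], Tuple[int, Any]]


def make_http(base_url: str, api_key: str) -> Http:
    """Build a (method, path, body) -> (status, json) callable on top of urllib."""
    def http(method: str, path: str, body: Optional[dict]) -> Tuple[int, Any]:
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(
            base_url + path, data=data, method=method,
            headers={"Authorization": f"Bearer {api_key}",
                     "Content-Type": "application/json"},
        )
        # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status, json.load(resp)
    return http


def _pick(items: list, name: str) -> dict:
    for item in items:
        if item["name"] == name:
            return item
    raise LookupError(f"no resource named {name!r}")


def run_plan(http: Http, plan_name: str, connection_name: str,
             sleep: Callable[[float], None] = time.sleep) -> dict:
    """Steps 1-4: discover ids, trigger with evaluate=true, poll to a terminal state."""
    _, plans = http("GET", "/test-plans", None)
    _, connections = http("GET", "/agent-connections", None)
    plan = _pick(plans, plan_name)
    connection = _pick(connections, connection_name)
    status, started = http(
        "POST", f"/test-plans/{plan['id']}/executions",
        {"agentConnectionId": connection["id"], "evaluate": True},
    )
    if status not in (200, 202):
        raise RuntimeError(f"unexpected status {status}")
    attempt = 0
    while True:
        _, execution = http("GET", f"/executions/{started['executionId']}", None)
        if execution["status"] in {"completed", "failed", "cancelled"}:
            return execution
        sleep(min(10, 1.5 ** attempt))
        attempt += 1
```

The sketch returns as soon as the execution is terminal. Whether the evaluation summary is already populated at that moment is not stated here, so a caller should treat a missing or non-terminal evaluation field as "not ready yet" and keep polling.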
4. Endpoints
Test plans
GET /test-plans
Lists every plan in the API key's project.
Query params: status, limit.
Response 200 — array of TestPlanSummary:
[
{
"id": "744cd22e-bf76-4e8c-8060-9f283a64796c",
"name": "TP_Browser_ArtQA",
"description": null,
"status": "completed",
"environment": null,
"defaultAgentConnectionId": "12b17e7b-...",
"suiteCount": 3,
"executionCount": 7,
"createdAt": "2026-05-21T13:44:09.103Z",
"updatedAt": "2026-06-10T17:01:55.211Z"
}
]
GET /test-plans/{id}
Detail with suite breakdown. Useful for picking a plan in a UI.
POST /test-plans — create (v1.4)
Creates an empty plan. Required: name. Optional: description, status (default "draft"), agentConnectionId, environment, scheduleCron, projectId (auto-picked if the org has exactly one active project; required otherwise — the error enumerates candidates). The runner is the sole writer of running and completed; create accepts {draft, ready} only.
Response 201 — full TestPlanDetail with createdBy="apikey:<prefix>", suites=[], executionCount=0.
PATCH /test-plans/{id} — update (v1.4)
Partial update. Patchable: name, description, status (subset {draft, ready, archived}), agentConnectionId, environment, scheduleCron. Not patchable: projectId (would orphan suite memberships), createdBy, createdAt, isActive (use DELETE).
PATCH on a plan in running is rejected with 409 invalid_state — the runner is mid-snapshot.
DELETE /test-plans/{id} — soft-delete (v1.4)
Flips isActive=false. Idempotent. Past executions that snapshotted the plan keep working. Rejected with 409 invalid_state if the plan is currently running.
POST /test-plans/{id}/suites — link a suite (v1.4)
{ "testSuiteId": "uuid", "sortOrder": 5 }
sortOrder is optional — omit it and the server assigns MAX(sortOrder)+1. Suite linking is project-bound: cross-project link → 409 invalid_state with both project ids in detail. Duplicate link → 409 duplicate_membership.
DELETE /test-plans/{id}/suites/{suiteId} — unlink (v1.4)
Removes the link. Idempotent. Bumps plan.updatedAt. Rejected with 409 invalid_state if the plan is currently running.
TestPlanSummary shape:
{
id: uuid,
name: string,
description: string | null,
status: "draft" | "ready" | "running" | "completed" | "archived",
environment: string | null,
defaultAgentConnectionId: uuid | null,
suiteCount: int,
executionCount: int,
createdAt: ISO,
updatedAt: ISO,
}
Test suites
GET /test-suites
Cursor-paginated list: { data: TestSuiteSummary[], nextCursor: string | null }. Query params: cursor, limit (1-200, default 50), projectId, containsTestCaseId, tags, search, createdAfter, updatedSince, isActive.
GET /test-suites/{id}
Full detail. items[] is ordered by (sortOrder ASC, id ASC), each carrying an embedded testCase summary. 404 on soft-deleted suites unless ?isActive=false.
POST /test-suites
Creates an empty suite. Required: name. Optional: description, tags, projectId (auto-picked when the org has one active project; required otherwise). Membership goes through the sub-resource endpoints below — by design you author the structure separately from the content.
PATCH /test-suites/{id}
Partial update. Patchable: name, description (send null to clear), tags, isActive. Not patchable: projectId, testCount (mutated only via membership endpoints), createdBy, createdAt.
DELETE /test-suites/{id}
Soft-delete. Idempotent. Join rows survive so any plan that referenced the suite keeps resolving.
POST /test-suites/{id}/test-cases
Adds a membership.
{ "testCaseId": "uuid", "sortOrder": 5 }
Project-bound: a test case can only be added to a suite in the same project. Cross-project add → 409 invalid_state. Duplicate → 409 duplicate_membership.
DELETE /test-suites/{id}/test-cases/{tcId}
Removes a membership. Idempotent. Removing a TC that wasn't a member returns 204 with no DB writes.
Test cases
Full CRUD plus bulk import. Lets a CI / IaC / agent script author and curate test cases without touching the UI.
Contract decisions:
- type is "simple" (single input/output) or "conversational" (multi-turn dialogue). The DB-level agent_task value is rejected at the public boundary by design: "what's being tested" is orthogonal to "shape of the test case".
- source is always "manual" (single create) or "imported" (bulk). The caller cannot set it. "generated" is reserved for the generator runner.
- reviewStatus is always "approved" on API-authored test cases. PATCH cannot change it.
- Soft-delete only. DELETE sets isActive=false and lifecycleStatus="archived". The row stays so executions / suite items keep their FK references.
- design (JSONB) is the unified shape. Legacy input / expectedOutput / turns columns coexist for backwards compat: every read returns both surfaces; writes accept either, with design winning when both are sent.
- PII auto-detection runs server-side on every create and on PATCHes that touch an input-bearing field. Results land in piiDetected + piiTypes and are exposed on the list summary too, so you can filter without paging every row.
GET /test-cases
Cursor-paginated list: { data: TestCaseSummary[], nextCursor: string | null }.
Query params: cursor, limit (1-200, default 50), suiteId, type, difficulty, lifecycleStatus, reviewStatus, source, industryId, piiDetected, tags (repeated, ANY-match), search, createdAfter, updatedSince, isActive.
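Walking a cursor-paginated list can be sketched as a generator. `iter_all` and the injected `fetch_page` callable are hypothetical names, and the sketch assumes that a null nextCursor marks the last page.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_all(
    fetch_page: Callable[[Dict[str, Any]], Dict[str, Any]],
    **filters: Any,
) -> Iterator[Dict[str, Any]]:
    """Yield every row across pages, forwarding the same filters on each request."""
    cursor: Optional[str] = None
    while True:
        params = dict(filters)
        if cursor is not None:
            params["cursor"] = cursor
        page = fetch_page(params)  # expected shape: {"data": [...], "nextCursor": ...}
        yield from page["data"]
        cursor = page.get("nextCursor")
        if cursor is None:  # assumption: null means no further pages
            return
```

The same helper should work for GET /test-suites, which returns the same envelope.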
GET /test-cases/{id}
Full detail — every column of TestCase minus encrypted / internal fields. Returns both legacy and design shapes.
POST /test-cases
Required: type plus EITHER design, OR input/expectedOutput, OR turns (for conversational). The Zod boundary catches the conversational-without-turns case before any DB write.
Project resolution: if the calling org has exactly one active project, projectId is auto-picked. Otherwise it is required and the error detail enumerates the available projects with (id, name).
Response 201 — full TestCaseDetail. source=manual, reviewStatus=approved.
PATCH /test-cases/{id}
Partial update. Body must contain at least one field. Cannot change type, source, or reviewStatus. PII is re-scanned only when the patch touches input / expectedOutput / turns / design / tags.
DELETE /test-cases/{id}
Soft-delete. Idempotent — re-DELETE on an archived row returns the same 204 without a second DB write.
POST /test-cases/import — bulk
Up to 500 items per request. Per-row independence: a bad row goes into errors[] and the batch continues.
Idempotency: supplying Idempotency-Key makes the call replay-safe — cached against (project, key, "test_case") in the bulk_imports table for 24 hours. A repeated invocation within that window returns the original result verbatim with idempotent: true and zero DB mutations.
{
"createdIds": ["uuid-1", "uuid-2", ...],
"errors": [{ "index": 2, "code": "not_found", "detail": "industryId ... does not exist" }],
"idempotent": false
}
Status codes: 200 full success (or cache replay), 207 Multi-Status partial success (the same status applies on replay), 400 schema rejection, 402 quota.
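Client-side batching and result handling for the bulk import might look like the sketch below. `chunked` and `summarize_import` are hypothetical helpers; the sketch also assumes that errors[].index refers to the row's position within the request that was sent.

```python
from typing import Any, Dict, Iterator, List

MAX_ITEMS_PER_REQUEST = 500  # documented per-request ceiling


def chunked(items: List[Any], size: int = MAX_ITEMS_PER_REQUEST) -> Iterator[List[Any]]:
    """Split a long list into request-sized batches."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def summarize_import(status: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Condense one import response; 200 and 207 both carry a result body."""
    if status not in (200, 207):
        raise RuntimeError(f"import rejected with HTTP {status}")
    return {
        "created": len(body.get("createdIds", [])),
        "failed_rows": [err["index"] for err in body.get("errors", [])],
        "partial": status == 207,
        "replayed": bool(body.get("idempotent", False)),
    }
```

When batching, a distinct Idempotency-Key per batch keeps each request independently replay-safe.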
Agent connections
GET /agent-connections
Lists connections. Never returns secrets — only the fields needed to choose a connection for an execution.
Query params: protocol (http | browser | websocket), isActive.
[
{
"id": "12b17e7b-2cec-4206-986d-9df390be2de3",
"name": "ArtificialQA Test Bank",
"protocol": "browser",
"baseUrl": "https://app.artificialqa.com",
"isActive": true,
"createdAt": "2026-05-19T09:12:11.502Z"
}
]
GET /agent-connections/{id} — detail (v1.5)
Returns the full AgentConnectionDetail with secrets masked as "***<last4>". Use this to inspect existing config before PATCH-ing back.
POST /agent-connections — create (v1.5)
Required: name, protocol, baseUrl, authConfig, messageConfig. Optional: description, environment (defaults to "production"), preChatConfig, postChatConfig, templateVars, environments, projectId.
The caller sends raw secrets in the config blobs; the helper encrypts them (AES-256-GCM) before INSERT and stores them as enc:<iv>:<tag>:<ciphertext>.
PATCH /agent-connections/{id} — update (v1.5)
Partial update. Patchable: name, description, baseUrl, environment, isActive, authConfig, preChatConfig, messageConfig, postChatConfig, templateVars, environments. Not patchable: projectId, protocol.
Masked-secret round-trips: submitting a masked value (e.g. "***c123") at a secret-named path falls back to the existing decrypted DB value — you can read the detail, edit any plaintext field, and PATCH the whole config back without juggling secrets. Only NEW plaintext values overwrite the stored secret.
Config blobs are replaced per-key, not deep-merged. Omitting a key in your PATCH drops it from the column. Explicit null clears a nullable blob.
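Because a blob is replaced as a whole, a safe edit starts from the full blob returned by the detail endpoint. The sketch below illustrates that read-modify-write pattern; `patch_blob` is a hypothetical helper, and the keys inside authConfig (type, headerName, token) are invented for illustration since the blob's schema is not described here.

```python
import copy
from typing import Any, Dict


def patch_blob(detail: Dict[str, Any], blob_name: str, **changes: Any) -> Dict[str, Any]:
    """Build a PATCH body that resends the whole blob with a few fields changed."""
    blob = copy.deepcopy(detail[blob_name])  # keep every existing key, masked secrets included
    blob.update(changes)
    return {blob_name: blob}


# Hypothetical detail response: the secret comes back masked as "***<last4>".
detail = {
    "id": "12b17e7b-...",
    "authConfig": {"type": "bearer", "headerName": "Authorization", "token": "***c123"},
}
body = patch_blob(detail, "authConfig", headerName="X-Api-Key")
```

Here `body["authConfig"]["token"]` is still the masked value, which per the round-trip rule above leaves the stored secret untouched, while no key is dropped from the blob.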
DELETE /agent-connections/{id} (v1.5)
Soft-delete. Idempotent. Past executions / plans keep working. To re-activate: PATCH { "isActive": true }.
POST /agent-connections/{id}/test — smoke-test (v1.5)
Synchronous smoke-test against the configured agent. Optional body: { "runtimeVars": { ... } } merged over persisted templateVars for this call only.
Per-protocol behavior:
- http: 30s timeout. Returns latency, upstream HTTP status, and a truncated response in details.
- browser: spawns Chromium. Can take 30-60s; set client timeout to ~90s.
- websocket: not implemented yet. Returns { status: "completed", ok: false, error: "websocket protocol test not supported in v1.5" }.
Rate-limited to 1 call per 60s per (connectionId, apiKeyId).
{
"status": "completed",
"ok": true,
"latencyMs": 1234,
"error": null,
"details": { "protocol": "http", "statusCode": 200, "response": "..." },
"startedAt": "2026-06-17T10:00:00.000Z",
"completedAt": "2026-06-17T10:00:01.234Z"
}
status is a discriminator. v1.5 always sets "completed" because the endpoint is synchronous. Pattern-match on status as a union so a future "pending" + async polling doesn't break clients.
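A forward-compatible reader of the smoke-test result might branch on status first, as sketched below. `interpret_smoke_test` is a hypothetical helper, and the "pending" branch is speculative: it only anticipates the future async mode mentioned above.

```python
from typing import Any, Dict


def interpret_smoke_test(result: Dict[str, Any]) -> str:
    """Branch on the `status` discriminator before reading any other field."""
    status = result.get("status")
    if status == "completed":
        if result.get("ok"):
            return f"ok in {result.get('latencyMs')} ms"
        return f"failed: {result.get('error')}"
    if status == "pending":
        # Not emitted in v1.5; handled so a future async mode does not crash the client.
        return "pending"
    return f"unrecognized status {status!r}"
```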
Executions
POST /test-plans/{id}/executions — trigger
Starts a background run. Returns 202 with the new id; 200 if idempotent.
{
"agentConnectionId": "12b17e7b-...",
"evaluate": true,
"evaluatorIds": ["ad12-...", "be34-..."],
"evaluatorWeights": { "ad12-...": 2.0 },
"runtimeVars": { "documentId": "doc-42" }
}
| Field | Type | Required | Notes |
|---|---|---|---|
| agentConnectionId | uuid | yes | - |
| evaluate | bool, default false | no | Auto-triggers the evaluation when execution finishes. |
| evaluatorIds | uuid[] | no | Only honoured when evaluate: true. Restricts the auto-eval. Omit → every tier-allowed evaluator runs. |
| evaluatorWeights | {uuid: number > 0} | no | Only honoured when evaluate: true. Per-evaluator weight overrides. Omitted entries use the evaluator's configured default weight. |
| runtimeVars | {string: string} | no | Template var substitutions for the agent connection. |
| idempotencyKey | string, deprecated | no | Prefer the Idempotency-Key header. |
Response 202:
{
"executionId": "9e2f-...",
"status": "pending",
"statusUrl": "https://app.artificialqa.com/api/v1/public/executions/9e2f-..."
}
Response 200 (idempotent replay): same shape plus "idempotent": true.
Common 409s: invalid_state (plan not ready/completed), empty_plan.
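The field rules in the table can be checked client-side before sending. `build_execution_body` is a hypothetical helper; rejecting evaluatorIds without evaluate is a choice made by this sketch (the server is only documented as not honouring them).

```python
from typing import Any, Dict, List, Optional


def build_execution_body(
    agent_connection_id: str,
    *,
    evaluate: bool = False,
    evaluator_ids: Optional[List[str]] = None,
    evaluator_weights: Optional[Dict[str, float]] = None,
    runtime_vars: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble the trigger body, failing early on combinations the API would ignore."""
    if not evaluate and (evaluator_ids or evaluator_weights):
        raise ValueError("evaluatorIds / evaluatorWeights are only honoured when evaluate is true")
    body: Dict[str, Any] = {"agentConnectionId": agent_connection_id, "evaluate": evaluate}
    if evaluator_ids:
        body["evaluatorIds"] = list(evaluator_ids)
    if evaluator_weights:
        bad = [eid for eid, weight in evaluator_weights.items() if not weight > 0]
        if bad:
            raise ValueError(f"weights must be > 0: {bad}")
        body["evaluatorWeights"] = dict(evaluator_weights)
    if runtime_vars:
        # The table types runtimeVars as string-to-string, so values are coerced.
        body["runtimeVars"] = {str(k): str(v) for k, v in runtime_vars.items()}
    return body
```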
GET /executions — list
Filter: testPlanId, status, limit. Returns ExecutionSummary[] — same shape as detail but without results[].
GET /executions/{id} — detail
Returns the execution + every result + the most recent evaluation (if any).
{
"id": "9e2f-...",
"testPlanId": "744cd22e-...",
"agentConnectionId": "12b17e7b-...",
"runNumber": 8,
"status": "completed",
"totalCases": 12,
"completedCases": 12,
"failedCases": 1,
"durationMs": 184320,
"errorMessage": null,
"createdAt": "2026-06-10T17:02:11.040Z",
"completedAt": "2026-06-10T17:05:15.360Z",
"results": [
{
"id": "...",
"testCaseId": "...",
"executionStatus": "SUCCESS",
"responseValidity": "VALID",
"responseTimeMs": 8234,
"success": true,
"retryCount": 0,
"finalized": true,
"createdAt": "..."
}
],
"evaluation": {
"id": "ad12-...",
"status": "completed",
"runNumber": 1,
"overallScore": 0.87,
"passRate": 0.92,
"passed": true,
"totalCases": 12,
"passedCases": 11,
"failedCases": 1,
"durationMs": 32104
}
}
executionStatus: SUCCESS, ERROR, TIMEOUT, SKIPPED. responseValidity: VALID, EMPTY, MALFORMED. Only SUCCESS + VALID results enter evaluation.
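To predict how many results an evaluation will actually score, the rule above can be applied to results[] directly. Both helper names below are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def evaluable(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only results that enter evaluation: SUCCESS + VALID."""
    return [
        r for r in results
        if r.get("executionStatus") == "SUCCESS" and r.get("responseValidity") == "VALID"
    ]


def status_breakdown(results: List[Dict[str, Any]]) -> Counter:
    """Count results per (executionStatus, responseValidity) pair for reporting."""
    return Counter((r.get("executionStatus"), r.get("responseValidity")) for r in results)
```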
Evaluators
GET /evaluators
Lists the evaluators visible to the org — platform-default globals plus any org-specific custom evaluators. Use it to discover the id you pass as evaluatorIds.
Query params: includeBlocked (default true).
Security. Never exposes the evaluator's agentConfig (encrypted credentials) or systemPrompt (calibration prompt).
| Field | Notes |
|---|---|
| isGlobal | true for platform-default; false for org-specific (Enterprise). |
| planAllowed | true if your billing tier permits this evaluator. false means a POST /evaluations referencing it will silently drop it from the run. Filter on this field client-side. |
| weight | Default weight applied when computing the overall score. Override per-run via evaluatorWeights on the POST. |
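The client-side filter on planAllowed can be sketched as follows. `split_by_plan` is a hypothetical helper that takes the GET /evaluators response and the ids a caller intends to request.

```python
from typing import Any, Dict, List, Tuple


def split_by_plan(
    evaluators: List[Dict[str, Any]], requested_ids: List[str]
) -> Tuple[List[str], List[str]]:
    """Return (ids safe to send, ids that would be silently dropped)."""
    allowed_ids = {e["id"] for e in evaluators if e.get("planAllowed") is True}
    allowed = [eid for eid in requested_ids if eid in allowed_ids]
    dropped = [eid for eid in requested_ids if eid not in allowed_ids]
    return allowed, dropped
```

Surfacing the dropped list in CI logs makes a tier mismatch visible instead of showing up later as missing scores.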
Evaluations
POST /executions/{id}/evaluations — trigger
Runs the configured evaluators on a completed execution. Body (all optional):
{
"evaluatorIds": ["uuid", "uuid"],
"evaluatorWeights": { "uuid": 2.0 }
}
If evaluatorIds is omitted, every active evaluator allowed by the org's tier runs. Weight defaulting matches the UI: any evaluator not in evaluatorWeights uses its configured default weight from GET /evaluators.
Common 409s: invalid_state (execution not yet completed), evaluation_in_progress, no_evaluators.
GET /evaluations/{id} — detail
Returns the run + every per-test-case score. While status: "running" the response includes scoresCompleted / scoresTotal for a progress bar.
{
"id": "ad12-...",
"executionId": "9e2f-...",
"runNumber": 1,
"status": "completed",
"overallScore": 0.87,
"passRate": 0.92,
"passed": true,
"totalCases": 12,
"passedCases": 11,
"failedCases": 1,
"durationMs": 32104,
"createdAt": "...",
"completedAt": "...",
"scores": [
{
"id": "...",
"evaluatorId": "...",
"evaluatorName": "Tone",
"evaluatorSlug": "tone",
"weight": 1.0,
"testCaseId": "...",
"score": 0.83,
"passed": true,
"explanation": "Response stayed polite and professional throughout.",
"createdAt": "..."
}
]
}
scores[].weight is the runtime weight that was actually applied when computing overallScore. If you passed evaluatorWeights on the trigger, it matches that override; otherwise it matches the evaluator's configured default. For legacy evaluations it may be null.
Evaluation reports (v1.6)
4 endpoints + 2 PDF endpoints that expose the evaluation-report pipeline: executive summary, per-evaluator analysis, per-test-case scores, and a PDF. GET /report is read-only — never calls the LLM, never consumes tokens, missing summaries surface as empty strings ("").
Contract decisions:
- Only completed evaluations report. Every endpoint refuses running / failed / pending with 409 invalid_state.
- POST endpoints consume tokens. Both regenerate endpoints spend against the monthly tokens quota and require the ai_evaluation_reports plan feature. 402 quota_exceeded on either gate.
- Cache-gated by default. Same (evaluationId, lang) or (evaluationId, evaluatorId) returns the cached summary with cached: true and tokensUsed: null. Pass force: true to bypass.
- lang is "en" | "es" only. Each language is cached independently for the executive summary; no fallback.
- Per-evaluator cache key is (evaluationId, evaluatorId); language is NOT part of the key. Regenerating in EN then ES OVERWRITES the EN row.
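The difference between the two cache keys is easiest to see in a toy model. The class below is purely illustrative and is not the server's implementation; `ReportCache` and its methods are hypothetical names.

```python
from typing import Dict, Tuple


class ReportCache:
    """Toy model of the two summary caches and their keys."""

    def __init__(self) -> None:
        self.executive: Dict[Tuple[str, str], str] = {}  # key: (evaluationId, lang)
        self.evaluator: Dict[Tuple[str, str], str] = {}  # key: (evaluationId, evaluatorId)

    def store_executive(self, evaluation_id: str, lang: str, text: str) -> None:
        self.executive[(evaluation_id, lang)] = text

    def store_evaluator(self, evaluation_id: str, evaluator_id: str, lang: str, text: str) -> None:
        # lang is deliberately not part of the key, so a second language overwrites the first.
        self.evaluator[(evaluation_id, evaluator_id)] = text


cache = ReportCache()
cache.store_executive("ev1", "en", "English executive summary")
cache.store_executive("ev1", "es", "Resumen ejecutivo")
cache.store_evaluator("ev1", "tone", "en", "English analysis")
cache.store_evaluator("ev1", "tone", "es", "Análisis en español")
```

After these four writes the executive cache holds two rows (one per language) while the per-evaluator cache holds a single Spanish row. A client that needs both languages for a per-evaluator analysis should store the first response before regenerating.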
GET /evaluations/{id}/report
Returns the consolidated JSON report. Read-only.
GET /evaluations/{id}/report-pdf
Returns the PDF as application/pdf with Content-Disposition: attachment; filename="report_<plan>_eval<runNumber>_<YYYY-MM-DD>.pdf". Read-only.
curl -H "Authorization: Bearer aqa_xxx" \
https://app.artificialqa.com/api/v1/public/evaluations/<id>/report-pdf \
-o report.pdf
POST /evaluations/{id}/report-pdf-url — tokenized URL (v1.6.1)
Issues a short-lived (1h TTL) tokenized URL that downloads the PDF without requiring an Authorization header. Use this when you want to hand a click-and-go link to a human or to an LLM client that can't easily round-trip the Bearer token.
{
"url": "https://app.../api/v1/public/downloads/<43-char-token>",
"expiresAt": "2026-06-17T18:30:00.000Z",
"expiresInSec": 3600
}
The URL is multi-use within the TTL, org-scoped, single-purpose (unlocks ONE evaluation's PDF), and Cache-Control: no-store. Does NOT spend tokens and does NOT require any feature gate.
GET /downloads/{token} — public, no Authorization
Streams the PDF binary. No API key required — the token is the auth. Token missing / malformed / expired → unified 404 (NOT 410) so the surface never leaks the existence of past or foreign tokens.
POST /evaluations/{id}/summary
Regenerates the executive summary. Body (optional): { "lang": "en", "force": false }.
{
"summary": "Generated text...",
"providerName": "saia",
"model": "agent-v1",
"cached": false,
"tokensUsed": 1234
}
POST /evaluations/{id}/evaluator-summary
Regenerates the analysis for one evaluator within this evaluation. Body: { "evaluatorId": "uuid", "lang": "en", "force": false }. evaluatorId required. Same SummaryResult shape as the executive variant.
5. Code samples
Runnable examples are available in three flavors, each implementing the full flow — discover → trigger → poll → print scores:
- curl: shell pipeline using curl + jq.
- Node: Node.js 20+ using built-in fetch.
- Python: Python 3.10+ using httpx (or requests).
For a typed client in any language, generate it from the OpenAPI spec.
6. Limits and edge cases
- Timeouts. Executions can run up to 30 minutes (longest browser test we've seen is ~15 min). Beyond that the runner marks the execution failed. Poll accordingly.
- Cancellation. No public cancel endpoint in v1; cancel from the UI.
- Cross-project / cross-org data. A key from project A cannot see anything from project B. Returns 404 not_found (not 403) to avoid leaking which UUIDs exist.
- Concurrent executions on the same plan. Allowed. The runner serializes internally only when both target the same browser connection on the same worker.
- Webhooks. Not in v1. Poll the status endpoint and emit your own webhook downstream.
- SSRF guard (v1.6.3). Agent connections that point at private/reserved IP ranges (RFC1918, loopback, link-local incl. IMDS, IPv6 ULA/link-local) or non-http(s):// schemes are blocked at create-time, update-time, the sync test endpoint, and execution runtime. Surfaces as 400 validation_failed with reason + offending field. Escape hatch: Organization.allowPrivateAgentConnections (org_admin only).
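A client can approximate the SSRF guard before calling POST /agent-connections to catch obvious mistakes early. This is only a rough local pre-check using Python's standard ipaddress module, not the server's rule set: it inspects literal IPs and the scheme, does not resolve DNS names, and `looks_blocked` is a hypothetical helper.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit


def looks_blocked(base_url: str) -> Optional[str]:
    """Return a reason string if the URL would likely be rejected, else None."""
    parts = urlsplit(base_url)
    if parts.scheme not in ("http", "https"):
        return "non-http(s) scheme"
    host = parts.hostname
    if not host:
        return "missing host"
    if host == "localhost":  # assumption: treated like a loopback address
        return "loopback"
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return None  # a DNS name; cannot be judged without resolving it
    if ip.is_loopback:
        return "loopback"
    if ip.is_link_local:
        return "link-local"
    if ip.is_private or ip.is_reserved:
        return "private or reserved range"
    return None
```

A None result does not guarantee acceptance, since a public-looking hostname may still resolve to a private address on the server side.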
Known limitation: when a request passes evaluatorIds, the sync pre-check verifies the ids belong to your org but does not apply the billing-tier whitelist; that check runs async in the runner. Net effect: ids with planAllowed: false are silently dropped and scores[] simply omits them. If every requested id is dropped, the evaluation finishes with status: "failed" and errorMessage: "No active evaluators configured". Always filter client-side on planAllowed: true from GET /evaluators before sending. Fix is on the v1.x roadmap.