Public REST API — v1

Programmatic access to the same QA primitives the ArtificialQA UI exposes. Trigger test executions from CI/CD, integrate evaluation into an internal agent platform, or build dashboards on top of your test runs.

Status: v1 — stable. Breaking changes ship as /api/v2/public/. Additive changes (new endpoints, new fields, new optional params) ship without a version bump. Current revision: v1.6.4.

Base URLs

Production: https://app.artificialqa.com/api/v1/public
Local dev: http://localhost:7878/api/v1/public

OpenAPI spec: /openapi.yaml — feed it to any OpenAPI tool (Postman, Stoplight, Speakeasy, openapi-generator, etc.) to generate a typed client.

Chat-friendly surface? The same primitives are exposed over the Model Context Protocol at /api/v1/mcp. Use that route when integrating with Claude Desktop, Cursor, Continue, or any custom MCP-aware agent.

1. Authentication

Every request needs an API key in the Authorization header:

Authorization: Bearer aqa_live_xxxxxxxxxxxxxxxxxxxxxxxxx
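
As a minimal sketch, the header can be attached with Python's standard library as shown here. The helper name build_request and the environment variable AQA_API_KEY are illustrative assumptions, not part of the API.

```python
import os
import urllib.request

BASE_URL = "https://app.artificialqa.com/api/v1/public"


def build_request(path: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Return a Request for `path` carrying the Bearer key (hypothetical helper)."""
    return urllib.request.Request(
        BASE_URL + path,
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
    )


if __name__ == "__main__":
    # AQA_API_KEY is an assumed variable name; load the key from your secret manager.
    req = build_request("/test-plans", os.environ["AQA_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        print(resp.status)
```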

Generating a key

1. Sign in to ArtificialQA → click your avatar → API Keys & MCP → API Keys tab → New API Key.
2. Pick a project the key will be bound to. The key can only read/write data within that single project.
3. Pick a scope:
   - read: GET only. Use for dashboards, reporting, read-only integrations.
   - write: GET + POST. Required for triggering executions / evaluations.
4. Pick an expiration date (required since v1.6.3). Min 1 day, max 12 months from now. Default 6 months. There is no "never expires" option, so set a calendar reminder to rotate.
5. Copy the plaintext key. It is shown only once, so store it in a secret manager.

Who can mint keys:

org_admin of the key's org — yes, any project.
super_admin — yes (bypass).
project_admin — no (was allowed in v1.6.2, removed in v1.6.3). Project admins can still see their org's keys listed, but cannot create new ones — static keys are an org-tier service credential.
tester / auditor / member — no.

Keys are project-scoped (v1.6.2). They never expose data from another project, whether that project belongs to another org or to the same org. If you need access to multiple projects, create one key per project. Lost a key? Revoke it from the same UI and generate a new one.

Audit semantics — keys act as a service (v1.6.3)

When an API key creates or modifies data via this REST API, the audit log records the actor as service:<key prefix> rather than the email of the human who minted the key. The minting human is captured separately as details.createdByUserId for traceability. At a glance the audit reader sees "this was a machine, here's which key" instead of misleadingly attributing thousands of CI operations to one engineer.

Legacy audit entries (pre-v1.6.3) keep their original apikey:<prefix> actor string — the discriminator on the api_keys table (actorKind) tells the audit writer which format to emit.

Project binding (v1.6.2)

Every API key is bound to exactly one project. Behavior:

Endpoints that read or write project-scoped resources (TestCases, TestSuites, TestPlans, AgentConnections, Executions, EvaluationRuns, etc.) automatically filter by the key's project. You don't pass projectId for it to work.
Endpoints that accept a projectId field in the body (e.g. POST /test-cases, POST /test-plans) treat the field as optional:
- Omitted → uses the key's project binding implicitly.
- Matches the key's binding → fine, accepted.
- Mismatches → 400 validation_failed with both ids in the detail field.
Cross-project reads on a single resource id → 404 not_found. The row exists in the DB but the key can't see it.
Idempotency keys are project-scoped: two projects in the same org can reuse the same idempotency-key string without colliding on each other's cached results.
Org-level resources (Evaluators — both global and org-scoped custom; Subscription; AI providers) stay org-scoped. Your project-bound key still sees them.
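
The projectId rules above can be modelled in a few lines. This is an illustrative client-side model of the documented behavior, not the server's code; the function and exception names are hypothetical.

```python
from typing import Optional


class ValidationFailed(Exception):
    """Stands in for a 400 validation_failed response (hypothetical name)."""


def resolve_project_id(key_project_id: str, body_project_id: Optional[str]) -> str:
    """Mirror the documented rules for an optional projectId in the body."""
    if body_project_id is None:
        # Omitted: the key's project binding is used implicitly.
        return key_project_id
    if body_project_id == key_project_id:
        # Matches the binding: accepted.
        return body_project_id
    # Mismatch: rejected, with both ids reported in the detail.
    raise ValidationFailed(
        f"projectId {body_project_id} does not match key binding {key_project_id}"
    )
```

A client can run the same check before sending a request to catch a misconfigured projectId early.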

OAuth (MCP only)

A second authentication path — oat_* Bearer tokens — is available exclusively on the MCP endpoint (/api/v1/mcp). It's the path used by Claude Desktop, Cursor, and other human-driven MCP clients: instead of pasting a static key, the user signs in to ArtificialQA in a browser and picks a project from a consent screen. The REST API (/api/v1/public) keeps using aqa_* static keys only — OAuth tokens are not accepted on REST routes. See Connect via MCP for setup.

Errors

Status	Code	When
401	`missing_token`	No `Authorization` header.
401	`invalid_token`	Key not found or malformed.
401	`token_revoked`	Key revoked from the UI.
401	`token_expired`	Key past its `expiresAt`.
403	`insufficient_scope`	`read` key tried a POST.

2. Conventions

Async + polling

The endpoints that trigger executions and evaluations are asynchronous. The POST returns 202 Accepted immediately with an executionId / evaluationId and a statusUrl. Clients poll the matching GET until status reaches a terminal state.

Endpoint	Terminal states
`/executions/{id}`	`completed`, `failed`, `cancelled`
`/evaluations/{id}`	`completed`, `failed`

There are no webhooks in v1.

Polling cadence

A reasonable client polls with exponential backoff, capped at 10 seconds:

delay = min(10, 1.5 ** attempt) seconds

Most executions complete in under 5 minutes; very long Playwright-based browser executions can run for 15+ minutes.
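
A polling loop with this cadence might look like the sketch below. The fetch_status callable, the injectable sleep and clock parameters, and the 30-minute default deadline (taken from the execution limit described in section 6) are assumptions made for illustration.

```python
import time
from typing import Callable, Iterator

# Union of the terminal states listed for executions and evaluations.
TERMINAL = {"completed", "failed", "cancelled"}


def backoff_delays(cap: float = 10.0, base: float = 1.5) -> Iterator[float]:
    """Yield min(cap, base ** attempt) for attempt = 0, 1, 2, ..."""
    attempt = 0
    while True:
        yield min(cap, base ** attempt)
        attempt += 1


def poll_until_terminal(
    fetch_status: Callable[[], dict],
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status() until its "status" is terminal or the deadline passes."""
    deadline = clock() + timeout_s
    for delay in backoff_delays():
        body = fetch_status()
        if body["status"] in TERMINAL:
            return body
        if clock() + delay > deadline:
            raise TimeoutError("resource did not reach a terminal state in time")
        sleep(delay)
    raise AssertionError("unreachable: backoff_delays() is infinite")
```

Passing sleep and clock as parameters keeps the loop testable without real waiting.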

Errors — RFC 7807 problem+json

Every error response has content-type: application/problem+json and the body:

{
  "type":   "https://docs.artificialqa.com/errors/<code>",
  "title":  "Short human-readable summary",
  "status": 404,
  "code":   "not_found",
  "detail": "Optional verbose explanation."
}

Switch on code, not on title or detail. Titles may be reworded between releases; codes are part of the contract.

The complete catalog:

Status	Code	Meaning
400	`missing_path_param`	Malformed URL — usually a missing UUID.
400	`validation_failed`	Body, query, or header failed validation.
401	`missing_token` / `invalid_token` / `token_revoked` / `token_expired`	Auth (see section 1).
402	`quota_exceeded`	Monthly plan quota hit (executions, evaluations, or tokens).
403	`insufficient_scope`	Key scope is `read` but route requires `write`.
403	`mcp_disabled`	Org has the MCP feature flag off (MCP route only).
404	`not_found`	Resource doesn't exist in this org / project.
409	`invalid_state`	Resource is in a state that blocks the operation.
409	`empty_plan`	Plan has zero active test cases.
409	`evaluation_in_progress`	A prior evaluation on this execution is still `pending` or `running`.
409	`no_evaluators`	Org has no active evaluators configured.
409	`duplicate_membership`	Membership row already exists (suite item or plan-suite link).
429	`rate_limit_exceeded`	Per-key rate limit hit. See below.
500	`internal_error`	Unexpected server error — file a ticket.
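
One way a client could branch on code is sketched below. The Problem class and the action names returned by suggested_action are hypothetical client-side policy, not something the API prescribes.

```python
import json
from dataclasses import dataclass
from typing import Optional

AUTH_CODES = {"missing_token", "invalid_token", "token_revoked", "token_expired"}


@dataclass
class Problem:
    status: int
    code: str
    title: str
    detail: Optional[str] = None


def parse_problem(status: int, content_type: str, raw: bytes) -> Problem:
    """Parse an error body; fall back to code "unknown" for non-problem responses."""
    if not content_type.lower().startswith("application/problem+json"):
        return Problem(status, "unknown", "Non-problem error response")
    body = json.loads(raw)
    return Problem(
        status=body.get("status", status),
        code=body.get("code", "unknown"),
        title=body.get("title", ""),
        detail=body.get("detail"),
    )


def suggested_action(problem: Problem) -> str:
    """Switch on the stable code, never on title or detail."""
    if problem.code == "rate_limit_exceeded":
        return "retry_after_wait"
    if problem.code in AUTH_CODES:
        return "fix_credentials"
    if problem.code == "evaluation_in_progress":
        return "poll_existing_evaluation"
    if problem.code == "quota_exceeded":
        return "stop_until_quota_resets"
    return "fail"
```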

Idempotency

POST endpoints accept an Idempotency-Key header (1-255 chars). Reusing the same key within the project returns the existing row with idempotent: true and HTTP 200 instead of creating a duplicate and returning 202.

POST /api/v1/public/test-plans/<id>/executions
Authorization: Bearer aqa_live_...
Content-Type: application/json
Idempotency-Key: ci-build-4815162342

{ "agentConnectionId": "..." }

Idempotency keys are scoped per-project (v1.6.2), persisted indefinitely. A typical pattern is to use the CI build ID, commit SHA, or a UUIDv4 generated per attempt. For POST /test-cases/import the cache TTL is 24h.

Legacy: the request body also accepts a deprecated idempotencyKey field. If both are sent, the header wins. Migrate to the header.
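
A small sketch of deriving a key and detecting a replay follows. The helper names are hypothetical, and hashing over-long inputs with SHA-256 is only one possible way to stay within the 255-character limit.

```python
import hashlib

MAX_KEY_LEN = 255  # documented upper bound for Idempotency-Key


def make_idempotency_key(*parts: str) -> str:
    """Join stable identifiers (build id, commit SHA, ...) into one key."""
    if not parts or not all(parts):
        raise ValueError("at least one non-empty part is required")
    raw = "-".join(parts)
    if len(raw) <= MAX_KEY_LEN:
        return raw
    # Too long: fall back to a fixed-length digest (64 hex characters).
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def is_replay(status: int, body: dict) -> bool:
    """True when the server returned the existing row instead of creating one."""
    return status == 200 and body.get("idempotent") is True
```

For example, make_idempotency_key("ci-build", "4815162342") yields the key used in the request above.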

Rate limits

Each key is rate-limited independently. On every response:

X-RateLimit-Limit:     <max-per-window>
X-RateLimit-Remaining: <left-in-window>
X-RateLimit-Reset:     <unix-seconds-when-window-resets>

A 429 adds Retry-After: <seconds>. The body is a standard problem+json with code: "rate_limit_exceeded". The rate-limit pool is shared between aqa_* static keys and oat_* OAuth tokens — both keyed by prefix.
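
The headers above can be turned into a wait time as in this sketch. The function name and the one-second fallback when neither header is present are assumptions.

```python
import time
from typing import Mapping, Optional


def retry_wait_seconds(
    status: int, headers: Mapping[str, str], now: Optional[float] = None
) -> float:
    """Return how long to wait before retrying; 0.0 when no wait is needed."""
    if status != 429:
        return 0.0
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "retry-after" in lowered:
        return max(0.0, float(lowered["retry-after"]))
    reset = lowered.get("x-ratelimit-reset")
    if reset is not None:
        current = time.time() if now is None else now
        return max(0.0, float(reset) - current)
    return 1.0  # assumed fallback, not documented by the API
```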

Quotas

Executions, evaluations, and LLM tokens count against the org's monthly plan quota. Hitting any of the three limits returns 402 quota_exceeded. The response body spells out which quota:

{
  "code": "quota_exceeded",
  "detail": "Monthly execution limit reached (250/250)."
}

3. The flow

The standard CI-integration flow is 4 calls:

 1. GET  /test-plans                       discover plan id
 2. GET  /agent-connections                discover connection id
 3. POST /test-plans/{id}/executions       trigger execution
                                           (optionally evaluate=true)
 4. GET  /executions/{id}    (poll)        wait for terminal status

If evaluate: true was set in step 3, the evaluation runs automatically afterwards using every evaluator allowed by the org's billing tier, and the evaluation field of the GET /executions/{id} response carries the summary. Otherwise:

 5. (optional) GET /evaluators              discover evaluator ids
 6. POST /executions/{id}/evaluations       trigger evaluation explicitly
                                            (optionally with evaluatorIds)
 7. GET  /evaluations/{id}   (poll)         wait for terminal status

The shortest happy path (evaluate: true) is 2 calls + polling.
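
The evaluate: true path can be sketched against an injected transport. Here api is a hypothetical callable taking (method, path, body) and returning parsed JSON, and wait is whatever pause you choose between polls. Plans and connections are looked up by name purely for illustration, and the sketch only waits for the execution status, not for the evaluation that follows it.

```python
from typing import Any, Callable, List, Optional

Api = Callable[[str, str, Optional[dict]], Any]  # (method, path, body) -> JSON
TERMINAL = {"completed", "failed", "cancelled"}


def _by_name(rows: List[dict], name: str) -> dict:
    for row in rows:
        if row["name"] == name:
            return row
    raise LookupError(f"no resource named {name!r}")


def run_plan(api: Api, plan_name: str, connection_name: str,
             wait: Callable[[], None]) -> dict:
    """Discover ids, trigger an execution with evaluate=true, poll to the end."""
    plan = _by_name(api("GET", "/test-plans", None), plan_name)
    conn = _by_name(api("GET", "/agent-connections", None), connection_name)
    accepted = api(
        "POST",
        f"/test-plans/{plan['id']}/executions",
        {"agentConnectionId": conn["id"], "evaluate": True},
    )
    execution_id = accepted["executionId"]
    while True:
        detail = api("GET", f"/executions/{execution_id}", None)
        if detail["status"] in TERMINAL:
            return detail
        wait()
```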

4. Endpoints

Test plans

`GET /test-plans`

Lists every plan in the API key's project.

Query params: status, limit.

Response 200 — array of TestPlanSummary:

[
  {
    "id": "744cd22e-bf76-4e8c-8060-9f283a64796c",
    "name": "TP_Browser_ArtQA",
    "description": null,
    "status": "completed",
    "environment": null,
    "defaultAgentConnectionId": "12b17e7b-...",
    "suiteCount": 3,
    "executionCount": 7,
    "createdAt": "2026-05-21T13:44:09.103Z",
    "updatedAt": "2026-06-10T17:01:55.211Z"
  }
]

`GET /test-plans/{id}`

Detail with suite breakdown. Useful for picking a plan in a UI.

`POST /test-plans` — create (v1.4)

Creates an empty plan. Required: name. Optional: description, status (default "draft"), agentConnectionId, environment, scheduleCron, projectId (auto-picked if the org has exactly one active project; required otherwise — the error enumerates candidates). The runner is the sole writer of running and completed; create accepts {draft, ready} only.

Response 201 — full TestPlanDetail with createdBy="apikey:<prefix>", suites=[], executionCount=0.

`PATCH /test-plans/{id}` — update (v1.4)

Partial update. Patchable: name, description, status (subset {draft, ready, archived}), agentConnectionId, environment, scheduleCron. Not patchable: projectId (would orphan suite memberships), createdBy, createdAt, isActive (use DELETE).

PATCH on a plan in running is rejected with 409 invalid_state — the runner is mid-snapshot.

`DELETE /test-plans/{id}` — soft-delete (v1.4)

Flips isActive=false. Idempotent. Past executions that snapshotted the plan keep working. Rejected with 409 invalid_state if the plan is currently running.

`POST /test-plans/{id}/suites` — link a suite (v1.4)

{ "testSuiteId": "uuid", "sortOrder": 5 }

sortOrder is optional — omit it and the server assigns MAX(sortOrder)+1. Suite linking is project-bound: cross-project link → 409 invalid_state with both project ids in detail. Duplicate link → 409 duplicate_membership.

`DELETE /test-plans/{id}/suites/{suiteId}` — unlink (v1.4)

Removes the link. Idempotent. Bumps plan.updatedAt. Rejected with 409 invalid_state if the plan is currently running.

TestPlanSummary shape:

{
  id: uuid,
  name: string,
  description: string | null,
  status: "draft" | "ready" | "running" | "completed" | "archived",
  environment: string | null,
  defaultAgentConnectionId: uuid | null,
  suiteCount: int,
  executionCount: int,
  createdAt: ISO,
  updatedAt: ISO,
}
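
For typed Python clients, the shape above could be mirrored as a TypedDict. The class name PlanSummary and the triggerable helper are hypothetical. The helper assumes that only plans in ready or completed status can be executed, which is inferred from the invalid_state note on the execution trigger.

```python
from typing import List, Literal, Optional, TypedDict

PlanStatus = Literal["draft", "ready", "running", "completed", "archived"]


class PlanSummary(TypedDict):
    id: str
    name: str
    description: Optional[str]
    status: PlanStatus
    environment: Optional[str]
    defaultAgentConnectionId: Optional[str]
    suiteCount: int
    executionCount: int
    createdAt: str  # ISO 8601 timestamp
    updatedAt: str  # ISO 8601 timestamp


def triggerable(plans: List[PlanSummary]) -> List[PlanSummary]:
    """Keep plans whose status is assumed to allow a new execution."""
    return [plan for plan in plans if plan["status"] in ("ready", "completed")]
```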

Test suites

`GET /test-suites`

Cursor-paginated list: { data: TestSuiteSummary[], nextCursor: string | null }. Query params: cursor, limit (1-200, default 50), projectId, containsTestCaseId, tags, search, createdAfter, updatedSince, isActive.

`GET /test-suites/{id}`

Full detail. items[] is ordered by (sortOrder ASC, id ASC), each carrying an embedded testCase summary. 404 on soft-deleted suites unless ?isActive=false.

`POST /test-suites`

Creates an empty suite. Required: name. Optional: description, tags, projectId (auto-picked when the org has one active project; required otherwise). Membership goes through the sub-resource endpoints below — by design you author the structure separately from the content.

`PATCH /test-suites/{id}`

Partial update. Patchable: name, description (send null to clear), tags, isActive. Not patchable: projectId, testCount (mutated only via membership endpoints), createdBy, createdAt.

`DELETE /test-suites/{id}`

Soft-delete. Idempotent. Join rows survive so any plan that referenced the suite keeps resolving.

`POST /test-suites/{id}/test-cases`

Adds a membership.

{ "testCaseId": "uuid", "sortOrder": 5 }

Project-bound: a test case can only be added to a suite in the same project. Cross-project add → 409 invalid_state. Duplicate → 409 duplicate_membership.

`DELETE /test-suites/{id}/test-cases/{tcId}`

Removes a membership. Idempotent. Removing a TC that wasn't a member returns 204 with no DB writes.

Test cases

Full CRUD plus bulk import. Lets a CI / IaC / agent script author and curate test cases without touching the UI.

Contract decisions:

type is "simple" (single input/output) or "conversational" (multi-turn dialogue). The DB-level agent_task value is rejected at the public boundary by design — "what's being tested" is orthogonal to "shape of the test case".
source is always "manual" (single create) or "imported" (bulk). The caller cannot set it. "generated" is reserved for the generator runner.
reviewStatus is always "approved" on API-authored test cases. PATCH cannot change it.
Soft-delete only. DELETE sets isActive=false and lifecycleStatus="archived". The row stays so executions / suite items keep their FK references.
design (JSONB) is the unified shape. Legacy input / expectedOutput / turns columns coexist for backwards compat — every read returns both surfaces; writes accept either, with design winning when both are sent.
PII auto-detection runs server-side on every create and on PATCHes that touch an input-bearing field. Results land in piiDetected + piiTypes and are exposed on list summary too — filter without paging every row.

`GET /test-cases`

Cursor-paginated list: { data: TestCaseSummary[], nextCursor: string | null }.

Query params: cursor, limit (1-200, default 50), suiteId, type, difficulty, lifecycleStatus, reviewStatus, source, industryId, piiDetected, tags (repeated, ANY-match), search, createdAfter, updatedSince, isActive.
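
Walking a cursor-paginated list can be sketched as a generator. fetch_page is a hypothetical callable that performs the GET with the given cursor (None for the first page). The sketch assumes that a null nextCursor marks the last page.

```python
from typing import Callable, Iterator, Optional


def iter_items(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every row across pages of a { data, nextCursor } listing."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["data"]
        cursor = page.get("nextCursor")
        if cursor is None:
            # Assumed end-of-list signal: nextCursor is null.
            break
```

The same generator would apply to GET /test-suites, which returns the same envelope.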

`GET /test-cases/{id}`

Full detail — every column of TestCase minus encrypted / internal fields. Returns both legacy and design shapes.

`POST /test-cases`

Required: type plus EITHER design, OR input/expectedOutput, OR turns (for conversational). The Zod boundary catches the conversational-without-turns case before any DB write.
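
A client-side pre-check for these rules might look like the sketch below. It approximates the documented requirement and is not the server's Zod schema; it assumes input and expectedOutput are sent together, and that a conversational test case may supply design instead of turns.

```python
from typing import List


def body_problems(body: dict) -> List[str]:
    """Return human-readable problems; an empty list means the body looks sendable."""
    problems: List[str] = []
    kind = body.get("type")
    if kind not in ("simple", "conversational"):
        problems.append('type must be "simple" or "conversational"')
    has_design = body.get("design") is not None
    has_io = body.get("input") is not None and body.get("expectedOutput") is not None
    has_turns = bool(body.get("turns"))
    if kind == "conversational" and not (has_design or has_turns):
        problems.append("conversational test cases need turns (or design)")
    elif kind == "simple" and not (has_design or has_io):
        problems.append("simple test cases need design, or input plus expectedOutput")
    return problems
```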

Project resolution: if the calling org has exactly one active project, projectId is auto-picked. Otherwise it is required and the error detail enumerates the available projects with (id, name).

Response 201 — full TestCaseDetail. source=manual, reviewStatus=approved.

`PATCH /test-cases/{id}`

Partial update. Body must contain at least one field. Cannot change type, source, or reviewStatus. PII is re-scanned only when the patch touches input / expectedOutput / turns / design / tags.

`DELETE /test-cases/{id}`

Soft-delete. Idempotent — re-DELETE on an archived row returns the same 204 without a second DB write.

`POST /test-cases/import` — bulk

Up to 500 items per request. Per-row independence: a bad row goes into errors[] and the batch continues.

Idempotency: supplying Idempotency-Key makes the call replay-safe — cached against (project, key, "test_case") in the bulk_imports table for 24 hours. A repeated invocation within that window returns the original result verbatim with idempotent: true and zero DB mutations.

{
  "createdIds": ["uuid-1", "uuid-2", ...],
  "errors": [{ "index": 2, "code": "not_found", "detail": "industryId ... does not exist" }],
  "idempotent": false
}

Status codes: 200 full success (or cache replay), 207 Multi-Status partial success (the same status applies on replay), 400 schema rejection, 402 quota.
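
Splitting a large import into requests and mapping errors back to rows can be sketched as follows. The helper names are hypothetical, and the sketch assumes errors[].index is the zero-based position of the row within the submitted batch.

```python
from typing import Iterator, List, Sequence, Tuple

MAX_ITEMS = 500  # documented per-request ceiling


def batches(items: Sequence[dict], size: int = MAX_ITEMS) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def rejected_rows(batch: Sequence[dict], response: dict) -> List[Tuple[dict, str]]:
    """Pair each rejected row with its error code (index assumed zero-based)."""
    return [(batch[err["index"]], err["code"]) for err in response.get("errors", [])]
```

Because rows are independent, a 207 response still carries createdIds for the rows that succeeded; only the rows returned by rejected_rows need attention.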

Agent connections

`GET /agent-connections`

Lists connections. Never returns secrets — only the fields needed to choose a connection for an execution.

Query params: protocol (http | browser | websocket), isActive.

[
  {
    "id": "12b17e7b-2cec-4206-986d-9df390be2de3",
    "name": "ArtificialQA Test Bank",
    "protocol": "browser",
    "baseUrl": "https://app.artificialqa.com",
    "isActive": true,
    "createdAt": "2026-05-19T09:12:11.502Z"
  }
]

`GET /agent-connections/{id}` — detail (v1.5)

Returns the full AgentConnectionDetail with secrets masked as "***<last4>". Use this to inspect existing config before PATCH-ing back.

`POST /agent-connections` — create (v1.5)

Required: name, protocol, baseUrl, authConfig, messageConfig. Optional: description, environment (defaults to "production"), preChatConfig, postChatConfig, templateVars, environments, projectId.

The caller sends raw secrets in the config blobs; the helper encrypts them (AES-256-GCM) before INSERT and stores them as enc:<iv>:<tag>:<ciphertext>.

`PATCH /agent-connections/{id}` — update (v1.5)

Partial update. Patchable: name, description, baseUrl, environment, isActive, authConfig, preChatConfig, messageConfig, postChatConfig, templateVars, environments. Not patchable: projectId, protocol.

Masked-secret round-trips: submitting a masked value (e.g. "***c123") at a secret-named path falls back to the existing decrypted DB value — you can read the detail, edit any plaintext field, and PATCH the whole config back without juggling secrets. Only NEW plaintext values overwrite the stored secret.

Config blobs are replaced per-key, not deep-merged. Omitting a key in your PATCH drops it from the column. Explicit null clears a nullable blob.
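
Since blobs are replaced rather than merged, a safe pattern is to start from the blob returned by the detail endpoint and send it back whole. The sketch below shows that pattern; the helper name and the keys inside authConfig (type, token) are hypothetical examples, not documented fields.

```python
import copy


def full_blob_patch(detail: dict, blob_name: str, **changes) -> dict:
    """Build a PATCH body that resends the whole blob with `changes` applied.

    Masked secrets such as "***c123" are left as returned, so the stored
    secret is kept; only new plaintext values would overwrite it.
    """
    blob = copy.deepcopy(detail[blob_name])
    blob.update(changes)
    return {blob_name: blob}


# Hypothetical detail payload: the keys inside authConfig are illustrative only.
detail = {"name": "staging", "authConfig": {"type": "bearer", "token": "***c123"}}
patch_body = full_blob_patch(detail, "authConfig", type="apiKey")
```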

`DELETE /agent-connections/{id}` (v1.5)

Soft-delete. Idempotent. Past executions / plans keep working. To re-activate: PATCH { "isActive": true }.

`POST /agent-connections/{id}/test` — smoke-test (v1.5)

Synchronous smoke-test against the configured agent. Optional body: { "runtimeVars": { ... } } merged over persisted templateVars for this call only.

Per-protocol behavior:

http — 30s timeout. Returns latency, upstream HTTP status, and a truncated response in details.
browser — spawns Chromium. Can take 30-60s; set client timeout to ~90s.
websocket — not implemented yet. Returns { status: "completed", ok: false, error: "websocket protocol test not supported in v1.5" }.

Rate-limited to 1 call per 60s per (connectionId, apiKeyId).

{
  "status": "completed",
  "ok": true,
  "latencyMs": 1234,
  "error": null,
  "details": { "protocol": "http", "statusCode": 200, "response": "..." },
  "startedAt": "2026-06-17T10:00:00.000Z",
  "completedAt": "2026-06-17T10:00:01.234Z"
}

Forward-compat: status is a discriminator. v1.5 always sets "completed" because the endpoint is synchronous. Pattern-match on status as a union so a future "pending" + async polling doesn't break clients.
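
Treating status as a union could look like this sketch. The function name and the returned strings are hypothetical; the "pending" branch covers a value that v1.5 does not emit and exists only for forward compatibility.

```python
def interpret_smoke_test(result: dict) -> str:
    """Map a smoke-test response to a short outcome string."""
    status = result.get("status")
    if status == "completed":
        if result.get("ok"):
            return f"ok in {result.get('latencyMs')} ms"
        return f"failed: {result.get('error')}"
    if status == "pending":
        # Not produced by v1.5; a future async variant might return it.
        return "pending"
    return f"unrecognised status: {status}"
```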

Executions

`POST /test-plans/{id}/executions` — trigger

Starts a background run. Returns 202 with the new id, or 200 on an idempotent replay.

{
  "agentConnectionId": "12b17e7b-...",
  "evaluate": true,
  "evaluatorIds": ["ad12-...", "be34-..."],
  "evaluatorWeights": { "ad12-...": 2.0 },
  "runtimeVars": { "documentId": "doc-42" }
}

Field	Type	Required	Notes
`agentConnectionId`	uuid	yes	—
`evaluate`	bool, default `false`	no	Auto-triggers the evaluation when execution finishes.
`evaluatorIds`	uuid[]	no	Only honoured when `evaluate: true`. Restricts the auto-eval. Omit → every tier-allowed evaluator runs.
`evaluatorWeights`	`{uuid: number > 0}`	no	Only honoured when `evaluate: true`. Per-evaluator weight overrides. Omitted entries use the evaluator's configured default weight.
`runtimeVars`	`{string: string}`	no	Template var substitutions for the agent connection.
`idempotencyKey`	string, deprecated	no	Prefer the `Idempotency-Key` header.

Response 202:

{
  "executionId": "9e2f-...",
  "status": "pending",
  "statusUrl": "https://app.artificialqa.com/api/v1/public/executions/9e2f-..."
}

Response 200 (idempotent replay): same shape plus "idempotent": true.

Common 409s: invalid_state (plan status is neither ready nor completed), empty_plan.
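
A request body for this endpoint can be assembled with a small builder that enforces the field notes above before anything is sent. The function name is hypothetical, and raising on evaluator options without evaluate is a client-side choice; the API itself only ignores them.

```python
from typing import Dict, List, Optional


def execution_body(
    agent_connection_id: str,
    evaluate: bool = False,
    evaluator_ids: Optional[List[str]] = None,
    evaluator_weights: Optional[Dict[str, float]] = None,
    runtime_vars: Optional[Dict[str, str]] = None,
) -> dict:
    """Build the JSON body for POST /test-plans/{id}/executions."""
    if not evaluate and (evaluator_ids or evaluator_weights):
        raise ValueError("evaluatorIds / evaluatorWeights only apply when evaluate is true")
    if evaluator_weights and any(w <= 0 for w in evaluator_weights.values()):
        raise ValueError("evaluator weights must be > 0")
    body: dict = {"agentConnectionId": agent_connection_id, "evaluate": evaluate}
    if evaluator_ids:
        body["evaluatorIds"] = list(evaluator_ids)
    if evaluator_weights:
        body["evaluatorWeights"] = dict(evaluator_weights)
    if runtime_vars:
        # runtimeVars is documented as a string-to-string map.
        body["runtimeVars"] = {str(k): str(v) for k, v in runtime_vars.items()}
    return body
```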

`GET /executions` — list

Filter: testPlanId, status, limit. Returns ExecutionSummary[] — same shape as detail but without results[].

`GET /executions/{id}` — detail

Returns the execution + every result + the most recent evaluation (if any).

{
  "id": "9e2f-...",
  "testPlanId": "744cd22e-...",
  "agentConnectionId": "12b17e7b-...",
  "runNumber": 8,
  "status": "completed",
  "totalCases": 12,
  "completedCases": 12,
  "failedCases": 1,
  "durationMs": 184320,
  "errorMessage": null,
  "createdAt": "2026-06-10T17:02:11.040Z",
  "completedAt": "2026-06-10T17:05:15.360Z",
  "results": [
    {
      "id": "...",
      "testCaseId": "...",
      "executionStatus": "SUCCESS",
      "responseValidity": "VALID",
      "responseTimeMs": 8234,
      "success": true,
      "retryCount": 0,
      "finalized": true,
      "createdAt": "..."
    }
  ],
  "evaluation": {
    "id": "ad12-...",
    "status": "completed",
    "runNumber": 1,
    "overallScore": 0.87,
    "passRate": 0.92,
    "passed": true,
    "totalCases": 12,
    "passedCases": 11,
    "failedCases": 1,
    "durationMs": 32104
  }
}

executionStatus: SUCCESS, ERROR, TIMEOUT, SKIPPED. responseValidity: VALID, EMPTY, MALFORMED. Only SUCCESS + VALID results enter evaluation.
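
To see which results will reach evaluation, a client can apply the same SUCCESS + VALID rule to results[]. A minimal sketch with hypothetical helper names:

```python
from collections import Counter
from typing import List


def evaluable(results: List[dict]) -> List[dict]:
    """Keep only results that enter evaluation: SUCCESS and VALID."""
    return [
        r for r in results
        if r["executionStatus"] == "SUCCESS" and r["responseValidity"] == "VALID"
    ]


def status_breakdown(results: List[dict]) -> Counter:
    """Count results per (executionStatus, responseValidity) pair."""
    return Counter((r["executionStatus"], r["responseValidity"]) for r in results)
```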

Evaluators

`GET /evaluators`

Lists the evaluators visible to the org — platform-default globals plus any org-specific custom evaluators. Use it to discover the id you pass as evaluatorIds.

Query params: includeBlocked (default true).

Security. Never exposes the evaluator's agentConfig (encrypted credentials) or systemPrompt (calibration prompt).

Field	Notes
`isGlobal`	`true` for platform-default; `false` for org-specific (Enterprise).
`planAllowed`	`true` if your billing tier permits this evaluator. `false` means a `POST /executions/{id}/evaluations` referencing it will silently drop it from the run. Filter on this field client-side.
`weight`	Default weight applied when computing the overall score. Override per-run via `evaluatorWeights` on the POST.
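
Filtering on planAllowed before triggering an evaluation could be sketched as below. The function name is hypothetical; it assumes each row from GET /evaluators carries id and planAllowed.

```python
from typing import Iterable, List, Tuple


def split_by_plan(evaluators: Iterable[dict], wanted_ids: List[str]) -> Tuple[List[str], List[str]]:
    """Return (sendable ids, ids that would be dropped or are unknown)."""
    allowed = {e["id"] for e in evaluators if e.get("planAllowed") is True}
    sendable = [i for i in wanted_ids if i in allowed]
    dropped = [i for i in wanted_ids if i not in allowed]
    return sendable, dropped
```

If sendable comes back empty, skipping the trigger avoids an evaluation that would have no evaluators left to run.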

Evaluations

`POST /executions/{id}/evaluations` — trigger

Runs the configured evaluators on a completed execution. Body (all optional):

{
  "evaluatorIds": ["uuid", "uuid"],
  "evaluatorWeights": { "uuid": 2.0 }
}

If evaluatorIds is omitted, every active evaluator allowed by the org's tier runs. Weight defaulting matches the UI: any evaluator not in evaluatorWeights uses its configured default weight from GET /evaluators.

Common 409s: invalid_state (execution not yet completed), evaluation_in_progress, no_evaluators.

`GET /evaluations/{id}` — detail

Returns the run + every per-test-case score. While status: "running" the response includes scoresCompleted / scoresTotal for a progress bar.

{
  "id": "ad12-...",
  "executionId": "9e2f-...",
  "runNumber": 1,
  "status": "completed",
  "overallScore": 0.87,
  "passRate": 0.92,
  "passed": true,
  "totalCases": 12,
  "passedCases": 11,
  "failedCases": 1,
  "durationMs": 32104,
  "createdAt": "...",
  "completedAt": "...",
  "scores": [
    {
      "id": "...",
      "evaluatorId": "...",
      "evaluatorName": "Tone",
      "evaluatorSlug": "tone",
      "weight": 1.0,
      "testCaseId": "...",
      "score": 0.83,
      "passed": true,
      "explanation": "Response stayed polite and professional throughout.",
      "createdAt": "..."
    }
  ]
}

scores[].weight is the runtime weight that was actually applied when computing overallScore. If you passed evaluatorWeights on the trigger, it matches that override; otherwise it matches the evaluator's configured default. For legacy evaluations it may be null.
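
For reporting, scores[] can be summarised client-side as in this sketch. It is a plain per-evaluator mean plus a list of failing test cases; it does not reproduce how overallScore is computed, which this page does not specify. Both function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def mean_score_by_evaluator(scores: List[dict]) -> Dict[str, float]:
    """Average score per evaluatorSlug (a summary, not the overallScore formula)."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for row in scores:
        grouped[row["evaluatorSlug"]].append(row["score"])
    return {slug: mean(values) for slug, values in grouped.items()}


def failing_cases(scores: List[dict]) -> List[str]:
    """Sorted ids of test cases with at least one failed score."""
    return sorted({row["testCaseId"] for row in scores if not row["passed"]})
```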

Evaluation reports (v1.6)

Six endpoints expose the evaluation-report pipeline: executive summary, per-evaluator analysis, per-test-case scores, and a PDF. GET /report is read-only: it never calls the LLM and never consumes tokens, and missing summaries surface as empty strings ("").

Contract decisions:

Only completed evaluations report. Every endpoint refuses running / failed / pending with 409 invalid_state.
POST endpoints consume tokens. Both regenerate endpoints spend against the monthly tokens quota and require the ai_evaluation_reports plan feature. 402 quota_exceeded on either gate.
Cache-gated by default. Same (evaluationId, lang) or (evaluationId, evaluatorId) returns the cached summary with cached: true and tokensUsed: null. Pass force: true to bypass.
lang is "en" | "es" only. Each language is cached independently for the executive summary; no fallback.
Per-evaluator cache key is (evaluationId, evaluatorId) — language is NOT part of the key. Regenerating in EN then ES OVERWRITES the EN row.
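
The difference between the two cache keys is easiest to see in a toy model. The class below is an illustration of the documented keys only; it is not the service's storage code, and all names are hypothetical.

```python
from typing import Dict, Tuple


class ReportCacheModel:
    """Toy model of the two summary caches and their keys."""

    def __init__(self) -> None:
        # Executive summary: keyed by (evaluationId, lang).
        self.executive: Dict[Tuple[str, str], str] = {}
        # Per-evaluator summary: keyed by (evaluationId, evaluatorId), no lang.
        self.per_evaluator: Dict[Tuple[str, str], Tuple[str, str]] = {}

    def put_executive(self, evaluation_id: str, lang: str, text: str) -> None:
        self.executive[(evaluation_id, lang)] = text

    def put_evaluator(self, evaluation_id: str, evaluator_id: str,
                      lang: str, text: str) -> None:
        # A second language replaces the first because lang is not in the key.
        self.per_evaluator[(evaluation_id, evaluator_id)] = (lang, text)


cache = ReportCacheModel()
cache.put_executive("ev1", "en", "summary")
cache.put_executive("ev1", "es", "resumen")      # both languages coexist
cache.put_evaluator("ev1", "tone", "en", "analysis")
cache.put_evaluator("ev1", "tone", "es", "análisis")  # overwrites the EN row
```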

`GET /evaluations/{id}/report`

Returns the consolidated JSON report. Read-only.

`GET /evaluations/{id}/report-pdf`

Returns the PDF as application/pdf with Content-Disposition: attachment; filename="report_<plan>_eval<runNumber>_<YYYY-MM-DD>.pdf". Read-only.

curl -H "Authorization: Bearer aqa_xxx" \
  https://app.artificialqa.com/api/v1/public/evaluations/<id>/report-pdf \
  -o report.pdf

`POST /evaluations/{id}/report-pdf-url` — tokenized URL (v1.6.1)

Issues a short-lived (1h TTL) tokenized URL that downloads the PDF without requiring an Authorization header. Use this when you want to hand a click-and-go link to a human or to an LLM client that can't easily round-trip the Bearer token.

{
  "url": "https://app.../api/v1/public/downloads/<43-char-token>",
  "expiresAt": "2026-06-17T18:30:00.000Z",
  "expiresInSec": 3600
}

The URL is multi-use within the TTL, org-scoped, single-purpose (unlocks ONE evaluation's PDF), and Cache-Control: no-store. Does NOT spend tokens and does NOT require any feature gate.
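
Before handing the link to someone, a client may want to check that it has not expired. The sketch below parses expiresAt and applies a safety margin; the helper names and the 60-second margin are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_iso_z(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing Z into an aware datetime."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def is_link_usable(expires_at: str, now: Optional[datetime] = None,
                   margin_s: int = 60) -> bool:
    """True while the link stays valid for at least `margin_s` more seconds."""
    current = now if now is not None else datetime.now(timezone.utc)
    return current + timedelta(seconds=margin_s) < parse_iso_z(expires_at)
```

When the check fails, requesting a fresh URL is cheap, since issuing one neither spends tokens nor needs a feature gate.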

`GET /downloads/{token}` — public, no Authorization

Streams the PDF binary. No API key required — the token is the auth. Token missing / malformed / expired → unified 404 (NOT 410) so the surface never leaks the existence of past or foreign tokens.

`POST /evaluations/{id}/summary`

Regenerates the executive summary. Body (optional): { "lang": "en", "force": false }.

{
  "summary": "Generated text...",
  "providerName": "saia",
  "model": "agent-v1",
  "cached": false,
  "tokensUsed": 1234
}

`POST /evaluations/{id}/evaluator-summary`

Regenerates the analysis for one evaluator within this evaluation. Body: { "evaluatorId": "uuid", "lang": "en", "force": false }. evaluatorId required. Same SummaryResult shape as the executive variant.

5. Code samples

Runnable examples are available in three flavors, each implementing the full flow — discover → trigger → poll → print scores:

curl — shell pipeline using curl + jq.
Node — Node.js 20+ using built-in fetch.
Python — Python 3.10+ using httpx (or requests).

For a typed client in any language, generate it from the OpenAPI spec.

6. Limits and edge cases

Timeouts. Executions can run up to 30 minutes (longest browser test we've seen is ~15 min). Beyond that the runner marks the execution failed. Poll accordingly.
Cancellation. No public cancel endpoint in v1 — cancel from the UI.
Cross-project / cross-org data. A key from project A cannot see anything from project B. Returns 404 not_found (not 403) to avoid leaking which UUIDs exist.
Concurrent executions on the same plan. Allowed. The runner serializes internally only when both target the same browser connection on the same worker.
Webhooks. Not in v1. Poll the status endpoint and emit your own webhook downstream.
SSRF guard (v1.6.3). Agent connections that point at private/reserved IP ranges (RFC1918, loopback, link-local incl. IMDS, IPv6 ULA/link-local) or non-http(s):// schemes are blocked at create-time, update-time, the sync test endpoint, and execution runtime. Surfaces as 400 validation_failed with reason + offending field. Escape hatch: Organization.allowPrivateAgentConnections (org_admin only).

Plan-tier evaluator filtering is asynchronous and silent. When you pass evaluatorIds, the sync pre-check verifies the ids belong to your org but does not apply the billing-tier whitelist; that check runs asynchronously in the runner. Net effect: ids with planAllowed: false are silently dropped and scores[] simply omits them. If every requested id is dropped, the evaluation finishes with status: "failed" and errorMessage: "No active evaluators configured". Always filter client-side on planAllowed: true from GET /evaluators before sending. A fix is on the v1.x roadmap.

Need access? The Public REST API is available on Pro and Enterprise plans. Generate a key from your avatar → API Keys & MCP → API Keys, or reach out via artificialqa.com.
