Public REST API — v1

Programmatic access to the same QA primitives the ArtificialQA UI exposes. Trigger test executions from CI/CD, integrate evaluation into an internal agent platform, or build dashboards on top of your test runs.

Status: v1 — stable. Breaking changes ship as /api/v2/public/. Additive changes (new endpoints, new fields, new optional params) ship without a version bump. Current revision: v1.6.4.

Base URLs

OpenAPI spec: /openapi.yaml — feed it to any OpenAPI tool (Postman, Stoplight, Speakeasy, openapi-generator, etc.) to generate a typed client.

Chat-friendly surface? The same primitives are exposed over the Model Context Protocol at /api/v1/mcp. Use that route when integrating with Claude Desktop, Cursor, Continue, or any custom MCP-aware agent.

1. Authentication

Every request needs an API key in the Authorization header:

Authorization: Bearer aqa_live_xxxxxxxxxxxxxxxxxxxxxxxxx

Generating a key

  1. Sign in to ArtificialQA → click your avatar → API Keys & MCP → API Keys tab → New API Key.
  2. Pick a project the key will be bound to. The key can only read/write data within that single project.
  3. Pick a scope:
    • read — GET only. Use for dashboards, reporting, read-only integrations.
    • write — GET + POST. Required for triggering executions / evaluations.
  4. Pick an expiration date (v1.6.3 — required). Min 1 day, max 12 months from now. Default 6 months. There is no "never expires" option — set a calendar reminder to rotate.
  5. Copy the plaintext key — it's shown once. Store it in a secret manager.

Who can mint keys:

Keys are project-scoped (v1.6.2). A key never exposes data from another project, whether that project belongs to another org or to the same org. If you need access to multiple projects, create one key per project. Lost a key? Revoke it from the same UI and generate a new one.

Audit semantics — keys act as a service (v1.6.3)

When an API key creates or modifies data via this REST API, the audit log records the actor as service:<key prefix> rather than the email of the human who minted the key. The minting human is captured separately as details.createdByUserId for traceability. At a glance the audit reader sees "this was a machine, here's which key" instead of misleadingly attributing thousands of CI operations to one engineer.

Legacy audit entries (pre-v1.6.3) keep their original apikey:<prefix> actor string — the discriminator on the api_keys table (actorKind) tells the audit writer which format to emit.
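A consumer that reads audit entries can tell machine actors from humans by the actor prefix, across both the current and the legacy format. This is a minimal sketch. It assumes the actor arrives as a plain string and that any value without a service: or apikey: prefix is a human's email. The function name and the kind labels are hypothetical:

```python
def classify_actor(actor: str) -> tuple[str, str]:
    """Split an audit-log actor string into (kind, identifier).

    kind is a hypothetical label:
      "service"        v1.6.3+ API-key actor, written as service:<key prefix>
      "legacy_apikey"  pre-v1.6.3 API-key actor, written as apikey:<prefix>
      "human"          anything else (assumed to be the user's email)
    """
    for prefix, kind in (("service:", "service"), ("apikey:", "legacy_apikey")):
        if actor.startswith(prefix):
            return kind, actor[len(prefix):]
    return "human", actor
```

For service actors, details.createdByUserId still identifies who minted the key.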

Project binding (v1.6.2)

Every API key is bound to exactly one project. Behavior:

OAuth (MCP only)

A second authentication path — oat_* Bearer tokens — is available exclusively on the MCP endpoint (/api/v1/mcp). It's the path used by Claude Desktop, Cursor, and other human-driven MCP clients: instead of pasting a static key, the user signs in to ArtificialQA in a browser and picks a project from a consent screen. The REST API (/api/v1/public) keeps using aqa_* static keys only — OAuth tokens are not accepted on REST routes. See Connect via MCP for setup.

Errors

Status  Code                When
401     missing_token       No Authorization header.
401     invalid_token       Key not found or malformed.
401     token_revoked       Key revoked from the UI.
401     token_expired       Key past its expiresAt.
403     insufficient_scope  read key tried a POST.

2. Conventions

Async + polling

Endpoints that start a run (executions and evaluations) are asynchronous. The POST returns 202 Accepted immediately with an executionId / evaluationId and a statusUrl. Clients poll the matching GET until status reaches a terminal state.

Endpoint           Terminal states
/executions/{id}   completed, failed, cancelled
/evaluations/{id}  completed, failed

There are no webhooks in v1.

Polling cadence

A reasonable client polls with exponential backoff, capped at 10 seconds:

delay = min(10, 1.5 ** attempt) seconds

Most executions complete in under 5 minutes; very long Playwright-based browser executions can run for 15+ minutes.
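The cadence above fits in a small polling helper. This sketch assumes fetch is a caller-supplied function that GETs the statusUrl with the Bearer token and returns the parsed JSON. The 30-minute timeout is an arbitrary default, sized for the 15+ minute browser runs mentioned above. The helper names are hypothetical:

```python
import time
from typing import Any, Callable

# Terminal states from the table above.
EXECUTION_TERMINAL = {"completed", "failed", "cancelled"}
EVALUATION_TERMINAL = {"completed", "failed"}


def backoff_delay(attempt: int, cap: float = 10.0) -> float:
    """delay = min(10, 1.5 ** attempt) seconds."""
    return min(cap, 1.5 ** attempt)


def poll_until_terminal(
    fetch: Callable[[], dict[str, Any]],
    terminal: set[str],
    timeout_s: float = 30 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch() until resource["status"] is terminal, backing off between calls.

    Only sleep time counts toward timeout_s; request latency is ignored.
    """
    waited = 0.0
    attempt = 0
    while True:
        resource = fetch()
        if resource["status"] in terminal:
            return resource
        delay = backoff_delay(attempt)
        if waited + delay > timeout_s:
            raise TimeoutError(f"status still {resource['status']!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        attempt += 1
```

The sleep parameter is injectable so tests and CI dry-runs can skip real waiting.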

Errors — RFC 7807 problem+json

Every error response has content-type: application/problem+json and the body:

{
  "type":   "https://docs.artificialqa.com/errors/<code>",
  "title":  "Short human-readable summary",
  "status": 404,
  "code":   "not_found",
  "detail": "Optional verbose explanation."
}

Switch on code, not on title or detail. Titles may be reworded between releases; codes are part of the contract.

The complete catalog:

Status  Code                    Meaning
400     missing_path_param      Malformed URL, usually a missing UUID.
400     validation_failed       Body, query, or header failed validation.
401     missing_token,          Auth (see section 1).
        invalid_token,
        token_revoked,
        token_expired
402     quota_exceeded          Monthly plan quota hit (executions, evaluations, or tokens).
403     insufficient_scope      Key scope is read but route requires write.
403     mcp_disabled            Org has the MCP feature flag off (MCP route only).
404     not_found               Resource doesn't exist in this org / project.
409     invalid_state           Resource is in a state that blocks the operation.
409     empty_plan              Plan has zero active test cases.
409     evaluation_in_progress  A prior evaluation on this execution is still pending or running.
409     no_evaluators           Org has no active evaluators configured.
409     duplicate_membership    Membership row already exists (suite item or plan-suite link).
429     rate_limit_exceeded     Per-key rate limit hit. See below.
500     internal_error          Unexpected server error; file a ticket.

Idempotency

POST endpoints accept an Idempotency-Key header (1-255 chars). Reusing the same key within the project returns the existing row with idempotent: true and HTTP 200 instead of creating a duplicate and returning 202.

POST /api/v1/public/test-plans/<id>/executions
Authorization: Bearer aqa_live_...
Content-Type: application/json
Idempotency-Key: ci-build-4815162342

{ "agentConnectionId": "..." }

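Idempotency keys are scoped per project (v1.6.2) and persisted indefinitely. A typical pattern is to use the CI build ID, the commit SHA, or a UUIDv4 generated once per logical run and reused on every retry of that run; a fresh key per attempt defeats deduplication. Because keys never expire, a commit SHA also deduplicates any later, intentional re-run of the same commit. POST /test-cases/import is different: its idempotency cache expires after 24 hours.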

Legacy: the request body also accepts a deprecated idempotencyKey field. If both are sent, the header wins. Migrate to the header.

Rate limits

Each key is rate-limited independently. On every response:

X-RateLimit-Limit:     <max-per-window>
X-RateLimit-Remaining: <left-in-window>
X-RateLimit-Reset:     <unix-seconds-when-window-resets>

A 429 adds Retry-After: <seconds>. The body is a standard problem+json with code: "rate_limit_exceeded". aqa_* static keys and oat_* OAuth tokens go through the same rate limiter, each counted under its own prefix.

Quotas

Executions, evaluations, and LLM tokens count against the org's monthly plan quota. Hitting any of these limits returns 402 quota_exceeded, and the detail field says which quota was hit:

{
  "code": "quota_exceeded",
  "detail": "Monthly execution limit reached (250/250)."
}

3. The flow

The standard CI-integration flow is 4 calls:

 1. GET  /test-plans                       discover plan id
 2. GET  /agent-connections                discover connection id
 3. POST /test-plans/{id}/executions       trigger execution
                                           (optionally evaluate=true)
 4. GET  /executions/{id}    (poll)        wait for terminal status

If evaluate: true was set in step 3, the evaluation runs automatically once the execution finishes, using every evaluator allowed by the org's billing tier (or only the evaluatorIds you passed). The evaluation field of GET /executions/{id} carries its summary. Otherwise:

 5. (optional) GET /evaluators              discover evaluator ids
 6. POST /executions/{id}/evaluations       trigger evaluation explicitly
                                            (optionally with evaluatorIds)
 7. GET  /evaluations/{id}   (poll)         wait for terminal status

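If you already know the plan and connection ids, the shortest happy path (evaluate: true) is 2 calls plus polling: step 3, then step 4 repeated until the execution and its evaluation reach a terminal status.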

4. Endpoints

Test plans

GET /test-plans

Lists every plan in the API key's project.

Query params: status, limit.

Response 200 — array of TestPlanSummary:

[
  {
    "id": "744cd22e-bf76-4e8c-8060-9f283a64796c",
    "name": "TP_Browser_ArtQA",
    "description": null,
    "status": "completed",
    "environment": null,
    "defaultAgentConnectionId": "12b17e7b-...",
    "suiteCount": 3,
    "executionCount": 7,
    "createdAt": "2026-05-21T13:44:09.103Z",
    "updatedAt": "2026-06-10T17:01:55.211Z"
  }
]

GET /test-plans/{id}

Detail with suite breakdown. Useful for picking a plan in a UI.

POST /test-plans — create (v1.4)

Creates an empty plan. Required: name. Optional: description, status (default "draft"), agentConnectionId, environment, scheduleCron, and projectId (auto-picked if the org has exactly one active project; required otherwise, in which case the error enumerates the candidate projects). Only the runner sets running and completed, so create accepts only draft or ready.

Response 201 — full TestPlanDetail with createdBy="apikey:<prefix>", suites=[], executionCount=0.

PATCH /test-plans/{id} — update (v1.4)

Partial update. Patchable: name, description, status (subset {draft, ready, archived}), agentConnectionId, environment, scheduleCron. Not patchable: projectId (would orphan suite memberships), createdBy, createdAt, isActive (use DELETE).

PATCH on a plan whose status is running is rejected with 409 invalid_state, because the runner is mid-snapshot.

DELETE /test-plans/{id} — soft-delete (v1.4)

Flips isActive=false. Idempotent. Past executions that snapshotted the plan keep working. Rejected with 409 invalid_state if the plan is currently running.

POST /test-plans/{id}/suites — link a suite (v1.4)

{ "testSuiteId": "uuid", "sortOrder": 5 }

sortOrder is optional — omit it and the server assigns MAX(sortOrder)+1. Suite linking is project-bound: cross-project link → 409 invalid_state with both project ids in detail. Duplicate link → 409 duplicate_membership.

DELETE /test-plans/{id}/suites/{suiteId} — unlink (v1.4)

Removes the link. Idempotent. Bumps plan.updatedAt. Rejected with 409 invalid_state if the plan is currently running.

TestPlanSummary shape:

{
  id: uuid,
  name: string,
  description: string | null,
  status: "draft" | "ready" | "running" | "completed" | "archived",
  environment: string | null,
  defaultAgentConnectionId: uuid | null,
  suiteCount: int,
  executionCount: int,
  createdAt: ISO,
  updatedAt: ISO,
}
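When a CI job refers to a plan by name, the client has to resolve that name to an id it can actually trigger. This sketch relies on the rule from the executions section that only ready or completed plans can be triggered. It also assumes plan names are not guaranteed unique, since this page does not say. pick_runnable_plan is a hypothetical helper:

```python
from typing import Any

# POST /test-plans/{id}/executions answers 409 invalid_state for any other status.
RUNNABLE_STATUSES = {"ready", "completed"}


def pick_runnable_plan(plans: list[dict[str, Any]], name: str) -> dict[str, Any]:
    """Return the most recently updated runnable TestPlanSummary with this name."""
    named = [p for p in plans if p["name"] == name]
    if not named:
        raise LookupError(f"no plan named {name!r}")
    runnable = [p for p in named if p["status"] in RUNNABLE_STATUSES]
    if not runnable:
        statuses = sorted({p["status"] for p in named})
        raise ValueError(f"plan {name!r} is {statuses}; expected ready or completed")
    # Same-format ISO-8601 UTC timestamps sort correctly as strings.
    return max(runnable, key=lambda p: p["updatedAt"])
```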

Test suites

GET /test-suites

Cursor-paginated list: { data: TestSuiteSummary[], nextCursor: string | null }. Query params: cursor, limit (1-200, default 50), projectId, containsTestCaseId, tags, search, createdAfter, updatedSince, isActive.
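To walk the whole list, follow nextCursor until it is null. This standard-library sketch uses the REST base URL that appears in the statusUrl and curl examples on this page. It assumes cursor is passed back verbatim as a query parameter. get_json is injectable so the paging logic can run without a network. The helper names are hypothetical:

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Iterator

BASE_URL = "https://app.artificialqa.com/api/v1/public"


def http_get_json(url: str, api_key: str) -> Any:
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def paginate(get_json: Callable[[str], Any], path: str, **params: Any) -> Iterator[dict[str, Any]]:
    """Yield every item of a cursor-paginated list such as /test-suites or /test-cases."""
    cursor = None
    while True:
        query = {"limit": 200, **params}  # 200 is the documented maximum page size
        if cursor is not None:
            query["cursor"] = cursor
        page = get_json(f"{BASE_URL}{path}?{urllib.parse.urlencode(query, doseq=True)}")
        yield from page["data"]
        cursor = page["nextCursor"]
        if cursor is None:
            return
```

Usage: paginate(lambda url: http_get_json(url, api_key), "/test-suites", search="checkout").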

GET /test-suites/{id}

Full detail. items[] is ordered by (sortOrder ASC, id ASC), each carrying an embedded testCase summary. 404 on soft-deleted suites unless ?isActive=false.

POST /test-suites

Creates an empty suite. Required: name. Optional: description, tags, projectId (auto-picked when the org has one active project; required otherwise). Membership goes through the sub-resource endpoints below — by design you author the structure separately from the content.

PATCH /test-suites/{id}

Partial update. Patchable: name, description (send null to clear), tags, isActive. Not patchable: projectId, testCount (mutated only via membership endpoints), createdBy, createdAt.

DELETE /test-suites/{id}

Soft-delete. Idempotent. Join rows survive so any plan that referenced the suite keeps resolving.

POST /test-suites/{id}/test-cases

Adds a membership.

{ "testCaseId": "uuid", "sortOrder": 5 }

Project-bound: a test case can only be added to a suite in the same project. Cross-project add → 409 invalid_state. Duplicate → 409 duplicate_membership.

DELETE /test-suites/{id}/test-cases/{tcId}

Removes a membership. Idempotent. Removing a TC that wasn't a member returns 204 with no DB writes.

Test cases

Full CRUD plus bulk import. Lets a CI / IaC / agent script author and curate test cases without touching the UI.

Contract decisions:

GET /test-cases

Cursor-paginated list: { data: TestCaseSummary[], nextCursor: string | null }.

Query params: cursor, limit (1-200, default 50), suiteId, type, difficulty, lifecycleStatus, reviewStatus, source, industryId, piiDetected, tags (repeated, ANY-match), search, createdAfter, updatedSince, isActive.

GET /test-cases/{id}

Full detail — every column of TestCase minus encrypted / internal fields. Returns both legacy and design shapes.

POST /test-cases

Required: type plus EITHER design, OR input/expectedOutput, OR turns (for conversational test cases). Schema validation (Zod) rejects a conversational test case without turns before any DB write.

Project resolution: if the calling org has exactly one active project, projectId is auto-picked. Otherwise it is required and the error detail enumerates the available projects with (id, name).

Response 201 — full TestCaseDetail. source=manual, reviewStatus=approved.

PATCH /test-cases/{id}

Partial update. Body must contain at least one field. Cannot change type, source, or reviewStatus. PII is re-scanned only when the patch touches input / expectedOutput / turns / design / tags.

DELETE /test-cases/{id}

Soft-delete. Idempotent — re-DELETE on an archived row returns the same 204 without a second DB write.

POST /test-cases/import — bulk

Up to 500 items per request. Per-row independence: a bad row goes into errors[] and the batch continues.

Idempotency: supplying Idempotency-Key makes the call replay-safe — cached against (project, key, "test_case") in the bulk_imports table for 24 hours. A repeated invocation within that window returns the original result verbatim with idempotent: true and zero DB mutations.

{
  "createdIds": ["uuid-1", "uuid-2", ...],
  "errors": [{ "index": 2, "code": "not_found", "detail": "industryId ... does not exist" }],
  "idempotent": false
}

Status codes: 200 full success, 207 Multi-Status partial success, 400 schema rejection, 402 quota exceeded. A cache replay returns the same status as the original call (200 or 207).
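Imports larger than 500 items have to be split across requests. The sketch below does that and merges the per-batch results. post_import is a hypothetical wrapper around POST /test-cases/import that returns the parsed 200/207 body. The sketch also assumes errors[].index is relative to the batch that was sent, which this page does not state:

```python
from typing import Any, Callable

MAX_BATCH = 500  # documented per-request limit

# Hypothetical wrapper around POST /test-cases/import:
# (items, idempotency_key) -> parsed 200/207 body.
PostImport = Callable[[list[dict[str, Any]], str], dict[str, Any]]


def import_in_batches(post_import: PostImport, items: list[dict[str, Any]],
                      run_id: str) -> dict[str, Any]:
    """Import any number of test cases, merging per-batch createdIds and errors."""
    created: list[str] = []
    errors: list[dict[str, Any]] = []
    for offset in range(0, len(items), MAX_BATCH):
        batch = items[offset:offset + MAX_BATCH]
        # One stable key per batch: re-running the job within 24h replays
        # finished batches instead of importing them twice.
        result = post_import(batch, f"{run_id}-batch-{offset // MAX_BATCH}")
        created.extend(result["createdIds"])
        # Assumption: errors[].index is batch-relative, so shift it.
        errors.extend({**e, "index": e["index"] + offset} for e in result["errors"])
    return {"createdIds": created, "errors": errors}
```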

Agent connections

GET /agent-connections

Lists connections. Never returns secrets — only the fields needed to choose a connection for an execution.

Query params: protocol (http | browser | websocket), isActive.

[
  {
    "id": "12b17e7b-2cec-4206-986d-9df390be2de3",
    "name": "ArtificialQA Test Bank",
    "protocol": "browser",
    "baseUrl": "https://app.artificialqa.com",
    "isActive": true,
    "createdAt": "2026-05-19T09:12:11.502Z"
  }
]

GET /agent-connections/{id} — detail (v1.5)

Returns the full AgentConnectionDetail with secrets masked as "***<last4>". Use this to inspect existing config before PATCH-ing back.

POST /agent-connections — create (v1.5)

Required: name, protocol, baseUrl, authConfig, messageConfig. Optional: description, environment (defaults to "production"), preChatConfig, postChatConfig, templateVars, environments, projectId.

The caller sends raw secrets in the config blobs; the server encrypts them with AES-256-GCM before the INSERT and stores them as enc:<iv>:<tag>:<ciphertext>.

PATCH /agent-connections/{id} — update (v1.5)

Partial update. Patchable: name, description, baseUrl, environment, isActive, authConfig, preChatConfig, messageConfig, postChatConfig, templateVars, environments. Not patchable: projectId, protocol.

Masked-secret round-trips: submitting a masked value (e.g. "***c123") at a secret-named path falls back to the existing decrypted DB value — you can read the detail, edit any plaintext field, and PATCH the whole config back without juggling secrets. Only NEW plaintext values overwrite the stored secret.

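Config blobs are replaced whole, not deep-merged: any blob you send overwrites the stored one, so omitting a key inside it drops that key from the column. Blobs you leave out of the PATCH body stay untouched, and an explicit null clears a nullable blob.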
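Because blobs are replaced rather than merged, the safe way to edit one is to read the whole blob, modify it, and write it back. This sketch assumes detail is the parsed GET /agent-connections/{id} response and edits only top-level keys of one blob. build_config_patch is a hypothetical helper, and the blob keys used in the example below (path, timeoutMs) are illustrative, since the blob schemas are not documented here:

```python
import copy
from typing import Any


def build_config_patch(detail: dict[str, Any], blob: str,
                       changes: dict[str, Any]) -> dict[str, Any]:
    """PATCH body that edits top-level keys of one config blob.

    The whole blob is copied from the GET detail, masked secrets included:
    the server replaces blobs instead of deep-merging them, and a masked
    "***<last4>" value at a secret path keeps the stored secret.
    """
    updated = copy.deepcopy(detail.get(blob) or {})
    updated.update(changes)
    return {blob: updated}
```

For example, with messageConfig = {"path": "/chat", "timeoutMs": 30000}, build_config_patch(detail, "messageConfig", {"timeoutMs": 60000}) returns {"messageConfig": {"path": "/chat", "timeoutMs": 60000}}. The path key survives instead of being dropped.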

DELETE /agent-connections/{id} (v1.5)

Soft-delete. Idempotent. Past executions / plans keep working. To re-activate: PATCH { "isActive": true }.

POST /agent-connections/{id}/test — smoke-test (v1.5)

Synchronous smoke-test against the configured agent. Optional body: { "runtimeVars": { ... } } merged over persisted templateVars for this call only.

Per-protocol behavior:

Rate-limited to 1 call per 60s per (connectionId, apiKeyId).

{
  "status": "completed",
  "ok": true,
  "latencyMs": 1234,
  "error": null,
  "details": { "protocol": "http", "statusCode": 200, "response": "..." },
  "startedAt": "2026-06-17T10:00:00.000Z",
  "completedAt": "2026-06-17T10:00:01.234Z"
}

Forward compatibility: status is a discriminator. v1.5 always sets "completed" because the endpoint is synchronous. Pattern-match on status as a union so that a future "pending" status with async polling doesn't break clients.

Executions

POST /test-plans/{id}/executions — trigger

Starts a background run. Returns 202 with the new id; 200 if idempotent.

{
  "agentConnectionId": "12b17e7b-...",
  "evaluate": true,
  "evaluatorIds": ["ad12-...", "be34-..."],
  "evaluatorWeights": { "ad12-...": 2.0 },
  "runtimeVars": { "documentId": "doc-42" }
}

Field              Type                 Required  Notes
agentConnectionId  uuid                 yes
evaluate           bool, default false  no        Auto-triggers the evaluation when execution finishes.
evaluatorIds       uuid[]               no        Only honoured when evaluate: true. Restricts the auto-eval. Omit → every tier-allowed evaluator runs.
evaluatorWeights   {uuid: number > 0}   no        Only honoured when evaluate: true. Per-evaluator weight overrides. Omitted entries use the evaluator's configured default weight.
runtimeVars        {string: string}     no        Template var substitutions for the agent connection.
idempotencyKey     string, deprecated   no        Prefer the Idempotency-Key header.

Response 202:

{
  "executionId": "9e2f-...",
  "status": "pending",
  "statusUrl": "https://app.artificialqa.com/api/v1/public/executions/9e2f-..."
}

Response 200 (idempotent replay): same shape plus "idempotent": true.

Common 409s: invalid_state (plan not ready/completed), empty_plan.

GET /executions — list

Filter: testPlanId, status, limit. Returns ExecutionSummary[] — same shape as detail but without results[].

GET /executions/{id} — detail

Returns the execution + every result + the most recent evaluation (if any).

{
  "id": "9e2f-...",
  "testPlanId": "744cd22e-...",
  "agentConnectionId": "12b17e7b-...",
  "runNumber": 8,
  "status": "completed",
  "totalCases": 12,
  "completedCases": 12,
  "failedCases": 1,
  "durationMs": 184320,
  "errorMessage": null,
  "createdAt": "2026-06-10T17:02:11.040Z",
  "completedAt": "2026-06-10T17:05:15.360Z",
  "results": [
    {
      "id": "...",
      "testCaseId": "...",
      "executionStatus": "SUCCESS",
      "responseValidity": "VALID",
      "responseTimeMs": 8234,
      "success": true,
      "retryCount": 0,
      "finalized": true,
      "createdAt": "..."
    }
  ],
  "evaluation": {
    "id": "ad12-...",
    "status": "completed",
    "runNumber": 1,
    "overallScore": 0.87,
    "passRate": 0.92,
    "passed": true,
    "totalCases": 12,
    "passedCases": 11,
    "failedCases": 1,
    "durationMs": 32104
  }
}

executionStatus: SUCCESS, ERROR, TIMEOUT, SKIPPED. responseValidity: VALID, EMPTY, MALFORMED. Only SUCCESS + VALID results enter evaluation.
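An evaluation can cover fewer cases than the execution ran. To see why, split the results by these two fields. This is a sketch over the parsed GET /executions/{id} response, and the helper names are hypothetical:

```python
from collections import Counter
from typing import Any


def evaluable_results(execution: dict[str, Any]) -> list[dict[str, Any]]:
    """Results that enter evaluation: executionStatus SUCCESS and responseValidity VALID."""
    return [
        r for r in execution["results"]
        if r.get("executionStatus") == "SUCCESS" and r.get("responseValidity") == "VALID"
    ]


def result_breakdown(execution: dict[str, Any]) -> Counter[str]:
    """Count results per "executionStatus/responseValidity" pair, e.g. "SUCCESS/EMPTY"."""
    return Counter(
        f"{r.get('executionStatus')}/{r.get('responseValidity')}" for r in execution["results"]
    )
```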

Evaluators

GET /evaluators

Lists the evaluators visible to the org — platform-default globals plus any org-specific custom evaluators. Use it to discover the id you pass as evaluatorIds.

Query params: includeBlocked (default true).

Security: the response never exposes the evaluator's agentConfig (encrypted credentials) or systemPrompt (calibration prompt).

Field        Notes
isGlobal     true for platform-default evaluators; false for org-specific ones (Enterprise).
planAllowed  true if your billing tier permits this evaluator. false means a POST /executions/{id}/evaluations referencing it silently drops it from the run. Filter on this field client-side.
weight       Default weight applied when computing the overall score. Override it per run via evaluatorWeights on the POST.

Evaluations

POST /executions/{id}/evaluations — trigger

Runs the configured evaluators on a completed execution. Body (all optional):

{
  "evaluatorIds": ["uuid", "uuid"],
  "evaluatorWeights": { "uuid": 2.0 }
}

If evaluatorIds is omitted, every active evaluator allowed by the org's tier runs. Weight defaulting matches the UI: any evaluator not in evaluatorWeights uses its configured default weight from GET /evaluators.

Common 409s: invalid_state (execution not yet completed), evaluation_in_progress, no_evaluators.

GET /evaluations/{id} — detail

Returns the run + every per-test-case score. While status: "running" the response includes scoresCompleted / scoresTotal for a progress bar.

{
  "id": "ad12-...",
  "executionId": "9e2f-...",
  "runNumber": 1,
  "status": "completed",
  "overallScore": 0.87,
  "passRate": 0.92,
  "passed": true,
  "totalCases": 12,
  "passedCases": 11,
  "failedCases": 1,
  "durationMs": 32104,
  "createdAt": "...",
  "completedAt": "...",
  "scores": [
    {
      "id": "...",
      "evaluatorId": "...",
      "evaluatorName": "Tone",
      "evaluatorSlug": "tone",
      "weight": 1.0,
      "testCaseId": "...",
      "score": 0.83,
      "passed": true,
      "explanation": "Response stayed polite and professional throughout.",
      "createdAt": "..."
    }
  ]
}

scores[].weight is the runtime weight that was actually applied when computing overallScore. If you passed evaluatorWeights on the trigger, it matches that override; otherwise it matches the evaluator's configured default. For legacy evaluations it may be null.
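Dashboards often roll the per-test-case scores up per evaluator. This is a sketch over the parsed GET /evaluations/{id} response. The output shape (meanScore, passed, cases, weight) is a hypothetical choice. meanScore is a plain unweighted mean, not a reimplementation of overallScore:

```python
from collections import defaultdict
from statistics import mean
from typing import Any


def per_evaluator_summary(evaluation: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Roll scores[] up by evaluatorSlug: unweighted mean score, pass count, case count."""
    grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for score in evaluation["scores"]:
        grouped[score["evaluatorSlug"]].append(score)
    return {
        slug: {
            "meanScore": mean(s["score"] for s in rows),
            "passed": sum(1 for s in rows if s["passed"]),
            "cases": len(rows),
            "weight": rows[0]["weight"],  # may be None on legacy evaluations
        }
        for slug, rows in grouped.items()
    }
```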

Evaluation reports (v1.6)

Six endpoints expose the evaluation-report pipeline: executive summary, per-evaluator analysis, per-test-case scores, and a PDF (downloaded directly or through a tokenized URL). GET /report is read-only: it never calls the LLM, never consumes tokens, and surfaces missing summaries as empty strings ("").

Contract decisions:

GET /evaluations/{id}/report

Returns the consolidated JSON report. Read-only.

GET /evaluations/{id}/report-pdf

Returns the PDF as application/pdf with Content-Disposition: attachment; filename="report_<plan>_eval<runNumber>_<YYYY-MM-DD>.pdf". Read-only.

curl -H "Authorization: Bearer aqa_xxx" \
  https://app.artificialqa.com/api/v1/public/evaluations/<id>/report-pdf \
  -o report.pdf

POST /evaluations/{id}/report-pdf-url — tokenized URL (v1.6.1)

Issues a short-lived (1h TTL) tokenized URL that downloads the PDF without requiring an Authorization header. Use this when you want to hand a click-and-go link to a human or to an LLM client that can't easily round-trip the Bearer token.

{
  "url": "https://app.../api/v1/public/downloads/<43-char-token>",
  "expiresAt": "2026-06-17T18:30:00.000Z",
  "expiresInSec": 3600
}

The URL is multi-use within the TTL, org-scoped, single-purpose (it unlocks ONE evaluation's PDF), and served with Cache-Control: no-store. Issuing it does NOT spend tokens and does NOT require any feature gate.

GET /downloads/{token} — public, no Authorization

Streams the PDF binary. No API key required — the token is the auth. Token missing / malformed / expired → unified 404 (NOT 410) so the surface never leaks the existence of past or foreign tokens.

POST /evaluations/{id}/summary

Regenerates the executive summary. Body (optional): { "lang": "en", "force": false }.

{
  "summary": "Generated text...",
  "providerName": "saia",
  "model": "agent-v1",
  "cached": false,
  "tokensUsed": 1234
}

POST /evaluations/{id}/evaluator-summary

Regenerates the analysis for one evaluator within this evaluation. Body: { "evaluatorId": "uuid", "lang": "en", "force": false }. evaluatorId required. Same SummaryResult shape as the executive variant.

5. Code samples

Runnable examples are available in three flavors, each implementing the full flow — discover → trigger → poll → print scores:

For a typed client in any language, generate it from the OpenAPI spec.

6. Limits and edge cases

Plan-tier evaluator filtering is async-silent. When you pass evaluatorIds, the sync pre-check verifies the ids belong to your org but does not apply the billing-tier whitelist — that runs async in the runner. Net effect: ids with planAllowed: false are silently dropped and the scores[] simply omits them. If every requested id is dropped, the evaluation finishes with status: "failed" and errorMessage: "No active evaluators configured". Always filter client-side on planAllowed: true from GET /evaluators before sending. Fix is on the v1.x roadmap.

Need access? The Public REST API is available on Pro and Enterprise plans. Generate a key from your avatar → API Keys & MCP → API Keys, or reach out via artificialqa.com.