The ArtificialQA vocabulary
The minimum glossary you need to navigate the platform. Each concept connects to the others in a coherent flow.
Quick mental map
Concepts are organized around four stages:
- Design → Test Cases → Test Suites → Test Plans (the executable unit).
- Execution → you run a Test Plan N times; each run is a Run. You need an Agent Connection set up first.
- Evaluation / Analysis → you take an execution (Run) and score its responses with the evaluators you pick.
- Reports → consolidates one or several evaluations; exportable and optionally AI-enhanced.
1. Design
Test Case (TC)
The smallest unit. It defines an input that will be sent to the AI agent and the expected response (or the characteristics that response should meet).
Two modes:
- Simple Q&A: one question, one expected answer.
- Conversational: multiple turns in a single conversation, simulating real dialogue.
Each TC can carry deterministic asserts (checks that don't depend on AI: contains, regex, JSON Schema) and/or be evaluated by LLM evaluators.
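The deterministic layer can be pictured as a plain check over the response text. This is a minimal illustrative sketch, not ArtificialQA's real API; the assert shape (`type`/`value` dicts) and the `run_asserts` helper are assumptions for illustration.

```python
import re

# Hypothetical model of deterministic asserts (no AI involved).
# The dict shape and function name are illustrative assumptions.
def run_asserts(response: str, asserts: list[dict]) -> list[bool]:
    """Evaluate AI-independent checks against an agent response."""
    results = []
    for a in asserts:
        if a["type"] == "contains":
            results.append(a["value"] in response)
        elif a["type"] == "regex":
            results.append(re.search(a["value"], response) is not None)
    return results

checks = [
    {"type": "contains", "value": "refund"},
    {"type": "regex", "value": r"\b\d{2}-\d{4}\b"},  # e.g. a ticket number
]
print(run_asserts("Your refund 12-3456 is on its way", checks))  # [True, True]
```

The point is that these checks are reproducible: given the same response, they always return the same result, unlike the LLM evaluators described below.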
Test Suite (TS) — a grouping of TCs
A grouping of Test Cases. Lets you organize by topic (FAQs, escalation, sensitive data), criticality, release, or however suits you. The same TC can live in multiple suites.
Test Plan (TP) — a grouping of TSs, the executable unit
The reusable executable unit. You create it with a name and description, then attach one or more Test Suites to it. At run time you pick the Agent Connection for that specific run.
2. Execution
Agent Connection — prerequisite
The configuration that tells ArtificialQA how to talk to your AI agent. It's a separate object from the Test Plan: the same TP can run against different connections (dev, staging, production).
Two protocols:
- HTTP/API: your AI agent exposes an endpoint that receives a message and returns a response.
- Browser (Playwright): your AI agent is embedded in a webpage; ArtificialQA runs headless Chromium on its cloud workers.
More detail in Connections.
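For the HTTP/API protocol, the contract boils down to "send a message, get a response". The sketch below is a toy stand-in for such an endpoint; the field names (`message`, `reply`) are assumptions, not ArtificialQA's actual schema.

```python
import json

# Toy agent endpoint illustrating the HTTP/API contract.
# Field names are illustrative, not the platform's real schema.
def agent_endpoint(request_body: bytes) -> bytes:
    """Receives a JSON message, returns a JSON response."""
    payload = json.loads(request_body)
    reply = f"Echo: {payload['message']}"  # a real agent would call an LLM here
    return json.dumps({"reply": reply}).encode()

body = json.dumps({"message": "Hello"}).encode()
print(json.loads(agent_endpoint(body)))  # {'reply': 'Echo: Hello'}
```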
Run (Execution) — a concrete execution
A concrete execution of a Test Plan against a chosen Agent Connection. Each Run is recorded as an immutable snapshot: the AI agent's responses, timings, and logs stay fixed even if you later change Test Cases or the Connection.
Immutable snapshot, but non-deterministic executions: each Run's snapshot cannot be modified, yet re-running the same TP can produce different responses, since an LLM sits behind the AI agent. What stays fixed is what happened on each specific run.
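The immutability idea can be sketched with a frozen record: once the run completes, its data is read-only. This is a conceptual illustration; the field names are assumptions, not the platform's storage model.

```python
from dataclasses import dataclass

# Conceptual sketch of an immutable Run snapshot.
# Field names are illustrative, not the platform's real model.
@dataclass(frozen=True)
class RunSnapshot:
    test_plan: str
    connection: str
    responses: tuple[str, ...]  # frozen once the run completes

snap = RunSnapshot("faq-plan", "staging", ("Hi!", "Sure, here it is."))
try:
    snap.responses = ()  # editing a recorded run is rejected
except AttributeError as e:  # FrozenInstanceError subclasses AttributeError
    print("immutable:", type(e).__name__)
```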
3. Evaluation / Analysis
Evaluator — scores an execution
On top of an execution (Run) you activate the evaluators you want and they score the responses. ArtificialQA has two layers:
- Deterministic (asserts): exact match, regex, contains, JSON Schema, numeric range, response time, keyword presence, classification. No AI involved.
- LLM Evaluators (17 calibrated): a model that grades on a specific dimension (comparison, completeness, conciseness, formality, bias, tone, empathy, security, inappropriate content, error handling, ambiguity, fluency, data accuracy, hallucination, escalation, language, consistency).
Each evaluator returns a decimal score between 0 and 1 and a textual justification. On the same Run you can launch several evaluations at different points in time — each one is saved as an independent snapshot. Because an LLM is involved, two evaluations of the same Run can produce different scores.
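The result of one evaluator can be pictured as a small record: a score constrained to [0, 1] plus its justification. This is a hedged sketch of the shape, not the real API.

```python
from dataclasses import dataclass

# Illustrative shape of an evaluator result (not the real API):
# a score in [0, 1] plus a textual justification.
@dataclass
class EvaluatorResult:
    evaluator: str
    score: float
    justification: str

    def __post_init__(self):
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score must be between 0 and 1")

r = EvaluatorResult("conciseness", 0.85, "Brief answer, but omits one step.")
print(r.score)  # 0.85
```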
Score Override — the only editable case
If you disagree with a specific evaluator score, you can edit it manually. The platform marks the score as "modified", keeps the original in the history, and records who, when, and why. The audit trail stays intact. The list of all overrides is at Execution → Score Overrides.
This is the only edit possible on a run's data; the AI agent responses and logs cannot be modified.
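The override mechanics (original preserved, who/when/why recorded) can be sketched as an append-only history next to the current score. All names here are illustrative assumptions, not the platform's data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Sketch of a score override that preserves the audit trail.
# Names and fields are illustrative assumptions.
@dataclass
class ScoreRecord:
    score: float
    history: list = field(default_factory=list)

    def override(self, new_score: float, who: str, why: str) -> None:
        self.history.append({
            "original": self.score,
            "who": who,
            "when": datetime.now(timezone.utc).isoformat(),
            "why": why,
        })
        self.score = new_score  # score is now "modified"; original survives in history

rec = ScoreRecord(0.40)
rec.override(0.75, who="maria@example.com", why="Evaluator missed a valid citation")
print(rec.score, rec.history[0]["original"])  # 0.75 0.4
```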
4. Reports
Evaluation Report
The consolidated result of one or several evaluations on a run:
- Overall score and pass rate.
- Performance per evaluator.
- Detail per test case with textual justification.
- Failed cases highlighted for inspection.
- PDF export.
On the Enterprise plan you also get AI-powered reports: automatic executive summary and per-evaluator textual analysis.
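Overall score and pass rate are simple aggregations over per-case scores. The sketch below shows one plausible way to derive them; the 0.5 pass threshold is an assumption for illustration, not a documented platform cutoff.

```python
# Minimal sketch of report aggregation over per-case scores.
# The 0.5 pass threshold is an illustrative assumption.
def summarize(scores: list[float], threshold: float = 0.5) -> dict:
    passed = sum(s >= threshold for s in scores)
    return {
        "overall_score": round(sum(scores) / len(scores), 2),
        "pass_rate": round(passed / len(scores), 2),
    }

print(summarize([0.9, 0.8, 0.3, 1.0]))  # {'overall_score': 0.75, 'pass_rate': 0.75}
```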
How concepts connect
Test Case → Test Suite → Test Plan → (run against an Agent Connection) → Run → Evaluation → Report.
Other useful terms
- Project: a logical grouping inside your organization. Each project has its own Test Cases, Suites, Plans, and Runs.
- Organization: your company's isolated space. Multi-tenant: no data is shared between organizations.
- Review view: after generating cases with AI or importing them, they land in a view where you can edit each one and decide where it goes — send it to your Test Cases catalog, push it to a specific Test Suite, or discard it. Keeps a human in the loop.
- Industry: the domain you pick when generating cases with AI (15 industries supported).
- Hard / Soft assertion: deterministic asserts can be hard (fail the test if not met) or soft (don't fail, but stay logged as observations).
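The hard/soft distinction can be sketched as follows: a hard failure fails the test case, a soft failure is only logged. The names and dict shape are illustrative assumptions, not the real API.

```python
# Sketch of hard vs. soft assertion semantics (illustrative names):
# hard failures fail the test, soft failures are logged as observations.
def evaluate(asserts: list[dict]) -> tuple[bool, list[str]]:
    test_passed, observations = True, []
    for a in asserts:
        if a["ok"]:
            continue
        if a["mode"] == "hard":
            test_passed = False             # hard failure fails the test case
        else:
            observations.append(a["name"])  # soft failure is just logged
    return test_passed, observations

result = evaluate([
    {"name": "contains-greeting", "mode": "hard", "ok": True},
    {"name": "under-200-chars", "mode": "soft", "ok": False},
])
print(result)  # (True, ['under-200-chars'])
```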
Next step
With this you can move around the tool without getting lost. When you want to go deeper, the next sections explain how to use each concept in practice.