How it works

Executing the tests

How the AI agent connects, how Test Plans are built, and how runs work. Everything that happens when you click Run.

Mental model

To execute a run, you need three pieces:

  1. An Agent Connection — the AI agent you'll test against.
  2. A Test Suite — the cases you want to run.
  3. A Test Plan — the recipe combining the previous two.

Once you have the Test Plan, you can run it as many times as you want. Each run is recorded as an immutable Run.

1. Configure the Agent Connection

Go to Configuration → AI Agents → + New Connection.

Browser connection (Playwright)

For AI agents embedded in a webpage. ArtificialQA runs headless Chromium on its cloud workers and operates it as a user would.
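In practice, a browser connection boils down to the page URL plus the selectors the cloud worker uses to type, send, and read replies. A minimal sketch of such a configuration (the field names and values here are illustrative assumptions, not ArtificialQA's actual schema):

```python
from dataclasses import dataclass

@dataclass
class BrowserConnection:
    """Hypothetical shape of a browser (Playwright) connection config."""
    page_url: str        # page where the chat widget is embedded
    input_selector: str  # CSS selector of the message input
    send_selector: str   # CSS selector of the send button
    reply_selector: str  # CSS selector where agent replies render
    timeout_ms: int = 30_000  # how long to wait for a reply

# Example values for a fictional support widget
conn = BrowserConnection(
    page_url="https://example.com/support",
    input_selector="#chat-input",
    send_selector="button.send",
    reply_selector=".chat-reply:last-child",
)
```

With these four pieces the worker can behave like a user: open `page_url`, fill `input_selector`, click `send_selector`, and wait for `reply_selector` to appear.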

HTTP/API connection (Pro or Enterprise plan)

For AI agents exposing their own endpoint.
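For an API connection, the platform sends each test question to your endpoint as an HTTP request. A sketch of what building such a request might look like (endpoint, header names, and body shape are assumptions for illustration; your agent's actual contract may differ):

```python
import json

def build_agent_request(endpoint, api_key, message, session_id=None):
    """Assemble the pieces of a hypothetical agent API call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {"message": message}
    if session_id is not None:
        body["session_id"] = session_id  # reused across turns of one case
    return endpoint, headers, json.dumps(body)

endpoint, headers, body = build_agent_request(
    "https://api.example.com/chat", "sk-test", "What are your hours?"
)
```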

Test Connection

Once the connection is saved, use the Test Connection button to validate it. It sends a test message and shows what was sent and received. If it fails, you'll see the exact error (timeout, 401, selector not found, etc.).

🔁 Retries. If a question fails due to a transient error (timeout, 5xx) the platform automatically retries before marking the case as failed. Sensible defaults are applied — no manual setup required.
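The retry behavior described above can be sketched as a retry loop with exponential backoff. This is not ArtificialQA's internal code, just a minimal illustration of the pattern (retryable status set and delays are assumptions):

```python
import time

TRANSIENT = {408, 429, 500, 502, 503, 504}  # statuses treated as transient

def send_with_retries(send, max_attempts=3, base_delay=0.5):
    """Call `send()` until it returns a non-transient result or attempts run out.

    `send` returns (status, body); timeouts count as transient too.
    """
    last = None
    for attempt in range(max_attempts):
        try:
            status, body = send()
        except TimeoutError:
            status, body = None, "timeout"
        if status is not None and status not in TRANSIENT:
            return status, body  # definitive answer, success or not
        last = (status, body)
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"still failing after {max_attempts} attempts: {last}")
```

Only after every attempt fails with a transient error is the case marked as failed; a hard error such as a 401 is returned immediately without retrying.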

2. Build a Test Plan

Go to Test Design → Test Plans → New Plan and fill in the plan details.

Once the plan is created, open its detail view and add one or more Test Suites to it. A Test Plan can contain multiple suites — all of them will run together when the plan is executed.

3. Run the Test Plan

From the Test Plans list, click Run on the plan you want to execute and pick the Agent Connection to run against. The platform executes every test case from every suite attached to that plan, against the selected connection.
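Conceptually, the run is a loop over every suite attached to the plan and every case inside each suite, all executed against the one connection you picked. A sketch (data shapes are illustrative assumptions):

```python
def run_plan(plan, connection):
    """Execute every case of every suite attached to the plan.

    `connection` is any callable taking the case input and returning the
    agent's response -- browser or HTTP, the loop does not care which.
    """
    results = []
    for suite in plan["suites"]:
        for case in suite["cases"]:
            response = connection(case["input"])
            results.append(
                {"suite": suite["name"], "case": case["id"], "response": response}
            )
    return results

# Two suites attached to one plan; both run together
plan = {
    "suites": [
        {"name": "FAQ", "cases": [{"id": "faq-1", "input": "What are your hours?"}]},
        {"name": "Billing", "cases": [{"id": "bill-1", "input": "How do I get an invoice?"}]},
    ]
}
results = run_plan(plan, connection=lambda q: f"echo: {q}")
```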

Real-time view

While running you'll see:

Screenshot: detail of a completed Run — Total / Passed / Failed / Avg Time metrics on top, list of test cases below with S (Simple) and C (Conversational) badges.

Run states

Conversational cases

When a case is multi-turn, ArtificialQA keeps the session with the AI agent across all turns of the case. Each turn's response is saved and evaluated in the context of the conversation.
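The session-keeping behavior can be sketched as a loop that carries the full message history into every turn, so each reply is produced (and later evaluated) in context. A minimal illustration, not the platform's internal code:

```python
def run_conversational_case(turns, agent):
    """Run a multi-turn case, keeping one session across all turns."""
    history, transcript = [], []
    for turn in turns:
        history.append({"role": "user", "content": turn})
        reply = agent(history)  # agent sees the whole conversation so far
        history.append({"role": "assistant", "content": reply})
        transcript.append({"turn": turn, "reply": reply})  # saved per turn
    return transcript

# Toy agent that proves it sees the accumulated history
transcript = run_conversational_case(
    ["Hi", "What did I just say?"],
    agent=lambda h: f"({len(h)} messages so far)",
)
```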

Tokens consumed during execution

It depends on the connection protocol:

Other phases that consume tokens, regardless of protocol:

Immutable snapshots

Each Run stores a complete snapshot: input sent, headers, raw response, timings, and logs. Even if you later modify the Test Case, the Connection, or the Suite, the Run preserves exactly what happened that time.

That's what allows auditing and reproducing results months later.
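One common way to model such an immutable record is a frozen value object: once written, any attempt to change a field fails. A sketch with illustrative field names (not ArtificialQA's actual storage schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RunSnapshot:
    """Immutable record of one execution: what was sent, what came back."""
    input_sent: str
    headers: tuple      # tuples rather than lists/dicts keep the record immutable
    raw_response: str
    duration_ms: float
    logs: tuple

snap = RunSnapshot(
    input_sent="What are your hours?",
    headers=(("Content-Type", "application/json"),),
    raw_response='{"answer": "9-5, Mon-Fri"}',
    duration_ms=412.0,
    logs=("request sent", "response received"),
)
```

Because the record cannot be edited after the fact, later changes to the Test Case, the Connection, or the Suite cannot alter what any past Run reports.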

Common errors and how to fix them

Next step

You now have the Run with all responses. The next step is activating the evaluators on that run.