Official documentation
Welcome to ArtificialQA
The platform to test, evaluate, and monitor the quality of AI agents. Automated, reproducible, and auditable.
What problem does it solve?
Traditional testing assumes that the same input always produces the exact same output. AI agents break this assumption: the same question can produce several valid answers with different quality, tone, accuracy, or level of detail.
ArtificialQA is built specifically for that scenario:
- Generate test cases with AI, pull them from our public catalog of 25,000 curated cases, import from Excel/JSON, or create them manually.
- Run them automatically against your actual AI agent (via HTTP/API or browser).
- Evaluate each response with deterministic asserts and 17 calibrated LLM evaluators.
- Get executive reports, trend dashboards, and immutable per-execution snapshots for auditing and reproducibility.
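To make the difference from traditional testing concrete, here is a minimal Python sketch of why an exact-match check fails on an AI agent while rule-based (deterministic) asserts still pass. It is an illustration only, not ArtificialQA's API: the question, the three answers, the function names, and the rules are all hypothetical.

```python
import re

# Three differently worded but equally valid answers to the same question.
# The answers and the "30 days" policy are invented for illustration.
answers = [
    "You can return any item within 30 days of delivery.",
    "Our return window is 30 days. Just keep the receipt!",
    "Returns are accepted for 30 days after your order arrives.",
]


def exact_match(response: str, expected: str) -> bool:
    """Traditional check: passes only for one exact string."""
    return response == expected


def deterministic_asserts(response: str) -> dict[str, bool]:
    """Rule-based checks that hold for any valid wording (hypothetical rules)."""
    return {
        "mentions_30_days": re.search(r"\b30 days\b", response) is not None,
        "under_200_chars": len(response) <= 200,
        "no_apology": "sorry" not in response.lower(),
    }


# Compare every answer against the first one, then against the rules.
exact_results = [exact_match(a, answers[0]) for a in answers]
assert_results = [all(deterministic_asserts(a).values()) for a in answers]

print(exact_results)   # [True, False, False]
print(assert_results)  # [True, True, True]
```

Rules like these can confirm that a required fact is present, but they cannot judge qualities such as tone or empathy, which is the part the LLM evaluators are meant to cover.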
The 3-module flow
Module 01
Generation
Build the test cases.
With AI by industry, importing Excel/JSON, or manually.
Module 02
Execution
Run the cases against your AI agent.
HTTP/API connection or browser via Playwright.
Module 03
Evaluation
Each response is scored.
Deterministic asserts + 17 calibrated LLM evaluators.
Who is it for?
🧪
QA teams
Looking to scale AI agent testing without adding person-hours per release.
💻
Developers
Who need to integrate automated LLM response testing into their development workflows.
🏢
Companies with AI agents in production
Requiring continuous quality control, version traceability, and auditable reports.
How do I start?
We recommend the following path:
🚀
5-minute Quickstart
Create your account, connect your first agent, and get your first report.
🧠
Understand the concepts
Test Cases, Suites, Plans, Connections, Evaluators, and Reports.
🎯
How evaluation works
The two evaluation layers and the 17 LLM evaluators.
💰
Plans and pricing
Free, Pro, and Enterprise. Start free, no credit card required.
The Free plan lets you try the platform with no credit card and no time limit. It's enough to validate whether ArtificialQA fits your workflow before making any decisions.
Main features
- AI-powered test case generator by industry. 15 industries supported (healthcare, finance, e-commerce, insurance, telecom, education, legal, HR, SaaS, travel, real estate, food, safety, customer support, and general).
- Support for simple and conversational cases. Single question-and-answer cases or multi-turn conversations.
- Two connection protocols. HTTP/API for AI agents with an endpoint, or browser with Playwright for AI agents embedded in websites.
- 17 calibrated LLM evaluators. Comparison, completeness, conciseness, formality, bias, tone, empathy, security, inappropriate content, error handling, ambiguity, fluency, data accuracy, hallucination, escalation, language, consistency.
- Automatic PII (Personally Identifiable Information) detection. Emails, phone numbers, ID documents, and credit cards.
- Pre-calibrated evaluators. We internally validate each evaluator against reference datasets before enabling it, so scores are reliable. You don't have to calibrate anything.
- PDF-exportable reports. Executive summary, evaluator-by-evaluator performance, and failed-case detail.
- Isolated multi-tenancy. Each organization lives in a separate compartment; data is never shared between customers.
- External security audit by Nextfense. We've completed a security review with Nextfense; its recommendations were considered and implemented.
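As an illustration of the automatic PII detection listed among the features above, the Python sketch below scans a response for emails and credit card numbers. It is an assumption about how such a check could work, not ArtificialQA's actual implementation: the patterns, function names, and sample text are hypothetical, and phone numbers and ID documents are left out because their formats vary by country.

```python
import re

# Simplified patterns for illustration; real-world detectors are stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (kind, value) pairs for emails and Luhn-valid card numbers."""
    found = [("email", m.group()) for m in EMAIL_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        # The checksum filters out digit runs that only look like cards.
        if luhn_valid(m.group()):
            found.append(("credit_card", m.group()))
    return found


sample = (
    "Contact me at jane.doe@example.com or charge card 4111 1111 1111 1111. "
    "Order ref 4111 1111 1111 1112."
)
found = find_pii(sample)
print(found)
# [('email', 'jane.doe@example.com'), ('credit_card', '4111 1111 1111 1111')]
```

The second number in the sample has the shape of a card number but fails the Luhn checksum, so it is not reported. Pairing a pattern with a checksum like this is a common way to reduce false positives.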