Security & compliance
How we protect your organization's data, what access controls we have, and where we are on the certification path.
Multi-tenant isolation
Each organization lives in a separate compartment. Your organization's data is never visible to another organization — isolation is enforced at both the application level and the database level.
- Every request carries the organization context and is validated on every operation.
- Test Cases, Suites, Plans, Runs, and Reports are scoped to the user's organization_id.
- There's no way — UI or API — to list or read another organization's resources.
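As an illustration of the pattern described above, a tenant-scoped data-access helper can apply the organization filter internally, so no caller can omit it. This is a minimal sketch with illustrative names and an in-memory stand-in for the database, not our actual code:

```typescript
// Illustrative sketch of organization_id scoping (names are hypothetical).
interface OrgContext {
  organizationId: string;
}

interface TestCase {
  id: string;
  organizationId: string;
  name: string;
}

// In-memory stand-in for the real database table.
const testCases: TestCase[] = [
  { id: "tc-1", organizationId: "org-a", name: "Login flow" },
  { id: "tc-2", organizationId: "org-b", name: "Checkout flow" },
];

// The tenant filter lives inside the helper, not in the caller,
// so there is no code path that lists another organization's rows.
function listTestCases(ctx: OrgContext): TestCase[] {
  return testCases.filter((tc) => tc.organizationId === ctx.organizationId);
}
```

The key design choice is that callers never build the filter themselves: every read and write goes through helpers that require the organization context.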
Authentication
- Email + password with mandatory email verification before first login.
- Login with Google (OAuth).
- 2FA / TOTP mandatory for every user. Compatible with Google Authenticator, Authy, 1Password, etc.
- Password recovery via single-use email link.
Roles and permissions
Within an organization, users have roles that define what they can do. Main roles:
- Admin / Owner: full control over the organization, billing, and members.
- Member: can create and operate Test Cases, Suites, Plans, Runs, and Evaluations.
- Viewer (read-only): can view reports but cannot run or modify anything.
Data in transit and at rest
- In transit: all traffic between your browser/pipeline and the platform travels over HTTPS / TLS.
- At rest: data is stored in cloud infrastructure with provider-supplied at-rest encryption.
- Sensitive credentials (your AI agent's tokens and API keys) are stored encrypted and never returned in plaintext to the UI or API.
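To illustrate the credential-encryption pattern, here is a minimal sketch using authenticated encryption (AES-256-GCM) from Node's standard crypto module. It shows the general shape only — key management, rotation, and storage details are out of scope, and this is not our actual implementation:

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Illustrative only: in practice the key comes from a key-management
// service, never from process memory like this.
const key = randomBytes(32);

// Encrypt a token with AES-256-GCM; the auth tag detects tampering.
function encryptToken(plaintext: string): { iv: string; tag: string; data: string } {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

// Decrypt only happens server-side; the plaintext is never sent back out.
function decryptToken(enc: { iv: string; tag: string; data: string }): string {
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(enc.iv, "base64"));
  decipher.setAuthTag(Buffer.from(enc.tag, "base64"));
  return Buffer.concat([
    decipher.update(Buffer.from(enc.data, "base64")),
    decipher.final(),
  ]).toString("utf8");
}
```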
Automatic PII detection
As part of evaluation, the platform automatically detects:
- Emails.
- Phone numbers.
- ID documents.
- Credit cards.
Detection is used to alert when an AI agent exposes PII where it shouldn't.
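A simplified sketch of pattern-based PII flagging follows. Real detection is typically more robust (validation such as Luhn checks, locale-aware formats, ML models); the patterns below are deliberately minimal and illustrative, not the platform's actual detector:

```typescript
// Illustrative PII patterns — intentionally simplified.
const piiPatterns: Record<string, RegExp> = {
  email: /[\w.+-]+@[\w-]+\.[\w.]+/,
  phone: /\+?\d[\d\s().-]{7,}\d/,
  creditCard: /\b(?:\d[ -]?){13,16}\b/,
};

// Returns the kinds of PII found in an agent response.
function detectPII(text: string): string[] {
  return Object.entries(piiPatterns)
    .filter(([, re]) => re.test(text))
    .map(([kind]) => kind);
}
```

In the evaluation flow, a non-empty result on a response that should contain no PII is what triggers the alert.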
Immutable snapshots and traceability
Every Run stores a complete snapshot: input, response, timings, logs, and scores. Snapshots are immutable after creation, which provides:
- Reproducibility: you can replay exactly what happened in an old run.
- Auditability: if a regulator or client asks to see a historical run, it is available exactly as it was generated.
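One common way to make immutability verifiable is to compute a content digest when the snapshot is written, so anyone can later confirm a stored run was not altered. The sketch below illustrates that pattern; it is an assumption for illustration, not necessarily how the platform stores snapshots internally:

```typescript
import { createHash } from "node:crypto";

// Hypothetical snapshot shape for illustration.
interface RunSnapshot {
  input: string;
  response: string;
  durationMs: number;
  scores: Record<string, number>;
}

// SHA-256 over a canonical serialization: identical snapshots always
// hash identically, and any edit changes the digest.
function snapshotDigest(s: RunSnapshot): string {
  const canonical = JSON.stringify([
    s.input,
    s.response,
    s.durationMs,
    Object.keys(s.scores).sort().map((k) => [k, s.scores[k]]),
  ]);
  return createHash("sha256").update(canonical).digest("hex");
}
```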
Evaluator calibration
The 17 LLM evaluators that score your AI agent's responses come pre-calibrated by our team. Calibration is the process by which we make sure an evaluator returns reliable, consistent scores across different domains and models.
How we do it internally
- We maintain reference datasets with responses labeled by human experts with their expected score.
- Each evaluator scores those cases and we measure the deviation from the reference score.
- We tune the evaluator's prompt, model, and temperature until the deviation falls within an acceptable margin.
- We repeat the process on any meaningful change: new model, prompt adjustment, or customer feedback suggesting drift.
- Calibration datasets are versioned alongside the evaluator so behavior is reproducible.
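The deviation check described above can be sketched as a mean-absolute-deviation comparison between evaluator scores and the expert labels. The shape and the 0.5-point tolerance below are illustrative assumptions, not our published margin:

```typescript
// Illustrative reference case: an expert-labeled score vs. the score
// the evaluator under calibration actually returned.
interface ReferenceCase {
  expectedScore: number;  // human-expert label
  evaluatorScore: number; // evaluator output on the same case
}

// Average absolute gap between evaluator and expert scores.
function meanAbsoluteDeviation(cases: ReferenceCase[]): number {
  const total = cases.reduce(
    (sum, c) => sum + Math.abs(c.evaluatorScore - c.expectedScore),
    0,
  );
  return total / cases.length;
}

// Calibration passes when the deviation is inside the margin
// (0.5 here is an example threshold, not the real one).
function withinMargin(cases: ReferenceCase[], margin = 0.5): boolean {
  return meanAbsoluteDeviation(cases) <= margin;
}
```

Prompt, model, and temperature are tuned until this check passes, then the whole loop is rerun after any meaningful change.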
What this guarantees
- When you enable an evaluator in your organization, it's already validated to score consistently.
- If in a future version we change the model or prompt, we recalibrate before enabling it.
- You don't have to understand or maintain the process — you just pick which evaluators to activate for your domain.
Infrastructure & stack
High-level summary of the infrastructure ArtificialQA runs on, aimed at IT and compliance teams on the customer side:
- Cloud: AWS. The entire platform runs on Amazon Web Services infrastructure.
- LLM layer: the models powering AI generation, evaluators, and AI-enhanced reports go through Globant Enterprise AI, a managed intermediate platform that gives us stability across model-provider changes, security controls, and consumption traceability. Customer data is never exposed directly to a single model provider.
- Browser execution: Playwright on headless Chromium, on managed workers.
- Web application: Next.js + React + TypeScript end-to-end, with TailwindCSS for the design system.
- Persistence: managed relational database for Test Cases, Suites, Plans, Runs, evaluations, and reports; Redis cache for sessions and real-time operations.
- API: standard REST, available on Pro and Enterprise plans for integration with CI/CD or other internal systems.
If you need a more detailed technical sheet (versions, regions, cloud-provider certifications, specific DPA), we coordinate delivery under NDA on the Enterprise plan.
External audit by Nextfense
ArtificialQA went through an external security review with Nextfense. Recommendations from the process were evaluated and implemented.
If you need scope and findings detail for your security team, get in touch.
Compliance: where we are
To be transparent:
- We have the operational controls that serve as a foundation for certifications like SOC 2 / ISO 27001 (access management, segregation, logging, backups).
- Formal certification is in progress; we cannot represent that we already hold it.
- For Enterprise customers with specific requirements (data residency, custom DPA, dedicated SLA) we coordinate a tailored plan.
Best practices for your team
- Enable 2FA for all accounts, especially Admin/Owner.
- Use minimum roles: Viewer for users who only need to read reports.
- Rotate your AI agent's tokens periodically; the platform re-encrypts them without asking for the old one.
- Don't put real production data in Test Case inputs unless necessary. Use synthetic data.
- Watch the PII detector: if it detects real leaks in AI agent responses, escalate immediately with development.
Reporting a vulnerability
If you find a possible vulnerability, write to us through artificialqa.com. We take reports seriously and respond within a few days.
Next step
If operational or plan questions remain, check the FAQ and the Plans & pricing page.