
AI Validator

The AI validator lets you write acceptance criteria in natural language and have an LLM evaluate the submission against them. It's the right tool when a check is genuinely subjective — "does this README explain installation, configuration, and licensing?" — or when the format is loose enough that a formal schema would be brittle.

The AI validator is not a replacement for schemas, CEL, or simulations. It's complementary: use it for things humans would naturally evaluate by reading, not for anything you can encode deterministically.

What you'll need

  • A Validibot account with permission to author workflows.
  • A workflow whose allowed file types include the format you want to check (typically PDF, text, Markdown, or JSON).
  • Your acceptance criteria, written as plain-English sentences.
  • An LLM provider configured for your deployment. See Self-Hosted Editions for which providers Validibot supports and how to wire credentials in. On Validibot Cloud, an LLM is already configured for you.

Setting up an AI step

  1. Open the workflow editor and click Add step.
  2. Pick AI from the validator library.
  3. Give the step a name like "README quality gate" and a short description.
  4. Write your acceptance criteria in the Rules field, one rule per line or as a numbered list. The clearer and more specific each rule is, the more reliably it can be evaluated.
  5. (Optional) Pick a model tier — most deployments expose a fast tier for quick checks and a stronger tier for deeper reasoning.
  6. Click Save step.

Writing good AI criteria

The validator's reliability depends almost entirely on how you phrase the rules. A few patterns that work:

  • Be specific. "The document covers installation" is vague. "The document explains how to install the package with pip install and lists supported Python versions" is testable.
  • One concept per rule. Compound rules ("Has installation AND testing AND licensing") produce muddier findings than three separate rules.
  • Say what must be present, not what should not be. "The document mentions a license" works better than "the document does not omit licensing information."
  • Anchor on observable behaviour. "Includes a runnable example" is testable. "Is engaging" is not.
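The patterns above can be checked mechanically before a rule ever reaches the LLM. The following is a minimal sketch of such a pre-flight linter. It is not part of Validibot: the function name `lint_rule`, the word list, and the thresholds are all assumptions chosen for illustration, and the heuristics are deliberately crude.

```python
import re

# Hypothetical heuristics, not a Validibot feature. Each one flags a phrasing
# pattern that the guidance above warns against.
VAGUE_WORDS = {"engaging", "good", "clear", "nice", "appropriate", "reasonable"}
NEGATIVE_PATTERN = re.compile(
    r"\b(does not|doesn't|must not|should not|never)\b", re.IGNORECASE
)
COMPOUND_PATTERN = re.compile(r"\b(and|as well as)\b", re.IGNORECASE)


def lint_rule(rule: str) -> list[str]:
    """Return a list of warnings for one natural-language rule."""
    warnings = []
    words = set(re.findall(r"[a-z']+", rule.lower()))
    if words & VAGUE_WORDS:
        warnings.append("vague: uses a subjective word; name something observable")
    if NEGATIVE_PATTERN.search(rule):
        warnings.append("negative: say what must be present instead")
    # One "and" is often fine ("install with pip and list versions");
    # two or more usually means several concepts are packed into one rule.
    if len(COMPOUND_PATTERN.findall(rule)) >= 2:
        warnings.append("compound: consider splitting into separate rules")
    return warnings
```

Running the linter over a draft rule list gives a quick signal about which rules to reword, though a clean result says nothing about whether the rule is actually specific enough.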

What the validator reports

For each rule, the AI validator returns a pass or fail result plus a short explanation. Each finding includes the rule that failed and the LLM's reasoning, so submitters can act on it.

The validator is conservative: an ambiguous case is reported as a warning rather than silently passing, and a rule the model cannot evaluate reliably is escalated as a finding rather than given a made-up verdict.
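To make the reporting model concrete, here is one way the per-rule results described above could be represented on the consuming side. This page does not document the actual payload Validibot returns, so the names `Outcome`, `RuleResult`, and `needs_attention` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    WARNING = "warning"  # ambiguous case, surfaced instead of silently passing


@dataclass(frozen=True)
class RuleResult:
    """Hypothetical shape for one rule's result; not Validibot's schema."""

    rule: str          # the acceptance criterion as written
    outcome: Outcome
    explanation: str   # the model's short reasoning


def needs_attention(results: list[RuleResult]) -> list[RuleResult]:
    """Return every result a submitter should look at: fails and warnings."""
    return [r for r in results if r.outcome is not Outcome.PASS]
```

Treating warnings the same as failures when building the review list mirrors the conservative behaviour described above: nothing ambiguous is dropped.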

A note on determinism and trust

LLM evaluations are not bit-for-bit deterministic the way a JSON Schema check is. Two runs of the same submission against the same rules can produce slightly different findings — usually agreeing on the headline result, occasionally differing on edge phrasing.

That means the AI validator is a great gate ("does this look roughly right?") but a weaker audit trail ("here is the canonical reason this passed"). When a check has compliance implications, prefer a deterministic validator (schema, SHACL, FMU, EnergyPlus) and use the AI validator alongside it, not as a replacement.
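One way to gauge how much a given rule varies between runs is to submit the same file several times, record the outcome for that rule each time, and measure how often the runs agree. The sketch below assumes you have collected those outcomes yourself as plain strings; the function name `agreement` is hypothetical and nothing here calls Validibot.

```python
from collections import Counter


def agreement(outcomes: list[str]) -> tuple[str, float]:
    """Return the most common outcome and the fraction of runs that agree with it."""
    if not outcomes:
        raise ValueError("need at least one run")
    winner, count = Counter(outcomes).most_common(1)[0]
    return winner, count / len(outcomes)


# Four runs of the same rule against the same submission.
print(agreement(["pass", "pass", "fail", "pass"]))  # ('pass', 0.75)
```

A rule whose agreement stays well below 1.0 on an unchanged submission is a candidate for rewording or for replacement with a deterministic check.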

Cost and rate

Each AI step calls an LLM, which has cost and latency. For high-volume workflows:

  • Put cheap, deterministic checks (schemas, CEL) first — the AI step only runs when its predecessors pass.
  • Pick the smallest model that handles your rules well.
  • On Validibot Cloud, see your plan's metered usage page for current AI-validation pricing.
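The cheap-checks-first ordering is a short-circuit pattern. The sketch below illustrates the idea in isolation; it is not how Validibot schedules steps internally, and `run_in_order`, `is_json_object`, and `expensive_ai_check` are hypothetical names, with the last one standing in for a paid LLM call.

```python
import json
from typing import Callable

# A check takes the submission text and returns True on pass.
Check = Callable[[str], bool]


def run_in_order(submission: str, checks: list[tuple[str, Check]]) -> list[str]:
    """Run checks in order and stop at the first failure.

    Returns the names of the checks that actually ran.
    """
    ran = []
    for name, check in checks:
        ran.append(name)
        if not check(submission):
            break
    return ran


def is_json_object(text: str) -> bool:
    """Cheap deterministic check: the text parses as a JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


def expensive_ai_check(text: str) -> bool:
    """Placeholder for a paid LLM call; always passes in this sketch."""
    return True


steps = [("schema", is_json_object), ("ai", expensive_ai_check)]
print(run_in_order("not json", steps))   # ['schema']
print(run_in_order('{"a": 1}', steps))   # ['schema', 'ai']
```

A malformed submission never reaches the expensive step, so the LLM cost is only paid for files that already pass the deterministic checks.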

File types

The AI validator can read any file Validibot's text extraction stack supports — typically PDF, plain text, Markdown, HTML, and JSON. Image and audio support depend on the configured provider; check the provider page in your deployment's admin for what's available.

Tips

  • Treat the AI as a first reader, not a final arbiter. Use it to filter out obvious misses; reserve human review for the borderline cases.
  • Iterate on your rules with a known-good and a known-bad sample. Run both through the step, compare the findings, and refine the wording until each rule fires when it should and stays quiet when it shouldn't.
  • Don't ask the LLM to compute. "Does the total equal the sum of line items?" should be a CEL rule, not an AI rule.
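To show why arithmetic belongs in a deterministic rule, here is the total-versus-line-items check written as exact decimal arithmetic. In Validibot this would be a CEL rule, as the tip says; the Python below is only an illustration of the logic, and the function name `total_matches` and the string-valued inputs are assumptions.

```python
from decimal import Decimal


def total_matches(line_items: list[str], total: str) -> bool:
    """Compare the stated total with the exact decimal sum of the line items."""
    # Decimal avoids binary floating-point rounding (0.1 + 0.2 != 0.3 as floats).
    return sum((Decimal(x) for x in line_items), Decimal("0")) == Decimal(total)


print(total_matches(["0.10", "0.20"], "0.30"))    # True
print(total_matches(["19.99", "5.01"], "25.01"))  # False
```

The check gives the same answer on every run and costs nothing per call, which is exactly what an LLM evaluation cannot promise.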
