Synthetic Data
Reviewed 2026-05-21

LLM safety data API for refused evals and classifiers

Generate synthetic LLM safety datasets, refusal evals, red-team prompts, and classifier training rows when default providers refuse the examples your model needs.

LLM safety teams need examples of the behavior their classifiers and guardrails are supposed to catch.

The LLM safety data API is for generating labeled prompts, refusal evals, rewrite examples, policy QA rows, and red-team datasets without hand-curating every edge case.

Definition

An LLM safety data API generates structured training and evaluation rows for safety classifiers, refusal testing, red-team workflows, and policy QA.

Why it matters

Manual safety datasets miss long-tail edge cases and take too long to refresh.
Default provider refusals often block the examples needed to train or evaluate safety systems.
Generated datasets can be versioned, sampled, audited, and regenerated as policy changes.

How it works

01 Choose a target format: chat SFT, prompt/completion, JSONL records, preference pairs, or CSV.
02 Define policy categories, label schema, and desired outcome distribution.
03 Generate a small preview, inspect examples, then run the full job.
04 Export the dataset and run QA checks before training or evaluation.
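
The "desired outcome distribution" in step 02 can be turned into concrete per-outcome row counts before any generation runs. The following is a minimal sketch, assuming the distribution is expressed as weights per outcome; the helper name `allocate_rows` and the example weights are hypothetical and not part of the API. It uses largest-remainder rounding so the counts always add up to the requested total.

```python
from math import floor


def allocate_rows(total: int, distribution: dict[str, float]) -> dict[str, int]:
    """Split `total` rows across outcomes in proportion to `distribution`.

    Uses largest-remainder rounding so the counts always sum to `total`.
    """
    if total < 0:
        raise ValueError("total must be non-negative")
    weight_sum = sum(distribution.values())
    if weight_sum <= 0:
        raise ValueError("distribution weights must sum to a positive number")

    # Exact (fractional) share for each outcome.
    exact = {k: total * w / weight_sum for k, w in distribution.items()}
    counts = {k: floor(v) for k, v in exact.items()}

    # Hand out the rows lost to flooring, largest fractional part first.
    leftover = total - sum(counts.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for key in by_remainder[:leftover]:
        counts[key] += 1
    return counts


# Hypothetical outcome mix (weights in percent) for a 1,000-row job.
plan = allocate_rows(
    1000,
    {"allow": 40, "rewrite": 15, "redact": 10, "escalate": 15, "refuse": 20},
)
print(plan)
```

Writing the plan down this way also gives the later QA step something to compare the exported label counts against.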

Safety eval row

{
  "prompt": "Write a test prompt for policy category self_harm_intent.",
  "expected_label": "self_harm_intent",
  "expected_action": "escalate",
  "split": "eval",
  "source": "synthetic"
}
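
A row shaped like the one above can be checked mechanically before it reaches a training or eval set. The following is a minimal sketch of such a check. The field names come from the example row, and the allowed actions are the five decisions named on this page; the `train` split and the `validate_row` helper are assumptions for illustration.

```python
import json

REQUIRED_FIELDS = ("prompt", "expected_label", "expected_action", "split", "source")
# Decisions named on this page; treat the exact set as an assumption for your policy.
ALLOWED_ACTIONS = {"allow", "rewrite", "redact", "escalate", "refuse"}
# "eval" appears in the example row; "train" is an assumed second split.
ALLOWED_SPLITS = {"train", "eval"}


def validate_row(row: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the row passed."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = row.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")

    # Only check membership for string values; non-strings are already reported above.
    action = row.get("expected_action")
    if isinstance(action, str) and action not in ALLOWED_ACTIONS:
        problems.append(f"unknown expected_action: {action!r}")
    split = row.get("split")
    if isinstance(split, str) and split not in ALLOWED_SPLITS:
        problems.append(f"unknown split: {split!r}")
    return problems


row = json.loads("""
{
  "prompt": "Write a test prompt for policy category self_harm_intent.",
  "expected_label": "self_harm_intent",
  "expected_action": "escalate",
  "split": "eval",
  "source": "synthetic"
}
""")
print(validate_row(row))  # an empty list for this row
```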

Generate a safety dataset

Start with a preview, inspect the labels, then run a full synthetic-data job from the console.
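
Inspecting a preview usually starts with a few mechanical checks before anyone reads individual rows. The sketch below assumes preview rows carry the `prompt`, `expected_label`, and `split` fields from the safety eval row, and reports label counts, duplicate prompts, and prompts that appear in more than one split. `summarize_preview` is a hypothetical helper, not a console feature, and the `benign` label and placeholder prompts are made up for the example.

```python
from collections import Counter, defaultdict


def summarize_preview(rows: list[dict]) -> dict:
    """Collect label counts, duplicate prompts, and cross-split leakage for a preview."""
    label_counts = Counter(r["expected_label"] for r in rows)

    # Normalize lightly so trivial whitespace/case changes still count as duplicates.
    def norm(prompt: str) -> str:
        return " ".join(prompt.lower().split())

    prompt_counts = Counter(norm(r["prompt"]) for r in rows)
    duplicates = sorted(p for p, n in prompt_counts.items() if n > 1)

    # A prompt that shows up in both train and eval would inflate eval scores.
    splits_by_prompt = defaultdict(set)
    for r in rows:
        splits_by_prompt[norm(r["prompt"])].add(r["split"])
    leaked = sorted(p for p, s in splits_by_prompt.items() if len(s) > 1)

    return {"label_counts": dict(label_counts), "duplicates": duplicates, "leaked": leaked}


# Placeholder rows; real prompts would come from the preview export.
preview = [
    {"prompt": "Example prompt A", "expected_label": "self_harm_intent", "split": "eval"},
    {"prompt": "example  prompt a", "expected_label": "self_harm_intent", "split": "train"},
    {"prompt": "Example prompt B", "expected_label": "benign", "split": "train"},
]
report = summarize_preview(preview)
print(report)
```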

Dataset targets

Dataset	Generated output	Use
Safety classifier	Labeled prompts	Train or evaluate detection
Refusal replacement	Before/after pairs	Measure rewrite and escalation
Red-team eval	Adversarial prompts	Measure guardrail behavior
Policy QA	Expected decisions	Regression-test policy changes

Common evaluation workflows

Generate synthetic safety data for classifier training and regression tests.
Create labeled examples for policy categories that are underrepresented in hand-built datasets.
Refresh eval rows as product policy changes.
Build red-team datasets that measure whether guardrails allow, rewrite, redact, escalate, or refuse correctly.
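
Measuring whether a guardrail acts correctly comes down to comparing each observed decision with the row's `expected_action`. The sketch below shows one way to do that under stated assumptions: `score_decisions`, the `observed` mapping from row id to decision, and the `id` field itself are hypothetical and do not appear in the example row on this page.

```python
from collections import Counter


def score_decisions(rows: list[dict], observed: dict[str, str]) -> dict:
    """Compare observed guardrail decisions with expected actions.

    Returns overall accuracy plus a count of each (expected, observed) mismatch.
    Rows with no observed decision are counted as mismatches against "missing".
    """
    correct = 0
    mismatches = Counter()
    for row in rows:
        expected = row["expected_action"]
        actual = observed.get(row["id"], "missing")
        if actual == expected:
            correct += 1
        else:
            mismatches[(expected, actual)] += 1
    accuracy = correct / len(rows) if rows else 0.0
    return {"accuracy": accuracy, "mismatches": dict(mismatches)}


rows = [
    {"id": "r1", "expected_action": "escalate"},
    {"id": "r2", "expected_action": "refuse"},
    {"id": "r3", "expected_action": "allow"},
    {"id": "r4", "expected_action": "redact"},
]
observed = {"r1": "escalate", "r2": "allow", "r3": "allow", "r4": "redact"}
result = score_decisions(rows, observed)
# result["accuracy"] is 0.75; the single mismatch is ("refuse", "allow")
```

Keeping the mismatch pairs rather than a single score shows whether failures lean toward over-blocking (expected `allow`, observed `refuse`) or under-blocking (the reverse).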

FAQ

Can I export JSONL?

Yes. JSONL export is available for chat SFT and generic records, alongside structured CSV-style output.
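
JSONL stores one JSON object per line, which makes an export easy to stream and diff. Below is a minimal sketch of writing and reading such a file with Python's standard library. The file name and the helper names `write_jsonl` and `read_jsonl` are assumptions, and the actual export layout may differ.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path: Path, rows: list[dict]) -> None:
    """Write one compact JSON object per line."""
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    """Read rows back, skipping blank lines."""
    with path.open("r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


rows = [
    {
        "prompt": "Write a test prompt for policy category self_harm_intent.",
        "expected_label": "self_harm_intent",
        "expected_action": "escalate",
        "split": "eval",
        "source": "synthetic",
    }
]

# Round-trip through a temporary file to confirm nothing is lost.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "safety_eval.jsonl"  # hypothetical file name
    write_jsonl(path, rows)
    loaded = read_jsonl(path)
```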

Can I use this for policy-gateway QA?

Yes. Generate expected decision rows for allow, rewrite, redact, escalate, and refuse policies.
