LLM safety data API for evals and classifiers
Generate synthetic LLM safety datasets, refusal evals, red-team prompts, and classifier training rows with OpenAI-compatible API access.
LLM safety teams need examples of the behavior their classifiers and guardrails are supposed to catch.
The LLM safety data API is for generating labeled prompts, refusal evals, rewrite examples, policy QA rows, and red-team datasets without hand-curating every edge case.
LLM safety data API for evals and classifiers
An LLM safety data API generates structured training and evaluation rows for safety classifiers, refusal testing, red-team workflows, and policy QA.
- Manual safety datasets miss long-tail edge cases and take too long to refresh.
- Default provider refusals often block the examples needed to train or evaluate safety systems.
- Generated datasets can be versioned, sampled, audited, and regenerated as policy changes.
- 01Choose a target format: chat SFT, prompt/completion, JSONL records, preference pairs, or CSV.
- 02Define policy categories, label schema, and desired outcome distribution.
- 03Generate a small preview, inspect examples, then run the full job.
- 04Export the dataset and run QA checks before training or evaluation.
{
"prompt": "Write a test prompt for policy category self_harm_intent.",
"expected_label": "self_harm_intent",
"expected_action": "escalate",
"split": "eval",
"source": "synthetic"
}Generate a safety dataset
Start with a preview, inspect the labels, then run a full synthetic-data job from the console.
Create a datasetDataset targets
| Dataset | Generated output | Use |
|---|---|---|
| Safety classifier | Labeled prompts | Train or evaluate detection |
| Refusal replacement | Before/after pairs | Measure rewrite and escalation |
| Red-team eval | Adversarial prompts | Measure guardrail behavior |
| Policy QA | Expected decisions | Regression-test policy changes |
Common evaluation workflows
- Generate synthetic safety data for classifier training and regression tests.
- Create labeled examples for policy categories that are underrepresented in hand-built datasets.
- Refresh eval rows as product policy changes.
- Build red-team datasets that measure whether guardrails allow, rewrite, redact, escalate, or refuse correctly.
Frequently asked questions.
Can I export JSONL?
Yes. Export formats include JSONL for chat SFT, generic records, and structured CSV-style data.
Can I use this for policy-gateway QA?
Yes. Generate expected decision rows for allow, rewrite, redact, escalate, and refuse policies.