Platform · Training Data

Generate the labels other models refuse to write.

Generate labelled datasets at scale: preference pairs, eval rows, classifier examples, and the edge cases other LLMs refuse to write. Reasoning traces and live web-search grounding on every row; one-click export to Hugging Face, Kaggle, S3, GCS, or Azure. Schema-validated outputs, ready to drop into any training pipeline.

abliteration.ai/console/synthetic-data
Describe what you want

Generate 10,000 realistic customer support conversations about refunds and billing edge cases.

Chat SFT · 10,000 records · OpenAI training JSONL

billing-support-training.jsonl

10,000 rows · 4.2 MB · ready

OpenAI training JSONL
{"messages":[{"role":"user","content":"My card was charged twice for the same order — can I get one refunded?"},{"role":"assistant","content":"Of course — I can look that up. Could you share the order number?"}]}
{"messages":[{"role":"user","content":"Where's my order for SKU-A-12? It said shipped 4 days ago."},{"role":"assistant","content":"Let me pull the tracking. One moment…"}]}
{"messages":[{"role":"user","content":"Can you change my shipping address before the order leaves?"},{"role":"assistant","content":"Sure — what's the new address?"}]}
{"messages":[{"role":"user","content":"I want to cancel and get a refund. How long does it take?"},{"role":"assistant","content":"Got it. Cancellation processes immediately; refund hits in 3-5 business days."}]}
+ 9,996 more rows
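A quick sanity check on an exported file can catch malformed rows before a fine-tune job does. The sketch below assumes the chat-style layout shown in the sample rows (a `messages` list of `role`/`content` objects). The function name `validate_chat_row`, the allowed-role set, and the rule that the last turn comes from the assistant are illustrative assumptions, not part of the platform.

```python
import json

# Assumption: these are the only roles expected in an exported chat row.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_row(line: str) -> list[str]:
    """Return the problems found in one JSONL line (an empty list means OK)."""
    try:
        row = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    messages = row.get("messages") if isinstance(row, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unknown role {msg.get('role')!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i} has empty or non-string content")

    # Assumption: a supervised chat example should end on the assistant's turn.
    last = messages[-1]
    if isinstance(last, dict) and last.get("role") != "assistant":
        problems.append("last message should come from the assistant")
    return problems


sample = '{"messages":[{"role":"user","content":"Can I get a refund?"},{"role":"assistant","content":"Sure. What is the order number?"}]}'
print(validate_chat_row(sample))
```

Running the function over every line of the export and counting non-empty results gives a cheap pass/fail signal before upload.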
What you can generate

The categories that block your training pipeline.

Coded harassment

Multi-lingual dog-whistles and ingroup-coded slurs for moderation classifier training.

Phishing emails

Synthetic phishing across urgency, authority, and pretexting tactics for detection eval.

Prompt injections

Direct, indirect, ASCII-smuggling, and tool-confusion vectors for AppSec red-team training.

Adversarial edge cases

Refusal-eval prompts, policy edge cases, and known-jailbreak corpora for safety regression sets.

Reasoning

Reasoning included.

Most frontier APIs hide chain-of-thought out of distillation paranoia. We don't. Toggle Thinking in the console and the model's reasoning lands on every row as a schema-aware sidecar, ready to feed your next reasoning model.

row_00042.jsonl + reasoning · toggle: Thinking
prompt: Walk me through your reasoning. Why would a junior engineer prefer Postgres over MongoDB for an event-log table?
completion: Postgres wins for an event-log table when you need transactional integrity, point-in-time recovery, and structured queries over event metadata.
Reasoning · most providers strip this out
reasoning (Thinking + Export reasoning): Step 1: The use case is an event log, so the workload is append-heavy. Step 2: Compare strengths. Postgres gives ACID and a richer query surface; Mongo gives looser schema but no native transactions across collections at this scale. Step 3: Junior engineers benefit from constraints: schema enforcement catches bugs early. Step 4: Decide Postgres.
provenance: {"model":"abliterated-model","seed":420,"run_id":"R-018"}
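One way to use the reasoning sidecar downstream is to fold it into the training target. The sketch below assumes each exported row carries `prompt`, `completion`, and an optional `reasoning` field, as in the mockup above. The `<think>` wrapper and the helper name `to_sft_example` are illustrative choices, not a format the platform prescribes.

```python
def to_sft_example(row: dict, include_reasoning: bool = True) -> dict:
    """Build a prompt/completion pair, optionally prefixing the reasoning trace."""
    target = row["completion"]
    reasoning = row.get("reasoning")
    if include_reasoning and reasoning:
        # Assumption: the downstream model is trained to emit its reasoning
        # inside <think> tags before the final answer.
        target = f"<think>\n{reasoning}\n</think>\n{target}"
    return {"prompt": row["prompt"], "completion": target}


row = {
    "prompt": "Why would a junior engineer prefer Postgres for an event log?",
    "completion": "Postgres wins when you need transactional integrity.",
    "reasoning": "Step 1: The workload is append-heavy. Step 2: Compare strengths.",
}
print(to_sft_example(row)["completion"])
```

Passing `include_reasoning=False` yields the plain pair, so the same export can feed both a reasoning model and a standard SFT run.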
Preview & cost

Preview before you commit.

Every job starts with a paid 3-row preview generated against your spec, schema, and toggles. We charge real preview credits, then estimate the full run as the highest-cost preview row × your row target plus a reserve buffer. Top up if short. Promote to full job if it looks right.

preview.run · R-018 · preview ready
# | sample prompt | tokens in/out | cost
1 | Multi-lingual coded harassment, tier-2 severity. | 1,402 / 812 | $0.0052
2 | RAG-grounded answer about Article 53 timeline. | 1,538 / 921 | $0.0064 (max)
3 | Deepfake-script detector seed, voice-clone variant. | 1,318 / 784 | $0.0049
preview spent: $0.0165 / 3 rows
full-run estimate · max-row × 10,000 + 5% reserve: $0.0064 × 10,000 = $64.00; with 5% reserve: $67.20
Reserve required · $67.20
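The estimate above is simple enough to reproduce by hand. The sketch below assumes the rule as stated: take the most expensive preview row, multiply by the row target, and add the reserve. The function name, the use of `Decimal`, and the round-half-up rounding to cents are assumptions for illustration; the platform's exact rounding is not documented here.

```python
from decimal import Decimal, ROUND_HALF_UP


def estimate_full_run(preview_costs, target_rows, reserve_rate=Decimal("0.05")):
    """Highest-cost preview row x row target, plus a reserve buffer."""
    max_row = max(preview_costs)
    base = max_row * target_rows
    total = base * (1 + reserve_rate)
    # Assumption: the displayed figure is rounded to whole cents.
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


preview = [Decimal("0.0052"), Decimal("0.0064"), Decimal("0.0049")]
print(sum(preview))                        # 0.0165
print(estimate_full_run(preview, 10_000))  # 67.20
```

Using the maximum rather than the mean preview cost makes the estimate deliberately conservative: with these three rows, a mean-based estimate would come in lower than the reserve shown.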
Output formats

In the format your pipeline already speaks.

JSONL · OpenAI fine-tune · DPO / ORPO · Hugging Face datasets · Parquet · CSV
Export

Push to where your training pipeline lives.

Stream a run straight to the bucket, dataset hub, or notebook you already use. No middleman, no manual upload step.

See export options
run.flow
Generate
billing-support-training.jsonl
10,000 rows · 4.2 MB · ready
Pick destination
huggingface.co / your-org / billing-support
private dataset · CMEK
Pushed
✓ ready in HF Hub
under 30 seconds
Hugging Face Hub

One-click push to a private or public dataset card.

Kaggle

Push to a private Kaggle dataset for competitions and notebooks.

Amazon S3

Stream directly to your bucket. SSE-KMS supported.

Google Cloud Storage

Stream to GCS with service-account auth and CMEK.

Azure Blob

Write straight to a container with managed identity.

Direct download

Grab a signed URL and pull the run from your training box.

Schema & provenance

Typed once. Validated every row.

Define the dataset shape once. Every generated row is validated before it leaves the workspace, with provenance, tier, and policy label attached. No bespoke clean-up step downstream.

schema.json
{
  "schema": "preference-pair-v3",
  "fields": [
    { "name": "prompt",   "type": "string" },
    { "name": "chosen",   "type": "string" },
    { "name": "rejected", "type": "string" },
    { "name": "policy",   "type": "string" },
    { "name": "tier",     "type": "int"    }
  ]
}
sample.jsonl
{
  "prompt":   "Rewrite this support ticket in a civil tone.",
  "chosen":   "I understand your frustration. Let's solve the refund...",
  "rejected": "Stop being so rude. Read the FAQ.",
  "policy":   "civil-tone-v2",
  "tier":     1
}
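To make the "validated every row" idea concrete, here is a minimal sketch of a checker driven by the `schema.json` shape above. It assumes only the two type names that appear there (`string` and `int`); the helper name `validate_row` and the choice to reject unexpected fields are illustrative, and the platform's own validator may behave differently.

```python
# Maps the type names used in schema.json to Python types (assumed mapping).
TYPE_MAP = {"string": str, "int": int}

SCHEMA = {
    "schema": "preference-pair-v3",
    "fields": [
        {"name": "prompt", "type": "string"},
        {"name": "chosen", "type": "string"},
        {"name": "rejected", "type": "string"},
        {"name": "policy", "type": "string"},
        {"name": "tier", "type": "int"},
    ],
}


def validate_row(row: dict, schema: dict) -> list[str]:
    """Return a list of schema violations for one row (empty list means valid)."""
    expected = {f["name"]: f["type"] for f in schema["fields"]}
    errors = []
    for name, type_name in expected.items():
        if name not in row:
            errors.append(f"missing field: {name}")
            continue
        py_type = TYPE_MAP[type_name]
        value = row[name]
        # bool is a subclass of int in Python, so reject it explicitly for "int".
        if not isinstance(value, py_type) or (py_type is int and isinstance(value, bool)):
            errors.append(f"{name}: expected {type_name}, got {type(value).__name__}")
    for name in sorted(row.keys() - expected.keys()):
        errors.append(f"unexpected field: {name}")
    return errors


sample = {
    "prompt": "Rewrite this support ticket in a civil tone.",
    "chosen": "I understand your frustration. Let's solve the refund...",
    "rejected": "Stop being so rude. Read the FAQ.",
    "policy": "civil-tone-v2",
    "tier": 1,
}
print(validate_row(sample, SCHEMA))
```

A row with `"tier": "1"` would be reported as a type error, which is the kind of drift that otherwise surfaces late in a training run.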
Integrate

One line in your training script.

Once a run is exported, load it the same way you load every other dataset: Hugging Face datasets, the OpenAI fine-tune API, or your own RLHF stack.

train.py
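from datasets import load_dataset

# Hub repo ids take the form "namespace/name"; "your-org/billing-support"
# is the placeholder destination from the export flow above.
ds = load_dataset(
    "your-org/billing-support",
    split="train",
)
# Hand `ds` to the trainer you already use, for example
# Trainer(..., train_dataset=ds) in Hugging Face transformers, then trainer.train().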
Pricing

Free tier. Pay-as-you-go. Enterprise.

Generate up to 1M records per job. Tiered by records-per-month, with usage-based scaling for enterprise.

See pricing

Try the model that doesn’t say no.

Free tier. OpenAI-compatible. Policy Gateway when you scale.