FAQ

Frequently Asked Questions

Does abliteration ruin model quality?

No. When applied correctly, abliterated models retain > 95% of baseline benchmark scores while reducing refusal rates from 30-60% to under 5%.

What benchmarks do you use?

MMLU for general knowledge, HellaSwag for commonsense reasoning, TruthfulQA for factuality, and HumanEval for code generation. We also use a custom refusal eval set.

How do you prevent the model from answering actually harmful prompts?

Abliteration reduces blanket refusals. Policy Gateway enforces your specific rules about what should actually be refused. The combination gives you control without false positives.

Can I run these evals myself?

Yes. The methodology is fully documented here. Use the same eval sets against the abliteration.ai API to reproduce our results.