Frequently Asked Questions

Is abliteration better than jailbreaking?

For production use, yes. Abliteration is stable across prompt variations, auditable, and reversible. Jailbreaks break unpredictably and cannot be governed.

Can I combine abliteration with fine-tuning?

Yes. You can abliterate a fine-tuned model, or fine-tune an abliterated model. The two techniques act on different mechanisms: fine-tuning adjusts many weights via gradient updates, while abliteration removes one specific refusal direction.

Do system-prompt guardrails actually work?

They work for honest users but provide no enforcement against adversarial prompts. Pair them with Policy Gateway for auditable enforcement.

Which method retains the most model capability?

Abliteration, because it removes a single narrow refusal direction and leaves everything orthogonal to that direction untouched. Fine-tuning adjusts many parameters at once and therefore risks broader capability shifts.
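
The "narrow refusal direction" claim can be made concrete. As a minimal sketch (not any particular library's API), the core operation is projecting a previously identified refusal direction out of an activation vector; the vectors below are toy values, and `ablate_direction` is a hypothetical helper name:

```python
import numpy as np

def ablate_direction(x: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Remove the component of activation x along direction r:
    x' = x - (x . r_hat) * r_hat.
    Everything orthogonal to r is left exactly as it was."""
    r_hat = r / np.linalg.norm(r)
    return x - np.dot(x, r_hat) * r_hat

# Toy example: 4-dim activation, hypothetical refusal direction along axis 3.
x = np.array([1.0, 2.0, 3.0, 4.0])
r = np.array([0.0, 0.0, 0.0, 1.0])

x_ablated = ablate_direction(x, r)
# Only the component along r is zeroed; the other coordinates are unchanged.
```

Because the edit is a rank-one projection, it is also straightforward to audit (the direction is an explicit vector) and to reverse (keep the original weights or drop the projection hook).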