Glossary
Refusal vector ablation removes the refusal direction from hidden states.
It is the core operation behind abliteration.
Refusal vector ablation is the process of subtracting the component of a model's hidden states that lies along a learned refusal direction, reducing refusals without retraining the model.
h_ablated = h - (h · r_hat) r_hat, where h is a hidden-state vector and r_hat is the unit-norm refusal direction.
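As a concrete illustration of this projection, the sketch below removes the refusal component from a batch of hidden states with NumPy. The function name, argument names, and shapes are assumptions made for the example, not part of any particular library.

import numpy as np

def ablate_refusal_direction(hidden, refusal_dir):
    """Return hidden states with their component along the refusal direction removed."""
    r_hat = refusal_dir / np.linalg.norm(refusal_dir)   # normalize to unit length
    coeff = hidden @ r_hat                               # (h · r_hat) for each hidden state
    return hidden - coeff[..., np.newaxis] * r_hat       # h - (h · r_hat) r_hat

# Example: after ablation the hidden states have (numerically) zero component along r_hat.
hidden = np.random.randn(4, 8)        # 4 hidden states of dimension 8
refusal_dir = np.random.randn(8)      # unnormalized refusal direction
ablated = ablate_refusal_direction(hidden, refusal_dir)
print(np.allclose(ablated @ (refusal_dir / np.linalg.norm(refusal_dir)), 0.0))  # True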
FAQ
Is refusal vector ablation the same as fine-tuning?
No. It is a deterministic edit to activations, not a gradient-based weight update.
Is refusal vector ablation reversible?
Yes. Because the edit is applied at inference time, you can remove it or adjust its strength.
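As a sketch of adjusting the strength at inference time, the variant below scales the subtracted component by a coefficient. The function name and the alpha parameter are assumptions made for this example: alpha=1.0 recovers the full ablation and alpha=0.0 leaves the hidden state unchanged.

import numpy as np

def ablate_with_strength(hidden, refusal_dir, alpha=1.0):
    """Subtract alpha times the refusal component; smaller alpha means a weaker edit."""
    r_hat = refusal_dir / np.linalg.norm(refusal_dir)    # unit-norm refusal direction
    coeff = hidden @ r_hat                                # (h · r_hat)
    return hidden - alpha * coeff[..., np.newaxis] * r_hat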