Glossary
Orthogonalization: the process of making one vector orthogonal to another by subtracting its projection onto it. In activation editing, it removes a behavior direction from hidden states.
Abliteration: uses orthogonalization to remove the refusal direction from a model's hidden states.
h_orth = h - (h · v_hat) v_hat
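The formula above, where v_hat is the unit vector along the behavior direction, can be sketched in a few lines of NumPy (a minimal illustration; the function name orthogonalize is ours, not from any library):

```python
import numpy as np

def orthogonalize(h, v):
    """Remove the component of activation h along direction v.

    Implements h_orth = h - (h . v_hat) v_hat, where v_hat is the
    unit vector in the direction of v.
    """
    v_hat = v / np.linalg.norm(v)
    return h - np.dot(h, v_hat) * v_hat

# Example: remove the x-direction from [3, 4];
# only the orthogonal component [0, 4] remains.
h = np.array([3.0, 4.0])
v = np.array([2.0, 0.0])
print(orthogonalize(h, v))  # [0. 4.]
```

After the subtraction, the result has zero dot product with v, so repeating the operation is a no-op: the direction is fully removed in one step.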
FAQ

Does orthogonalization change model weights?
No. It is applied to activations at inference time, not to weights.

Why use orthogonalization for abliteration?
It cleanly removes the refusal component while leaving other information intact.