Frequently Asked Questions

Is a refusal vector a model weight?

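No. It is a direction in activation space, computed from hidden states rather than learned. A typical derivation takes the difference between the mean activations on prompts the model refuses and on prompts it answers. It is not a new set of learned weights.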
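
To make "a direction derived from hidden states" concrete, here is a minimal sketch of one common approach, a difference of means. It assumes you have already collected hidden-state vectors for two prompt sets. The function names and the toy numbers are illustrative, not part of any particular library.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def refusal_direction(refused_acts, answered_acts):
    """Unit vector from the mean 'answered' activation to the mean 'refused' one."""
    mu_refused = mean_vector(refused_acts)
    mu_answered = mean_vector(answered_acts)
    diff = [r - a for r, a in zip(mu_refused, mu_answered)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("means coincide; no direction can be derived")
    return [x / norm for x in diff]


# Toy hidden states (made-up values, 3 dimensions for readability).
refused = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
answered = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
direction = refusal_direction(refused, answered)
```

Nothing here is trained. The result is a statistic of activations, which is why it is not a weight.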

Does removing a refusal vector break the model?

Usually not. When applied carefully, removing the vector targets refusal behavior without broadly degrading other capabilities.
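
As a rough illustration of why the effect can stay narrow, removal can be sketched as projecting one direction out of a hidden state: only the component along that direction changes. This is a simplified sketch that assumes a unit-length direction. The `ablate` helper and the sample values are hypothetical.

```python
def ablate(hidden, direction):
    """Remove the component of `hidden` along the unit vector `direction`."""
    coeff = sum(h * d for h, d in zip(hidden, direction))
    return [h - coeff * d for h, d in zip(hidden, direction)]


# Assumed unit-length refusal direction and one toy hidden state.
direction = [1.0, 0.0, 0.0]
hidden = [5.0, 2.0, -1.0]
ablated = ablate(hidden, direction)

# The projection onto the direction is now zero; other components are unchanged.
residual = sum(a * d for a, d in zip(ablated, direction))
```

In a real model the same projection would be applied at the chosen layers or folded into the weights. How much capability survives depends on where and how strongly it is applied.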

Can I combine this with policy filters?

Yes. Abliteration reduces blanket refusals, while policy layers enforce your specific rules.
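
One way to picture the split is a thin wrapper around generation that checks the prompt and the reply against your own rules. The sketch below assumes a simple blocked-term policy and a stand-in model function. `guarded_generate`, `echo_model`, and the refusal message are hypothetical names for illustration only.

```python
from typing import Callable, Iterable


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    blocked_terms: Iterable[str],
    refusal_message: str = "This request is not allowed by policy.",
) -> str:
    """Apply a policy check before and after calling the model."""
    terms = [t.lower() for t in blocked_terms]

    def violates(text: str) -> bool:
        lowered = text.lower()
        return any(t in lowered for t in terms)

    if violates(prompt):
        return refusal_message
    reply = generate(prompt)
    if violates(reply):
        return refusal_message
    return reply


def echo_model(prompt: str) -> str:
    """Stand-in for an abliterated model; it never refuses on its own."""
    return "You said: " + prompt


allowed = guarded_generate("hello", echo_model, ["forbidden"])
blocked = guarded_generate("a FORBIDDEN topic", echo_model, ["forbidden"])
```

A production policy layer would likely use a classifier or rule engine instead of substring matching. The point is only that the rules live outside the model.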