vectors

class dialz.SteeringVector(model_type, directions)[source]

A per-layer steering direction for activation-level model control.

Steering vectors can be combined arithmetically (+, -, *, /), serialized to GGUF files, and applied to a SteeringModel via SteeringModel.set_control().
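The arithmetic combination can be pictured as layer-wise, elementwise operations on the direction mappings. The sketch below is illustrative only (the combine helper and toy 4-dimensional directions are hypothetical, not dialz internals):

```python
import numpy as np

# Hypothetical sketch of steering-vector arithmetic: directions are
# combined layer by layer, elementwise. `combine` is illustrative,
# not part of the dialz API.
def combine(a, b, op):
    """Combine two {layer_index: direction} mappings layer-wise."""
    dim = len(next(iter(a.values())))
    zero = np.zeros(dim)
    return {layer: op(a.get(layer, zero), b.get(layer, zero))
            for layer in set(a) | set(b)}

happy = {14: np.array([1.0, 0.0, 0.0, 0.0])}
calm = {14: np.array([0.0, 1.0, 0.0, 0.0])}

# Equivalent in spirit to `happy + 0.5 * calm` on SteeringVector objects.
blend = combine(happy, calm, lambda x, y: x + 0.5 * y)
print(blend[14])  # [1.  0.5 0.  0. ]
```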

model_type

HuggingFace model type string (e.g. "mistral").

directions

Mapping from layer index to direction vector.

export_gguf(path)[source]

Export this steering vector to a GGUF file.

Return type:

None

Note

The GGUF format is not yet supported by llama.cpp for steering vectors; this is a work-in-progress serialization target.

Parameters:

path – File path to write the .gguf file to.

Example:

vector = SteeringVector.train(model, dataset)
vector.export_gguf("vector.gguf")

classmethod import_gguf(path)[source]

Load a steering vector from a GGUF file.

Return type:

SteeringVector

Parameters:

path – Path to the .gguf file.

Returns:

The deserialized steering vector.

Raises:

ValueError – If required GGUF fields are missing or malformed.

classmethod train(model, dataset, method=Method.PCA, **kwargs)[source]

Train a SteeringVector from a contrastive dataset.

A tokenizer is loaded automatically from model.model_name.

Return type:

SteeringVector

Parameters:
  • model – The model to train against (must have model_name and token attributes).

  • dataset – The contrastive dataset used for training.

  • method – The extraction strategy. Accepts a Method enum, a string ("pca", "mean_diff", etc.), or any custom SteeringStrategy callable. Defaults to Method.PCA.

  • **kwargs

    Forwarded to read_representations(). Useful keys:

    • batch_size (int) – max batch size (default 32).

    • token_index (int) – token position index into non-padding tokens (default -1, last token).

Returns:

The trained steering vector.
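To illustrate what an extraction strategy computes, here is a toy sketch of the idea behind the "mean_diff" method (illustrative only; mean_diff as written here is hypothetical, and dialz's read_representations() operates on real model hidden states):

```python
import numpy as np

# Toy sketch of the "mean_diff" extraction idea: the steering direction
# for a layer is the difference between the mean activation over
# positive examples and the mean over negative examples.
def mean_diff(pos_acts, neg_acts):
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

rng = np.random.default_rng(0)
pos = rng.normal(loc=1.0, size=(8, 16))   # hidden states for positive prompts
neg = rng.normal(loc=-1.0, size=(8, 16))  # hidden states for negative prompts

direction = mean_diff(pos, neg)
print(direction.shape)  # one component per hidden dimension
```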

class dialz.SteeringModel(model_name, layer_ids, token=None, torch_dtype=torch.float16)[source]

A wrapped language model that can have controls set on its layers with self.set_control.

Note

Constructing a SteeringModel mutates the wrapped model in place. Be careful using the original model object after passing it to this class.

property config

Model configuration (delegates to the wrapped model).

property device

Device the model resides on.

forward(*args, **kwargs)[source]

Delegate to the wrapped model’s forward.

Return type:

Any

generate(*args, **kwargs)[source]

Delegate to the wrapped model’s generate.

Return type:

Any

reset()[source]

Reset the control on all layer_ids, returning the model to its base behavior.

Return type:

None

set_control(control, scalar=1.0, **kwargs)[source]

Apply a SteeringVector to the controllable layers.

Return type:

None

Parameters:
  • control – Steering vector whose layer directions will be applied.

  • scalar – Strength multiplier. Negative values invert the direction (e.g. happiness → sadness).

  • **kwargs – Passed to BlockControlParams. normalize (bool) rescales activations to their pre-control magnitude. operator (callable) overrides the default + combination.
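The normalize and operator semantics can be sketched as a simplified model of what happens inside a controlled layer (apply_control is hypothetical, not the dialz implementation):

```python
import numpy as np

# Simplified sketch of applying a control inside one layer:
#   output = operator(activation, scalar * direction)
# With normalize=True the result is rescaled to the activation's
# pre-control magnitude. `apply_control` is illustrative only.
def apply_control(activation, direction, scalar=1.0,
                  normalize=False, operator=np.add):
    controlled = operator(activation, scalar * direction)
    if normalize:
        pre = np.linalg.norm(activation)
        post = np.linalg.norm(controlled)
        controlled = controlled * (pre / post)
    return controlled

act = np.array([3.0, 4.0])   # pre-control magnitude is 5.0
vec = np.array([0.0, 2.0])
out = apply_control(act, vec, scalar=1.0, normalize=True)
print(round(float(np.linalg.norm(out)), 6))  # 5.0 -- magnitude preserved
```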

set_raw_control(control, **kwargs)[source]

Set or remove control parameters on the layers this SteeringModel handles. The keys of control must be equal to or a superset of the layer_ids passed to __init__; only those layers will be controlled, and any other keys in control are ignored.

Passing control=None resets the control tensor for all layer_ids, making the model act like a non-control model.

Return type:

None

Additional kwargs:

  • normalize (bool) – track the magnitude of the unmodified activation and rescale the controlled activation to that magnitude after control (default: False).

  • operator (Callable[[Tensor, Tensor], Tensor]) – how to combine the base output and the control (default: +).

unwrap()[source]

Remove the mutations applied to the wrapped model and return it. After calling this method, set_control and reset will no longer work.

Return type:

PreTrainedModel