vectors

class dialz.SteeringVector(model_type, directions)[source]

A per-layer steering direction for activation-level model control.

Steering vectors can be combined arithmetically (+, -, *, /), serialized to GGUF files, and applied to a SteeringModel via SteeringModel.set_control().
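The arithmetic combination can be pictured as layer-wise, elementwise operations on the direction mappings. The sketch below is illustrative only (the combine helper and toy 4-dimensional directions are hypothetical, not dialz internals):

```python
import numpy as np

# Hypothetical sketch of steering-vector arithmetic: directions are
# combined layer by layer, elementwise. `combine` is illustrative,
# not part of the dialz API.
def combine(a, b, op):
    """Combine two {layer_index: direction} mappings layer-wise."""
    dim = len(next(iter(a.values())))
    zero = np.zeros(dim)
    return {layer: op(a.get(layer, zero), b.get(layer, zero))
            for layer in set(a) | set(b)}

happy = {14: np.array([1.0, 0.0, 0.0, 0.0])}
calm = {14: np.array([0.0, 1.0, 0.0, 0.0])}

# Equivalent in spirit to `happy + 0.5 * calm` on SteeringVector objects.
blend = combine(happy, calm, lambda x, y: x + 0.5 * y)
print(blend[14])  # [1.  0.5 0.  0. ]
```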

model_type

HuggingFace model type string (e.g. "mistral").

directions

Mapping from layer index to direction vector.

export_gguf(path)[source]

Export this steering vector to a GGUF file.

Return type:

None

Note

The GGUF format is not yet supported by llama.cpp for steering vectors; this is a work-in-progress serialization target.

Parameters:

path – File path to write the .gguf file to.

Example:

vector = SteeringVector.train(model, dataset)
vector.export_gguf("vector.gguf")

classmethod import_gguf(path)[source]

Load a steering vector from a GGUF file.

Return type:

SteeringVector

Parameters:

path – Path to the .gguf file.

Returns:

The deserialized steering vector.

Raises:

ValueError – If required GGUF fields are missing or malformed.

classmethod train(model, dataset, method=Method.PCA, **kwargs)[source]

Train a SteeringVector from a contrastive dataset.

A tokenizer is loaded automatically from model.model_name.

Return type:

SteeringVector

Parameters:
  • model – The model to train against (must have model_name and token attributes).

  • dataset – The contrastive dataset used for training.

  • method – The extraction strategy. Accepts a Method enum, a string ("pca", "mean_diff", etc.), or any custom SteeringStrategy callable. Defaults to Method.PCA.

  • **kwargs

    Forwarded to read_representations(). Useful keys:

    • batch_size (int) – max batch size (default 32).

    • token_index (int) – token position index into non-padding tokens (default -1, last token).

Returns:

The trained steering vector.
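To illustrate what an extraction strategy computes, here is a toy sketch of the idea behind the "mean_diff" method (illustrative only; mean_diff as written here is hypothetical, and dialz's read_representations() operates on real model hidden states):

```python
import numpy as np

# Toy sketch of the "mean_diff" extraction idea: the steering direction
# for a layer is the difference between the mean activation over
# positive examples and the mean over negative examples.
def mean_diff(pos_acts, neg_acts):
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

rng = np.random.default_rng(0)
pos = rng.normal(loc=1.0, size=(8, 16))   # hidden states for positive prompts
neg = rng.normal(loc=-1.0, size=(8, 16))  # hidden states for negative prompts

direction = mean_diff(pos, neg)
print(direction.shape)  # one component per hidden dimension
```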

class dialz.SteeringModel(model_name, layer_ids, token=None, torch_dtype=torch.float16)[source]

A wrapped language model that can have controls set on its layers with self.set_control.

Note

Constructing a SteeringModel mutates the wrapped model in place. Be careful using the original model object after passing it to this class.

property config

Model configuration (delegates to the wrapped model).

property device

Device the model resides on.

forward(*args, **kwargs)[source]

Delegate to the wrapped model’s forward.

Return type:

Any

generate(*args, **kwargs)[source]

Delegate to the wrapped model’s generate.

Return type:

Any

reset()[source]

Reset the control on all layer_ids, returning the model to its base behavior.

Return type:

None

set_control(control, scalar=1.0, **kwargs)[source]

Apply a SteeringVector to the controllable layers.

Return type:

None

Parameters:
  • control – Steering vector whose layer directions will be applied.

  • scalar – Strength multiplier. Negative values invert the direction (e.g. happiness → sadness).

  • **kwargs – Passed to BlockControlParams. normalize (bool) rescales activations to their pre-control magnitude. operator (callable) overrides the default + combination.
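The normalize and operator semantics can be sketched as a simplified model of what happens inside a controlled layer (apply_control is hypothetical, not the dialz implementation):

```python
import numpy as np

# Simplified sketch of applying a control inside one layer:
#   output = operator(activation, scalar * direction)
# With normalize=True the result is rescaled to the activation's
# pre-control magnitude. `apply_control` is illustrative only.
def apply_control(activation, direction, scalar=1.0,
                  normalize=False, operator=np.add):
    controlled = operator(activation, scalar * direction)
    if normalize:
        pre = np.linalg.norm(activation)
        post = np.linalg.norm(controlled)
        controlled = controlled * (pre / post)
    return controlled

act = np.array([3.0, 4.0])   # pre-control magnitude is 5.0
vec = np.array([0.0, 2.0])
out = apply_control(act, vec, scalar=1.0, normalize=True)
print(round(float(np.linalg.norm(out)), 6))  # 5.0 -- magnitude preserved
```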

set_raw_control(control, **kwargs)[source]

Set or remove control parameters on the layers this SteeringModel handles. The keys of control must be equal to or a superset of the layer_ids passed to __init__; only those layers will be controlled, and any other keys in control are ignored.

Passing control=None resets the control tensor for all layer_ids, making the model act like a non-control model.

Return type:

None

Additional kwargs:

  • normalize (bool) – track the magnitude of the unmodified activation and rescale the controlled activation to that magnitude after control (default: False).

  • operator (Callable[[Tensor, Tensor], Tensor]) – how to combine the base output and the control (default: +).

unwrap()[source]

Remove the mutations applied to the wrapped model and return it. After calling this method, set_control and reset will no longer work.

Return type:

PreTrainedModel