score¶
- dialz.score.get_activation_score(input_text, model, control_vector, layer_index=None, scoring_method='mean')[source]¶
Compute the activation score for input_text by projecting hidden states onto the control_vector direction(s) at the specified layer(s).
- Return type:
float
- Scoring methods:
‘mean’: Average dot products over all tokens.
‘final_token’: Dot product of the final token.
‘max_token’: Maximum dot product among tokens.
‘median_token’: Median dot product among tokens.
- Parameters:
input_text (str) – The input string to evaluate.
model (SteeringModel) – The model to use for computing activations.
control_vector (SteeringVector) – Contains direction(s) keyed by layer index.
layer_index (int or list[int], optional) – Layer(s) to use. Defaults to last in model.layer_ids.
scoring_method (str) – Scoring method to use.
- Returns:
Averaged activation score across selected layers. int: Number of tokens in the input. list: Unaggregated dot product scores for each layer.
- Return type:
float