score¶
- dialz.score.get_activation_score(input_text, model, control_vector, layer_index=None, scoring_method='mean')[source]¶
Returns the activation score for the input_text by projecting hidden state(s) onto the given control_vector direction(s) for the specified layer(s). If multiple layers are provided, the activation scores are averaged.
- Scoring methods:
‘mean’: Average the dot products over all tokens.
‘final_token’: Use only the dot product of the final token.
‘max_token’: Use the maximum dot product value among all tokens.
‘median_token’: Use the median of the dot product values among all tokens.
- Parameters:
input_text (
str
) – The input string to evaluate.control_vector (
SteeringVector
) – A ControlVector containing direction(s) keyed by layer index.layer_index – An int or a list of ints representing the layer(s) to use. If None, defaults to the last controlled layer in model.layer_ids.
scoring_method (
str
) – A string specifying which scoring method to use.
- Return type:
float
- Returns:
A single float representing the averaged activation score.