score

dialz.score.get_activation_score(input_text, model, control_vector, layer_index=None, scoring_method='mean')

Compute the activation score for input_text by projecting hidden states onto the control_vector direction(s) at the specified layer(s).

Return type:

float

Scoring methods:
  • ‘mean’: Average dot products over all tokens.

  • ‘final_token’: Dot product of the final token.

  • ‘max_token’: Maximum dot product among tokens.

  • ‘median_token’: Median dot product among tokens.
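
The four scoring methods differ only in how the per-token dot products are reduced to a single number. The following is a minimal sketch of that reduction in plain Python, not the library's actual implementation: `project_tokens` and `aggregate` are hypothetical helpers, the hidden states are toy values, and raising `ValueError` for an unknown method is an assumption.

```python
from statistics import mean, median


def project_tokens(hidden_states, direction):
    """Dot product of each token's hidden state with the direction vector."""
    return [sum(h * d for h, d in zip(token, direction)) for token in hidden_states]


def aggregate(dots, scoring_method="mean"):
    """Reduce per-token dot products to one score (hypothetical helper)."""
    if scoring_method == "mean":
        return mean(dots)
    if scoring_method == "final_token":
        return dots[-1]
    if scoring_method == "max_token":
        return max(dots)
    if scoring_method == "median_token":
        return median(dots)
    # Assumption: unknown methods are rejected; the real function may differ.
    raise ValueError(f"Unknown scoring_method: {scoring_method!r}")


# Toy example: three tokens with 2-dimensional hidden states.
hidden_states = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
direction = [1.0, 1.0]
dots = project_tokens(hidden_states, direction)  # [1.0, 2.0, 4.0]
```

With these toy values, ‘final_token’ and ‘max_token’ both give 4.0 and ‘median_token’ gives 2.0, while ‘mean’ gives roughly 2.33, which shows how a single high-scoring token affects each method differently.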

Parameters:
  • input_text (str) – The input string to evaluate.

  • model (SteeringModel) – The model to use for computing activations.

  • control_vector (SteeringVector) – Contains direction(s) keyed by layer index.

  • layer_index (int or list[int], optional) – Layer(s) to score. Defaults to the last entry in model.layer_ids.

  • scoring_method (str) – Scoring method to use: one of ‘mean’, ‘final_token’, ‘max_token’, or ‘median_token’. Defaults to ‘mean’.
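
Because layer_index accepts None, a single int, or a list of ints, it may help to see how such an argument could be normalized and how per-layer scores could then be averaged. This is a sketch under stated assumptions, not the library's code: `resolve_layers` and `average_over_layers` are hypothetical helpers, and the layer ids and scores are made-up values.

```python
def resolve_layers(layer_index, layer_ids):
    """Normalize layer_index to a list of layer ids (hypothetical helper)."""
    if layer_index is None:
        # Documented default: the last entry in model.layer_ids.
        return [layer_ids[-1]]
    if isinstance(layer_index, int):
        return [layer_index]
    return list(layer_index)


def average_over_layers(per_layer_scores, layers):
    """Average already-aggregated scores across the selected layers."""
    return sum(per_layer_scores[layer] for layer in layers) / len(layers)


# Made-up values standing in for model.layer_ids and per-layer scores.
layer_ids = [10, 11, 12]
per_layer_scores = {10: 0.5, 11: 1.5, 12: 2.5}

default_layers = resolve_layers(None, layer_ids)
two_layer_score = average_over_layers(per_layer_scores, resolve_layers([10, 11], layer_ids))
```

Here the default selects only layer 12, and passing [10, 11] averages the two per-layer scores to 1.0.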

Returns:

  • float: Averaged activation score across selected layers.

  • int: Number of tokens in the input.

  • list: Unaggregated dot product scores for each layer.
