score

dialz.score.get_activation_score(input_text, model, control_vector, layer_index=None, scoring_method='mean')[source]

Returns the activation score for the input_text by projecting hidden state(s) onto the given control_vector direction(s) for the specified layer(s). If multiple layers are provided, the activation scores are averaged.

Scoring methods:
  • ‘mean’: Average the dot products over all tokens.

  • ‘final_token’: Use only the dot product of the final token.

  • ‘max_token’: Use the maximum dot product value among all tokens.

  • ‘median_token’: Use the median of the dot product values among all tokens.

Parameters:
  • input_text (str) – The input string to evaluate.

  • control_vector (SteeringVector) – A ControlVector containing direction(s) keyed by layer index.

  • layer_index – An int or a list of ints representing the layer(s) to use. If None, defaults to the last controlled layer in model.layer_ids.

  • scoring_method (str) – A string specifying which scoring method to use.

Return type:

float

Returns:

A single float representing the averaged activation score.