datasets

class dialz.Dataset

A class to manage a dataset of positive and negative examples.

add_entry(positive, negative)

Adds a new DatasetEntry to the dataset.

Return type:

None

Parameters:
  • positive (str) – The positive example.

  • negative (str) – The negative example.

add_from_saved(saved_entries)

Adds entries from a pre-saved dataset.

Return type:

None

Parameters:

saved_entries (List[dict]) – A list of dictionaries, each containing “positive” and “negative” keys.
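
As a minimal sketch of the input shapes that add_entry and add_from_saved expect, the following uses hypothetical stand-in classes (SketchEntry and SketchDataset, which are not part of dialz) that mirror the two documented signatures. The stand-in's internals are an assumption; only the argument shapes come from the descriptions above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SketchEntry:
    """Hypothetical stand-in for DatasetEntry: one positive/negative pair."""
    positive: str
    negative: str


class SketchDataset:
    """Hypothetical stand-in that mirrors the documented method signatures."""

    def __init__(self) -> None:
        self.entries: List[SketchEntry] = []

    def add_entry(self, positive: str, negative: str) -> None:
        # Append a single contrastive pair.
        self.entries.append(SketchEntry(positive, negative))

    def add_from_saved(self, saved_entries: List[Dict[str, str]]) -> None:
        # Each dictionary must carry "positive" and "negative" keys.
        for item in saved_entries:
            self.add_entry(item["positive"], item["negative"])


dataset = SketchDataset()
dataset.add_entry("I love this.", "I hate this.")
dataset.add_from_saved([
    {"positive": "What a great day.", "negative": "What a terrible day."},
])
print(len(dataset.entries))
```

A dictionary that lacks either key raises KeyError in this sketch; how dialz itself handles malformed entries is not stated here.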

classmethod create_dataset(model_name, contrastive_pair, system_role='Act as if you are extremely ', prompt_type='sentence-starters', num_sents=300)

Creates a dataset of positive and negative examples from a model name, a contrastive pair, and a set of prompt variations. The model's tokenizer applies a chat template to each prompt variation to produce one positive and one negative example, and each resulting pair is added to the returned Dataset.

Return type:

Dataset

Parameters:
  • cls – The class itself, passed implicitly as with any classmethod.

  • model_name (str) – The name of the pre-trained model to use for tokenization.

  • contrastive_pair (list) – A list of two elements: the positive and the negative member of the contrastive pair.

  • system_role (str, optional) – A string representing the system’s role in the chat template. Defaults to “Act as if you are extremely ” (note the trailing space).

  • prompt_type (str, optional) – The type of prompt variations to use. Defaults to “sentence-starters”.

  • num_sents (int, optional) – The number of prompt variations to process. Defaults to 300.

Returns:

A dataset object containing the generated positive and negative examples.

Raises:
  • FileNotFoundError – If the specified prompt variations file does not exist.

  • json.JSONDecodeError – If the prompt variations file is not a valid JSON file.
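
To make the description concrete, here is a rough sketch of how one positive/negative pair per prompt variation could be assembled. It is not the dialz implementation: format_chat is a hypothetical placeholder for the tokenizer's chat template, build_pairs and the sentence starters are invented for illustration, and appending each member of the contrastive pair directly to system_role is an assumption suggested by the trailing space in the default value.

```python
from typing import List, Tuple


def format_chat(system: str, user: str) -> str:
    # Hypothetical placeholder for a model-specific chat template.
    return f"<system>{system}</system><user>{user}</user>"


def build_pairs(
    contrastive_pair: List[str],
    starters: List[str],
    system_role: str = "Act as if you are extremely ",
    num_sents: int = 300,
) -> List[Tuple[str, str]]:
    positive_word, negative_word = contrastive_pair
    pairs = []
    # Use at most num_sents prompt variations.
    for starter in starters[:num_sents]:
        # Assumption: the pair member is appended to the system role.
        positive = format_chat(system_role + positive_word, starter)
        negative = format_chat(system_role + negative_word, starter)
        pairs.append((positive, negative))
    return pairs


pairs = build_pairs(
    ["happy", "sad"],
    ["I think that", "Today I will"],
    num_sents=1,
)
print(pairs[0][0])
print(pairs[0][1])
```

The two prompts in each pair differ only in the contrastive word, so any difference between the examples comes from that word alone.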

classmethod load_dataset(model_name, name, num_sents=300)

Loads a default pre-saved corpus included in the package, re-applies chat templates to each entry, and keeps at most num_sents entries.

Return type:

Dataset

classmethod load_from_file(file_path)

Loads a dataset from a JSON file.

Return type:

Dataset

Parameters:

file_path (str) – The path to the JSON file containing the dataset.

Returns:

A new Dataset instance loaded from the file.

save_to_file(file_path)

Saves the dataset to a JSON file.

Return type:

None

Parameters:

file_path (str) – The path to the file where the dataset will be saved.
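
The on-disk layout used by save_to_file and load_from_file is not spelled out here. Assuming it matches the list of dictionaries with “positive” and “negative” keys that add_from_saved accepts, a save/load round trip could look like the sketch below. It uses the standard json module rather than dialz itself, and the helper names save_entries and load_entries are hypothetical.

```python
import json
import os
import tempfile
from typing import Dict, List


def save_entries(entries: List[Dict[str, str]], file_path: str) -> None:
    # Write the entries as a JSON array of objects.
    with open(file_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)


def load_entries(file_path: str) -> List[Dict[str, str]]:
    # Read the JSON array back into a list of dictionaries.
    with open(file_path, "r", encoding="utf-8") as f:
        return json.load(f)


entries = [{"positive": "I love this.", "negative": "I hate this."}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dataset.json")
    save_entries(entries, path)
    restored = load_entries(path)

print(restored == entries)
```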

view_dataset()

Returns the current dataset as a list of DatasetEntry objects.

Return type:

List[DatasetEntry]

Returns:

The list of all entries in the dataset.
