datasets

class dialz.Dataset

A class to manage a dataset of positive and negative examples.

add_entry(positive, negative)

Adds a new DatasetEntry to the dataset.

Return type:

None

Parameters:

positive (str) – The positive example.
negative (str) – The negative example.
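
The behaviour described above can be illustrated with a minimal stand-in. The `Dataset` and `DatasetEntry` classes below are hypothetical re-implementations written only for this sketch, not the dialz source; they assume an entry is a simple pair of strings held in a list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetEntry:
    """Hypothetical stand-in: one positive/negative pair of strings."""
    positive: str
    negative: str


class Dataset:
    """Hypothetical stand-in mirroring the documented add_entry/view_dataset."""

    def __init__(self) -> None:
        self.entries: List[DatasetEntry] = []

    def add_entry(self, positive: str, negative: str) -> None:
        # Wrap the two strings in a DatasetEntry and append it; returns None.
        self.entries.append(DatasetEntry(positive=positive, negative=negative))

    def view_dataset(self) -> List[DatasetEntry]:
        # Return the list of all entries added so far.
        return self.entries


dataset = Dataset()
dataset.add_entry("I love mornings.", "I hate mornings.")
entries = dataset.view_dataset()
```

With the real package, the same two calls would presumably be made on `dialz.Dataset` instead of the stand-in.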

add_from_saved(saved_entries)

Adds entries from a pre-saved dataset.

Return type:

None

Parameters:

saved_entries (List[dict]) – A list of dictionaries, each containing “positive” and “negative” keys.
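
As a rough illustration of the expected input shape, the sketch below builds a `saved_entries` list and checks that every dictionary carries both keys. The helper `to_pairs` is hypothetical and not part of dialz; how the library itself reacts to a missing key is not stated here.

```python
from typing import Dict, List, Tuple

# The documented shape: a list of dicts with "positive" and "negative" keys.
saved_entries: List[Dict[str, str]] = [
    {"positive": "I love mornings.", "negative": "I hate mornings."},
    {"positive": "This is wonderful.", "negative": "This is terrible."},
]


def to_pairs(entries: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Hypothetical helper: validate the shape and return (positive, negative) tuples."""
    pairs = []
    for index, entry in enumerate(entries):
        missing = {"positive", "negative"} - set(entry)
        if missing:
            raise KeyError(f"entry {index} is missing keys: {sorted(missing)}")
        pairs.append((entry["positive"], entry["negative"]))
    return pairs


pairs = to_pairs(saved_entries)
```

Each resulting pair corresponds to one `add_entry(positive, negative)` call.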

classmethod create_dataset(model_name, contrastive_pair, system_role='Act as if you are extremely ', prompt_type='sentence-starters', num_sents=300)

Creates a dataset of positive and negative examples from a model name, a contrastive pair, and a set of prompt variations. The method uses the model's tokenizer to process the input prompts and applies a chat template to build one positive and one negative example for each variation. Each resulting pair is added to a new Dataset object.

Return type:

Dataset

Parameters:

cls – The class itself (passed implicitly when the method is called on the class).
model_name (str) – The name of the pre-trained model to use for tokenization.
contrastive_pair (list) – A list of two elements: the positive and the negative member of the contrastive pair.
system_role (str, optional) – A string representing the system’s role in the chat template. Defaults to “Act as if you are extremely ” (with a trailing space).
prompt_type (str, optional) – The type of prompt variations to use. Defaults to “sentence-starters”.
num_sents (int, optional) – The number of prompt variations to process. Defaults to 300.

Returns:

A dataset object containing the generated positive and negative examples.

Raises:

FileNotFoundError – If the specified prompt variations file does not exist.
json.JSONDecodeError – If the prompt variations file is not a valid JSON file.
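
The pairing logic described above might look roughly like the following sketch. Everything here is an assumption made for illustration: `render_chat` is a hypothetical placeholder for a model-specific chat template, `build_pairs` is not a dialz function, and the variations are passed in directly rather than read from a prompt variations file.

```python
from typing import List, Tuple


def render_chat(system: str, text: str) -> str:
    # Hypothetical placeholder for a tokenizer's chat template;
    # real templates differ from model to model.
    return f"[SYSTEM] {system}\n[ASSISTANT] {text}"


def build_pairs(
    contrastive_pair: List[str],
    variations: List[str],
    system_role: str = "Act as if you are extremely ",
    num_sents: int = 300,
) -> List[Tuple[str, str]]:
    """Assumed logic: one positive and one negative prompt per variation."""
    positive_word, negative_word = contrastive_pair
    pairs = []
    # Only the first num_sents variations are processed.
    for variation in variations[:num_sents]:
        positive = render_chat(f"{system_role}{positive_word}.", variation)
        negative = render_chat(f"{system_role}{negative_word}.", variation)
        pairs.append((positive, negative))
    return pairs


starters = ["I think that", "Today I will", "The weather is"]
pairs = build_pairs(["happy", "sad"], starters, num_sents=2)
```

The two prompts in each pair differ only in the contrastive word appended to the system role, which is the point of a contrastive dataset.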

classmethod load_dataset(model_name, name, num_sents=300)

Loads a default pre-saved corpus included in the package, re-applies chat templates to each entry, and limits the result to num_sents entries.

Return type:

Dataset

classmethod load_from_file(file_path)

Loads a dataset from a JSON file.

Return type:

Dataset

Parameters:

file_path (str) – The path to the JSON file containing the dataset.

Returns:

A new Dataset instance loaded from the file.

save_to_file(file_path)

Saves the dataset to a JSON file.

Return type:

None

Parameters:

file_path (str) – The path to the file where the dataset will be saved.
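
A save and load round trip can be sketched with the standard `json` module. The on-disk layout used here, a JSON list of dictionaries with “positive” and “negative” keys, is an assumption based on the shape accepted by add_from_saved; the actual file format written by dialz may differ.

```python
import json
import os
import tempfile

# Assumed layout: a JSON list of {"positive": ..., "negative": ...} objects.
entries = [
    {"positive": "I love mornings.", "negative": "I hate mornings."},
    {"positive": "This is wonderful.", "negative": "This is terrible."},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "dataset.json")

    # Save: serialise the entries to the JSON file.
    with open(file_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)

    # Load: read the same file back into a list of dicts.
    with open(file_path, "r", encoding="utf-8") as f:
        loaded = json.load(f)
```

If the assumption holds, the loaded list could be passed straight to add_from_saved.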

view_dataset()

Returns the current dataset as a list of DatasetEntry objects.

Return type:

List[DatasetEntry]

Returns:

The list of all entries in the dataset.