datasets¶
- class dialz.Dataset[source]¶
A class to manage a dataset of positive and negative examples.
- add_entry(positive, negative)[source]¶
Adds a new DatasetEntry to the dataset.
- Return type:
None
- Parameters:
positive (str) – The positive example.
negative (str) – The negative example.
- add_from_saved(saved_entries)[source]¶
Adds entries from a pre-saved dataset.
- Return type:
None
- Parameters:
saved_entries (List[dict]) – A list of dictionaries, each containing “positive” and “negative” keys.
- classmethod create_dataset(model_name, contrastive_pair, system_role='Act as if you are extremely ', prompt_type='sentence-starters', num_sents=300)[source]¶
Creates a dataset by generating positive and negative examples based on a given model, contrastive pairs, and prompt variations. This function uses a tokenizer to process input prompts and applies a chat template to generate positive and negative examples for each variation. The resulting examples are added to a dataset object.
- Return type:
- Parameters:
cls – The class instance (used for accessing class methods).
model_name (str) – The name of the pre-trained model to use for tokenization.
contrastive_pair (list) – A list containing two elements representing the positive and negative contrastive pairs.
system_role (str, optional) – A string representing the system’s role in the chat template. Defaults to “Act as if you are extremely “.
prompt_type (str, optional) – The type of prompt variations to use. Defaults to “sentence-starters”.
num_sents (int, optional) – The number of prompt variations to process. Defaults to 300.
- Returns:
A dataset object containing the generated positive and negative examples.
- Return type:
- Raises:
FileNotFoundError – If the specified prompt variations file does not exist.
json.JSONDecodeError – If the prompt variations file is not a valid JSON file.
- classmethod load_dataset(model_name, name, num_sents=300)[source]¶
Loads a default pre-saved corpus included in the package, re-applies chat templates to each entry, and limits to num_sents.
- Return type: