regal_data.zip (59.62 MB)
regal_data.zip
Preprocessed datasets used in REGAL: Rule-Guided Active Learning for Semi-Automated Weak Supervision. Each dataset is a dictionary containing 'train', 'valid', and 'test' keys corresponding to the respective train, validation, and test splits. The main dictionary also contains two other keys:
- 'class_names': A dictionary mapping each class to its name
- 'rule_keywords': A dictionary mapping each class to a list of seed rules for that class
Each split contains a dictionary the following keys:
Each split contains a dictionary the following keys:
- 'text': A list of strings, each of which is a training example
- 'labels': A torch.LongTensor containing the class label of each example
- 'text': List of strings. Each element is the text of one training example
- 'text': List of strings. Each element is the text of one training example
-