regal_data.zip (59.62 MB)

dataset
posted on 2022-01-29, 00:25 authored by David Kartchner
Preprocessed datasets used in REGAL: Rule-Guided Active Learning for Semi-Automated Weak Supervision. Each dataset is a dictionary containing 'train', 'valid', and 'test' keys corresponding to the respective train, validation, and test splits. The main dictionary also contains two other keys:
- 'class_names': A dictionary mapping each class to its name
- 'rule_keywords': A dictionary mapping each class to a list of seed rules for that class

Each split is a dictionary with the following keys:
- 'text': A list of strings, each of which is the text of one example
- 'labels': A torch.LongTensor containing the class label of each example
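
The layout described above can be checked with a short sketch like the following. It is illustrative only: the helper name `summarize_dataset` and the toy data are hypothetical, plain Python lists stand in for the torch.LongTensor labels so the example runs without PyTorch, and the integer keys in 'class_names' are an assumption about how classes are indexed. How the files inside regal_data.zip are serialized is not stated here, so loading is left out.

```python
from collections import Counter

SPLITS = ("train", "valid", "test")
TOP_LEVEL_KEYS = SPLITS + ("class_names", "rule_keywords")


def summarize_dataset(data):
    """Validate the documented layout and count examples per class name in each split."""
    missing = [key for key in TOP_LEVEL_KEYS if key not in data]
    if missing:
        raise KeyError(f"missing top-level keys: {missing}")

    summary = {}
    for split in SPLITS:
        texts = data[split]["text"]
        labels = data[split]["labels"]
        # Tensors expose .tolist(); any other sequence is copied into a list.
        labels = labels.tolist() if hasattr(labels, "tolist") else list(labels)
        if len(texts) != len(labels):
            raise ValueError(f"{split}: {len(texts)} texts but {len(labels)} labels")
        counts = Counter(labels)
        # Assumption: 'class_names' is keyed by the same integer ids used in 'labels'.
        summary[split] = {data["class_names"][c]: n for c, n in counts.items()}
    return summary


# Hypothetical toy dataset that mirrors the documented structure.
toy = {
    "train": {"text": ["great movie", "awful plot", "loved it"], "labels": [1, 0, 1]},
    "valid": {"text": ["not good"], "labels": [0]},
    "test": {"text": ["really fun"], "labels": [1]},
    "class_names": {0: "negative", 1: "positive"},
    "rule_keywords": {0: ["awful"], 1: ["great"]},
}

summary = summarize_dataset(toy)
print(summary)
```

With one of the real dataset dictionaries in place of `toy`, the same call should report the class balance of each split, provided the assumptions above hold.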
