regal_data.zip (59.62 MB)

dataset

posted on 2022-01-29, 00:25 authored by David Kartchner

Preprocessed datasets used in REGAL: Rule-Guided Active Learning for Semi-Automated Weak Supervision. Each dataset is a dictionary containing 'train', 'valid', and 'test' keys corresponding to the respective train, validation, and test splits. The main dictionary also contains two other keys:

- 'class_names': A dictionary mapping each class to its name

- 'rule_keywords': A dictionary mapping each class to a list of seed rules for that class

Each split is a dictionary with the following keys:

- 'text': A list of strings, each of which is the text of one example

- 'labels': A torch.LongTensor containing the class label of each example
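
The layout described above can be checked with a few lines of Python. The sketch below is illustrative only: the helper name `summarize_dataset`, the toy data, the integer class keys, and the keyword strings are assumptions, not part of the released files, and plain lists stand in for the torch.LongTensor labels so the example has no dependencies. It also assumes each split holds one label per text, as the description implies.

```python
# Hypothetical helper: these names are illustrative, not part of the REGAL release.
SPLITS = ("train", "valid", "test")
TOP_LEVEL_KEYS = ("class_names", "rule_keywords")
SPLIT_KEYS = ("text", "labels")


def summarize_dataset(dataset):
    """Check the documented layout and return the number of examples per split."""
    for key in SPLITS + TOP_LEVEL_KEYS:
        if key not in dataset:
            raise KeyError(f"missing top-level key: {key!r}")
    sizes = {}
    for split in SPLITS:
        part = dataset[split]
        for key in SPLIT_KEYS:
            if key not in part:
                raise KeyError(f"split {split!r} is missing key: {key!r}")
        # Assumption: one label per text, as the description implies.
        if len(part["text"]) != len(part["labels"]):
            raise ValueError(f"split {split!r}: text and labels differ in length")
        sizes[split] = len(part["text"])
    return sizes


# Toy stand-in data (assumed values); lists replace torch.LongTensor labels.
toy = {
    "class_names": {0: "negative", 1: "positive"},
    "rule_keywords": {0: ["bad", "awful"], 1: ["good", "great"]},
    "train": {"text": ["good movie", "awful plot"], "labels": [1, 0]},
    "valid": {"text": ["great cast"], "labels": [1]},
    "test": {"text": ["bad pacing"], "labels": [0]},
}

print(summarize_dataset(toy))
```

The same function should work on a real dataset once it has been loaded into memory, since torch.LongTensor also supports `len()`. How the files inside regal_data.zip are serialized is not stated here, so the loading step is left out.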

Keywords

weak supervision, natural language processing, text classification, machine learning, NLP, weakly supervised learning, Natural Language Processing, Computer-Human Interaction

Licence

CC BY 4.0
