Natural Language Understanding Dataset for DoD Cybersecurity Policies (CSIAC-DoDIN V1.0)
The CSIAC-DoDIN (V1.0) dataset collects cybersecurity-related policies and issuances developed by the DoD Deputy CIO for Cybersecurity. It is built on a knowledge base that clusters and classifies these policies into an organizational structure. Documents are annotated with policies, responsibilities, procedures, classification, purpose, scope, and applicability. The dataset also covers cluster and subcluster classification, type classification, and text entailment. It is available for research and experimentation, and baseline results using transformer language models are provided. Its limitations are its restriction to DoD cybersecurity policies, to English-language text, and to the tasks provided. The dataset can serve as a benchmark and as a basis for future cybersecurity policy datasets and applications, though users should remain cautious about the potential risks and biases associated with transformer language models.
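
To make the annotation categories listed above concrete, the following is a minimal sketch of how one annotated document and one entailment example might be represented in Python. The class names, field names, and the `group_by_cluster` helper are all hypothetical and chosen only to mirror the categories named in the description; they are not the dataset's actual schema or file format, which should be checked against the released files.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PolicyDocument:
    """Hypothetical record for one annotated issuance (field names are assumptions)."""
    title: str
    cluster: str          # cluster classification label
    subcluster: str       # subcluster classification label
    doc_type: str         # type classification label
    purpose: str = ""
    scope: str = ""
    applicability: str = ""
    classification: str = ""
    policies: List[str] = field(default_factory=list)
    responsibilities: List[str] = field(default_factory=list)
    procedures: List[str] = field(default_factory=list)


@dataclass
class EntailmentPair:
    """Hypothetical record for one text entailment example."""
    premise: str
    hypothesis: str
    label: str


def group_by_cluster(docs: List[PolicyDocument]) -> Dict[str, List[str]]:
    """Map each cluster label to the titles of the documents assigned to it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for doc in docs:
        grouped[doc.cluster].append(doc.title)
    return dict(grouped)


# Placeholder values only; these are not entries from the dataset.
example_docs = [
    PolicyDocument("Doc A", "cluster-a", "sub-1", "type-x"),
    PolicyDocument("Doc B", "cluster-b", "sub-2", "type-y"),
    PolicyDocument("Doc C", "cluster-a", "sub-3", "type-x"),
]
clusters = group_by_cluster(example_docs)
```

Keeping the classification labels as separate fields from the free-text annotations reflects the split the description makes between classification tasks and annotated content, but that separation is itself an assumption of this sketch.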