figshare

Natural Language Understanding Dataset for DoD Cybersecurity Policies (CSIAC-DoDIN V1.0)

Version 2 2023-11-21, 21:16
Version 1 2023-11-21, 20:58
dataset
posted on 2023-11-21, 21:16 authored by Ernesto Quevedo Caballero, Pablo Rivas, Ana Paula Arguelles, Alejandro Rodriguez, Jorge Yero, Dan Pienta, Tomas Cerny

The CSIAC-DoDIN (V1.0) dataset collects cybersecurity-related policies and issuances developed by the DoD Deputy CIO for Cybersecurity. It is based on a knowledge base that clusters and classifies these policies and gives them an organizational structure. The documents are annotated with policies, responsibilities, procedures, classification, purpose, scope, and applicability. The dataset also includes cluster and subcluster classification, type classification, and text entailment. It is available for research and experimentation, and baseline performances using transformer language models are provided. Its limitations are its focus on DoD cybersecurity policies, its restriction to the English language, and its restriction to the provided tasks. The dataset can serve as a benchmark and a basis for future cybersecurity policy datasets and applications, but caution should be exercised regarding the potential risks and biases associated with transformer language models.
