
MD&A + Audit Opinion dataset for Bankruptcy Prediction

Download (4.3 GB) This item is shared privately
dataset
modified on 2024-08-02, 07:04

The ECL dataset includes the Management's Discussion and Analysis (MD&A) section of each financial report and a binary label indicating whether the company filed for bankruptcy in the year following the report's filing date. In our work, we extended the ECL dataset with the Audit Opinion (AO) text. Specifically, we used an NLP tool (EDGAR-CORPUS) to extract the AO text where possible (it is usually found in Item 8 of the financial report) and aligned it with the existing filings in ECL. Our ECL + AO dataset consists of 50,345 training samples, with a positive ratio of 0.84%, and 18,275 test samples, with a positive ratio of 0.67%.
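The sample counts and positive ratios above imply a heavily imbalanced label distribution. The following minimal Python sketch estimates how many bankrupt (positive) samples each split contains. The helper name `approx_positive_count` is hypothetical, and because the published ratios are rounded to two decimals, the resulting counts are approximations and not figures taken from the dataset itself.

```python
def approx_positive_count(n_samples: int, positive_ratio_pct: float) -> int:
    """Estimate the number of positive samples from a rounded percentage."""
    return round(n_samples * positive_ratio_pct / 100)


# Sample counts and positive ratios (in percent) as stated in the description.
splits = {
    "train": (50345, 0.84),
    "test": (18275, 0.67),
}

summary = {}
for name, (n_samples, ratio_pct) in splits.items():
    positives = approx_positive_count(n_samples, ratio_pct)
    negatives = n_samples - positives
    summary[name] = {
        "positives": positives,
        "negatives": negatives,
        # Approximate number of non-bankrupt samples per bankrupt sample.
        "negatives_per_positive": negatives / positives,
    }
    print(f"{name}: ~{positives} positives, ~{negatives} negatives")
```

With these inputs the estimate is roughly 423 positives in the training set and 122 in the test set, which is over a hundred negatives per positive in both splits. Anyone training on this data will likely want to account for that imbalance, for example through class weighting or metrics that are robust to skewed labels.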


The training set covers companies from 1995 to 2015, and the test set covers companies from 2016 to 2021.
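The split is temporal and not random, so a sample's year determines which set it belongs to. A minimal Python sketch of that rule is shown below. The function name `assign_split` is hypothetical, and the description does not say whether the year refers to the filing date or the fiscal period, so the `year` argument here is an assumption.

```python
from typing import Optional

# Year ranges as stated in the description (inclusive on both ends).
TRAIN_YEARS = range(1995, 2016)  # 1995 to 2015
TEST_YEARS = range(2016, 2022)   # 2016 to 2021


def assign_split(year: int) -> Optional[str]:
    """Return 'train' or 'test' for a given year, or None if out of range."""
    if year in TRAIN_YEARS:
        return "train"
    if year in TEST_YEARS:
        return "test"
    return None


for example_year in (1995, 2015, 2016, 2021, 2022):
    print(example_year, assign_split(example_year))
```

Because the two year ranges do not overlap, every test sample postdates every training sample, so a model is evaluated only on years it has not seen.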