Here we present MNIST4OD, a large dataset (in both number of dimensions and number of instances) suitable for outlier detection tasks.
The dataset is based on the famous MNIST dataset (http://yann.lecun.com/exdb/mnist/).
We build MNIST4OD in the following way:
To distinguish between outliers and inliers, we choose the images belonging to one digit (e.g. digit 1) as inliers, and we sample outliers with uniform probability from the remaining images, such that their number equals 10% of the number of inliers. We repeat this dataset generation process for all digits.
For implementation simplicity, we then flatten the images (28 x 28 pixels) into 784-dimensional vectors.
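The generation process described above can be sketched as follows. This is only an illustrative reconstruction, not the script used to build MNIST4OD: the function name build_outlier_dataset, its parameters, the use of NumPy, sampling without replacement, and rounding the outlier count to the nearest integer are all assumptions.

```python
import numpy as np


def build_outlier_dataset(images, labels, inlier_digit, outlier_ratio=0.10, seed=0):
    """Hypothetical helper: build one inlier/outlier dataset for a given digit.

    images: array of shape (n, 28, 28); labels: array of shape (n,) with digits 0-9.
    Returns (vectors, original_class, is_outlier).
    """
    rng = np.random.default_rng(seed)

    # All images of the chosen digit are inliers.
    inlier_idx = np.flatnonzero(labels == inlier_digit)
    other_idx = np.flatnonzero(labels != inlier_digit)

    # Outliers are drawn uniformly from the remaining images,
    # 10% of the inlier count by default (rounding is an assumption).
    n_outliers = int(round(outlier_ratio * inlier_idx.size))
    outlier_idx = rng.choice(other_idx, size=n_outliers, replace=False)

    idx = np.concatenate([inlier_idx, outlier_idx])

    # Flatten each 28 x 28 image into a 784-dimensional vector.
    vectors = images[idx].reshape(idx.size, -1)
    original_class = labels[idx]
    is_outlier = np.concatenate(
        [np.zeros(inlier_idx.size, dtype=bool), np.ones(n_outliers, dtype=bool)]
    )
    return vectors, original_class, is_outlier
```

Calling this helper once per digit (0 to 9) would give ten datasets, one for each choice of inlier class.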
Each file MNIST_x.csv.gz contains the corresponding dataset where the inlier class is equal to x.
Each line of the data contains one instance (vector), and the last column holds the outlier label (yes/no) of the data point. The data also contains a column indicating the original image class (0-9).
The following list gives the statistics of each dataset (Name | Instances | Dimensions | Number of Outliers in %):
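As a rough illustration of the file layout described above, the sketch below reads one MNIST_x.csv.gz file using only the Python standard library and separates the last column (the outlier label) from the rest. The function name read_mnist4od and the has_header flag are hypothetical. Whether the files include a header row, how the label is encoded, and where the original-class column sits are not stated here, so the sketch keeps every value as a string and leaves the class column among the remaining columns.

```python
import csv
import gzip


def read_mnist4od(path, has_header=False):
    """Hypothetical reader: return (rows_without_label, outlier_labels).

    Values are kept as strings because their exact encoding is an assumption
    to be checked against the actual files.
    """
    rows, outlier_labels = [], []
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row, if the file has one
        for record in reader:
            if not record:
                continue  # ignore blank lines
            rows.append(record[:-1])           # everything except the last column
            outlier_labels.append(record[-1])  # last column: outlier label
    return rows, outlier_labels
```

For example, read_mnist4od("MNIST_1.csv.gz") would return the rows of the dataset whose inlier class is digit 1, together with their outlier labels.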