OpenML R Bot Benchmark Data (final subset)

Version 2 2018-03-21, 09:20

Version 1 2018-02-13, 09:09

dataset

posted on 2018-03-21, 09:20 authored by Daniel Kühn, Philipp Probst, Janek Thomas, Bernd Bischl

This is a clean subset of the data created by the OpenML R Bot, which executed benchmark experiments on the binary classification tasks of the OpenML100 benchmarking suite with six R algorithms: glmnet, rpart, kknn, svm, ranger and xgboost. The hyperparameters of these algorithms were drawn randomly. In total it contains more than 2.6 million benchmark experiments and can be used by other researchers.

The subset was created by taking 500,000 results for each learner (except for kknn, for which only 1140 results are available).
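The description does not say how the results were selected for each learner. As a minimal sketch, assuming uniform random sampling without replacement (an assumption, not the authors' documented procedure), a per-learner cap could look like the following. The function name `cap_per_learner`, the `learner` key, and the toy rows are all hypothetical.

```python
import random
from collections import defaultdict


def cap_per_learner(results, cap, seed=0):
    """Keep at most `cap` results per learner.

    Assumes uniform sampling without replacement; the dataset
    description does not state the actual selection rule.
    """
    rng = random.Random(seed)

    # Group result rows by the learner that produced them.
    by_learner = defaultdict(list)
    for row in results:
        by_learner[row["learner"]].append(row)

    subset = []
    for rows in by_learner.values():
        if len(rows) <= cap:
            # Learners with fewer results than the cap are kept whole,
            # as described for kknn.
            subset.extend(rows)
        else:
            subset.extend(rng.sample(rows, cap))
    return subset


# Synthetic toy rows, for illustration only (not taken from the dataset).
toy = [{"learner": "rpart", "run": i} for i in range(10)]
toy += [{"learner": "kknn", "run": i} for i in range(3)]

subset = cap_per_learner(toy, cap=5)

counts = defaultdict(int)
for row in subset:
    counts[row["learner"]] += 1
counts = dict(counts)
```

With a cap of 5, the toy rpart results are reduced to 5 rows while the 3 toy kknn rows are kept unchanged, mirroring the exception described for kknn.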

The CSV file for each learner is a table with one row per benchmark experiment. Each row contains the OpenML-Data ID, the hyperparameter values, the performance measures (AUC, accuracy, Brier score), the runtime, the scimark result (a runtime reference for the machine), and some meta features of the dataset.
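As a rough illustration of how such a one-row-per-experiment table might be used, the sketch below averages the AUC per OpenML data ID using only the Python standard library. The column names (`data_id`, `cp`, `maxdepth`, `auc`, `accuracy`, `brier`, `runtime`, `scimark`) and the values are assumptions made for this example and should be checked against the actual header of each CSV file.

```python
import csv
import io
from collections import defaultdict

# Synthetic stand-in for one learner's CSV file. Column names and values
# are hypothetical, chosen only to mirror the fields listed above.
TOY_CSV = """data_id,cp,maxdepth,auc,accuracy,brier,runtime,scimark
3,0.01,10,0.90,0.85,0.10,1.5,2000
3,0.10,5,0.80,0.75,0.20,1.0,2000
31,0.05,8,0.70,0.72,0.25,0.5,1800
"""


def mean_auc_per_dataset(csv_file):
    """Return {data_id: mean AUC} over all experiments in the table."""
    totals = defaultdict(lambda: [0.0, 0])  # data_id -> [sum of AUC, row count]
    for row in csv.DictReader(csv_file):
        entry = totals[int(row["data_id"])]
        entry[0] += float(row["auc"])
        entry[1] += 1
    return {data_id: s / n for data_id, (s, n) in totals.items()}


mean_auc = mean_auc_per_dataset(io.StringIO(TOY_CSV))
```

For a real file, the `io.StringIO(TOY_CSV)` argument would be replaced by an open file handle, for example `open(path, newline="")`, where `path` points to one of the downloaded learner files.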

OpenMLRandomBotResults.RData (format for R) contains all data in separate tables for the results, the hyperparameters, the meta features, the runtime, the scimark results and the reference results.