sorry, we can't preview this file
...but you can still download Benchmark_AFLOW_Data_Sets_for_Machine_Learning.zip
Benchmark AFLOW Data Sets for Machine Learning
datasetposted on 08.03.2020 by Conrad Clement, Steven Kauwe, Taylor Sparks
Datasets usually provide raw data for analysis. This raw data often comes in spreadsheet form, but can be any collection of data, on which analysis can be performed.
Materials informatics is increasingly finding ways to exploit machine learning algorithms. Techniques such as decision trees, ensemble methods, support vector machines, and a variety of neural network architectures are used to predict likely material characteristics and property values. Supplemented with laboratory synthesis, applications of machine learning to compound discovery and characterization represent one of the most promising research directions in materials informatics. A shortcoming of this trend, in its current form, is a lack of standardized materials data sets on which to train, validate, and test model effectiveness. Applied machine learning research depends on benchmark data to make sense of its results. Fixed, predetermined data sets allow for rigorous model assessment and comparison. Machine learning publications that don't refer to benchmarks are often hard to contextualize and reproduce. In this data descriptor article, we present a collection of data sets of different material properties taken from the AFLOW database. We describe them, the procedures that generated them, and their use as potential benchmarks. We provide a compressed ZIP file containing the data sets, and a GitHub repository of associated Python code. Finally, we discuss opportunities for future work incorporating the data sets and creating similar benchmark collections.