NOMAD Chemical Formulas and Calculation IDs

Version 3 2022-03-20, 03:28
Version 2 2022-03-07, 22:46
Version 1 2022-03-07, 21:32
dataset
posted on 2022-03-20, 03:28, authored by Sterling G. Baird
all-formula.csv contains two columns: calc_id (Calculation ID) and formula (Chemical Formula). The entries are restricted to VASP DFT calculations and exclude noble gases and radioactive elements. Some calculation IDs have missing chemical formulas. No structural information is included directly in this data.

unique-formula.csv filters the list down to unique (non-reduced) chemical formulas and lists the calc_id values for each unique formula.

unique-reduced-formula.csv is probably the file of most interest, because it is the most curated and is directly usable with, e.g., pymatgen. It contains three columns: calc_id, reduced_formula, and factor, which correspond to the Calculation ID, the reduced formula (e.g. Si2O4 --> SiO2), and the reduction factor (e.g. for Si2O4 --> SiO2 the factor is 2). The formulas were first parsed via the pymatgen.core.Composition class.

Going from all-formula.csv to unique-formula.csv to unique-reduced-formula.csv gives 11680557 --> 764431 --> 695612 rows.
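To make the relationship between a formula, its reduced form, and the factor concrete, here is a minimal sketch using only the Python standard library. It is not the pipeline used to build this dataset (that relied on pymatgen.core.Composition). The function name reduce_formula is hypothetical, and the sketch assumes flat formulas with integer counts (no parentheses, no fractional amounts). Unlike pymatgen, it keeps the elements in their input order rather than reordering them.

```python
import re
from functools import reduce
from math import gcd

# One element symbol (e.g. "Si", "O") followed by an optional integer count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def reduce_formula(formula: str) -> tuple[str, int]:
    """Return (reduced_formula, factor) for a flat, integer-count formula.

    Hypothetical helper for illustration only; it does not validate
    element symbols or handle parentheses or fractional amounts.
    """
    counts: dict[str, int] = {}
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:
            # There is text between tokens that the pattern cannot parse.
            raise ValueError(f"cannot parse formula: {formula!r}")
        symbol, digits = match.groups()
        amount = int(digits) if digits else 1
        if amount == 0:
            raise ValueError(f"zero count in formula: {formula!r}")
        counts[symbol] = counts.get(symbol, 0) + amount
        pos = match.end()
    if pos != len(formula) or not counts:
        raise ValueError(f"cannot parse formula: {formula!r}")

    # The factor is the greatest common divisor of all element counts.
    factor = reduce(gcd, counts.values())
    parts = []
    for symbol, amount in counts.items():
        reduced_amount = amount // factor
        parts.append(symbol if reduced_amount == 1 else f"{symbol}{reduced_amount}")
    return "".join(parts), factor


print(reduce_formula("Si2O4"))  # ('SiO2', 2)
```

A formula whose counts share no common divisor, such as Fe2O3, comes back unchanged with a factor of 1.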

Finally, bad-formula.csv contains the 15 formulas that were skipped during processing because pymatgen.core.Composition could not parse them, for various reasons.
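The de-duplication and skipping steps described above can be sketched as follows, again with only the standard library. This is an illustration under stated assumptions, not the author's code: the sample rows and calc_id values are made up, the CSV header is assumed to match the column names calc_id and formula, and a simple regular expression stands in for pymatgen.core.Composition as the check for whether a formula is parseable.

```python
import csv
import io
import re

# Stand-in validity check (assumption): a run of element symbols, each with
# an optional integer count. The real pipeline used pymatgen instead.
_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")

# Made-up rows in the assumed layout of all-formula.csv.
SAMPLE = """calc_id,formula
id-1,Si2O4
id-2,Si2O4
id-3,
id-4,Fe2O3
id-5,??bad
"""


def group_by_formula(csv_text):
    """Group calc_ids by unique formula and collect unparseable formulas."""
    unique = {}  # formula -> list of calc_ids that share it
    bad = []     # formulas that failed the stand-in check
    for row in csv.DictReader(io.StringIO(csv_text)):
        formula = (row["formula"] or "").strip()
        if not formula:
            # Some calculation IDs have no formula at all; skip them.
            continue
        if not _FORMULA.fullmatch(formula):
            bad.append(formula)
            continue
        unique.setdefault(formula, []).append(row["calc_id"])
    return unique, bad


unique, bad = group_by_formula(SAMPLE)
print(unique)  # {'Si2O4': ['id-1', 'id-2'], 'Fe2O3': ['id-4']}
print(bad)     # ['??bad']
```

Keeping the rejected formulas in their own list mirrors the role of bad-formula.csv: nothing is silently dropped, so the skipped entries can be inspected later.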

The data was downloaded on 2022-03-07. See the links below (especially the nomad-examples GitHub repository) for details on the data download and filtering process.

Funding

CAREER: SusChEM: Data Mining to Reduce the Risk in Discovering New Sustainable Thermoelectric Materials

Directorate for Mathematical & Physical Sciences

History