Materials Project Time Split Data

dataset

posted on 2022-06-04, 20:22 authored by Sterling G. Baird, Taylor Sparks

Full and dummy snapshots (2022-06-04) of data for `mp-time-split`, retrieved via the new Materials Project API and encoded via matminer convenience functions. The dataset is restricted to experimentally verified compounds with no more than 52 sites; no other filtering criteria were applied. The snapshots were developed for `sparks-baird/mp-time-split` as a benchmark dataset for materials generative modeling. Compressed versions of the files (.gz) are also available.

dtypes

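```python
from pprint import pprint

from matminer.utils.io import load_dataframe_from_json

# Replace with the path to a downloaded snapshot file
filepath = "insert/path/to/file/here.json"

# Load the snapshot into a pandas DataFrame
expt_df = load_dataframe_from_json(filepath)

# Print the Python type of each column's value in the first row
pprint(expt_df.iloc[0].apply(type).to_dict())
```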

{'discovery': , 'energy_above_hull': , 'formation_energy_per_atom': , 'material_id': , 'references': , 'structure': , 'theoretical': , 'year': }

index/mpids

The index is just the numeric part of the `material_id` (e.g. `mp-146` -> 146). Note that `material_id`s that begin with "mvc-" have the "mvc" dropped, while the hyphen (minus sign) is kept to distinguish between "mp-" and "mvc-" types while still allowing for sorting. E.g. `mvc-001` -> -1.

{146: MPID(mp-146), 925: MPID(mp-925), 1282: MPID(mp-1282), 1335: MPID(mp-1335), 12778: MPID(mp-12778), 2540: MPID(mp-2540), 316: MPID(mp-316), 1395: MPID(mp-1395), 2678: MPID(mp-2678), 1281: MPID(mp-1281), 1251: MPID(mp-1251)}
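The index convention described above can be sketched as a small function. This is a minimal illustration only: `mpid_to_index` is a hypothetical name, not part of `mp-time-split` or matminer, and it assumes that every identifier has the form `mp-<number>` or `mvc-<number>`.

```python
def mpid_to_index(material_id: str) -> int:
    """Hypothetical helper: map a material_id string to its integer index.

    "mp-<n>" maps to n, and "mvc-<n>" maps to -n, so the sign separates
    the two identifier types while keeping the index sortable.
    """
    prefix, _, number = material_id.partition("-")
    if prefix == "mp":
        return int(number)
    if prefix == "mvc":
        return -int(number)
    raise ValueError(f"Unrecognized material_id prefix: {material_id!r}")


# Sorting by the integer index places "mvc-" entries before "mp-" entries
ids = ["mp-146", "mvc-001", "mp-12"]
sorted_ids = sorted(ids, key=mpid_to_index)
print(sorted_ids)  # ['mvc-001', 'mp-12', 'mp-146']
```

Under this assumed mapping, all "mvc-" entries sort ahead of "mp-" entries because their indices are negative.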