Materials Project Time Split Data
Full and dummy snapshots (2022-06-04) of the data for mp-time-split, retrieved through the new Materials Project API and encoded with matminer convenience functions. The dataset is restricted to experimentally verified compounds with no more than 52 sites; no other filtering criteria were applied. The snapshots were developed for sparks-baird/mp-time-split as a benchmark dataset for materials generative modeling. Compressed versions of the files (.gz) are also available.
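If you download one of the compressed (.gz) files, it can be unpacked with the Python standard library before loading. The following is a minimal sketch, not part of the original record; the helper name `decompress_gz` and the file paths are hypothetical placeholders.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(src: str, dst: str) -> Path:
    """Stream-decompress the gzip file at `src` into `dst` and return the output path."""
    # Copy in chunks so that large snapshots are never held fully in memory.
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return Path(dst)
```

For example, `decompress_gz("insert/path/to/file/here.json.gz", "insert/path/to/file/here.json")` would write the uncompressed JSON next to the archive, assuming the archive exists at that placeholder path.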
dtypes
```python
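from pprint import pprint
from matminer.utils.io import load_dataframe_from_json
# Replace with the path to a downloaded snapshot file
filepath = "insert/path/to/file/here.json"
expt_df = load_dataframe_from_json(filepath)
# Print the Python type of each column, using the first row as a sample
pprint(expt_df.iloc[0].apply(type).to_dict())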
```
This prints a dictionary that maps each of the following columns to its Python type: `discovery`, `energy_above_hull`, `formation_energy_per_atom`, `material_id`, `references`, `structure`, `theoretical`, `year`.
index/mpids
The index is the numeric part of each `material_id` (just the number). For `material_id`s that begin with "mvc-", the "mvc" is dropped and the hyphen (minus sign) is kept, which distinguishes "mp-" from "mvc-" entries while still allowing for sorting. E.g. `mvc-001` -> -1.
{146: MPID(mp-146), 925: MPID(mp-925), 1282: MPID(mp-1282), 1335: MPID(mp-1335), 12778: MPID(mp-12778), 2540: MPID(mp-2540), 316: MPID(mp-316), 1395: MPID(mp-1395), 2678: MPID(mp-2678), 1281: MPID(mp-1281), 1251: MPID(mp-1251)}
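The index convention described above can be sketched as a small conversion function. This is an illustrative reconstruction under the stated rule, not the code used to build the snapshots; the function name `material_id_to_index` is hypothetical.

```python
def material_id_to_index(material_id: str) -> int:
    """Map a Materials Project ID string to the integer index convention.

    "mp-<n>" maps to n, and "mvc-<n>" maps to -n (the hyphen is kept as a minus sign).
    """
    prefix, _, number = material_id.partition("-")
    if prefix == "mp":
        return int(number)
    if prefix == "mvc":
        return -int(number)
    raise ValueError(f"Unrecognized material_id prefix: {material_id!r}")


# Sorting by the integer index places "mvc-" entries (negative) before "mp-" entries.
ids = ["mp-1282", "mvc-001", "mp-146"]
sorted_ids = sorted(ids, key=material_id_to_index)
```

Because the index is a plain integer, sorting is numeric rather than lexicographic, so `mp-146` comes before `mp-1282`.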
Funding
CAREER: SusChEM: Data Mining to Reduce the Risk in Discovering New Sustainable Thermoelectric Materials
Directorate for Mathematical & Physical Sciences