Improving deep representation learning for crystal structures by learning and hybridizing human-designed descriptors

Gong, Sheng

doi:10.6084/m9.figshare.19654224.v5

journal contribution

Posted on 2023-08-31, 20:08; authored by Sheng Gong

Update: on 07/11/2023, we uploaded an extended version that adds Matformer and MEGNet as two more models studied in this work.

--------------------------------------------------------------------------------------------------------------------------------------------------------

Update: on 12/11/2022, we uploaded a corrected version of the datasets and models. In the previous version, the models and datasets for internal energy, Cv, and poly_electronic from dealignn were wrong.

______________________________________________

This folder contains all the datasets and trained models for the paper entitled "Examining graph neural networks for crystal structures: limitations and opportunities for capturing periodicity" (https://arxiv.org/abs/2208.05039).

Please note that, because the paper is more about insights into existing GNN methods than about a completely new code base, we provide revised versions of CGCNN and ALIGNN in this repository. Anyone interested in the revised code should download it from here and run each version in the same environment as the original CGCNN or ALIGNN.

CGCNN: https://github.com/txie-93/cgcnn

ALIGNN: https://github.com/usnistgov/alignn

decgcnn: contains the revised CGCNN codes for the de-CGCNN.

Please specify the feature list at the beginning of main.py and predict.py, and enter the path of the descriptors file in both scripts. The usage of "main.py" and "predict.py" here is the same as in the original CGCNN.

dealignn: contains the revised ALIGNN codes for the de-ALIGNN.

Please specify the feature list and the path of the descriptors file at the beginning of fine-tuning.py.

Also, please put the script "degraph.py" in the directory "/anaconda3/envs/ALIGNN/lib/python3.8/site-packages/jarvis/core", or the equivalent directory in your environment.
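Because the location of "jarvis/core" differs between environments, the following is a minimal sketch of one way to find it from Python instead of guessing the path. The helper name package_dir is hypothetical and not part of this repository; it only relies on the standard-library importlib, and it assumes the jarvis package is installed in the active environment.

```python
import importlib.util
from pathlib import Path


def package_dir(package_name):
    """Return the directory of an installed package, or None if it is not found.

    package_dir is a hypothetical helper for illustration only.
    """
    try:
        spec = importlib.util.find_spec(package_name)
    except ModuleNotFoundError:
        # Raised when the parent of a dotted name (e.g. "jarvis") is missing.
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


# Prints the directory where degraph.py should be placed,
# or None if jarvis is not installed in this environment.
print(package_dir("jarvis.core"))
```

If the printed value is a directory, copying degraph.py into it should be equivalent to the manual step described above.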

datasets: contains all the datasets and test results.

descriptors_mp: all descriptors of all structures in the MP database

learning_descriptors: dataset for learning descriptors reported in Figure 2.

mp_selected_prop.csv: dataset of all properties of all structures in the MP database

all_kappa: all kappa from TEDesignLab database

test_(model_name)_(prop_name/descriptor_name).csv: test set and prediction result of (prop_name/descriptor_name) from (model_name)

1d: contains the datasets of 1d chains in Figure 3

sample_short/long: structures of the short/long chains

test_results_(default/nconv/neigh)_(short/long).csv: test results for default/more conv. layers/more neighbors CGCNN for the short/long chains.

trained_models: contains all the trained models for property predictions in Figure 4.

(model_name)_(prop_name).pt: trained model of (prop_name) from (model_name)

Please note that all properties from MP in test_XX.csv and predicted by XX.pt are normalized. The normalizer can be found at datasets/normalizer.npy. Normalization is computed as (property - median of property) / (90th percentile - 10th percentile).
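To recover values in the original units, the normalization above can be inverted. The following is a minimal sketch of the formula and its inverse; the function names are hypothetical, and the median and percentile values are passed in explicitly because the internal layout of normalizer.npy is not described here. The numbers in the example are made up for illustration and are not taken from the datasets.

```python
def normalize(values, median, p10, p90):
    """Apply (property - median) / (90th percentile - 10th percentile)."""
    spread = p90 - p10
    return [(v - median) / spread for v in values]


def denormalize(normalized, median, p10, p90):
    """Invert the normalization to recover values in the original units."""
    spread = p90 - p10
    return [x * spread + median for x in normalized]


# Illustrative statistics only (not read from datasets/normalizer.npy).
median, p10, p90 = 2.0, 1.0, 3.0
scaled = normalize([1.0, 2.0, 3.0], median, p10, p90)
restored = denormalize(scaled, median, p10, p90)
print(scaled, restored)
```

The same denormalize step would apply to both the target and the predicted columns of a test CSV, using the statistics stored for that property.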

Funding

Toyota Research Institute
