Developing Deep Learning-based Large-scale Organic Reaction Classification Model via Sigma-profiles
The "Train_AE.zip" contains the scripts for training an auto-encoder.
The "Train_DL_Models.zip" contains the scripts for training deep learning-based models.
The "sigma_profiles_dict.npy" contains the sigma-profiles of millions of different molecules. The SMILES string of a molecule is used as the key to look up its sigma-profile.
The "sorted_agent_dict.npy" contains statistics on how often each agent occurs in the USPTO_TPL dataset. The agents are listed in descending order of frequency.
The "sorted_agent_combination_dict.npy" contains statistics on how often each agent combination occurs in the USPTO_TPL dataset. The combinations are listed in descending order of frequency.
The "USPTO_TPL_own_version.xlsx" contains the reactions used for training, validation, and testing.
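
The ".npy" dictionary files above can probably be read as sketched below. This is a minimal sketch, not the authors' loading code: it assumes each file is a Python dict saved with np.save, and that the agent dictionaries map a name to an occurrence count. The helper names load_npy_dict and top_entries, the toy file names, and the toy values are all hypothetical and only stand in for the real data.

```python
import os
import tempfile

import numpy as np


def load_npy_dict(path):
    """Load a Python dict stored with np.save (assumed storage format)."""
    # np.save wraps a dict in a 0-d object array; .item() unwraps it again.
    return np.load(path, allow_pickle=True).item()


def top_entries(freq_dict, n):
    """Return the n (key, count) pairs with the highest counts, largest first."""
    return sorted(freq_dict.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Toy stand-ins for the real files; the profile length of 5 is arbitrary.
with tempfile.TemporaryDirectory() as tmp:
    sigma_path = os.path.join(tmp, "toy_sigma_profiles_dict.npy")
    agent_path = os.path.join(tmp, "toy_sorted_agent_dict.npy")
    np.save(sigma_path, {"CCO": np.zeros(5), "O": np.ones(5)})
    np.save(agent_path, {"water": 120, "ethanol": 45, "THF": 80})

    sigma_profiles = load_npy_dict(sigma_path)
    agent_counts = load_npy_dict(agent_path)

# Query a sigma-profile by SMILES string (here: ethanol, "CCO").
profile = sigma_profiles["CCO"]

# List the two most frequent agents in the toy dictionary.
top_agents = top_entries(agent_counts, 2)
print(profile.shape, top_agents)
```

With the real files, replace the toy paths with "sigma_profiles_dict.npy" or "sorted_agent_dict.npy". Note that allow_pickle=True executes pickled data on load, so it should only be used on files from a trusted source.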