figshare

Data for Lingo3DMol

Version 3 2023-11-27, 11:10
Version 2 2023-11-25, 01:52
Version 1 2023-11-13, 04:58
dataset
posted on 2023-11-27, 11:10 authored by Wei Feng, Lvwei Wang, Zaiyun Lin, Yanhao Zhu, Han Wang, Jianqiang Dong, Rong Bai, Huting Wang, Jielong Zhou, Wei Peng, Bo Huang, Wenbiao Zhou

"DUDE_pocket.tar.gz" contains the pocket structures of the 101 DUDE targets used for model evaluation.

"lingo3dmol_confs.tar.gz" includes molecules generated by Lingo3DMol for the 101 DUD-E targets. It includes 3D conformations directly generated by the model without any force field-based refinement.

"pocket2mol_confs.tar.gz" consists of molecules generated by Pocket2Mol for the 101 DUD-E targets. It includes 3D conformations directly generated by the model without any force field-based refinement.

"targetdiff_confs.tar.gz" contains molecules generated by TargitDiff for the 101 DUD-E targets. It includes 3D conformations directly generated by the model without any force field-based refinement.

"random_confs.tar.gz" contains random molecules for the 101 DUD-E targets. It includes 3D conformations obtained through docking.

Molecule files inside the "*_confs.tar.gz" archives are named using the format {pdb_id}-{mol_id}.
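As an illustration of the naming scheme above, the following minimal sketch splits a file name into its pdb_id and mol_id parts. The helper name parse_conf_name is hypothetical, and the sketch assumes that pdb_id contains no hyphen and that any file extension is a single trailing suffix; the description does not state the extension used in the archives.

```python
from pathlib import Path


def parse_conf_name(path):
    """Split a '{pdb_id}-{mol_id}' file name into (pdb_id, mol_id).

    Assumptions (not stated in the dataset description): pdb_id has no
    hyphen, and at most one trailing extension follows mol_id.
    """
    stem = Path(path).stem  # drops the directory and one trailing suffix
    pdb_id, sep, mol_id = stem.partition("-")  # split on the first hyphen only
    if not sep or not pdb_id or not mol_id:
        raise ValueError(f"not in '{{pdb_id}}-{{mol_id}}' form: {path!r}")
    return pdb_id, mol_id
```

Splitting on the first hyphen only keeps any later hyphens inside mol_id, which seems the safer choice when the exact mol_id format is unknown.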

Additional information about the generated molecules, including mol_id, SMILES, and the metric scores used in the evaluation, can be found in the "*_moleculars_metric_score_with_dude.csv" files.
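One way to use these CSV files is to index their rows by mol_id so that the metrics for a given conformation file can be looked up. The sketch below uses only the Python standard library. The column header "mol_id" is an assumption based on the field names mentioned in the description; the actual headers should be checked in the files, and the function name index_by_mol_id is hypothetical.

```python
import csv
import io


def index_by_mol_id(csv_file, id_column="mol_id"):
    """Return a dict mapping each molecule id to its full CSV row.

    id_column is an assumed header name; pass the real one if it differs.
    """
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or id_column not in reader.fieldnames:
        raise KeyError(f"column {id_column!r} not found in CSV header")
    return {row[id_column]: row for row in reader}


# Tiny in-memory stand-in for a real metrics file (made-up values).
_demo = io.StringIO("mol_id,SMILES\n7,CCO\n8,c1ccccc1\n")
_demo_table = index_by_mol_id(_demo)
```

With a real file, the same function can be called on an open file handle, for example one opened with open(path, newline="").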

"Training_data.tar.gz" contains all the complex structures used for model fine-tuning.

"Training_data_homology_with_DUDE.csv" contains information about the PDB IDs in the training set and their maximum sequence identity with the DUD-E targets used in the evaluation.

"Pretraining_molecules_SMILES" is a dataset that contains a specific subset of data used for pretraining. This dataset consists of 1.4 million publicly available molecules that were utilized during the pretraining phase.
