Data for Lingo3DMol
"DUDE_pocket.tar.gz" contains the pocket structures of the 101 DUD-E targets used for model evaluation.
"lingo3dmol_confs.tar.gz" contains molecules generated by Lingo3DMol for the 101 DUD-E targets. Their 3D conformations are taken directly from the model output, without any force-field-based refinement.
"pocket2mol_confs.tar.gz" contains molecules generated by Pocket2Mol for the 101 DUD-E targets. Their 3D conformations are taken directly from the model output, without any force-field-based refinement.
"targetdiff_confs.tar.gz" contains molecules generated by TargetDiff for the 101 DUD-E targets. Their 3D conformations are taken directly from the model output, without any force-field-based refinement.
"random_confs.tar.gz" contains random molecules for the 101 DUD-E targets. Their 3D conformations were obtained by docking.
The molecule files in the "*_confs.tar.gz" archives are named using the format {pdb_id}-{mol_id}.
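As a minimal sketch (not part of the released data tooling), the Python snippet below indexes a conformation archive by target. It assumes each archive member is named {pdb_id}-{mol_id} followed by a single file extension, and that pdb_id itself contains no hyphen, so names are split on the first hyphen only. The function names parse_conf_name and index_archive are hypothetical.

```python
import tarfile
from collections import defaultdict
from pathlib import PurePosixPath


def parse_conf_name(member_name: str) -> tuple:
    """Split a member name like 'dir/{pdb_id}-{mol_id}.ext' into (pdb_id, mol_id).

    Assumption: exactly one file extension, and no hyphen inside pdb_id,
    so only the first hyphen is treated as the separator.
    """
    stem = PurePosixPath(member_name).stem  # drops directories and the last extension
    pdb_id, sep, mol_id = stem.partition("-")
    if not sep or not pdb_id or not mol_id:
        raise ValueError(f"unexpected file name: {member_name!r}")
    return pdb_id, mol_id


def index_archive(tar_path: str) -> dict:
    """Map each pdb_id to the list of mol_ids found in a *_confs.tar.gz archive."""
    index = defaultdict(list)
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():  # skip directory entries
                pdb_id, mol_id = parse_conf_name(member.name)
                index[pdb_id].append(mol_id)
    return dict(index)
```

For example, index_archive("lingo3dmol_confs.tar.gz") would return a dictionary from each pdb_id to its mol_ids, which could then be matched against the mol_id values in the corresponding metric CSV file.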
Additional information about the generated molecules, including mol_id, SMILES, and metric scores involved in the evaluation, can be found in the "*_moleculars_metric_score_with_dude.csv" files.
"Training_data.tar.gz" contains all the complex structures used for model fine-tuning.
"Training_data_homology_with_DUDE.csv" contains information about the PDB IDs in the training set and their maximum sequence identity with the DUD-E targets used in the evaluation.
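A file like this can be used to split the training PDB IDs by their similarity to the evaluation targets, for example to set aside complexes above a chosen identity cutoff. The sketch below is only an illustration: the column names pdb_id and max_identity are hypothetical placeholders for the real header, and the threshold must use the same scale as the file (fraction or percent), which is not stated here.

```python
import csv


def split_by_identity(csv_path, threshold, id_col="pdb_id", identity_col="max_identity"):
    """Partition training PDB IDs by maximum sequence identity to the DUD-E targets.

    id_col and identity_col are hypothetical column names; pass the real ones
    from the CSV header. Returns (ids below threshold, ids at or above it).
    """
    below, at_or_above = [], []
    with open(csv_path, newline="") as fh:
        for row in csv.DictReader(fh):
            identity = float(row[identity_col])
            # Entries exactly at the threshold are counted as "at or above".
            (below if identity < threshold else at_or_above).append(row[id_col])
    return below, at_or_above
```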
"Pretraining_molecules_SMILES" contains a subset of the pretraining data, consisting of 1.4 million publicly available molecules used during the pretraining phase.