Lingo3DMol-main.zip (822.4 kB)

Code for Lingo3DMol

Version 4 2023-11-27, 11:10
Version 3 2023-11-25, 01:52
Version 2 2023-11-25, 01:43
Version 1 2023-11-25, 01:40
software
posted on 2023-11-27, 11:10 authored by Wei Feng, Lvwei Wang, Zaiyun Lin, Yanhao Zhu, Han Wang, Jianqiang Dong, Rong Bai, Huting Wang, Jielong Zhou, Wei Peng, Bo Huang, Wenbiao Zhou

Lingo3DMol is a pocket-based 3D molecule generation method that combines the strengths of a language model with 3D coordinate generation and geometric deep learning to produce high-quality molecules.

System Requirements
Hardware requirements
A standard computer with a GPU that has at least 5 GB of graphics memory.

OS Requirements
This package is supported on macOS and Linux. It has been tested on the following systems:
Linux: Ubuntu 16.04
macOS: Ventura (13.0)

Install via conda yaml file
Typical install time: 40 minutes

conda create -n lingo3dmol python=3.8
conda activate lingo3dmol
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install scipy==1.7.3 pandas==1.5.1 numpy==1.20.3 rdkit==2022.09.1
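
After installation, it can help to confirm that the environment matches the pinned versions. The following is a minimal sketch, not part of the Lingo3DMol code: the helper names normalize and check_versions are hypothetical, and it assumes the PyTorch distribution is registered under the name "torch". Versions are compared numerically because a package index may report "2022.09.1" as "2022.9.1".

```python
from importlib import metadata

# Versions pinned by the install commands above.
# Assumption: the PyTorch distribution is registered as "torch".
PINNED = {
    "torch": "1.10.1",
    "scipy": "1.7.3",
    "pandas": "1.5.1",
    "numpy": "1.20.3",
    "rdkit": "2022.09.1",
}


def normalize(version):
    """Drop a local tag such as "+cu113" and leading zeros ("2022.09.1" -> (2022, 9, 1))."""
    parts = version.split("+")[0].split(".")
    return tuple(int(p) if p.isdigit() else p for p in parts)


def check_versions(pinned, get_version=metadata.version):
    """Return {name: (expected, found)} for every package that does not match."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        if found is None or normalize(found) != normalize(expected):
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = check_versions(PINNED)
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

Run the script inside the activated lingo3dmol environment; any line it prints names a package whose installed version differs from the pinned one.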

Datasets
We provide DUD-E pocket files for sampling under the \dataset folder. Please unzip dude_pocket.zip.

Model Checkpoints
Download these checkpoints and move them to the \checkpoint folder.

https://stonewise-lingo3dmol-public.s3.cn-northwest-1.amazonaws.com.cn/contact.pkl md5sum:6a9313726141fcf9201b9b9470dc2a7e

https://stonewise-lingo3dmol-public.s3.cn-northwest-1.amazonaws.com.cn/gen_mol.pkl md5sum:452bd401667184ae43c9818e5bdb133b
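
The md5sum values listed with each URL can be used to confirm that the downloads are intact. Below is a minimal sketch using only the Python standard library; the function names md5_of and verify_checkpoints are hypothetical, and the script assumes it is run from the repository root so that the checkpoint folder is found at the relative path "checkpoint".

```python
import hashlib
from pathlib import Path

# Checksums copied from the list above.
EXPECTED_MD5 = {
    "contact.pkl": "6a9313726141fcf9201b9b9470dc2a7e",
    "gen_mol.pkl": "452bd401667184ae43c9818e5bdb133b",
}


def md5_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large checkpoints are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoints(folder, expected=EXPECTED_MD5):
    """Return a dict mapping each file name to "ok", "missing", or "mismatch"."""
    status = {}
    for name, checksum in expected.items():
        path = Path(folder) / name
        if not path.is_file():
            status[name] = "missing"
        else:
            status[name] = "ok" if md5_of(path) == checksum else "mismatch"
    return status


if __name__ == "__main__":
    # Assumption: run from the repository root, where the checkpoint folder lives.
    for name, state in verify_checkpoints("checkpoint").items():
        print(f"{name}: {state}")
```

A "mismatch" result usually points to an interrupted download, in which case the file should be fetched again.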

Sampling
To run inference with the model on the DUD-E set, run:

sh run.sh

Expected output
The output should be generated molecules in mol format.
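
For a quick sanity check of the generated files, the atom and bond counts can be read from the counts line of each mol block. This is a minimal sketch under stated assumptions: it assumes the files use the V2000 mol format and carry a .mol extension, neither of which is confirmed above, and the names read_counts and summarize are hypothetical. For full parsing, RDKit's Chem.MolFromMolFile (RDKit is installed above) is the more robust choice.

```python
from pathlib import Path


def read_counts(mol_text):
    """Return (num_atoms, num_bonds) from the counts line of a V2000 mol block.

    The counts line is the fourth line; atoms occupy columns 1-3 and bonds columns 4-6.
    """
    lines = mol_text.splitlines()
    if len(lines) < 4 or "V2000" not in lines[3]:
        raise ValueError("not a V2000 mol block")
    counts = lines[3]
    return int(counts[0:3]), int(counts[3:6])


def summarize(folder):
    """Map each *.mol file name in `folder` to (atoms, bonds), or None if unreadable."""
    summary = {}
    for path in sorted(Path(folder).glob("*.mol")):
        try:
            summary[path.name] = read_counts(path.read_text())
        except ValueError:
            summary[path.name] = None  # not V2000 or malformed counts line
    return summary
```

Calling summarize on the folder that run.sh writes to (its location is not stated here) returns one entry per file; a value of None marks a file that should be inspected by hand.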

Main Parameters Help
--coc_dis Collision distance with the pocket (Å)
--nci_thrs Threshold of the NCI prediction model
--topk Select the top k NCI
--max_run_hours Maximum run time in hours
--gennums Minimum number of molecules to generate
--USE_THRESHOLD Sample only categories larger than expectation
--isMultiSample Use multinomial sampling
--isGuideSample Use fragment-based sampling
--OnceMolGen Do not use fragment-based sampling
--gen_frag_set Number of candidates generated per collection, from which the top twenty percent is selected (batch size)
--prod_time Number of times the "go_factory()" function is repeated
--tempture Temperature of the softmax (Tsoftmax) used in "proj1"
--frag_len_add Minimum length of each generated fragment; larger fragments make generation faster
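
To illustrate what a softmax temperature and multinomial sampling control in general, here is a generic sketch in plain Python. It is not the repository's implementation: the function names are hypothetical, and reading "Tsoftmax" as a temperature-scaled softmax is an assumption. Lower temperatures concentrate probability on the highest-scoring option, while multinomial sampling draws from the distribution instead of always taking the top choice.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature; lower temperatures sharpen the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, temperature=1.0, multinomial=True, rng=random):
    """Draw an index from the distribution (multinomial) or take the most likely one (greedy)."""
    probs = softmax_with_temperature(logits, temperature)
    if multinomial:
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    scores = [1.0, 2.0, 3.0]
    print(softmax_with_temperature(scores, temperature=1.0))
    print(softmax_with_temperature(scores, temperature=0.1))
    print(pick_token(scores, multinomial=False))
```

Passing a seeded random.Random instance as rng makes the multinomial draw reproducible.
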

License
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.
