Geometries and Dipole Moments calculated by B3LYP/6-31G(d,p) for 10071 Organic Molecular Structures
* Florbela Pereira and Joao Aires-de-Sousa:
Machine Learning for the Prediction of Molecular Dipole Moments Obtained by Density Functional Theory.
J. Cheminf. (2018)
This data set is publicly available at http://dx.doi.org/10.6084/m9.figshare.5716246
dipole_moments_10071mols_sdf.tar.gz - 10071 molecules in the MDL SDFile format including the atomic coordinates of equilibrium geometries calculated by B3LYP/6-31G(d,p).
dipole_moments_10071mols.xlsx - Dipole moments calculated by B3LYP/6-31G(d,p) for 10071 neutral organic molecules.
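As an illustration of how the archive listed above might be read without unpacking it to disk, here is a minimal Python sketch that uses only the standard library. The function name iter_sdf_members is hypothetical, and the internal directory layout of the archive is not documented here, so the sketch simply yields every member whose name ends in ".sdf".

```python
import tarfile


def iter_sdf_members(archive_path):
    """Yield (member_name, text) for each .sdf file in a .tar.gz archive.

    Assumption: the files are plain text; undecodable bytes are replaced.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            if member.isfile() and member.name.endswith(".sdf"):
                handle = archive.extractfile(member)
                yield member.name, handle.read().decode("utf-8", errors="replace")


# Example usage (path as named in this README; adjust to the local copy):
# for name, text in iter_sdf_members("dipole_moments_10071mols_sdf.tar.gz"):
#     print(name, len(text.splitlines()))
```

Reading members one at a time avoids creating 10071 small files on disk when only a single pass over the data is needed.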
Molecular structures were retrieved from the ZINC database [1], the PubChem database [2] and the GDB-13 database [3] of small organic molecules containing up to 7 atoms of C, N, O, F, S, Cl and Br. The structures were standardized with ChemAxon Standardizer (JChem 15.4.6, 2015, ChemAxon Ltd., Budapest, Hungary, http://www.chemaxon.com) and OpenBabel (Open Babel Package, version 2.3.1, http://openbabel.org) for neutralization and inclusion of all hydrogen atoms. Duplicate molecules were discarded based on canonical SMILES and InChI codes (stereoisomers were considered duplicate structures). The final database consists of 10,071 molecules with molecular weights (MWs) in the range 40 – 251 g/mol, containing up to 19 atoms of the elements C, N, O, F, S, Cl, Br, and P. The total number of atoms in a molecule (including hydrogen atoms) ranges from 6 to 43.
Molecular geometries were first relaxed with the PM7 method using the MOPAC software [4] and then optimized with the GAMESS program [5] using the B3LYP functional and the 6-31G(d,p) basis set, followed by dipole moment calculation at the same level of theory.
Each molecule is stored in its own file, ending in ".sdf". These files contain the structures optimized at the B3LYP/6-31G(d,p) level.
The format is the standard MDL SDFile generated with ChemAxon Standardizer and OpenBabel.
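A minimal sketch of reading the atomic coordinates from one such file is given below. It assumes the files use the fixed-column V2000 connection table (the usual OpenBabel default, but not confirmed here), and it relies only on the Python standard library. The function name parse_v2000_atoms and the toy water block are illustrative and are not taken from the data set.

```python
def parse_v2000_atoms(mol_text):
    """Return [(symbol, x, y, z), ...] from a V2000 mol block.

    Layout assumed: three header lines, then a counts line whose first
    three characters hold the atom count, then one line per atom with
    x, y, z in three 10-character fields and the symbol in columns 32-34.
    """
    lines = mol_text.splitlines()
    n_atoms = int(lines[3][0:3])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z = (float(line[i:i + 10]) for i in (0, 10, 20))
        symbol = line[31:34].strip()
        atoms.append((symbol, x, y, z))
    return atoms


def _toy_atom_line(symbol, x, y, z):
    # Helper that writes one atom line in the fixed-column layout.
    return f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3} 0  0  0  0  0  0  0  0  0  0  0  0"


# Toy input for demonstration only (NOT a molecule from this data set).
TOY_MOL = "\n".join([
    "toy_water",
    "  illustration",
    "",
    f"{3:3d}{2:3d}  0  0  0  0  0  0  0  0999 V2000",
    _toy_atom_line("O", 0.0, 0.0, 0.1173),
    _toy_atom_line("H", 0.0, 0.7572, -0.4692),
    _toy_atom_line("H", 0.0, -0.7572, -0.4692),
    "  1  2  1  0",
    "  1  3  1  0",
    "M  END",
    "$$$$",
])

atoms = parse_v2000_atoms(TOY_MOL)
for symbol, x, y, z in atoms:
    print(symbol, x, y, z)
```

For anything beyond coordinates (bond orders, properties, V3000 files), a cheminformatics toolkit such as Open Babel is the safer choice.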
Dipole moments are stored in the dipole_moments_10071mols.xlsx file.
Column   Content of the .xlsx file
1        Molecule ID (as it appears in the corresponding .sdf file name)
2        Dipole moment (in Debye)
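The two columns described above can be used to look up the dipole moment that belongs to a given structure file. The sketch below assumes that the molecule ID equals the .sdf file name without its extension; this is an assumption and should be checked against the actual files. The helper names and the example IDs and values are hypothetical. The rows themselves could be obtained with any spreadsheet reader, for example pandas.read_excel.

```python
from pathlib import Path


def index_dipoles(rows):
    """Map molecule ID (as a string) to dipole moment in Debye.

    `rows` is any iterable of (molecule_id, dipole_moment) pairs, i.e.
    columns 1 and 2 of the spreadsheet.
    """
    return {str(mol_id): float(mu) for mol_id, mu in rows}


def dipole_for_file(sdf_path, dipoles):
    """Return the dipole moment for an .sdf file.

    Assumption: the file name stem is the molecule ID.
    """
    return dipoles[Path(sdf_path).stem]


# Hypothetical rows for illustration only (not values from the data set).
rows = [("mol_0001", 1.85), ("mol_0002", 0.0)]
dipoles = index_dipoles(rows)
print(dipole_for_file("sdf/mol_0001.sdf", dipoles))
```

Converting the IDs to strings guards against spreadsheet readers that return purely numeric IDs as integers.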
[1] Irwin JJ, Sterling T, Mysinger MM, Bolstad ES, Coleman RG: ZINC: a free tool to discover chemistry for biology. J Chem Inf Model 2012, 52:1757-1768.
[2] Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, Han L, He J, He S, Shoemaker BA, Wang J, Yu B, Zhang J, Bryant SH: PubChem Substance and Compound databases. Nucleic Acids Res 2016, 44(D1):D1202-13.
[3] Blum LC, Reymond J-L: 970 Million druglike small molecules for virtual screening in the chemical universe database GDB-13. J Am Chem Soc 2009, 131:8732-8733.
[4] MOPAC2012, James J. P. Stewart, Stewart Computational Chemistry, Colorado Springs, CO, USA, http://OpenMOPAC.net (2012).
[5] Schmidt MW, Baldridge KK, Boatz JA, Elbert ST, Gordon MS, Jensen JJ, Koseki S, Matsunaga N, Nguyen KA, Su S, Windus TL, Dupuis M, Montgomery JA: General atomic and molecular electronic structure system. J Comput Chem 1993, 14:1347-1363. GAMESS Version 1 May 2013 (R1).