SeleX-CS: A New Consensus Scoring Algorithm for Hit Discovery and Lead Optimization

Identifying active compounds (hits) that bind to biological targets of pharmaceutical relevance is the cornerstone of drug design efforts. Structure-based virtual screening, namely, the <i>in silico</i> evaluation of binding energies and geometries between a protein and its putative ligands, has emerged over the past few years as a promising approach in this field. The success of the method relies on the availability of reliable 3-dimensional (3D) structures of the target protein and its candidate ligands (the screening library), a reliable docking method that can fit the different ligands into the protein’s binding site, and an accurate scoring function that can rank the resulting binding modes in accordance with their binding affinities. This last requirement is arguably the most difficult to meet due to the complexity of the binding process. A potential solution to this so-called scoring problem is the use of multiple scoring functions in an approach known as consensus scoring. Several consensus scoring methods have been suggested in the literature and have generally demonstrated an improved ranking of screening libraries relative to individual scoring functions. Nevertheless, current consensus scoring strategies suffer from several shortcomings, in particular, a strong dependence on the initial parameters and an incomplete treatment of inactive compounds. In this work we present a new consensus scoring algorithm (SeleX-Consensus Scoring, abbreviated SeleX-CS) specifically designed to address these limitations: (i) A subset of the initial set of scoring functions is allowed to form the consensus score, and this subset is optimized via a Monte Carlo/Simulated Annealing procedure. (ii) Rank redundancy between the members of the screening library is removed. (iii) The method explicitly considers the presence of inactive compounds. The new algorithm was applied to the ranking of screening libraries targeting two G-protein-coupled receptors (GPCRs).
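To make point (i) concrete, the following is a minimal sketch of choosing a subset of scoring functions by simulated annealing. It is not the SeleX-CS implementation: the rank-sum consensus, the objective (number of known actives in the top of the ranked list), the geometric cooling schedule, and all names (`consensus_order`, `hits_in_top`, `anneal_subset`) are assumptions made for illustration, and the sketch does not address rank redundancy or inactive compounds.

```python
import math
import random


def consensus_order(scores, subset):
    """Return compound indices, best first, by summed rank over `subset`.

    `scores` maps a scoring-function name to a list of per-compound scores.
    Assumption: higher score means better predicted binding.
    """
    n = len(next(iter(scores.values())))
    total = [0] * n
    for name in subset:
        order = sorted(range(n), key=lambda i: -scores[name][i])
        for rank, i in enumerate(order):
            total[i] += rank  # rank 0 is the best compound for this function
    return sorted(range(n), key=lambda i: total[i])


def hits_in_top(scores, subset, actives, top_n):
    """Hypothetical objective: known actives among the top_n consensus ranks."""
    top = consensus_order(scores, subset)[:top_n]
    return sum(1 for i in top if i in actives)


def anneal_subset(scores, actives, top_n, steps=500,
                  t_start=1.0, t_end=0.01, seed=0):
    """Search subsets of scoring functions with a Metropolis acceptance rule."""
    rng = random.Random(seed)
    names = sorted(scores)
    current = set(names)  # start from the full set of scoring functions
    cur_val = hits_in_top(scores, current, actives, top_n)
    best, best_val = set(current), cur_val
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        candidate = set(current)
        candidate ^= {rng.choice(names)}  # toggle one scoring function
        if not candidate:
            continue  # an empty subset cannot form a consensus
        val = hits_in_top(scores, candidate, actives, top_n)
        # Always accept improvements; accept worse moves with Boltzmann odds.
        if val >= cur_val or rng.random() < math.exp((val - cur_val) / t):
            current, cur_val = candidate, val
            if cur_val > best_val:
                best, best_val = set(current), cur_val
    return best, best_val
```

Starting from the full set and tracking the best subset seen means the returned value is never worse than using all scoring functions together, under the assumed objective.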
Excellent enrichment factors were obtained in both cases: For the cannabinoid receptor 1 (CB1), SeleX-CS outperformed the best single score and afforded an enrichment factor of 41 at 1% of the screening library compared with the best single score value of 15 (GOLD_Fitness). For the chemokine receptor type 2 (CCR2) SeleX-CS afforded an enrichment factor of 72 (again at 1% of the screening library) once more outperforming any single score (enrichment factor of 20 by G_SCORE). Moreover, SeleX-CS demonstrated success rates of 67% (CCR2) and 73% (CB1) when applied to ranking an external test set. In both cases, the new algorithm also afforded good derichment of inactive compounds (i.e., the ability to push inactive compounds to the bottom of the ranked library). The method was then extended to rank a lead optimization series targeting the Kv4.3 potassium ion channel, resulting in a Spearman’s correlation coefficient, ρ = 0.63 (<i>n</i> = 40), between the SeleX-CS-based rank and the actual pKi values. These results suggest that SeleX-CS is a powerful method for ranking screening libraries in the lead discovery phase and also merits consideration as a lead optimization tool.
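For readers unfamiliar with the two metrics quoted above, the sketch below shows how an enrichment factor at a given fraction of the library and a Spearman rank correlation are commonly computed. The function names, the rounding convention for the size of the top fraction, and the average-rank handling of ties are assumptions of this sketch, not details taken from this work.

```python
import math


def enrichment_factor(ranked_is_active, fraction=0.01):
    """EF = (actives in top fraction / size of top) / (all actives / library size).

    `ranked_is_active` is a list of booleans ordered best-ranked first.
    Assumption: the top fraction is rounded to a whole number of compounds.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    n_top = max(1, round(fraction * n_total))
    hits_top = sum(ranked_is_active[:n_top])
    return (hits_top * n_total) / (n_top * n_actives)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (sx * sy)
```

As a worked example, a library of 1000 compounds containing 10 actives, 4 of which fall in the top 10 ranks, gives an enrichment factor of (4/10)/(10/1000) = 40 at 1%.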