Levatic_et_al_JMC_2013_Pgp_substrate_classifier_dataset.csv (68.33 kB)

Accurate Models for P-gp Drug Recognition Induced from a Cancer Cell Line Cytotoxicity Screen - QSAR dataset

posted on 2014-12-26, 16:49 authored by Jurica Levatić, Fran Supek

P-glycoprotein (P-gp, MDR1) is a promiscuous drug efflux pump of substantial pharmacological importance. Taking advantage of large-scale cytotoxicity screening data involving 60 cancer cell lines (NCI-60), we correlated the differential biological activities of ∼13 000 compounds against cellular P-gp levels.

We created a large set of 934 high-confidence P-gp substrates or nonsubstrates by enforcing agreement with an orthogonal criterion involving P-gp overexpressing ADR-RES cells. A support vector machine (SVM) was 86.7% accurate in discriminating P-gp substrates on independent test data, exceeding previous models.

Two molecular features had an overarching influence: nearly all P-gp substrates were large (>35 atoms including H) and dense (specific volume of <7.3 Å³/atom) molecules. Seven other descriptors and 24 molecular fragments (“effluxophores”) were found enriched in the (non)substrates and incorporated into interpretable rule-based models.
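The two size and density thresholds quoted above can be expressed as a simple check. The following is only a minimal sketch, not the published SVM or rule-based models: the function names are hypothetical, and it assumes that specific volume means molecular volume divided by the total atom count (including H), with the volume computed elsewhere by a method this description does not specify.

```python
def specific_volume(volume_a3: float, n_atoms: int) -> float:
    """Return molecular volume per atom (cubic angstroms per atom).

    Assumption: n_atoms counts all atoms, hydrogens included.
    """
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    return volume_a3 / n_atoms


def is_large_and_dense(
    n_atoms: int,
    volume_a3: float,
    min_atoms: int = 35,                # threshold quoted in the description
    max_specific_volume: float = 7.3,   # threshold quoted in the description
) -> bool:
    """True when a molecule is both large (>35 atoms) and dense (<7.3 A^3/atom).

    This only mirrors the two headline thresholds; the description says
    nearly all substrates meet them, not that every such molecule is a substrate.
    """
    large = n_atoms > min_atoms
    dense = specific_volume(volume_a3, n_atoms) < max_specific_volume
    return large and dense


# Hypothetical input values, chosen only to exercise each branch.
print(is_large_and_dense(60, 400.0))  # large and dense
print(is_large_and_dense(20, 130.0))  # dense but too small
print(is_large_and_dense(50, 400.0))  # large but not dense (8.0 A^3/atom)
```

Passing this check is best read as a necessary-looking screen rather than a prediction, since the other seven descriptors and the fragment rules are not represented here.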

Biological experiments on an independent P-gp overexpressing cell line, the vincristine-resistant VK2, allowed us to reclassify six compounds previously annotated as substrates, validating our method’s predictive ability. Models are freely available at pgp.biozyne.com.

 

Legend: the column header "NSC" denotes an NCI compound ID number. Class "-1" is a P-gp nonsubstrate, and class "1" is a substrate, according to the criteria described in our publication in J Med Chem, 2013, 56 (14).

DOI: 10.1021/jm400328s
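The legend above suggests a two-column layout, which could be read along the following lines. This is a sketch under stated assumptions: only the "NSC" header is confirmed by the legend, so the class column name ("Class"), the comma delimiter, and the helper name `count_classes` are all hypothetical, and the sample rows are placeholders rather than real entries from the dataset.

```python
import csv
import io
from collections import Counter

# Class codes as given in the legend.
LABELS = {"-1": "nonsubstrate", "1": "substrate"}


def count_classes(handle, class_column="Class"):
    """Tally substrates and nonsubstrates from an open CSV file-like object.

    Assumption: the file has a header row and a column named `class_column`
    holding "-1" or "1"; adjust the name to match the actual file.
    """
    counts = Counter()
    for row in csv.DictReader(handle):
        code = row[class_column].strip()
        if code not in LABELS:
            raise ValueError(f"unexpected class code: {code!r}")
        counts[LABELS[code]] += 1
    return counts


# Placeholder rows (not real NSC numbers), used only to show the call.
sample = io.StringIO("NSC,Class\n1,1\n2,-1\n3,1\n")
counts = count_classes(sample)
print(counts["substrate"], counts["nonsubstrate"])  # prints "2 1"
```

With the downloaded file, the same function would be called as `count_classes(open(path, newline=""))`, after checking the real header names.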
