Erlotinib (left panel) and sorafenib (right panel) for log(pval_clinical) of the Pearson correlation coefficient for each training model’s prediction of the clinical response (x-axis) versus the log(pval_IC<sub>50</sub>) for the correlation coefficient of each model’s prediction of IC<sub>50</sub> versus the mean of each gene’s expression in the training model (y-axis). David G. Covell 10.1371/journal.pone.0181991.g001 https://plos.figshare.com/articles/figure/Erlotinib_left_panel_and_sorafenib_right_panel_for_log_pval_clinical_of_the_Pearson_correlation_coefficient_for_each_training_model_s_prediction_of_the_clinical_response_x-axis_versus_the_log_pval_IC_sub_50_sub_for_the_correlation_coefficient_of_each_mode/5286796 <p>These results represent 20 million random picks of 30 tumor cells and 300 genes from the CGP database of IC<sub>50</sub> values for erlotinib and sorafenib. For erlotinib, only 53 simulations achieved the arbitrary threshold requirements of log(pval_IC<sub>50</sub>) < -11, log(pval_clinical) < -6, ppv<sub>clinical</sub> < 0.45 and npv<sub>clinical</sub> < 0.45. These models appear as the red circles in the left panel. For sorafenib, only 48 simulations achieved the threshold requirements of log(pval_IC<sub>50</sub>) < -8.5, log(pval_clinical) < -8.5, ppv<sub>clinical</sub> < 0.65 and npv<sub>clinical</sub> < 0.65. Calculating ppv and npv requires selecting a boundary between good and poor responses; these calculations use the mean of the predictive values as this boundary. The figure shows that some training models with excellent correlative statistics nonetheless fail to meet the thresholds for ppv and npv.</p> 2017-08-08 23:44:55 data mining approach analysis quantifying pathway fitness drug sensitivity data erlotinib baseline gene expressions pathway-gene biomarkers treatment prediction errors response novel data mining procedure ridge regression modeling sorafenib
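The description says that ppv and npv depend on a boundary between good and poor responses, and that the mean of the predictive values serves as that boundary. A minimal Python sketch of that calculation follows. It is not the author's code: the function name `ppv_npv`, its arguments, and the example numbers are hypothetical, and the assumption that values below the mean count as a predicted good response (as would be natural for IC<sub>50</sub>-like scores) is ours, exposed through the `lower_is_better` flag.

```python
from statistics import mean


def ppv_npv(predicted, responded, lower_is_better=True):
    """Return (ppv, npv), using the mean of `predicted` as the good/poor boundary.

    predicted       : model outputs, one per case (hypothetical input)
    responded       : observed outcome per case, True for a good response
    lower_is_better : ASSUMPTION, not stated in the source; if True, values
                      below the mean are treated as predicted good responses
    """
    responded = [bool(r) for r in responded]
    boundary = mean(predicted)

    # A case is a predicted responder when it falls on the "good" side of the mean.
    if lower_is_better:
        calls = [p < boundary for p in predicted]
    else:
        calls = [p > boundary for p in predicted]

    pairs = list(zip(calls, responded))
    tp = sum(c and r for c, r in pairs)              # predicted good, observed good
    fp = sum(c and not r for c, r in pairs)          # predicted good, observed poor
    tn = sum((not c) and (not r) for c, r in pairs)  # predicted poor, observed poor
    fn = sum((not c) and r for c, r in pairs)        # predicted poor, observed good

    # PPV = TP / (TP + FP); NPV = TN / (TN + FN). Undefined ratios become NaN.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Made-up illustration data, not values from the CGP database.
print(ppv_npv([1.0, 2.0, 3.0, 4.0], [True, True, False, True]))  # prints (1.0, 0.5)
```

Because the boundary is the mean of the predictions themselves, a model can rank cases well (a strong correlation) and still split them poorly into good and poor groups, which is consistent with the observation in the description that well-correlated training models can miss the ppv and npv thresholds.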