Software posted on 2020-06-03, 06:27, authored by Anup Kumar Halder, Soumyendu Bandyopadhyay, Piyali Chatterjee, Mita Nasipuri, Dariusz Plewczynski, Subhadip Basu
Over the years, several methods have been proposed for computational protein-protein interaction (PPI) prediction, each with a different performance evaluation strategy. When their performance scores are benchmarked, most of these methods suffer from poorly designed cross-validation strategies, ad hoc selection of positive and negative samples, and similar problems. To address these issues, we introduce a refined evaluation strategy in JUPPI, our proposed multi-level feature-based PPI prediction approach, which uses sequence, domain and Gene Ontology (GO) information as features. During evaluation, we first extract high-quality negative data using three-stage filtering, and then introduce a pair-input based cross-validation strategy with three difficulty levels for test-set predictions. This evaluation strategy reduces component-level overlap in the test sets. The performance of JUPPI is compared with that of state-of-the-art approaches in this domain and tested on six independent PPI datasets. On almost all of the datasets, JUPPI outperforms the state of the art, not only for PPI prediction at the human proteome level but also for predicting interactors of intrinsically disordered human proteins.
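The description above does not define the three difficulty levels. As a rough illustration only, the sketch below assumes one common reading of pair-input cross-validation: test pairs are grouped by how many of their two proteins also appear in the training pairs (both, one, or neither). The function names, level labels and protein identifiers are hypothetical and are not taken from the JUPPI software.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # a protein pair, identified by two protein IDs


def proteins_in(pairs: Iterable[Pair]) -> Set[str]:
    """Collect every protein ID that occurs in a list of pairs."""
    seen: Set[str] = set()
    for a, b in pairs:
        seen.add(a)
        seen.add(b)
    return seen


def split_by_difficulty(
    train_pairs: Iterable[Pair], test_pairs: Iterable[Pair]
) -> Dict[str, List[Pair]]:
    """Group test pairs by component overlap with the training set.

    Assumed levels (hypothetical labels, not from the source):
      both_seen: both proteins occur in training pairs (easiest)
      one_seen:  exactly one protein occurs in training pairs
      none_seen: neither protein occurs in training pairs (hardest)
    """
    train_proteins = proteins_in(train_pairs)
    levels: Dict[str, List[Pair]] = {
        "both_seen": [],
        "one_seen": [],
        "none_seen": [],
    }
    for a, b in test_pairs:
        # Count how many members of the pair were seen during training.
        n_seen = int(a in train_proteins) + int(b in train_proteins)
        if n_seen == 2:
            levels["both_seen"].append((a, b))
        elif n_seen == 1:
            levels["one_seen"].append((a, b))
        else:
            levels["none_seen"].append((a, b))
    return levels


if __name__ == "__main__":
    # Toy identifiers, purely illustrative.
    train = [("P1", "P2"), ("P2", "P3")]
    test = [("P1", "P3"), ("P1", "P4"), ("P5", "P6")]
    for level, pairs in split_by_difficulty(train, test).items():
        print(level, pairs)
```

Reporting scores separately for each group, rather than as one pooled number, makes it visible how much of a predictor's performance depends on having already seen the individual proteins during training.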