figshare
2 files

A Tailored Multivariate Mixture Model for Detecting Proteins of Concordant Change Among Virulent Strains of Clostridium perfringens

dataset
posted on 2017-08-04, 19:23 authored by Kun Chen, Neha Mishra, Joan Smyth, Haim Bar, Elizabeth Schifano, Lynn Kuo, Ming-Hui Chen

Necrotic enteritis (NE) is a serious disease of poultry caused by the bacterium C. perfringens. To identify proteins of C. perfringens that confer virulence with respect to NE, the protein secretions of four NE disease-producing strains and one baseline nondisease-producing strain of C. perfringens were examined. The problem then becomes a clustering task: identifying two extreme groups of proteins that were produced at either concordantly higher or concordantly lower levels across all four disease-producing strains compared with the baseline, while most proteins show no significant change across strains. However, the presence of some nuisance proteins of discordant change may severely distort any biologically meaningful cluster pattern. We develop a tailored multivariate clustering approach to robustly identify the proteins of concordant change. Using a three-component normal mixture model as the skeleton, our approach incorporates several constraints to account for biological expectations and data characteristics. More importantly, we adopt a sparse mean-shift parameterization in the reference distribution, coupled with a regularized estimation approach, to flexibly accommodate proteins of discordant change. We explore the connections and differences between our approach and other robust clustering methods, and resolve the issue of unbounded likelihood under an eigenvalue-ratio condition. Simulation studies demonstrate that our method outperforms a number of alternative approaches. Our protein analysis, along with further biological investigation, may shed light on the discovery of the complete set of virulence factors in NE. Supplementary materials for this article are available online.
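The regularized estimation of a sparse mean shift can be illustrated in isolation. The sketch below is not the authors' implementation: it assumes an L1 (lasso-type) penalty on a single protein's shift vector, a known common standard deviation `sigma`, and per-strain log-ratios against the baseline as input. The abstract states none of these details, and the function names and data are hypothetical. Under these assumptions the penalized problem separates by strain, and each coordinate's solution is the soft-thresholded observation. Small deviations are set exactly to zero, and only large strain-specific deviations remain. The sketch covers only this shrinkage step for a protein attributed to the reference component, not the full mixture fit.

```python
import math
from typing import List


def soft_threshold(z: float, t: float) -> float:
    """Return sign(z) * max(|z| - t, 0), the proximal map of t * |.|."""
    if abs(z) <= t:
        return 0.0
    return z - t if z > 0 else z + t


def sparse_shift(x: List[float], lam: float, sigma: float = 1.0) -> List[float]:
    """Penalized estimate of one protein's mean shift (illustrative sketch).

    Minimizes sum_j (x_j - g_j)**2 / (2 * sigma**2) + lam * sum_j |g_j|.
    The objective separates by strain j, and each coordinate's minimizer is
    the soft-thresholded observation with threshold lam * sigma**2.
    """
    t = lam * sigma ** 2
    return [soft_threshold(xj, t) for xj in x]


def shifted_strains(g: List[float]) -> List[int]:
    """Indices of strains whose fitted shift is nonzero (hypothetical helper)."""
    return [j for j, gj in enumerate(g) if gj != 0.0]


if __name__ == "__main__":
    # Hypothetical log-ratios (four disease-producing strains vs. baseline).
    proteins = {
        "A": [0.3, -0.4, 0.2, 0.1],  # small deviations in every strain
        "B": [2.5, 2.1, 0.2, -0.3],  # large deviations in two strains only
        "C": [2.2, 1.9, 2.8, 2.4],   # large deviations in all four strains
    }
    for name, x in proteins.items():
        g = sparse_shift(x, lam=1.0)
        print(name, [round(v, 2) for v in g], shifted_strains(g))
```

With `lam=1.0`, protein A's shift is zero in every strain, B's is nonzero only in the first two strains, and C's is nonzero in all four. A nonzero shift confined to a subset of strains is the discordant pattern the abstract describes. Raising the hypothetical `lam` zeroes out more coordinates.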

Funding

Kun Chen’s research was partially supported by the National Science Foundation grant DMS-1613295 and the National Institutes of Health (NIH) grant #U01HL114494. Haim Bar’s research was partially supported by the National Science Foundation grant DMS-1612625. M.-H. Chen’s research was partially supported by NIH grants #GM70335 and #P01CA142538. The authors gratefully acknowledge funding from the U.S. Poultry & Egg Association, which enabled the generation of the proteomics data used in this study (Project #F052).
