Dataset for: An EM-type approach for classification of bivariate MALDI-MS data and identification of high fertility markers

Dairy cows are responsible for a fair amount of the gas emitted into the atmosphere (mainly methane, ammonia and carbon dioxide), as well as waste outputs. Identifying high-fertility breeding cows and increasing fertility rates can therefore reduce pollution, help mitigate global warming and lessen the environmental impact of the farming system. As a step toward this goal, changes in the lipid composition of the bovine uterus exposed to greater (LF-LCL group) or lower (SF-SCL group) concentrations of progesterone during post-ovulation were investigated by matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS). Two measurements were made for each cow and, after preprocessing, the data available for analysis consist of relative intensities at 76 significant mass-to-charge ratio ({\it m/z}) values identifying specific ions in the spectra. Because of the small sample size (7 cows in the LF-LCL group and 10 cows in the SF-SCL group), the usual methods could not discriminate between the groups. A model-based approach was therefore proposed and, due to the discrete nature of the data, a truncated mixture of bivariate beta distributions was fitted to the data using the EM algorithm. However, unlike the usual approach to mixture density estimation, each of the 76 {\it m/z} values is assigned an unobserved label shared by all cows in the same group. The role of these labels is similar to that of a frailty effect in survival models: all cows in a given group share a random effect due to group membership. These labels are used to identify {\it m/z} values that could potentially account for differences in fertility rates.