Table_1_Monitoring of Nitrification in Chloraminated Drinking Water Distribution Systems With Microbiome Bioindicators Using Supervised Machine Learning.PDF
Many drinking water utilities in the United States using chloramine as a disinfectant treatment in their drinking water distribution systems (DWDS) have experienced nitrification episodes, which detrimentally impact water quality. Identification of potential predictors of nitrification in DWDS could be used to optimize current nitrification monitoring plans and ultimately help to safeguard drinking water and public health. In this study, we explored the water microbiome from a chloraminated DWDS simulator operated through successive operational schemes of stable and nitrification events and utilized the 16S rRNA gene dataset to generate high-resolution taxonomic profiles for bioindicator discovery. Analysis of the microbiome revealed both enrichment and depletion of various bacterial populations associated with nitrification. A supervised machine learning approach (naïve Bayes classifier) trained with bioindicator profiles (membership and structure) was used to classify water samples. Performance of each model was examined using the area under the curve (AUC) from the receiver-operating characteristic (ROC) and precision-recall (PR) curves. The ROC- and PR-AUC gradually increased to 0.778 and 0.775 when genus-level membership (i.e., presence and absence) was used in the model and increased significantly when the structure (i.e., distribution) dataset was used (AUCs = 1.000, p < 0.01). Community structure significantly improved the predictive ability of the model beyond that of membership only, regardless of the type of data (sequence- or taxonomy-based model) we used to represent the microbiome. In comparison, an ATP-based model (bulk biomass) generated lower AUCs of 0.477 and 0.553 (ROC and PR, respectively), which is equivalent to random classification. A combination of eight bioindicators was able to correctly classify 85% of instances (nitrification or stable events) with an AUC of 0.825 (sensitivity: 0.729, specificity: 0.894) on a full-scale DWDS test set.
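To make the membership-based classification concrete, here is a minimal sketch of a naïve Bayes classifier on presence/absence profiles. It is an illustration only: the summary above does not say which naïve Bayes variant or software was used, so the Bernoulli formulation, the Laplace smoothing parameter `alpha`, the three unnamed taxa columns, and the toy samples are all assumptions.

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on binary presence/absence rows.

    X: list of samples, each a list of 0/1 flags (one per taxon).
    y: list of class labels, one per sample.
    alpha: Laplace smoothing constant (an assumed default).
    """
    n_features = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior = math.log(len(rows) / len(X))
        # Smoothed probability that each taxon is present in class c.
        p_present = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[c] = (log_prior, p_present)
    return model

def predict_proba(model, x):
    """Return the posterior probability of each class for one sample."""
    log_scores = {}
    for c, (log_prior, p_present) in model.items():
        score = log_prior
        for flag, p in zip(x, p_present):
            score += math.log(p) if flag else math.log(1.0 - p)
        log_scores[c] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}

# Hypothetical toy data: rows are water samples, columns are three taxa.
X = [[1, 0, 1], [1, 0, 0], [1, 1, 1], [0, 1, 0], [0, 1, 1], [0, 1, 0]]
y = ["stable", "stable", "stable",
     "nitrification", "nitrification", "nitrification"]

model = train_bernoulli_nb(X, y)
posterior = predict_proba(model, [0, 1, 0])
print(posterior)
```

A structure-based model would replace the 0/1 flags with relative abundances and a suitable likelihood (for example multinomial or Gaussian); that choice is likewise not specified here.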
An abiotic-based model using total chlorine/NH2Cl and NH3 generated AUCs of 0.740 and 0.861 (ROC and PR, respectively), corresponding to a sensitivity of 0.250 and a specificity of 0.957. The AUCs increased to > 0.946 with the addition of NO2– concentration, which is indicative of nitrification in the DWDS. This research provides evidence of the feasibility of using bioindicators to predict operational failures in the system (e.g., nitrification).
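The metrics reported above can be illustrated with a short sketch. It computes ROC AUC through its rank interpretation (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one) and derives sensitivity and specificity at a fixed decision threshold. The labels, scores, and the 0.5 threshold are hypothetical toy values, not data from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= threshold)
    fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < threshold)
    tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < threshold)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy data: 1 = nitrification event, 0 = stable event.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.2, 0.1]

auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores)
print(auc, sens, spec)
```

AUC summarizes ranking quality across all thresholds, whereas sensitivity and specificity describe a single operating point. This is why a model can combine a moderate AUC with low sensitivity, as with the abiotic model's 0.740 ROC-AUC and 0.250 sensitivity.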