Posted on 2021-02-19, 21:59. Authored by Simon Ye, Anne Piantadosi, Shibani Mukerji
Supplementary Figure 1. Computational processing workflow. Sequencing reads first underwent universal quality control, human depletion (using a stringent criterion of >20% of kmers within the read classifying specifically to the human taxid), and de-duplication (A). The remaining reads were assembled into contigs, and contigs >600 bp were BLASTed to recover strong reference matches (B). These were used as a “negative controls” depletion database, after which the remaining reads were classified using comprehensive KrakenUniq and Kaiju databases. Viral hits were validated using BLASTn.
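The human-depletion criterion in panel A can be illustrated with a minimal sketch. It assumes each read has already been broken into kmers and that each kmer carries a taxonomic assignment; the input layout and the function names are hypothetical and are not taken from the study's pipeline. Only kmers assigned exactly to the human taxid (NCBI taxonomy ID 9606) are counted, matching "classifying specifically to the human taxid".

```python
HUMAN_TAXID = 9606  # NCBI taxonomy ID for Homo sapiens


def human_kmer_fraction(kmer_taxids):
    """Fraction of a read's kmers assigned exactly to the human taxid."""
    if not kmer_taxids:
        return 0.0
    human = sum(1 for taxid in kmer_taxids if taxid == HUMAN_TAXID)
    return human / len(kmer_taxids)


def is_human_read(kmer_taxids, threshold=0.20):
    """True when more than `threshold` of the kmers classify as human."""
    return human_kmer_fraction(kmer_taxids) > threshold


def deplete_human(reads):
    """Keep only reads that do not meet the human criterion.

    `reads` is a hypothetical mapping of read ID -> list of per-kmer taxids.
    """
    return {rid: kmers for rid, kmers in reads.items() if not is_human_read(kmers)}
```

With this rule a read is removed only when the human signal is strong, so reads with a few incidental human-like kmers are retained for downstream classification.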
Supplementary Figure 2. Correlations between length of hospitalization and diagnostic testing ordered, stratified by clinical diagnosis. Box plots show median length of stay (LOS; horizontal line) and whiskers indicate 1st and 3rd quartiles. Dots indicate a LOS greater than 1.5 × the interquartile range. There were no significant differences in LOS between clinical diagnosis groups (A). Scatter plots show the number of CSF tests versus LOS (B) and the number of PCR tests versus LOS (C). Colors indicate clinical diagnosis category. LOS correlated moderately with the total number of ID tests ordered (Spearman’s 𝜌 = 0·65, p<0·01; Figure 2B) and with the number of tests ordered from CSF only (Spearman’s 𝜌 = 0·46, p<0·01; Supplementary Figure 2B).
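For readers unfamiliar with the statistic reported above, Spearman's 𝜌 is the Pearson correlation of the ranks of the two variables, with tied values sharing their average rank. The sketch below is a plain-Python illustration of that definition; the function names are hypothetical, it is not the analysis code used in the study, and it does not compute a p-value.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks are used, the statistic is insensitive to the long right tail typical of length-of-stay data.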
Supplementary Figure 3. Sequencing metrics for various stages of the computational pipeline. The total number of reads in each sequencing library from raw de-multiplexed reads through the stages of quality control/trimming, human depletion, deduplication, and negative depletion (A). The distribution of the percentage of reads retained after each incremental step for all samples (C). Comparison of human abundance for each subject between routine, hybrid capture (HC), methylated DNA depletion (MDD), and hybrid capture plus methylated DNA depletion (HC+MDD) on DNA samples (B). Comparison for RNA (D).
Supplementary Figure 4: Unfiltered metagenomic classifications including contaminants. Heatmap shows viral taxa identified in each sample type. Compared to Figure 3, this figure shows all classified taxa without manual screening for contaminants. Rows are viral taxa, and columns are sample types, some with enhanced sequencing methods (HC and/or MDD). Only classifications with over 100 unique kmers and at least 1 BLAST-confirmed read are shown. Rows are grouped into RNA viruses (top section) and DNA viruses (bottom section). Color intensity corresponds to the RPM of the taxon. Red boxes correspond to detection in RNA libraries, while blue boxes correspond to detection in DNA libraries. Stars represent the clinical diagnosis. Gray shaded columns represent samples that did not undergo DNA or RNA sequencing. The yellow bars indicate nucleated cell count in the CSF for each subject. The four groupings of columns from top left to bottom right correspond to infections diagnosed with a positive PCR, infections diagnosed by non-molecular techniques, subjects with unknown etiology, and negative controls including water.
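The display rule and the color scale described above can be sketched as follows. RPM is taken here to mean reads assigned to a taxon per million reads in the library; which read total serves as the denominator (raw, post-QC, or post-depletion) is not stated in the caption and is an assumption, as are the function names.

```python
def reads_per_million(taxon_reads, total_reads):
    """Reads assigned to a taxon, scaled per million library reads.

    `total_reads` is whichever library total is used as the denominator;
    the caption does not specify the pipeline stage (assumption).
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return taxon_reads * 1_000_000 / total_reads


def passes_display_filter(unique_kmers, blast_confirmed_reads):
    """Caption rule: over 100 unique kmers and at least 1 BLAST-confirmed read."""
    return unique_kmers > 100 and blast_confirmed_reads >= 1
```

For example, 50 reads assigned to a taxon in a library of 2,000,000 reads corresponds to 25 RPM, and a classification with exactly 100 unique kmers would not be shown.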
Supplementary Figure 5. Results of HSV-2 and HIV-1 specific PCR. Amplification curve analysis demonstrated that CSF from subject M029 (blue, positive control) amplified in three out of three replicates (mean Ct = 23.8), consistent with positive mNGS results for HSV-2. CSF from subject M132 (gray) amplified in only one out of three replicates (Ct = 39.8); correspondingly, no HSV-2 reads were detected by mNGS. There was no amplification from the negative control (red) (A). Melting curve analysis demonstrated consistent curves across all positive wells (B). Amplification curve analysis demonstrated that CSF from subject M061 (purple, positive control) amplified in three out of three replicates (mean Ct = 25.2), consistent with positive mNGS results for HIV-1. CSF from subjects M051 (blue) and M010 (gray) amplified at high Ct values, similar to the negative control (red) (C). Melting curve analysis demonstrated that only one replicate from M051 melted in a pattern consistent with the positive controls; the other positive wells melted at lower temperatures, suggestive of nonspecific amplification or primer-dimerization (D). Gel electrophoresis of the PCR products demonstrated a band of the expected size for subject M061 (positive control) and a faint band of the expected size for subject M051, but not for M010 or the negative control (E).
Supplementary Figure 6. Enterovirus phylogeny. Enterovirus genomes assembled from subjects in this study (red) were aligned with representative reference sequences for each subtype within the enterovirus B species (blue). This allowed classification of viral subtypes as Coxsackie B4 for M007 and echovirus 30 for M072, M108, and M126. Interestingly, viruses from M108 and M126, who were admitted approximately one month apart and had no known epidemiological links, differed by only 0.6% (42 nucleotides), suggesting a common locally circulating strain. Abbreviations: EV = enterovirus; Echo = echovirus; Cox = coxsackievirus.
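A pairwise difference such as the 0.6% (42 nucleotides) reported for M108 and M126 can be computed from two aligned sequences by counting mismatched positions. The sketch below is illustrative only: the function name is hypothetical, and skipping gap and N positions is an assumption, since the caption does not say how such positions were handled.

```python
def pairwise_difference(seq_a, seq_b, skip="-N"):
    """Count mismatches between two aligned sequences of equal length.

    Positions where either sequence has a character in `skip` (gaps or
    ambiguous bases) are excluded from the comparison (assumption).
    Returns (mismatches, compared_positions, percent_difference).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    percent = 100 * mismatches / compared if compared else 0.0
    return mismatches, compared, percent
```

On the toy alignment `"ACGT-A"` versus `"ACCTTA"`, one of the five comparable positions differs, giving a 20% difference.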
Supplementary Figure 7. Clinical course, laboratory findings and mNGS for subjects diagnosed with Varicella Zoster Virus. Eight subjects were diagnosed clinically with varicella zoster virus (VZV)-related neurological diseases. Cerebrospinal fluid (CSF) metagenomic next-generation sequencing (mNGS) was positive in three cases of VZV meningoencephalitis (red bar) and negative for the other five subjects. Positive cases had acute onset of symptoms and either no prior antiviral treatment or minimal antiviral exposure (treatment and symptoms bars). CSF white blood cells at time of clinical VZV testing and mNGS VZV results are shown.
Supplementary Figure 8: Detection of atypical bacteria. Heatmap shows the recovery of sequencing reads for a subset of atypical bacteria. Only samples classified with over 100 unique kmers and at least 1 BLAST-confirmed read are shown. Color intensity corresponds to RPM in DNA samples.
Funding
Immune Activation, Cerebral Metabolic Activity and Depression in Treated HIV-Infection