Potential of bacteriophage proteins as recognition molecules for pathogen detection

Abstract Bacterial pathogens are leading causes of infections with high mortality worldwide, having a great impact on healthcare systems and the food industry. Gold standard methods for bacterial detection mainly rely on culture-based technologies and biochemical tests, which are laborious and time-consuming. Despite several developments in existing methods, the goal of achieving high sensitivity and specificity, as well as a low detection limit, remains unaccomplished. In past years, various biorecognition elements, such as antibodies, enzymes, aptamers, or nucleic acids, have been widely used and are crucial for pathogen detection in different complex matrices. However, these molecules are usually associated with high detection limits, demand laborious and costly production, and often present cross-reactivity. (Bacterio)phage-encoded proteins, especially the receptor binding proteins (RBPs) and cell-wall binding domains (CBDs) of endolysins, are responsible for phage binding to bacterial surface receptors at different stages of the phage lytic cycle. Due to their remarkable properties, such as high specificity, sensitivity, stability, and ability to be easily engineered, they are regarded as excellent candidates to replace conventional recognition molecules, thereby contributing to the improvement of detection methods. Moreover, they can be applied in a variety of detection systems, such as magnetic, optical, and electrochemical. Herein we provide a review of phage-derived bacterial binding proteins, namely the RBPs and CBDs, with the prospect of being employed as recognition elements for bacteria. Moreover, we summarize and discuss the various existing methods based on these proteins for the detection of nosocomial and foodborne pathogens.


Introduction
A large variety of bacteria is present in the environment and can be responsible for causing severe illness in humans and animals. Detection and identification of pathogenic bacteria are of great importance in diverse fields, such as food and water safety, public health, or even bioterrorism prevention. In an era of multidrug resistance, bacteria are increasingly responsible for high mortality worldwide, engendering high costs and an overload of healthcare facilities [1][2][3][4][5].
The standard methods for bacterial detection are mainly culture-based and are therefore laborious and time-consuming [6,7]. Culture-independent approaches, such as matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS), nucleic acid amplification, enzyme-linked immunosorbent assay (ELISA), and whole-genome sequencing (WGS), have rendered pathogen detection more reproducible and easier, some enabling the detection of non-culturable organisms and presenting the potential for automation [8][9][10][11][12]. However, these methods still need further improvement, as they often require expensive machinery, specialized personnel, knowledge of the genome sequence or mass spectrometry profile, and lengthy pre-enrichment steps. Also, the specificity can be affected by the detection of more abundant and closely related non-target species, giving rise to false-positive or false-negative results [9,13,14]. Another issue is low sensitivity in the presence of irrelevant and interfering components, since most clinical, food, or environmental samples are complex matrices [15,16].
CONTACT Carla M. Carvalho carla.carvalho@inl.int
Overall, a diagnostic method should be sensitive, specific, and capable of multiplex bacterial detection to allow an easier and scalable implementation in the food industry, water monitoring, and clinical diagnosis. Accordingly, the biorecognition molecules used in these assays are of utmost importance for the development of novel diagnostic methodologies that overcome some of the mentioned limitations. These molecules can be used in sample preparation to isolate the infectious agent(s), reducing the sample volume by excluding non-target components and potential inhibitors and concentrating the target agent(s) [15]. Therefore, the ideal biorecognition molecule should be inexpensive, enable the discrimination of the target pathogen from the background microflora, and provide high specificity and selectivity to the detection method [17]. The most used recognition molecules for bacterial detection are antibodies, nucleic acids, enzymes, and aptamers (Figure 1). However, these molecules have certain weaknesses and thus do not meet all the requirements of an ideal recognition element. Examples are low physical, chemical, and enzymatic resistance, a high probability of reaction inhibition due to sample impurity (especially in the case of nucleic acids), cross-reactivity, and/or laborious and costly production [17][18][19][20][21][22][23].
Bacteriophages (or phages) are viruses that infect and reproduce inside a host bacterial cell, using its cellular machinery, and are thus obligate parasites [24,25]. Due to their high specificity, sensitivity, and stability, whole phages have been used for several years as biorecognition elements in diverse detection systems, such as molecular-based methods [26], biosensors [27,28], and lab-on-chip devices [29]. Nonetheless, they have some disadvantages: their lytic and/or enzymatic activity can cause bacterial lysis and signal instability [30,31]; their correct immobilization is laborious and is thus commonly performed randomly, resulting in low sensitivity and poor bacterial capture [32]; they are large, which decreases the signal sensitivity, namely in sensor systems that depend upon distance (e.g., magnetoresistive or surface plasmon resonance, SPR) [33]; and their purification from the host bacteria is sometimes difficult and requires specialized techniques to ensure complete removal of host components [34].
Figure 1. Scheme illustrating the different recognition elements used for bacterial detection, the several approaches employed for their application, and the various systems that can be combined with these molecules for bacterial detection and identification. PCR stands for polymerase chain reaction, ELISA for enzyme-linked immunosorbent assay, and FISH for fluorescence in situ hybridization.
Phages are an extraordinary source of proteins of biotechnological interest, especially the phage-derived bacteria-binding proteins, i.e., the receptor binding proteins (RBPs) and cell-wall binding domains (CBDs) from endolysins ( Figure 2), which have been successfully used in a variety of detection systems [35][36][37][38][39]. In particular, these proteins present: high sensitivity and specificity, small size, high stability to extreme pH and temperature values, resistance to proteases and detergents [40][41][42], ease of recombinant overexpression [43], efficient binding to bacteria living in conditions that commonly cause degradation of other recognition molecules, such as antibodies [42], as well as the ability for detection of bacterial spores [44][45][46]. Moreover, these phage proteins can be tailored or improved to identify different target bacteria. Also, by genetic engineering, desired tags can be added to the phage protein sequence to achieve oriented surface functionalization on biosensing platforms [24,38,[47][48][49][50][51] (Figure 3). Hence, there is interest in exploiting these phage-derived binding proteins as recognition molecules since they present advantages over others (Table 1) and can overcome many of the above-mentioned issues related to the use of the whole phage particles [19,20,[52][53][54].
This paper provides an overview of the phage-derived binding proteins, particularly the RBPs and CBDs, explores their potential as recognition molecules for bacteria, and highlights the existing technologies that rely on these proteins for the detection of nosocomial and foodborne pathogens.

Receptor binding proteins: structure and application for diagnosis
The initial step in the phage infection process is the reversible binding of the phage to the bacterial cell, followed by irreversible attachment to specific receptors on the bacterial cell surface [55]. These actions are dictated by phage proteins, known as RBPs, which are typically located at the tip of the phage's tail (called tail spike proteins, TSPs) or as part of the tail fibers [24,43] (Figure 2).
Due to their properties, RBPs have been used as recognition molecules and combined with different techniques for the detection of bacteria (Figure 3, Table S1).
Figure 2. Schematic representation of the phage lytic cycle highlighting the role of the RBPs and of endolysins and their respective cell-wall binding domains (CBDs) in the initial and final stages of the cycle, respectively. RBPs are responsible for bacterial host recognition in the first stage of the lytic cycle, whereas endolysins (driven or not by the specific recognition of their CBDs) are crucial for the lysis of the host bacterial cells, enabling the progeny phages to be released. Note: The endolysins from phages infecting Gram-positive bacteria present a modular organization with a CBD and one or more enzymatically active domains (EAD), while the endolysins of phages infecting Gram-negative bacteria are mainly globular with a single EAD and only rarely show a modular organization.

Function and structure of RBPs
RBPs are responsible for the phage infection spectrum by specifically binding to receptors on the bacterial surface, such as proteins, polysaccharides, lipopolysaccharides (LPS), and carbohydrate moieties [56,57]. While common receptors for phages infecting Gram-negative bacteria are LPS or bacterial surface proteins (porins and transport proteins), in Gram-positive bacteria, peptidoglycan (PG), teichoic acids, or exposed polysaccharides are more frequent [57,58]. Nonetheless, the same RBPs and receptors are not always involved in both reversible and irreversible binding [56][57][58][59]. One example is the infection of Bacillus subtilis by phage SPP1, where the host's cell wall teichoic acid (WTA) is the receptor for reversible binding and irreversible binding occurs when the phage binds to the cell membrane protein YueB [60,61]. Moreover, the well-studied Escherichia coli phage T4 has both long and short tail fibers involved in the adsorption of the phage to the bacterial cell receptors [62]. RBPs are normally identified through bioinformatic analysis of phage genomes, in which possible candidates are compared with protein databases to find homologies with other known RBP-coding genes, such as tail fiber or tail spike genes [58]. This identification is challenging due to the great variability of phage morphology and phage-host interactions [58,59,63,64]. Simpson et al. proposed a laboratory approach using phage genome expression libraries and protein screens for discovering RBPs that are known to recognize at least one strain of the target bacteria [43].
After identification, cloning, and expression of RBPs, functional analysis of these proteins is required to evaluate their specificity and sensitivity against the target bacteria [58]. The genes encoding these proteins can be fused with genes for fluorescent proteins, such as GFP (Green Fluorescent Protein) and mCherry, enabling their functional analysis by fluorescence microscopy or spectrofluorometry [65,66]. Further insights into the conformational changes that occur in RBPs upon phage adsorption can be obtained through X-ray crystallography and cryo-electron microscopy, and morphological information can be obtained from transmission electron microscopy [31,[67][68][69][70][71][72][73][74][75][76][77][78].
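As an illustration of how the specificity and sensitivity of a candidate RBP might be summarized after such a functional analysis, the following minimal Python sketch computes both values from a strain panel. The panel layout, the function name `binding_panel_metrics`, and the strain counts are hypothetical and are not taken from any of the cited studies; sensitivity is taken here as the fraction of target strains that are bound, and specificity as the fraction of non-target strains that are not bound.

```python
def binding_panel_metrics(results):
    """Summarize a binding panel.

    results: list of (is_target, bound) boolean pairs, one per tested strain.
    Returns (sensitivity, specificity); a value is NaN when it is undefined
    because the panel contains no strains of the relevant kind.
    """
    tp = sum(1 for is_target, bound in results if is_target and bound)
    fn = sum(1 for is_target, bound in results if is_target and not bound)
    tn = sum(1 for is_target, bound in results if not is_target and not bound)
    fp = sum(1 for is_target, bound in results if not is_target and bound)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical panel: 10 target strains (9 bound) and 20 non-target strains (2 bound).
panel = (
    [(True, True)] * 9
    + [(True, False)] * 1
    + [(False, False)] * 18
    + [(False, True)] * 2
)
sens, spec = binding_panel_metrics(panel)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In practice the "bound" call would come from a thresholded fluorescence readout (microscopy or spectrofluorometry), and the choice of threshold would itself need to be validated.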
Even though the variability among RBP sequences is immense, typically the protein's N-terminus is connected to the phage head while the central part or C-terminus is available for binding to bacteria [24,42,79]. Duplessis et al. showed that, when aligning RBP sequences from seven phages infecting Streptococcus thermophilus, the N-terminal regions are highly conserved, while the C-terminal domain diverges in a region that may be responsible for host recognition [80]. In phages infecting Gram-negative hosts, sequence conservation at the N-terminus has also been observed in the RBPs, which emphasizes that the C-terminal region of these proteins is under evolutionary pressure and that their genes evolve more rapidly, through horizontal transfer [81][82][83][84]. A shorter derivative of the RBP gp47 of the Campylobacter jejuni phage NCTC 12673 was constructed after its bacterial cell binding domain had been localized to the C-terminus. The truncation of the RBP enabled high production yields and did not compromise its ability to recognize both C. jejuni and Campylobacter coli [84]. Moreover, Le et al. reported a recombinant phage in which the replacement of a tail fiber gene resulted in altered host specificity. These authors also demonstrated that a single point mutation at the C-terminus of a putative tail fiber gene modified the lytic spectrum of the mutant phage JG004-m0 in comparison with the parental phage [85].
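The contrast between a conserved N-terminus and a divergent C-terminus can be made concrete with a per-column conservation score over an alignment. The sketch below is only an illustration: the three sequences are invented toy strings rather than real RBP sequences, the function name `column_conservation` is hypothetical, and the score (frequency of the most common residue in each column) is one simple choice among many. Gap characters, if present, are counted like any other symbol.

```python
from collections import Counter


def column_conservation(aligned):
    """Return, for each alignment column, the frequency of its most common residue."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for i in range(length):
        column = [seq[i] for seq in aligned]
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(top_count / len(aligned))
    return scores


def mean(values):
    return sum(values) / len(values)


# Invented toy alignment: identical N-terminal halves, divergent C-terminal halves.
toy_alignment = [
    "MKTLAGSVNDQWERTY",
    "MKTLAGSVAHKLPRSC",
    "MKTLAGSVGEFYTMNV",
]
scores = column_conservation(toy_alignment)
half = len(scores) // 2
n_term = mean(scores[:half])
c_term = mean(scores[half:])
print(f"N-terminal half: {n_term:.2f}, C-terminal half: {c_term:.2f}")
```

With real data, the input would be a multiple sequence alignment produced by a dedicated alignment tool, and a sliding window would usually be more informative than a fixed split into halves.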
The presence of enzymatic properties, derived from depolymerases, capable of degrading the biopolymers forming the bacterial cell envelope has been reported in RBPs of phages that infect Gram-positive and Gram-negative bacteria [24,42]. For example, hydrolytic activity has been observed in TSPs: ɸ29 TSPs can degrade teichoic acids of B. subtilis [86], and P22 and HK620 TSPs can cleave the LPS of Salmonella enterica [87,88] and E. coli [83], respectively.
Latka et al. presented a bioinformatic analysis of the architecture of various Klebsiella RBPs, highlighting the homology between RBP domains in different Klebsiella phages. Similar domains for attachment of an RBP to the phage tail (anchor domain) or for branching of RBPs (T4gp10-like domain) occur frequently among the Klebsiella phages. The organization and homologies between anchor, T4gp10-like, enzymatic, and structural domains reveal the evolutionary linkages between phages and their RBPs. Moreover, while the structural domains are located at the N-terminus, enzymatic domains are typically located in the middle of the protein sequence [89]. The authors demonstrated that the ability of the phages to change their host spectrum is due to strong horizontal gene transfer in the RBPs' enzymatic domains. The presence of multiple RBPs with depolymerases having distinct specificities within the same phage also dictates the phage's host spectrum. This occurrence has been previously reported in phages, particularly in those infecting Gram-negative bacteria [72,[89][90][91][92][93]. Pan et al. reported the specificity of different tail fibers from the ɸK64-1 phage against distinct Klebsiella capsular types, showing that each protein has depolymerase activity capable of degrading the capsular components of Klebsiella bacterial strains [91].
A comparison of the RBP structures of different phages has revealed a trimer-like structural organization of RBPs. Occasionally, the RBPs' C-terminus may contain a chaperone protein that aids in the RBPs' three-dimensional folding and trimerization [94]. The role of the C-terminus in host recognition is perceivable not only in the low homology of its amino acid sequence but also in the RBP structure [95,96]. In Gram-positive bacteria, RBPs belonging to phages infecting Lactococcus lactis have been extensively studied [31,68,[95][96][97][98][99][100][101][102]. The first RBPs from Lactococcus phages that were structurally characterized were from phages 936-type p2 [97], bIL170 [95], and P335 TP901-1 [100]. Sciara et al. observed that the RBPs from these phages are composed of three separate domains, corresponding to the shoulder, neck, and head, characterized by different structural elements. The neck is a β-prism domain that connects the N-terminus (shoulder), containing an α-helix bundle, and the head (C-terminus), responsible for binding to the host receptors [99]. The RBP from the Staphylococcus aureus phage ɸ11 has also been described as having three domains: the N-terminal region, a central part, and the C-terminal domain. The first is composed of a triple-helical bundle, while the central domain has three β-propeller domains. Finally, the C-terminal domain contains two domains formed by three five-stranded anti-parallel β-sheets, one from each monomer, that are covered on their surface-exposed side by loops and one short α-helix each [67,71]. Moreover, the structure of the Podoviridae phage T7 RBP was described by Garcia-Doval et al.: the crystal structure of the C-terminus is composed of a mainly β-structured pyramid domain, with three short α-helices at its end, and a globular tip, whose monomers contain an eight-stranded β-sandwich. Mutants in this tip domain have been linked to changes in the phage's host range, suggesting its role in the interaction with receptors on the cell surface [103]. The S. enterica phage P22 has been extensively studied by X-ray crystallography, nuclear magnetic resonance, and fluorescence spectroscopy [87,104,105]. Each P22 phage contains six receptor-binding proteins with multiple sites for LPS binding through the β-helical subunits of the RBP [87]. The P22 RBP has also allowed the analysis of the force of interaction with its O-antigen polysaccharide receptor by immobilizing the protein on the surface of atomic force microscope cantilevers [106]. Characterization of other RBPs has also been reported, namely for phages of Acinetobacter baumannii [107,108], C. jejuni [84,109], Salmonella spp. [75,77,88], and Klebsiella pneumoniae [89].
The RBPs' structure is influenced by the nature of the bacterial cell receptor itself (e.g., protein or carbohydrate). Phages infecting Gram-positive bacteria that bind to protein receptors normally have sharp or spiked tail fiber ends, whereas a larger baseplate forms when a phage attaches to a carbohydrate receptor [58,110]. This is observed in the RBPs binding to protein receptors of the B. subtilis phage SPP1, the Bacillus anthracis γ phage, and c2-type phages, whereas phages p2, Tuc2009, and TP901-1 from L. lactis are known to bind to a carbohydrate through a larger baseplate [102,111].
Recently, Dunne et al. studied the correlation between the RBPs' amino acid sequences, the features of the structural domains, and the phages' host ranges that were predicted phylogenetically and through serotyping [112]. In this study, it was possible to produce polyvalent phages with mutated RBPs that resulted in extended host ranges. Thus, the importance of the structure of RBPs and of understanding their interaction with host receptors cannot be overstated when these proteins are intended for use in downstream applications.

Exploitation of RBPs for bacterial detection
Due to their unique properties, RBPs have been emerging as promising diagnostic tools for pathogen detection (Table S1), especially when combined with diverse detection platforms ( Figure 3).
Often, RBP-based detection techniques rely on the C-terminus of the RBP for recognition and binding to the cell wall receptors of the target bacteria, while the N-terminus can be easily engineered to achieve an oriented immobilization of the RBP [24,42]. In fact, several studies showed that different tags can be fused to the N-terminus of the RBP sequence without affecting binding affinity [24,113]. These tags include cysteine (Cys-tag) [31,114], glutathione-S-transferase (GST-tag) [38,84,115], and poly-histidine (His-tag) [49][50][51]65,[116][117][118][119][120][121]. The oriented immobilization of RBPs has been demonstrated to improve the capture efficiency and sensitivity of bacterial detection when compared to unoriented immobilization. Examples include the immobilization on surface plasmon resonance (SPR) substrates of the GST-tagged gp48 to detect C. jejuni [38] and of the Cys-tagged RBPs from phage P22 to detect S. enterica serovar Typhimurium [31]. Also, the His-tagged RBPs J and Det7T allowed the detection of E. coli at 2 × 10⁴ CFU/mL [118] and of S. Typhimurium over a concentration range of 5 × 10⁴ to 5 × 10⁷ CFU/mL in spiked apple juice [119], respectively. Tay et al. [114] also presented an RBP-based detection method using a surface-enhanced Raman scattering (SERS)-based biosensor, in which nanoaggregate embedded beads (NAEBs) were conjugated with an RBP from the Salmonella phage P22 using three different cross-linking strategies. In the first, N-hydroxysuccinimide ester (NHS-maleimide) was used to cross-link the NH₂-terminated NAEB to the Cys-tag of the RBP. The second and third relied on the binding affinity between the RBP's His-tags and the metal ions (Zn²⁺ and Ni²⁺, respectively) on the silica shells of the NAEBs. All three strategies allowed specific detection of Salmonella, with a detection limit of a single cell [114].
The RBP gp37 from the well-studied phage T4 was also immobilized through its His-tag on the surface of a long-period grating (LPG)-based sensor, allowing the detection of E. coli B LPS [49] and E. coli [51,116]. In addition to LPG-based sensors, a further label-free bacterial LPS detection method has been developed by coupling gp37 with highly sensitive microwave sensors [50]. These sensors were capable of measuring capacitance and conductance changes for the specific interaction between gp37 and E. coli B LPS, while nonspecific interactions resulted in minimal signal changes [50].
The majority of the magnetic separation (MS) techniques for the concentration of bacterial cells reported in the literature rely on antibodies. The use of RBPs presents several advantages due to their inherent properties [24,122], making it possible to overcome the drawbacks of immunomagnetic separation (IMS) assays, particularly low stability, high associated costs, sample matrix interference during signal detection, and low specificity, while promoting a fast interaction with the target pathogens [47,[123][124][125]. Some studies report the successful implementation of RBPs as affinity molecules for a simple pre-enrichment through the functionalization of magnetic particles (MPs), which makes it possible to concentrate and separate pathogens from different samples using an external magnet [38,47,48,117,120,126]. While most of the techniques described were only tested using bacterial suspensions [38,48,117,126], a few studies report the successful recovery of pathogens from more complex spiked samples, particularly milk [47,120], chicken [47,120], and human urine [126].
These magnetic-based techniques can also be coupled with other detection systems, such as biosensors [117], molecular methods [47], or MALDI-MS [48], enabling a selective separation of cells from samples with low bacterial concentrations without requiring a pre-enrichment step. Cunha et al. [117] used nickel-coated MPs functionalized with two different His-tagged RBPs (gp18 and gp109) to develop a specific multiplex capture assay that was able to recover Enterococcus faecalis and S. aureus cells, individually and simultaneously, at concentrations as low as 10 CFU/mL. Additionally, this technique was coupled with a magnetoresistive platform for rapid multiplex detection of both bacteria using the RBPs gp18 and gp109 as bioprobes immobilized on the sensors' surface [117]. Poshtiban et al. [47] combined RBP gp48-functionalized MPs with real-time PCR for the detection of foodborne bacteria. In a different study, the authors also successfully conjugated the RBP gp48 with microresonators for the specific detection of C. jejuni cells [115]. Bai et al. [48] reported that MPs functionalized with two RBPs, namely TF2 and TF6 from A. baumannii phages, were able to recognize the target bacterium and form conjugates that were magnetically isolated for subsequent detection through MALDI-MS within 10 min. Also, He et al. [126] functionalized MPs with the RBP P069 for the capture of Pseudomonas aeruginosa and subsequent bioluminescence (BL) detection. After the bacterial cells were captured by the P069-functionalized MPs and lysed using hexadecyltrimethylammonium bromide (CTAB), which released the intracellular adenosine triphosphate (ATP), an ATP BL solution was added to trigger the signal for P. aeruginosa detection [126].
Besides their use in magnetic-based assays, RBPs can also work as a potential alternative to antibodies for the detection of pathogens in immunological techniques, such as the ELISA-like tail spike adsorption assay (ELITA) [39,127]. In particular, ELITA is a microtiter plate format screening assay used to detect the O-antigen serotype, an important molecule for the detection of Shigella flexneri [127] and Salmonella [39]. These studies resulted in the rapid and highly specific detection of O-serogroup strains by the RBPs of the Salmonella phages 9NA and P22 [39] and the RBP of the Shigella phage SF6, with the RBP-based ELITA proving more specific than PCR in the detection of the O-antigen serotype of Shigella cells [127]. Furthermore, to exclude the matrix effect on the signal in ELITA, these RBPs were tagged with a fluorescent dye label to evoke a fluorescence emission signal in the visible spectrum upon O-antigen binding, detected through flow cytometry [39] or fluorescence spectroscopy [127]. Also, as a detection and quantification assay, the Salmonella cells captured by the gp37-gp38 functionalized beads were used in a self-sandwich-based detection assay, the enzyme-linked long tail fiber assay (ELLTA), resulting in the detection of S. Typhimurium at a concentration of 10² CFU/mL [120].
Other studies have used fluorescent probes to allow the detection of bacteria through fluorescence-based methods [126,128]. Shi et al. [128] developed a detection method based on dual-site recognition of P. aeruginosa in different spiked samples using the RBP P069 and polymyxin B (PMB), a polypeptide antibiotic with an affinity for the outer membrane of Gram-negative bacteria. P069 was immobilized in microplate wells for the specific recognition of P. aeruginosa, and the bacteria were detected by spectrofluorometry through the fluorescein isothiocyanate (FITC)-conjugated PMB bound to the cells [128]. He et al. [126] reported a sandwich fluorescence method for P. aeruginosa detection, using the RBP P069 as the primary probe and tetraethyl rhodamine isothiocyanate (TRITC)-labeled P069 for the fluorescent detection. The method allowed the detection of P. aeruginosa at a concentration of 1.7 × 10² CFU/mL, revealing a higher sensitivity compared to the antibody-based detection [126].
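To put concentrations reported in CFU/mL into perspective, it can help to convert them into the expected number of cells actually present in the assayed volume. The short Python sketch below does this arithmetic; the 100 µL assay volume is an assumption chosen purely for illustration and is not reported in the cited studies, and the function name `expected_cells` is hypothetical.

```python
def expected_cells(cfu_per_ml, volume_ul):
    """Expected number of colony-forming units in a sample of the given volume (in microliters)."""
    return cfu_per_ml * volume_ul / 1000.0


# Assumed assay volume of 100 µL (illustrative only).
assay_volume_ul = 100

# Concentrations quoted in the text, in CFU/mL.
p_aeruginosa_cells = expected_cells(1.7e2, assay_volume_ul)
e_coli_cells = expected_cells(2e4, assay_volume_ul)

print(f"1.7e2 CFU/mL in {assay_volume_ul} µL: about {p_aeruginosa_cells:.0f} cells")
print(f"2e4 CFU/mL in {assay_volume_ul} µL: about {e_coli_cells:.0f} cells")
```

At low concentrations the number of cells in a small aliquot becomes a handful at most, which is one reason pre-concentration steps such as magnetic separation matter for sensitivity.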
Another approach is to use recombinant RBPs, in which the proteins are genetically fused with different fluorescent proteins for the specific detection of bacteria using fluorescence-based methods, such as microscopy [44,65,66,84,129] and spectrofluorometry [65,66]. Particularly, in a study by Braun et al. [44], bacterial spores from B. anthracis were tested with fluorescent fused RBPs from four different Bacillus phages under fluorescence microscopy, where fluorescence signals were only detectable for germinated spores, revealing that it is likely that RBPs are not able to bind to nongerminated spores. The ability to fuse RBPs with fluorescent tags brings several new possibilities since different colored fluorescent RBPs can be combined and used for the multiplex detection of different bacterial species. Santos et al. [65] utilized two RBPs, one targeting Staphylococcus and one targeting Enterococcus species, each fused with a different colored-fluorescent protein, enabling the multiplex detection of these pathogens in artificially contaminated horse blood by spectrofluorometry. In a more recent study, this assay was used as a semiquantitative assessment of the specificity and sensitivity of an RBP (gp86) from a K. pneumoniae phage fused with a red fluorescent protein [66].
Hence, RBPs have proved to be promising biorecognition probes with several application possibilities for bacterial detection. Further research should be conducted to expand the applicability of new RBPs to the detection of other bacteria and to improve the detection limits in complex samples, particularly food and clinical samples.

Cell-wall binding domains of endolysins: structure and applications for diagnosis
Endolysins are phage-encoded enzymes produced by newly formed progeny phages at the final stage of the lytic cycle which are responsible for breaking down the PG of the bacterial cell wall from within ( Figure 2). These enzymes can be assisted by the phage holins, which create pores in the plasma membrane, enabling endolysins to reach and degrade the host cell wall thereby releasing the progeny virions [130,131].
Generally, endolysins are composed of one or more enzymatically active domain(s) (EAD), which cleave the bacterial PG, and a CBD responsible for the specific recognition of ligands (commonly carbohydrate components) in the target cell wall structure [131]. In Gram-positive bacteria, as soon as the bacterial cell wall is degraded by endolysins, progeny phages are free to leave and infect other susceptible host cells, while in Gram-negative bacteria the process is not so simple due to the presence of an outer membrane (OM), which protects the PG layer from the lytic activity [132]. Therefore, endolysins are usually divided according to their origin into those derived from phages infecting Gram-positive bacteria and those derived from phages infecting Gram-negative bacteria, which generally affects their modular structure. Globular proteins with a single EAD are mostly found among the endolysins encoded by phages of Gram-negative hosts, which rarely show a modular organization with a specific CBD module [131,[133][134][135]. The few endolysins showing this structure are composed of a C-terminal EAD and an N-terminal CBD [136,137], like the KZ144 endolysin from the Pseudomonas phage ɸKZ [138]. On the other hand, modular enzymes composed of EAD(s) and a CBD connected by a linker are prevalent among phages specific to Gram-positive bacteria [131,133,134].
Due to their outstanding properties like high affinity and specificity, CBDs have shown promising results when combined with different techniques for the detection of bacteria ( Figure 3, Table S2).

Function and structure of CBDs
Generally, the CBD module from Gram-positive derived endolysins is reported as the main determinant of endolysin specificity [139,140], and it is hypothesized that the CBD ensures proper orientation of the enzyme toward its substrate, which may be molecules within the cell envelope, such as PG components or even cell wall-associated molecules [140,141] (Figure 2). The interaction between CBDs and their respective ligands is known to be charge-dependent and is frequently described as possessing exceptionally high affinity [142,143]. Interestingly, complete CBD sequences are not always needed for binding, since a 10-amino acid motif of the PlyG CBD proved sufficient for the specific recognition of B. anthracis when coupled to fluorescent quantum dots (QDs) [144]. Santos et al. investigated the C-terminal portion of the Paenibacillus larvae PlyPl23 lysin to find the smallest amino acid sequence capable of retaining the binding affinity to the bacterial cells. The authors concluded that a truncated version of the P. larvae CBD (containing 63 amino acids, from residue 161 to residue 223) maintained the same binding ability as the complete CBD [145]. On the other hand, some studies described an enhanced CBD binding affinity when it was combined with an amidase domain [146,147].
CBDs have been classified into different types according to their sequence or structural similarities and their binding epitopes on the bacterial cell wall, so they can be grouped according to the conserved binding modules described in the literature. These include the following domains: choline-binding (CW_binding_1), SH3 or SH3b, CW_7 and PG_binding_1 (three-helix bundle), LysM, and the α/β structures [140,148], among others.
The choline-binding elements are most widespread in pneumococcal endolysins and are commonly recognized as binding to choline-containing teichoic acids within the cell wall. Cpl-1 is the most studied pneumococcal endolysin; the N-terminal superhelical moiety of its CBD is composed of stacked choline-binding repeats that create choline-binding sites between the repeats [141,149]. SH3b domains are generally reported to recognize and bind the glycine-rich interpeptide bridge commonly present in most staphylococcal strains, leading to the assumption that SH3b is responsible for broad recognition of ligand epitopes conserved within a bacterial genus [150]. Although there are exceptions [151], the recently characterized endolysins possessing an SH3 domain corroborate this theory [146,147].
CBDs consisting of a three-helix bundle (CW_7 and PG_binding_1 modules) are present in both Gram-positive and modular Gram-negative endolysins. The three-helix bundle seems to be a frequently used conformation for cell wall binding, although the target epitopes in the cell wall may vary [148]. As an example, the Gram-positive endolysin Cpl-7 employs three repeats of the three-helix bundle CW_7 fold to attach to N-acetyl-D-glucosaminyl-(β1,4)-N-acetylmuramyl-L-alanyl-D-isoglutamine in PG [152]. On the other hand, the CBDs of endolysins from phages of Gram-negative bacteria (such as AP3gp15 and gp144) comprise a single three-helix bundle and share specific repeats of a motif hypothesized to be involved in PG binding [153,154]. This motif usually appears in lytic enzymes targeting bacterial cell walls and in cell surface-related proteins [154].
The LysM domain is very common in PG hydrolases and is reported to bind β-(1→4)-linked N-acetylglucosamine oligosaccharides ((GlcNAc)n), as found in chitin, a motif that also occurs in the sugar backbone of the PG [155]. The Enterococcus phage phiEf11 endolysin is predicted to have a LysM domain as its CBD [156].
Some CBDs present an arrangement of secondary structure elements comprising a central parallel (such as the clostridial endolysins CTP1L and CD27L) or antiparallel (such as the Bacillus phage lysin G, PlyG) four-stranded β-sheet flanked by two helices [157]. Despite their low sequence identity, the CTP1L, PlyG, and CD27L endolysins are structurally similar [148,157,158].
The binding spectra of CBDs are commonly broader than the host ranges of the corresponding phages and can cover an entire bacterial genus [159-162]. As an example, the CBD from the LysF1 endolysin showed binding affinity for staphylococci and for a Streptococcus species [162]. Yet some CBDs exhibit specificity down to the level of species [163], genotypes [145], and serovars [164]. For example, pneumococcal endolysins have a CBD that binds choline, found only in pneumococcal teichoic acids, leading to specificity at the species level [163], whereas the CBD of the Listeria endolysin PlyP35 specifically binds to terminal N-acetylglucosamine residues present in the teichoic acids of a specific Listeria monocytogenes serovar, giving rise to CBD specificity at the serovar level [164]. As for the CBDs of endolysins from phages of Gram-negative hosts, most are predicted to bind A1γ PG (conserved domains PG_binding_1 and PG_binding_3) and have no specific target in the cell wall [137]. Therefore, the CBDs derived from Gram-negative endolysins do not restrict the specificity of the enzyme [137,138].

Exploitation of CBDs for bacterial detection
Phage endolysin CBDs have proven to be successful recognition molecules, and several studies have reported their application for bacterial detection over the years [37,143,165-167] (Table S2). CBDs are commonly much smaller than antibodies, usually 10-20 kDa compared with the 150 kDa of most antibodies [37]. Furthermore, the number of CBD binding sites on a bacterial cell is reported to be at least 10⁷ [168], and their equilibrium association constants for their carbohydrate ligands are high [143,169] and comparable with those of secondary antibodies against bacterial cell surface antigens [170,171]. Another benefit of CBDs is that they have different binding specificities, which can be attractive depending on the purpose. According to whether the desired method requires lower or higher specificity, the most suitable CBD can be chosen, if available. All these characteristics make CBDs very attractive from a bacterial detection point of view, highlighting their potential as an alternative to antibodies or other biorecognition molecules (Table 1).
Fluorescent labeling of CBDs is the most common approach to target bacterial pathogens and was first described for the detection of foodborne bacteria [143].
To date, fluorescently labeled CBDs have been developed to recognize and detect L. monocytogenes [35,164], Clostridium spp. [172], B. cereus [173,174], Staphylococcus spp. [33,113,147,175], P. larvae [145] and B. anthracis [144]. The labeling of CBDs with fluorescent proteins opens the possibility of combining tags of various colors, enabling the identification and differentiation of serovars [164], and possibly of strains [175]. It also allows multiplex bacterial detection if CBDs that target distinct host bacteria are labeled with different fluorescent proteins. Schmelcher et al. conducted one of the first studies in which different CBD modules from L. monocytogenes phage endolysins were fused to three different fluorescent proteins, allowing the detection and differentiation of the five major Listeria serovars in pure and mixed cultures and in food items [164].
Interestingly, by swapping or combining different CBDs in a recombinant endolysin, the CBD modules may cooperate to achieve higher binding affinities [169], increased lytic activity [169,176], swapped specificity [169,177,178], an extended lytic spectrum [179] or binding spectrum [169], or an improved lytic effect at high ionic strengths [169]. Nevertheless, accomplishing this requires in-depth studies that characterize a variety of CBD modules individually and construct several chimeric proteins to determine which chimera performs best. Schmelcher et al. constructed a range of chimeric proteins by fusing different Listeria CBDs in various combinations and orientations. The chimeric proteins that combined the original CBD500 and CBDP35 revealed a broader host range than either parental CBD, recognizing almost all of the Listeria serovars tested. In addition, the authors made a chimera with a duplicated CBD domain and demonstrated a 50-fold increase in the binding affinity of the chimeric protein CBD500-500 over the original CBD500 [169].
Besides fluorescent labeling, CBDs can also be used as probes to functionalize MPs for MS of bacterial pathogens from several sample types, working as a pre-enrichment step before quantification by other detection systems, such as spectrophotometry, PCR, or biosensors [35,164,166,167,175,180]. Recently, Kretzer et al. reported a sensitive and fast method to detect Listeria in foods that combines CBD-based bacterial MS with a bioluminescent reporter phage. The method detected 0.1-1 CFU/g of Listeria in food samples in a total assay time of 22 h [180]. In another example, Park et al. combined CBD-MS with an ATP bioluminescence assay to detect 10³ CFU/mL of B. cereus in blood [166].
Likewise, CBDs have been coupled with other methodologies, as described by Kwon et al., who synthesized biotinylated CBDs for the specific detection of Staphylococcus, Bacillus, and Listeria, individually and in mixed populations [181]. The first approach was a sandwich ELISA, in which biotin-tagged CBDs were applied as probes to capture bacteria, which were then detected by absorbance measurements. In the second approach, the authors developed a quantitative PCR (qPCR) method based on biotin-DNA barcodes, which were mixed with CBD-streptavidin complexes. The resulting CBD-streptavidin-DNA complexes were incubated with individual and mixed populations of bacteria, which were then detected and quantified by qPCR. Although both methodologies proved efficient as detection systems, qPCR showed the best results, detecting 2, 3, and 6 CFU/mL of S. aureus, B. anthracis, and Listeria innocua, respectively [181]. More recently, Yang et al. developed a lateral flow sandwich assay by combining a CBD with fluorescent microspheres and a nitrocellulose membrane carrying specific antibodies to capture methicillin-resistant Staphylococcus aureus (MRSA). The assay can be completed within 10 min and achieved detection levels of 10² CFU/mL [182].
CBDs have been shown to be good probes for concentrating and detecting pathogens from complex matrices, which generally impair the success of several detection methodologies, such as PCR, owing to the interference of numerous components present in the sample [16,180,183,184]. When combined with a variety of platforms for signal acquisition and quantification, such as biosensors, spectrometry, or flow cytometry, CBDs allowed the detection of bacteria in blood [147,166], plasma [144], urine [175], water [175] and food samples [35,36,167,172,173], as highlighted in Table S2.
Gómez-Torres et al. reported a CBD from the CTP1L endolysin that specifically recognizes Clostridium spp. strains. The authors fused the CBD with GFP, which allowed the direct recognition of Clostridium tyrobutyricum vegetative cells in the matrix of cheese with late blowing defect, as well as binding to spores from several Clostridium species [172]. Other examples of CBD-based detection in food involve milk, rice, meat, poultry, and other dairy products, using techniques such as PCR [36,167], MS [35,36,167,185], or biosensors [173,186].
Likewise, studies on the detection of B. anthracis, the causative agent of anthrax, have exploited the high affinity and sensitivity of a gamma-phage lysin-derived CBD [45,144] coupled with a QD-based fluorometric assay, enabling detection in spiked plasma [144].
More recently, Costa et al. described an amidase-SH3 protein able to recognize and detect Staphylococcus spp. in blood. The authors constructed three GFP-fused proteins, and the amidase-SH3 protein, which showed the highest binding efficiency, was applied for the detection of 1-5 CFU/mL of S. aureus in spiked blood samples using a flow cytometry assay [147].
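For a side-by-side view, the detection limits quoted in this section that are expressed in CFU/mL can be collected and ranked, for instance as in the following minimal Python sketch. The record type `ReportedLimit` and its field names are hypothetical; the values are those cited in the text, and the 0.1-1 CFU/g figure for Listeria in food [180] is left out because it uses a different unit. Matrices and assay conditions differ between studies, so the ranking is only indicative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedLimit:
    """One detection limit as quoted in the text (hypothetical record type)."""
    method: str
    target: str
    low_cfu_per_ml: float   # lower bound of the reported value
    high_cfu_per_ml: float  # upper bound (equal to low for single values)
    citation: str


REPORTED = [
    ReportedLimit("CBD-MS + ATP bioluminescence", "B. cereus (blood)", 1e3, 1e3, "[166]"),
    ReportedLimit("CBD-streptavidin-DNA barcode qPCR", "S. aureus", 2, 2, "[181]"),
    ReportedLimit("CBD-streptavidin-DNA barcode qPCR", "B. anthracis", 3, 3, "[181]"),
    ReportedLimit("CBD-streptavidin-DNA barcode qPCR", "L. innocua", 6, 6, "[181]"),
    ReportedLimit("CBD lateral flow assay", "MRSA", 1e2, 1e2, "[182]"),
    ReportedLimit("GFP-amidase-SH3 flow cytometry", "S. aureus (blood)", 1, 5, "[147]"),
]


def ranked(limits):
    """Sort from lowest to highest reported limit, using the upper bound."""
    return sorted(limits, key=lambda r: r.high_cfu_per_ml)


for r in ranked(REPORTED):
    print(f"{r.high_cfu_per_ml:>7g} CFU/mL  {r.method} / {r.target} {r.citation}")
```

Sorting on the upper bound is a conservative choice for values reported as a range, such as the 1-5 CFU/mL of the flow cytometry assay.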
Two major drawbacks remain. First, CBDs cannot detect Gram-negative pathogens because the OM shields the PG layer, which is the target of the CBDs from phages infecting Gram-negative bacteria. Second, these proteins also detect dead or compromised cells as long as the receptors on the bacterial cell wall are intact, although this issue has been overcome by using a pre-enrichment step [147,166,180]. Despite these restrictions, CBDs have proven very effective as diagnostic tools in diverse detection platforms, and further investigations should be conducted to continuously enhance their properties, improve the detection limits in real samples, and find new endolysin CBDs targeting other bacterial species.

Conclusions and prospects
Bacterial pathogens remain the principal causative agents of several nosocomial and foodborne diseases, which are becoming more problematic and more difficult to manage appropriately because of antimicrobial resistance. Despite the efforts made in diagnostic technologies, rapid and accurate methodologies for bacterial detection are still lacking. To this end, the choice of the recognition molecule is of utmost importance to enable a specific and targeted identification of the causative agent while disregarding matrix-related inhibitors and non-target pathogens. The most commonly used probes often present hurdles concerning cross-reactivity or cost-effectiveness when applied for multiplex detection of bacterial pathogens.
The studies presented herein show that phage RBPs and CBDs present exceptional characteristics that surpass those of the standard recognition molecules, such as superior sensitivity, specificity, and stability under extreme conditions, which enable their use on-site. Moreover, due to the progress in biotechnology and synthetic biology, chimeric proteins can be designed with improved properties for diagnostics and adaptability to different bacterial targets and detection systems.
Despite the seemingly exceptional advances in phage protein technologies over the past years, efforts have mostly been confined to laboratory research. Therefore, some challenges must be overcome before these proteins can be commercialized and applied as routine diagnostic tools in clinical practice or in the food industry. It is noteworthy that the number of available and well-characterized phage proteins for different pathogens is still limited, and there is no established library that can be used to find the most suitable RBP or CBD for each causative agent. The overall process for obtaining these proteins demands bioinformatic skills to find the most promising proteins in the phage genome sequences deposited in the GenBank database of the National Center for Biotechnology Information (NCBI). Moreover, knowledge of molecular biology techniques is required for their synthesis, recombinant expression, and purification. Additionally, functional analysis of these proteins is mandatory to assess their ability to bind to the target bacterium. Throughout this process, some constraints occur, namely the difficulty in expressing and purifying some proteins and the inability of some of the selected proteins to adsorb to the target bacterium. Moreover, there is a lack of studies evaluating their effectiveness using different analytical methods in real samples, especially clinical specimens. Therefore, an effort should be made in this direction to convince the industry that phage proteins are trustworthy affinity tools for bacterial detection. This will enable the creation of a portfolio of CBDs and RBPs targeting different bacteria, validated in terms of affinity, specificity, and reproducibility for different applications. The phage proteins can then be selected and/or customized according to the purpose of the experimental methodology.
RBPs and CBDs are produced by recombinant expression, a fast and low-cost technology that overcomes the hurdles associated with the in vivo approaches used for antibody generation [187]. Also, with the recent developments in genetic engineering and biotechnology, it is reasonable to expect that established manufacturing methods for the large-scale overexpression of these proteins will arise, providing inexpensive phage protein probes and overcoming the batch-to-batch variations that occur with other recognition molecules (e.g., antibodies) [187].
The phage proteins can also be coupled with analytical techniques or biosensing platforms and after validation, these kits/devices can be commercialized in accordance with international quality standards, such as the European Union (EU) Directive 98/79/EC or Food and Drug Administration (FDA) regulations. This process entails time and significant financial resources that can be compensated by the cost-benefit of these phage proteins which will have a major impact on the implementation of these technologies.
There are already some phage-based diagnostic tools applied for bacterial detection available on the market, such as the MicroPhage KeyPath technology (MicroPhage, Inc, Colorado, USA), which has been approved by the FDA for the diagnosis of bloodstream infections [188]. Another example is the Actiphage (PBD Biotech Ltd, Suffolk, United Kingdom), a novel approach for the detection of Mycobacterium in blood or milk [189].
The commercialization of the Vitek Immunodiagnostic Assay System UP (VIDAS UP, bioMérieux, Craponne, France), which uses recombinant RBPs to detect foodborne pathogens, paves the way for the introduction of phage protein-based diagnostic tools on the market [190].
Overall, the potential of phage-encoded proteins is extraordinary, and research should be continuously performed to find new binding proteins for other relevant bacterial species and to improve the properties of the existing ones. Also, the use of these recognition molecules in integrated point-of-care (POC) and point-of-use (POU) technologies capable of proper sample preparation and bacterial detection on a single device presents itself as a very promising approach for practical application in the clinical, food safety, and environmental fields.