Reconstruction of biofilm images: combining local and global structural parameters

Digitized images can be used for quantitative comparison of biofilms grown under different conditions. Using biofilm image reconstruction, it was previously found that biofilms with a completely different look can have nearly identical structural parameters and that the most commonly utilized global structural parameters were not sufficient to uniquely define these biofilms. Here, additional local and global parameters are introduced to show that these parameters considerably increase the reliability of the image reconstruction process. Assessment using human evaluators indicated that the correct identification rate of the reconstructed images increased from 50% to 72% with the introduction of the new parameters into the reconstruction procedure. An expanded set of parameters especially improved the identification of biofilm structures with internal orientational features and of structures in which colony sizes and spatial locations varied. Hence, the newly introduced structural parameter sets helped to better classify the biofilms by incorporating finer local structural details into the reconstruction process.


Introduction
Biofilms are sessile communities of microorganisms residing in a matrix formed by extracellular polymeric substances (EPS) and biomass (Costerton et al. 1995; O'Toole et al. 2000). These macroscale structures generally develop at phase interfaces and are the dominant mode of life for bacteria (Costerton et al. 1987). The role of biofilms has been studied in industrial processes (Nicolella et al. 2000; Brooks & Flint 2008; Rosche et al. 2009), environmental ecology (Davey & O'Toole 2000; Morris & Monier 2003), and human health (Lindsay & Von Holy 2006; Marsh 2006). The physical structure of a biofilm determines how resident microorganisms interact with their environment, primarily due to mass transport and energy transmission limitations that result in feedback loops between chemo- and photo-clines, community speciation, and phenotype variability (Picioreanu et al. 2000; Wimpenny et al. 2000; Lindemann et al. 2013; Renslow et al. 2013). Therefore, it is critical for researchers to understand and quantify the structure of biofilms, so that they can be controlled and engineered.
There has been a significant amount of research effort dedicated to quantifying biofilm structure, including developing an array of quantifiable parameters that can be calculated using biofilm image analysis software in an automated manner (Heydorn et al. 2000;Beyenal et al. 2004;Daims et al. 2006;Mueller et al. 2006;Milferstedt et al. 2008). Although a large array of structural parameters can be quantified from a biofilm image, there has been a trend in the literature to use only a limited number of parameters (de Carvalho & da Fonseca 2007;Rodriguez & Bishop 2007;Luef et al. 2009). It is generally hypothesized that these commonly used structural parameters can characterize the differences or similarities between biofilm structures, and are capable of uniquely describing biofilm structure. In principle, for a parameter set to uniquely describe biofilm structure, two similar looking biofilm images should have nearly identical parameter values, and two dissimilar looking images should have different parameter values.
Selection of the structural parameters generally depends on the preferences of the researchers rather than having a scientific basis. The hypothesis that a typical parameter set can uniquely identify biofilm structures was recently tested by the authors' group (Renslow et al. 2011) by using biofilm image reconstruction as a tool. The primary goals of the study were: (1) to test the belief that commonly used structural parameters can characterize the differences or similarities between biofilm structures; and (2) to address the general problem of how to select meaningful parameters that can describe biofilm structure. The biofilm image reconstruction approach consists of calculating image parameters from an experimental biofilm image, and then generating random synthetic images that have nearly identical parameter values to those in the chosen parameter set using a computer algorithm (Figure 1). If selected parameters describe biofilms appropriately, then the experimental and reconstructed images should look very similar to each other. The reconstruction process was used to determine the ability of structural parameters to uniquely describe biofilm structures. In the study by Renslow et al. (2011) it was found that two biofilms with identical structural parameter values can look different. The results obtained led to the conclusion that the set of parameters that are typically used in biofilm research may not reveal the essential differences between biofilm structures (Renslow et al. 2011). Therefore, the authors' earlier study established that reliable biofilm structure reconstruction requires a more suitable set of parameters.
The goal of the present work was to introduce new local and global parameters to describe biofilm structure and test them using a generalized image reconstruction algorithm. Towards this goal, new parameters were introduced that characterize (1) the shape and orientation of cell colonies, which can be thought of as 'intra-colony' properties; and (2) the relative placement and relative orientation of colonies, which can be thought of as 'inter-colony' properties. Intra-colony properties are the average major axis length (MJLave), average minor axis length (MNLave), maximum major axis length (MJLmax), maximum minor axis length (MNLmax), median major axis length (MJLmed), and median minor axis length (MNLmed). Inter-colony parameters related to the relative placement of colonies are the average of nearest colony separation distances (ARSmin), the average of median colony separation distances (ARSmed), and the average of size weighted colony separation distances (ARSwgh). Inter-colony parameters associated with the relative orientation of colonies are the average of colony size weighted eccentricities (WEave) and the average of size weighted relative colony orientations (WROave and W2ROave). Lastly, a co-occurrence matrix based description of local biomass distributions (COO) is defined. For categorization purposes, these parameters are classified as global and local parameters. Global parameters are the areal porosity (AP), perimeter (P), average diffusion distance (ADD), maximum diffusion distance (MDD), major axis length (MJL) and minor axis length (MNL). These parameters provide an overall characterization of the biofilms. Local parameters are ARSmin, ARSmed, ARSwgh, WEave, WROave, W2ROave, and the COO matrices. These parameters provide information about the local properties and about the relative positioning of the cell colonies.
The previously developed biofilm image reconstruction algorithm was modified to incorporate the new parameters, to assess whether they were better descriptors of biofilm structures.

Biofilm images
Biofilm images generated by the authors' group and used in their earlier image reconstruction research (Renslow et al. 2011) were used in this study also. Repeated use of the same images, and the treatment of the earlier study as a benchmark to compare against, allowed for a quantitative evaluation of the modified algorithm and the ability of newly added parameters to correctly identify biofilm structures. The preparation of the biofilms and the acquisition of the images have been described in detail elsewhere (Lewandowski & Beyenal 2007). Briefly, the biofilms were composed of a mixed culture of environmental isolates consisting of Pseudomonas aeruginosa (ATCC 700829), Pseudomonas fluorescens (ATCC 700830), and Klebsiella pneumoniae (ATCC 700831). Open channel reactors perfused with aerated defined media were used to grow the biofilms, similar to those described in previous publications (Beyenal & Lewandowski 2000; Lewandowski & Beyenal 2007). To generate a variety of biofilm morphologies, a range of flow velocities (3.2-10 cm s⁻¹), glucose concentrations (50-150 mg l⁻¹), and growth periods (five to eight days) were used.
Figure 1. Image reconstruction is a tool for assessing structural parameters and their ability to uniquely describe biofilm architecture. This figure shows a schematic description of the biofilm image reconstruction process. First, biofilms are grown and then imaged, typically using a microscope. Image analysis is performed on the original biofilm image to quantify structural parameters. Next, the parameter values are used as input in the reconstruction process. Image reconstruction generates reconstructed images with nearly (eg within the optimization accuracy) identical structural parameter values as the original biofilm image. If the reconstructed images are similar to the original image, the selected parameters are known to uniquely describe the biofilm structure.
The biofilms were grown in a flat-plate flow reactor where the bottom of the reactor was made of glass. The reactors were imaged with a Nikon (Nikon Instruments Inc., Japan) inverted microscope, and the images were taken using an Olympus (Olympus, Tokyo, Japan) A20PL 10× objective (0.40, 160/0.17).
Gray level images (640 × 480 pixels) were taken and then reduced to binary images using Otsu's (1979) method with automated threshold detection. The binary images were reduced in size to 100 × 75 pixels for the reconstruction process. As in Renslow et al. (2011), a total of 15 different biofilm images were used. The images were named 'Image A' through 'Image O' in order of decreasing areal porosity (ranging from 0.922 to 0.290).
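The thresholding step can be illustrated with a minimal sketch of Otsu's method, which picks the gray level that maximizes the between-class variance. The function names `otsu_threshold` and `binarize`, the 8-bit gray range, and the convention that brighter pixels are biomass are assumptions made for illustration; they are not taken from the authors' code.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximizes the between-class variance."""
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of gray levels of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(gray, threshold):
    """Assumption: pixels brighter than the threshold are biomass (1)."""
    return [[1 if value > threshold else 0 for value in row] for row in gray]


# Tiny made-up gray-level image with a dark background and bright colony.
gray = [[12, 15, 200],
        [10, 210, 220],
        [14, 11, 205]]
threshold = otsu_threshold([v for row in gray for v in row])
binary = binarize(gray, threshold)
```

For this toy image the threshold falls between the dark and bright groups, so the right-hand pixels are labeled as biomass.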

Description and calculations of structural parameters
The binary biofilm images were treated as consisting of biofilm occupied voxels containing biomass or EPS surrounded by void interstitial spaces. Occupied regions are divided into colonies, where a colony is an isolated biofilm area surrounded by void areas that completely separate it from other colonies (Figure 2). In the literature, colonies are often referred to as cell clusters; however, this should not be confused with data clustering, which is discussed below.
The following commonly used areal parameters were used in the authors' earlier image reconstruction study (Renslow et al. 2011): areal porosity (AP), perimeter (P), average diffusion distance (ADD), maximum diffusion distance (MDD), average horizontal run length (AHRL), average vertical run length (AVRL), and fractal dimension (FD). Four of these structural parameters (AP, P, ADD and MDD) were also used in the present study. As discussed below, AHRL and AVRL were replaced with comparable quantities based on the major and minor lengths of the colonies, which are believed to be more suitable and biologically relevant. The FD parameter was omitted for three reasons: (1) the value added by it to the reconstruction process was limited (Renslow et al. 2011): including FD as a reconstruction parameter introduced unrealistic artifacts, which were a few pixels in size, into the images; (2) its inclusion significantly slowed down the reconstruction process due to its long computation time; and (3) FD quantifies the degree of irregularity in the perimeter of colonies (Beyenal et al. 2004). Newly added local parameters can incorporate such irregularities into the reconstruction process.

Areal parameters
The definition and computation of the areal parameters have been described earlier (Lewandowski & Beyenal 2007; Renslow et al. 2011). Briefly:

Areal porosity (AP)
When the pixels of the image are of the same size, for binary black and white images, AP is simply the ratio of the void pixels to the total number of pixels in the image, and it is a unitless parameter. As in the authors' earlier study, AP is fixed from the onset of the optimization based image reconstructions as equal to the experimental AP value. In other words, the steps of the optimization algorithm do not alter the AP value while the image is optimized against the target structures. A full description of the reconstruction process is given below.

Perimeter (P)
This is the total number of pixels forming the edges of colonies, ie the pixels that define the boundary between the interstitial (ie void, 0) and cell colony filled (1) regions. It has the unit of pixel size $l_p$. Optimization of the reconstructed images proceeds by minimizing the disparity, ie the error, between the reconstructed and experimental images. The error associated with each parameter was defined as the relative fractional error, ie $\varepsilon_X = |X_R - X_E|/X_E$, where $X_R$ and $X_E$ are the values of property $X$ computed during reconstruction and from the experimental image, respectively. The error due to the perimeter difference between the reconstructed and experimental biofilm structures was then $\varepsilon_P = |P_R - P_E|/P_E$.
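A minimal sketch of AP, P, and the relative fractional error on small binary images is given here for concreteness. It assumes that a perimeter pixel is a filled pixel with at least one void pixel among its four edge-sharing neighbors and that pixels outside the image are not counted as void; the source does not state the neighborhood or the border handling, and the function names are hypothetical.

```python
def areal_porosity(img):
    """Ratio of void (0) pixels to all pixels of a binary image."""
    total = sum(len(row) for row in img)
    void = sum(row.count(0) for row in img)
    return void / total


def perimeter(img):
    """Count filled pixels that touch a void pixel (4-neighborhood, assumed)."""
    rows, cols = len(img), len(img[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1:
                continue
            neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if any(0 <= rr < rows and 0 <= cc < cols and img[rr][cc] == 0
                   for rr, cc in neighbours):
                count += 1
    return count


def relative_error(x_rec, x_exp):
    """Relative fractional error |X_R - X_E| / X_E."""
    return abs(x_rec - x_exp) / x_exp


# Made-up 5 x 5 images with the same number of filled pixels (same AP).
target = [[0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
candidate = [[0, 0, 0, 0, 0],
             [1, 1, 1, 1, 1],
             [1, 1, 1, 1, 0],
             [0, 0, 0, 0, 0],
             [0, 0, 0, 0, 0]]

eps_p = relative_error(perimeter(candidate), perimeter(target))
```

Here both images have AP = 0.64, the target has 8 perimeter pixels and the candidate has 9, so the perimeter error is 0.125.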
Average diffusion distance (ADD) and maximum diffusion distance (MDD)
ADD and MDD are the average and maximum values of the diffusion distances of the cell colony pixels, respectively. Diffusion distance (DD) is defined as the minimum distance from a colony pixel to the nearest void pixel in an image. A larger DD indicates a larger distance that a substrate has to diffuse in the cell colony. The ADD is computed as the average DD over all colony pixels, and MDD is the maximum of the DD values of the pixels. The unit of these distances is the pixel size $l_p$. For the associated error, because ADD and MDD provide an estimate of related properties, their errors were combined as $\varepsilon_{DD}^2 = (\varepsilon_{ADD}^2 + \varepsilon_{MDD}^2)/2$.
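The diffusion distance definitions can be sketched with a brute-force search, which is adequate for small images such as the 100 × 75 pixel images used here. The sketch assumes Euclidean distances between pixel centers, which the source does not specify, and the function names are hypothetical.

```python
import math


def diffusion_distances(img):
    """Distance from every filled pixel to its nearest void pixel."""
    void = [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v == 0]
    filled = [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v == 1]
    # Assumption: Euclidean distance between pixel centers.
    return [min(math.hypot(r - vr, c - vc) for vr, vc in void) for r, c in filled]


def add_and_mdd(img):
    """Average and maximum diffusion distance of a binary image."""
    dd = diffusion_distances(img)
    return sum(dd) / len(dd), max(dd)


def combined_dd_error(add_rec, mdd_rec, add_exp, mdd_exp):
    """Square root of (eps_ADD^2 + eps_MDD^2) / 2."""
    eps_add = abs(add_rec - add_exp) / add_exp
    eps_mdd = abs(mdd_rec - mdd_exp) / mdd_exp
    return math.sqrt((eps_add ** 2 + eps_mdd ** 2) / 2)


# A 3 x 3 colony centered in a 5 x 5 image.
block = [[0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0]]
add_value, mdd_value = add_and_mdd(block)
```

For this colony the eight edge pixels have DD = 1 and the central pixel has DD = 2, giving ADD = 10/9 and MDD = 2.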

Average horizontal run length (AHRL) and average vertical run length (AVRL)
AHRL and AVRL are the average run lengths in the horizontal and vertical directions, respectively. A run length is the number of consecutive colony pixels along the given direction in the image. These run lengths provide information about the average continuity of colonies along the two orthogonal Cartesian directions. They are important when the direction of cell cluster growth is studied; for example, if biofilms are grown under high shear stress applied in the vertical direction, AVRL will increase compared to biofilms grown at lower shear stress. Since the goal of this work was the reconstruction of the images rather than quantification of parameter dependence on the environmental conditions, AHRL and AVRL were replaced with the major and minor axis lengths of the colonies (MJL and MNL, respectively, see below). MJL and MNL provide similar continuity and orientation information for colonies but, unlike AHRL and AVRL, their definition is independent of the image orientation and eliminates the need to know the flow direction during biofilm growth. MJL and MNL could therefore be more successful in image reconstruction when images are rotated arbitrarily.
The above set of areal parameters was shown to have limited ability to quantify the differences between experimental and reconstructed images (Renslow et al. 2011). Therefore, in this second-generation version of the biofilm image reconstruction algorithm, additional parameters were included for a more thorough structural description of biofilms. Newly added parameters make use of colony-based properties (Figure 2). In other words, a biofilm image is first analyzed to identify its colonies, whose definition is given above. This step of the algorithm uses the bwconncomp routine of MATLAB (MathWorks, Natick, MA, USA), which identifies the colonies by finding the connected pixels of the binary images.
For example, Figure 2 shows an image that has two colonies where one colony has an elongated elliptical shape and the second has a more rounded shape and the orientations of the two colonies are almost perpendicular to each other. Various properties of the identified colonies are then computed using the regionprops MATLAB routine. These intra-colony properties are supplemented with inter-colony properties to determine the new parameters, as follows.
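The colony identification step can be sketched as a flood fill over connected filled pixels. This is an illustrative stand-in for the connected component search, not the MATLAB routine itself; the function name `label_colonies` is hypothetical, and the default of 8-connectivity (diagonal neighbors count as connected) is an assumption about the setting used in the study.

```python
from collections import deque


def label_colonies(img, connectivity=8):
    """Return a list of colonies, each a list of (row, col) pixel coordinates."""
    rows, cols = len(img), len(img[0])
    if connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    else:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    seen = [[False] * cols for _ in range(rows)]
    colonies = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited filled pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                pr, pc = queue.popleft()
                pixels.append((pr, pc))
                for dr, dc in offsets:
                    nr, nc = pr + dr, pc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and img[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            colonies.append(pixels)
    return colonies


image = [[1, 1, 0, 0, 0],
         [1, 0, 0, 0, 1],
         [0, 0, 0, 1, 0],
         [0, 0, 0, 0, 0]]
colonies_8 = label_colonies(image)          # diagonal contact joins pixels
colonies_4 = label_colonies(image, 4)       # only edge contact joins pixels
```

The choice of connectivity matters: the two diagonal pixels on the right form one colony under 8-connectivity but two colonies under 4-connectivity.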

Axes lengths (MJL and MNL) of colonies
The regionprops routine computes the normalized second central moments of each colony, and reports the size and orientation of the equivalent ellipse that has the same moments (for example the red ellipses in Figure 2). Note that colony axes length determination is not affected by the image orientation, ie by the definitions of Cartesian coordinate directions. Therefore, unlike AHRL and AVRL, MJL and MNL provide measures for colonies independently of the image orientation, as noted above.
The computed axes lengths of the colonies are then statistically analyzed to find the average (MJLave and MNLave), maximum (MJLmax, MNLmax), and median (MJLmed, MNLmed) major and minor axis lengths of the colonies. Since these six parameters are interrelated, as with the diffusion distance parameter, the various colony size information is included in the optimization process using weighted errors, ie $\varepsilon_{ML}^2 = \frac{1}{6}\sum_i \varepsilon_i^2$, where the sum is over the parameters MJLave, MNLave, MJLmax, MNLmax, MJLmed, and MNLmed.
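The equivalent-ellipse calculation can be sketched from the second central moments of the colony pixels. The constants below, including the 1/12 term for the second moment of a unit-square pixel, follow the moment-based definition as commonly documented for region property routines; they are stated here as an assumption rather than as the authors' exact implementation, and the function names are hypothetical.

```python
import math
from statistics import mean, median


def equivalent_ellipse(pixels):
    """Major axis, minor axis, and eccentricity of the moment-matched ellipse."""
    n = len(pixels)
    cy = sum(r for r, _ in pixels) / n
    cx = sum(c for _, c in pixels) / n
    # Normalized second central moments; 1/12 accounts for the pixel area (assumed).
    uxx = sum((c - cx) ** 2 for _, c in pixels) / n + 1 / 12
    uyy = sum((r - cy) ** 2 for r, _ in pixels) / n + 1 / 12
    uxy = sum((c - cx) * (r - cy) for r, c in pixels) / n
    common = math.sqrt((uxx - uyy) ** 2 + 4 * uxy ** 2)
    major = 2 * math.sqrt(2) * math.sqrt(uxx + uyy + common)
    minor = 2 * math.sqrt(2) * math.sqrt(uxx + uyy - common)
    eccentricity = math.sqrt(1 - (minor / major) ** 2)
    return major, minor, eccentricity


def axis_length_summary(lengths):
    """Average, maximum, and median of a list of axis lengths."""
    return mean(lengths), max(lengths), median(lengths)


line = [(0, c) for c in range(5)]                      # 1 x 5 pixel colony
square = [(r, c) for r in range(2) for c in range(2)]  # 2 x 2 pixel colony
major_line, minor_line, ecc_line = equivalent_ellipse(line)
major_sq, minor_sq, ecc_sq = equivalent_ellipse(square)
mjl_ave, mjl_max, mjl_med = axis_length_summary([major_line, major_sq])
```

Because the moments are taken about the colony centroid, rotating the image changes the orientation of the ellipse but not its axis lengths, which is the property that motivates replacing AHRL and AVRL.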
The new parameters introduced in this study were computed using in-house MATLAB functions, as follows.

Parameters associated with relative placement of colonies
How the colonies of a biofilm are placed with respect to each other is an important property of the biofilms. To include relative location information in the structure reconstruction, the centroids of the colonies (eg shown with small circles in Figures 2 and 3) are identified first. The centroid is the center of mass of the colony and it was computed using the regionprops MATLAB routine. The regionprops routine detects the colonies (ie separated regions) in an image and returns a large array of information about colonies, including their centroid position. Using the computed centroid locations of the colonies, $R_{ij}$ is defined as the distance between the centroids of colonies $i$ and $j$ ($j \neq i$) (eg the brown line in Figure 3). Then, for each colony $i$, the nearest colony distance $R_{i,min}$, the median of all colony distances $R_{i,med}$, and the size-weighted (ie mass-weighted) colony distance $R_{i,wd}$ over the other colonies $j$ of the image are determined as: $R_{i,min}$ = minimum of $R_{ij}$; $R_{i,med}$ = median of $R_{ij}$; and $R_{i,wd}$ = mean of $m_j \times R_{ij}$, where $m_j$ is the size of colony $j$, ie $R_{i,wd}$ is the mean of the colony size weighted colony-to-colony distances. Using these quantities for every colony $i$, the following structural parameters are defined: the average nearest colony separation distance (ARSmin), which is the mean of $R_{i,min}$ over all the colonies; the average median colony separation distance (ARSmed), which is the mean of $R_{i,med}$ over all the colonies; and the average weighted colony separation distance (ARSwgh), which is the mean of $R_{i,wd}$ over all the colonies.
The associated colony separation distance based error is then computed as $\varepsilon_{RS}^2 = \frac{1}{3}\sum_i \varepsilon_i^2$, where the sum is over the parameters ARSmin, ARSmed, and ARSwgh.
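The three separation parameters can be sketched directly from the definitions above. The sketch assumes that colony size is the pixel count of the colony and that the weighted distance is the plain mean of $m_j \times R_{ij}$ without further normalization, as the text states; the function name and the example centroids are made up.

```python
import math
from statistics import mean, median


def separation_parameters(centroids, sizes):
    """Return (ARSmin, ARSmed, ARSwgh) for colonies given centroids and sizes."""
    n = len(centroids)
    r_min, r_med, r_wd = [], [], []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        dist = [math.dist(centroids[i], centroids[j]) for j in others]
        r_min.append(min(dist))
        r_med.append(median(dist))
        # Size-weighted distance: mean of m_j * R_ij over the other colonies.
        r_wd.append(mean([sizes[j] * d for j, d in zip(others, dist)]))
    return mean(r_min), mean(r_med), mean(r_wd)


# Three hypothetical colonies on a line, with sizes in pixels.
centroids = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
sizes = [10, 20, 30]
ars_min, ars_med, ars_wgh = separation_parameters(centroids, sizes)
```

With these inputs the pairwise distances are 5, 10, and 5, which gives ARSmin = 5, ARSmed = 20/3, and ARSwgh = 400/3. At least two colonies are needed for the distances to be defined.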

Parameters associated with relative orientation of colonies
Fluid flow in the spaces adjoining biofilms gives the biofilms distinctive characteristics by modifying the growth and relative orientation of the colonies. To account for relative orientation effects, eccentricity (ie elongation of colonies), distribution of the colony orientation with respect to a fixed axis (ie the overall major orientation of the structure), and the orientational correlation of neighboring colonies (ie local information about relative colony orientations) are included in the reconstruction algorithm, as follows.
Average colony size weighted eccentricity (WEave)
The eccentricities $E_i$ of the colonies were computed using the regionprops MATLAB routine. In addition to the above-mentioned centroid location, the regionprops routine computes a vast array of information about each colony, such as the orientation of the colony (ie placement of its principal axis based on the mass moment distribution of the colony). Eccentricity is the ratio of the distance between the foci to the major axis length of the equivalent ellipse with the same second moments. For an ellipse with major axis length $a$ and minor axis length $b$, eccentricity can be expressed as $(1 - (b/a)^2)^{1/2}$. Eccentricity provides a measure of the internal elongation of the colonies and it ranges from 0 for circles to 1 for line segments. To avoid the small roundish clusters dominating the optimization, the average of the colony size weighted eccentricities $m_i \times E_i$ was utilized to characterize the eccentricity associated error.
Average size weighted relative colony orientations (WROave and W2ROave)
Another measure of colony placement is the distribution of their orientations. The orientation $O_i$ of each colony was computed using the regionprops MATLAB routine. This is the angle between the major axis of the ellipse and a reference axis (indicated by the yellow lines in Figure 3). The reference axis was chosen as the x-axis in the reported simulations. The relative colony orientations were included in the optimization using two different parameters: a size weighted and a double size weighted relative orientation measure. The smaller (< 90°) of the two angles formed by the crossing orientation vectors of colonies was selected as the relative angle $(O_i - O_j)$ in the analysis. Note that, since the error criteria include the difference in the angles only, positioning of the reference axis is irrelevant in the analysis. The optimization error is characterized using the parameters WROave and W2ROave, which are average values of the size and double size weighted relative orientation measures over all the colonies, respectively.
The associated relative colony orientation based error is then computed as $\varepsilon_{ORI}^2 = \frac{1}{3}\sum_i \varepsilon_i^2$, where the sum is over the parameters WEave, WROave, and W2ROave.
Figure 3. This schematic diagram illustrates how relative placement and orientation of colonies are defined. Colony orientations are defined by the angle between the major principal axis of the ellipse and a reference axis, which was chosen as the horizontal axis in this study. Inter-colony distance is defined as the distance between the centroids of the ellipses (indicated with brown circles).
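The relative angle between two colony orientations can be sketched as follows. The sketch covers only the choice of the smaller crossing angle; the size weighting of the relative orientation measures is not reproduced here. It assumes orientations are given in degrees, and the function name is hypothetical.

```python
def relative_angle(o_i, o_j):
    """Smaller of the two angles (in degrees) between two undirected axes."""
    difference = abs(o_i - o_j) % 180.0
    return 180.0 - difference if difference > 90.0 else difference


# Two colonies tilted at +80 and -80 degrees cross at 20 degrees, not 160.
angle = relative_angle(80.0, -80.0)

# Shifting both orientations by the same amount (ie moving the reference
# axis) leaves the relative angle unchanged.
shifted = relative_angle(80.0 + 37.0, -80.0 + 37.0)
```

The second call illustrates why the position of the reference axis does not matter: only the difference of the two orientations enters the calculation.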

Co-occurrence matrix based description of local biomass distributions
These new parameters provide information about the local biofilm structure. Analogous to the pair radial distribution function description of condensed matter that defines the probability of observing two objects at a particular distance from each other (Friedman 1985; Raineri et al. 1992; Resat et al. 1995), the co-occurrence matrix description of images specifies the pairwise intensity relationship between a pixel of interest (central pixel) and a pixel at an [m n] offset from it. This local structural information can be a valuable predictor in the biofilm reconstruction process. MATLAB's graycomatrix function computes the co-occurrence matrices for the image by treating each pixel in turn as the central pixel and reports a count of pairwise intensity distributions at a given offset (Figure 4).
Since the analysis used binary images, the co-occurrence matrix is a two-by-two matrix with four entries, one for each combination of the states (void or filled) of the central pixel and of the pixel at the given offset. It should be noted that co-occurrence matrices computed using the graycomatrix MATLAB routine were earlier used in a different context (Beyenal et al. 2004), to obtain a textural parameter related to gray-level changes between adjacent pixels.
Because single pixel based co-occurrence statistics can be noisy and may have considerable fluctuations during the reconstruction process, instead of a single-pixel based approach, the authors preferred to pursue the co-occurrence computation on a radial region basis (Friedman 1985; Raineri et al. 1992; Resat et al. 1995). This was done by defining concentric circles around a central pixel and computing the co-occurrence statistics by counting the pixel states over the pixels that belong to five 'circular' regions (Figure 5). These regions were defined as: (a) $1 \le m^2 + n^2 \le 2$; (b) $4 \le m^2 + n^2 \le 5$; (c) $8 \le m^2 + n^2 \le 13$; (d) $16 \le m^2 + n^2 \le 20$; and (e) $25 \le m^2 + n^2 \le 29$. Thus, this analysis returns five two-by-two $G_{COO}$ matrices, one for each circular region. To compute the associated error, first, the error matrices with elements $\varepsilon_{ij} = |G_{COO,R,ij} - G_{COO,E,ij}|/G_{COO,E,ij}$ are constructed for each of the five matrices. The overall error is then computed as $\varepsilon_{COO}^2 = \sum \varepsilon_{ij}^2$, where the sum runs over the matrix elements of the five error matrices.
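The radial regions and the associated co-occurrence counts can be sketched as below. The region bounds are those listed above; the indexing of the two-by-two matrix as [central state][neighbor state] and the skipping of neighbors that fall outside the image are assumptions, since the source does not describe the border handling. The function names are hypothetical.

```python
import math

# Bounds on m^2 + n^2 for the five radial regions.
REGION_BOUNDS = {"A": (1, 2), "B": (4, 5), "C": (8, 13), "D": (16, 20), "E": (25, 29)}


def region_offsets(lo, hi):
    """All integer offsets (m, n) with lo <= m^2 + n^2 <= hi."""
    reach = math.isqrt(hi)
    return [(m, n)
            for m in range(-reach, reach + 1)
            for n in range(-reach, reach + 1)
            if lo <= m * m + n * n <= hi]


def radial_cooccurrence(img, lo, hi):
    """2 x 2 count matrix indexed as [central pixel state][neighbor state]."""
    rows, cols = len(img), len(img[0])
    offsets = region_offsets(lo, hi)
    counts = [[0, 0], [0, 0]]
    for r in range(rows):
        for c in range(cols):
            for m, n in offsets:
                rr, cc = r + m, c + n
                # Assumption: neighbors outside the image are ignored.
                if 0 <= rr < rows and 0 <= cc < cols:
                    counts[img[r][c]][img[rr][cc]] += 1
    return counts


region_sizes = {name: len(region_offsets(lo, hi))
                for name, (lo, hi) in REGION_BOUNDS.items()}

dot = [[0, 0, 0],
       [0, 1, 0],
       [0, 0, 0]]
g_a = radial_cooccurrence(dot, *REGION_BOUNDS["A"])
```

The offset counts come out as 8, 12, 24, 24, and 28 pixels for regions A to E, in agreement with the region sizes reported in the caption of Figure 5.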

Global parameters
The AP, P, ADD, MDD, MJL and MNL parameters are collectively named as the set of global structure parameters because they provide an overall characterization of the biofilms.

Local parameters
The parameters that provide information about the local properties and about the relative positioning of the colonies (ie local structures), ARSmin, ARSmed, ARSwgh, WEave, WROave, W2ROave, and the COO matrices, are collectively named the local parameters.

Biofilm image reconstruction
The structural parameters quantified from the experimental image are selected to be used in the optimization error score. Then: (1) a random image that has the same AP as the target is constructed; (2) parameters are calculated for the reconstructed image; and (3) the total error, $\varepsilon_T$, is calculated using Equation (1), which combines three error terms; these errors have been discussed above.
(4) One of the four possible moves for optimization is then selected: (a) one filled pixel and one void pixel are randomly selected and swapped; (b) N filled pixels and N void pixels are randomly selected and swapped, where N, the number of flipped pixels, is randomly selected from a uniform distribution in the [0:M] range with $M = 2.5/(AP \times (1 - AP))$; (c) N filled pixels and N void pixels that are on the perimeter are randomly selected and swapped, where N is randomly selected from a uniform distribution in the [0:5] range; (d) a colony is randomly selected and moved by one pixel in a chosen direction, where the direction of the move is randomly picked from the four possible ± x and ± y directions.
(5) The properties of the resulting image are computed. (6) The total error, ε T , is calculated using Equation (1). (7) If the total error increases after a change, the change is rejected. If it decreases, the reconfigured image is accepted. The method is repeated from step (4) unless the number of simulation steps exceeds the simulation run length.
(8) Optimization simulations are stopped after 750,000 steps. The error during the optimization decreased steeply at the beginning of the simulations (typically for about 150,000 steps). The error then hardly changed afterwards, and the acceptance rates of the attempted changes were extremely low. For example, in the simulation for Image K, ε T dropped from 10.14 to 0.8828 after 100,000 steps and to 0.8274 after 250,000 steps. The total error was 0.8227 after 750,000 steps, ie a decrease of only 0.6% during the last 500,000 steps of the simulation. The trend in the error decrease during the simulation was similar during the reconstruction of the other images. Therefore, the chosen optimization run lengths were very conservative and converged solutions were obtained.
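The accept/reject loop of steps (1) to (8) can be sketched in a strongly reduced form. The sketch uses only move (a), the single pixel swap, and only the perimeter error in place of the total error $\varepsilon_T$; it also accepts moves that leave the error unchanged, which the text does not specify. All function names, the step count, and the test image are illustrative assumptions.

```python
import random


def perimeter(img):
    """Count filled pixels with at least one void 4-neighbor (assumed definition)."""
    rows, cols = len(img), len(img[0])
    steps = ((-1, 0), (1, 0), (0, -1), (0, 1))
    return sum(
        1
        for r in range(rows) for c in range(cols)
        if img[r][c] == 1 and any(
            0 <= r + dr < rows and 0 <= c + dc < cols and img[r + dr][c + dc] == 0
            for dr, dc in steps)
    )


def reconstruct(target, n_steps=500, seed=0):
    """Greedy pixel-swap optimization against the target perimeter."""
    rng = random.Random(seed)
    rows, cols = len(target), len(target[0])
    n_filled = sum(map(sum, target))
    p_target = perimeter(target)

    # Step 1: random start image with the same number of filled pixels (same AP).
    cells = [1] * n_filled + [0] * (rows * cols - n_filled)
    rng.shuffle(cells)
    img = [cells[r * cols:(r + 1) * cols] for r in range(rows)]

    error = abs(perimeter(img) - p_target) / p_target
    initial_error = error
    for _ in range(n_steps):
        filled = [(r, c) for r in range(rows) for c in range(cols) if img[r][c] == 1]
        void = [(r, c) for r in range(rows) for c in range(cols) if img[r][c] == 0]
        (fr, fc), (vr, vc) = rng.choice(filled), rng.choice(void)
        img[fr][fc], img[vr][vc] = 0, 1            # move (a): swap one pair
        new_error = abs(perimeter(img) - p_target) / p_target
        if new_error > error:
            img[fr][fc], img[vr][vc] = 1, 0        # reject: undo the swap
        else:
            error = new_error                      # accept (ties accepted, assumed)
    return img, initial_error, error


# Target: a 4 x 4 colony in an 8 x 8 image.
target = [[1 if 2 <= r <= 5 and 2 <= c <= 5 else 0 for c in range(8)] for r in range(8)]
image, e_start, e_end = reconstruct(target)
```

Because every move swaps one filled and one void pixel, the areal porosity is conserved throughout, and because increasing moves are rejected, the error can never rise above its starting value. Matching a single parameter such as the perimeter leaves the image far from unique, which is the motivation for the expanded parameter set.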
For every image in the library of 15 biofilm images studied (Renslow et al. 2011), optimization simulations were run three times starting with random images. The evaluation analysis described in the Results section used the image with the best optimization score among the three reconstructed images for every experimental target image.
It should be noted that the methodology presented here omits the material details of the biofilm, such as the ratio of the cell biomass to the EPS content of the biofilm. In its current version, the authors' algorithm can only handle binary representation of biofilms, ie whether or not a biofilm exists, but not their material content. Material content information is starting to become available and can in principle be incorporated into the extended version of the algorithm. But this will require modification of the reconstruction algorithm from binary to multi-objective form to handle the biofilm contents separately, ie to reconstruct the images using both the cellular biomass and EPS information as multiple object types in the images.
Figure 5. To avoid noisy statistics, co-occurrence analysis was performed by grouping the pixels to define radial regions as shown in the figure. Numbers in the pixels report their $r^2 = m^2 + n^2$ from the central pixel. Five radial regions were defined as: (A) $1 \le r^2 \le 2$ corresponding to an 8-pixel size region; (B) $4 \le r^2 \le 5$ corresponding to a 12-pixel size region; (C) $8 \le r^2 \le 13$ corresponding to a 24-pixel size region; (D) $16 \le r^2 \le 20$ corresponding to a 24-pixel size region; and (E) $25 \le r^2 \le 29$ corresponding to a 28-pixel size region. For the image in Figure 4, consider region C as an example: if the central pixel contains biomass (ie is white) then two of the pixels in region C (…). The overall radial co-occurrence matrices are computed by assuming that each pixel of the image is the 'central pixel' one at a time, finding the matrix entries for that instance, and then adding up the data obtained for each pixel to obtain the statistical distribution for the whole image.
Figure 6. Algorithm for reconstructing biofilm images using a selected set of image parameters. Biofilm structure quantification steps are shown outside the large box and image reconstruction steps are shown inside the large box, starting at step 1.

Assessment of reconstructed biofilm images
Because the same set of images (labeled Images A to O; Figure 7) was used as in a previous work (Renslow et al. 2011), the success of the expanded set of parameters could be demonstrated more effectively. To assess the quality of the reconstructions, the images were subjected to evaluation by a group of senior undergraduate students who had previously been exposed to biofilm topics in their studies. Thirty-eight students participated in the evaluation survey. Two sheets of images were prepared (Supplementary Figures S1 and S2) [Supplementary information is available via a multimedia link on the article online webpage]. The first page contained the 15 experimental images as original targets (Figure S1). The second page contained 30 reconstructed images, numbered from 1 to 30 (Figure S2). Half of these reconstructed images were obtained and communicated in the authors' earlier study (Renslow et al. 2011), and the remaining 15 were constructed using the algorithm reported here (Figure 7). The earlier study used only the global parameters for reconstruction, and these images are labeled 'G-PS', while this study used a combined set of global and local parameters, and these are labeled 'GL-PS'. These images were placed in random order on the page that was distributed to the evaluators (Figure S2). The students were tasked to pick the best matching reconstructed image for every experimental biofilm image. Since, in principle, there were two matches for every target image (one each for the previous and this study), students were allowed to pick more than one match. If more than one match was picked, they were asked to provide goodness-of-fit scores of the matched images for a total score of 100% among their selections.
Figure 7. Left column: experimental images used as target in the image reconstruction analysis. The other two columns report the reconstructed images that were obtained using the GL-PS (center column) and the G-PS (right column) parameters.
The students were instructed that picking one match for each target image is desirable but they could choose two or more matches if necessary. They were told to limit their selections to not more than three matches. However, some of them selected more and these selections were still included in the analysis.

Clustering-based analysis of the correlations in image selections
Some of the images in the selection panel are very similar and can easily be mistaken for each other. To further test whether the evaluations were sensible, the relationships between the biofilm image selections made by the evaluators were investigated with a cluster analysis based on the correlations in the image selection patterns among evaluators. The cluster analysis grouped the 15 experimental images based on their selection patterns among the 30 reconstructed images. The analysis was performed using MATLAB's hierarchical clustering routine, where the average distance criterion was used to assign the clusters and their connection values. For each image (A to O), a selection vector with 30 entries (one per reconstructed image) was constructed based on the student matches. In the cluster analysis, 1 − CORR(X,Y) values were used as the distance measure, where CORR(X,Y) is the correlation between the selection patterns of images X and Y. For example, if the matches for images X and Y are selected mostly from the same group of reconstructed images, ie if the two images were mistaken for each other, their correlation would be high and their cluster analysis distance would be small. If the matches are picked from different groups, the correlation would be low and the distance would be larger, depending on the dissimilarity of the selections.
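The clustering described above can be illustrated with a minimal sketch. It is written in plain Python rather than MATLAB, and it is not the authors' code: the function names `pearson` and `average_linkage` are hypothetical, and the four six-entry selection vectors are made-up toy data standing in for the 15 vectors of 30 entries used in the study. The sketch assumes that no selection vector is constant, because the correlation is undefined in that case.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length selection vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # fails for a constant vector (zero variance)


def average_linkage(vectors):
    """Agglomerative clustering with 1 - CORR distance and average linkage.

    vectors maps an image label to its selection vector.
    Returns the merges in order as (members_a, members_b, distance).
    """
    # Pairwise distance between individual images: 1 - correlation.
    dist = {frozenset(pair): 1.0 - pearson(vectors[pair[0]], vectors[pair[1]])
            for pair in combinations(vectors, 2)}

    def avg(a, b):
        # Average linkage: mean distance over all cross-cluster image pairs.
        return sum(dist[frozenset((i, j))] for i in a for j in b) / (len(a) * len(b))

    clusters = [(name,) for name in vectors]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((a, b, avg(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical toy data: how often each of six reconstructions was picked
# for each of four target images.
selections = {
    "C": [5, 4, 0, 0, 1, 0],
    "F": [4, 5, 0, 0, 0, 1],
    "G": [0, 0, 6, 3, 0, 1],
    "J": [0, 1, 3, 6, 0, 0],
}
merges = average_linkage(selections)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

In this toy example, images whose matches were drawn from the same reconstructions are joined first at a small distance, while the two resulting groups are joined last at a distance above 1 because their selection patterns are negatively correlated.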

Comparing experimental and reconstructed images
The collected match scores were analyzed in several ways. First, how individual students evaluated the reconstructed images was analyzed by assuming that both reconstructions, ie the images reconstructed using only the earlier global parameter set (G-PS) and with the expanded set of global and local parameters (GL-PS), were correct selections. For the 15 target images, the correct selections of the 38 students ranged from 26.7% to 94.7% with an average of 67.0 ± 16.8%. In other words, in roughly two out of three cases, students correctly identified the image pairs. This fairly high ratio indicates that the evaluation by the students was reasonable.

Identification scores of the biofilm images
Second, the evaluation of each individual image was analyzed to compute its correct identification percentage. Table 1 reports the results sorted by correct identification score. The results in Table 1 indicate the percentage of times an image (A to O) was correctly identified from the set of reconstructed pictures (30 total reconstructed images with two possible correct selections, one each for reconstructions with the G-PS and GL-PS parameter sets). In other words, it tabulates how well the experimental images were matched to the reconstructions. Interestingly, there is no obvious trend in terms of which types of biofilms are better identified when reconstructed. For example, images with the best match scores had either high or low AP values and their textures were both fuzzy and sharp. Table 1 also tabulates how the correctly identified biofilm images were partitioned between the reconstructions with the G-PS and GL-PS parameter sets. The image reconstructed with GL-PS was selected over the corresponding reconstruction with G-PS for 10 of the 15 images, and the results for image A were comparable. The images reconstructed with G-PS were identified at a higher rate in only four cases (D, G, I, and J). Interestingly, the reconstructed images for D, G, I, and J using the G-PS parameter set were found to match the experimental images best in the authors' previous study, where a different type of evaluation, a rank ordering survey, was used for scoring (Renslow et al. 2011). The only obvious common characteristic of these biofilm structures is that they contained rounded and isolated colonies that were rather uniformly distributed (Figure 7).

Identification accuracy of the reconstructed biofilm images
The results in Table 1 report the goodness of the match of the experimental images to their reconstructions. Another way of analyzing the evaluation data is to compute the statistics of how each reconstruction scored (Table 2). For each of the 30 reconstructed images, Table 2 reports the statistics about whether the selection was correct or incorrect when the reconstructed image was selected as a match to one of the experimental (ie target) images. Note that the reconstructed images may not be picked during the evaluations. Therefore, the data tabulated in Table 2 provide information about the goodness of selection for a particular reconstruction, ie the accuracy of its selection when picked as a match. This analysis indicated that the correct identification rate of the reconstructed images increased from 50% to 72% when the additional local parameters were used in the reconstruction. This is a substantial increase in the identification rate.
Most notably, and not surprisingly, addition of the local parameters into the reconstruction process increased the identification accuracy of the biofilm structures with internal orientational features (K, E, H, C, and F), and for structures in which colony sizes and spatial locations were varying (O, N, B, and L). For these structures, the selection accuracy of the GL-PS reconstructions increased by > 17% over the G-PS reconstructions, and the improvement was > 33% for the majority of cases (Table 2). A small increase/decrease for the image M and A cases was due to their already very high identification rates.
Table 1 notes. a For an image X, this column reports the ratio (X_corr,GLPS + X_corr,GPS)/X_all, where X_corr,GLPS and X_corr,GPS are the number of times the reconstructed image for X generated using the previous (X_corr,GPS) or expanded (X_corr,GLPS) parameter sets was correctly identified by the student evaluators. X_all is the total number of times X was selected (correctly or incorrectly). b For an image X, these columns report the percentage of cases that the image correctly identifying X was reconstructed using the expanded (GL-PS), X_corr,GLPS/(X_corr,GLPS + X_corr,GPS), or only the global (G-PS), X_corr,GPS/(X_corr,GLPS + X_corr,GPS), parameter set, respectively. c Average over the 15 images.
Table 2 notes. a When reconstructed images were selected as a match for image X, these columns report if the match correctly (N_X,corr) or incorrectly (N_X,incorr) identified the matching image. GL-PS and G-PS columns report the evaluations of the reconstructions with the combined (global + local) parameters and only the global parameters, respectively. Note that selection counts are not integers because the human evaluators were allowed to make multiple selections during matching, which resulted in fractional counts. b These columns report the ratio N_X,corr/(N_X,corr + N_X,incorr) as a percentage. c This column reports the fractional improvement of the GL-PS case over the G-PS case, ie correct ratio(GL-PS) − correct ratio(G-PS). d Arithmetic averages over the 15 images.
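The selection accuracy defined in the table notes, N_X,corr/(N_X,corr + N_X,incorr), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes that each selection contributes a fractional count equal to its goodness-of-fit score divided by 100, and the function name `selection_accuracy`, the data layout, and the toy responses are all hypothetical.

```python
from collections import defaultdict


def selection_accuracy(responses, truth):
    """Return {reconstruction: (n_corr, n_incorr, accuracy)}.

    responses: list of (target, {reconstruction: score_percent}) pairs, one per
               evaluator and target image; the scores in one pair sum to 100.
    truth:     {reconstruction: target image it was actually built from}.
    """
    n_corr = defaultdict(float)
    n_incorr = defaultdict(float)
    for target, picks in responses:
        for recon, score in picks.items():
            # Assumed weighting: a 60/40 split counts as 0.6 and 0.4 selections.
            weight = score / 100.0
            if truth[recon] == target:
                n_corr[recon] += weight
            else:
                n_incorr[recon] += weight
    result = {}
    for recon in truth:
        total = n_corr[recon] + n_incorr[recon]
        # A reconstruction that was never picked has no defined accuracy.
        accuracy = n_corr[recon] / total if total else None
        result[recon] = (n_corr[recon], n_incorr[recon], accuracy)
    return result


# Hypothetical toy data: reconstructions 1 and 2 were built from image A,
# reconstructions 3 and 4 from image B.
truth = {1: "A", 2: "A", 3: "B", 4: "B"}
responses = [
    ("A", {1: 100}),
    ("A", {1: 60, 2: 40}),
    ("B", {3: 50, 1: 50}),  # reconstruction 1 mistaken for a match to B
    ("B", {3: 100}),
]
stats = selection_accuracy(responses, truth)
for recon, (corr, incorr, acc) in stats.items():
    print(recon, corr, incorr, acc)
```

With these toy responses, reconstruction 1 accumulates 1.6 correct and 0.5 incorrect fractional counts, an accuracy of about 76%, while reconstruction 4 is never picked and therefore has no accuracy value. This mirrors the point made in the text that the statistic describes a reconstruction only when it is picked as a match.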
To investigate which of the added local variables could be causing the expanded parameter set to perform worse for some of the images (that is, images D, G, I, and J), the error terms associated with the structural image parameters were examined (Table 3). Interestingly, there is no obvious trend in the error terms that points to why the reconstructions with the expanded parameter set reduced the identification accuracy for those images.

Correlations in image selection patterns
Clustering-based analysis was used to investigate the relationships between the biofilm image selections by the evaluators. Using the image evaluation data, the image selections were clustered to group the 15 experimental images based on their selection patterns among the 30 reconstructed images (cf Methods section). This analysis indicated that four pairs of images were often mistaken for each other (Figure 8): images C and F; I and L; G and J; and H and E. All these pairs visually resemble each other (Figure 7), and therefore this finding shows that the evaluation procedure was realistic. Removing the selections where image G was selected as image J improved the accurate selection rate of its reconstruction with the GL-PS parameters from 49.7% (Table 2) to 68.1%. Similarly, removing the entries for image L increased the accurate selection rate of image I from 41.9% to 61.9%. This observation partially explains the poorer performance of the reconstruction with the expanded parameter set for these images.

Conclusions
One critical aspect of biofilm studies is the identification of key structural features that allow for establishing similarity measures between biofilms grown under varied growth conditions. An ability to quantitatively determine the key structural properties would enable researchers to determine which of the observed structural differences lead to functional consequences. This study presents the second-generation version of the authors' biofilm image reconstruction algorithm and introduces new parameters for a more detailed description of the local structural properties of biofilms. The new parameters allow for the biofilm structure analysis to be pursued at the local level using colony-based properties. The experimentally obtained biofilm images were reconstructed using these new parameters together with the previously introduced parameters, and the reconstructed images were then evaluated.
Table 3 notes. Error terms are: ε_G, error for global variables; ε_R, error for relative positioning of colonies; ε_L, error for local distribution; and the total error ε_T = ε_G + ε_R + ε_L. Further details on the terms can be found in the Methods/Biofilm image reconstruction section.
Figure 8. Cluster analysis of how the images were matched during image evaluations. The figure shows how the matching of the biofilm images to the reconstructed images correlates between the evaluator selections. The horizontal axis shows the (1 − CORR) 'distance' where CORR is the correlation between the selection patterns of a pair of images. If two images were matched to the same group of reconstructed images, their CORR would be high, so the distance would be low and this would indicate that the images are visually similar and can be confused for each other.
Based on the results obtained, it is concluded that: (1) Supplementing the previous structural parameters with additional parameters considerably increased the reliability of the image reconstruction process. (2) Assessment of the reconstructed images using human evaluators indicated that the correct identification rate of the reconstructed images increased from 50% to 72% with the introduction of the new parameters into the reconstruction procedure. This is a substantial increase in the identification rate. (3) An expanded set of parameters especially improved the identification of biofilm structures with internal orientational features and of structures in which colony sizes and spatial locations varied; the improvement in the identification rate was > 33% for the majority of such cases.
Hence, by incorporating finer local structural details into the reconstruction process, the newly introduced structural parameter set would allow biofilm researchers to classify biofilms better and to determine which structural properties are critical and relate to the underlying biofilm processes.
In principle, the degrees of freedom in the characterization of physical systems can have a major impact on the quality of the description of a system. Therefore, one unknown aspect of the results presented is whether the observed increase of 22 percentage points (from 50% to 72%) in the correct identification rate of the reconstructed images is due to the improved descriptive ability of the extended structural parameter set or is simply due to the increase in the number of parameters. Information-theory-based formalisms, such as the Akaike information criterion (AIC) (Akaike 1974), can be useful for objective model comparison (Shankaran et al. 2012). AIC accounts for differences in the number of parameters when examining whether an increase in the number of variable parameters results in a sufficient increase in prediction accuracy to warrant selection of the more complex model (Shankaran et al. 2012). Such evaluations, however, were not pursued here because the authors wanted to compare the selection of the reconstructed images with their older set, for which evaluation data were already at hand. Since the parameter sets used in this and the earlier study overlapped only partially, an information-theory-based criterion would not be appropriate to quantify the value added by the new parameters. Therefore, parameter evaluation studies will be the subject of a future publication in which the contributions of all the incorporated parameters will be quantified, to eliminate those that contribute limited information and to add new parameters that provide more useful local structural information on biofilms. Nevertheless, the presented results show that the newly introduced local structure parameters particularly improved the identification accuracy of the biofilms with obvious local structural features (such as irregularly shaped clusters and relative orientations), which clearly points to their usefulness in describing biofilm structures.