A Very Large Cardiac Channel Data Database (VLCD) Used to Evaluate Global Image Coherence (GIC) as an In Vivo Image Quality Metric

Ultrasound image quality is of utmost importance for a clinician to reach a correct diagnosis. Conventionally, image quality is evaluated using metrics to determine the contrast and resolution. These metrics require localization of specific regions and targets in the image such as a region of interest (ROI), a background region, and/or a point scatterer. Such objects can all be difficult to identify in in vivo images, especially for automatic evaluation of image quality in large amounts of data. Using a matrix array probe, we have recorded a Very Large cardiac Channel data Database (VLCD) to evaluate coherence as an in vivo image quality metric. The VLCD consists of 33 280 individual image frames from 538 recordings of 106 patients. We also introduce a global image coherence (GIC), an in vivo image quality metric that does not require any identified ROI since it is defined as an average coherence value calculated from all the data pixels used to form the image, below a preselected range. The GIC is shown to be a quantitative metric for in vivo image quality when applied to the VLCD. We demonstrate, on a subset of the dataset, that the GIC correlates well with the conventional metrics contrast ratio (CR) and the generalized contrast-to-noise ratio (gCNR) with $R$ = 0.74 ($p < 0.005$) and $R$ = 0.62 ($p < 0.005$), respectively. There exist multiple methods to estimate the coherence of the received signal across the ultrasound array. We further show that all coherence measures investigated in this study are highly correlated ($R > 0.9$ and $p < 0.001$) when applied to the VLCD. Thus, even though there are differences in the implementation of coherence measures, all quantify the similarity of the signal across the array and can be averaged into a GIC to evaluate image quality automatically and quantitatively.

I. INTRODUCTION
Varying image quality in ultrasound cardiac imaging results in an undesirable lottery for the clinician performing the scan. Image quality varies from patient to patient, from view to view, and between clinicians. Slight adaptations to the probe's position in relation to the ribs can even change image quality from frame to frame. Despite its variability, image quality is of utmost importance for clinicians to reach the correct diagnosis [1]. In echocardiography, image quality is degraded by the common sources of noise in ultrasound imaging: phase aberration, reverberation clutter, off-axis scattering, and thermal noise. A quantitative metric of ultrasound image quality could assist the clinician performing the scan and allow an automatic selection of the recording with the highest image quality, possibly improving diagnostics. Another important aspect is that such a metric may indicate how reliable the measurements made in a certain image are. This is useful both while recording and measuring and when reviewing a list of measurements made on a recording. Finally, a quantitative metric can be used to optimize imaging parameters and beamforming methods.
The flexibility of software beamforming has introduced a myriad of adaptive beamforming methods presented in the literature [2], [3], [4], [5], [6], [7]. Adaptive beamforming methods aim at improving image quality by adapting the processing based on the received signals. A second aspect of software beamforming is the ability to easily collect raw channel data, allowing processing and further analysis of data offline. This results in unprecedented flexibility for researchers to prototype new beamforming methods on relevant clinical in vivo data. We have recorded in vivo cardiac channel data on a GE Vingmed Ultrasound Vivid E95 ultrasound system using the 4Vc-D matrix array probe (GE Vingmed Ultrasound AS, Horten, Norway) in a Very Large cardiac Channel data Database (VLCD) consisting of 33 280 individual image frames from 538 recordings of 106 patients.
One category of adaptive beamformers is often denoted coherence beamformers since they in various ways utilize the coherence, or similarity, of the received signals. Several ways to measure the coherence have been presented in the literature. We will briefly review some of the most used coherence measures in Section II. Some of these calculate spatial coherence as the similarity of the delayed signals across array elements as a function of element separation or lag. One of these spatial coherence measures, the lag-one coherence (LOC), was suggested as a metric for ultrasound image quality in [8]. Long et al. [8] argued that LOC and coherence in general are sensitive to all the major forms of ultrasonic noise. Coherence is reduced by focusing errors, phase aberrations, and off-axis scattering since the delayed signals across the aperture in these cases are decorrelated [9], [10]. Thermal noise lowers the coherence since uncorrelated noise across the delayed array element signals introduces a delta function in the spatial coherence that scales with amplitude based on the relative noise power [11]. In addition, reverberation clutter has been shown to have a similar effect on the coherence as thermal noise [12].
Conventionally, image quality of ultrasound images is evaluated by measuring contrast and resolution. The most widely used contrast metrics are the contrast ratio (CR) [13], contrast-to-noise ratio (CNR) [14], and more recently the generalized CNR (gCNR) [15]. Even though both CR and CNR are shown to correlate strongly with assessments by human observers, they can both be manipulated by alterations to the dynamic range of images [16]. While the gCNR is immune to dynamic range alterations [15], all three of these contrast metrics have a major drawback for in vivo usability, namely, the need to identify two regions: a region of interest (ROI), usually tissue, and a background, typically consisting of noise. Such regions can be hard to identify in in vivo images and often require manual interaction. The LOC can be used to measure in vivo image quality since it is a single ROI measurement. However, as described in [8], an ROI still needs to be identified.
We will further elaborate on the results from [8] and introduce and demonstrate a metric we denote global image coherence (GIC). GIC does not require any identified or segmented ROI since it is defined as an average coherence value calculated from all the data pixels used to form the image, below a preselected range. We will also study the similarity between various available coherence measures. More formally, the vast amount of recorded channel data in the VLCD allows us to empirically test the following hypotheses.
1) Published coherence measures are strongly correlated.
2) GIC can be used as a quantitative metric for in vivo image quality.

This article also builds upon our recent study [17] where we used a variant of the GIC to estimate image quality improvements resulting from an aberration correction algorithm. This was a clinical study where four clinicians evaluated cardiac cine loops with and without aberration correction in a blinded and left-right randomized side-by-side setup. However, in that study, GIC was used as a metric on the same channel data, with aberration correction processing as the only difference between the images. In this article, we aim to use GIC as a general evaluation of image quality comparing images from different recordings, different patients, and different views.
This article is organized as follows. Section II briefly reviews some of the most used coherence measures introduced in the literature, which we further investigate and compare. Section III describes the VLCD, and the details of the implemented beamforming, automated cardiac view classification, and statistical analysis used in this article. Here, we also define the GIC mathematically. Section IV presents the results of the various coherence measures applied to the entire database, as well as using GIC to evaluate the image quality. The results are discussed in Section V with concluding remarks in Section VI.

Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

II. BACKGROUND AND THEORY
Let us assume a 2-D matrix array of $M$ elements in the azimuth direction (along the $x$-axis) and $N$ elements in the elevation direction (along the $y$-axis) in a Cartesian coordinate system. In order to acquire 2-D images in the $xz$ plane, we transmit focused beams steered in different directions in this plane. A received channel signal on element $m$ in azimuth and $n$ in elevation, with the appropriate propagation time delay applied for a specific pixel $(x, z)$, is here defined as

$$ s_{mn}(x, z) \equiv s_{mn}. \tag{1} $$

Equations (2), (4), (8), (10), (12), and (15), in Sections II-A to II-G, describe the pixel value in the image $b$ for a given beamforming method. A 2-D image is acquired by specifying the pixels in the 2-D $xz$ plane; however, the same expressions can be extended to calculate values for a full 3-D volume in $xyz$.
In this article, we focus on 2-D sector scan images acquired by a cardiac 2-D matrix array transducer. Since these are sector scan images, we define the pixels in polar coordinates using an angle $\theta$ and a range depth $r$.

A. Delay-and-Sum
The conventional delay-and-sum (DAS) implementation is the coherent combination of the signals received by all elements, yielding

$$ b_{\mathrm{DAS}} = \sum_{m=1}^{M} \sum_{n=1}^{N} w_{mn} s_{mn} \tag{2} $$

where $w_{mn}$ is the receive apodization for element $mn$, a static term often determined from the F-number and pixel depth $r$.

B. Coherence Factor
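The coherence factor (CF) was first introduced by Mallart and Fink [9] as the ratio between the coherent and incoherent energy across the aperture

$$ \mathrm{CF} = \frac{\left| \sum_{m=1}^{M} \sum_{n=1}^{N} s_{mn} \right|^{2}}{MN \sum_{m=1}^{M} \sum_{n=1}^{N} \left| s_{mn} \right|^{2}}. \tag{3} $$

The CF has been used as an adaptive weight to increase image quality [2] as

$$ b_{\mathrm{CF}} = \mathrm{CF} \cdot b_{\mathrm{DAS}}. \tag{4} $$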
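As a minimal sketch of the ratio described above, the following Python snippet computes a CF value and a CF-weighted DAS pixel for one pixel. It assumes the delayed samples of the 2-D aperture have been flattened into a plain list and that uniform apodization is used; the function names and the sample values are hypothetical and are not taken from the processing chain used in this study.

```python
def das(samples, weights=None):
    """Coherent sum of delayed channel samples (uniform weights by default)."""
    if weights is None:
        weights = [1.0] * len(samples)
    return sum(w * s for w, s in zip(weights, samples))


def coherence_factor(samples):
    """Coherent energy divided by incoherent energy across the aperture."""
    coherent = abs(sum(samples)) ** 2
    incoherent = len(samples) * sum(abs(s) ** 2 for s in samples)
    # Guard against an all-zero aperture (no signal at this pixel).
    return coherent / incoherent if incoherent > 0 else 0.0


# Hypothetical delayed IQ samples for one pixel (aperture flattened to a list).
aligned = [1 + 1j] * 4          # identical samples: fully coherent
cancelling = [1, -1, 1j, -1j]   # samples cancel: no coherent energy

cf_aligned = coherence_factor(aligned)
cf_cancelling = coherence_factor(cancelling)
b_cf = cf_aligned * das(aligned)  # CF used as an adaptive weight on DAS
```

Identical samples give a CF close to one, while samples that cancel in the coherent sum give a CF of zero, which is the behavior the adaptive weight relies on.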

C. Phase Coherence Factor
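The phase coherence factor (PCF) was introduced by Camacho et al. [3] as

$$ \mathrm{PCF} = \max\left\{ 0,\; 1 - \frac{\gamma}{\sigma_{0}}\, p \right\} \tag{5} $$

where $\gamma$ is a parameter to adjust the sensitivity of PCF to out-of-focus signals, $\sigma_{0} = \pi/\sqrt{3}$ is the nominal standard deviation of a uniform distribution between $-\pi$ and $\pi$, and $p$ is given by

$$ p = \min\left\{ \sigma(\boldsymbol{\phi}),\; \sigma(\boldsymbol{\phi}^{A}) \right\} \tag{6} $$

where $\boldsymbol{\phi} = [\phi_{1,1}, \phi_{1,2}, \ldots, \phi_{M,N}]$ is the instantaneous phase across the 2-D aperture and $\sigma(\boldsymbol{\phi})$ is its standard deviation.
We use $\gamma = 1$ in this study. To avoid phase wrapping discontinuity, a set of auxiliary phases is defined as

$$ \phi^{A}_{mn} = \begin{cases} \phi_{mn} + \pi, & \phi_{mn} < 0 \\ \phi_{mn} - \pi, & \text{otherwise.} \end{cases} \tag{7} $$

The beamformed image is computed using PCF as an adaptive weight

$$ b_{\mathrm{PCF}} = \mathrm{PCF} \cdot b_{\mathrm{DAS}}. \tag{8} $$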

D. Circular Coherence Factor
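Camacho and Fritsch [18] published a slight modification of the PCF, namely, the circular coherence factor (CCF). The CCF is defined from the square root of the variance of the cosine and sine of the instantaneous phase across the aperture $\boldsymbol{\phi}$

$$ \mathrm{CCF} = 1 - \sqrt{\operatorname{var}(\cos \boldsymbol{\phi}) + \operatorname{var}(\sin \boldsymbol{\phi})}. \tag{9} $$

The CCF is described to "[. . .] fall off faster than PCF, representing a stricter focusing quality measurement" [18].
The CCF can also be used as an adaptive weight to the beamformed image so that

$$ b_{\mathrm{CCF}} = \mathrm{CCF} \cdot b_{\mathrm{DAS}}. \tag{10} $$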
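The two phase-based factors of Sections II-C and II-D can be illustrated with a short Python sketch. It is written under several assumptions that are not stated in the text: the aperture is flattened into a list of complex (IQ) samples, the population standard deviation and variance are used, the auxiliary phases are formed by shifting each phase by $\pi$ towards zero, and the CCF is taken as one minus the square root of the summed variances of the cosine and sine of the phase. All function names are hypothetical.

```python
import cmath
import math
import statistics

SIGMA_0 = math.pi / math.sqrt(3)  # std of a uniform distribution on (-pi, pi)


def phases_of(samples):
    """Instantaneous phase of each delayed complex sample, in (-pi, pi]."""
    return [cmath.phase(s) for s in samples]


def phase_coherence_factor(samples, gamma=1.0):
    """PCF = max(0, 1 - gamma / sigma_0 * p), with p the smaller phase spread."""
    phi = phases_of(samples)
    # Auxiliary phases move the wrapping point from +/-pi to zero.
    aux = [p + math.pi if p < 0 else p - math.pi for p in phi]
    spread = min(statistics.pstdev(phi), statistics.pstdev(aux))
    return max(0.0, 1.0 - (gamma / SIGMA_0) * spread)


def circular_coherence_factor(samples):
    """CCF = 1 - sqrt(var(cos(phi)) + var(sin(phi))) (assumed form)."""
    phi = phases_of(samples)
    var_cos = statistics.pvariance([math.cos(p) for p in phi])
    var_sin = statistics.pvariance([math.sin(p) for p in phi])
    return 1.0 - math.sqrt(var_cos + var_sin)


# Hypothetical apertures: equal phases, phases near the wrap, spread phases.
in_phase = [cmath.rect(1.0, 0.3)] * 8
near_wrap = [cmath.rect(1.0, math.pi - 0.01), cmath.rect(1.0, -math.pi + 0.01)] * 4
spread = [1, 1j, -1, -1j]
```

The `near_wrap` case shows why the auxiliary phases are needed: the raw phases sit close to $+\pi$ and $-\pi$ and have a large standard deviation even though the signals are nearly aligned, whereas the shifted phases cluster around zero.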

E. Short Lag Spatial Coherence
The short lag spatial coherence (SLSC) algorithm was introduced by Lediju et al. [4]. Even though SLSC can be calculated for a 2-D matrix array [19], we will simplify our implementation and collapse the elevation dimension of the array by summing the $N$ elements in elevation after the propagation delay has been applied. The spatial correlation for a 1-D array can be calculated as

$$ \hat{R}(l) = \frac{1}{M - l} \sum_{m=1}^{M-l} \frac{\sum_{r=r_{1}}^{r_{2}} s_{m}(r)\, s_{m+l}(r)}{\sqrt{\sum_{r=r_{1}}^{r_{2}} s_{m}^{2}(r) \sum_{r=r_{1}}^{r_{2}} s_{m+l}^{2}(r)}} \tag{11} $$

where $s$ is the delayed signal, $r$ is the depth sample index, and $l$ is the distance, or lag, in number of elements between two points on the aperture. The sum over $r$ results in a correlation over a given kernel size, $r_{2} - r_{1}$, of pixels. The SLSC is calculated as the sum over the first $Q$ lags

$$ b_{\mathrm{SLSC}} = \sum_{l=1}^{Q} \hat{R}(l). \tag{12} $$

Notice that $b_{\mathrm{SLSC}}$ is an image of the coherence and not of the backscattered signal amplitude as with DAS. The SLSC is a visualization of the spatial coherence of backscattered ultrasound waves, building upon the theoretical prediction of the van Cittert-Zernike (VCZ) theorem. The implications of the VCZ theorem for pulse-echo ultrasonic imaging are discussed in [9] and [20]. In this study, we used $Q = 9$ and a kernel size of $\lambda$ for SLSC.

F. Lag-One Coherence
Assuming the simplified implementation where we collapse the elevation dimension of the 2-D matrix array, the LOC [8] can be calculated using the same expression for the spatial coherence as SLSC in (11), but only evaluated at lag l = 1.
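The relation between the spatial correlation, the SLSC, and the LOC can be sketched in a few lines of Python. The sketch assumes real-valued delayed samples, an aperture already collapsed to 1-D, and a nested list `channels[m][r]` holding the samples of element `m` inside the depth kernel; these names and the example data are hypothetical.

```python
import math


def spatial_coherence(channels, lag):
    """Mean normalized correlation between element pairs `lag` elements apart."""
    num_elements = len(channels)
    total = 0.0
    for m in range(num_elements - lag):
        a, b = channels[m], channels[m + lag]
        numerator = sum(x * y for x, y in zip(a, b))
        denominator = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        # Pairs without signal energy contribute zero coherence.
        total += numerator / denominator if denominator > 0 else 0.0
    return total / (num_elements - lag)


def short_lag_spatial_coherence(channels, max_lag):
    """Sum of the spatial coherence over lags 1..max_lag (Q in the text)."""
    return sum(spatial_coherence(channels, lag) for lag in range(1, max_lag + 1))


def lag_one_coherence(channels):
    """Spatial coherence evaluated only at lag 1."""
    return spatial_coherence(channels, 1)


# Hypothetical kernels of three depth samples on four elements.
identical = [[1.0, 2.0, 3.0]] * 4
alternating = [[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]] * 2
```

With identical signals on all elements every lag has unit coherence, so the SLSC value equals the number of lags summed, while the sign-alternating example has a lag-one coherence of minus one.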

G. Generalized Coherence Factor
Again, assuming the simplified implementation where we collapse the elevation dimension of the 2-D matrix array, the generalized coherence factor (GCF) is defined as

$$ \mathrm{GCF} = \frac{\sum_{n=-M_{0}}^{M_{0}} \left| S(n) \right|^{2}}{\sum_{n=-M/2}^{M/2-1} \left| S(n) \right|^{2}} \tag{13} $$

where $S$ is the $M$-point Fourier spectrum over the aperture of the delayed channel data, with the collapsed elements indexed $m = 0, \ldots, M-1$,

$$ S(n) = \sum_{m=0}^{M-1} s_{m}\, e^{-j 2\pi \left( m - \frac{M}{2} \right) d\, \frac{n}{M d}} \tag{14} $$

where $n \in [-(M/2), (M/2)-1]$ is the spatial frequency index in which $M$ is assumed to be even, $d$ is the pitch of the array, and $M_{0}$ is an arbitrary constant within $[0, (M/2)-1]$ that specifies the low spatial frequency region, thus going from $-M_{0}$ to $M_{0}$. Note that if $M_{0} = 0$, the GCF simplifies to the CF. In this study, we used $M_{0} = 4$. We collapsed the elevation dimension of the 2-D matrix array to avoid computing a 2-D Fourier spectrum of the 2-D aperture data, which would have required a significantly changed implementation of the GCF. The beamformed DAS image can then be multiplied with GCF

$$ b_{\mathrm{GCF}} = \mathrm{GCF} \cdot b_{\mathrm{DAS}}. \tag{15} $$
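The statement that the GCF reduces to the CF for $M_{0} = 0$ can be checked numerically with a small Python sketch. It assumes a collapsed 1-D aperture given as a list of complex samples with an even number of elements, and it uses a direct discrete Fourier transform rather than an FFT for clarity; the function names and test samples are hypothetical.

```python
import cmath


def aperture_spectrum(samples):
    """Spatial DFT of the delayed samples, indexed n = -M/2 .. M/2 - 1."""
    m_total = len(samples)  # number of elements M, assumed even
    return {
        n: sum(
            s * cmath.exp(-2j * cmath.pi * (m - m_total / 2) * n / m_total)
            for m, s in enumerate(samples)
        )
        for n in range(-m_total // 2, m_total // 2)
    }


def generalized_coherence_factor(samples, m0):
    """Energy in spatial frequencies -m0..m0 divided by the total energy."""
    if not 0 <= m0 <= len(samples) // 2 - 1:
        raise ValueError("m0 must lie within [0, M/2 - 1]")
    spectrum = aperture_spectrum(samples)
    total = sum(abs(v) ** 2 for v in spectrum.values())
    low = sum(abs(spectrum[n]) ** 2 for n in range(-m0, m0 + 1))
    return low / total if total > 0 else 0.0


def coherence_factor(samples):
    """CF for comparison: coherent over incoherent energy."""
    incoherent = len(samples) * sum(abs(s) ** 2 for s in samples)
    return abs(sum(samples)) ** 2 / incoherent if incoherent > 0 else 0.0


# Hypothetical delayed samples on an eight-element collapsed aperture.
samples = [1 + 0.5j, 0.8 - 0.2j, 1.2 + 0.1j, -0.3 + 0.9j, 0.5, 1j, 1, 0.7 - 0.7j]
gcf_dc_only = generalized_coherence_factor(samples, 0)
gcf_wider = generalized_coherence_factor(samples, 3)
```

The agreement between `gcf_dc_only` and `coherence_factor(samples)` follows from Parseval's theorem, since the zero-frequency bin is the coherent sum and the total spectral energy equals $M$ times the summed channel energy. Widening the low-frequency region can only increase the value.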

III. METHODS

A. Very Large Cardiac Channel Data Database
The VLCD was recorded at St. Olavs Hospital, Trondheim University Hospital, Trondheim, Norway, in 2018 and 2019. The study was approved by the Regional Committee for Medical and Health Research Ethics in Central Norway. All patients provided written informed consent. All data were recorded by two experienced echocardiographers.
The data were recorded with a GE Vingmed Ultrasound Vivid E95 ultrasound system and the 4Vc-D matrix array probe. The scanner was set up with the native Cardiac_E application and a frequency setting of 1.7/3.4 MHz for second-harmonic imaging. Thus, the reconstructed images in this study are from second-harmonic data. The system features adaptive contrast enhancement (ACE) and high definition (HD) were turned off. The 4Vc-D matrix array probe has an aperture of size 21.5 × 15.6 mm mapped to 10 × 19 subaperture (SAP) channels (each channel consists of several prebeamformed elements) in the azimuth and elevation directions.
The Vivid E95 has a software beamforming architecture and a special feature on the system, provided to us, which allows recording of channel data. These are raw in-phase and quadrature (IQ) sampled ultrasound data from each individual SAP prior to general beamforming and image processing. At least one cine loop containing one heart cycle was recorded from the five standard views: parasternal long axis (PLAX), parasternal short axis (PSAX), apical four-chamber (A4C), apical two-chamber (A2C), and apical long axis (ALAX). Some patients have several recordings of the same cardiac view. For some patients, some cardiac views were not recorded due to technical difficulties. A total of 538 channel data recordings were collected from 106 patients, containing 33 280 individual image frames. On average, there are 64 frames per recording with a frame rate of 40 frames per second (FPS). A typical size of the channel data file is between 2 and 5 GB and the total approximate size of the database is 1.5 TB.

B. Global Image Coherence
A single value of the coherence per image frame, which we denote the GIC, can be calculated from any of the coherence measures described in Section II. Every coherence measure results in one coherence value for every pixel in the image. The GIC is calculated by averaging the pixel coherence values below a predefined range $R_{1}$ until the end depth of the image $R_{2}$. Mathematically, for the CF but valid for all coherence measures, this can be described as

$$ \mathrm{GIC} = \frac{1}{N_{\theta} (R_{2} - R_{1} + 1)} \sum_{\theta=1}^{N_{\theta}} \sum_{r=R_{1}}^{R_{2}} \mathrm{CF}(\theta, r) \tag{16} $$

where CF is calculated as in (3), $N_{\theta}$ is the number of reconstructed lines in azimuth, $R_{1}$ and $R_{2}$ are taken as depth sample indices, and the averaging region is indicated by the red ROI in Fig. 1(a) and (c). Even though an advantage of the GIC is that no manual or segmented ROI is needed, we decided to exclude the region close to the probe to avoid a dependency on the expanding aperture in the receive apodization when calculating CF, as well as to avoid the top region in the cardiac images, which is often quite noisy with potential rib interference and reverberations.
The coherence calculated as CF is illustrated for a frame with high image quality in Fig. 1(a) and lower image quality in Fig. 1(c) with the corresponding b-mode images in Fig. 1(b) and (d), respectively.
In all the results presented in this article, the GIC is scaled with 100, as indicated on the applicable axis, for better readability.
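The averaging that defines the GIC can be sketched as follows in Python. The sketch assumes a per-pixel coherence image stored as a nested list `coherence_image[theta][r]` (one list of depth samples per reconstructed azimuth line) and treats the range limits as inclusive depth sample indices; the function name, the argument names, and the example values are hypothetical.

```python
def global_image_coherence(coherence_image, r1, r2):
    """Mean coherence over all azimuth lines and depth samples r1..r2 inclusive."""
    values = [line[r] for line in coherence_image for r in range(r1, r2 + 1)]
    return sum(values) / len(values)


# Hypothetical coherence image: 3 azimuth lines x 5 depth samples.
# The two shallowest samples per line are excluded by choosing r1 = 2.
cf_image = [
    [0.9, 0.8, 0.02, 0.04, 0.06],
    [0.7, 0.6, 0.01, 0.03, 0.05],
    [0.5, 0.4, 0.02, 0.02, 0.02],
]
gic = global_image_coherence(cf_image, r1=2, r2=4)
gic_scaled = 100 * gic  # scaling used for readability in the reported results
```

Because the function only averages per-pixel values, the same call applies unchanged to a coherence image computed with any of the measures in Section II.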

C. Conventional Image Quality Contrast Metrics
1) Contrast Ratio: One of the most used measures of contrast in ultrasound imaging is the CR [13]

$$ \mathrm{CR} = \frac{\mu_{\mathrm{ROI}}}{\mu_{B}} \tag{17} $$

where $\mu_{\mathrm{ROI}} = E\{ |b_{\mathrm{ROI}}|^{2} \}$ and $\mu_{B} = E\{ |b_{B}|^{2} \}$ are, respectively, the mean signal power inside an ROI and a background region (B), in which $b$ denotes the summed signal from (2). The CR can take any positive real value, and $\mathrm{CR} \to \infty$ as $\mu_{B} \to 0$. However, it is often expressed in decibels as

$$ \mathrm{CR[dB]} = 10 \log_{10} \mathrm{CR}. \tag{18} $$

2) Generalized Contrast-to-Noise Ratio: The gCNR was introduced in [15] and [21] as a robust and quantitative contrast metric and is calculated using the probability density function (pdf) of an ROI and a background region (B), for which the contrast, or detectability, is measured as

$$ \mathrm{gCNR} = 1 - \int_{-\infty}^{\infty} \min\left\{ p_{\mathrm{ROI}}(x),\, p_{B}(x) \right\} dx \tag{19} $$

where the integral estimates the area of the overlap between the two pdf curves. Two regions with pdfs resulting in a large overlap will have a smaller gCNR and detectability compared to two regions with less overlap. This gives a fair and robust comparison between images and was specifically introduced to address the problem of dynamic range alterations [16] in some modern adaptive beamformers, but it is also valid on conventional DAS images as it is used here.
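Both contrast metrics can be sketched in Python from lists of pixel amplitudes taken from the two segmented regions. The sketch assumes real, non-negative envelope values and estimates the pdfs with a shared fixed-width histogram; the number of bins, the function names, and the example data are assumptions made for illustration.

```python
import math


def contrast_ratio_db(roi, background):
    """CR in decibels: ratio of mean signal power in the ROI and background."""
    mu_roi = sum(abs(b) ** 2 for b in roi) / len(roi)
    mu_b = sum(abs(b) ** 2 for b in background) / len(background)
    return 10 * math.log10(mu_roi / mu_b)


def gcnr(roi, background, bins=100):
    """One minus the overlap of the two amplitude histograms (pdf estimates)."""
    lo = min(min(roi), min(background))
    hi = max(max(roi), max(background))
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal

    def histogram(values):
        counts = [0] * bins
        for v in values:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        return [c / len(values) for c in counts]

    overlap = sum(min(a, b) for a, b in zip(histogram(roi), histogram(background)))
    return 1.0 - overlap


# Hypothetical envelope amplitudes from two segmented regions.
tissue = [10.0] * 50
blood_pool = [1.0] * 50
```

Regions with identical amplitude distributions overlap completely and give a gCNR of zero, while fully separated distributions give a gCNR of one regardless of how far apart they are, which is why the gCNR can saturate while the CR keeps increasing.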

D. Validating GIC Using Conventional Contrast Metrics
To validate the GIC, we compare and correlate it to the conventional contrast image quality metrics CR and gCNR. However, to estimate the CR and gCNR, we need to manually segment an ROI and a background region B. We therefore created a subset of the VLCD containing 20 images of the A4C view. The datasets were selected randomly while making sure that we spanned low to high image quality. This was done by selecting the datasets classified as the A4C view, sorting them with GIC ranging from low to high, and then uniformly drawing 20 datasets from the sorted list of datasets. The manual segmentation of the ROI and background is indicated for one high and one low image quality case in Fig. 2. We chose to segment the major part of the interventricular septum and the heart wall as the ROI, indicated by the blue mask, and compare it with the major part of the left ventricle segmented as background B, indicated by the red mask. The validation is done by correlating the resulting GIC value with the resulting gCNR and CR values from each frame.

E. Data Processing
The beamforming was performed in MATLAB (The Mathworks, Inc., Natick, MA, USA) using the generalized beamformer in the UltraSound ToolBox (USTB, https://www.USTB.no) [22] with retrospective transmit beamforming and a hybrid transmit delay model as described in [23]. To limit the computation time, the beamforming performed for the estimation of the coherence measures used a sector scan of four times the number of transmit directions in the azimuth direction [four multiple line acquisitions (MLAs)] and 256 depth pixels. For the images displayed in this article, we used six MLAs per transmit and 512 depth pixels for a more visually pleasing image.

F. View Classification
In order to classify cardiac views, we used the machine learning-based view classifier developed by Østvik et al. [24]. Østvik et al. trained a convolutional neural network using a network structure detailed in [24]. The model was trained and validated using data from 4582 cine loops from 205 patients recorded in transthoracic echocardiography and classified each cine loop into one of seven standard cardiac views: A2C, A4C, ALAX, PLAX, PSAX, subcostal four-chamber (SC4C), and subcostal vena cava inferior (SCVC), as well as a nonassignable class "unknown." The reported accuracy was 98.9% ± 0.6% using cross validation on the training/validation data. On a separately recorded test set, consisting of data from 2559 cine loops of the A4C, A2C, ALAX, PLAX, and PSAX views from 265 patients, their model achieved an accuracy of 98.5% ± 0.5%. See the discussion for details of the accuracy on our dataset.

G. Statistical Analysis
All statistical analyses were done in MATLAB. To estimate the similarity between coherence measures and image quality metrics, we calculated the correlation using conventional linear regression and estimation of Pearson's correlation. The statistical difference between the median GIC value for the apical and the parasternal views was tested using a Wilcoxon rank-sum test.
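The correlation analysis can be illustrated with a small, dependency-free Python sketch of Pearson's correlation coefficient and the least-squares regression line. The analyses in this study were done in MATLAB; the code below is only an illustration, the function names are hypothetical, and the example values are invented placeholders rather than data from the VLCD. The significance test ($p$-value) and the Wilcoxon rank-sum test are not covered.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def regression_line(x, y):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented placeholder values standing in for per-frame GIC and CR.
gic_values = [1.0, 2.0, 3.0, 4.0]
cr_values = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(gic_values, cr_values)
slope, intercept = regression_line(gic_values, cr_values)
```

For perfectly linear placeholder data such as this, the coefficient is one and the fitted line reproduces the data exactly; real measurements scatter around the line and give a coefficient below one.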

IV. RESULTS
The mean value of the GIC averaged over the number of frames in one recording for every coherence measure described in Section II is plotted in the top plot of Fig. 3, with the same but normalized values in the bottom plot. From the plotted values, we can visually see a high correlation between the various methods for calculating coherence. The high correlation is confirmed in Fig. 4, where we, for every pair of coherence measures, calculate Pearson's correlation coefficient R, plotted in the bottom-left triangle of the matrix, while the scatter plot with the estimated regression line is plotted in the top-right triangle.
From Fig. 4, we see that all coherence measures correlate with each other with an R value higher than 0.9. The highest correlation, 0.995, is between the two ways of calculating phase coherence, PCF and CCF, and the lowest correlation is between GCF and SLSC. All the p-values are below 0.001, and thus, the correlation is statistically significant.
To visually illustrate the amount of data in the VLCD, the mean GIC (using CF) averaged over all frames in each recording is plotted for every dataset in Fig. 5. Two times the standard deviation is indicated by the whiskers in the plot, while the cardiac views, PLAX, PSAX, A4C, A2C, and ALAX, are indicated in the figure legend. The x-axis shows the patient number as included in the study. Notice that some of the datasets have high GIC (e.g., defined as GIC × 100 > 2), while the greater part of the datasets has rather low GIC (GIC × 100 < 2).
If we sort all the recordings by cardiac view, we can investigate the GIC per view through the box-whisker plot in Fig. 6. The boxplots indicate the median as the red line and the 25th and 75th percentiles as the bottom and top of the box, while the notches indicate the 95% confidence interval of the median value. We can thus notice that the apical views have statistically significantly higher GIC values than the parasternal views. The statistical significance is confirmed by a Wilcoxon rank-sum test with p < 0.001.
To obtain a more in-depth analysis of the GIC, we have highlighted the GIC per dataset with the cardiac view indicated for patients 21-24 in Fig. 7(a), excluding a double recording of the A2C for patient 21 and correcting a misclassification of the PLAX as PSAX for patient 22. The corresponding b-mode images displayed with 55-dB dynamic range are shown in Fig. 7(b)-(q). Notice that the GIC in Fig. 7(a) corresponds well with the visual interpretation of image quality since A4C, A2C, and ALAX from patient 22 as well as PSAX, A4C, A2C, and ALAX from patient 24 have better image quality than the rest. The same observation is confirmed in Fig. 8 where we plot the GIC and the corresponding b-mode images from patients 85 and 86, and we can observe that patient 85 has better image quality than patient 86. A movie loop of the b-mode images is available in the Supplementary Materials. Since patient 85 did not have a recording of the A2C, we did not display it for patient 86 either.
Fig. 9(a) plots the GIC against the CR given segmentation of the ROI and background as described in Section III-D. The plotted line is the estimated linear regression line, which in this case illustrates the high correlation between GIC and CR with a Pearson's correlation value of R = 0.74 (p < 0.005). Notice that the subset of the dataset spans from low to high image quality as measured in terms of GIC. Furthermore, the GIC compared to gCNR is plotted in Fig. 9(b), where the linear regression line corresponds to a fairly high Pearson's correlation value of R = 0.62 (p < 0.005). Notice that the gCNR saturates with higher image quality; this observation is further elaborated in the discussion.
The GIC for a full recording of low image quality is shown in Fig. 10, and for a recording of high image quality in Fig. 11. The top pane in both figures shows the b-mode images, with the corresponding CF images in the middle pane, for the four frames indicated in the bottom plot. The bottom plot shows the GIC through all 64 frames. The frames were selected to highlight the parts of the cardiac cycle containing the highest GIC. We can observe that for a recording of higher quality, the frame reaching the highest GIC is in the diastole phase of the cardiac cycle, where the left ventricle is at its largest, due to the high coherence values in the blood, while the lower quality recording has less variation in the GIC over the full recording.

V. DISCUSSION
The results in Figs. 3 and 4 illustrate that there is a clear correlation between all the inspected coherence measures. Coherence is impacted by local effects, including backscatter contrast, absolute scattering strength, clutter level, focal effects, and local aberration profile. These effects might lead to local variations within a single frame between the different coherence measures we have investigated. However, when we average these quantities over an entire image frame and then over all frames in a recording, as we do in our evaluation of the GIC, our results indicate that although there are differences in how the coherence measures are implemented and in their absolute values, the information content is very similar. We can therefore confirm our first hypothesis that published coherence measures are strongly correlated.
The second hypothesis, that GIC can be used as a quantitative metric for in vivo image quality, can be visually confirmed for cardiac images when we compare the measured value of the GIC to the b-mode images in Figs. 7(a) and 8. This is in agreement with the findings in [8], where an analytic expression relating coherence through LOC to the channel signal-to-noise ratio was derived. However, when we sort the GIC values by view in Fig. 6, we see that there is a significant difference in the GIC value between the parasternal and the apical views, with the apical views reaching a higher GIC value. This could be because the amount of tissue visible in the image is different in the apical and parasternal views.
In further work, one should aim at normalizing the GIC on the amount of tissue in the images to obtain a GIC value that can be quantitatively compared across any cardiac view. The GIC difference between the apical and parasternal views could also arise because the parasternal views are more prone to the ribs blocking the aperture or the lung interfering with the image quality. However, further analysis is needed to explain the difference in GIC between the apical and parasternal views.
When we investigate the GIC per frame in a full recording for a low-quality recording in Fig. 10 and a high-quality recording in Fig. 11, we notice that frames in the diastole phase had the highest GIC for the recording with high image quality. This is probably because, in the high-quality recordings, the blood speckles are visible in the b-mode image. Since blood is coherent, it contributes, together with tissue, to the higher GIC, while in the lower quality recording, the blood speckles are not visible in the b-mode image and thus do not contribute to the GIC value. We can also observe in Fig. 5 that higher quality images (higher GIC) seem to have a higher variance of GIC over the frames in a recording. This is most likely because high-quality recordings have blood contributing to the GIC value, and the amount of blood in the image varies through the cardiac cycle.
The classification accuracy of the view classifier on the test set in [24] was 98.5% ± 0.5%. Our dataset consists of the same cardiac views as in their test set, also recorded on clinical systems from the same vendor, GE Vingmed Ultrasound. We, therefore, assume our dataset to be classified with a similar accuracy even though we have not manually gone through all our 538 datasets. As shown in Fig. 5, there was only one dataset resulting in the unknown category (one dataset for patient 10). We can also notice that this dataset was of low quality as indicated by the low GIC value. From Fig. 5, we can also verify that almost all patients had all five cardiac views according to the protocol. Some patients, e.g., patient 2, have multiple recordings of some of the views, while some patients do not have all views; for example, patient 4 lacks the ALAX view. The missing views are most likely due to technical difficulties while storing the raw channel data. One misclassification was observed for patient 22, where the PLAX was incorrectly classified as PSAX. This misclassification appears in Fig. 5 and was corrected in Fig. 7(a). However, we can once again notice from the GIC value and the images in Fig. 7(c) and (g) that the misclassification is most likely due to the low image quality.
Further confirming our second hypothesis, the validation of the GIC against the conventional contrast metrics CR and gCNR, resulting in the plots in Fig. 9, illustrates that the GIC correlates well with the conventional contrast metrics. However, there is a higher correlation with CR (R = 0.74 and p < 0.005) than with gCNR (R = 0.62 and p < 0.005). An explanation for this could be that, as seen in Fig. 9(b), the gCNR saturates for images with higher image quality. This is probably explained by the fact that gCNR estimates image quality in terms of detectability. The ROI, in our case the interventricular septum, is easily detected from the background, the left ventricle, when the image quality reaches a certain level, and thus the ROI and the background are well separated also at medium image quality. Having an image of higher image quality does not necessarily increase the separability. Based on this, it can be debated how well the gCNR is suited as a quantitative image quality metric for different in vivo cardiac images and thus how relevant separability between two ROIs is as an in vivo metric. The gCNR was first and foremost introduced to evaluate advanced and nonlinear beamforming methods by comparing beamforming methods applied to the same data. Here, we are using it to compare quality across different datasets. This adds to the discussion that quantitatively evaluating image quality in in vivo ultrasound images is hard and that we need a collection of metrics to do fair and correct evaluations. We believe that the GIC is a valuable contribution to these image quality metrics.
A more relevant validation of the GIC would be clinical validation with, e.g., clinicians ranking images in terms of image quality and estimating how well this ranking corresponds to the GIC value. Then, one could also assess whether there are quantitative bounds on image quality, for example, whether the GIC can quantitatively support classification into low, medium, and high image quality. Such a study should be performed. However, we have recently published a relevant study in [17] where we used a variant of the GIC to estimate the image quality improvements resulting from an aberration correction algorithm. This was a clinical study where four clinicians evaluated cardiac cine loops with and without aberration correction in a blinded, left-right-randomized, side-by-side comparison. The clinicians' task was to choose which image they preferred, and the aberration-corrected image was preferred in 97% of the cases of the 116 recorded cardiac cine loops. This matched well with the values of the GIC, which showed an increase in all frames after aberration correction.
In this work, we have focused on cardiac ultrasound images, which may be advantageous to the GIC since they have well-defined standard scan views. This allows for easier comparison of GIC between recordings since the clinicians have strived to obtain the same view in all recordings. In other clinical ultrasound applications that do not have standardized views, the GIC will still be valuable to, e.g., optimize beamforming parameters for specific recordings, but it might be harder to compare recordings from different patients.
A possible clinical utility of these results is a case where a clinician is recording several cine loops of a patient, and one of the cine loops should be selected for further postprocessing and estimation of clinical cardiac diagnostic parameters. Automatic selection of the cine loop with the highest GIC value may improve diagnostic accuracy, and we hypothesize that automatically selecting the cine loop with the highest image quality will further support automatic guidance with deep learning tools [25], [26]. The GIC could also indicate the reliability of measurements made in a certain image. This is useful both while recording and while doing the measurements, but perhaps even more important when reviewing a list of measurements made on a recording. Also, having a quantitative measure of in vivo image quality can allow automatic tuning of parameters such as transmit setups [8] and other beamforming parameters to achieve an image of the highest image quality for every patient scanned. Perhaps, this can tilt the odds in favor of the clinician in the lottery of cardiac image quality. Therefore, the GIC can strengthen echocardiographic diagnostics and ultimately improve the care of cardiac patients.
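As a minimal sketch of the cine loop selection described above, the following assumes that a GIC value is already available for every frame of every recorded loop and that a loop is scored by its frame-averaged GIC. The function name, the loop identifiers, and the GIC values are hypothetical and only serve as illustration.

```python
from statistics import mean

def select_best_cine_loop(gic_per_loop):
    """Return (loop_id, mean_gic) for the loop with the highest frame-averaged GIC.

    gic_per_loop maps a loop identifier to a list of per-frame GIC values.
    """
    if not gic_per_loop:
        raise ValueError("no cine loops to choose from")
    scores = {loop_id: mean(frames) for loop_id, frames in gic_per_loop.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical per-frame GIC values for three recordings of the same view.
recordings = {
    "loop_A": [0.20, 0.25, 0.25, 0.30],
    "loop_B": [0.40, 0.45, 0.45, 0.50],
    "loop_C": [0.30, 0.35, 0.35, 0.40],
}

best_loop, best_score = select_best_cine_loop(recordings)
print(best_loop, round(best_score, 3))
```

Averaging over frames is only one possible scoring rule; since the GIC varies through the cardiac cycle (Figs. 10 and 11), a median or a score restricted to a given cardiac phase may be equally reasonable.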

VI. CONCLUSION
The acquisition of the VLCD, consisting of 33 280 individual frames from 538 recordings of 106 patients, allowed us to do an empirical study of coherence as an in vivo image quality metric. We demonstrate that all the coherence measures investigated in this study are highly correlated (R > 0.9 and p < 0.001) across the database, illustrating that even though there are differences in the implementation and absolute values of the coherence measures, their information content is very similar. Any coherence measure can be averaged across the pixels in an image frame into a GIC, and we demonstrated empirically that this value can be used as a quantitative value for in vivo image quality. We validated the GIC against the conventional contrast metrics CR and gCNR on a subset of the full dataset and obtained correlations of R = 0.74 (p < 0.005) and R = 0.62 (p < 0.005), respectively, demonstrating a high correlation with conventional contrast metrics. We used the CF to implement the GIC; however, the exact choice of coherence measure is probably not critical.
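The averaging of a coherence measure into a GIC can be sketched as follows. The sketch assumes the standard coherence factor, i.e., the coherent sum power divided by the channel count times the incoherent sum power, evaluated per pixel on delayed channel data, and it averages the result over all pixels from a start range r1 to the end depth. The array layout, function names, noise levels, and depth values are assumptions made for illustration and do not reproduce the processing chain used in this study.

```python
import numpy as np

def coherence_factor(channel_data, eps=1e-12):
    """Per-pixel CF = |sum_m x_m|^2 / (M * sum_m |x_m|^2).

    channel_data: complex array of shape (n_channels, n_depth, n_lines) holding
    the delayed (focused) channel signals for every pixel (assumed layout).
    """
    m = channel_data.shape[0]
    coherent = np.abs(channel_data.sum(axis=0)) ** 2
    incoherent = m * (np.abs(channel_data) ** 2).sum(axis=0)
    return coherent / (incoherent + eps)

def global_image_coherence(cf_image, depths, r1, r2=None):
    """Average the coherence image over all pixels with r1 <= depth <= r2."""
    r2 = depths[-1] if r2 is None else r2
    mask = (depths >= r1) & (depths <= r2)
    return float(cf_image[mask, :].mean())

# Synthetic data (assumption): a common signal across channels plus channel noise.
rng = np.random.default_rng(1)
n_ch, n_depth, n_lines = 32, 64, 16
depths = np.linspace(0.0, 0.15, n_depth)  # hypothetical depth axis in metres

def complex_noise(shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

signal = complex_noise((n_depth, n_lines))[np.newaxis, :, :]
high_quality = signal + 0.1 * complex_noise((n_ch, n_depth, n_lines))
low_quality = signal + 3.0 * complex_noise((n_ch, n_depth, n_lines))

gic_high = global_image_coherence(coherence_factor(high_quality), depths, r1=0.02)
gic_low = global_image_coherence(coherence_factor(low_quality), depths, r1=0.02)
print(f"GIC high quality: {gic_high:.2f}, GIC low quality: {gic_low:.2f}")
```

Any other per-pixel coherence measure could replace `coherence_factor` in this sketch without changing the averaging step.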

Fig. 1. Coherence calculated as CF for a frame with (a) high image quality and (c) lower image quality, with (b) and (d) the corresponding b-mode images. The red ROI in (a) and (c) indicates the region used to compute the GIC, selected from a predefined range $R_1$ to the end depth of the image $R_2$. The GIC is indicated in the title of the images in (a) and (c); notice that the GIC is higher for the image in (a) than in (c).

Fig. 2. Two of the manually segmented images used to estimate the conventional contrast metrics CR and gCNR to compare against the results of the GIC. The blue mask is the segmentation of the ROI, the major part of the heart wall, while the red region, background B, is the major part of the left ventricle. (a) High-quality image. (b) Low-quality image.

Fig. 3. Mean value over all frames in each dataset for all coherence measures. The top plot shows raw values from each method, while the bottom plot shows the values normalized to the same range.

Fig. 4. The upper right triangle of the matrix displays the scatter plots between each pair of coherence measures on all datasets, while the lower left triangle indicates the resulting Pearson's correlation coefficient. All correlation coefficients have a p-value < 0.001.

Fig. 5. Mean and two times the standard deviation of the GIC for all the datasets in the VLCD, with the view indicated by the marker in the legend. This figure is a graphical illustration of the VLCD.

Fig. 6. Boxplot of the GIC from each cardiac view. The red line indicates the median, the bottom and top edges indicate the 25th and 75th percentiles, respectively, and the whiskers are the highest and lowest values. The notches in the box indicate the 95% confidence interval of the median value. Notice that the apical views have statistically significantly higher GIC values than the parasternal views, as confirmed by a Wilcoxon rank-sum test.

Fig. 9. (a) GIC plotted against the estimated CR, resulting in a correlation of R = 0.74 (p < 0.005). (b) GIC plotted against the gCNR, resulting in a correlation of R = 0.62 (p < 0.005). Notice that the subset of datasets spans from low to high image quality in terms of GIC.

Fig. 10. The top pane shows the b-mode images and the middle pane the coherence images from the four frames indicated in the bottom plot. The bottom plot shows the GIC through all 64 frames of a lower quality image. The selected frames illustrate frames with lower and higher GIC.

Fig. 11. The top pane shows the b-mode images and the middle pane the coherence images from the four frames indicated in the bottom plot. The bottom plot shows the GIC through all 64 frames of a higher quality image. The selected frames illustrate frames with lower and higher GIC.