An evaluation of multiplex bead-based analysis of cytokines and soluble proteins in archived lithium heparin plasma, EDTA plasma and serum samples.

Abstract Objective: To assess the usability of archived plasma and serum for multiplex (Luminex) analysis of circulating proteins (analytes) by evaluating the day-to-day variation, the effect of several freeze-thaw cycles, and the influence of the media and choice of anticoagulant. Methods: Nineteen analytes in plasma and serum from 86 head and neck cancer patients and 33 controls were evaluated: EGFR, leptin, OPN, VEGFR-1, VEGFR-2, IL-2, IL-13, PDGF-bb, TNF, PAI-1, SDF-1a, IL-4, IL-6, IL-8, eotaxin, G-CSF, VEGF, GRO-a, and HGF. Results: The correlation between measurements of the same samples analyzed on different dates was reasonable. However, samples run on different dates could exhibit different absolute values. The 75th percentile of the fold differences for samples run on different dates was 2.2. No significant difference was found between one and four freeze-thaw cycles (except for HGF), and the correlation was high. We found significant differences in mean concentrations of the majority of analytes in different media and with different anticoagulants. Only the following analytes did not show a difference in mean concentrations: EDTA plasma vs. serum: leptin and VEGFR-2; LH plasma vs. serum: IL-2, IL-13, and VEGF; LH plasma vs. EDTA plasma: IL-2 and IL-4. Conclusion: Stored serum, LH plasma, and EDTA plasma from clinical trials can be used for analysis of circulating cytokines and proteins. Variations in measurements occur, but are within reasonable ranges. The optimal type of media depends on the analyte, as the media giving the fewest measurements below the lower limit of quantification and the widest dynamic range differ between analytes.


Introduction
Changes in the levels of circulating cytokines and soluble markers of immune activation are associated with several diseases and have been suggested as biomarkers for treatment outcome and prognosis [1][2][3][4]. With the introduction of multiplex immunoassays, it has become possible to test large numbers of analytes in a small quantity of test material, making the technology valuable in clinical trials [5][6][7]. There are several challenges when measuring circulating biomarkers in clinical and research settings. Among these are pre-analytical factors concerning the test individual (e.g. hormone status, circadian rhythm, age, gender) [8][9][10], as well as technical challenges in sample handling, choice of medium, storage and other factors concerning the performance of the assay [11].
It has been shown that the levels of cytokines are affected by the duration of storage of the test material [12], the duration of contact between serum or plasma and blood cells [13][14][15], the number of freeze-thaw cycles [16][17][18], the choice of medium (plasma or serum), and the anticoagulant used [19].
Plasma and serum are not interchangeable when measuring cytokines. Serum constitutes the soluble fraction of clotted blood, thus lacking fibrinogen, platelets and other coagulation factors. During the clotting process blood cells may release cytokines [20]. Plasma is the soluble fraction of anticoagulated blood. Several anticoagulants can be used (including ethylene diamine tetra-acetic acid (EDTA), lithium heparin (LH), sodium heparin and sodium citrate), and the choice can influence the measurable cytokine concentration [19].
When working with circulating biomarkers in translational research on archived material it may be difficult to address all pre-analytical and sample handling challenges. It is therefore of great importance to obtain knowledge on how to approach these limitations when retrospectively analyzing stored patient material from clinical trials.
The aim of the present study was to evaluate the concentrations of 19 cytokines and blood proteins previously evaluated as biomarkers in head and neck cancer [21][22][23][24] with respect to day-to-day variation, the influence of freeze-thaw cycles, the influence of the media (plasma versus serum), and the choice of anticoagulant (LH or EDTA) when measured by multiplex immunoassays [3][25][26][27][28].

Patient cohorts
Eighty-six patients treated for head and neck cancer at the Department of Oncology at Aarhus University Hospital, Denmark, between July 2005 and September 2011 were enrolled in this study (cohorts from the DAHANCA 18 and 24 trials, cohort A and B, respectively) [29,30]. The patients were treated with primary radiotherapy, with a total dose of 66-68 Gy in 33-34 fractions, six fractions a week. Patients with locally advanced disease received weekly low-dose cisplatin.
In addition, 33 healthy controls were enrolled between March 2012 and November 2014 (18 from the DAHANCA 25 B cohort and 15 from the blood bank, cohort C and D, respectively) [31]. To assess the analyte concentrations in the archived material in different preparations, matched samples were collected from the same individual at the same time. The patient and control characteristics are presented in Table 1.

Sample collection and processing
Blood samples were used with permission from the Danish Research Ethics Committee (case number 1-10-72-519-12). The procedures followed were in accordance with the Helsinki Declaration of 1975 (revised in 1983). Data handling procedures were approved by the Danish Data Protection Agency (case number 2014-41-3510). The registry for use of tissue was consulted before the use of patient material.
All blood samples were obtained by venipuncture. The patient samples were collected pre-treatment. The serum samples were drawn into dry vials and the plasma samples into LH or EDTA vials (BD Vacutainer), and all samples were kept on ice until separation. Both serum and plasma samples were separated within 3 h of collection. The samples were thawed in an ice bath, mixed and centrifuged at 1500 g for 10 min at 4 °C.

Analytes
All sample dilutions were prepared in duplicate and all assay runs were performed by the same operator. A 50 µL volume of each sample was added to a 96-well plate containing 50 µL of fluorescent antibody-coated beads. Streptavidin-PE antibodies were added to the plate following incubation and washing steps. Washing steps were performed with the Bio-Plex Pro™ wash station. Finally, 125 µL assay buffer was added to the wells, and the plate was shaken at 1100 RPM for 30 s and analyzed on the Luminex 100 (Bio-Plex 200 System). The following 19 analytes were analyzed according to the manufacturer's protocol in three different pre-mixed bead-based antibody assays (Bio-Plex Pro™ human Reagent Kit from Bio-Rad) using the Bio-Plex Manager software (version 6.1): 5-plex: EGFR, leptin, OPN, VEGFR-1, VEGFR-2. 6-plex: IL-2, IL-13, PDGF-bb, TNF, PAI-1, SDF-1a. 8-plex: IL-4, IL-6, IL-8, eotaxin, G-CSF, VEGF, GRO-a, HGF.
The standard curves for the 19 analytes were evaluated and fitted using five-parameter logistic regression correcting for asymmetry in the curve shape. The coefficient of variation (CV) was calculated as CV = σ/μ × 100%, where μ is the average of the observed values and σ is their standard deviation. The intra-assay CV values are reported in Table 2. The recovery rate was defined as recovery = (observed concentration/expected concentration) × 100. An acceptable recovery range was set to be between 70 and 130%. For measured values out of range (OOR), the values above the upper limit of quantification (ULOQ) were replaced by the highest recorded value of the standard curve. Values below the lower limit of quantification (LLOQ) were replaced by the lowest recorded value of the standard curve divided by two.
Table 1. Patient and control characteristics.
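The quantities defined above can be sketched in a few lines of Python. This is a minimal illustration and not the Bio-Plex Manager implementation: the function names, the use of the sample standard deviation (n − 1 denominator), and the string flags "OOR<" and "OOR>" for readings below the LLOQ and above the ULOQ are assumptions made for the example.

```python
from statistics import mean, stdev


def cv_percent(replicates):
    """Coefficient of variation: CV = sigma / mu * 100%."""
    # Assumption: sample standard deviation (n - 1 denominator).
    return stdev(replicates) / mean(replicates) * 100.0


def recovery_percent(observed, expected):
    """Recovery = observed concentration / expected concentration * 100."""
    return observed / expected * 100.0


def recovery_acceptable(observed, expected, low=70.0, high=130.0):
    """True when the recovery lies inside the 70-130% window."""
    return low <= recovery_percent(observed, expected) <= high


def replace_oor(reading, lowest_std, highest_std):
    """Substitute out-of-range readings as described in the text.

    Hypothetical encoding: "OOR<" means below the LLOQ and
    "OOR>" means above the ULOQ; anything else is a numeric value.
    """
    if reading == "OOR<":
        return lowest_std / 2.0   # lowest standard-curve value divided by two
    if reading == "OOR>":
        return highest_std        # highest standard-curve value
    return float(reading)
```

With a lowest standard of 2.0 and a highest standard of 5000.0 (illustrative numbers only), a reading flagged "OOR<" becomes 1.0 and a reading flagged "OOR>" becomes 5000.0, while in-range readings pass through unchanged.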

Analysis
The following technical factors were tested:
1. First versus second analysis of the same sample: The same sample was run in two independent analyses.
2. Rerun of the same 96-well plate on the same date: Fifty-nine of the patient plasma samples were run twice on the same 96-well plate for each of the 5-, 6- and 8-plex kits.
3. Influence of freeze-thaw cycles: For eight 6-plex samples and six 8-plex samples obtained from cohort D, LH plasma aliquots were analyzed after one freeze-thaw cycle and again after four freeze-thaw cycles. For each thaw cycle the samples were kept undisturbed on ice until they were visually thawed. The samples were then placed in an ice bath for an additional hour before they were refrozen at −80 °C for a minimum of one week until the next thaw.
4. Serum versus plasma: Serum results were compared to EDTA plasma and LH plasma results from the same individual in the 5-, 6- and 8-plex.
5. LH plasma versus EDTA plasma: The EDTA plasma and LH plasma results from the same individual were compared in the 5-, 6- and 8-plex.

Statistics
The same sample run on different dates, serum and plasma, EDTA plasma and LH plasma, as well as samples undergoing one versus four freeze-thaw cycles were assessed with Bland-Altman plots for systematic bias. After assessing the normality of the differences, the mean differences were compared by a paired t-test. No correction was made for multiple testing. The strength of the monotonic relationship between the paired data was evaluated by Spearman's correlation coefficient. To assess measurement variation in the same samples run on different dates, the measurement error, expressed as the standard deviation of the paired measurements assuming that the difference is zero, is reported. Reported p-values are two-sided with a 0.05 significance level. To assess the total variation, the fold differences for all the paired measurements in range were listed and the 75th percentile is reported as an estimate of the variation. All analyses were performed using Stata version 12. Unsupervised hierarchical clustering of the log2-transformed, median-centered analyte concentrations was performed using Cluster (Version 3.0, http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm) and a colour profile was made using TreeView (Version 1.1.6r2, http://jtreeview.sourceforge.net).
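As an illustration of the paired statistics described above, the following Python sketch computes the Bland-Altman bias with limits of agreement, the paired t statistic, and the measurement error. It is not the Stata code used in the study. Two points are assumptions: the limits of agreement are taken as the mean difference ± 1.96 standard deviations, and the measurement error is taken as the within-sample standard deviation, sqrt(Σd²/(2n)), which is one common reading of "assuming that the difference is zero". The example concentrations are made up.

```python
from math import sqrt
from statistics import mean, stdev


def paired_differences(first, second):
    """Second run minus first run, sample by sample."""
    return [b - a for a, b in zip(first, second)]


def bland_altman(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = paired_differences(first, second)
    bias = mean(diffs)
    sd = stdev(diffs)
    # Assumption: limits of agreement as bias +/- 1.96 SD of the differences.
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def paired_t_statistic(first, second):
    """t statistic of the paired t-test, with n - 1 degrees of freedom."""
    diffs = paired_differences(first, second)
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def measurement_error(first, second):
    """SD of paired measurements when the true mean difference is zero."""
    # Assumption: within-sample SD, sqrt(sum(d^2) / (2n)).
    diffs = paired_differences(first, second)
    return sqrt(sum(d * d for d in diffs) / (2 * len(diffs)))


# Made-up concentrations for four samples measured on two dates.
run1 = [10.0, 20.0, 30.0, 40.0]
run2 = [12.0, 19.0, 33.0, 42.0]
bias, lower, upper = bland_altman(run1, run2)
t_stat = paired_t_statistic(run1, run2)
error = measurement_error(run1, run2)
```

The sketch stops at the t statistic because the Python standard library has no t distribution; a two-sided p-value could be obtained from a statistics package, for example `scipy.stats.ttest_rel`.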

Same samples run on different dates
Seven 5-plex samples, nine 6-plex samples and five 8-plex LH plasma samples from cohort D were evaluated for repeatability of the measured concentration on two different dates. The samples had comparable storage times and had been exposed to identical freeze-thaw cycles.
To test the possible technical variation between the two measurements, the two individual runs were plotted against each other, and Bland-Altman plots were made for the individual analytes (see Figure 1(a-c) for selected analytes and Supplementary Figure 1 for all analytes). The data showed that the absolute concentrations measured in identical samples on two different dates can vary considerably. The paired t-test indicated that there was no significant difference between the first and the second measurement for 14/19 of the analytes, but a significant difference was observed for OPN, VEGFR-1, VEGFR-2, PDGF-bb and SDF-1a. The correlation between measurements of the same sample on a different date, evaluated by the Spearman correlation, was above 0.4 for 12/19 analytes (Table 3). For the analytes with a weaker correlation (OPN, VEGFR-1, SDF-1a) the scatter plots revealed that the points were gathered close together with a small spread compared to the range. For IL-2, IL-4 and TNF, 10/18, 8/10 and 10/18 of the data points, respectively, were below the LLOQ, making Spearman's rho less reliable. The fold differences for the paired measurements performed on different days were ranked and the 75th percentile is reported as an estimate of the variation that occurs when a sample is measured repeatedly. The estimated fold variation was 2.2 (Figure 1(d)) and was not correlated with the actual or relative levels of the analytes (Supplementary Figure 2).
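The fold-difference summary can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the fold difference is taken as the larger value divided by the smaller, `None` is a hypothetical marker for an out-of-range reading, the percentile uses the "inclusive" interpolation of Python's `statistics.quantiles`, and the numbers are invented.

```python
from statistics import quantiles


def fold_difference(a, b):
    """Fold difference between two positive concentrations (always >= 1)."""
    # Assumption: larger value divided by smaller value.
    return max(a, b) / min(a, b)


def fold_p75(pairs):
    """75th percentile of the fold differences over in-range pairs."""
    # Hypothetical convention: None marks an out-of-range reading,
    # and such pairs are excluded.
    folds = [fold_difference(a, b) for a, b in pairs
             if a is not None and b is not None]
    # quantiles(..., n=4) returns the three quartile cut points;
    # index 2 is the 75th percentile.
    return quantiles(folds, n=4, method="inclusive")[2]


# Invented paired measurements; the last pair is dropped as out of range.
pairs = [(10.0, 20.0), (5.0, 5.0), (30.0, 10.0),
         (8.0, 4.0), (6.0, 9.0), (None, 3.0)]
print(fold_p75(pairs))  # 2.0
```

Different percentile definitions interpolate differently, so a statistics package may return a slightly different value for small samples.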

Re-run of the same 96-well plate on the same date
The agreement when reanalyzing the same plate immediately after the first analysis, after a wash with 125 µL assay buffer, was evaluated with scatter and Bland-Altman plots (see Supplementary Figure 3). This way the treatment of the

Influence of freeze-thaw cycles
The samples had undergone between one and four freeze-thaw cycles. To evaluate the influence of repeated freeze-thaw cycles, eight 6-plex samples and six 8-plex LH plasma samples from the same individual (cohort D) were investigated. These samples were analyzed after both one and four freeze-thaw cycles. The samples were analyzed on the same plate in order to avoid inter-assay variation. Selected variables are illustrated in Figure 2. For all the evaluated analytes but one, the mean difference between the measurements after one and four freeze-thaw cycles did not differ significantly from 0 (see Supplementary Table 2). HGF appeared to be significantly lower after four freeze-thaw cycles. However, the limits of agreement included zero. The Spearman correlation between measurements of the same sample after one and four freeze-thaw cycles was acceptable, with a reasonable monotonic correlation for the majority of the analytes. For the analytes with a weaker correlation (IL-2, IL-13, SDF-1a, IL-4, IL-8 and GRO-a) the scatter plots revealed that the points were gathered close together with a small spread compared to the range (illustrated in Supplementary Figure 4). For IL-2 and IL-4, 11/16 and 7/12 of the values, respectively, were OOR, making Spearman's rho less reliable. Bland-Altman plots revealed no systematic bias, and all the calculated limits of agreement included zero (see Supplementary Figure 4). The 75th percentile of the fold differences was 1.7.
Table 3. Paired t-test, Spearman's rank correlation coefficient, limits of agreement, measurement error, and mean of the first and the second measurement for the same sample run on different dates. The average difference is the concentration of the second run minus the concentration in the first run. The limits of agreement are the 95% prediction interval. The measurement error is expressed as the standard deviation of the paired measurements assuming that the difference is zero. Number of samples: 5-plex (EGFR, Leptin, OPN, VEGFR-1 and VEGFR-2): n = 7; 6-plex (IL-2, IL-13, PDGF-bb, TNF, PAI-1 and SDF-1a): n = 9; 8-plex (IL-4, IL-6, IL-8, Eotaxin, G-CSF, VEGF, GRO-a and HGF): n = 5.
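Why many out-of-range values weaken Spearman's rho can be seen from how the coefficient is computed. The sketch below is a minimal standard-library implementation using average ranks for ties, not the Stata routine used in the study. When many readings are replaced by the same substituted value, they share one rank, so the coefficient rests on only the few remaining distinct values.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    # Raises ZeroDivisionError when every value in x or y is tied,
    # e.g. when all readings were below the LLOQ.
    return cov / sqrt(var_x * var_y)
```

If every reading in one medium is below the LLOQ, all ranks are equal and the coefficient is undefined, which the function signals by raising `ZeroDivisionError`.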

Serum versus EDTA plasma
EDTA plasma from the 18 cohort C controls was compared to the corresponding serum values in the 5-, 6- and 8-plex. The plasma and serum samples were run on separate plates for the 5-, 6- and 8-plex, respectively. For all the analytes but leptin and VEGFR-2, there was a significant difference between the mean serum and EDTA plasma concentration (see Table 4, Figure 3(a-c) for selected analytes and Supplementary Figure 5). For leptin, PDGF-bb, PAI-1, eotaxin, VEGF and HGF the values were lower in EDTA plasma compared to serum, whereas for EGFR, OPN, VEGFR-1, IL-2, IL-13, TNF, SDF-1a, IL-4, IL-6, IL-8, G-CSF and GRO-a, the mean concentration was higher in EDTA plasma compared to serum. In general, the mean serum values were lower than the EDTA plasma values (see Figure 4(a)). For the majority of analytes the Spearman correlation was not strong (except for leptin, OPN, eotaxin (not significant) and GRO-a). When the fold differences between the EDTA plasma samples and the paired serum samples were ranked, the 75th percentile was 2.7.

Serum versus LH plasma
LH plasma from the 27 cohort B patients was compared to the corresponding serum values in the 5-, 6- and 8-plex. The plasma and serum samples were run on separate plates for the 5-, 6- and 8-plex, respectively. For the majority of the analytes (except IL-2, IL-13 and VEGF), there was a significant difference between LH plasma and serum concentrations (see Table 5, Figure 3(d-f) for selected analytes and Supplementary Figure 6). Figure 4(b) shows a colour profile of the relative sample concentration in the two media. For the majority of analytes the Spearman correlation was not strong between the LH plasma samples and the serum samples from the same patient (except EGFR, leptin, IL-8, VEGF and HGF). The 75th percentile of the ranked fold differences was 5.5 between the serum and the LH plasma samples.

LH plasma versus EDTA plasma
EDTA plasma and LH plasma samples from cohort D were taken from the same individual, and the corresponding values were compared in the 5-, 6- and 8-plex. The corresponding samples were run on the same plate on the same day and exposed to the same number of freeze-thaw cycles.
The paired t-test indicated that there was a significant difference between the mean LH plasma and EDTA plasma concentration for all analytes, except IL-2 and IL-4 (see Table 6). The EDTA plasma samples had lower concentrations than the LH samples except for IL-13, PDGF-bb, TNF, SDF-1a and eotaxin (see Figure 4(c)). The Spearman correlation only indicated a strong monotonic relationship for leptin, VEGFR-1, VEGFR-2 and eotaxin (see Figure 3(g-i) for selected analytes and Supplementary Figure 7). When the fold differences between the EDTA plasma samples and the paired LH plasma samples were ranked, the 75th percentile was 3.6.

Discussion
In this study, the usability of archived plasma and serum for multiplex analysis of the concentration of 19 circulating proteins was assessed by evaluating the day-to-day variation when analyzing the same sample, the effect of several freeze-thaw cycles, and the influence of the media and choice of anticoagulant on the measured value.
The correlation between measurements of the same samples with the same number of freeze-thaw cycles but analyzed on different dates was found to be reasonable for the majority of the analytes. The 75th percentile of the fold differences was 2.2. Only a small proportion of this difference could be attributed to technical variation in measuring the analytes. Thus, when the plates were measured twice (technical replicates using the same reaction mixture), the 75th percentile of the fold differences was 1.2. We did not find any indication of higher variation in analytes with low absolute levels of expression, nor did we see any correlation with the relative levels of expression. Also, we did not find any clear correlation between samples at the lower range of the standard curves and the fold differences (data not shown). Further studies using samples where the amount of material is not restricted are needed to clarify the relative importance of sample and assay variation for the overall fold differences.
No significant difference was found between one and four freeze-thaw cycles in LH plasma (except for HGF) and the correlation between samples was high. Although many cytokines remain stable during repeated freeze-thaw cycles [18][32][33][34][35][36], it has previously been shown that the number of freeze-thaw cycles may influence the measured concentrations of some cytokines [12,16,18]. TNF in EDTA plasma has been shown to increase significantly after three freeze-thaw cycles [18], and IL-6 has been shown to decrease in heparin plasma with an increasing number of freeze-thaw cycles [33]. These findings could not be confirmed in this study, as the concentrations of TNF and IL-6 in heparin plasma remained stable after one and four freeze-thaw cycles.
The measured concentrations of cytokines and proteins in the blood are known to vary depending on the type of media and the choice of anticoagulant when preparing plasma [20,34,37,38]. Overall, we found significant differences in mean concentrations of the majority of analytes when analysing samples taken from the same individuals in different media, and with different anticoagulants.
In serum, blood cells might become activated during clot formation and cytokines may be released as a result (e.g. IL-1, IL-6, CXCL8, and VEGF) [39]. We included IL-6 and VEGF and consistent results were obtained. In serum, analytes could also be sequestered into the clot, or thrombin activation during clot formation could result in cleavage of the analyte [40], and for this reason plasma samples might better reflect the in vivo cytokine levels. In concordance with our results, lower concentrations of OPN have previously been found in serum compared to EDTA plasma [40]. We found PAI-1 levels in serum to be significantly higher than in both LH and EDTA plasma. As PAI-1 is involved in fibrinolysis and is released by platelets, this implies that PAI-1 can be more accurately measured in plasma samples. The anticoagulants EDTA and heparin have different mechanisms of action. EDTA exerts its effect by chelating calcium whereas heparin acts by binding to antithrombin III. Heparin has been described to interact with several molecules including IL-2 and IL-6, thereby possibly influencing the levels of the measured analyte [13,41]. EDTA has been proposed to cause varying degrees of in vitro activation of platelets resulting in release of platelet-derived angiogenic cytokines such as VEGF, PDGF-bb and other cytokines [42]. However, significantly lower concentrations of the analytes PDGF-bb, eotaxin, VEGF and PAI-1 in EDTA vs. heparin plasma have been reported by others [38]. In general, we found higher concentrations in LH plasma compared to EDTA plasma.
Table 4. OOR samples, paired t-test, Spearman's rank correlation coefficient, average difference with limits of agreement, and mean measurement for corresponding EDTA plasma and serum samples. The average difference is the concentration in the EDTA sample minus the concentration in the corresponding serum sample. The limits of agreement are the 95% prediction interval. Number of samples: 5-plex (EGFR, Leptin, OPN, VEGFR-1 and VEGFR-2): n = 18; 6-plex (IL-2, IL-13, PDGF-bb, TNF, PAI-1 and SDF-1a): n = 18; 8-plex (IL-4, IL-6, IL-8, Eotaxin, G-CSF, VEGF, GRO-a and HGF): n = 18.
Although different mean concentrations were observed in serum, LH plasma, and EDTA plasma, many of the analytes displayed some level of correlation when comparing values from the same individuals. When analytes did not correlate, it was often caused by a large proportion of values recorded as below the LLOQ in one of the sample types. Different analytes had large proportions of values recorded as OOR in different media, making it difficult to recommend one type of media for retrospective analysis of stored patient material. All IL-2 samples in cohort B were below the LLOQ in the serum samples, whereas none of the IL-2 samples in cohort C were below the LLOQ. The cohort C samples were all close to the lower limit of detection. The two cohorts were run on different plates with different standard curves and thus different LLOQs. Another concern when selecting the type of media for retrospective analysis of stored patient material is the dynamic range of the analytes. In retrospective analysis of clinical trials, values of analytes are often dichotomized as either 'low' or 'high'. A small dynamic range can make it difficult to classify a sample as either 'low' or 'high', as for OPN in Figure 1(b). A large dynamic range, as for PAI-1 in Figure 1(a), will allow more reliable classifications. Different analytes had larger dynamic ranges in different media, making it difficult to recommend one media type over the others.
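The link between dynamic range and dichotomization can be made concrete with a small sketch. Both definitions here are assumptions chosen for illustration, since the text does not specify them: the dynamic range is taken as the ratio of the highest to the lowest observed concentration, and the default cutoff is the cohort median.

```python
from statistics import median


def dynamic_range(values):
    """Ratio of the highest to the lowest observed concentration."""
    # Assumption: range expressed as a fold ratio of positive values.
    return max(values) / min(values)


def dichotomize(values, cutoff=None):
    """Label each value 'low' or 'high' relative to a cutoff."""
    # Assumption: the cohort median is used when no cutoff is supplied.
    if cutoff is None:
        cutoff = median(values)
    return ["high" if v > cutoff else "low" for v in values]
```

When the dynamic range is close to the fold variation seen between repeated runs of the same sample, a value near the cutoff could change label simply by being measured on another date.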
In conclusion, we have demonstrated that stored serum, LH plasma, and EDTA plasma from clinical trials can be used for analysis of circulating markers, even if pre-analytical inter- and intra-individual variation is not recorded and corrected for. We have found that variations in measurements related to the number of freeze-thaw cycles are within reasonable ranges. We have also demonstrated that the optimal type of media depends on the analyte, as different analytes have a low number of measurements below the LLOQ and higher dynamic ranges in different media.
Table 5. OOR samples, paired t-test, Spearman's rank correlation coefficient, average difference with limits of agreement, and mean measurement for corresponding LH plasma and serum samples. The average difference is the concentration in the LH sample minus the concentration in the corresponding serum sample. The limits of agreement are the 95% prediction interval. Number of samples: 5-plex (EGFR, Leptin, OPN, VEGFR-1 and VEGFR-2): n = 27; 6-plex (IL-2, IL-13, PDGF-bb, TNF, PAI-1 and SDF-1a): n = 27; 8-plex (IL-4, IL-6, IL-8, Eotaxin, G-CSF, VEGF, GRO-a and HGF): n = 27. n.a., Not analyzed. OOR, Out of range values below the lower limit of quantification are marked with Ú and above the upper limit of quantification are marked with ^.
Table 6. OOR samples, paired t-test, Spearman's rank correlation coefficient, average difference with limits of agreement, and mean measurement for corresponding LH and EDTA plasma samples. The average difference is the concentration in the LH sample minus the concentration in the corresponding EDTA sample. The limits of agreement are the 95% prediction interval.