A proposed framework to evaluate the quality and reliability of targeted metabolomics assays from the UK Consortium on Metabolic Phenotyping (MAP/UK)

Targeted metabolite assays that measure tens or hundreds of pre-selected metabolites, typically using liquid chromatography–mass spectrometry, are increasingly being developed and applied to metabolic phenotyping studies. These are used both as standalone phenotyping methods and for the validation of putative metabolic biomarkers obtained from untargeted metabolomics studies. However, there are no widely accepted standards in the scientific community for ensuring reliability of the development and validation of targeted metabolite assays (referred to here as ‘targeted metabolomics’). Most current practices attempt to adopt, with modifications, the strict guidance provided by drug regulatory authorities for analytical methods designed largely for measuring drugs and other xenobiotic analytes. Here, the regulatory guidance provided by the European Medicines Agency, US Food and Drug Administration and International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use is summarized. In this Perspective, we have adapted these guidelines and propose a less onerous ‘tiered’ approach to evaluate the reliability of a wide range of metabolomics analyses, addressing the need for community-accepted, harmonized guidelines for tiers other than full validation. This ‘fit-for-purpose’ tiered approach comprises four levels—discovery, screening, qualification and validation—and is discussed in the context of a range of targeted and untargeted metabolomics assays. Issues arising with targeted multiplexed metabolomics assays, and how these might be addressed, are considered. Furthermore, guidance is provided to assist the community with selecting the appropriate degree of reliability for a series of well-defined applications of metabolomics.

Metabolomics, or metabolic phenotyping, is a multidisciplinary field of research that investigates the metabolome. The metabolome refers to the terminal downstream products of the genome, consisting of a repertoire of low-molecular-weight biomolecules (known as metabolites) involved in cellular metabolism and other biochemical processes in cells, tissues and bodily fluids, as well as those of exogenous xenobiotic and microbiome origin 1,2. Metabolomics facilitates the characterization of a system from genomic to metabol(om)ic activity and its interaction with the environment, revealing dynamic insights into the multiple metabolic pathways and networks that result from cellular activity and thereby helping to explain molecular pathophysiology 3. In addition, metabolomics aims to identify biomolecules (metabolite biomarkers) that modulate the phenotype in health and/or disease, reflecting both normal biological processes and dysregulated pathways 4–6.
The analytical approaches applied in metabolomics research are generally categorized as untargeted, targeted or a hybrid approach (sometimes defined as a semi-targeted approach) that combines some aspects of both types of analyses 7. Untargeted metabolomics is a discovery-based approach in which the objective is to analyze as many detectable metabolites as possible without biological bias, including unknowns, to determine which, if any, are significantly perturbed in the diseased phenotype, followed by post-hoc identification of those putative metabolic biomarkers 8. Targeted approaches, on the other hand, involve the (multiplexed) analysis of known metabolites, and such methods often focus on quantification of a subset of metabolites representative of key pathways or of metabolites determined to be important from prior untargeted metabolomics 9. Targeted metabolomics is hypothesis driven, with the significant advantage of quantifying known metabolites with greater sensitivity and selectivity 1, whereas untargeted metabolomics is hypothesis generating, with the advantage of increased metabolite coverage and potential for biomarker discovery 8. The major disadvantage of untargeted approaches is that relative responses, and not actual concentrations, are reported, whereas the major disadvantage of targeted approaches is their limited coverage of the metabolome 10.
The techniques most widely used for untargeted analysis include liquid chromatography–high-resolution mass spectrometry (LC-MS), gas chromatography–mass spectrometry (GC-MS) and 1H NMR spectroscopy, whereas liquid chromatography–triple quadrupole tandem mass spectrometry (LC-MS/MS) remains one of the traditional techniques for targeted analysis of a limited number of analytes, with another approach being GC-MS, which involves fragmentation of the metabolite during electron ionization 11,12. One of the challenges in targeted metabolomics is that obtaining suitable internal standards is often difficult. On the other hand, one of the advantages of targeted biomarker assays is that the biology of the biomarker is often already understood, so the anticipated levels, turnover rate and intra- and inter-subject variability are known, enabling the analyst to develop the right assays with an appropriate level of validation to generate quality data. However, for newly discovered biomarkers about which little is known, assay development should start with a focus on parallelism, selectivity and sensitivity. Then, at a later stage, the assay can be fine-tuned to the required acceptance criteria 13.
Advances in metabolomics have led to new clinical and toxicological diagnostic biomarkers 14–16, which can contribute to stratified medicine and safety assessment of drugs 17,18. Metabolomics is also central to the screening of inborn errors of metabolism 19. However, there are several challenges in the translation of metabolomics research to clinical and toxicological applications under regulatory control. Issues include analytical reproducibility, accuracy, precision, metabolite identification/quantification, study design, sample handling, lack of harmonized reporting frameworks for published data and metadata, insufficient open-access data to enable data mining by other researchers 20, lack of harmonization in biobanking, batch-to-batch variation, and between-methods bias 21. Assessing the reliability of bioanalytical methods for metabolomics is challenging when compared to the validation of other types of bioanalytical methods. Data from the metabolomics field are variable, and heterogeneity among data formats, data analysis pipelines, algorithms and applied statistical methods needs to be addressed. There is a need to define the extent to which assessing the reliability of these methods is required, and the scope of such assessments, as well as how the standards applied and methods for reporting should be chosen to ensure appropriate data quality for use in regulatory processes 22. To eliminate some of these problems, communication between the research and regulated clinical and toxicological communities needs to be more fully developed, and the establishment of a system to assess and cross-correlate metabolic profiles obtained by different laboratories and instruments is needed 20. The new Omics Reporting Framework for regulatory toxicology, developed by multiple stakeholders from research laboratories, industry and government regulatory agencies and coordinated by the Organisation for Economic Co-operation and Development, provides evidence of how progress can be made toward harmonized reporting of methods, data, metadata and findings and thereby advance the application of metabolomics within regulatory settings 23. There is a plethora of publications that provide comprehensive guidelines for assessing the quality of untargeted metabolomics assays 24–28. Although these guidelines provide the foundation for metabolomics system suitability and quality assurance (QA)/quality control (QC) proficiency, a community-initiated approach toward harmonized guidelines that ultimately achieves acceptance via their consensus use for evaluating the reliability of targeted metabolomics within research, clinical and toxicological settings is still required.
Our scientific collaboration, the UK Consortium on Metabolic Phenotyping (MAP/UK; https://mapuk.org), is a partnership of eight specialized research laboratories and two Phenome Centres that has been funded by the Medical Research Council to improve UK-wide metabolic phenotyping expertise and capabilities. The MAP/UK partnership brings together a critical mass of methodological, analytical and computational platforms to develop, optimize, transfer, harmonize and validate efficient, high-quality metabolomics research and training methods, specifically tailored to the growing need for biomedical studies that require robust metabolic phenotyping. The overall aim of the MAP/UK partnership is to investigate new biomarkers within metabolic signatures of disease and novel targeted quantitative metabolomic and hybrid approaches, and to develop untargeted metabolomics to meet gaps in molecular coverage of key disease-related pathways, alongside various other factors, including diet, lifestyle/environment, microbiome and genetics. As a collective of scientists aiming to harmonize metabolic phenotyping, we have reviewed existing regulatory guidelines to extract commonalities that can be adapted into 'fit-for-purpose', tiered approaches for untargeted and targeted metabolomics.
The aim of this manuscript is to propose harmonized guidelines for evaluating the reliability of targeted (multiplexed) mass spectrometry-based metabolomics assays, taking into consideration intra-laboratory precision, accuracy, reproducibility and cross-laboratory harmonization of methods and data acquired on different instrument platforms. First, existing guidelines for bioanalytical method validation, including an existing four-tiered framework applied in drug discovery, are reviewed. Then, after introducing the applications of clinical and toxicological metabolomics in regulatory settings, a new 'fit-for-purpose' four-tiered (discovery, screening, qualification and validation) framework for assessing analytical reliability that is suitable for targeted, hybrid and untargeted metabolomics assays is proposed. A checklist for the bioanalytical process has been provided to facilitate better understanding and emphasize the importance of harmonization at each step, as described in Box 1.

The concept of regulatory bioanalytical validation
An analytical assay starts with a definition of its purpose (i.e., intended application), defining what is 'fit for purpose', followed by method development and optimization, then subsequently by assay validation (dependent upon the tier, as introduced above) and documentation before it can finally be applied for the intended purpose. Validation is defined as a process that provides proof of assay integrity within given specifications, with the parameters of an assay used for quantification being statistically reliable between assays over time. Before initiating a validation study, a well-planned validation protocol should be written and reviewed for scientific soundness and completeness. The protocol should describe the procedure in detail, include predefined acceptance criteria and statistical methods, and be approved by all participants in the analytical pipeline.
There are numerous validation parameters (accuracy, precision, calibration curve, lower limit of quantification, selectivity/specificity, carryover, analyte stability, recovery, dilution integrity, system suitability test, matrix effect/factor, parallelism, incurred sample re-analysis, quality control, robustness/ruggedness, hook/prozone effect and minimum required dilution) to incorporate into the validation process (see Supplementary Table 1 for a comparison of validation parameters across multiple guidelines and Supplementary Table 2 for definitions of validation parameters). The validation workflow is summarized in visual format (Fig. 1). This workflow is a modification of a general validation workflow, combined with two extra steps based on our proposed framework, to advise analysts in the choice of the appropriate tier for the assay and the depth of the required validation (see Table 1 for the appropriate tier selection and Table 2 for the degree of validation). One should justify the required level of validation to be fit for purpose on the basis of the differing applications of a particular method. Theoretically, there are no limits to the extent of validation and verification procedures. However, in practice, there are both time and economic constraints on what can be achieved. Therefore, it is crucial to have optimized guidelines that are generally accepted, harmonized and cost effective 29. Multiple guidelines exist that describe the regulation of bioanalytical assays, such as those from the U.S. Food and Drug Administration (FDA) 30, the European Medicines Agency (EMA) 31, the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) 32, the Japanese Ministry of Health, Labour and Welfare (MHLW) 33, the Chinese (State) Food and Drug Administration (currently the National Medical Products Administration) 34, the Australian Therapeutic Goods Administration 35 and the Brazilian National Health Surveillance Agency (Anvisa) 36–38. The two most widely used bioanalytical guidelines, from the EMA and FDA, are similar but not identical. The scientific basis for the evaluation of parameters is the same across both guidelines. However, there are also differences in terminology, recommended validation parameters, acceptance criteria and methodology, which can cause confusion among bioanalysts and/or pharmaceutical companies. Standard setting and harmonization were advanced by the ICH, an international organization whose mission is to achieve greater harmonization worldwide to ensure that safe, effective and high-quality medicines are developed and registered in the most resource-efficient manner. In 2019, the ICH consolidated best practices from the FDA and EMA guidelines into a harmonized M10 bioanalytical method validation draft to clarify areas of uncertainty between the two guidelines. A comparison between the FDA and EMA guidelines and the consolidated ICH M10 draft guideline is summarized in Supplementary Table 1.
Although these regulatory guidelines are comprehensive, they were largely developed for the measurement of drugs and other xenobiotic analytes. Metabolomics often measures endogenous biomarkers, which require different considerations, including the matrix effect. The matrix effect refers to a phenomenon usually encountered in LC-MS/MS in which the ionization efficiency of target analytes is altered in the presence of co-eluting compounds in the same matrix; it can cause either ion suppression or ion enhancement. The quantitative measure of the matrix effect is termed the matrix factor and should be determined between the lower limit of quantification (LLOQ) and the upper limit of quantification (ULOQ) of a matrix-matched calibration curve.
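As a minimal illustration of how the matrix factor is computed in practice, the sketch below takes the ratio of an analyte's response in post-extraction spiked matrix to its response in neat solvent, with an optional internal-standard normalization; the function names and example values are illustrative, not taken from any guideline.

```python
def matrix_factor(area_in_matrix: float, area_in_solvent: float) -> float:
    """Matrix factor (MF): analyte peak area in post-extraction spiked
    matrix divided by the peak area in neat solvent at the same nominal
    concentration. MF < 1 suggests ion suppression, MF > 1 enhancement."""
    return area_in_matrix / area_in_solvent

def is_normalized_mf(analyte_mf: float, internal_standard_mf: float) -> float:
    """Internal-standard-normalized MF, which corrects for matrix effects
    shared by the analyte and its (ideally isotopically labeled)
    internal standard."""
    return analyte_mf / internal_standard_mf

# Illustrative use across several matrix lots (all values made up):
lots = [(9800.0, 11200.0), (10100.0, 11050.0), (9400.0, 11300.0)]
mfs = [matrix_factor(m, s) for m, s in lots]
print(mfs)  # e.g., [0.875, 0.914, 0.832] -> modest ion suppression
```

Regulatory guidance typically asks for this ratio to be assessed across several matrix lots; the variability of the internal-standard-normalized matrix factor then indicates how well the assay controls matrix effects.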
The LLOQ represents the sensitivity of the assay and is the lowest concentration of analyte in a sample that can be quantified reliably with acceptable accuracy and precision. The LLOQ should be the lowest point on the calibration curve, at which the signal-to-noise ratio (S:N, defined as signal height divided by noise height) should be ≥5:1. Evaluating these limits by using standard solutions in neat solvent and/or matrix deprived of specific classes of metabolites (such as stripped plasma) is not an ideal solution, because what has been depleted is not defined. Furthermore, measurement of specificity/selectivity for endogenous metabolites is much more challenging because of the presence of multiple isoforms.
Recently, regulatory bodies have begun to address the requirements needed to achieve robust and reliable data in biomarker assays applying 'omics' data. To our knowledge, the Omics subgroup report 22 and the C-Path report 39 are the only documents published by regulatory agencies on the assessment of biomarker assays. The Omics subgroup report 22, on behalf of the EMA and Heads of Medicines Agencies, published in 2017 a checklist introducing considerations for successful qualification of novel methodologies such as biomarker quantification, clinical outcome assessment, imaging methods and big data approaches. This checklist contains brief recommendations on context of use, selection of endpoints, the statistical analysis plan, demonstration of clinical utility, the standard of truth/surrogate standard of truth and suitability of the analytical platform, as well as a link to the ICH E16 and ICH E18 guidelines, which focus on pharmacogenomics biomarkers and on the sampling and management of genomic data (EMA/750178/2017 document). Furthermore, the FDA, in conjunction with the Critical Path Institute (C-Path), published a document providing broad scientific insight into biomarker assay challenges and a complete description of approaches that can be applied to biomarker qualification 39. Tiered approaches have been introduced to the bioanalytical industry by regulatory bodies such as the FDA and MHLW, as discussed in the following section. These tiers are applicable to targeted quantification of drugs and often of single metabolites. Our tiered approach is designed for multiplexed metabolomics assays, with the first two tiers designed for targeted metabolomics. Before introducing our proposed framework to assist bioanalysts in selecting the appropriate tier of validation for a series of well-defined applications of metabolomics, a brief introduction to the existing tiered regulatory guidance for the targeted measurement of single drugs is presented.
Existing tiered regulatory guidance for bioanalysis
A fundamental question is how stringently regulatory bodies view these guidelines as hard rules, or whether they could be adopted as fit for purpose for targeted metabolomics assays and used within a 'tiered' framework. The concept of defensible scientific flexibility has been debated within the bioanalytical community and the pharmaceutical industry. The Crystal City III workshop proposed the concept of 'fit for purpose' in 2006 as an alternative to the full validation workflow already described in FDA regulatory documents, addressing uncertainty in the bioanalytical community about the level of data scrutiny required to generate quality data while optimizing resources to meet study objectives with an adequate level of data quality and reliability 40. Furthermore, the European Bioanalysis Forum proposed consolidation of tiered approaches to include three levels (or tiers) of quality standards for metabolite quantification, for screening, qualified and validated assays 41. Consequently, the MHLW and FDA allowed adjustments and modifications of their bioanalytical method validation guidelines to fit the intended use of the assay, and this perspective was extended to tiered approaches for metabolite quantification 42–44.
The Crystal City VI workshop in 2015 (ref. 45) defined a less rigorous level of validation than the FDA guidelines for drug metabolite quantification at early stages of development. The Global Bioanalytical Consortium assigned Team A2 the objective of providing a framework to rationalize the level of validation of bioanalytical methods for drug characterization, and it proposed a clear path for the implementation and use of tiered approaches 42. Furthermore, two globally recognized teams within the Global Bioanalytical Consortium (S1 and L1) provided acceptance standards for validation methods for small and large pharmaceutical molecules, respectively 46. However, different terminologies have been used as part of the 'fit-for-purpose' concept, such as tiered assays, scientific validation, qualified assays and partial validation; the lack of clear guidance has therefore been a source of confusion for academia and the biotechnology/pharmaceutical industry 42. More recently, these alternative validation workflows in the bioanalytical industry have been categorized into four tiered levels of method performance and evaluation on the basis of the final purpose of the derived analytical data, ranging from the most to the least stringent: level 1, validation, intended for regulatory studies; level 2, qualification; level 3, research; and level 4, the least stringent, defined as 'screening' 42,47,48. These four tiered levels are described in more detail below, and although these concepts were designed for drug development and submission to regulatory authorities, they provide a framework that could be adapted for a range of assays used in metabolomics studies.
• Level 1. Validated bioanalytical assays are designed for intended pharmaceutical products and thus require the highest level of confidence in analytical results, suitable for regulated good laboratory practice, preclinical/clinical, pharmacokinetic and/or toxicological studies and for the identification of active metabolites in safety testing. These mandate that assay precision, accuracy, selectivity, sensitivity and stability of the analytes be determined throughout the bioanalytical measurement process. FDA-recommended evaluations should be performed 41.
• Level 2. Qualified bioanalytical assays do not need to demonstrate that the measurement methods are as robust as validated assays. This tier is suitable for non-regulated studies in the drug development process, with additional assessment of tissue concentrations or other matrices during preclinical or late discovery phases and in decision-making for context-of-use statements. Single method performance with a statistically appropriate number of QC samples (n ≥ 5) at each level/concentration and a suitable calibration range, precision and accuracy should be determined.
• Level 3. Research-grade bioanalytical assays are suitable for mid- to late-discovery phases of drug development projects for decision-making evaluations and/or verification of additional biomarkers or metabolites in studies not regulated under good laboratory practice. They use limited characterization with calibration standards prepared by using a comparator reference material such as an in situ (in solution) standard, with the concentration estimated by radioactivity measurement, NMR spectroscopy or UV absorption as representative methods. The method provides semi-quantitative analyte concentrations within wider accuracy and precision limits than for the two higher tiers 42. This approach enables the partial characterization of an analytical method that may eventually move to a qualified or validated assay. It should provide sufficient scientific rigor to ensure that it is fit for purpose and that there is confidence in the data. Method evaluation should be conducted before sample analysis, with precision and accuracy meeting the more relaxed criteria of 20% relative standard deviation (RSD) and 30% relative error at the LLOQ.
• Level 4. Screening bioanalytical assays apply a generic method (not specific to the analyte) to provide adequate results for the analyte of interest and are suitable for early discovery and qualitative (present/absent) analysis. Screening assays undergo limited characterization based on relative instrument analyte response when reference material is not available. The assay provides relative analyte measurements (i.e., response and not concentration) only but may still be suitable for decision-making processes. An abbreviated set of QC samples with large margins of variability of 30% RSD and 40% relative error is advisable. As such, screening bioanalytical assays are most similar to untargeted metabolomics assays.
Apart from the four-tiered approach in the bioanalytical industry, there is a general concept of 'full' and 'partial' validation. Full validation is necessary when developing and implementing a bioanalytical method for the first time, such as when analytes are added to a panel for bioanalytical quantification. In targeted metabolomics, full validation of a method by an accredited clinical laboratory is required when the result of that assay (e.g., the concentration of a biomarker in terms of molarity for liquids or micrograms per milligram for tissue) is used to make a clinical decision. Partial validation is required in the case of bioanalytical method transfers between laboratories or when method parameters change, such as the instrument and/or software platform, species (e.g., human plasma to murine plasma) or matrix (e.g., human plasma to human serum/urine). Partial validation can range from as little as one intra-assay accuracy and precision determination to nearly full validation 49, depending on the degree of change being undertaken.
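To make the quantitative differences between these tiers concrete, the sketch below encodes per-tier precision (%RSD) and accuracy (%RE) limits as data and checks a set of QC replicates against them. Only the research (20%/30%) and screening (30%/40%) limits come from the text above; the validated and qualified entries are assumptions loosely based on the FDA small-molecule limits the text cites, and all names and values are illustrative.

```python
import statistics

# Illustrative acceptance limits (%RSD for precision, %RE for accuracy).
# Research and screening values are quoted in the text; the validated and
# qualified limits are assumptions based on the FDA guidance it references.
TIER_LIMITS = {
    "validated": {"rsd": 15.0, "re": 15.0},
    "qualified": {"rsd": 20.0, "re": 25.0},
    "research":  {"rsd": 20.0, "re": 30.0},
    "screening": {"rsd": 30.0, "re": 40.0},
}

def qc_passes(measured: list[float], nominal: float, tier: str) -> bool:
    """Check QC replicates against a tier's precision and accuracy limits."""
    limits = TIER_LIMITS[tier]
    mean = statistics.mean(measured)
    rsd = 100.0 * statistics.stdev(measured) / mean   # precision (%RSD)
    re = 100.0 * abs(mean - nominal) / nominal        # accuracy/bias (%RE)
    return rsd <= limits["rsd"] and re <= limits["re"]

print(qc_passes([10.2, 9.1, 11.0, 9.8, 10.5], nominal=10.0, tier="research"))  # True
```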
The sections above have introduced concepts and terminologies within bioanalytical validation as well as highlighted the need for the standardization of guidelines for the validation of endogenous metabolite analysis with the aim of maximizing the cross-comparability of generated data. In the next section, a flexible and practical framework to assist bioanalysts in selecting the appropriate tier of reliability for multiplexed metabolic biomarker assays, each with a defined use, is proposed.
Proposed tiered framework for assessing the reliability of metabolomics bioanalytical methods
Considering that there is a range of applications for metabolomics and new advances in LC-MS techniques for multiplexed measurement of metabolites, there is a clear need for a harmonized framework describing which reliability tier is most fit for purpose for different applications. Evaluation of fitness for purpose involves questions such as: (i) what is the context of use for the assay (i.e., what will the data be used for); (ii) should it be a quantitative, semi-quantitative or relatively quantitative assessment; and (iii) what level of uncertainty can be tolerated in the assessment? Consolidating the concept of 'fit for purpose' assists bioanalysts in deciding whether to qualify or validate a biomarker assay and which parameters to choose, in addition to the number of appropriate replicates 50. The end result of a fit-for-purpose validation of an assay using relative quantification is a resource-effective and efficient demonstration of the bioanalytical method's performance, tailored to meet the objective of the application. This ultimately provides reliable study data on which to base important decisions, which may involve further assay development and progression to a fully validated method.
The intended use (or application) of a metabolomics assay determines which level of reliability assessment should be used, not the type of assay. Selecting the most appropriate tier for measuring multiple metabolic biomarkers simultaneously for targeted metabolomics assays is challenging if the intended data use is not carefully defined. Hence, the first step in selecting an appropriate tier is to define the intended use of the data and which type of assay is needed; then, the most appropriate reliability tier can be further defined.
The following framework is proposed as a guideline for the metabolomics community to assess the reliability of both targeted and untargeted metabolomics assays for different types of applications (i.e., from biomarker discovery in a research laboratory, through transfer of a method to a different laboratory, to the use of biomarkers within a clinical setting). The proposed framework is summarized in Table 1 (Tiers 1-4) to assist bioanalysts in selecting the most appropriate tier on the basis of their purpose and assay type. Tiers 1 and 2 (targeted metabolomics) are the main focus of this manuscript, and all related parameters for safeguarding scientific rigor for robust validation and bioanalytical quantification for these two tiers (termed 'validation' and 'qualification') are summarized in Table 2. These tiers differ in depth, robustness of parameters and the number of replicates performed for each parameter (Table 2).
Tier 1: validated method
Tier 1 validation involves diagnosis of a disease/toxicity phenotype by using traditional targeted metabolite analysis with absolute quantification of typically one to a few (≤10) metabolites. Tier 1 validation is required for compliance with regulatory agencies for clinical diagnostics. This requires an authentic standard (external standard) for each metabolite. The proposed procedure is in alignment with current FDA and ICH M10 bioanalytical method validation guidelines and is applicable to quantitative analytical assays such as chromatographic, LC-MS and/or LC-MS/MS and ligand binding assays (Table 2).
Tier 2: qualified method
Tier 2 qualification involves diagnosis of a disease/toxicity phenotype by using a multiplexed targeted metabolomics assay with absolute quantification of >10 metabolites. This requires an authentic external standard for each metabolite. The criteria for qualifying a method are less strict than for tier 1 validation of a method (Table 2).
Tier 3: screening method
Tier 3 screening involves screening for a disease/toxicity phenotype by using a multiplexed targeted or hybrid metabolomics assay with relative or semi-quantification of a panel of hundreds of metabolites. This does not require an authentic external standard for each metabolite. The criteria to meet in a screening method are less strict than for tier 2 qualification of a method.
Tier 4: discovery method
Tier 4 discovery involves discovery of putative metabolic biomarkers by using untargeted or hybrid metabolomics with relative quantification in a research laboratory. Untargeted methods have the least strict criteria. The use of system suitability QC samples, intra-study QC samples, inter-laboratory QC samples and dilution series of pooled QC samples in tier 4 metabolomics has been discussed previously 7,51.
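The tier descriptions above can be summarized, very roughly, as a lookup from intended use and quantification type to tier. The sketch below is an illustrative encoding of the four tiers as just described, not a reproduction of Table 1; the function and its simplified 'purpose' argument are assumptions made for the example.

```python
from enum import Enum

class Tier(Enum):
    VALIDATED = 1  # Tier 1: absolute quantification of one to a few (<=10) metabolites
    QUALIFIED = 2  # Tier 2: multiplexed absolute quantification of >10 metabolites
    SCREENING = 3  # Tier 3: relative/semi-quantification of panels of hundreds
    DISCOVERY = 4  # Tier 4: untargeted/hybrid, relative quantification, research use

def suggest_tier(purpose: str, absolute_quant: bool, n_metabolites: int) -> Tier:
    """Rough tier suggestion following the descriptions above; 'purpose' is
    one of 'diagnosis', 'screening' or 'discovery' (a simplification).
    Real tier selection should follow Table 1 and the context of use."""
    if purpose == "diagnosis" and absolute_quant:
        return Tier.VALIDATED if n_metabolites <= 10 else Tier.QUALIFIED
    if purpose == "screening":
        return Tier.SCREENING
    return Tier.DISCOVERY

print(suggest_tier("diagnosis", True, 40))  # Tier.QUALIFIED
```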

Targeted metabolomic assays and multiplexing
Targeted metabolomic studies often require the quantification (e.g., absolute, semi- and/or relative) of multiple analytes (i.e., multiplexing) to exploit putative biomarkers identified by untargeted metabolomics methods and to validate derived hypotheses. The boundary between targeted and untargeted metabolomics is narrow, and the two approaches often overlap. For example, in assays for the quantification of hundreds of polar or lipophilic metabolites, authentic external standards and internal standards may not be available for all analytes. Many of these assays also satisfy the criteria for the accuracy and precision of metabolite measurements as defined by the FDA. However, their results should be reported as semi-quantitative rather than absolute concentrations, mainly because of the lack of standard and/or internal standard availability. LC-MS multiplexing allows for the measurement of numerous analytes in the same analytical run, thus providing significantly more information about molecular biomarker signatures than measurements of single analytes, although as the number of analytes increases, favorable accuracy and precision values are often more difficult to obtain. As noted in regulatory guidelines, all quantified analytes in the same assay need to meet the same acceptance criteria: if one of the analytes fails to meet acceptance criteria, the whole analytical run fails. However, in multiplexed assays, re-analysis of the whole panel of analytes should not be necessary if most of the analytes are within the predefined quality specifications.
In addition, acceptance criteria could be widened depending on the number of metabolites and the purpose of the assay, as detailed in Tables 1 and 2 (ref. 52). One consideration might be increasing the permitted variation at the LLOQ from 20% to 30-40%. One should bear in mind that increasing the number of replicates at the LLOQ will result in lower variation (%RSD) and that the degree of analytical variability that can be tolerated depends on the biological variation. Higher variation is often expected for large biomolecules than for metabolites; for incurred sample reanalysis, the FDA recommends that macromolecule results be within 30% of the average of the original and reanalyzed values, compared to 20% for small molecules 53. In the proposed framework, the acceptance criteria for Tier 2 are more relaxed because the size and number of replicate sets are reduced. However, we recommend increasing the number of calibration points for Tier 2 as the number of metabolites increases. Furthermore, for multiplexed assays, biomarkers should be evaluated simultaneously by both absolute and semi-/relative quantification 52. For instance, identification of the presence of a particular compound (i.e., qualitative evaluation) alongside quantification of related metabolites or a precursor could provide better insight into metabolic phenotyping.
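A minimal sketch of this per-analyte view of run acceptance follows. The ±15% tolerance and the two-thirds pass fraction echo common QC acceptance conventions but are illustrative assumptions here, as are the function names and example values; the point is that each analyte is judged independently, so that one failing analyte does not force re-analysis of the whole panel.

```python
def analyte_run_accepted(qc_results, nominal, tolerance_pct=15.0,
                         min_pass_fraction=2 / 3):
    """Per-analyte run acceptance: the run passes for this analyte if at
    least `min_pass_fraction` of its QC measurements fall within
    ±tolerance_pct of nominal (an illustrative version of the common
    two-thirds rule)."""
    within = [abs(x - nominal) / nominal * 100.0 <= tolerance_pct
              for x in qc_results]
    return sum(within) / len(within) >= min_pass_fraction

def panel_report(panel_qc: dict, nominals: dict) -> dict:
    """Evaluate each analyte independently so that a single failing
    analyte flags only itself for re-analysis, not the whole run."""
    return {analyte: analyte_run_accepted(results, nominals[analyte])
            for analyte, results in panel_qc.items()}

report = panel_report(
    {"alanine": [101.0, 97.5, 118.0], "lactate": [61.0, 44.0, 39.5]},
    {"alanine": 100.0, "lactate": 50.0},
)
print(report)  # {'alanine': True, 'lactate': False} -> re-check lactate only
```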
Bioanalytical considerations for generation of quality data in targeted, hybrid and untargeted metabolomics assays
The importance of good laboratory practice at different stages (e.g., sample collection and storage integrity) should be considered for bioanalysis. Sample, analyte and data integrity, as well as basic laboratory record keeping, are essential. Implementing a laboratory information management system is recommended. Routine calibration of laboratory instruments, pipettes and balances, with well-written standard operating procedures, and selection of suitable blank matrices, internal standards, system suitability tests and intra-study QC samples are essential. Intra-study QC samples should be placed throughout the analytical run in such a way that the precision of the whole run is ensured, taking into account that study samples should always be bracketed by QC samples 7. Phenotyping QC samples (e.g., healthy versus diseased) are recommended. A QC sample is typically produced by pooling a small aliquot of all study samples, and these are analyzed throughout the analytical run. For untargeted metabolomics, a dilution series of the intra-study QC sample is highly recommended to help differentiate features of biological origin from the LC-MS chemical background 12. Application of isotopically labeled standards can provide a generalized measure of precision across the study. Furthermore, the use of isotopically labeled internal standards helps to compensate for matrix-induced ionization effects, thereby enhancing the accuracy of the assay when quantification/semi-quantification is applied 26. Suitable surrogate matrices are recommended to improve the sensitivity and selectivity of biomarker quantification 54–57. Blank matrices with the minimum level of endogenous analyte should be used wherever possible. This approach is suitable for multianalyte assays (spiked with an appropriate concentration of each analyte), but matrix effects and stability should be investigated for each analyte. In the absence of blank or surrogate matrices, standard addition approaches, which take into account the native concentration of the targeted analyte(s), can be used for recovery and matrix effect checks; for accuracy and repeatability/reproducibility tests, the use of QC samples or standards prepared only in solvent and/or buffer represents the approach that makes the fewest assumptions. Artificial blank matrices may also be used: a solution of 4% (w/v) fatty acid-free bovine serum albumin (BSA) in saline buffer, matching the concentrations of salts and electrolytes in human plasma, is an example of an artificial surrogate (blank) matrix for human plasma. Normalization strategies to correct for differences in sample amount should be considered; for example, urinary creatinine is often used to adjust the concentration of urinary biomarkers.
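As a concrete illustration of QC bracketing, the sketch below assembles an injection sequence in which randomized study samples are always flanked by pooled-QC injections, preceded by column-equilibration QCs and a pooled-QC dilution series. The bracketing interval of five samples, the number of equilibration injections and the dilution factors are arbitrary choices for the example, not recommendations from this Perspective.

```python
import random

def build_run_order(samples, qc_every=5, n_equilibration_qcs=3,
                    dilution_factors=(1, 2, 4, 8)):
    """Injection sequence with pooled-QC bracketing: equilibration QCs,
    a pooled-QC dilution series, then randomized study samples with a
    pooled QC after every `qc_every` injections and one at the end."""
    order = [f"QC_equil_{i + 1}" for i in range(n_equilibration_qcs)]
    order += [f"QC_dil_1to{d}" for d in dilution_factors]
    shuffled = samples[:]
    random.shuffle(shuffled)              # guard against run-order bias
    for i, s in enumerate(shuffled, start=1):
        order.append(s)
        if i % qc_every == 0:
            order.append("QC_pool")
    if not order[-1].startswith("QC"):
        order.append("QC_pool")           # always end bracketed by a QC
    return order

print(build_run_order([f"S{i:02d}" for i in range(1, 13)]))
```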
All targeted assays should have a clearly defined limit of detection and limit of quantification. A clearly discernible peak must be visible above clearly visible baseline noise and should comprise a specified number of data points (often six or more is used). As a general rule, a limit of quantification at an S:N ratio (as defined by signal height divided by noise height) of ≥5:1 is used by research laboratories, with a limit of detection of ~3:1. This approach is fully in line with guidelines from international bodies 30,58–66. For targeted assays, all peaks should be checked to ensure that they reach the specified S:N ratio as well as the required number of data points. However, for large-scale metabolomics, manual checking of all peaks is not feasible; if certain metabolites or features are judged to be discriminatory (e.g., predictive of sample type), those should be prioritized for manual post-processing checks to ensure that the differences are real and that the data are of good quality.
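A minimal check of the S:N and data-point rules described here might look like the following sketch, where noise height is estimated from the peak-to-peak amplitude of a signal-free baseline window (one of several common conventions; the text defines S:N simply as signal height divided by noise height). All names and values are illustrative.

```python
def peak_passes_qc(peak_heights, baseline_heights, min_sn=5.0, min_points=6):
    """Check a detected chromatographic peak against the S:N (>=5:1 for
    quantification) and data-points-per-peak (>=6 here) rules.
    Noise height is taken as half the peak-to-peak amplitude of a
    signal-free baseline window (a common, but not unique, convention)."""
    signal = max(peak_heights)
    noise = (max(baseline_heights) - min(baseline_heights)) / 2.0
    sn = signal / noise if noise > 0 else float("inf")
    return sn >= min_sn and len(peak_heights) >= min_points

# Illustrative values: 8 data points across the peak, flat-ish baseline.
peak = [120.0, 480.0, 950.0, 1400.0, 1250.0, 700.0, 300.0, 110.0]
baseline = [95.0, 110.0, 88.0, 102.0, 97.0, 105.0]
print(peak_passes_qc(peak, baseline))  # True: S:N ~ 127, 8 data points
```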

Discussion
Validation is defined as the process of proving that any procedure, process, equipment, material, activity or system performs as expected within defined acceptance criteria under a given set of conditions and that the performance characteristics of the procedure meet the requirements for the intended analytical applications 67,68. Although implementing pass/fail criteria advised by bioanalytical method validation guidelines has provided a useful degree of standardization and consistency between regulated laboratories, new advances in technology, multiplexing and metabolomics studies require tiered and/or fit-for-purpose approaches 69 for pragmatic use.
Predetermined or fixed acceptance criteria are established and appropriate for validated assays (Tier 1); however, for qualified, screening and discovery methods (Tiers 2-4), it may be appropriate to define these after the method performance experiments have been conducted, to fine-tune the assay to the required acceptance criteria. At a minimum, it is expected that a priori acceptance criteria can be relaxed for the less stringent tiers if the resulting method performance still supports the intended use of the data and ultimately supports the necessary decisions that will be made 42.
Validation beyond the intended use of the data means significant re-work, loss of time and increased cost in the blind pursuit of absolute requirements. For metabolomics at its current state of development, what is required is a simple, pragmatic and easy-to-follow framework that reflects realistic and practical needs and allows for the most efficient practices. For instance, an assay may be devised that does not pass the criteria for full validation but nevertheless fulfils the essential requirements for linearity, accuracy, precision, LLOQ and carryover. In that case, guidance should focus on minimum requirements. Specifications of merit might include linearity with the LLOQ set as the first calibrant, accuracy, precision and carryover.
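A minimum-requirements check of this kind can be reduced to a short routine: fit the calibration line with the LLOQ as the first calibrant and verify linearity before moving on to accuracy, precision and carryover checks. In the sketch below, the least-squares fit is standard, but the R² threshold of 0.99 and all names and values are illustrative assumptions rather than figures from this Perspective.

```python
import numpy as np

def calibration_linearity(concs, responses, min_r2=0.99):
    """Fit a least-squares calibration line (LLOQ as first calibrant)
    and report slope, intercept and R^2 against an illustrative
    linearity threshold."""
    x, y = np.asarray(concs, float), np.asarray(responses, float)
    slope, intercept = np.polyfit(x, y, 1)
    pred = slope * x + intercept
    ss_res = float(np.sum((y - pred) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return {"slope": slope, "intercept": intercept, "r2": r2,
            "linear": r2 >= min_r2}

# Calibrants from the LLOQ upward (illustrative concentrations/responses):
print(calibration_linearity([1, 2, 5, 10, 20, 50],
                            [980, 2100, 5050, 9900, 20400, 49800]))
```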
Overall, the guidelines for assays developed for drugs that have been devised by regulatory authorities to ensure safety and efficacy in humans represent a 'gold standard' that may not be required for many types of targeted and untargeted metabolomics applications. This is not to suggest that metabolic phenotyping methods should not be developed to the standards necessary to provide reliable and scientifically valid data, but rather that the use of tiered approaches linked to the type of investigation (i.e., discovery, hypothesis validation, biomarker/panel and/or qualification stages) should drive the level of validation performed. A number of analytical factors (e.g., pre-analytical factors) that define core assay expectations and set acceptable assay performance criteria should be taken into account when assessing the reliability and quality of metabolomics assays. Our MAP/UK consensus framework provides a bench guide for the two major categories, validation and qualification, of targeted metabolomics analysis, as described in Table 2.

Conclusions
Metabolomics has the potential to lead advances in the discovery of clinically and toxicologically relevant biomarkers; yet the lack of harmonization at different levels throughout the whole metabolomics pipeline, from study design, sample handling, biobanking and metabolite quantification through to data analysis, remains an issue that needs to be addressed. Metrological traceability and the future development of certified matrix reference materials, similar to the National Institute of Standards and Technology standard reference material for human plasma (NIST SRM 1950) 70, and of standard calibration mixtures should be established and harmonized within both the research and regulatory communities. The MAP/UK consortium proposes the pragmatic development of a fit-for-purpose four-tiered framework for assessing the reliability of metabolomics assays via a decision-making process and the adaptation of existing drug regulatory guidance. The required level of analytical rigor and/or qualification that bioanalytical methods need to demonstrate to achieve scientifically valid metabolomics studies has been considered. This framework is intended to guide bioanalysts and to facilitate improved communication between the research and regulatory communities, enabling the establishment of appropriately qualified targeted metabolomics assays to meet the needs of the many applications of this technology in the regulatory sciences. Ultimately, we hope that such a community-initiated framework can accelerate the use of metabolomics in regulatory applications and achieve acceptance via its consensus use.