Adaptation of the ToxRTool to Assess the Reliability of Toxicology Studies Conducted with Genetically Modified Crops and Implications for Future Safety Testing

To determine the reliability of food safety studies carried out in rodents with genetically modified (GM) crops, a Food Safety Study Reliability Tool (FSSRTool) was adapted from the European Centre for the Validation of Alternative Methods’ (ECVAM) ToxRTool. Reliability was defined as the inherent quality of the study with regard to use of standardized testing methodology, full documentation of experimental procedures and results, and the plausibility of the findings. Codex guidelines for GM crop safety evaluations indicate toxicology studies are not needed when comparability of the GM crop to its conventional counterpart has been demonstrated. This guidance notwithstanding, animal feeding studies have routinely been conducted with GM crops, but their conclusions on safety are not always consistent. To accurately evaluate potential risks from GM crops, risk assessors need clearly interpretable results from reliable studies. The development of the FSSRTool, which provides the user with a means of assessing the reliability of a toxicology study to inform risk assessment, is discussed. Its application to the body of literature on GM crop food safety studies demonstrates that reliable studies report no toxicologically relevant differences between rodents fed GM crops and those fed their non-GM comparators.


INTRODUCTION
Biotechnology has made it possible to introduce into agricultural crops new traits that significantly improve yield, enhance quality, improve nutritional value, and/or simplify crop management. Given the predicted increase in world population from 6.5 billion in 2006 to 9.2 billion by 2050, biotechnology is considered to be an important tool that can help increase the efficiency of food production (James, 2011) and stabilize the agricultural footprint in order to conserve biodiversity (Carpenter, 2011). Nevertheless, the impact of GM crops on the environment remains a contentious issue in some world regions (WHO, 2012). Because of the aforementioned beneficial aspects, farmers rapidly adopted these crops after they were first introduced into commerce in 1994. Between 1996 and 2010, more than 2.47 billion acres of biotechnology-derived crops (also described as genetically modified [GM] crops) were planted in 29 countries (James, 2011).
GM crops must be approved by regulatory agencies in countries where the crops are either grown in country or are imported from other countries. The food safety testing requirements for GM crops generally include characterization of the molecular insert for the introduced genetic material, safety assessment of the introduced trait(s), agronomic performance of the new crop variety in the field under varied environmental conditions, and compositional analysis of the food/feed (Codex, 2009). Animal toxicology studies of GM crops are sometimes required to be completed as a condition of registration in certain countries; however, the routine reliance on animal toxicology studies in the overall safety assessment of GM crops has been questioned (Codex, 2009;EFSA, 2011a). International scientific experts participated in the Food and Agriculture Organization of the United Nations (FAO) and the World Health Organization (WHO) Codex Alimentarius Commission (Codex) to provide guidance on the safety assessment of foods derived from GM crops. Due to the technical challenges inherent in testing the safety of complex mixtures such as whole foods in laboratory animals, they did not recommend performing such studies and concluded that such studies were "unlikely to give rise to meaningful information" (Codex, 2009). However, if a new substance, the safety of which had not been confirmed, was introduced into food, then hypothesis-driven animal toxicology studies may be appropriate to assess its safety (Codex, 2009). The European Food Safety Authority (EFSA) has also published a review of the safety and nutritional assessment of food/feed derived from GM crops (EFSA, 2008a). They included a review of submitted 90-day rat feeding studies with GM crops from registrants, as well as published animal feeding studies conducted by other groups. 
EFSA concluded "the performance of 90-day feeding trials with rodents or feeding trials with target animal species have provided little if anything to the overall safety assessment (except for added confirmation of safety)" (EFSA, 2008a). Moreover, they stated "many feeding trials have been reported in which GM foods like maize, potatoes, rice, soybeans and tomatoes have been fed to rats or mice for prolonged periods. The majority of these experiments did not indicate clinical effects or histopathological abnormalities in organs or tissues of exposed animals. In some cases adverse effects were noted, which are difficult to interpret due to shortcomings in the studies" (EFSA, 2008a).
Several other reviews summarizing the results of various animal toxicology studies and farm animal feeding studies with GM crops have been published over the last few years (Domingo, 2007;Domingo and Gine Bordonaba, 2011;Dona and Arvanitoyannis, 2009; Magana-Gomez and de la Barca, 2009;Snell et al., 2012). The utility of some of these reviews is limited because they summarize results without providing a critical assessment of the quality of the studies for purposes of risk assessment. This is an important point, because as previously noted not all studies are of similar quality, and therefore may not be comparable in their ability to inform risk assessment. Secondly, some of the reviews were not comprehensive, since they did not discuss all available studies.
Others have also addressed similar issues where they have applied the principle of "evidence-based toxicology" for the risk assessment of chemicals in the environment (Guzelian et al., 2005;Hoffman and Hartung, 2006;Griesinger et al., 2009). A recent review by Snell et al. (2012) recommended that there should be better peer review of manuscripts summarizing GM crop food safety studies, because some studies that have been published had major technical flaws that impacted the ability to scientifically interpret the reported results. Moreover, the media, who report the results of technically flawed studies, create confusion among the general public regarding the safety of GM crops.
The purpose of this publication is to provide a comprehensive and transparent assessment of the reliability of the published food safety studies with GM crops and proteins introduced into crops to impart new traits. This review focuses on rodent studies rather than other animal feeding studies (e.g., farm animals, fish, etc.) because rodent studies are commonly used to assess risks for human food consumption, and other animal feeding studies have been reviewed elsewhere (Flachowsky et al., 2007;Phipps et al., 2006). To initiate this review a thorough search of the literature was undertaken to identify as many studies as possible for evaluation. Such a review is in line with a renewed interest in surveying past studies as a result of efforts such as European Commission (EC) legislation that required a health assessment of over 30,000 chemical substances used in commerce in the EU (registration, evaluation, authorization, and restriction of chemical substances or REACH). Based in part on animal welfare considerations, vertebrate toxicology testing of the chemical substances covered by REACH would not be undertaken if other sources of credible safety information were available (Schaafsma et al., 2009). Therefore, the European Centre for the Validation of Alternative Methods (ECVAM) developed a battery of questions as a means of evaluating the reliability of all available published toxicology studies on these industrial chemicals, and the end result was the Toxicological data Reliability Assessment Tool (ToxRTool) (Schneider et al., 2009). The evaluation procedure described in the present review was adapted from the in vivo assessment portion of the ToxRTool and was expanded to incorporate additional evaluation criteria that reflect the unique aspects of safety testing of whole food (e.g., the formulation of test and control diets to confirm they are nutritionally balanced and comparable to each other). 
A rationale for the adaptations required for formulation of diets will be addressed in greater detail in the discussion section.

Literature Identification for Evaluation
The literature identification process was performed by searching the U.S. National Library of Medicine, National Institutes of Health online search service, commonly known as "PubMed" (www.ncbi.nlm.nih.gov/pubmed), and the Web of Knowledge (wokinfo.com) for relevant publications. Initially, an advanced search was conducted through PubMed with the query, "(Plants, GM/toxicity [MeSH Major Topic]) OR (Food, GM/adverse effects [MeSH Major Topic])." This search returned a list of approximately 40 publications detailing original research on the safety of GM foods that were included in the evaluation. Additional relevant works were identified by searching the Web of Knowledge for publications that cited these 40 papers. This second search identified additional relevant publications detailing primary research on the safety of GM foods to be included in this review. Finally, references within the publications identified in the first two steps were reviewed to determine whether any other relevant publications had been overlooked, and a few previously unidentified publications were added to the list of publications to be evaluated as part of this review. Only papers written in, or translated into, English were considered in this review. This process identified a total of 70 publications examining the safety of GM crops used as feed/food.

Development of Evaluation Criteria
A systematic review method intended to harmonize data evaluation processes for toxicology studies has existed for some time (Klimisch et al., 1997). The evaluation of the quality of a study's data was based on the application of the judgment of an expert to determine the reliability, relevance, and adequacy of the data, and the reviewer was guided by careful definition of each of these criteria. Reliability was defined as the inherent quality of the study with regard to the use of standardized methodology, full detail of the experimental procedures and results, and the clarity and plausibility of the conclusions drawn from the results. Relevance was defined as the extent to which the experimental design and resulting data are appropriate for identification of a particular hazard or risk. Adequacy was defined as the usefulness of data for risk assessment purposes; when more than one test had been conducted, the greatest weight of evidence was assigned to the most reliable and relevant test. Based on how well a study met these criteria, the study was assigned to one of four categories: (1) Reliable Without Restrictions, (2) Reliable With Restrictions, (3) Not Reliable, and (4) Not Assignable (Klimisch et al., 1997). Subsequent publications referencing the Klimisch scores tend to focus on the first three categories (Schneider et al., 2009). Klimisch categories have become well established and widely used; however, because such assessments are largely dependent on the judgment of individual experts, the results of a classical Klimisch evaluation may be considered subjective.
To minimize the subjectivity of systematic reviews of toxicology studies, ECVAM developed the ToxRTool (Schneider et al., 2009). The list of evaluation criteria in the ToxRTool was generated by reviewing international and national guideline requirements, publications, and reports related to reliability assessments of toxicology data on chemicals, and by compiling the study design parameters with the most potential to impact the quality of study data. Additionally, the ToxRTool established critical criteria within each group of evaluation criteria. These criteria were marked in red, and noncompliance with one or more red criteria was justification for assigning the study to Klimisch category 3. If all red criteria were satisfied, the percentage of affirmative answers to the evaluation criteria was determined and the studies were assigned to categories as follows: ≥80%, Klimisch category 1 (Reliable Without Restrictions); <80% to ≥60%, Klimisch category 2 (Reliable With Restrictions); <60%, Klimisch category 3 (Not Reliable) (Schneider et al., 2009). However, once a Klimisch category is assigned on the basis of the two-step process outlined above, reviewers could alter the final Klimisch category according to personal judgment, provided they document the reasons for this departure from the tool (Schneider et al., 2009). The ToxRTool is publicly available for testing and use (http://ecvam.jrc.it), and is the basis for the evaluation criteria proposed in this paper. The authors encourage other experts to try this adapted ToxRTool for assessing the reliability of rodent toxicology studies carried out on GM crops and to assess its utility and the completeness of the assessment criteria that were listed.
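The two-step categorization described above can be illustrated with a minimal sketch. It assumes that the lower bounds of categories 1 and 2 are inclusive (80% and 60% of affirmative answers, respectively); the function name, the argument names, and the 20-criterion example sheet with five "red" criteria are hypothetical and are not taken from the ToxRTool itself. The reviewer's documented override of the resulting category is a matter of expert judgment and is not modeled.

```python
from typing import Sequence


def klimisch_category(answers: Sequence[bool], is_red: Sequence[bool]) -> int:
    """Return Klimisch category 1, 2, or 3 for one study.

    answers[i] is True when criterion i was met ("yes");
    is_red[i] is True when criterion i is a critical ("red") criterion.
    """
    if not answers or len(answers) != len(is_red):
        raise ValueError("answers and is_red must be non-empty and equally long")

    # Step 1: failing any red criterion assigns the study to category 3.
    if any(red and not met for met, red in zip(answers, is_red)):
        return 3

    # Step 2: categorize by the percentage of affirmative answers.
    # Integer arithmetic avoids floating-point edge cases at the thresholds.
    yes, total = sum(answers), len(answers)
    if 100 * yes >= 80 * total:
        return 1  # Reliable Without Restrictions
    if 100 * yes >= 60 * total:
        return 2  # Reliable With Restrictions
    return 3      # Not Reliable


# Hypothetical score sheet: 20 criteria, of which the first five are "red".
IS_RED = [True] * 5 + [False] * 15
example_high = klimisch_category([True] * 18 + [False] * 2, IS_RED)
example_mid = klimisch_category([True] * 14 + [False] * 6, IS_RED)
example_red_missed = klimisch_category([False] + [True] * 19, IS_RED)
```

In this sketch a study that meets 19 of 20 criteria but misses a single red criterion is still placed in category 3, which reflects the priority the tool gives to the critical criteria over the overall percentage.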
Due to the emphasis on in vivo toxicology testing of whole food (e.g., grain, seed, processed meal from seed), the criteria composing the in vivo portion of the ToxRTool were slightly modified to include questions relevant to identifying and characterizing the whole food that was tested. For example, the conventional (i.e., non-GM) crop used as a control substance should have similar background genetics to the GM crop, and both should be grown together at the same time and field location to minimize environmental effects on crop composition. Compositional analysis of test and control grain/seed or processed meal should also be done to facilitate preparation of nutritionally balanced rodent diets for both the control and test groups. Analysis of antinutrients (e.g., trypsin inhibitor levels in processed soybean meal) and contaminants (e.g., pesticide residues, heavy metals, mycotoxins) in grain or seed should also be undertaken to determine whether these are within acceptable limits and therefore unlikely to confound interpretation of the study results. The proteins introduced into GM crops and tested for potential toxicity should also be characterized for purity and, if prepared heterologously (e.g., in bacterial strains engineered to express high levels of introduced proteins), for comparability to the plant-produced protein.
Another set of modifications to the ToxRTool included reemphasizing some of the aspects of inherent reliability included by Klimisch et al.: GLP-compliance; study designs adapted from international testing guidelines such as OECD TG 408 (OECD, 1998); use of historical control data ranges to interpret specific results; and assessing the toxicological relevance of a finding. However, unlike Klimisch's earlier work, the present review does not include the criteria regarding the use of experts in conducting and reviewing the results of studies. While the use of qualified experts is critical to establishing the validity of findings (e.g., board-certified veterinary pathologists performing microscopic histological examinations), independently assessing the degree of individual expertise from the information provided in a manuscript is difficult at best. Consequently, this criterion was excluded to avoid the potential for introducing subjectivity and/or bias. An abridged version of the resulting Food Safety Study Reliability Tool (FSSRTool) evaluation criteria is presented in Table 1, and the full length document is available in the Supplemental information on-line.

Application of FSSRTool Evaluation Criteria
Similar to the ToxRTool, the evaluation criteria of the FSSRTool were entered into a Microsoft Excel worksheet as a renewable evaluation form, referred to as an FSSRTool Score Sheet. A single point was awarded for each affirmative answer on the FSSRTool Score Sheet. The score of the study was calculated based on the total number of "yes" answers (i.e., the Calculated Klimisch Category), and a separate score was also determined by affirmative answers for all red criteria (Red Klimisch Category). These two scores were used to assign a Final Klimisch Category. In cases where there were discrepancies between the Calculated and Red Klimisch Categories, the authors determined potential reasons for the discrepancy and assigned the Final Klimisch Category based on the weight of the evidence regarding the study's reliability. An FSSRTool Score Sheet Explanation was created for each study that did not meet one or more evaluation criteria, and it provides brief justifications for why the criteria were not met. In cases where the Final Klimisch Category differs from the Calculated and Red Klimisch Categories for the study, the authors' rationale for the departure was provided on the study's FSSRTool Score Sheet Explanation. All FSSRTool Score Sheet Explanations are available in the Supplemental information on-line.

Herbicide Tolerant Crops and Proteins
The first widely available GM crops (herbicide tolerant) have been thoroughly studied with regard to both the enzymes introduced into the crops for the purpose of imparting tolerance to topically applied herbicides, as well as whole foods from the crops. Following an evaluation of the body of literature on the topic, 17 original publications with GM crops and seven publications with purified enzymes introduced into those crops by genetic modification were identified. The results of the whole food study evaluations are provided in Table 2, and the results of the protein study evaluations are provided in Table 3. The information in the tables also provides a list of the study element(s) that did not meet the FSSRTool evaluation criteria for each study. Of the seventeen whole food studies evaluated, six were determined to be "Reliable Without Restriction," two [...] "Reliable Without Restrictions" and were lowered one Klimisch category as a result. Studies determined to be "Not Reliable" either violated more than one "red" criterion (Krzyzowska et al., 2010), or had overall low compliance with the FSSRTool evaluation criteria (e.g., Malatesta et al., 2002a met 9/20 criteria) indicating that the reported results and conclusions could not be reliably interpreted. Of the seven protein studies evaluated, six were determined to be "Reliable Without Restriction" and one was determined to be "Not Reliable". The conclusions of the whole food and protein studies deemed "Reliable Without Restriction" are summarized in Table 4.

Table 1 (fragment). Evaluation criteria: Age and/or weight at initiation provided; 8. Environmental conditions of vivarium provided.
Table 1 footnotes:
1 Italics are used to denote "red" criteria.
2 Results, negative findings included, of all study endpoints described in the study methods should be discussed. Negative findings may be briefly summarized.
3 Historical control data (i.e., data from control groups of the same species, strain, gender, and approximate age) should be used to establish the range of variability that may be anticipated for each parameter under investigation (i.e., a normal range). When differences are detected between groups, the values should be compared to these normal ranges to determine whether they fall within the normal range of variability or not. Additionally, toxicological relevance means changes observed should be consistent with scientifically-accepted mechanisms of toxicity (e.g., increases in serum levels of cytosolic liver enzymes) and likely correlate with other adverse findings (i.e., changes in organ weight and/or histological evidence of tissue damage).

Insect Resistant Crops and Proteins
Several studies have been conducted with whole foods from insect resistant crops and insect control proteins introduced into the crops by gene introgression. Fourteen publications with GM crops and eight publications with insect control proteins were identified. The results of the whole food study evaluations are provided in Table 5, and the results of the protein study evaluations are provided in Table 3. Of the 14 whole food studies evaluated, eight were determined to be "Reliable Without Restriction" and six were determined to be "Not Reliable" based on their Final Klimisch Category score. Among the Final Klimisch Category 3 studies, Kilic and Akay (2008) received an overall high score, but missed certain critical components such as test substance identification and study design. For example, the authors refer to the test substance as "Bt Corn." However, by 2008, several varieties of transgenic corn expressing different Cry proteins were commercially available. Based on this designation, the reader has no way to determine which one was tested, and as a result this name was considered insufficient identification of the test substance. Furthermore, this study does not adhere to international testing guidelines for multigenerational reproduction studies (OECD, 2001a). Consequently, the results they report and conclusions they reach have been confounded to the point that the study cannot be considered reliable. The conclusions of the whole food studies deemed "Reliable Without Restriction" are summarized in Table 4.
Of the eight protein studies evaluated, four were determined to be "Reliable Without Restriction," one was determined to be "Reliable With Restrictions," and three were determined to be "Not Reliable" based on their Final Klimisch Category score. Again, a list of the study element(s) that did not meet the FSSRTool evaluation criteria is provided, and it reveals that the studies assigned a Final Klimisch Category of 3 deviated significantly from the evaluation criteria of the FSSRTool. For example, Vazquez-Padron et al. (2000) does not meet Criterion #1 because the test substance's sequence identity and its comparability to the protein expressed in planta cannot be determined from the data provided in the paper. Additionally, the control group was not immunized with a dosing vehicle and/or a suitable control protein, which calls into question the suitability of the negative control data. Consequently, the effect of variable experimental handling procedures among the treatment groups cannot be dismissed as a potential source of the differences observed, and one must consider these factors in determining the study's reliability. The conclusions of the protein studies deemed "Reliable Without Restriction" are summarized in Table 4.

Stacked Trait Crops
Stacked trait crops derived by conventional breeding do not require mandatory in vivo testing because (1) the traits being combined have already been individually shown to be safe, and (2) there is no scientific basis to question the safety of conventional breeding practices, whether or not the parental lines contained GM traits (provided there is no reason to anticipate interaction). As a result, fewer studies on such products are available for review. Six studies with GM crops were identified for evaluation with the FSSRTool, and the results of these analyses are provided in Table 6. All six papers evaluated were determined to be "Reliable Without Restriction," and Table 4 presents a summary of these studies' conclusions.

Nutritionally Enhanced Crops
Improving the nutritional profile of crops has been another application of biotechnology, and several studies have been conducted with GM crops modified to produce beneficial nutrients. Examples include high lysine maize and stearidonic acid soybeans. These crops provide essential amino acids and heart-healthy omega-3 fatty acids, respectively, in commonly consumed foods and feeds that do not naturally contain such benefits. Thirteen publications with nutritionally enhanced GM crops were identified for evaluation with the FSSRTool. The results of these analyses are provided in Table 7 with the corresponding Klimisch Category scores, and a list of the study element(s) that did not meet the FSSRTool evaluation criteria. Based on the Final Klimisch Category score of the thirteen papers evaluated, seven were determined to be "Reliable Without Restriction," three were considered "Reliable With Restrictions," and three were considered "Not Reliable." Common deficiencies among the studies assigned a Final Klimisch Category of 3 included failure to: adequately demonstrate the identity of the test substance, provide a source for the test substance, use an appropriate design for the endpoints under investigation, and consider the toxicological relevance of their findings in the context of what is the normal range of variability for the test species. The conclusions of the studies deemed "Reliable Without Restriction" are summarized in Table 4.

Table 4. Conclusions of the studies deemed "Reliable Without Restriction"

Herbicide tolerant crops
(Hammond et al., 2004) Roundup Ready corn is as safe and nutritious as existing commercial corn hybrids.
(Sakamoto et al., 2007) Long-term intake of GM soybeans in diet has no apparent adverse effect in rats.
(Appenzeller et al., 2008) 356043 soybeans are as safe and nutritious as conventional non-GM soybeans.
(Chukwudebe et al., 2012) CV127 soybeans are as safe, wholesome and nutritionally valuable as its near isogenic control variety.
(Rhee et al., 2005) GM crops have no adverse effects on multigenerational reproductive-developmental ability.
(Appenzeller et al., 2009a) The safety and nutritional value of Optimum® GAT® corn is comparable to nontransgenic hybrid field corn.

Herbicide tolerance proteins
(Stagg et al., 2012) The AAD-1 protein, expressed in DAS-40278-9 maize, represents a negligible risk to human health.
(Harrison et al., 1996) Glyphosate-tolerant soybeans are as safe and nutritious as traditional soybeans.
There is a reasonable certainty of no harm from including the 2mEPSPS protein in food or feed.
(Delaney et al., 2008c) The GAT4601 protein does not present a risk to humans when used in agricultural biotechnology.
(Mathesius et al., 2009) The GM-HRA protein is safe when used in agricultural biotechnology.
(Herouet et al., 2005) There is a reasonable certainty of no harm resulting from the inclusion of the PAT proteins in food or feed.

Insect resistant crops
Bt corn had no measurable or observable effect on fetal, postnatal, pubertal, or adult testicular development.
(Hammond et al., 2006a) MON 863 is as safe and nutritious as existing conventional corn varieties.
(Hammond et al., 2006b) MON 810 is as safe and nutritious as existing commercial corn varieties.
(Teshima et al., 2002) CBH351 GM corn does not have any effect on immune-related organs or allergenic potential in rats or mice.
(Schroder et al., 2007) No adverse or toxic effects were noted for KMD1 rice when tested in the design used in this 90-day study.
(Cao et al., 2010) The Cry1C protein is not a potential allergen or toxin.
(Poulsen et al., 2007a) Dietary exposure to transgenic rice expressing PHA-E lectin, a known mammalian toxin, was associated with biological, biochemical, microbiological, and pathological differences in rats.
(Poulsen et al., 2007b) Adverse effects attributable to the test substance were not noted following dietary exposure to GNA lectin rice.

Insect resistance proteins
(Onose et al., 2008) No significant toxicological effects were detected for Cry1Ab in a gastrointestinal impairment rat model.
(Juberg et al., 2009) The Cry34Ab1 and Cry35Ab1 proteins do not represent a risk to human health.
(Kroghsbo et al., 2008) PHA-E lectin, a known mammalian toxin, had an immunomodulatory effect when fed to rats for 90 days.
(Xu et al., 2009) There is a reasonable certainty of no harm from including Cry1Ab/Ac protein in human food or animal feed.

Stacked trait crops
(Healy et al., 2008) MON 88017 is as safe and nutritious as existing commercial corn hybrids.
(Dryzga et al., 2007) Meal from Cry1F/Cry1Ac GM cottonseed did not produce any untoward effects and is nutritionally equivalent to cottonseed meals from non-transgenic cottonseed.
(Appenzeller et al., 2009b) 1507 × 59122 maize is as safe and nutritious as non-GM maize, and crossing two safe GM maize events results in production of a safe stacked GM event.
(He et al., 2008) 59122 maize is as safe and as nutritious as non-transgenic maize.
(MacKenzie et al., 2007) 1507 maize is as safe and as nutritious as non-GM maize.
(Malley et al., 2007) 59122 maize is nutritionally equivalent to and as safe as conventional maize.

Nutritionally enhanced crops
(Liu et al., 2004) High-gamma-linolenic acid canola oil (HGCO) is a safe and cost-effective source of gamma-linolenic acid (GLA).
(Palombo et al., 2000) The growth and hepatic metabolism of the n-6 fatty acids was similar whether the GLA was from high γ-linolenic acid canola oil or borage oil (BO).
(Wainwright et al., 2003) High-γ-linolenic acid canola oil is an acceptable alternative to BO as a source of GLA.
Y642 transgenic lysine-rich maize is as safe and nutritious as conventional non-GM maize grain.
(Momma et al., 2000) The novel GM rice was no different from the existing rice in its safety.
(Delaney et al., 2008b) 305423 soybeans are as wholesome and nutritious as conventional non-GM soybeans.
28- and the 90-day studies confirm the safety of SDA soybean oil.

Disease resistant and miscellaneous crops and proteins
(Chen et al., 2003) GM sweet pepper or tomato products, as whole foods, are as safe as their traditional counterparts.
(Fuchs et al., 1993) The NPTII protein is readily degraded, does not compromise the efficacy of aminoglycoside antibiotics, does not possess the attributes of known protein food allergens, is not toxic to mammals and so it presents no risk for human or animal consumption.

Disease Resistant and Miscellaneous Crops and Proteins
GM crops with modifications to improve their resistance to disease or other miscellaneous traits have also been evaluated for safety as food. For example, crops that have resistance to pathogenic plant viruses such as cucumber mosaic virus (CMV) have been developed. Because CMV is very easily spread and causes severe damage to the host plant, there is a potential for substantial reductions in yield in susceptible plants. Resistant plants provide a guard against such yield losses. Three publications with GM crops and two with proteins introduced into these GM crops were identified as candidates for evaluation with the FSSRTool. The results of the whole food study evaluations are provided in Table 8, and the results of the protein study evaluations are provided in Table 3. Of the three whole food studies evaluated, one was determined to be "Reliable Without Restriction," based on its Final Klimisch Category score. One of the two protein studies evaluated was determined to be "Reliable Without Restriction" based on its Final Klimisch Category score. The conclusions of the whole food and protein studies deemed "Reliable Without Restriction" are summarized in Table 4.

ASSESSING RELIABILITY OF GM CROP FEEDING STUDIES
Traditional toxicological testing of discrete chemicals can be accomplished by adding different concentrations of the chemical directly to commercially available laboratory animal diets. However, in the case of a food from either non-GM or GM crops, the test substance is an integral nutritional component in the diet and must be formulated along with the other components to maintain essentially isonitrogenous and isocaloric nutritionally balanced rodent diets. This requires characterization of the nutrient content of test and control substances. Feeding nutritionally unbalanced diets is likely to cause adverse effects that would confound interpretation of the results of the study (Codex, 2009; EFSA, 2008a). Additionally, the levels of potential exogenous components (e.g., pesticide residues, mycotoxins) and endogenous antinutrients (e.g., trypsin inhibitors in soybean) should be analyzed to confirm they are within acceptable limits (Codex, 2009; EFSA, 2008a; EFSA, 2011b) to avoid the possibility of confounding effects from these substances. Confirmation that the test and control diets were formulated correctly (i.e., GM crop in the test diet and not in the control diet) can be done using chain-of-custody record keeping, PCR for the genetic event, or lateral flow for the protein expression product. Such confirmation is critical to ensuring that the intended test substance was actually present in the test diet, so that it can be determined whether a cause-and-effect relationship exists between test conditions and findings in vivo. The use of an appropriate comparator to assess potential differences between groups of animals fed diets prepared with either GM or conventional crops is also essential. Generally, the comparator should be a near isogenic line with a similar genetic background to the test substance, but lacking the introduced genes present in the GM crop (Codex, 2009; EFSA, 2008a; EFSA, 2011a, 2011b, 2011c). 
Furthermore, the test and near isogenic control should be grown at the same time and at the same location (Codex, 2009; OECD, 2001b; OECD, 2002a) as a means of reducing compositional variability. Snell et al. (2012) have observed that a major flaw in some GM crop studies was the absence of an appropriate non-GM control comparator. They also commented that based on their review of published studies, "internationally agreed test methods should be used for toxicity testing," since a number of the studies reviewed used an inadequate experimental design (Snell et al., 2012).
Regarding the studies evaluated with the FSSRTool in this review, two-thirds were considered either "Reliable Without Restrictions" (39/70 or 56%) or "Reliable With Restrictions" (8/70 or 11%), and one-third were considered "Not Reliable" (23/70 or 33%). Those studies that were considered to be "Reliable Without Restrictions" consistently found no evidence that the GM crops tested produced any toxicologically relevant adverse effects (Table 4), even when fed to laboratory animals at levels considerably in excess of that which humans might encounter (>10 to 100×) (EFSA, 2008a). The results of individual FSSRTool evaluations of studies conducted with herbicide tolerant crops (e.g., soybean expressing CP4 EPSPS), insect resistant crops (e.g., corn expressing Cry1Ab, Cry34Ab1, or Cry35Ab1), stacked trait crops (e.g., Cry1F, Cry3Bb1, PAT, and CP4 EPSPS corn), nutritionally enhanced crops (e.g., high oleic acid soybean), and disease resistant crops (e.g., CMV resistant tomatoes) indicate they are as safe as their conventional counterparts and that there is reasonable certainty of no harm from consumption of these crops.
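The proportions quoted above can be checked with a short tally. The following is a minimal sketch in Python that uses only the category counts stated in the text; the variable and function names are illustrative and are not part of the FSSRTool itself.

```python
# Study counts per reliability category, as reported in the text (70 studies in total).
counts = {
    "Reliable Without Restrictions": 39,
    "Reliable With Restrictions": 8,
    "Not Reliable": 23,
}

total = sum(counts.values())


def percent(n, denominator):
    """Return n as a whole-number percentage of denominator."""
    return round(100 * n / denominator)


# Share of each category, rounded to the nearest whole percent.
shares = {category: percent(n, total) for category, n in counts.items()}

# Combined share of studies rated reliable, with or without restrictions.
reliable_count = (
    counts["Reliable Without Restrictions"] + counts["Reliable With Restrictions"]
)
reliable_share = percent(reliable_count, total)

for category, n in counts.items():
    print(f"{category}: {n}/{total} ({shares[category]}%)")
print(f"Reliable, with or without restrictions: {reliable_count}/{total} ({reliable_share}%)")
```

The combined reliable share (47/70) rounds to 67%, which corresponds to the "two-thirds" stated in the text.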
A few publications on the food safety of GM crops did not lend themselves to evaluation by the criteria developed for the FSSRTool, because they involved the statistical re-analyses of data from previously published studies (de Vendomois et al., 2009; Seralini et al., 2007). Toxicologists and statisticians have criticized these re-analyses as being technically flawed, and concluded that they provide no new evidence of treatment related adverse effects (Doull et al., 2007; EFSA, 2007a; EFSA, 2007b; EFSA, 2010). Other studies that were widely circulated in the media and that reported adverse reproductive effects in animals fed different GM crops were never published in peer reviewed journals and were not assessed in this publication, although experts who reviewed these studies considered them to be technically flawed (EFSA, 2008b; Marshall, 2007). We concur that the results are not reliable because they do not comply with multiple criteria within the FSSRTool.
A recent review that included multigenerational reproduction studies with GM crops fed to different animal species found no evidence of adverse effects on reproductive performance (Snell et al., 2012). This review also considered the quality of studies designed to assess the safety of proteins introduced into GM crops. As shown in Table 3, a variety of acute and repeat dose animal studies have been carried out with proteins and a number of them were classified Klimisch categories 1 or 2. These studies found no evidence of treatment related adverse effects, which is consistent with existing safety data for enzymes used in food processing (Hammond and Cockburn, 2008; Pariza and Johnson, 2001; Spok, 2006). Similarly, Cry proteins introduced into GM crops to control insect pests have been confirmed to be non-toxic when tested in mammals (Betz et al., 2000; Bondzio et al., 2008; Hammond and Koch, 2012; Shimada et al., 2006). Furthermore, Cry proteins introduced into GM crops to protect the plant against insect pests are derived from Bacillus thuringiensis microbial pesticides that have been safely used in organic agriculture for decades (Betz et al., 2000; Federici and Siegel, 2008; Hammond and Koch, 2012; OECD, 2007; WHO, 1999a), and this history of safe use of the source organism in organic agriculture suggests that the extensive testing of insect protected GM crops was likely unnecessary.
The safety assessment of proteins introduced into GM crops typically includes sequence homology (bioinformatics) searches comparing the protein of interest with all known protein allergens (Delaney et al., 2008a). Evaluation of potential allergenicity also includes an assessment of the protein's susceptibility to enzymatic degradation (e.g., pepsin digestion) and/or denaturation by heat treatment. If the protein is derived from an organism that is known to elicit allergic reactions when consumed (e.g., soybeans), then its potential to specifically bind IgE class antibodies from the serum of individuals with a clinically validated allergy to the source of the protein is tested using in vitro assays (Codex, 2009). The weight of evidence from all of these assays is then considered to decide whether or not the protein of interest presents a risk as an allergen. This integrated approach does not currently include animal studies because there is no validated animal model to predict whether a protein will be allergenic in humans (Codex, 2009; Goodman et al., 2008; Thomas et al., 2009). Similar bioinformatics searches are performed to assess whether the introduced protein is related to proteins that are toxic or pharmacologically active in mammals. Alternatively, if bioinformatics demonstrates that the protein is structurally/functionally related to proteins and/or protein families that have a history of safe use in foods and whose function does not raise safety concerns, this can also support the weight of evidence of the protein's safety.
A few studies, listed in Table 5, report adjuvant effects of Cry proteins that were classified as Klimisch Category 3 (i.e., Not Reliable), because of various deficiencies with the study designs (Table 3, supplemental information). Immunogenic effects were reported in studies where Cry proteins were administered at high dosages by gavage to mice (Vazquez-Padron et al., 1999; Vazquez-Padron et al., 2000). In other studies, feeding of GM foods containing Cry proteins such as Bt rice (Kroghsbo et al., 2008) or Bt maize (Finamore et al., 2008) was reported to have induced immune responses in mice, while another study with Bt maize did not report significant effects on the immune responses in mice (Adel-Patient et al., 2011). While the mouse model has some similarities to human immunological mechanisms (Adel-Patient et al., 2011), some investigators have acknowledged the limitations of using one highly sensitive inbred mouse strain to predict immunologic effects in humans, which was likened to ". . .repeatedly sampling a single individual. . ." and noting that "such a situation would be regarded as absurd if applied to human immunology" (Hayday and Peakman, 2008). In general, animal models have not been considered to be sufficiently validated to accurately predict potential allergenic or immunologic (e.g., adjuvant) effects in humans from dietary exposure to proteins (Codex, 2009; Goodman et al., 2008; Thomas et al., 2009). Furthermore, some of the results of earlier studies reporting immunologic effects were not reproduced in subsequent studies (Adel-Patient et al., 2011). This discrepancy was attributed to possible endotoxin contamination in Cry protein preparations prepared heterologously in E. coli and used in earlier studies (Adel-Patient et al., 2011).
Additionally, other earlier studies reporting immunological effects in mice included the coadministration of large doses of Maalox to neutralize stomach acidity and slow down the digestion of the Cry proteins (an administration scenario inconsistent with common consumption patterns). Lastly, the dosages of Cry proteins administered to mice in the earlier studies were generally many orders of magnitude higher than humans would ever encounter from consumption of insect protected GM crops. They would also greatly exceed potential human dietary exposure to Cry proteins resulting from spraying of commercial Bt microbial pesticides on vegetable crops shortly before harvest (as has been standard commercial practice for many years in organic vegetable crop production). Consequently, the exposure scenarios created in the studies reporting immunological effects were considered unrealistic due to the low levels of Cry proteins consumed by humans from Bt maize crops (Guimaraes et al., 2010;Hammond and Koch, 2012).

The Impact of This Review's Results on Continued Testing to Detect Unintended Adverse Effects
Despite thorough safety evaluation of GM crops, some continue to raise concerns about their food safety and advocate new study designs as well as more, and longer term, animal toxicology studies. All animal studies with whole foods have inherent limitations due to the inability to feed high levels of a food to rodents while maintaining a nutritionally balanced diet. Moreover, as stated previously by EFSA and Codex, animal feeding studies provide little useful information when there is no evidence of unintended adverse changes in the crop based upon composition, agronomic comparison, molecular characterization of the introduced gene(s), and substantial equivalence between the GM crop and its non-GM comparator (EFSA, 2008a). As the typical 90-day study design (OECD, 1998) has been criticized by some for lacking sufficient sensitivity (Poulsen et al., 2007b), EFSA has been asked by the EU to make the 90-day rat study with whole foods more robust in its ability to detect potential "unintended" adverse effects in GM crops (EFSA, 2011c). Accordingly, a revised statistical design and analysis of the 90-day rodent feeding studies was proposed by EFSA in an attempt to increase the sensitivity of the study to detect statistical differences between animals fed a GM crop and its parental control (EFSA, 2011c). These revisions are a significant departure from the statistical analysis procedures that have been commonly used and well accepted by regulators to assess the safety of substances that are either intentionally or inadvertently added to foods (e.g., food additives, pesticides).
Furthermore, the biological relevance of changes detected by statistical analysis with heightened sensitivity must still be assessed against the normal background for measured test parameters as defined by historical control data (i.e., a statistically significant difference between test and control groups may still be in the normal range for the animal model being utilized in the experiment and therefore not toxicologically relevant). Thus, the proposed changes are unlikely to meaningfully improve upon the inherent limitations of 90-day rodent toxicology studies with whole foods, for which a valuable body of data has already been derived and published, vide supra. However, the perception that the new EFSA changes would significantly improve the sensitivity of 90-day studies may encourage the use of more animal studies with GM crops. The duplication of 90-day studies on the same GM crop that has already undergone testing in several 90-day studies (no adverse effects reported) has also been proposed in Europe (GMO Risk Assessment and Communication of Evidence, 2012). Given the results of the present systematic review, and the position that, "an experiment may not be performed if another scientifically satisfactory method of obtaining the result sought, not entailing the use of an animal, is reasonably and practically available" (EC, 2010a, 2010b), we propose that animal studies only be undertaken when there is a clear hypothesis to be tested and it is determined that an animal study will provide the most appropriate method to test the hypothesis. Otherwise, many more animal studies may be undertaken in the future for which there is no clear hypothesis to be tested, and these studies may not generate any more useful information than what has been generated during the last 25 years of extensive research carried out in Europe.
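The distinction drawn above between statistical significance and toxicological relevance can be illustrated with a minimal Python sketch. It flags a finding for follow-up only when the difference is statistically significant and the test group mean falls outside the historical control range. The function name, the 0.05 threshold, and all numerical values are hypothetical and chosen purely for illustration; an actual assessment would rely on expert judgment and not on a single range check.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One measured parameter from a hypothetical feeding study."""
    parameter: str
    test_group_mean: float
    p_value: float  # from the test-versus-control comparison


def outside_normal_background(finding, historical_range, alpha=0.05):
    """Return True only if the difference is statistically significant
    AND the test group mean lies outside the historical control range.

    historical_range is a (low, high) tuple; alpha is an assumed threshold.
    """
    low, high = historical_range
    statistically_significant = finding.p_value < alpha
    within_background = low <= finding.test_group_mean <= high
    return statistically_significant and not within_background


# Hypothetical values, not taken from any study discussed in this review.
historical_range = (4.0, 6.0)
significant_but_normal = Finding("example parameter", 5.2, 0.03)
significant_and_outside = Finding("example parameter", 7.1, 0.01)
not_significant = Finding("example parameter", 7.1, 0.20)

for finding in (significant_but_normal, significant_and_outside, not_significant):
    flagged = outside_normal_background(finding, historical_range)
    print(finding.test_group_mean, finding.p_value, flagged)
```

In this sketch the first finding is statistically significant but remains within the historical range, so it is not flagged. This mirrors the point that heightened statistical sensitivity does not by itself establish toxicological relevance.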
After many years of research on GM crops, where over 200 million Euros were expended, the EC concluded that "The main conclusions to be drawn from the efforts of more than 130 research projects, covering a period of over more than 25 years of research, and involving more than 500 independent research groups, is that biotechnology, and in particular GMOs, are not per se more risky than e.g. conventional plant breeding technologies" (EC, 2010b). More recently, the EC's Chief Scientific Advisor, Anne Glover, has come to similar conclusions, stating to the press, "There is no substantiated case of any adverse impact on human health, animal health or environmental health, so that's pretty robust evidence, and I would be confident in saying that there is no more risk in eating GMO food than eating conventionally farmed food," (Fleming, 2012).

CONCLUSIONS
In conclusion, the FSSRTool was developed using evaluation criteria supported by peer-reviewed publications and independent laboratory guidelines. The FSSRTool appears to provide a useful method for evaluating the reliability of published toxicology studies to facilitate the risk assessment of GM crops. Individual FSSRTool evaluations of the body of literature on the safety of GM crops and proteins identified reliable feeding studies conducted with GM crops; these studies conclude that GM crops are as safe as conventional crops. These findings are consistent with other reviews where the quality of published studies was assessed (EFSA, 2008a; Snell et al., 2012). The lack of adverse effects in toxicology studies with GM crops attests to the strength of the weight of evidence approach to GM crop safety assessments advocated by the Codex Alimentarius Commission of FAO/WHO. This approach emphasizes (1) characterization of the molecular insert for the introduced genetic material, (2) bioinformatic and in vitro safety assessment of the introduced trait(s), (3) agronomic performance of the new crop variety in the field under varied environmental conditions, and (4) compositional analysis of the food/feed. Consequently, when the weight of evidence finds no meaningful differences between the GM crop and its control comparator, it is reasonable to presume that a rodent feeding study is unlikely to provide any additional insight into the safety of the GM crop. This would support the stated desire of many groups to reduce, replace, and refine animal testing in those circumstances where routine conduct of animal toxicology studies with GM crops and proteins is neither scientifically nor ethically justified.