Characterisation of genetic structure of the Mayan population in Guatemala by autosomal STR analysis.

BACKGROUND
Currently, the Guatemalan population comprises genetically isolated groups due to geographic, linguistic and cultural factors. For example, Mayan groups within the Guatemalan population have preserved their own language, culture and religion. These practices have limited genetic admixture and have maintained the genetic identity of Mayan populations.


AIM
This study is designed to define the genetic structure of the Mayan-Guatemalan groups Kaqchiquel, K'iche', Mam and Q'eqchi' through autosomal short tandem repeat (STR) polymorphisms and to analyse the genetic relationships between them and with other Mayan groups.


SUBJECTS AND METHODS
Fifteen STR polymorphisms were analysed in 200 unrelated donors belonging to the Kaqchiquel (n = 50), K'iche' (n = 50), Mam (n = 50) and Q'eqchi' (n = 50) groups living in Guatemala. Genetic distance, non-metric MDS and AMOVA were used to analyse the genetic relationships between population groups.


RESULTS
Within the Mayan population, the STRs D18S51 and FGA were the most informative markers and TH01 was the least informative. AMOVA and genetic distance analyses showed that the Guatemalan-Native American populations are highly similar to Mayan populations living in Mexico.


CONCLUSIONS
The Mayan populations from Guatemala and other Native American groups display high genetic homogeneity. Genetic relationships between these groups are more affected by cultural and linguistic factors than by geography and local gene flow. This study represents one of the first steps in understanding Mayan-Guatemalan populations, the associations between their sub-populations and differences in gene diversity with other populations. This article also demonstrates that the Mestizo population shares most of its ancestral genetic components with the Guatemalan Mayan populations.


Introduction
There is generally a low level of genetic variation in human populations (Excoffier, 2003), but local factors, such as geography and differential settlement, can greatly enhance genetic discontinuity (Cardoso et al., 2012; Dos Santos et al., 2009). STRs are most commonly used in forensics (Butler, 2005); however, these widely distributed and highly mutable elements are also excellent markers for establishing genetic structure, diversity and genetic differentiation within human populations (Sun et al., 2013b). In fact, STR analysis can be used to reconstruct recent human history and assess the phylogenetic relationships between populations on a global level (Martinez-Cortes et al., 2010; Sun et al., 2013a).
Language is currently the most important element of heritage identity in Guatemala. Although Spanish is the official language, 24 Amerindian languages constitute 40% of the spoken language in Guatemala. A particularly important group of languages-K'iche', Kaqchikel and

Sample collection and DNA purification
Blood samples were collected from 200 unrelated male and female donors of the Mayan-Guatemalan populations Kaqchiquel (n = 50), K'iche' (n = 50), Mam (n = 50) and Q'eqchi' (n = 50). Details about the geographic location of each ethnic group in Guatemala and their nearby Native American populations are shown in Figure 1. At the time of sample collection, a questionnaire was given to each donor to determine his or her genealogy and linguistic affiliation. Written informed consent was acquired from each donor, according to the Helsinki Declaration Ethical Guidelines.
The samples (~25 µL) were spotted on FTA® paper by members of INACIF (Instituto Nacional de Ciencias Forenses de Guatemala). DNA was extracted by standard FTA® protocols (Whatman, Clifton, NJ). FTA cards were punched (1.2 mm) to acquire samples for each amplification process.

Data analysis
Allelic frequencies, unbiased estimates of observed and expected heterozygosities (Ho and He, respectively) and possible divergence from Hardy-Weinberg equilibrium (p) were calculated using Arlequin v.3.11 software (Excoffier & Lischer, 2010). Power of discrimination (PD), power of exclusion (PE), typical paternity index (PI) and observed heterozygosity were calculated for each locus using PowerStats v.12 software (Tereba, 1999). The exact test, based on 5000 shuffling experiments and the inter-class correlation criterion for two-locus associations, was used for detecting disequilibrium between the STR loci. The genetic structure of the sampled populations was then investigated by analysing the variance framework (analysis of molecular variance: AMOVA) (Excoffier et al., 1992). The apportionment of genetic variation between and within populations was estimated by comparing allele frequencies using Arlequin v.3.11 per linguistic and geographical classification criteria. The statistical significance of F-values was estimated by permutation analysis using 10 000 random permutations.
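The per-locus quantities named above can be illustrated with a minimal Python sketch. It is not the Arlequin or PowerStats implementation: the function name `locus_stats` and the input layout are hypothetical, He uses Nei's unbiased estimator, and PE and PI follow the formulas commonly attributed to PowerStats (PE = h²(1 − 2hH²), PI = 1/(2H), with h and H the heterozygote and homozygote frequencies), which is an assumption here.

```python
from collections import Counter

def locus_stats(genotypes):
    """Summary statistics for one STR locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual
    (hypothetical input layout chosen for this sketch).
    """
    n = len(genotypes)
    # Allele frequencies from the 2n gene copies in the sample
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    # Observed heterozygosity: share of individuals with two different alleles
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Nei's unbiased expected heterozygosity
    he = (2 * n / (2 * n - 1)) * (1 - sum(p * p for p in freqs.values()))
    # Power of discrimination: 1 - sum of squared genotype frequencies
    geno_counts = Counter(tuple(sorted(g)) for g in genotypes)
    pd = 1 - sum((c / n) ** 2 for c in geno_counts.values())
    # Power of exclusion and typical paternity index (assumed formulas)
    h, big_h = ho, 1 - ho
    pe = h ** 2 * (1 - 2 * h * big_h ** 2)
    pi = float("inf") if big_h == 0 else 1 / (2 * big_h)
    return {"freqs": freqs, "Ho": ho, "He": he, "PD": pd, "PE": pe, "PI": pi}

# Toy data: four individuals typed at one locus (not from the study)
example = locus_stats([(10, 11), (10, 10), (11, 12), (12, 12)])
```

With real data the function would be called once per locus and per population; the toy genotypes only show the expected input shape.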
To infer genetic relationships between Mayan and other Native American populations, including admixed groups from Guatemala and Mexico, the allele frequencies of the same 15 STR markers were analysed in other Native American populations from Siberia, North America and Central America using data collected from previous studies (Table 1).
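Comparing populations through published allele frequencies amounts to computing a pairwise distance matrix. The text does not state which genetic distance was used, so the sketch below assumes Nei's standard genetic distance purely for illustration; the function names and the nested-dictionary layout are hypothetical.

```python
import math

def nei_distance(pop_x, pop_y):
    """Nei's standard genetic distance between two populations.

    pop_x, pop_y: dict mapping locus name -> dict of allele -> frequency
    (hypothetical layout; both populations must share the same loci).
    Raises ValueError if the populations share no alleles at any locus.
    """
    jx = jy = jxy = 0.0
    for locus in pop_x:
        fx, fy = pop_x[locus], pop_y[locus]
        alleles = set(fx) | set(fy)
        # Accumulate homozygosity-like identities over loci
        jx += sum(fx.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(fy.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in alleles)
    # D = -ln(Jxy / sqrt(Jx * Jy)); sums and means over loci give the same ratio
    return -math.log(jxy / math.sqrt(jx * jy))

def distance_matrix(pops):
    """Pairwise distances for a dict of population name -> frequency table."""
    names = sorted(pops)
    return {(a, b): nei_distance(pops[a], pops[b]) for a in names for b in names}
```

A matrix built this way is the usual input to an ordination such as MDS; the choice of distance affects the resulting configuration, so the assumed measure should be replaced by whichever one a given analysis actually uses.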

Diversity within Mayan populations
The allele frequencies of the 15 autosomal microsatellite loci within the Mayan populations are shown in Tables 2-5. The statistical forensic parameters in each Mayan group are reported in Table 6. None of the analysed markers deviated from Hardy-Weinberg equilibrium. D18S51 and FGA were the most informative markers in the Native American populations (PD values higher than 0.945) and TH01 was the least descriptive locus in this study (ESM1). The combined power of discrimination in all populations was higher than 0.99999. The combined parameters of interest for population genetics and the average heterozygosities are summarised in ESM2.
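A combined power of discrimination is conventionally obtained by multiplying the per-locus probabilities of a match, which assumes the loci are independent. The sketch below shows that arithmetic; the function name and the per-locus PD values are hypothetical and are not the values reported in Table 6.

```python
import math

def combined_power_of_discrimination(pd_values):
    """Combine per-locus PD values, assuming independent loci.

    The chance that two random individuals match at every locus is the
    product of the per-locus match probabilities (1 - PD).
    """
    return 1 - math.prod(1 - pd for pd in pd_values)

# Hypothetical example: 15 loci, each with a modest PD of 0.80
combined = combined_power_of_discrimination([0.80] * 15)
```

Even with fairly low per-locus values, 15 independent loci push the combined figure well beyond 0.99999, which is why the combined parameter is so close to 1 in every population.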
AMOVA analysis of the Guatemalan Mayan populations revealed that almost all of the variance can be attributed to intrapopulation differentiation (99.61%, p < 0.001), whereas the diversity between the four groups was 0.39% (p < 0.001) (Table 7). Furthermore, when the populations are divided by Mayan language branch into two groups (Greater K'iche'an and Greater Mamean languages), AMOVA attributes 0.29% of the variability to differences between populations within groups vs 0.19% to differences between groups (Table 7).
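The percentages above can be read as fixation indices. The following worked sketch assumes the reported percentages are shares of the total variance that sum to 100%; the symbols (\sigma^2_a among populations, \sigma^2_w within populations, \sigma^2_g among groups, \sigma^2_p among populations within groups, \sigma^2_T total) are notation introduced here and do not appear in the original.

```latex
% One-level design: four Mayan groups, no higher-level grouping
\[
F_{ST} = \frac{\sigma^2_a}{\sigma^2_a + \sigma^2_w}
       = \frac{0.39}{0.39 + 99.61} = 0.0039
\]

% Two-level design: populations nested in two language groups.
% The within-population share 99.52 = 100 - 0.19 - 0.29 is derived, not reported.
\[
F_{CT} = \frac{\sigma^2_g}{\sigma^2_T} = \frac{0.19}{100} = 0.0019,
\qquad
F_{SC} = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_w}
       = \frac{0.29}{0.29 + 99.52} \approx 0.0029
\]
```

The one-level value agrees with the FST of 0.00392 quoted in the Discussion, which suggests the 0.39% figure and that FST describe the same partition.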
To study the genetic structure of Guatemala, the Mayan populations were clustered with the Mestizo population (Martinez-Espin et al., 2006). AMOVA analyses demonstrated few differences between the Mestizo and Mayan populations (0.35%), similar to those observed between the four Mayan groups (0.36%; Table 7).
These results were confirmed by STRUCTURE analysis (ESM3). No evidence of the presence of a significant genetic structure within the four Mayan groups was observed. The model with the highest posterior probability value was K = 1 (ln P(D) = −9625.6). When the four Mayan groups were analysed individually, no genetic sub-structure was detected (model with the highest posterior probability value was K = 1). Furthermore, when the four Mayan populations were studied with the Guatemalan Mestizo population, K = 2 was the model with the highest posterior probability (ln P(D) = −20 034.8) (Figure 2).
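Choosing K by posterior probability can be sketched as follows, assuming a uniform prior over the candidate K values so that each posterior is proportional to exp(ln P(D | K)). Only the K = 1 value comes from the text; the values for K = 2 and K = 3 are hypothetical placeholders, and the function name is illustrative.

```python
import math

def posterior_over_k(ln_pd):
    """Posterior probability of each K from estimated ln P(D | K).

    Assumes a uniform prior over the candidate K values. Subtracting the
    maximum before exponentiating avoids floating-point underflow.
    """
    m = max(ln_pd.values())
    weights = {k: math.exp(v - m) for k, v in ln_pd.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# K = 1 is the value reported above; K = 2 and K = 3 are hypothetical
ln_pd = {1: -9625.6, 2: -9700.0, 3: -9810.0}
posterior = posterior_over_k(ln_pd)
best_k = max(posterior, key=posterior.get)
```

Because the log-likelihood estimates differ by tens of units, the posterior mass concentrates almost entirely on a single K, which is why reporting only the best model is common practice.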
AMOVA analysis was carried out between 16 Native American populations using 15 STR markers comparing geographical and linguistic criteria. Linguistic population grouping demonstrated that the variance was higher between populations (0.22%) than between groups (0.07%). However, geographic population grouping demonstrated greater variance between populations (0.26%) (Table 8). [...] predominantly shared a region in the centre of the graph. Guatemalan Mestizos were closest to the Guatemalan Mayan populations, between the Mayan groups and the Native American descendants in the US. Mexican Mestizos were closer to Mayan Native American populations than Brazil or Puerto Rico Mestizos. Similar results were obtained by correspondence analysis (CA) using the same allele frequencies (data not shown).

Discussion
STR markers are powerful tools that are commonly used for forensic human identification, but here, we used these loci to define the population structure of Guatemala. For all Native American populations, the most informative markers were D18S51 and FGA. These same results were observed in previous studies involving Native American populations. Allele frequencies revealed few differences in the total number of alleles between the four Mayan groups: Q'eqchi' had the highest number of allelic types (113), followed by Mam (106), K'iche' (105) and Kaqchiquel (104). Similar results were observed in previous studies in Guatemalan K'iche' and Kaqchiquel populations, in which 96 and 112 allelic types were described, respectively (Ibarra-Rivera et al., 2008). The differences observed between both studies were attributed to the number of samples analysed.
The combined matching probabilities (CMPs) and average gene diversity values indicated that the discriminatory power of the loci analysed was strongest in the Kaqchiquel population, as was also the case for the Kaqchiquel and Yucatan populations described by Ibarra-Rivera et al. (2008). This Mayan group's distinction may be the result of Spanish genetic influence.
Heterozygote deficiencies are indicative of isolation, bottlenecks, founder effects, selective pressures, recent population expansions and/or non-random mating (Maria Saiz et al., 2014). Given the historical events leading to contractions and subsequent expansions of Mayan communities, it is likely that bottlenecks and/or founder effects influenced the patterns of genetic diversity, although the alternatives, especially inbreeding, cannot be excluded (Schurr & Sherry, 2004; Wang et al., 2007).
The possibility of population sub-structuring must be carefully considered when DNA databases are used in forensic casework (Devlin et al., 2001). In this case, a population sub-structure was not detected when the four Mayan populations from Guatemala were compared by AMOVA analysis or STRUCTURE; so, they can be considered a single genetic group (Figure 2). However, STRUCTURE analysis revealed that the Native American component was slightly more noticeable in the Kaqchiquel and Q'eqchi' populations. All genetic distances demonstrated that Kaqchiquel and Q'eqchi' were very close to Mayans of the Yucatan populations (Campeche and Yucatan), even more than to other Mayans from Guatemala. This could be attributed to gene flow between the native populations or it may have been previously established. The characterisation of the four Mayan population groups revealed a significant relationship between them (FST = 0.00392; p > 0.05). Furthermore, this gene diversity was similar to that found in other Native American populations. Prior to Hispanic influence, the Central American Mayan groups were more numerous and had stronger political, religious and social cohesion than groups in North and South America. Currently, Guatemala has preserved this genetic structure, as evidenced by AMOVA and genetic distance analysis, as previously described in Mexico (Rubi-Castellanos et al., 2009).
Table 7. AMOVA design and results for analysis in the Guatemalan population. (a) Four Mayan groups divided according to their language group; Greater Mamean (Mam) and Greater K'iche'an (Q'eqchi', Kaqchiquel and K'iche').
AMOVA analyses of these populations demonstrated less variation between populations when they were grouped by language (0.22%) than when they were grouped by geographical location (0.26%). This indicates social cohesion in Mayan groups. The MDS analysis grouped all Mayan populations together (left area), as seen in Figure 3. Similar results have been observed in Central and Southern Mexico using SNP analysis (Moreno-Estrada et al., 2014).
In this study, the resolution of the analysed loci revealed previously hidden diversity within Native American genetic pools. The distinct migration patterns, accumulated mutations and marital behaviours in different areas of America may account for the variation observed in the admixed Native American populations. Therefore, autosomal STR loci may be suitable for modern historical studies, but are less powerful for ancestral studies (Wang et al., 2007). Analyses using markers that are more specific to ancestry, such as ancestry informative markers (AIMs) (Pereira et al., 2012), the Y chromosome (Martinez-Gonzalez et al., 2011) and mtDNA (Fagundes et al., 2008), should be performed to explore this relationship.
The diversity uncovered in this study could be explained by the number of migrations into Guatemala, which is always lower than the number of indigenous natives; however, it may also be explained by gene flow between the Mayan and Mestizo populations. Despite the large number of Native American inhabitants in Guatemala during the colonial period, historical records report a drastic demographic decline once the Spaniards arrived (Wang et al., 2008). Subsequently, a new group developed, the Mestizos, which is currently the most abundant group in Guatemala. This group originated primarily from admixture between Native American women and European men. This differs from previous studies in neighbouring countries, like Mexico (Martínez-Cortés et al., 2012, 2013). However, a recent PCA analysis of a Mexican cohort found that Indigenous Americans, such as the Mayan and Quechua, have a slight European genetic component (Johnson et al., 2011).

Conclusion
This study represents one of the first steps in understanding Mayan-Guatemalan populations, the associations between their sub-populations and differences in genetic diversity with other populations. Guatemalan Mayan populations are genetically very close to one another and also to other populations. However, other Native American populations, even those that share the same linguistic origin, could have greater genetic distance from Guatemalan Mayan populations. This is attributed to contractions and subsequent expansions of Mayan communities. This article also demonstrates that the Mayan community is a major contributor to the Mestizo population. The Mestizo population shares most of its ancestral genetic components with Guatemalan Mayan populations. Therefore, the influence of Mayans over Mestizos was higher in autosomal STR markers than in the Y-chromosome.