Population structure of the giant tiger shrimp Penaeus monodon in Bangladesh based on variation in microsatellites and immune-related genes

ABSTRACT Penaeus monodon is the most economically important penaeid shrimp in the Indo-West Pacific region for both aquaculture and wild capture. This study evaluated the population structure of P. monodon in Bangladesh and searched for signs of natural selection in immunological genes. Wild P. monodon sampled at four locations off Bangladesh were studied using 10 microsatellite markers and 14 single nucleotide polymorphic sites (SNPs), including six unannotated regions and eight immunological genes (seven C-type lectin and one HLA3), of which variations were associated with tolerance to pathogens and survival in aquaculture. Three genetically distinguishable populations were observed, one in the eastern mangrove forest in the Sundarban–Barguna coast (south Bangladesh), the second in the Sundarban–Middle ground close to the delta of the river Ganges and the third at St Martin’s Island, southeastern Bangladesh. ranged from 0.042 to 0.093, and significant FST ranged from 0.004 to 0.011. A large proportion of individuals caught in Middle ground and at St Martin’s Island showed a genetic origin that was distinct from the other individuals at those sites, possibly from unsampled southern regions in the Bay of Bengal. Variation and the FST values for the SNPs from C-type lectin genes and HLA3 did not differ from the other markers studied, indicating no selective effects at these loci in the natural populations. Sustainable management of P. monodon should consider the population differentiation described in this study. Further assessment of signs of selection in aquaculture stocks could be useful for successful breeding of the species.


Introduction
The giant tiger shrimp, Penaeus monodon Fabricius, 1798, is an important penaeid shrimp for aquaculture and marine capture in the Indo-West Pacific region (Motoh 1985;Kumar et al. 2007;You et al. 2008;Azad et al. 2009;Kamal & Khan 2009;Quader 2010;Mandal et al. 2012;Waqairatu et al. 2012;Vaseeharan et al. 2013;Debnath et al. 2015;Alam et al. 2016). In Bangladesh marine waters, it is the most targeted out of 36 species of shrimps caught, due to its high economic value (Quader 2010;Department of Fisheries Bangladesh 2014). Shrimp aquaculture in Bangladesh has been gradually expanding since the late 1980s (Debnath et al. 2015) and P. monodon is now the dominant cultured marine species, with most of the shrimp farms located in the southwestern region (Debnath et al. 2015). The species is cultured using different methods, ranging from extensive rearing of natural postlarvae in rice fields to semi-intensive rearing of natural/hatchery-produced postlarvae in aquaculture farms (Hoq et al. 2001;Azad et al. 2009;Debnath et al. 2015). Hatcheries, mainly located in the southeastern region, use both natural gravid females and farmed P. monodon for postlarvae production.
Diseases, particularly white spot syndrome virus (WSSV), are highly detrimental to shrimp aquaculture production (Escobedo-Bonilla et al. 2008;Hayes et al. 2010;Stentiford et al. 2012;Debnath et al. 2015). The management of shrimp aquaculture, with its vulnerability to viral diseases and the availability of high-quality gravid females for postlarvae production from the natural stock, is important for long-term sustainability of the shrimp industry in Bangladesh (Walker & Mohan 2009;Debnath et al. 2015). Broods from the deep zone of Bangladesh marine waters have been found to have higher reproductive performance and a lower incidence of WSSV infection than the broods caught from the shallow zone (Debnath et al. 2014); thus, the natural origin of the broods may matter for the successful breeding of the species.
The marine diversity of shrimp in Bangladesh is threatened by overexploitation (e.g. postlarvae collection, the use of banned fishing gear), coastal and marine pollution, the impact of coastal aquaculture, natural disasters (e.g. cyclones, sea level rise) (Quader 2010;Department of Fisheries Bangladesh 2013) and the escape of aquaculture shrimps (Kumar et al. 2007). The Department of Fisheries (DOF) in Bangladesh imposed a ban on wild prawn larvae fishing in September 2000 to minimize the potential negative effects on diversity from high levels of bycatch of non-target species caught incidentally (Department of Fisheries Bangladesh 2002; Ahmed & Troell 2010). To overcome the threats to shrimp biodiversity and to manage wild P. monodon populations in a sustainable way, basic information on genetic variation and population structure is of great importance as it can provide guidance in restoration efforts and for the sustainable harvest of local populations (Willette et al. 2014). Limited molecular genetic information is available for Bangladesh P. monodon, except for a recent mitogenomic study which described high variation and a panmictic population along the coast (Alam et al. 2016).
Genetic improvement through related biotechnological applications is of great importance for sustainable development in aquaculture (Gjedrem et al. 2012;Mandal et al. 2012;Debnath et al. 2015), and genetic variation is essential for artificial selection programmes that aim to enhance ecologically or economically important traits and to identify disease-resistant traits for aquaculture (Moss et al. 2012;Maralit et al. 2014). Furthermore, patterns of genetic variation carry signatures of population structure, indicating areas of recruitment and the effects of natural selection at linked loci, and, thus, provide insight into the biology of the species which could be of importance for management (e.g. Carvalho & Hauser 1994).
Penaeus monodon has been studied for sustainable aquaculture production (Kumar et al. 2007;Mandal et al. 2012) and has been a subject of numerous studies on genetic variation to conserve its genetic diversity through management of wild populations in the Indo-West Pacific region (Benzie et al. 2002;You et al. 2008;Mandal et al. 2012;Waqairatu et al. 2012;Alam et al. 2016). Population differentiation and the identification of stocks and nonindigenous P. monodon have been studied by microsatellites (Pan et al. 2004). Based on 10 microsatellite loci, significant genetic structure was identified among eight P. monodon populations from the Bay of Bengal, south of Bangladesh, with the highest genetic diversity at Andaman Island sites (Mandal et al. 2012). High levels of polymorphism were also reported from Malaysian waters, with highly significant heterozygote deficiencies indicating probable inbreeding in the populations or, alternatively, admixture of breeding units (Aziz et al. 2011). Further study, utilizing genomic DNA such as microsatellites, is warranted to examine population structure of P. monodon in Bangladesh waters.
Several nuclear markers are linked to resistance to WSSV infection in penaeids, including C-type lectins, which effectively participate in the initial step of pathogen recognition, as well as in various antimicrobial effector functions (Song et al. 2010;Zeng et al. 2013;Wang et al. 2014). Eight SNPs, including a linkage to C-type lectin, showed genome-wide associations in P. monodon , and individuals surviving more than 84 h following a WSSV infection showed higher levels of expression of C-type lectin (Jeswin et al. 2013). If WSSV is a problem in wild populations of P. monodon, variation in genes causing resistance to the virus may be reduced within populations and increased among populations due to natural selection (e.g. Luikart et al. 2003;Wan et al. 2004). Identification of SNPs contributing to the susceptibility to common diseases could be of importance for husbandry, stock enhancement and improved aquaculture production (Debnath et al. 2014(Debnath et al. , 2015. The aims of the present study were: (1) to examine genetic patterns of P. monodon in Bangladesh waters of the Bay of Bengal to identify putative management units, and (2) to assess variation and the possible effect of natural selection in genes that have been associated with survival and resistance to microbial pathogens in aquaculture.

Sample collection and DNA extraction
A total of 100 wild-origin P. monodon were collected and preserved in 96% ethanol from four locations along the Bangladesh coastline ( Figure 1) during the period of December 2012 to September 2013. Artisanal fishermen collected 20 subadult specimens (average weight of 40 g) from the Sundarban mangrove forest (SB). Samples of adults, with an average weight of 100 g, were collected from the Barguna coast (BC, 10 individuals), Middle ground (one of four fishing grounds) of the Bay of Bengal (MB, 40 individuals) and the coast of St Martin's Island (SM, 30 individuals) by fish/shrimp trawlers. Total genomic DNA was extracted from the pleopod tissue using standard phenol-chloroform extraction (Maniatis et al. 1982).

Polymerase chain reaction (PCR) and genotyping
Ten polymorphic microsatellite loci (Pan et al. 2004) and 14 unlinked SNPs, six unannotated, seven from C-type lectin genes and one from HLA3 , were genotyped (see supplementary material Tables SI and SII for details of microsatellites and SNPs, respectively). Amplification of microsatellite loci was performed in a final volume of 10 µl, which included 50 ng DNA, 0.2 mM dNTP, 0.1% Tween 20, 1× standard Taq Buffer (New England Biolabs), 0.5 mg Bovine Serum Albumin, 0.5 U Taq Polymerase and 0.34 mM each of the forward and reverse primers. The amplification protocol for all microsatellite loci included an initial denaturation at 95°C for 5 min, 35 cycles of denaturation at 95°C for 30 s, annealing at 52-56°C for 45 s and extension at 72°C for 1 min, and a final extension at 72°C for 10 min. Two microlitres of the PCR products (diluted 2×), with 0.1% GS500 (−250) LIZ size standard and 100% formamide mix (8.0 µl), were run in an Applied Biosystems 3500xL Genetic Analyser. Amplification of SNP loci was done using qPCR (7500 Real Rime PCR system, Applied Biosystems) in a final volume of 10 µl, which contained 100 ng DNA, 5.0 µl KASP master mix (2×) and 0.14 µl assay mix (72×) (LGC Limited), following the manufacturer's instructions. The amplification protocol for qPCR consisted of four stages: (I) activation at 94°C for 15 min, (II) 10 cycles including denaturation at 94°C for 20 s and annealing at 61-55°C for 1 min with a decrement of −0.6°C per cycle, (III) denaturation at 26 cycles of 94°C for 20 s and annealing at 55°C for 1 min, and (IV) post-PCR read at 30°C for 1 min. A second post-PCR read was performed by three additional cycles, at 94°C for 20 s and 57°C for 1 min.
Microsatellite allele sizes at each locus were determined from electropherogram images, using the GeneMapper software (version 4.1, Applied Biosystems). SNP genotypes were determined from allelic discrimination plots of the qPCR analysis for each of the SNP loci.

Data analysis
Microsatellites were inspected by looking for signs of stuttering and large allele dropout, and the frequency of null alleles was determined using MICRO-CHECKER (Van Oosterhout et al. 2006) and GENEPOP (Rousset 2008). Deviation from Hardy-Weinberg equilibrium, summarized with the inbreeding coefficient (F IS ), and genotypic disequilibrium were analysed by performing exact tests in GENEPOP (Rousset 2008), using a Markov Chain Monte Carlo (MCMC) approach with default parameters. Overall significance values across loci were obtained with Fisher's combined probability method, implemented in GENEPOP. Basic statistics (including allelic richness, observed heterozygosity and expected heterozygosity) of the data sets (i.e. microsatellite, nonannotated SNPs, annotated SNPs and all loci) were summarized with the HIERFSTAT package (Goudet 2005) in R (R Core Team 2015). Identification of candidate loci under natural selection for all loci (microsatellites and SNPs) was done using BayeScan (Foll 2012), with the probability for the test statistic adjusted due to multiple testing using the FDR method (q-values) in the package BOA (Smith 2007) in R (R Core Team 2015).
The number of population units (K value) and the multivariate ordination of the different genotypes for the combined data set of microsatellites and SNPs were investigated using Discriminant Analysis of Principal Components (DAPC) with the ADEGENET package (Jombart et al. 2015) in R (R Core Team 2015). Variation among populations was summarized by calculating F ST for microsatellites with or without null alleles and the corresponding allele size-based statistics for microsatellites (R ST ), and for the combined data set of microsatellites (without null alleles) and SNPs, using GENEPOP. Sequential Bonferroni correction was applied to adjust the significance level due to multiple tests (Rice 1989). Considering that F ST is dependent on heterozygosity within populations (Meirmans & Hedrick 2011), two unbiased methods of population comparisons, G ′′ ST (Meirmans & Hedrick 2011) and D ST (Jost 2008), were calculated, in addition to F ST , based on the microsatellite and SNP variation with the MMOD package (Winter et al. 2015) in R (R Core Team 2015). Wilcoxon tests were used to compare the heterozygosities and the F ST between the SNPs from the C-type lectin genes, HLA3 and the unannotated SNPs. Genetic distances (G ′′ ST ) between populations were visualized in a multidimensional scale plot (MDSP) utilizing the R package MASS (Venables & Ripley 2002) and the fit of the multidimensional scaling distances to the original distances (the stress) was estimated according to Kruskal (1964). Correlations between genetic distances for all loci (F ST ) and geographic distances were tested using the Mantel (1967) test implemented in the package APE (Paradis 2012) in R (R Core Team 2015).

Results
The average null allele frequency for the microsatellites was 0.05 ± 0.02 (Tables I and SIII) and was found to be significant at one locus in SB and BC, two in MG and three in SM, ranging from 0.08 to 0.23 (Table SIII). No signs of stuttering and large allele dropout were observed in the microsatellites. Four pairwise comparisons of the allelic variation at different loci showed indication of linkage disequilibrium (P = 0.004-0.027), but they were not significantly different from zero, when correction for multiple tests was applied (Rice 1989). On average, a deviation from a random association of alleles within populations was observed for all populations, except for BC, with an overall F IS = 0.07 ± 0.13 (Table I; see Table SIII for deviation at each locus within each population). The inbreeding coefficient was nominally larger for the microsatellites (F IS = 0.11 ± 0.07), with seven of 10 loci showing deficiency of heterozygotes, than for both the unannotated SNPs (F IS = 0.08 ± 0.17) and the immune-related SNPs (F IS = 0.00 ± 0.13) (Tables I and SIII).

Genetic diversity
The average overall observed heterozygosity (H O ) and expected heterozygosity (H E ) for all samples, based on both microsatellites and SNPs, were: H O = 0.57 ± 0.24 and H E = 0.62 ± 0.28, and were higher for the microsatellites (H O = 0.83 ± 0.05 and H E = 0.93 ± 0.04) than for both the unannotated SNPs (H O = 0.39 ± 0.14 and H E = 0.42 ± 0.12) and the immune-related SNPs (H O = 0.39 ± 0.10 and H E = 0.40 ± 0.10) ( Table I). Variation, summarized with the different statistics, varied among locations (Tables I, SIII and SIV). The average overall allelic richness (A R ) for both microsatellites and SNPs was similar among the sampling locations, microsatellites ranging from 10.6 in BC to 10.9 in SM, unannotated SNPs ranging from 1.90 in SB to 2.00 in BC, and immune-related SNPs ranging from 1.94 in MG to 2.00 in BC (Table I). For the microsatellites, the number of private alleles and the average number of alleles per locus were smallest in BC (8 and 11.6, respectively) and largest in MG (37 and 22.4) ( Table I;  see Tables SI and SIV for allele sizes and frequency of private alleles, respectively). Heterozygosity and the F ST of the SNPs from those of the C-type lectin genes and the HLA3 did not differ from those of the unannotated SNPs (Wilcoxon test, P > 0.05).

Detecting markers under selection
Variation at all markers, except for the microsatellite PM4868, was in compliance with the neutral expectation obtained with BayeScan, q values (adjusted P) ranging from 0.098 to 0.809 (Table SV). The alpha value for PM4868 was negative (−1.520; 95% CI: −2.926-0.307) with q = 0.016, indicating purifying or balancing selection. Since the microsatellite marker (PM4868) seems to be affected by selection that could, thus, affect the estimation of population divergence, it was omitted from the downstream analyses. The average F ST value for all markers, excluding PM4868, was 0.012 (sd = 0.002) and was considerably higher than for PM4868 (F ST = 0.003) ( Table SV).

Clustering of individuals and population differentiation
The DAPC analysis without prior information determined five clusters (K = 5, stat = 199.63; Figure 2a), two with individuals solely from the southerly locations at MG and SM and three with individuals from all locations, but in different frequencies (Figure 2c). The DAPC analyses with prior information of the sampling sites revealed separation among the four locations with the largest overlap between SB and MG (Figure 2b; see Figure 2d for the proportion of individuals in the clusters, representing four sampling sites, and Figure 3 for the posterior probabilities of samples from four locations). All population comparisons between sampling locations, based on nine microsatellite loci and all loci (nine microsatellites and 14 SNPs), were significant, except for the comparisons of SB with BC and MG. For microsatellite loci, significant F ST ranged from 0.002 to 0.010, and for all loci slightly larger differentiations were obtained (significant F ST ranged from 0.004 to 0.011) (Table II). By adjusting the allele frequencies with the null alleles, a slight reduction in F ST was observed for comparisons of SM with BC and SB, which were lowered by 10% but still remained significant (P < 0.01). The F ST between SM and MG was lowered to 0.0004, but remained significant (P < 0.05). Analysis of the microsatellites with F ST and R ST gave similar results; little or no differentiation was observed between SB and BC or MG, but the values from the other comparisons were larger and more pronounced for the R ST , especially between BC and MG (Table II). Where R ST was larger than zero, values were considerably larger than the F ST , and the ratio of R ST /F ST ranged from 2.7 to 15.4, indicating that allele sizes were more similar within populations than among populations. The interpretation of R ST should be made with caution as it is dependent on the mutational model. Similar patterns are observed by adding the SNP data to the analyses. By considering the dependency of F ST on heterozygosity within a population, the G ′′ ST and D ST statistics showed larger differentiation, as expected (Table II). Similar patterns were observed with the pairwise G ′′ ST and D ST values for all loci, as revealed by high correlation (r = 0.998) between them, but the former was about two times larger. G ′′ ST and D ST ranged from 0.042 to 0.093 and from 0.022 to 0.053, respectively (Table II), and the largest differentiation was observed between BC and SM (G ′′ ST = 0.093 and D ST = 0.053; Figure 4), followed by the differentiation between SB and SM, and between BC and MG (Table II; Figure 4). Less differentiation was observed between MG and SM, SB and BC, and SB and MG (G ′′ ST and D ST ranged from 0.042 to 0.055 and from 0.022 to 0.030, respectively) ( Table II). The fit of the distances in the multidimensional scaling was excellent, with a low stress value of 0.0002. Based on the Mantel test, differentiation between populations (F ST ) was not correlated with the geographic distances separating the sample sites (r = −0.13, P = 0.83).

Microsatellites and SNP variation, based on both DAPC and population differentiations (G ′′
ST and D ST ), revealed three distinct populations in Peneaus monodon sampled from four locations along the Bangladesh coastline of the Bay of Bengal: one in the east in the mangrove forest of the Sundarban-Barguna coast (south Bangladesh), the second at the Sundarban-Middle ground in the Bay of Bengal, and the third at St Martin's Island in southeastern Bangladesh. The population at St Martin's Island in south Bangladesh was differentiated from the other samples, and the population sampled at Middle ground differed from the one at the Barguna coast. The population samples from both Middle ground and the Barguna coast had a certain level of similarity with Sundarban samples, indicating an ongoing gene flow among these three locations, which may have resulted via the large mangrove forest, Sundarban, which is a major nursery ground for shrimps (Hoq 2007;Ghosh et al. 2015). As most of the aquaculture farms are located in the southwestern region of Bangladesh, the possible escape from aquaculture might have reduced the differentiation among Sundarban, Barguna coast and Middle ground. A large proportion of individuals caught in Middle ground and St Martin's Island shared a genetic origin that was distinct from the other individuals, indicating a possible influx from unsampled southerly regions. Previous studies of microsatellite patterns in P. monodon from the Bay of Bengal (Mandal et al. 2012) and Malaysia (Aziz et al. 2011) revealed a genetic structure where divergence reflected geographic distance. Distinct genetic differentiation was also reported   Letters for sampling locations correspond to Figure 1. ns, not significant after sequential Bonferroni correction. *Significant at 0.01 < P < 0.05. **Significant at 0.001 < P < 0.01. ***Significant at P < 0.001. among P. monodon populations from the Gulf of Thailand (Tassanakajon et al. 1998) and Australia (Li et al. 2007). Here, we did not detect an association with distance, but neighbouring populations sampled only 95 km apart were found to be differentiated, pointing to a limited connection between the regions. Genetic variation estimated using microsatellites was high, and characterized by heterozygote deficiency and a large number of private alleles. Similar genetic variability was reported in P. monodon sampled from India (Mandal et al. 2012), Malaysia (Aziz et al. 2011) and the Indo-West Pacific region (You et al. 2008). In general, most marine decapods, including penaeids, have shown high microsatellite diversity (Benzie 2000), reflecting large population sizes. High mitogenomic variation was also reported in Bangladesh P. monodon, where haplotype diversity of 520 bp of the control region was close to one (Alam et al. 2016). The average overall observed heterozygosity based on microsatellites and SNPs was slightly lower than the expected heterozygosity, which indicated a deficiency of heterozygotes. This pattern has been reported in previous genetic studies on P. monodon samples from India (Mandal et al. 2012) and the Indo-Pacific region (You et al. 2008). Seven of 10 microsatellite markers showed significant deviation from HWE with positive F IS values, indicating a lack of heterozygotes. One probable reason for the deficiency could be mutations in the primer binding sites that might result in null alleles (e.g. García et al. 1995;Aziz et al. 2011). Alternatively, such deviations could result from the sampling design or admixture of differentiated populations, as suggested for P. monodon in India (Mandal et al. 2012), due to different cohorts (Christiansen 1988), even breeding at different times of the year. Null allele frequencies below 0.20 have a limited effect on general population structure analyses and analyses of F ST with and without markers which showed a frequency of null alleles in the range of 0.1-0.2 did not affect the results (Chapuis & Estoup 2007). It is unlikely that the deficiency of heterozygotes is due to inbreeding, as the high variation in both marker types presented here and the high mitogenomic variation in P. monodon in Bangladesh (Alam et al. 2016) suggested a large population size.
Variation at the SNPs from the unannotated regions, and the C-type lectin genes and HLA3 did not differ, indicating no signs of selective effects at SNPs, e.g. due to pathogens at these loci in the natural populations. A single microsatellite marker with an unknown genetic location was found to be affected by natural selection and was excluded from further analysis. Lower pathogenic infection was observed in gravid shrimps from the deeper zone of Bangladesh marine waters (Debnath et al. 2014) and the harmful pathogenic effects may be restricted to aquaculture stocks. A further study should be done to assess the putative impact of selection in these genes, both by analysing larger numbers of SNPs and by assessing variation in aquaculture stocks. This could be valuable for aquaculture by allowing breeding of disease resistant individuals.
To conclude, three genetically distinguishable populations with high genetic diversity were observed along the Bangladesh coastline as described above, which can be considered as separate management units. The Sundarban mangrove forest region can be considered as an important nursery ground and could be managed as a separate unit, as postlarvae from both the Barguna coast and Middle ground of the Bay of Bengal share this nursery ground. The southernmost population (St Martin's Island) should be considered as a separate unit for management. The study did not identify any signs of selective effects on wild P. mondon populations that could have resulted, for example, from severe viral diseases such as WSSV. This could suggest that the negative effects of pathogens are mainly restricted to conditions in aquaculture or that more extensive sampling is needed to detect the signs of selection.