Clinically significant findings of high-risk mutations in human SLC29A4 gene associated with diabetes mellitus type 2 in Pakistani population

Abstract This study conducted an in-depth analysis combining computational and experimental verifications of the deleterious missense mutations associated with the SLC29A4 protein. The functional annotation of the non-synonymous single nucleotide polymorphism (nsSNPs), followed by structure-function analysis, revealed 13 single nucleotide polymorphisms (SNP) as the most damaging. Among these, six mutants P429T/S, L144S, M108V, N86H, and V79E, were predicted as structurally and functionally damaging by protein stability analysis. Also, these variants are located at evolutionary conserved regions, either buried, contributing to the structural damage, or exposed, causing functional changes in the protein. These mutants were further taken for molecular docking studies. When verified via experimental analysis, the SNPs M108V (rs149798710), N86H (rs151039853), and V79E (rs17854505) showed an association with type 2 diabetes mellitus (T2DM). Minor allele frequency for rs149798710 (A > G) was 0.23 in controls, 0.29 in metformin responders, 0.37 in metformin non-responder, for rs151039853 (A > C) was 0.21 in controls, 0.28 in metformin responders, 0.36 in metformin non-responder and for rs17854505 (T > A) was 0.20 in controls, 0.25 in metformin responders, 0.37 in metformin non-responder. Hence, this study concludes that SLC29A4 M108V (rs149798710), N86H (rs151039853), and V79E (rs17854505) polymorphisms were associated with the increased risk of T2DM as well as with the increased risk towards the failure of metformin therapeutic response in T2DM patients of Pakistan. Communicated by Ramaswamy H. Sarma


Introduction
Type 2 diabetes mellitus (T2DM) is characterized when pancreatic b-cells become defective in insulin secretion or become insulin resistant (Stumvoll et al., 2005). T2DM is a metabolic disorder that is a heterogeneous and multi-factorial disease. Both environmental and genetic factors contribute to the pathogenesis of T2DM (Brunetti et al., 2014). Pakistan is the 36 th largest country, and globally, Pakistan stands at position 6th in the prevalence of T2DM. The frequency of T2DM in Pakistan in the year 2000 was assessed to be 5.2 million, which was anticipated to be raised to 13.9 million by the year 2030 (Wild et al., 2004). The mean prevalence of type 2 diabetes mellitus in Pakistan was 11.77%, wherein females' prevalence was 9.19%, and in males was 11.20% (Shu et al., 2007).
Metformin is the most commonly used drug against diabetes, sold as Glucophage, the only biguanide drug available. Although metformin is widely used, it still has different effects on different individuals. Clinical trial studies have shown that more than one-third of individuals getting metformin monotherapy fails to attain satisfactory control levels of fasting glucose. The fundamental cause for the lack of response to metformin in T2DM individuals may be due to changes in genes involved in drug pharmacokinetics and pharmacodynamics (Sissung et al., 2012).
It has been considered the best choice that reduces Hemoglobin A1c (HbA1c) level without causing any hypoglycemia in individuals (Nathan et al., 2006). It is a hydrophilic molecule, and the movement of metformin in the intestine, liver, and kidney is facilitated by several transporters, including plasma membrane monoamine transporter (PMAT). The passive distribution of metformin is restricted by its low lipid solubility across cell membranes (Graham et al., 2011).
Genetics plays a crucial part in the diagnosis at early stages, precise personal treatment, and how to monitor complex diseases, including T2DM (Ali, 2013). Genetic polymorphisms in the genes that encode their transporters directly affect the metformin pharmacokinetics and its therapeutic response (Vitarani et al., 2019). Besides, metformin's pleiotropic effects give a new perspective for both descriptions of its inter-individual differences in response and the explanation of novel drug targets.
Researchers have lately identified significant associations between single nucleotide polymorphisms (SNPs) and the incidence of T2DM. Some studies have explored the effects of variations in genes organic cation transporter 1 (OCT1), organic cation transporter 2 (OCT2), human multidrug and toxin extrusion (MATE1) (Tarasova et al., 2012), human multidrug and toxin extrusion (MATE2) (Raj et al., 2018), and plasma membrane monoamine transporter (PMAT) (Dawed et al., 2019), on the patients of T2DM treated with metformin. Genetic polymorphisms in PMAT exhibited an association with decreased metformin absorption (Zhou et al., 2007). PMAT (SLC29A4) was first recognized from public genomic databanks as the 4 th member in the mammalian SLC29 gene family, which encodes equilibrative nucleoside transporters (ENTs) molecularly and functionally different from the OCTs. PMAT is a poly-specific organic cation transporter, and its function is to transport many xenobiotic cations and biogenic amines, including Metformin. SLC29A4 gene encodes PMAT, is located at chromosome 7p22.1. It has 12 exons and has 530 amino acid residues with 10-12 transmembrane segments. In the intestine, the absorption of metformin is generally carried out by PMAT. This transporter is expressed on the luminal side of enterocytes. However, no in-vivo studies have been performed on PMAT function and metformin's pharmacological effect (Gong et al., 2012;Semiz et al., 2013).
This study has identified the high-risk SNPs in PMAT associated with Type 2 diabetes mellitus by utilizing both in-silico and experimental methods. The deleterious missense SNPs that are disrupting the structure and function of the protein were studied. The protein stability changes and analysis of amylogenic regions were explored in particular. This study facilitates finding the effect of mutations on the structure and function of PMAT. To the best of our knowledge, this is the first study that conducted a comprehensive analysis of non-synonymous SNPs of PMAT protein to find a suitable treatment for diabetic patients.

High-Risk single nucleotide polymorphisms dataset
The amino acid sequence of PMAT (Q7RTT9) was downloaded from UniProt. Total 514 single nucleotide polymorphism (SNPs) were collected from dbSNP (Short Genetic Variations) https://www.ncbi.nlm.nih.gov/projects/SNP/ (Dayem Ullah et al., 2012). First, the non-synonymous SNPs were separated from synonymous ones by utilizing SNPnexus that outputs the combinatorial results from SIFT and PolyPhen. These variants were further subjected to highrisk SNP analysis.

PROVEAN
PROVEAN (Protein Variation Effect Analyzer) is a sequencebased tool that predicts variants' effect on protein function. The threshold of À0.5 is set, if the variants are below the threshold, the SNPs are categorized as deleterious, and if it's above the threshold, the variants are labeled as neutral. The tool is accessible at http://provean.jcvi.org/index.php (Choi & Chan, 2015).

Prediction of evolutionary conserved residues
The evolutionary conservation of amino acids was predicted using the ConSurf server https://consurf.tau.ac.il/ (Glaser et al., 2003). The server generates multiple sequence alignment of the queried protein sequence and computes the Bayesian algorithm's conservation score.

Prediction of Disease-Related SNPs
To predict the SNPs as disease-associated or neutral, the Meta-SNP predictor was used. This tool combines four predictors, namely PANTHER, PhD-SNP, SIFT, and SNAP. The tool is accessible at http://snps.biofold.org/meta-snp/index.html (Manning et al., 2011). The majority voting of the four tools will label a mutation as diseased or neutral.

3D Structure modeling of PMAT
The 3D structure of PMAT was predicted by using homology modeling server I-TASSER. It takes in the FASTA sequence as input and predicts the 3D structure of the protein. The server can be accessed here, https://zhanglab.ccmb.med.umich.edu/ I-TASSER/ (Yang et al., 2015). Further, the mutant structures were also generated from the PYMOL mutagenesis wizard. Each mutant is superimposed with the wild-type structure and is visualized using USCF Chimera. Further, the energy minimization of the mutant models was generated using the ModRefiner. The respective RMSDs were generated to analyze how well the mutant models deviate from the wild type.

Protein stability changes upon mutation
For analyzing the effect of protein stability changes, we used the Mutation Cutoff Scanning Matrix (mCSM), which relies on a graph-based structure. The server takes in protein structure and mutations list as input and outputs the mutations as stabilizing or destabilizing based on the scores. The tool can be accessed at http://biosig.unimelb.edu.au/mcsm/ (Pires et al., 2014). This is a graph-based method that computes distance patterns around the mutant residues. The method calculates the difference of Gibbs free energy, which then labels a mutation as destabilizing or neutral.

Prediction of amyloidogenic regions
Aggregation of incorrectly folded proteins is the cause of various diseases. The missense variants result in the formation of the amyloid aggregates and disrupt the protein function. We used the FoldAmyloid server to predict part of the sequence that is involved in aggregation. The server is available at http://bioinfo.protres.ru/fold-amyloid/ (Garbuzynskiy et al., 2010).

Predicting Protein-Drug interactions using ClusPro and AutoDock Vina
To predict the PMAT-metformin interactions computationally, we have employed the ClusPro web server with the default settings. The wild-type PMAT structure and the mutant models were docked with metformin to compute the difference in docking energies. ClusPro generated 10 different docked poses with the scores for each experiment that determines the binding affinities of the docked molecules. The server can be accessible at https://cluspro.bu.edu/ (Ngan et al., 2012).
Further, to validate our docking protocol, we have utilized AutoDock Vina (Kozakov et al., 2017). The preprocessing steps include preparing ligand and receptor files in pdbqt format via the graphical user interface program AutoDock Tools (ADT). ADT adds Kollman charges polar hydrogens, solvation parameters, and fragmental volumes to the protein.
The AutoGrid was used to set the grid size, which we fix to 60 Â 60 Â 60 xyz points in this case. The protein and ligand were considered rigid while performing the docking. Further, the docked poses were selected based on the lowest binding energy or binding affinity.

Molecular dynamics simulations using CABS-Flex webserver
The molecular dynamics simulations provide a detailed analysis of the conformational changes of the protein by plotting Root Mean Square Fluctuations (RMSF). CABS-Flex uses CABS coarse-grained model for simulations. The input protein structure is represented in a coarse-grained representation with a single residue that can be represented by four different atoms. The wild-type protein and the mutant models were provided as an input to the CABS-Flex server, and RMSF was analyzed to observe the difference in the models in terms of conformational flexibility. The server can be accessed at http://biocomp.chem.uw.edu.pl/CABSflex2/ (Trott & Olson, 2010).

Experimental validation of in silico results by ARMS-PCR
We have designed an experimental analysis to provide sufficient justification for the results obtained from in-silico analysis. The protocol is described below.

Ethics statement
The endocrinology department of PIMS-Pakistan approved the present study. All the patients were informed before blood sample collection, and written consent was obtained from all the participants. All the clinical information gathered from the participants was kept confidential.

Study population and design
This was a case-control study. A total of 1200 unrelated individuals, including 800 clinically diagnosed T2DM individuals and 400 ethnically matched healthy individuals, were enrolled in this study as per their permission. The sample size was calculated by using an online calculator (Kuriata et al., 2018) by considering confidence level 95% and confidence interval (CI) in line with the literature (Creative Research Systems, 2017). Observed mean ages of healthy controls and T2DM patients were 43.55 ± 14.00 and 56.04 ± 12.86. There were 61.5% males in healthy controls and 58.63% in T2DM, whereas 38.5% females in healthy controls and 41.37% in T2DM. The T2DM patients were further categorized into two groups of 400 T2DM patients based on HbA1c level. T2DM patients with HbA1c level 7.0% were considered metformin responders and T2DM patients with HbA1c level >7.0% were considered metformin non-responders. American Diabetes Association (ADA) guidelines were followed to categorize T2DM patients into metformin responders and metformin non-responders. All T2DM patients were clinically diagnosed by the diabetologist of Pakistan Institute of Medical Sciences (PIMS) hospital, Pakistan-Islamabad. Individuals who failed to match the drugs criteria were disqualified from the study.

Blood collection and DNA extraction
Venous blood of all the involved patients and controls individuals were collected in 5 mL EDTA (ethylenediaminetetraacetic acid) vacutainers. The blood was collected from the individuals from the year 2015 to 2016.
A standard phenol-chloroform method was used with minor alterations to extract DNA from WBCs (Kirby et al., 2002). 750 lL of blood was added into a tube with solution A. After the incubation of 10 min, tubes were centrifuged. 400 lL of solution A was added again, and tubes were centrifuged. Resuspension of the nuclear pellet was obtained by adding solution B, 20% sodium dodecyl sulphate (SDS), and proteinase K. The tube was then kept in a shaking incubator at 37 C for overnight. The next day fresh mixture of equal volumes (1:1) of solution C and solution D was added into the tube. After centrifugation, the upper aqueous phase was collected in a labelled tube. Followed by the addition of solution D to the aqueous layer, it was again subjected to centrifugation. The aqueous layer was collected in a properly labelled fresh tube. Sodium acetate and ethanol were added to precipitate DNA by inverting tubes several times. Once again, the tube was centrifuged, and the supernatant was wasted. To the DNA pellet, 70% ethanol was added. The mixture was centrifuged, and ethanol was discarded. The centrifuge tube was blot dried and then air-dried for 2-3 min. 100 lL of distilled water was added to the DNA pellet. The extracted DNA was successively examined on 1% agarose gel, as shown in Figure S1.
Primer designing, genotyping, and gel electrophoresis Primers were designed by using Primer 3 software, and specificity was checked by Primer-BLAST. A total of 800 T2DM individuals and 400 healthy control samples were amplified for the target genes using a specific set of primers listed in Table S1. A single reaction of PCR was done in a 20 lL reaction mixture. To prepare a reaction mixture of PCR, 0.2 mL tubes (Axygen, California, USA) were used. A reaction mixture consisted of 2 lL of sample DNA (50 ng/lL), 2 lL of each forward and reverse primers (20 pm/lL), 6 lL of PlatinumV R PCR SuperMix, and 8 lL of PCR water. It was then thoroughly mixed via centrifugation at 8000 x g for 30 sec, and later tube was gently tapped to remove any air bubbles present in the mixture.
A 2720 thermal cycler (Applied Biosystem, Foster City, USA) was used to further process the reaction mixture through thermocycling conditions. Thermocycling conditions were set as follows; denaturation of template DNA at 95 C for 5 min and then 35 cycles of PCR amplification. A PCR cycle further is divided into three steps including; 1) denaturation: template DNA is melted to single strands at 94 C for 30 secs, 2) annealing: at 57 C for 30 secs to allow the annealing of the primers to their respective target sites on DNA and 3) extension: at 72 C for 1 min and 20 s for extension of the complementary DNA strand from the annealed primers. These 35 cycles were followed by synthesizing any nonextended strands by Taq polymerase for 10 min at 72 C. Products by the presence or absence of bands specific for wild or mutant primers in each well were evaluated using a UV transilluminator.

Haplotype and linkage disequilibrium analysis
The haplotype frequency was estimated using SNPStats, which uses the expectation-maximization (EM) algorithm for three SNPs. R software and genetics packages were used for linkage disequilibrium analysis.

Statistical analysis
Data were analyzed using IBM SPSS Statistics 20.0. The direct gene count method was used to calculate the genotypic and allelic frequencies. Hardy-Weinberg equilibrium test was used to compare the actual genotypes with the expected number based on the Hardy-Weinberg equilibrium theory (p ¼ allele frequency, q ¼ 1-p, p2 þ q2 ¼ 1) in controls. The Hardy-Weinberg principle states that a population's allele and genotype frequencies will remain constant in the absence of evolutionary mechanisms. According to the Hardy-Weinberg principle, the variable p often represents the frequency of a dominant allele, and the variable q represents the recessive allele frequency. The sum of the frequencies must add up to 1, or 100 percent.
We used descriptive statistics with the mean ± SD in groups. To compare the differences between continuous variables, the student's t-test or the Mann-Whitney test was done. The difference in genotypic and allelic frequencies of the polymorphisms was analyzed between healthy individuals and T2DM patients, metformin responder, and nonresponder individuals using Fisher's exact test. With the adjustment of age and gender, multinomial logistic regression was applied to calculate the odds ratios (ORs) and 95% confidence intervals (CIs). To analyze the SNPs, five multinomial logistic regression models (co-dominant, dominant, recessive, over-dominant, and additive) were also used. To find % difference, the mean ± SD was calculated by the student's t-test and variables significance was checked by applying the ordered Probit econometric technique because the dependent variables of the model were categorical. We estimate it by using Stata software. Haplotypes were generated from the genotyped data. The haplotype frequency was calculated using SNPStats, which uses the expectation-maximization (EM) algorithm for two SNPs. R software and Genetics package was used for Linkage disequilibrium analysis. p < 0.05 was considered significant.

High-Risk SNP dataset
The non-synonymous substitutions were filtered from synonymous ones from SNPNexus. Out of 516 SNPs, 21 were categorized as missense variants. These variants were subjected to PROVEAN, which further classified 14 SNPs as deleterious. Next, these variants were further passed through Meta-SNP to predict if these SNPs were disease-associated or not. The tool predicted 12 SNPs as disease-causing. The method takes the average voting of 4 in-built tools, namely PHD-SNP, PANTHER, SIFT, and SNAP, to label a SNP as diseased or neutral. The results are tabulated in Table 1.

Evolutionary conservation analysis
The ConSurf web server was used to analyze evolutionarily conserved residues. The mutants, N123S, N86H, F234I, P429T, and P429S, were predicted as highly conserved and exposed at the protein's surface, while two mutants G461S and A471V, were predicted as highly conserved and buried in the core of the protein. The buried amino acids usually contribute structurally to the protein; thus, a mutation might cause structural damage to the protein. While the mutants predicted as exposed are mostly functional therefore will disturb the function of the protein, which in turn leads to disease formation. Figure 1 highlights the mutant amino acids with their structural and functional impact on the protein.

3D Structure prediction using I-TASSER
The 3D structure of PMAT was generated from the I-TASSER homology modeling tool. The model is generated by using the top 10 threading templates with an average sequence coverage of 67%. The details of the templates and alignment are provided in Figure S2. The model was analyzed using phi and psi distributions of the Ramachandran plot, which showed that the highly preferred observations are 85.9 ( Figure S3). Further, the PyMol was utilized to generate mutant models of PMAT, which were then superimposed and analyzed by the UCSF Chimera 1.11 visualization tool (Mahjabeen et al., 2013). All the mutant models were superimposed with the native structure, and the deviation was calculated using RMSD as figured in Figure S4. ModRefiner was used to perform the energy minimization of the modeled structures. The respective RMSDs calculated for wild-type and mutant structures are tabulated in Table1; only those mutants have tabulated whose calculated RMSD's are significantly different from the wild-type. The wild-type 3D structure and its mutant models are figured in Figure 2.

Analyzing protein stability changes
The protein stability changes were predicted from mCSM. The difference of Gibbs's free energy between the folded and unfolded state of the proteins indicates the stability of the protein. More positive this value means more stable the protein is. The predicted scores of the SNPs determine the SNP as destabilizing or neutral. All the negative DDG scores produced from mCSM were labelled as destabilizing, while the positive values were labelled as stabilizing proteins. Considering this, 13 mutants are classified as destabilizing, while M108V, L144S, P429T, and P429S were predicted as highly destabilizing. The results are tabulated in Table 1.

Analysis of amylogenic regions
The protein aggregates analysis showed eight mutants N86H, M108V, N123S, I218T, F234I, R256C, P429T/S, and L430F that were found in the aggregation profile. This implies that mutations are evidently disrupting the helix formation, which ultimately results in the aggregation of beta-sheets. Amyloids are associated with various diseases, including neurodegenerative disorders, diabetes, and prion diseases.  shows the graphical representation of amylogenic regions predicted from the FoldAmyloid server. Few of the mutations are also occurring in the amylogenic region, as figured in Figure S5.

Predicting protein-drug interactions using ClusPro and AutoDock vina
To determine the structural effect of the mutations, we have carried out blind docking of PMAT with Metformin using ClusPro. The interactions between the protein and ligand were analyzed by computing the buried surface area (BSA), which we considered as a strength of a bound complex. The BSA was computed for wild-type interactions and for each mutant model interaction with the ligand. The protein-ligand bound complex with the highest BSA is considered as the most stable compound. Among wildtype and all the mutant models, the wild-type interactions showed the highest BSA hence expected to be stable with the mutations effecting the stability of the complex. The results are tabulated in Table 2 and figured in Figure 4. Further, we have employed Auto Dock Vina for validating the docking interactions obtained from ClusPro. The docking routine returns 10 docked poses, out of which we have selected the best-docked poses based on lower binding affinities. To analyze the interaction patterns between PMAT and Metformin, LigPlot þ was utilized (Laskowski & Swindells, 2011). The 2D plots were drawn, and the binding residues were observed in the wild-type complex and the mutant complexes. The 2D plot of the wild-type illuminates the presence of hydrogen bonds at a distance of 3.13 Å whereas, the spiked atoms show the hydrophobic contacts. The binding pocket shows the presence of different residues in the wildtype compared to the mutants; hence might be affecting the protein-ligand interactions. The results are figured in Figure 4 and in supplementary Table S2 that shows the 2D interaction profile of the protein-ligand interaction and the 3D interaction plots which we have generated from the PLIP server (Pettersen et al., 2004). PLIP is a fully automated tool that determines seven contacts of protein-ligand interactions including hydrogen bonds, hydrophobic contacts, pi-stacking, pi-cation interactions, salt bridges, water bridges, and halogen bonds.

Molecular simulations using CABS-Flex 2.0
For the current study, we have utilized the CABS-Flex server to carry out simulations of the wild-type docked complex and the mutant model docked complexes to all atom dynamics molecular algorithms. The RMSF plots generated are figured in Figure 5. The plots show the RMSF range in the wild-type protein model from 0.02 nm to 2.44 nm. The high fluctuating residues can be seen around the residues 302-352. The center residues of all the protein structures show the highest fluctuations. Compared with the mutant models, the RMSF plot shows high fluctuating residues in all mutant complexes, especially in N86H, V79E, M108V, and L144S. This indicates that the wild-type structure is more stable as compared to the mutant ones. Thus, these mutations decreases the stability of the protein and can be correlated with the mCSM results. Genotyping of SLC29A4 SNP's M108V (rs149798710), N86H (rs151039853) and V79E (rs17854505) All subjects (controls, metformin responders, and metformin non-responders) were genotyped for three polymorphisms; rs149798710, rs151039853, and rs17854505, located in exon 4 and exon 3 of the SLC29A4 gene, respectively. All the possible combinations of alleles were observed in the study subjects. All three SNPs were in HWE in each group. Minor allele frequency for rs149798710 was 0.23 in controls, 0.29 in metformin responders, 0.37 in metformin non-responder, for rs151039853 was 0.21 in controls, 0.28 in metformin responders, 0.36 in metformin non-responder and for rs17854505 was 0.20 in controls, 0.25 in metformin responders, 0.37 in metformin non-responder. A statistically significant difference was observed between groups in the genotypic and allelic frequency (p < 0.05), representing that all studied SNPs significantly affect T2DM in the Pakistani population.

Controls versus patients
In the case of the first selected SNP of SLC29A4, A > G (rs149798710), the carriers of one mutant allele 'G' (AG) and homozygous GG was higher in metformin responders and non-responders as compared to healthy controls. Genotyping of the second selected SNP A > C (rs151039853) of SLC29A4 revealed that frequencies of heterozygous mutant AC and homozygous mutant CC genotypes along with the C allele were significantly higher in metformin responders as compared to controls. For controls and metformin non-responders, the same results were noted. The third SNP of SLC29A4 T > A (rs17854505) showed that the frequency of TA and AA genotypes were higher in metformin responders as compared to controls, but no significant association was observed for the dominant model between controls and metformin responders (p > 0.05). However, for the additive model, a significant association was observed. Furthermore, when the analysis was done between controls and metformin non-responders, homozygous of one mutant genotype TA and homozygous of mutant genotype) was less in controls than in metformin non-responders (Table S3). The frequency of mutant 'A' allele (AA) was higher in metformin non-responders than controls.
The genotype frequencies of all three SNPs were further analyzed by four genetic models: dominant, over-dominant, recessive model, and additive model, as shown in table S3.  (Table S3).

Comparison between patients groups (metformin responders and non-responders)
For third SNP of SLC29A4 T > A (rs17854505), the heterozygous TA genotype (37.0% vs. 48.0%; OR ¼ 1.89; 95%CI ¼ 1.4107 to 2.5484; p < 0.0001) and homozygous AA genotype (6.0% vs. 13.0%; OR¼ 3.16; 95%CI ¼ 1.8736 to 5.3521; p < 0.0001) of rs17854505 were significantly less among metformin responders than those who failed to respond. Patients with T2DM that were homozygous for wild allele 'T' Figure 3. The amylogenic residues predicted from the FoldAmyloid server depict the residues involved in the aggregation profile responsible for amyloid formation and aggregation. The residues are highlighted in purple color.
(TT) had 3 times more probability to respond metformin monotherapy. Same pattern was noticed when studied under other penetrance models (    6 months. The mean (SD) of metformin daily dose required in metformin responders and non-responders in the starting 6 months was 1000 mg.
Change in the study parameters from baseline to 6 months of metformin therapy according to the genotypes Comparisons of differential values in metformin responders and non-responders with different SLC29A4 variants rs149798710, rs151039853, and rs17854505 genotypes before and after metformin treatment are presented in Table S5.

Linkage disequilibrium analysis
All three SNPs are located at the same chromosome in proximity; the likelihood of linkage disequilibrium (LD) between the SNPs (rs149798710, rs151039853, and rs17854505) was studied. In LD, the D' represents the coinheritance of the rare allele; when it is present, it is always inherited with one particular allele of the 50% polymorphism, whereas the r2 Table 3. Summary of a case-control study examining haplotypes of SLC29A4 SNPs (rs149798710, rs151039853, and rs17854505) and T2DM risk in metformin responders. Abbreviations: G ¼ Guanine; A ¼ Adenine; T ¼ Thymine; C ¼ Cytosine, bp ¼ base pairs, OR ¼ odd ratios, CI ¼ Confidence Interval. P-value <0.05 was considered as significant.
.05 was considered as significant. represents if it is a rare allele. The LD value close to > 0.8 indicates that SNPs are in strong LD and coinherited. D (measures the deviation of haplotype frequencies from expected values based on gene frequencies), D', r, and p values are mentioned in Figure 6. The r2 (correlation coefficient) of Pakistani population were 0.81, 0.84 and 0.92, respectively (Figure 7). The high LD and r2 values show that these SNPs are coinherited, and the mutant allele is rare. However, these SNPs do not contribute individually to association studies since these are in complete linkage disequilibrium.

Discussion
T2DM is one of the most common chronic metabolic diseases within developing countries (Olokoba et al., 2012). Predominantly, the drug "Metformin" is the most common glucose-reducing drug administered orally (Scarpello & Howlett, 2008;Viollet et al., 2012). Despite the emergence of new drug targets and augmented therapies for the treatment of diabetic patients, patient care still remains insufficient and not so promising because of the various side effects of the drugs used to treat the disease symptoms or disease, in general. To date, there has been the identification of nine classes of drugs employed for diabetes treatment in T2DM patients explicitly. Of all these, metformin has been considered the first line of treatment globally (Hu et al., 2012;Thule, 2012). In this study, the mutations in PMAT were systematically analyzed by combining sequence and structure based computational methods with further experimental verification analysis. Firstly, the SNPs were functionally annotated by various bioinformatics tools; out of 516 SNPs obtained from dbSNP, 13 SNPs were predicted as damaging/deleterious. These were further passed to evolutionary conservation analysis, which showed that the variants N86H, F234I, G461S, P429T/S as highly conserved and exposed on the surface of the protein have a functional role. Further, the protein stability analysis showed the substitutions N86H, V79E, L144S, M108V, and P429T/S as highly destabilizing and can affect the protein's structure and function. Moreover, these mutants were also found in the aggregation profile, which shows that the mutations affect the alpha-helices formation, which results in the aggregation of beta-sheets. The aggregation of proteins is associated with a wide range of diseases, including diabetes (Cheng et al., 2012;Mukherjee et al., 2015). Further, the mutant models of PMAT were also generated using PyMOL and subjected to RMSD analysis to identify how deviated the mutants are from the wild-type structure. The results revealed L144S, P429T/S, M108V, N86H, and V79E as highly deviated with the RMSD values >1. These mutations were then analyzed by docking studies using the ClusPro web server and later validated with AutoDock Vina software. The computed buried surface area of the mutant models N86H, M108V, V79E, and L144S, showed a significant decrease in BSA compared to the wildtype complex hence contributing to affecting the protein-ligand interactions. The 2D and 3D plots generated from LigPlot and PLIP servers also confirmed the activity of different residues in the binding pocket of wild-type PMAT and the mutant models. Later, the simulation analysis carried out by CABS-Flex determined the root mean square fluctuations with mutant models showing the highest fluctuations compared to the wild type hence contributing to disrupting the conformational flexibility of PMAT. Earlier, few studies have reported this fluctuation difference in protein native and mutant structures, which can shed light on the stability  . Linkage disequilibrium (LD) plot for 3 SNPs in the region of SLC29A4 gene was produced by R software and genetic packages. Each colored rectangle represents the squared correlation r 2 between a pair of SNPs (specified by LD. measure¼ "r"). Physical map locations of the 3 SNPs provide information on their relative positions, which are indicated on the diagonal line by line segments, and the total length of the genetic region indicated by the text ("Physical Length:0.3 kb"). The SNPs rs149798710, rs151039853, and rs17854505 show strong LD (R^2 value of > 0.8 as shown in color key). These SNPs are located within the 0.3 kb distance in the gene.
These mutations were also further analyzed by the HOPE web server, which provides an in-detail description of the mutations affecting the structure-function of PMAT. For P429T/S, the wild-type residue is more hydrophobic than the mutant residue. These differences in hydrophobicity can affect the hydrophobic interactions with the membrane lipids. Further, the wild-type residue is proline. Prolines are known to be very rigid and therefore induce a unique backbone conformation, which might be required at this position. The mutation can disturb this special conformation. In L144S, the size and the hydrophobicity difference between the wildtype and mutant residue can affect the hydrophobic interactions with the membrane lipids.
The results of V79E showed a difference in charge between the wild-type and mutant amino acid. The mutation introduces a charge; this can cause the repulsion of ligands or other residues with the same charge. The hydrophobicity of the wild-type and mutant residue differs. Hydrophobic interactions, either in the core of the protein or on the surface, will be lost. As was observed in the docking studies, which further adds to our findings. The analysis of N86H showed that the mutant residue is located at a highly conserved position. The size difference between the two can affect the contact with the lipid membrane. For variant M108V, the mutants are located near a highly conserved position. The mutant residue is also smaller; this might lead to loss of interactions (Venselaar et al., 2010).
In light of these results, it can be inferred that with 13 SNPs predicted as deleterious among them, six mutants P429T/S, L144S, M108V, N86H, and V79E were found as most damaging to the structure and function of the protein with L144S, M108V, N86H, and V79E disrupting the protein-ligand bound complex. These mutants were seen as highly deviated from their native structures and predicted as destabilizing to the protein structure. The mutants P429T/S were predicted as highly conserved residues; hence mutation in these positions can cause damage to the function of the protein. M108V was found in the aggregation profile. Further, V79E and N86H were also decreasing the stability of PMAT as predicted by mCSM. The N86H was also predicted as highly evolutionary conserved and is also associated with beta-sheet aggregation. Hence, it could be one of the causative agents of diabetes. Its role should be explored further for exploring its association with disease in detail.
The results obtained from in-silico analysis were further verified experimentally. This was a case-control study. Polymorphisms in the gene SLC29A4 could result in loss-offunction or gain-of-function mutations resulting in the efflux transporters' altered function. PMAT (SLC29A4) was first identified as the 4th member of the mammalian SLC29 gene family from public genomic databases, which encodes molecular and functionally specific equilibrium nucleoside transporters (ENTs) from OCTs family of transporters. This transmitter is located on the enterocyte's luminal side. Genetic polymorphisms in PMAT are associated with reduced absorption of metformin (Zhou et al., 2007). Currently, however, there have been no reports published about the pharmacological impact of metformin on the function of PMAT.In our present study, we have analyzed three variants and established a significant association with SLC29A4 variants M108V (rs149798710), N86H (rs151039853), and V79E (rs17854505), and Metformin clinical efficacy.
The missense mutation, polymorphism M108V A > G (rs149798710), arises in exon 4, allowing methionine to be transformed into valine at amino acid position 108. To date, no literature analysis on this SNP has been published so that the data generated is novel in its orientation. SLC29A4 rs149798710 polymorphism decreased HbA1c levels upon metformin treatment such that patients with the SLC29A4 rs149798710 AA genotype showed more significant reduction compared to those having AG or GG genotype. It might be due to the fact that the AA genotype enhances response to metformin which, in turn, causes a decrease in HbA1c values in these patients. Likewise, after 6 months of metformin treatment, the decrease in HbA1c was more significant in AA homozygotes than patients with the AG þ GG genotype. The effect of SLC29A4 rs149798710 polymorphism on FBG and 2HPP also that AA genotype patients had better glucose lowering effect as well as improved lipid profiles after 6 months follow-up. Nonetheless, replication studies in a relatively large population in a multi-ethnic diabetic cohort are needed to confirm the SLC29A4 rs149798710 polymorphism effect on metformin efficacy both in glycaemic control and lipid profiles.
The N86H A > C (rs151039853) missense mutation occurs in exon 3, causing asparagine to be transformed into histidine at amino acid position 86. No previous data were published in the literature on this SNP; therefore, the produced information is novel in their orientation. SLC29A4 rs151039853 polymorphism has been found to reduce HbA1c upon treatment with metformin. This reduction was more in patients with the SLC29A4 rs151039853 AA genotype than those with AC or CC genotype. Furthermore, after 6 months of metformin therapy, the decrease in HbA1c was greater in AA homozygotes compared to patients with the AC þ CC genotype. The effect of SLC29A4 rs1510398 polymorphism on FBG and 2HPP showed that AA genotype patients had better glucose lowering effect and improved lipid profile after 6 months follow-up in T2DM. Replication studies in a relatively large population in a multi-ethnic diabetic cohort are needed to confirm the SLC29A4 rs151039853 polymorphism effect on metformin efficacy both in glycaemic control and lipid profiles.
The missense mutation, polymorphism V79E T > A (rs17854505), arises in exon 3, triggering the transformation of valine to glutamic acid at amino acid position 79. To date, no literature on this SNP has been published so that the data generated is novel in its orientation. It was observed that SLC29A4 rs17854505 polymorphism decreased HbA1c values upon treatment with metformin. This reduction, however, was more significant in individuals exhibiting SLC29A4 rs17854505 TT genotype than those carrying TA or AA genotype. Comparably, after 6 months of metformin treatment, this decrease in HbA1c was greater in AA homozygotes compared with patients having TA þ AA genotype. In contrast, TT genotype patients had better glucose lowering effects and improved lipid profiles. Replication studies in a relatively large population in a multi-ethnic diabetic cohort are needed to confirm the SLC29A4 rs17854505 polymorphism effect on metformin efficacy both in glycaemic control and lipid profiles. In the current study, these three SNPs of SLC29A4 were associated with an increased risk of T2DM and failure to respond to metformin therapy.
The results of the current study might have a practical implication in future personalized treatment of T2DM patients. However, the small sample size of the present study can be considered as a limitation of the study; therefore, more research in different ethnic groups with larger sample sizes is required to elucidate the role of M108V (rs149798710), N86H (rs151039853), and V79E (rs17854505) variants in Metformin response.

Conclusions
To the best of our knowledge, this is the first study from the Pakistani population that aims to find significant deleterious SNPs associated with PMAT. The current study has first conducted the in-silico analysis that employs various computational tools to determine the structural and functional damaging mutations, which were further validated through experimental analysis. The experimental method confirms the presence of three mutations M108V (rs149798710), N86H (rs151039853), and V79E (rs17854505) in patients. Hence, these three are more crucial in causing a structure-function impact on the PMAT protein. The mutant M108V was found in the aggregation profile, which means forming beta-sheet aggregates, which are most likely associated with Diabetes. These three SNPs are found as strong candidates in causing diseases associated with mutations in PMAT. Therefore, it can be explored further for determining its possible effects on drug discovery.