pgen.1011074.s001.pdf (31.53 MB)

Fig A in S1 Text.

journal contribution
posted on 2023-12-18, 18:40 authored by Dinghao Wang, Deshan Perera, Jingni He, Chen Cao, Pathum Kossinna, Qing Li, William Zhang, Xingyi Guo, Alexander Platt, Jingjing Wu, Qingrun Zhang

Stability of cLD and LD revealed by closed-form variance calculation and bootstrapped resampling. The EUR population is shown. a) The gene pairs were split into four bins based on their cMAF values (< 0.05, 0.05–0.1, 0.1–0.2, and 0.2–0.4), shown on the y-axis. The x-axis is the ratio between the variances of LD and cLD. b) Probability density distributions of cLD and LD obtained from bootstrapped samples.
Fig B in S1 Text. Stability of cLD and LD revealed by closed-form variance calculation and bootstrapped resampling. The AFR population is shown. a) The gene pairs were split into four bins based on their cMAF values (< 0.05, 0.05–0.1, 0.1–0.2, and 0.2–0.4), shown on the y-axis. The x-axis is the ratio between the variances of LD and cLD. b) Probability density distributions of cLD and LD obtained from bootstrapped samples.
Fig C in S1 Text. Stability of cLD and LD revealed by closed-form variance calculation and bootstrapped resampling. The EAS population is shown. a) The gene pairs were split into four bins based on their cMAF values (< 0.05, 0.05–0.1, 0.1–0.2, and 0.2–0.4), shown on the y-axis. The x-axis is the ratio between the variances of LD and cLD. b) Probability density distributions of cLD and LD obtained from bootstrapped samples.
Fig D in S1 Text. Comparisons of cLD values between the 3D chromatin interaction regions and non-interaction regions among 13 different distance groups in (a) the whole population, (b) AFR, and (c) EAS. The European population, EUR, is displayed in Fig 3A of the main text.
Fig E in S1 Text. Comparisons of LD values between the 3D chromatin interaction regions and non-interaction regions among 13 different distance groups in (a) the whole population, (b) AFR, and (c) EAS. The European population, EUR, is displayed in Fig 3B of the main text.
Fig F in S1 Text. Comparisons of cLD values between the gene-gene interaction regions and regions without interactions among 13 different distance groups in (a) the whole population, (b) AFR, and (c) EAS. The European population, EUR, is displayed in Fig 4 of the main text.
Fig G in S1 Text. Protein docking interaction between 1FYV and 4OM revealed by cLD (0.69) with a binding affinity of -266.77 kJ/mol. a) Structure of the 1FYV (red) and 4OM (blue) protein-protein complex. b-c) 2D representation of the closest interacting residues around the protein-protein interaction interfaces, including hydrogen bonds (green dotted lines) and hydrophobic interactions (red and rose semi-circles with spikes). Residues of 1FYV are labelled with uppercase letters (A, B) and those of 4OM with a lowercase letter (a).
Fig H in S1 Text. Protein docking interaction between 3S5N and 4HND revealed by cLD (0.52) with a binding affinity of -302.64 kJ/mol. a) Structure of the 3S5N (red) and 4HND (blue) protein-protein complex. b-c) 2D representation of the closest interacting residues around the protein-protein interaction interfaces, including hydrogen bonds (green dotted lines) and hydrophobic interactions (red and rose semi-circles with spikes). Residues of 3S5N are labelled with an uppercase letter (A) and those of 4HND with lowercase letters (a, b).
Fig I in S1 Text. Protein docking interaction between 1WVA and 6HO2 revealed by cLD (0.40) with a binding affinity of -263.19 kJ/mol. a) Structure of the 1WVA (red) and 6HO2 (blue) protein-protein complex. b-c) 2D representation of the closest interacting residues around the protein-protein interaction interfaces, including hydrogen bonds (green dotted lines) and hydrophobic interactions (red and rose semi-circles with spikes). Residues of 1WVA are labelled with an uppercase letter (I) and those of 6HO2 with a lowercase letter (a).
Fig J in S1 Text. Protein docking interaction between 3L81 and 5FUR revealed by cLD (0.33) with a binding affinity of -277.36 kJ/mol. a) Structure of the 3L81 (red) and 5FUR (blue) protein-protein complex. b-c) 2D representation of the closest interacting residues around the protein-protein interaction interfaces, including hydrogen bonds (green dotted lines) and hydrophobic interactions (red and rose semi-circles with spikes). Residues of 3L81 are labelled with uppercase letters (A, B) and those of 5FUR with a lowercase letter (a).
Fig K in S1 Text. Probability Density Functions of Distributions of the Number of Rare SNVs per Gene in Low-Depth (Red) and High-Depth (Blue) Data. The median number of rare SNVs per gene is 86 in the low-depth data and 94 in the high-depth data. Although the medians differ, the distribution of the number of rare SNVs per gene is not significantly different between the two datasets. A rare SNV is defined as one with MAF < 0.005.
Fig L in S1 Text. Probability Density Functions of Distributions of cMAF in Low-Depth (Red) and High-Depth (Blue) Data in three population groups. (a) EUR population; (b) AFR population; (c) EAS population. The red line represents the low-depth data (4X coverage) and the blue line represents the high-depth data (30X coverage). The distributions of cMAF are very similar in the three populations, indicating that cLD performs similarly well in low-depth and high-depth data. The cMAF calculation uses the standard cutoff that defines rare variants as MAF < 0.005.
Fig M in S1 Text. Comparisons of cLD values between the 3D chromatin interaction regions and non-interaction regions among 13 different distance groups using high-depth data. (a) EUR, (b) AFR, and (c) EAS.
Fig N in S1 Text. Illustration of the robustness of cLD: a) original SNVs; b) simulated switching errors. a) In the original data, we have 7 individuals and 7 SNVs in the gene. The gene line at the bottom is computed through integration across SNVs. b) We simulated phasing errors in individuals 1, 2, and 3 at SNVs 6–7, 1–3, and 7, respectively. After integration, however, the gene line remains unchanged. This example illustrates the robustness of cLD against switching errors.
Table A in S1 Text. Example of MAF for single SNV.
Table B in S1 Text. Cumulative effects of SNVs in a region.
Table C in S1 Text. Cumulative genetic “allele”s for cLD calculation.
Table D in S1 Text. Example of SNVs data for cLD calculation.
Table E in S1 Text. Example of cumulated gene alleles for cLD calculation.
Table F in S1 Text. Data of two SNVs.
Table G in S1 Text. Multinomial modelling of two SNVs.
Table H in S1 Text. Multinomial modelling of two genes.
Table I in S1 Text. The quantiles of cLD values in the 3D chromatin interaction regions and non-interaction regions among 13 different distance groups in three populations.
Table J in S1 Text. EUR counts for Mantel-Haenszel test (3D-interaction, cLD).
Table K in S1 Text. EUR counts for Mantel-Haenszel test (Hi-C, LD).
Table P in S1 Text. Average cLD and LD differences under different distance groups within Hi-C regions.
Table Q in S1 Text. The number of gene pairs above and below the cutoff (the 0.5 quantile), the ratio, and the statistical tests between the interaction and no-interaction groups.
Table R in S1 Text. The number of gene pairs above and below the cutoff (the 0.5 quantile), the ratio, and the statistical tests inside and outside Hi-C regions.
Table S in S1 Text. Comparisons between the number of gene pairs with cLD values larger than the cutoffs in the whole genome and within 3D regions (only on chromosome 2).
Table L in S1 Text. List of 19 gene pairs (not reported in any databases) with large cLD values, cMAF > 0.05, and existing IDs in PDB.
Table M in S1 Text. List of candidate proteins with their respective cLD values and binding affinities. All candidates formed stable protein-protein complexes with negative binding energies.
Table N in S1 Text. From the top 10 gene pairs with the highest cLD values, we identified 20 unique genes. Fourteen of these 20 genes (70%) have been reported to be associated with ASD, including DENND4A, EFCAB5, ABI2, RAPH1, MSTO1, DAP3, ARL13B, PRB2, PRB1, ZNF276, FANCA, ADAM7, SLC26A1 and TUBB8.
Table T in S1 Text. The number of successes, success rate, and hypergeometric-distribution p-value for the DisGeNET and SFARI databases, from the top 200 to the top 2000 gene pairs.
Table O in S1 Text. List of parameters and their descriptions explaining their functionality.
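As a reading aid for the hypergeometric p-values mentioned for Table T, the following is a minimal sketch of an upper-tail hypergeometric probability, i.e. the chance of seeing at least `k` database hits among `n` selected genes when `K` of `N` background genes are in the database. The function name, the parameter names, and the toy numbers are illustrative assumptions; the description above does not state how the test was parameterized, so this is not necessarily the exact procedure used in S1 Text.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: size of the background gene set (assumed)
    K: number of background genes annotated in the database (assumed)
    n: number of genes selected, e.g. genes from top-ranked pairs (assumed)
    k: number of selected genes that are annotated ("successes")
    """
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("require 0 <= K <= N, 0 <= n <= N and k >= 0")
    upper = min(K, n)  # cannot observe more hits than annotated or selected genes
    total = comb(N, n)  # number of ways to choose n genes from N
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Toy numbers only (not taken from the supplementary tables):
# 10 background genes, 3 annotated, 4 selected, at least 2 hits.
p_value = hypergeom_upper_tail(k=2, N=10, K=3, n=4)
```

A small upper-tail probability would indicate that the selected genes overlap the database more than expected by chance under this sampling model.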

(PDF)
