figshare

TL sequence conservation and ML prediction of phenotypes from fitness data.

Posted on 2023-03-22, 17:37; authored by Bercem Dutagaci, Bingbing Duan, Chenxi Qiu, Craig D. Kaplan, Michael Feig

(A) Sequence conservation of the TL across eukaryotes (S. cerevisiae and S. pombe from yeasts, H. sapiens from mammals, A. thaliana from plants, and D. melanogaster from insects), a bacterium (T. thermophilus), and an archaeon (S. solfataricus). (B) The structure of TL residues and GTP from S. cerevisiae (PDB code: 2E2H). The TL is colored by the level of conservation among the species shown in A. (C) Fitness scores for mutants of TL residues under different conditions (see x-axis labels). The x-axis shows the different conditions, with three replicates; the y-axis shows the TL mutations, with 19 mutants for each residue. To avoid crowding, the y-axis is labeled only by TL residue number rather than by individual mutant names. For each mutant there are 21 data points with three replicates each. (D) Phenotypic landscape predicted from the fitness data. Each box is colored by phenotype; white boxes mark the WT amino acid at each position; the bottom row, labeled “Mean”, shows the average phenotype for each residue. (E) Latent space of the unsupervised VAE model based on the fitness data, with each data point colored according to its corresponding phenotype. The axes z[0] and z[1] are the first and second dimensions of the 2D latent space obtained by projecting the fitness scores with the VAE model.
