Representation of gene modeling and homeologous region identification.
(A) The genomes of several organisms were aligned to the zebrafish protein database using BLASTX. (B) The resulting partial alignments were sorted by their position along the protein sequence. (C) The partial alignments of each protein were then merged into a gene model when they were collinear (i.e., the alignments appeared in the same order along both the genome and the protein sequence). Partial alignments of a single protein were often found on multiple chromosomes. Many of these were removed because there was insufficient evidence to support a full alignment of the protein at that genomic location. (D) The gene models were then scored based on their alignment scores and the percentage of the protein represented in the gene model. If more than one gene model aligned to the same genomic location, the score was used to determine which was the better fit (both were kept if their scores were similar). (E) The gene models were then sorted and converted to GFF3 format. (F, G) Homeologous regions were identified from the density of homeologous gene models shared between genomic regions. (H) For the Atlantic salmon, regions derived from the 4R duplication were distinguished from those of earlier genome duplications by identifying the densest homeologous relationship for each region.
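Steps C and D can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hit` record, the greedy chaining rule, and the scoring formula (summed bit score weighted by protein coverage) are assumptions chosen for illustration, and only forward-strand hits on a single chromosome are considered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One partial BLASTX alignment (hypothetical record, 1-based inclusive coordinates)."""
    chrom: str
    g_start: int   # start on the genome
    g_end: int     # end on the genome
    p_start: int   # start on the protein
    p_end: int     # end on the protein
    bitscore: float


def chain_hits(hits):
    """Greedily build one gene model from hits on the same chromosome.

    A hit is appended only if it lies after the previous hit on BOTH the
    genome and the protein, i.e. the alignments are collinear (step C).
    """
    chain = []
    for hit in sorted(hits, key=lambda h: h.g_start):
        if not chain or (hit.g_start > chain[-1].g_end
                         and hit.p_start > chain[-1].p_end):
            chain.append(hit)
    return chain


def score_model(chain, protein_len):
    """Assumed score for step D: summed bit score times fraction of protein covered."""
    covered = sum(h.p_end - h.p_start + 1 for h in chain)
    coverage = covered / protein_len
    return sum(h.bitscore for h in chain) * coverage


if __name__ == "__main__":
    hits = [
        Hit("chr1", 100, 250, 1, 50, 80.0),
        Hit("chr1", 400, 550, 51, 100, 90.0),
        Hit("chr1", 600, 700, 20, 40, 30.0),  # out of order on the protein
    ]
    model = chain_hits(hits)
    print(len(model), score_model(model, protein_len=200))
```

In this toy input the third hit is dropped because it maps to an earlier part of the protein than the hit before it, so the gene model covers half of a 200-residue protein. A real pipeline would also need to handle the reverse strand and overlapping hits, and would need a rule for when two competing models count as having a similar score.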