Computational pipeline to characterize the lincRNome.

posted on 01.03.2013, 12:01 by David Managadze, Alexander E. Lobkovsky, Yuri I. Wolf, Svetlana A. Shabalina, Igor B. Rogozin, Eugene V. Koonin

The subset of orthologous lincRNAs (Kb) was obtained by comparing the genomic positions of mouse and human lincRNA genes (minimal overlap of 100 nucleotides), followed by manual inspection of the genomic alignments. This comparison yielded 196 unique orthologous pairs of human and mouse lincRNA genes (Kb). Of the 4662 human lincRNAs (Lh), corresponding alignable regions in mouse were detected for 3529. These regions were designated putative orthologs and checked for evidence of expression using RNAseq data for mouse tissues. Of the 3369 putative lincRNAs for which exon models could be determined, 2872 showed an expression level greater than zero (Kh). Similarly, the subset of mouse lincRNAs with expressed putative orthologs (Km) was identified by searching for evidence of expression in human tissues: of the 4156 mouse lincRNAs (Lm), corresponding alignable regions with an expression level greater than zero were identified in human for 3157. After applying the ORF (<120 nucleotides), indel and expression thresholds, the final results (Figure 2 and Table 1) were obtained using a Maximum Likelihood Model with Lm, Lh, Km, Kh and Kb as input parameters (shown by dashed arrows) to estimate the size of the human lincRNome (Nh), the mouse lincRNome (Nm) and the orthologous subset of the two lincRNomes (Nb). For details of the procedures see Methods.
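The positional comparison behind Kb can be illustrated with a minimal sketch. It assumes, purely for illustration, that the mouse gene coordinates have already been projected onto the human genome through the genomic alignments, and that each gene is a `(name, chrom, start, end)` tuple with half-open coordinates. The function names and record layout are hypothetical, not taken from the authors' pipeline, and the manual inspection step is not modeled.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Number of shared positions between two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def candidate_ortholog_pairs(human_genes, mouse_genes_projected, min_overlap=100):
    """Return (human_name, mouse_name, overlap) for genes overlapping by >= min_overlap nt.

    Assumption: mouse_genes_projected are already expressed in human
    genome coordinates (hypothetical preprocessing step).
    """
    pairs = []
    for h_name, h_chrom, h_start, h_end in human_genes:
        for m_name, m_chrom, m_start, m_end in mouse_genes_projected:
            if h_chrom != m_chrom:
                continue  # genes on different chromosomes cannot overlap
            shared = overlap_length(h_start, h_end, m_start, m_end)
            if shared >= min_overlap:
                pairs.append((h_name, m_name, shared))
    return pairs


# Toy coordinates, invented for the example only.
human = [("h1", "chr1", 1000, 2000), ("h2", "chr2", 500, 900)]
mouse = [("m1", "chr1", 1850, 2500), ("m2", "chr1", 1950, 2600), ("m3", "chr2", 100, 650)]
pairs = candidate_ortholog_pairs(human, mouse)
print(pairs)
```

The nested loop is quadratic in the number of genes, which is acceptable for a few thousand lincRNAs per species; an interval tree or a sorted sweep would be the usual choice for larger inputs.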


