Dataset for transcription readthrough events in healthy human tissues

Version 2 2025-05-09, 07:25
Version 1 2025-05-09, 06:54
Dataset posted on 2025-05-09, 07:25; authored by Xi Chen

Transcription readthrough occurs when RNA polymerase bypasses canonical termination sites, producing elongated RNA molecules called readthrough (RT) transcripts or Downstream-of-Gene (DoG) transcripts. Although RT transcripts have been implicated in stress responses and pathological states, their roles in healthy human tissues are poorly understood. This study collected and analyzed RT events across 43 healthy human tissues, identifying 76,583 RT events from 36,335 transcripts across 11,875 genes. For each RT transcript, the dataset provides the sequence, genomic location, expression profile, and annotation of the corresponding gene. It is intended as benchmark data for further research on RT transcripts and their role in gene regulation.
The file contains data on Downstream-of-Gene (DoG) transcripts with the following column headers:

  • gene_id: Ensembl Gene ID for the gene associated with the DoG transcript.
  • chromosome: Chromosome location of the gene.
  • start_position: Start position of the gene on the chromosome.
  • end_position: End position of the gene on the chromosome.
  • strand: Strand orientation of the gene (e.g., + or -).
  • chromosome (DoG): Chromosome location of the DoG transcript.
  • dog_start_position: Start position of the DoG transcript on the chromosome.
  • dog_end_position: End position of the DoG transcript on the chromosome.
  • strand (DoG): Strand orientation of the DoG transcript.
  • sequence: Nucleotide sequence of the DoG transcript.
  • mean-geneFPKM: Average expression level (FPKM) of the gene across samples.
  • mean-dogFPKM: Average expression level (FPKM) of the DoG transcript across samples.
  • tissue: Tissue type from which the data was derived.
  • all-geneFPKM: Expression levels (FPKM) of the gene across all samples.
  • all-dogFPKM: Expression levels (FPKM) of the DoG transcript across all samples.
  • sample_ids: Identifiers for the samples included in the dataset.
  • Symbol: Official gene symbol for the DoG-associated gene.
  • Synonym: Alternative names or synonyms for the gene.
  • Description: Functional description of the gene.
  • DOG: Unique identifier for the DoG transcript.

This dataset provides comprehensive information about DoG transcripts, including their genomic locations, sequences, expression levels, and associated gene details across different samples and tissues.
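The columns listed above can be combined into simple per-transcript summaries. The following is a minimal sketch, not part of the dataset itself. It assumes the file is tab-separated and that coordinates are 1-based and inclusive, neither of which is stated here. The sample rows, the helper names (load_rows, dog_length, dog_to_gene_ratio), and the DoG-to-gene expression ratio are illustrative assumptions; only the column names come from the description.

```python
import csv
import io

# Invented rows for illustration only; they are NOT taken from the dataset.
# The tab-separated layout is an assumption about the downloaded file.
SAMPLE = (
    "gene_id\tstrand\tdog_start_position\tdog_end_position\t"
    "mean-geneFPKM\tmean-dogFPKM\ttissue\n"
    "ENSG_EXAMPLE_1\t+\t1001\t3000\t10.0\t2.5\tliver\n"
    "ENSG_EXAMPLE_2\t-\t500\t1499\t0.0\t0.4\tlung\n"
)


def load_rows(handle, delimiter="\t"):
    """Read the table into a list of dicts keyed by column header."""
    return list(csv.DictReader(handle, delimiter=delimiter))


def dog_length(row):
    """Length of the DoG region, assuming 1-based inclusive coordinates."""
    start = int(row["dog_start_position"])
    end = int(row["dog_end_position"])
    return abs(end - start) + 1


def dog_to_gene_ratio(row):
    """Mean DoG FPKM divided by mean gene FPKM (hypothetical summary).

    Returns None when the gene's mean FPKM is zero, to avoid dividing by zero.
    """
    gene = float(row["mean-geneFPKM"])
    dog = float(row["mean-dogFPKM"])
    return dog / gene if gene > 0 else None


rows = load_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row["gene_id"], row["tissue"], dog_length(row), dog_to_gene_ratio(row))
```

To run this against the real file, replace io.StringIO(SAMPLE) with an open file handle and adjust the delimiter if the file turns out to be comma-separated.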

Funding

Zhejiang Provincial Natural Science Foundation (LQZQN25H250003)
