Human and Mouse UTRomes

dataset
modified on 2024-04-01, 02:14

Overview

This dataset contains BED and GTF files representing the cleavage sites and 3'UTR isoform annotations derived from reprocessing Microwell-seq data. These objects are part of the minimum dataset required for verifying the analysis reported in Fansler et al., bioRxiv, 2023.

Description

The BED files contain candidate cleavage sites from the Mouse Cell Atlas and Human Cell Landscape datasets. In brief, paired-end reads were merged with PEAR when overlapping, cell barcodes were extracted with umi_tools, poly-A tails were removed with cutadapt, and the remaining reads were mapped to the hg38 or mm10 genome using HISAT2. Reads were partitioned into cell types according to annotations from the original publications. Per cell type, the 5' ends of alignments were summarized, counts were merged to the mode within 30 nts, and the resulting sites were filtered to a minimum threshold of 5 TPM. The resulting BED files identify the cell type cluster in the name column and the number of observed reads in the score column.
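The merge-and-filter step can be illustrated with a minimal Python sketch. This is not the authors' code: the greedy strategy (visit positions from highest to lowest count and fold each into the nearest already-chosen mode within 30 nts), the function names `merge_to_mode` and `filter_tpm`, and the definition of TPM as reads per million total reads in the cell type are all assumptions made for illustration.

```python
def merge_to_mode(counts, radius=30):
    """Greedily merge per-position read counts into local modes.

    counts: dict mapping position -> read count for one chromosome and strand.
    Assumption: a position joins the nearest existing mode within `radius` nt;
    otherwise it becomes a new mode.
    """
    merged = {}
    # Visit positions from highest count to lowest (ties broken by position).
    for pos, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        nearby = [m for m in merged if abs(m - pos) <= radius]
        if nearby:
            # Fold the count into the closest mode already chosen.
            target = min(nearby, key=lambda m: (abs(m - pos), m))
            merged[target] += n
        else:
            merged[pos] = n
    return merged


def filter_tpm(merged, total_reads, min_tpm=5.0):
    """Keep sites whose count, scaled to reads per million, reaches `min_tpm`."""
    scale = 1e6 / total_reads
    return {pos: n for pos, n in merged.items() if n * scale >= min_tpm}


# Hypothetical toy input: position -> count.
toy_counts = {100: 10, 110: 3, 125: 2, 200: 1, 500: 7}
toy_sites = filter_tpm(merge_to_mode(toy_counts), total_reads=1_000_000)
```

In this toy input, positions 110 and 125 fold into the mode at 100, while the isolated low-count position at 200 falls below the threshold and is dropped.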

The GTF files are augmentations of GENCODE vM25 (mouse) and v39 (human) with novel cleavage sites, with each transcript then truncated to 500 nts. In brief, the sites provided in the BED files were harmonized across cell types by merging to the mode within 30 nts. The candidate sites were then serially classified as (1) "validated" if already in GENCODE; (2) "supported" if found in PolyASite 2.0 at 3 TPM or higher; (3) "likely" if cleanUpdTSeq scored the posterior probability of being an internal priming site below 0.0001%; or (4) "unlikely" otherwise. The "supported" and "likely" candidates were then used to augment GENCODE annotations of protein-coding transcripts, and each transcript was truncated to the final 500 nts at its 3' end. The final annotations identify the regions where the scUTRquant pipeline will quantify scRNA-seq data.
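The serial classification and the 3' truncation can be sketched in Python as follows. This is an illustrative sketch, not the pipeline's implementation: the function names, the argument layout, the 0-based half-open exon coordinates, and the literal reading of "0.0001%" as a probability of 1e-6 are all assumptions.

```python
def classify_site(in_gencode, polyasite_tpm, ip_prob,
                  min_tpm=3.0, max_ip_prob=1e-6):
    """Assign the first matching label, checked in the order given in the text.

    in_gencode:    True if the site is already annotated in GENCODE.
    polyasite_tpm: TPM in PolyASite 2.0, or None if the site is absent.
    ip_prob:       posterior probability of internal priming, or None.
    max_ip_prob:   assumed reading of "0.0001%" as a probability (1e-6).
    """
    if in_gencode:
        return "validated"
    if polyasite_tpm is not None and polyasite_tpm >= min_tpm:
        return "supported"
    if ip_prob is not None and ip_prob < max_ip_prob:
        return "likely"
    return "unlikely"


def truncate_3prime(exons, strand, length=500):
    """Keep only the last `length` nt of a transcript, counted along its exons.

    exons: list of (start, end) tuples, 0-based half-open (an assumption).
    """
    # Walk exons starting from the 3'-most one.
    ordered = sorted(exons, reverse=(strand == "+"))
    kept, remaining = [], length
    for start, end in ordered:
        if remaining <= 0:
            break
        size = end - start
        if size <= remaining:
            kept.append((start, end))
            remaining -= size
        elif strand == "+":
            kept.append((end - remaining, end))    # trim the 5' side
            remaining = 0
        else:
            kept.append((start, start + remaining))  # 3' end is the low coordinate
            remaining = 0
    return sorted(kept)
```

Because the checks run in order, a site that is both in GENCODE and in PolyASite 2.0 is labeled "validated", which matches the serial wording of the description.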

Data Generation

All code required to generate these files is available at:

Funding

Tri-Institutional Training Program in Computational Biology and Medicine

National Institute of General Medical Sciences

3'UTR-mediated protein-protein interactions determine protein functions

National Institute of General Medical Sciences

Regulation of protein multi-functionality by 3'UTRs

National Institute of General Medical Sciences

Cancer Center Support Grant

National Cancer Institute

Function and therapeutic targeting of 3'UTR-dependent protein localization (UDPL) in cancer

Pershing Square Foundation
