
Indices for NGS Data and Gene Expression Data Registered in Public Databases

poster
posted on 2020-01-08, 02:46 authored by Tazro Ohta, Takeru Nakazato, Hidemasa Bono

As part of the integrated database project in Japan, the Database Center for Life Science (DBCLS) has developed various computational tools for reusing the huge amount of data archived in public repositories (https://dbcls.rois.ac.jp/services-en.html).


Meanwhile, the DNA Data Bank of Japan (DDBJ) has archived and maintained data from high-throughput sequencing platforms as a member of the International Nucleotide Sequence Database Collaboration (INSDC), together with NCBI GenBank and EBI ENA (http://www.insdc.org/). In collaboration with DDBJ, we built DBCLS SRA (http://sra.dbcls.jp/), a search engine for the metadata of the INSDC databases BioProject, BioSample and Sequence Read Archive (SRA). DBCLS SRA is now linked from the DDBJ website, and there are plans for DDBJ to adopt it officially.


With the spread of high-throughput sequencing platforms, tens of thousands of RNA-seq datasets have been archived as transcriptome data in the SRA described above. Microarray data, however, still make up the majority of the data in the public gene expression databases NCBI Gene Expression Omnibus (GEO) and EBI ArrayExpress (AE). Furthermore, DDBJ launched yet another gene expression data repository, the Genomic Expression Archive (GEA), in 2018. Because the relationships among these databases are complex, it is not easy to make new discoveries by comparing datasets across them. We therefore constructed an index of these gene expression data repositories, called all of gene expression (AOE), to integrate publicly available transcriptomic data (GEO, AE and GEA). AOE can be queried graphically through its web interface (https://aoe.dbcls.jp/) as well as through an application programming interface. By also collecting RNA-seq gene expression data from SRA, AOE includes data that are not in GEO, AE or GEA.
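The idea of one index spanning several repositories can be illustrated with a minimal sketch. This is not the AOE implementation: the `Record` type, the `build_index` and `count_by_technology` helpers, and the `EXAMPLE-...` accessions are all hypothetical placeholders, and the sketch assumes that the same dataset can be recognized across repositories by a shared accession.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One metadata record (hypothetical schema, not the AOE data model)."""
    accession: str   # placeholder identifier, not a real accession
    source: str      # "GEO", "AE", "GEA" or "SRA"
    technology: str  # "microarray" or "RNA-seq"


def build_index(records):
    """Merge records that share an accession and remember every source."""
    index = {}
    for rec in records:
        entry = index.setdefault(
            rec.accession, {"technology": rec.technology, "sources": set()}
        )
        entry["sources"].add(rec.source)
    return index


def count_by_technology(index):
    """Count indexed datasets per technology, each dataset counted once."""
    counts = defaultdict(int)
    for entry in index.values():
        counts[entry["technology"]] += 1
    return dict(counts)


# Made-up records: one dataset listed in two repositories, one found only in
# GEA, and one RNA-seq dataset that exists only in SRA.
records = [
    Record("EXAMPLE-0001", "GEO", "microarray"),
    Record("EXAMPLE-0001", "AE", "microarray"),
    Record("EXAMPLE-0002", "GEA", "RNA-seq"),
    Record("EXAMPLE-0003", "SRA", "RNA-seq"),
]
index = build_index(records)
```

Keying the index on the accession means a dataset listed in more than one repository is counted once, while its list of sources still shows where it can be found.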


Both DBCLS SRA and AOE are freely available without any registration.

Funding

National Bioscience Database Center, JST
