Two subtle problems with over-representation analysis

Ziemann, Mark; Schroeter, Barry; Bora, Anusuiya

journal contribution

posted on 2025-04-11, 03:26, authored by Mark Ziemann, Barry Schroeter, Anusuiya Bora

Abstract

Over-representation analysis (ORA) is used widely to assess the enrichment of functional categories in a gene list compared to a background list. ORA is therefore a critical method in the interpretation of ’omics data, relating gene lists to biological functions and themes. Although ORA is hugely popular, we and others have noticed two potentially undesired behaviours of some ORA tools. The first one we call the “background problem,” because it involves the software eliminating large numbers of genes from the background list if they are not annotated as belonging to any category. The second one we call the “false discovery rate problem,” because some tools underestimate the true number of parallel tests conducted. Here we demonstrate the impact of these issues on several real RNA-seq datasets and use simulated RNA-seq data to quantify the impact of these problems. We show that the severity of these problems depends on the gene set library, the number of genes in the list, and the degree of noise in the dataset. These problems can be mitigated by changing packages/websites for ORA or by changing to another approach such as functional class scoring.
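The two behaviours described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not reproduce any particular ORA tool. It assumes ORA is run as a one-sided hypergeometric test with Benjamini-Hochberg correction, and all gene counts are hypothetical. The function names `ora_p_value` and `bh_adjust` are introduced here for illustration only.

```python
from math import comb


def ora_p_value(hits, list_size, set_size, background_size):
    """Upper-tail hypergeometric p-value, P(X >= hits).

    hits: genes in the list that belong to the gene set
    list_size: genes in the list, set_size: genes in the set,
    background_size: genes in the background (assumed >= set_size).
    """
    upper = min(list_size, set_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def bh_adjust(p_values, n_tests=None):
    """Benjamini-Hochberg adjusted p-values.

    n_tests is the number of parallel tests to correct for;
    it defaults to len(p_values).
    """
    m = n_tests if n_tests is not None else len(p_values)
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    adjusted = [0.0] * len(p_values)
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(len(order), 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: 20000 background genes, of which 10000 are annotated;
# a 500-gene list, of which 400 are annotated; one gene set of 100 genes, 8 hits.
p_full = ora_p_value(hits=8, list_size=500, set_size=100, background_size=20000)
p_reduced = ora_p_value(hits=8, list_size=400, set_size=100, background_size=10000)

# Hypothetical raw p-values for 3 reported sets out of 10 sets actually tested.
raw = [0.001, 0.01, 0.04]
adj_reported_only = bh_adjust(raw)
adj_all_tests = bh_adjust(raw, n_tests=10)
```

With these made-up counts, discarding unannotated genes changes the expected number of hits and therefore the p-value, and correcting for only the reported sets yields smaller adjusted p-values than correcting for every set tested. The size and direction of the effect on real data depend on the gene set library and the gene list, as the abstract notes.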

Journal

Bioinformatics Advances

Volume

4

Article number

vbae159

Pagination

1-9

Location

Oxford, Eng.

Publisher DOI

https://doi.org/10.1093/bioadv/vbae159

Open access

Yes

ISSN

2635-0041

eISSN

2635-0041

Language

eng

Publication classification

C1 Refereed article in a scholarly journal

Editor/Contributor(s)

Ma L

Issue

1

Publisher

Oxford University Press

Publication URL

http://dx.doi.org/10.1093/bioadv/vbae159
