posted on 2024-03-15, 15:18 authored by Davide Boldini, Lukas Friedrich, Daniel Kuhn, Stephan A. Sieber
Efficient prioritization of bioactive compounds from high throughput
screening campaigns is a fundamental challenge for accelerating drug
development efforts. In this study, we present the first data-driven
approach to simultaneously detect assay interferents and prioritize
true bioactive compounds. By analyzing the learning dynamics during
training of a gradient boosting model on noisy high throughput screening
data using a novel formulation of sample influence, we are able to
distinguish between compounds exhibiting the desired biological response
and those producing assay artifacts. Therefore, our method enables
false positive and true positive detection without relying on prior
screens or assay interference mechanisms, making it applicable to
any high throughput screening campaign. We demonstrate that our approach
consistently excludes assay interferents with different mechanisms
and prioritizes biologically relevant compounds more efficiently than
all tested baselines, including a retrospective case study simulating
its use in a real drug discovery campaign. Finally, our tool is extremely
computationally efficient, requiring less than 30 s per assay on low-resource
hardware. As such, our findings show that our method is an ideal addition
to existing false positive detection tools and can be used to guide
further pharmacological optimization after high throughput screening
campaigns.
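
The general idea of reading learning dynamics from a boosted model can be illustrated with a small sketch. The code below is not the authors' method: the abstract does not specify their formulation of sample influence, so a simple stand-in score is used instead (the mean confidence the model assigns to each compound's observed label across boosting rounds). The one-dimensional toy descriptor, the stump-based booster, and all names (`fit_stump`, `label_confidence_trajectories`, `interferents`) are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(x, residuals):
    """Fit a one-split regression stump to the residuals (least squares)."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]


def label_confidence_trajectories(x, y, n_rounds=30, learning_rate=0.5):
    """Boost stumps on the logistic-loss residuals and record, after every
    round, the probability assigned to each sample's observed label."""
    n = len(x)
    base_rate = sum(y) / n
    f = [math.log(base_rate / (1.0 - base_rate))] * n  # initial log-odds
    history = [[] for _ in range(n)]
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, f)]
        t, left_value, right_value = fit_stump(x, residuals)
        for i in range(n):
            f[i] += learning_rate * (left_value if x[i] <= t else right_value)
            p = sigmoid(f[i])
            history[i].append(p if y[i] == 1 else 1.0 - p)
    return history


# Toy screen (hypothetical data): 40 compounds in an "inactive" descriptor
# region and 10 true actives in a separate region.
x = [i / 40 for i in range(40)] + [2 + i / 10 for i in range(10)]
y = [0] * 40 + [1] * 10

# Three compounds in the inactive region are flagged as hits by the assay,
# mimicking interferents whose readout is not explained by the descriptor.
interferents = [10, 20, 30]
for i in interferents:
    y[i] = 1

history = label_confidence_trajectories(x, y)
scores = [sum(h) / len(h) for h in history]  # stand-in dynamics score

# Rank assay hits from least to most consistently learned.
hits = [i for i, yi in enumerate(y) if yi == 1]
ranked_hits = sorted(hits, key=lambda i: scores[i])
flagged = ranked_hits[: len(interferents)]
print("lowest-scoring hits:", sorted(flagged))
```

In this toy setting, hits that the model learns early and consistently rank highest, while hits that contradict the structure learned from the rest of the screen stay at low confidence and rank lowest. A real screen would need multi-dimensional molecular descriptors, a production gradient boosting library, and the influence formulation described in the paper itself.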