
ISMB 2022 Poster: Web scraping pilot study for SARS-CoV-2 variants of concern dashboards

Poster, posted to figshare on 2025-05-09, 16:54, by Lisa Mayer

Poster presented at the 30th Conference on Intelligent Systems for Molecular Biology (ISMB) on July 13, 2022, in Madison, Wisconsin.

Co-Authors: Wiriya Rutvisuttinunt, Liliana Brown, Steve Tsang, Jane Lockmuller

Tracking SARS-CoV-2 variants and mutations is essential to inform the development of medical countermeasures. In response, many dashboards emerged that publish aggregated variant data from independent analyses, each using its own metrics and visualizations. To leverage knowledge across dashboards and prioritize the SARS-CoV-2 variants with the greatest public health impact, we developed a pipeline that uses web scraping with Python Selenium and Beautiful Soup to collect data on variants of concern (VOC), variants of interest (VOI), and variants under monitoring (VUM) from relevant dashboards. The pipeline then generates a consensus across dashboards and visualizes the results in R. Additionally, we used the FAIR Data Principles (Findable, Accessible, Interoperable, Reusable) to track the data openness of each dashboard. From June 1 through September 9, 2021, we monitored twelve variant-reporting websites and scraped the dashboards of three of them (25%). The lists of top variants of concern agreed across these three dashboards, underscoring that these variants pose high-impact threats. The other nine websites (75%) had structures that the web scraping pipeline could not access. Challenges included limited programmatically accessible data, documentation that was difficult to find, and frequent changes to website structure. Overall, all dashboards provided visual summaries of variants. However, improving the machine-readability and documentation of these websites would strengthen their impact by making their data more interoperable and reusable.
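The poster does not include its pipeline code. The Python below is only a minimal sketch of the scrape-then-consensus idea, and it rests on several assumptions. Each dashboard is assumed to list one variant per table row, with the variant label in the first cell. The HTML is assumed to be already rendered; in a Selenium-based setup it could come from `driver.page_source`. A variant is assumed to enter the consensus when at least `min_support` dashboards list it, because the poster does not define its consensus rule. The pilot parsed pages with Beautiful Soup. Here the standard library's `html.parser` stands in so the example runs without third-party packages. The dashboard names, HTML snippets, variant labels, and the helpers `FirstColumnParser`, `scrape_variants`, and `consensus` are all hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class FirstColumnParser(HTMLParser):
    """Collect the text of the first data cell (<td>) in each table row.

    Hypothetical layout assumption: one variant per row, label in column 0.
    """

    def __init__(self):
        super().__init__()
        self.variants = []
        self._cell_index = -1  # position of the current cell within its row
        self._in_cell = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cell_index = -1  # new row: restart the cell counter
        elif tag in ("td", "th"):
            self._cell_index += 1
            self._in_cell = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self._in_cell = False
            text = "".join(self._buffer).strip()
            # Keep first-column data cells only; header cells (<th>) are skipped.
            if tag == "td" and self._cell_index == 0 and text:
                self.variants.append(text)

    def handle_data(self, data):
        if self._in_cell:
            self._buffer.append(data)


def scrape_variants(html):
    """Hypothetical helper: extract the variant list from one rendered page."""
    parser = FirstColumnParser()
    parser.feed(html)
    parser.close()
    return parser.variants


def consensus(dashboard_lists, min_support):
    """Return (variant, n_dashboards) for variants listed by at least
    `min_support` dashboards, most widely reported first, ties alphabetical."""
    counts = Counter()
    for variants in dashboard_lists.values():
        counts.update(set(variants))  # count each dashboard once per variant
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(variant, n) for variant, n in ranked if n >= min_support]


# Toy, made-up pages standing in for rendered dashboard HTML.
pages = {
    "dashboard_a": "<table><tr><th>Variant</th></tr>"
                   "<tr><td>Variant A</td></tr><tr><td>Variant B</td></tr></table>",
    "dashboard_b": "<table><tr><th>Variant</th><th>Status</th></tr>"
                   "<tr><td>Variant A</td><td>VOC</td></tr>"
                   "<tr><td>Variant C</td><td>VUM</td></tr></table>",
    "dashboard_c": "<table><tr><td>Variant A</td></tr><tr><td>Variant B</td></tr></table>",
}
lists = {name: scrape_variants(html) for name, html in pages.items()}
print(consensus(lists, min_support=2))
# [('Variant A', 3), ('Variant B', 2)]
```

Counting each dashboard at most once per variant stops a dashboard that repeats a label, for example across several tables, from inflating the consensus. Real dashboards would each need their own parsing logic. That per-site logic is where frequent changes to website structure would break a pipeline like this one.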
