Datasets analysed in a paper mapping AI research activity against COVID-19. The collection includes:
-rxiv metadata: A dataset with metadata about 1.8 million papers from arXiv, bioRxiv and medRxiv as of the end of May 2020, enriched with dummy variables indicating whether each paper is related to AI and/or COVID-19 research (updated 22 June 2020 to fix some ids).
-rxi_geo: A dataset with geographical metadata for papers based on the institutional affiliations of their authors after matching with the GRID database.
-covid_semantic: A dataset with topic information about COVID-19 papers based on a semantic analysis of their abstracts, including the clusters to which papers have been assigned and their topic mixes (updated 22 June 2020 to fix some ids).
-citation_metadata: Two JSON objects. One contains a lookup between COVID-19 related papers in the rXiv corpus and the papers they cite. The other contains metadata about the cited papers, including their fields of study.
-mag_fos: A dataset with the Microsoft Academic Graph field of study hierarchy we use in our analysis (added 22 June 2020).
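
As a rough illustration of how the two JSON objects in citation_metadata might be combined, the sketch below counts the fields of study of the papers cited by COVID-19 papers. The ids, the `fields_of_study` key and the overall structure are assumptions made for this example and are not taken from the files themselves, so they may need adjusting to the actual data.

```python
from collections import Counter

# Hypothetical stand-ins for the two JSON objects in citation_metadata.
# The real ids, key names and nesting are assumptions and may differ.
citation_lookup = {
    "paper_a": ["cited_1", "cited_2"],
    "paper_b": ["cited_2"],
}
cited_metadata = {
    "cited_1": {"fields_of_study": ["Virology"]},
    "cited_2": {"fields_of_study": ["Machine learning", "Virology"]},
}


def count_cited_fields(lookup, metadata):
    """Count how often each field of study appears among cited papers."""
    counts = Counter()
    for cited_ids in lookup.values():
        for cited_id in cited_ids:
            # Cited papers without a metadata record contribute nothing.
            record = metadata.get(cited_id, {})
            counts.update(record.get("fields_of_study", []))
    return counts


field_counts = count_cited_fields(citation_lookup, cited_metadata)
```

With the real files, the two dictionaries would be read with `json.load` instead of being defined inline.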