Challenges in altmetric data collection: what are the differences among different altmetric providers/aggregators?
At the 1:AM conference in London last year, our proposal on “studying consistency across altmetrics providers” received a 1:AM project grant, provided by Thomson Reuters. The main focus of the project is to explore consistency across altmetrics providers and aggregators for the same set of publications. Altmetric.com, the open source solution Lagotto and Mendeley.com participated in the study. Other altmetrics aggregators (Plum Analytics and Impact Story) did not, because of difficulties such as agreeing on a random sample and its size and extracting the metrics at exactly the same date and time. By consistency we mean reasonably similar scores for the same DOI per source across different altmetrics providers/aggregators. For example, if Altmetric.com and Lagotto report the same number of readers as the source (Mendeley) for the same DOI, they are considered consistent. Such a comparison is critical for understanding any similarities or differences in metrics across altmetric aggregators. This work extends a 2014 study that used a smaller sample of 1,000 DOIs, all from one publisher (PLOS). In that study we showed that altmetrics providers are inconsistent, in particular regarding Facebook counts and the number of tweets.
Data & method:
For this purpose, we collected a random sample of 30,000 DOIs obtained from Crossref (15,000) and WoS (15,000), all with a 2013 publication date. We controlled for time by extracting the metrics for the data set at the same date and time, on July 23, 2015, starting at 2 PM, using the Mendeley REST API, the Altmetric.com dump file and the Lagotto open source application used by PLOS. Sources common to the providers/aggregators (Facebook, Twitter, Mendeley, CiteULike and Reddit) were analyzed and compared for the overlapping DOIs.
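To make the idea of a per-source comparison on overlapping DOIs concrete, here is a minimal Python sketch. It is not the code used in the study. The input shape (a dictionary from DOI to per-source counts), the function names `normalize_doi` and `compare_source`, and the example counts are all assumptions made up for illustration. The sketch also assumes that a source missing for a DOI can be treated as a count of 0, which is itself one of the reporting choices on which providers may differ.

```python
from typing import Dict

# Hypothetical input shape: DOI -> {source name -> count}
ProviderCounts = Dict[str, Dict[str, int]]


def normalize_doi(doi: str) -> str:
    """DOIs are case-insensitive, so compare them in lower case."""
    return doi.strip().lower()


def compare_source(a: ProviderCounts, b: ProviderCounts, source: str) -> dict:
    """Compare one source (e.g. 'mendeley') for the DOIs both providers cover."""
    a_norm = {normalize_doi(doi): counts for doi, counts in a.items()}
    b_norm = {normalize_doi(doi): counts for doi, counts in b.items()}

    # Only DOIs present in both data sets can be compared.
    overlap = sorted(set(a_norm) & set(b_norm))

    # Assumption: a source that is missing for a DOI counts as 0.
    diffs = {
        doi: a_norm[doi].get(source, 0) - b_norm[doi].get(source, 0)
        for doi in overlap
    }
    identical = sum(1 for d in diffs.values() if d == 0)

    return {
        "overlap": len(overlap),
        "identical": identical,
        "share_identical": identical / len(overlap) if overlap else 0.0,
        "max_abs_diff": max((abs(d) for d in diffs.values()), default=0),
        "diffs": diffs,
    }


# Made-up example counts, not data from the study.
altmetric_counts = {
    "10.1000/a": {"mendeley": 12, "twitter": 5},
    "10.1000/b": {"mendeley": 3, "twitter": 0},
    "10.1000/c": {"mendeley": 7, "twitter": 2},
}
lagotto_counts = {
    "10.1000/A": {"mendeley": 12, "twitter": 9},
    "10.1000/b": {"mendeley": 3},
    "10.1000/d": {"mendeley": 1, "twitter": 1},
}

for source in ("mendeley", "twitter"):
    result = compare_source(altmetric_counts, lagotto_counts, source)
    print(source, result["overlap"], result["share_identical"], result["max_abs_diff"])
```

Reporting the share of identical counts next to the largest absolute difference keeps the two questions apart: how often two providers agree, and how far apart they are when they do not.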
We found several discrepancies among these altmetrics data providers in the metrics they report for the same data set. In contrast to our previous study in 2014, Mendeley readership counts were very similar between the two aggregators, and similar to the data coming directly from Mendeley. One important reason is a major update of the Mendeley API between the two studies. For Facebook counts and tweets, on the other hand, the results resemble those of the earlier study: there are still huge differences between Altmetric.com and Lagotto in collecting and reporting these metrics.
Possible reasons for inconsistency:
Some of the possible reasons we identified for inconsistencies across the different providers are:
- Differences in reporting metrics (aggregated vs. raw score; public vs. private posts)
- Different methodologies in collecting and processing metrics (Twitter API)
- Different updates: possible time lags in the data collection or updating issues
- Using different identifiers (DOI, PMID, arXiv id) for tracking metrics
- Difficulties in specifying the publication date (for example, different publication dates between WoS and Crossref), which influence data collection
- Accessibility issues (problems resolving DOIs to URLs, cookie problems, access denials) that differ across publisher platforms
All in all, these problems emphasize the need to adhere to best practices in altmetric data collection, both by altmetric providers/aggregators and by publishers. For this we need to develop standards, guidelines and recommendations that introduce transparency and consistency across providers/aggregators. Fortunately, the National Information Standards Organization (NISO) initiated a working group on altmetrics data quality in early 2015. It aims to develop clear guidelines for the collection, processing, dissemination and reuse of altmetric data, and its work can benefit from a general discussion of the results of this project. Much work needs to be done!