Uberization of Symplectic Elements Citation Data Entries and use of Curation Bins

2017-08-03T12:51:23Z (GMT) by Muhammad Javed
At Cornell University Library, the primary entity of interest is scholarship, of which people and organizations are, by definition, both the creators and consumers. From this perspective, attention is focused on aggregate views of scholarship data. In Scholars@Cornell, we use Symplectic Elements [1] for the continuous and automated collection of scholarship metadata from multiple internal and external data sources. For the journal article category, Elements captures the title of the article, the list of authors, the name of the journal, volume, issue, ISSN, DOI, publication status, pagination, external identifiers, etc.; these fields are referred to as citation items. Citation items may or may not be available in every data source: the Crossref version may differ in some details from the PubMed version, and some fields may be missing from one version of the metadata but present in another. This leads to different metadata versions of the same scholarly publication, referred to as version entries.

In Elements, a user can specify a preferred data source for their scholarly publications, and the VIVO Harvester API [2] can be used to push the preferred citation data entries from Elements to Scholars@Cornell. In Scholars@Cornell, rather than using the VIVO Harvester API, we built an uberization module that merges the version entries from multiple data sources and creates an “uber record”. To create an uber record for a publication, we ranked the sources based on the experience and intuition of two senior Cornell librarians and started with the metadata from the source they considered best. The uberization module allows us to generate and present the best of the best scholarship metadata (in terms of correctness and completeness) to users. In addition to external sources (such as WoS, PubMed, etc.), we use the Activity Insight (AI) feed as an internal local source, into which any person can manually enter scholarship metadata.
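The merge described above can be sketched as a ranked, field-by-field fill: starting from the highest-ranked source, each citation item is taken from the first source that provides it. This is a minimal illustration only; the source names, ranking order, and record shape below are assumptions, not the actual Elements schema or Cornell ranking.

```python
# Hypothetical uber-record merge. SOURCE_RANKING and the field names are
# illustrative assumptions; the real ranking was set by Cornell librarians.
SOURCE_RANKING = ["wos", "pubmed", "crossref", "activity_insight"]

def build_uber_record(version_entries):
    """Merge per-source version entries into one uber record.

    version_entries: dict mapping source name -> dict of citation items.
    Each field is taken from the highest-ranked source that supplies a
    non-empty value for it.
    """
    uber = {}
    for source in SOURCE_RANKING:
        entry = version_entries.get(source, {})
        for field, value in entry.items():
            if field not in uber and value not in (None, ""):
                uber[field] = value
    return uber

# Example: PubMed outranks Crossref, so its title wins;
# Crossref still contributes the fields PubMed lacks.
versions = {
    "crossref": {"title": "On Merging", "doi": "10.1000/x", "issue": "2"},
    "pubmed": {"title": "On merging.", "volume": "7"},
}
record = build_uber_record(versions)
```

A real implementation would also need per-field quality rules (e.g. preferring a DOI-bearing source for identifiers), but the ranked fill captures the basic idea of producing one record that is more complete than any single version entry.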
We use such manually entered metadata (which is error-prone) as a seed in Elements to harvest additional metadata from external sources. Once the additional metadata is harvested, the uberization process merges these version entries and presents the best of the best scholarship metadata, which is later fed into Scholars@Cornell. Any scholarship metadata that fails the validation step of the Elements-to-Scholars transition is pushed into a curation bin, where manual curation is required to resolve the metadata issues. We believe such curation bins can also be used to enhance the scholarship metadata, for example by adding ORCID iDs for authors, GRID IDs for organizations, article abstracts, keywords, etc. We will briefly discuss the (VIVO-ISF ontology driven) data modelling and data architecture issues, as lessons learnt, that were encountered during the first phase of the Scholars@Cornell launch. https://scholars.cornell.edu
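The validation gate feeding the curation bin can be sketched as a simple partition: records that pass go on to Scholars@Cornell, and failing records are set aside with a note of what needs fixing. The required fields and bin structure below are illustrative assumptions, not the actual Elements-to-Scholars validation rules.

```python
# Hypothetical curation-bin gate. REQUIRED_FIELDS is an assumed minimum,
# not the real validation criteria of the Elements-to-Scholars transition.
REQUIRED_FIELDS = ("title", "authors", "doi")

def partition_records(records):
    """Split records into those that pass validation and those routed to
    the curation bin, each annotated with its missing fields."""
    valid, curation_bin = [], []
    for rec in records:
        missing = [f for f in REQUIRED_FIELDS if not rec.get(f)]
        if missing:
            curation_bin.append({"record": rec, "missing": missing})
        else:
            valid.append(rec)
    return valid, curation_bin

records = [
    {"title": "A", "authors": ["Doe, J."], "doi": "10.1000/a"},
    {"title": "B", "authors": []},  # empty authors, no DOI -> curation bin
]
valid, curation_bin = partition_records(records)
```

Recording the missing fields alongside each binned record is what makes the bin useful for enrichment as well as repair: a curator sees at a glance whether a record needs an ORCID iD, an abstract, or a corrected identifier.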