
Fuelling VIVO with rich meta-data from a RIS system

Version 2 2015-09-23, 20:21

Version 1 2015-08-18, 16:05

dataset

posted on 2015-09-23, 20:21 authored by Thomas Vestdam

A modern Research Information System (RIS) aims to capture complete, fine-grained and coherent meta-data about research at an institution, such as research outputs, projects, external income and expenditure, awards, activities, impact, and research data sets produced by the academics/faculty at the institution. One of the main purposes of having a RIS is to perform internal or external research assessments and evaluation of the workforce, or just plain measurement in the context of income and output. The overall requirements for a RIS are a rich and fine-grained model, advanced tools for managing and visualising data within the RIS, support for workflows, reporting on data, and means for public exhibition of the data stored in the RIS, in effect showcasing the strength of the institution.

In contrast to a RIS like VIVO, most commercial RIS systems are built on top of a relational database, for many good and sound reasons. So the question is: what if you want to supply networking capabilities between traditional, enterprise-level RIS systems, similar to what you can achieve with a number of connected VIVO systems? You could 1) set up a VIVO “index” for each RIS, populated with information from that individual system, 2) create your own networking tool that aggregates appropriate information from the individual systems, as has been done before for VIVO systems, or 3) build a SPARQL parser that translates SPARQL queries into your own internal data model. Common to all these solutions is that they allow your RIS to connect to a VIVO network. So far we have implemented two of these options, and we would like to share our experiences, both in terms of the usefulness of the tools and in terms of what we learned while building them.

Option 1: The concept of a VIVO “index” is quite simple: just push whatever information you have in your master system to a triple store, while adhering to the VIVO ontology. So you basically just need a triple store and a SPARQL server, like Apache Jena (Fuseki). The challenges here are mapping meta-data between the two different meta-data models and implementing a mechanism for incremental updates of the triple store based on updates in the master system. The benefit of this solution is a VIVO “compliant” exhibition of data via a SPARQL endpoint. The downside is that you now have yet another “server” to maintain. We will elaborate on these pros and cons during the presentation.
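The mapping and incremental-update steps can be illustrated with a minimal Python sketch. It is not the implementation presented in the talk: the record fields (`id`, `name`), the base URI and the function names are hypothetical, and the mapping is reduced to the `foaf:Person` type and an `rdfs:label`, both of which VIVO data commonly uses. The update is expressed as a standard SPARQL 1.1 Update string that replaces all triples for one subject; sending it to a SPARQL server is left out.

```python
# Sketch only: maps a hypothetical RIS person record to a few triples and
# builds a SPARQL 1.1 Update that replaces the triples for that person.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
BASE = "http://ris.example.org/individual/"  # assumption: made-up base URI


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for an RDF literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def person_to_triples(record):
    """Return N-Triples-style lines for one person record (hypothetical fields)."""
    subject = f"<{BASE}person-{record['id']}>"
    return [
        f"{subject} <{RDF_TYPE}> <{FOAF_PERSON}> .",
        f'{subject} <{RDFS_LABEL}> "{escape_literal(record["name"])}" .',
    ]


def incremental_update(record):
    """Build an update that drops the old triples for the person, then inserts new ones."""
    subject = f"<{BASE}person-{record['id']}>"
    body = "\n  ".join(person_to_triples(record))
    return f"DELETE WHERE {{ {subject} ?p ?o }} ;\nINSERT DATA {{\n  {body}\n}}"


if __name__ == "__main__":
    print(incremental_update({"id": "42", "name": "Jane Doe"}))
```

Replacing every triple for a changed subject keeps the sketch short; a real mapping would likely need finer-grained deletes once one record maps to several linked individuals.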

Option 2: Creating your own networking tool, or a Community Service as we call it, is fairly simple, especially if you are only aggregating data from systems that have a well-defined standard interface for harvesting. Our Community Service currently aggregates more than 90,000 researchers, 3,000,000 publications, 115,000 grants and a small number of equipment records (a new feature). The Community Service is fed with information from Pure systems (RIS instances) via their individual web services. Each web service provides a basic harvesting mechanism that allows clients (here, the aggregator) to harvest all information in the instance, or simply to harvest the changes in the RIS since the last harvest. The benefit of such an approach is a lot of aggregated data; the downside is a closed ecosystem. We will elaborate on the details of this solution, and present its pros and cons, during the presentation.
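The harvesting pattern described above (a full harvest first, then only the changes since the last harvest) can be sketched as follows. This is an illustration under stated assumptions, not the Community Service or the Pure web-service API: `FakeSource`, `Aggregator`, `changes_since` and the integer timestamps are all hypothetical, and the source is an in-memory stand-in rather than a real web-service client.

```python
# Sketch only: an aggregator that harvests everything on the first run and
# only changed records afterwards. All names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class FakeSource:
    """In-memory stand-in for one RIS instance: id -> (modified, payload)."""
    records: dict = field(default_factory=dict)

    def changes_since(self, since):
        """Return all records if `since` is None, else those modified after it."""
        return {
            rid: rec
            for rid, rec in self.records.items()
            if since is None or rec[0] > since
        }


@dataclass
class Aggregator:
    store: dict = field(default_factory=dict)         # (source, id) -> payload
    last_harvest: dict = field(default_factory=dict)  # source -> timestamp

    def harvest(self, name, source, now):
        """Harvest one source and return the number of records updated."""
        changed = source.changes_since(self.last_harvest.get(name))
        for rid, (_, payload) in changed.items():
            self.store[(name, rid)] = payload
        self.last_harvest[name] = now
        return len(changed)


if __name__ == "__main__":
    src = FakeSource({"p1": (1, "Researcher A"), "p2": (2, "Researcher B")})
    agg = Aggregator()
    print(agg.harvest("uni-x", src, now=5))   # first run: full harvest
    src.records["p2"] = (7, "Researcher B, updated")
    print(agg.harvest("uni-x", src, now=10))  # later run: changes only
```

The sketch does not handle deletions in the source; a real harvesting interface would also need some way to report removed records.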

Finally, we will discuss how we could establish a platform-agnostic standard for exhibition of data to networking tools.