Crosswalking Research Vocabularies in VIVO
Many VIVO sites use different vocabularies to indicate the research areas they are affiliated with. For example, the biological sciences use PubMed MeSH subject headings, while the physical sciences might use a controlled vocabulary from a commercial vendor, such as Clarivate’s Web of Science keywords, or FAST terms, which are derived from the Library of Congress Subject Headings. This can lead to redundancy and confusion on a VIVO site that lets the end user filter by vocabulary term, because the same or similar terms might display multiple times. An end user generally isn’t concerned with the originating vocabulary of a term. They just want to filter or center their experience on that term. As an example, one can draw an equivalence between the MeSH term “Textile Industry” at https://www.ncbi.nlm.nih.gov/mesh/68013783 and the same AGROVOC term visible at: http://oek1.fao.org/skosmos/agrovoc/en/page/?uri=http://aims.fao.org/aos/agrovoc/c_7696&clang=en
Both indicate “Textile Industry”. In VIVO the problem arises when one publication is tagged with the MeSH “Textile Industry” term while a different publication is tagged with the AGROVOC “Textile Industry” term. VIVO will then show two separate “Textile Industry” concepts.
It gets more confusing as we search through the other vocabularies. Some sites, like Wikidata, link a term to its equivalents in various vocabularies, but not in all of them. The Wikidata entry for “Textile”, for example, has links to synonyms in other vocabularies, but no links to FAST, MeSH, LCSH, Fields of Research (FOR), or others.
This presents challenges for VIVO sites that ingest publications from various sources, either directly or via applications like Symplectic Elements.
Our VIVO site, experts.colorado.edu, is now affected by this problem. We have thousands of publications from various sources that use different vocabularies for research terms, and we would like to import these publications and their terms into our VIVO. As a university that serves many disciplines, how do we standardize which terms we will use? At first glance, the amount of manual curation needed to do this properly seems daunting.
The question then becomes: what are the use cases for using Research Areas and harmonizing the terms within a site or across multiple sites? An obvious case would be a journalist searching an institution for experts in a certain subject area. The journalist might not know specifically what the subject area is, so it’s important to provide a top-level view of general subject areas and allow them to drill down. This also implies that the vocabularies use a SKOS-style broader/narrower hierarchy. In that case, each of the broader and narrower terms also needs to be harmonized with the other vocabularies.
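The drill-down use case can be illustrated with a minimal sketch. It assumes a tiny broader/narrower hierarchy held in a plain dictionary; the term labels, the expert identifiers, and the function names are all hypothetical and only stand in for what a SKOS-style vocabulary and a VIVO site would actually hold.

```python
# Hypothetical broader -> narrower links (labels invented for illustration).
NARROWER = {
    "Industry": ["Textile Industry", "Food Industry"],
    "Textile Industry": ["Cotton Industry"],
}

# Hypothetical expert -> research area assignments.
EXPERT_AREAS = {
    "expert-a": ["Cotton Industry"],
    "expert-b": ["Food Industry"],
    "expert-c": ["Textile Industry"],
}


def descendants(term, narrower):
    """Return the term itself plus every transitively narrower term."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in a badly formed hierarchy
        seen.add(current)
        stack.extend(narrower.get(current, []))
    return seen


def experts_under(term, narrower, expert_areas):
    """List experts tagged with the term or with any narrower term."""
    scope = descendants(term, narrower)
    return sorted(
        expert
        for expert, areas in expert_areas.items()
        if scope.intersection(areas)
    )


if __name__ == "__main__":
    # Starting broad and drilling down narrows the list of experts.
    for area in ("Industry", "Textile Industry", "Cotton Industry"):
        print(area, experts_under(area, NARROWER, EXPERT_AREAS))
```

Searching from a broad term returns everyone tagged anywhere beneath it, which is what lets a journalist start at a general subject area and narrow the results step by step.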
Solving this problem is crucial, especially if one wants to traverse multiple machine-readable VIVO sites to locate items that share a similar research area. One potential solution is for a VIVO site to import a crosswalk list of same-as statements between different research vocabularies. Another is for the site to use a lookup service.
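As a rough sketch of the crosswalk idea, the code below treats each same-as statement as a link between two term identifiers and merges linked terms into one concept before publications are grouped. The two identifiers are the MeSH and AGROVOC addresses quoted earlier; the class, the function, and the publication labels are hypothetical and do not reflect how VIVO itself stores or resolves concepts.

```python
from collections import defaultdict

# Term identifiers quoted earlier in this abstract.
MESH_TEXTILE = "https://www.ncbi.nlm.nih.gov/mesh/68013783"
AGROVOC_TEXTILE = "http://aims.fao.org/aos/agrovoc/c_7696"


class Crosswalk:
    """Hypothetical helper: terms linked by same-as share one representative."""

    def __init__(self):
        self._parent = {}

    def find(self, uri):
        """Return the representative identifier for a term."""
        self._parent.setdefault(uri, uri)
        while self._parent[uri] != uri:
            uri = self._parent[uri]
        return uri

    def add_same_as(self, a, b):
        """Record that two term identifiers denote the same concept."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller identifier so the
            # result does not depend on the order of the statements.
            keep, drop = sorted((root_a, root_b))
            self._parent[drop] = keep


def group_publications(pub_terms, crosswalk):
    """Group publication labels by harmonized concept."""
    grouped = defaultdict(set)
    for publication, term_uri in pub_terms:
        grouped[crosswalk.find(term_uri)].add(publication)
    return dict(grouped)


if __name__ == "__main__":
    crosswalk = Crosswalk()
    crosswalk.add_same_as(MESH_TEXTILE, AGROVOC_TEXTILE)

    # Hypothetical publications, each tagged from a different vocabulary.
    tagged = [("pub-1", MESH_TEXTILE), ("pub-2", AGROVOC_TEXTILE)]
    print(group_publications(tagged, crosswalk))
```

With the same-as statement loaded, both publications fall under a single concept instead of two. Without it, each identifier remains its own group, which is the duplication described above.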
Another option is a federated vocabulary-harmonizing service with which all VIVO sites register, and which mines their taxonomies so they can be synced with a master service. This could perhaps resemble a distributed blockchain service.
One reason this might be preferable is that many, if not most, VIVO sites require some autonomy over the terms they use and how those terms are associated with other objects. Hence it’s imperative that the VIVO application continues to offer this flexibility.
This workshop or panel will discuss and weigh the various options for modeling and displaying this data, in machine-readable and HTML formats, and align these options with the needs of typical VIVO sites, taking into account the governance mechanisms and use cases for these various VIVO scenarios. This is a very broad topic, so discussion will be scoped to the objective of having a VIVO site display research areas in a fashion similar to commercial sites like Amazon.