Evidence that identifiers are a source of problems for data integrators.

posted on 20.06.2016 by Julie McMurry

Advances in computing power and the expansion of the Internet have fueled optimism that big data will yield new insights. In the life sciences, however, relevant data is not only "big"; it is also highly decentralized across thousands of online databases. Wringing value from it depends on the discipline of data science and on the humble bricks and mortar that make it possible: identifiers.

Yet our collective handling of identifiers has lagged behind these advances. Diverse identifier problems (for instance, broken links and ‘content drift’) make it difficult to integrate data and derive new knowledge from it. This snapshot of a living document collects real-world examples of identifier problems representative of those that data integrators encounter. It is not meant to be exhaustive.