Compact Identifier services

The resolution system provides consistent access to life science data using Compact Identifiers (CIDs). A CID consists of an assigned unique prefix and an accession number designated by the local provider, written as prefix:accession. The resolving location of a CID is determined from information stored in the Registry, which contains high-quality, manually curated information on over 700 life science data resources (largely databases).
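The prefix:accession structure can be illustrated with a minimal sketch. The names CompactIdentifier and parse_cid are hypothetical, and lowercasing the prefix is an assumption made here so that differently cased prefixes compare equal; the resolver itself may treat case differently.

```python
from typing import NamedTuple


class CompactIdentifier(NamedTuple):
    """A CID split into its two parts (hypothetical helper type)."""
    prefix: str
    accession: str


def parse_cid(cid: str) -> CompactIdentifier:
    """Split 'prefix:accession' at the first colon.

    Only the first colon separates the parts, so an accession that
    itself contains a colon is kept intact.
    """
    prefix, sep, accession = cid.strip().partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a compact identifier: {cid!r}")
    # Assumption: prefixes are compared case-insensitively.
    return CompactIdentifier(prefix.lower(), accession)
```

For instance, parse_cid("pdb:2gc4") yields the prefix "pdb" and the accession "2gc4".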

Prefix assignment involves registering a unique prefix for an individual life science data collection and recording a variety of useful metadata about it, including a description of the data resource, the accession identifier pattern and a list of known resolving locations. When a Compact Identifier (for example, pdb:2gc4 or GO:0006915) is presented to the resolver, redirection can be accomplished in either a resource-specified or a location-independent (resource-unspecified) manner. The latter method takes into consideration information such as the uptime and reliability of all available hosting resources.
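The two resolution modes can be sketched as follows. This is an illustration under stated assumptions, not the resolver's actual implementation: the registry content, the site names, the example.org URLs, the accession pattern and the rule "pick the location with the highest uptime" are all hypothetical stand-ins for the curated Registry data and the real selection logic.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Location:
    name: str
    url_template: str  # "{id}" is replaced by the accession
    uptime: float      # fraction of successful health checks, 0.0 to 1.0


@dataclass(frozen=True)
class RegistryEntry:
    pattern: str                     # regular expression for valid accessions
    locations: Tuple[Location, ...]  # known resolving locations


# Hypothetical registry content; real entries are manually curated.
REGISTRY = {
    "pdb": RegistryEntry(
        pattern=r"^[0-9][A-Za-z0-9]{3}$",
        locations=(
            Location("site-a", "https://site-a.example.org/pdb/{id}", 0.97),
            Location("site-b", "https://site-b.example.org/entry/{id}", 0.99),
        ),
    ),
}


def resolve(cid: str, provider: Optional[str] = None) -> str:
    """Return the URL a CID would redirect to.

    With `provider` set, resolution is resource-specified; otherwise the
    location with the best recorded uptime is chosen (an assumed rule).
    """
    prefix, _, accession = cid.partition(":")
    entry = REGISTRY.get(prefix.lower())
    if entry is None:
        raise KeyError(f"unknown prefix: {prefix!r}")
    if not re.fullmatch(entry.pattern, accession):
        raise ValueError(f"accession {accession!r} does not match {entry.pattern}")
    if provider is not None:
        matches = [loc for loc in entry.locations if loc.name == provider]
        if not matches:
            raise LookupError(f"{provider!r} does not host prefix {prefix!r}")
        chosen = matches[0]
    else:
        chosen = max(entry.locations, key=lambda loc: loc.uptime)
    return chosen.url_template.replace("{id}", accession)
```

Calling resolve("pdb:2gc4") selects the hypothetical site-b because it has the higher uptime value, while resolve("pdb:2gc4", provider="site-a") forces the named resource.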

Besides resolution, the system provides a number of additional services, including the ability to harvest and display the metadata markup (including Bioschemas markup) associated with a dataset when a CID is presented to the metadata service.
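How such markup might be harvested from a dataset landing page can be sketched as below. The text does not say how the metadata service works internally; this example assumes the markup is embedded as JSON-LD in script elements, which is a common way to publish Bioschemas markup, and the names JsonLdExtractor and extract_jsonld are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # skip malformed markup rather than fail the harvest


def extract_jsonld(html: str) -> list:
    """Return all JSON-LD documents found in an HTML page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents
```

In a full pipeline, the HTML passed to extract_jsonld would be the landing page that the CID resolves to; fetching that page is left out here to keep the sketch self-contained.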

We have also re-engineered the system to provide scalable, highly available and low-latency services within global scientific e-infrastructures. We have deployed the infrastructure in multiple cloud environments, including Amazon Web Services and Google Cloud Platform, bringing our services closer to the data. The new system benefits from the auto-scaling and multi-zone availability afforded by cloud provision.




CC BY 4.0