Report on the state and quality of biosystematics documents and survey reports

The traditional audience for books and scientific papers in which scientists report their findings has been the human reader. Now we can enhance publications by attaching to them many different kinds of digital objects (such as the sounds made by birds, maps that show where they occur, or images and videos) or by adding computer-readable sections and terms that allow computers to extract information for re-use. We refer to these enriched and marked-up documents as 'enhanced'.

While the technology is available, only a tiny proportion of scientific publications are enhanced. Without enhancement, the research that is reported in the biosystematic (= taxonomic) literature cannot participate quickly and easily in the big data world.

The EU e-Infrastructure coordination project "pro-iBiosphere", targeting the preparation of the European Open Biodiversity Knowledge Management System, makes thirteen recommendations to enhance the publication process in order to make biodiversity data accessible, computable and re-usable. These recommendations are a plea for a major change in how we publish biodiversity research. If adopted, they have the potential to transform the role of scientific publications. The recommendations include making all biosystematic literature openly and freely accessible to the maximum extent possible, and marking it up with computer-legible terms from an open, platform-independent XML or similar language. Mark-up allows computers to understand what is in a document, and to extract content for use elsewhere. The complete list of recommendations is available in the report "The State and Quality of Biosystematics Documents and Survey Reports".
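
As a rough illustration of what extracting content from a marked-up document can look like, the sketch below parses a small XML fragment and pulls out the tagged taxon name and locality. The fragment, the element names (treatment, taxonName, locality) and the function extract_terms are assumptions made up for this example; they are not taken from the report or from any particular mark-up schema.

```python
import xml.etree.ElementTree as ET

# Made-up sample text with hypothetical mark-up. The tag and attribute
# names are illustrative assumptions, not a real schema from the report.
SAMPLE = """
<treatment>
  <p>
    <taxonName rank="species">Parus major</taxonName> was recorded near
    <locality>Leiden</locality>.
  </p>
</treatment>
"""


def normalise(text):
    """Collapse runs of whitespace so extracted terms compare cleanly."""
    return " ".join((text or "").split())


def extract_terms(xml_text):
    """Return the tagged taxon names and localities found in xml_text."""
    root = ET.fromstring(xml_text)
    names = [
        (el.get("rank"), normalise(el.text))
        for el in root.iter("taxonName")
    ]
    localities = [normalise(el.text) for el in root.iter("locality")]
    return {"taxon_names": names, "localities": localities}


result = extract_terms(SAMPLE)
print(result)
```

Because the terms are tagged explicitly, the program does not have to guess which words in the sentence are a species name or a place; that is the practical benefit of mark-up that the recommendations point to.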

The report demonstrates that publishers and scientists are not using new technologies to their full potential. The process of publication has historically targeted scientific articles that are suited to human readership. Nowadays, articles can also be used as vehicles for data that can be extracted and then re-used by others. This process does not reduce the value of articles for people, but creates a richer and more valuable resource at a very small additional cost.

The authors of the report recognise that the community must build an infrastructure that can capture and manage the rich supply of re-usable data, and make it available to other users, for example through the Linked Open Data Cloud. Only through this process can the very rich corpus of data and its underlying sources in the published record be linked to the huge production of born-digital genomic data. With developments such as this, biologists will be better placed to rise to the grander challenges, such as understanding how the natural world will respond to a warming climate. (Description by Donat Agosti, September 2013)