Interoperable Infrastructures for Digital Research: A proposed pathway for enabling transformation
Adam Farquhar and James Baker, The British Library
Governments, research organisations, cultural institutions, and commercial entities have invested substantial funds in creating digital assets to enable new research in the arts and humanities. These assets have grown to include millions of items and petabytes of material covering all forms of content – manuscripts, monographs, maps, images, sound, and more. Unfortunately, scholars have been unable to fully exploit these digital assets. The supporting infrastructures are restrictive. The assets are distributed unevenly across organisations and systems. Access restrictions unpredictably limit where and how items can be used, and by whom.
This poster will outline a pathway to remedy this unacceptable state of affairs. It will explore the need for a simple-to-use infrastructure for digital scholarship. We argue that such an interoperable infrastructure, built primarily from off-the-shelf technologies and services, should, as far as possible, work like something the user already knows: it should allow the researcher to bring their own content, tools and creativity to a familiar environment. Where we envisage it differing from a local PC setup is by hosting digital content that is otherwise difficult to obtain or too large to download, offering the computational capacity required to quickly analyse big data using automated processes, and providing network services capable of robustly supporting digitally-driven research.
A key context of this proposed poster is research infrastructure developments around cloud, virtual and remote workflows. Notable among these are ongoing cyber-infrastructure work at the HathiTrust Research Centre1 and the deployed cloud research infrastructure used by the European Bioinformatics Institute.2 Whilst these observations and experiences point to a potentially crucial role for infrastructure in humanities research, we remain mindful of the robust critiques of recent digital humanities infrastructure projects, notably Dombrowski's account of Project Bamboo.3 These critiques have highlighted how infrastructure development should not make strong assumptions about how researchers work, what tools they need, the sorts of problems that they will strive to solve, or even the specialised standards that they will employ. Our proposed pathway avoids these known problems by suggesting that researchers must be enabled to bring their own tools, work in whatever way they want, use any workflow, and address any sort of problem. We envisage this being achieved by infrastructure development that works with many digital content providers, supports a wide range of content types, and is embedded within arts and humanities research that uses a variety of data-driven methodologies. It would support growth in big data research in the arts and humanities using researcher-appropriate standards and guidelines.
The informal, conversational setting of a poster session will provide a valuable opportunity to explore the key questions and problems around digital research infrastructure. These include:
- What are the benefits of scholars being able to use off-the-shelf technologies to work with big data across major content holders?
- How can these infrastructures enable transformative research?
- Do hybrid cloud infrastructures provide a sustainable approach to service provision?
Such infrastructure could establish the foundation for scholarly work with large-scale content collections for years to come, enabling in turn transformative research that uncovers the value hidden in these digital assets and allows society to benefit from its investment. Such transformation requires leading-edge researchers, and eventually the majority of researchers, to adopt, learn and use new methods and techniques; to not just answer old questions in new ways but to arrive at new answers and to start asking entirely new questions as a consequence. This proposed infrastructure pathway aims to explore the next steps towards making this transformation a reality.
This poster builds on our experience of providing researchers with digital content. Scholars increasingly demand scalable access to large quantities of digital content – big data – that they can analyse using their own software and tools. Early on, the amounts of digital data were small; it was possible to provide copies or enable network downloads. With the growing volumes of big data, this is no longer practical. Instead of moving hundreds of terabytes of data to researchers, we must allow researchers to bring their tools to the data. This is consistent with changes in the broader IT landscape. We have established five principles to guide our pathway:
- Keep it simple. Any new infrastructure should be simple to use and understand.
- Lower the bar. Any new infrastructure should not expose or require users to understand new or complex technologies or processes. It should, as much as possible, work like something they already do.
- Bring your own tools. Users should be able to employ the tools that they already understand and work with. For example, if a researcher uses Mathematica for image analysis in her office, she should be able to use it on large collections of digital assets distributed across multiple content organisations.
- Be creative. Users should be able to use data in creative, novel, unexpected ways. Many systems and infrastructures limit what users can do.
- Start small and grow big. Users should be able to try things out; explore, experiment and debug; and then deploy on large content sets.
1. Beth Plale, Opportunities and Challenges of Text Mining HathiTrust Digital Library, Koninklijke Bibliotheek, 15 November 2013 www.hathitrust.org/documents/kb-plalehtrc-nov2013.pdf
2. Creating a Global Alliance to Enable Responsible Sharing of Genomic and Clinical Data, 3 June 2013 www.ebi.ac.uk/sites/ebi.ac.uk/files/shared/images/News/Global_Alliance_White_Paper_3_June_2013.pdf
3. Quinn Dombrowski, What ever happened to Project Bamboo?, DH2013