A Novel Approach for Efficient Submission of Research Data to the National Database for Autism Research (NDAR)

dataset

posted on 2015-08-17, 17:30 authored by Julie Hawthorne, Philip Langthorne, Frank J. Farach, David Voccola, Charles Tirrell, Leon Rozenblit

Researchers seeking to share their data with coordinating centers such as the National Database for Autism Research (NDAR) face numerous barriers to establishing new connections and maintaining existing ones. We sought to dramatically reduce the time and money required to establish and maintain the interoperability of data between research centers by establishing a process in which manual recoding of data is replaced by data sharing instructions in the form of extraction and transformation scripts. Over the course of seven typical data submissions to NDAR (20-60 subjects, 400-1000 fields each), the need for duplication, retranscription, or restructuring of the source data was fully eliminated. Because the extraction and transformation scripts are kept separate from the data files, collecting additional data no longer increased the time required to repeat a successful transmission. Revision-controlled management of these scripts also provided a new benefit: traceability of the transformation process itself. Point-in-time retrieval of extraction scripts, and explanations for modifications to the data sharing interface, are now possible. This method has proven successful and efficient for interfacing research data with NDAR. It has little to no impact on the transmitting investigators' data, ensures high data integrity, greatly simplifies the task of repeatedly modifying a growing dataset over time, and introduces traceability to the collaborative process of integrating two collections of data with one another.
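
To illustrate keeping transformation instructions separate from the data, the following Python sketch applies a field mapping and a value recoding to CSV text without altering the source. It is not the authors' actual tooling: the field names, the recoding rules, and the `transform` function are hypothetical and are not taken from the NDAR data dictionary.

```python
import csv
import io

# Hypothetical mapping from local field names to target field names.
# These names are illustrative only; they are not real NDAR data elements.
FIELD_MAP = {
    "subj_id": "subject_key",
    "visit_dt": "interview_date",
    "sex": "gender",
}

# Hypothetical value recoding rules, keyed by target field name.
VALUE_MAP = {
    "gender": {"1": "M", "2": "F"},
}


def transform(source_csv: str) -> str:
    """Apply the mappings to CSV text and return the transformed CSV text.

    The source text is only read, never modified.
    """
    reader = csv.DictReader(io.StringIO(source_csv))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(FIELD_MAP.values()), lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        new_row = {}
        for src, dst in FIELD_MAP.items():
            value = row[src]
            # Recode the value if a rule exists; otherwise pass it through.
            new_row[dst] = VALUE_MAP.get(dst, {}).get(value, value)
        writer.writerow(new_row)
    return out.getvalue()


# Made-up example records standing in for a local export.
source = "subj_id,visit_dt,sex\nA001,2014-03-02,1\nA002,2014-03-09,2\n"
result = transform(source)
print(result)
```

Because the mapping lives in the script rather than in a recoded copy of the data, the same script can be rerun unchanged after more records are collected, and keeping the script under revision control records how and why the mapping changed over time.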