SI2-SSI: Inquiry-Focused Volumetric Data Analysis Across Scientific Domains: Sustaining and Expanding the yt Community
Dataset posted on 07.08.2017 by Matthew Turk, Nathan Goldbaum, 0000-0002-6226-7689, Leigh Orf
Grant proposal for NSF SI2-SSI for development of the yt project (http://yt-project.org/). Awarded as OAC-1663914.
Scientific discovery across the physical sciences is increasingly dependent on the analysis of volumetric (three-dimensional) data, which may come from a supercomputer simulation, direct measurement, or a mathematical model. Researchers typically seek to extract meaningful insights from this data by visualizing and analyzing it in various ways. The ways in which scientists process volumetric data are quite similar across domains, but cross-disciplinary knowledge transfer and tool development are blocked by barriers of terminology. This project seeks to enhance an analysis and visualization toolkit named yt that is currently used primarily for astrophysical simulations. yt allows scientists to access and analyze data at several different levels by providing an interface designed to answer questions motivated by the underlying scientific problem, while worrying less about details such as data formats and specific analysis techniques. yt's use in computational astrophysics has dramatically increased access to advanced algorithms for both visualization and analysis, and has fostered the growth of a community of researchers sharing techniques and results. This project seeks to make yt available to, and adopted by, scientists in other domains, reproducing there its success in astrophysics. It will expand the yt community beyond theoretical astrophysics and enable and promote collaboration and advanced data analysis in the fields of meteorology, seismology and global tomography, observational astronomy, hydrology and oceanography, and plasma physics.
Improvements to the yt project will proceed along four principal technical avenues. The first is to develop a system that adapts the way yt presents data via a set of domain contexts that encode the ontology, domain-specific vocabulary, and common analysis tasks for a given field of study. This will include creating a domain context system as well as a set of five pilot domain contexts developed in collaboration with domain practitioners. The second is to overhaul the yt field system, adding more versatility and enabling significant optimizations. The third is to implement non-spatial indexing schemes, providing methods for accessing and analyzing data that is not organized according to the standard spatial axes. The fourth is to develop a non-local analysis system, allowing generalized path traversal as well as domain convolutions. To ensure wide dissemination and use of these improved capabilities, the team will design domain-specific documentation and training materials, and organize outreach and training events for early-career researchers. These efforts will include both hands-on technical workshops and curricula developed in collaboration with Data Carpentry for use at other institutions. This combination of technical developments and social investments has been designed to ensure both readiness of the software and engagement of the targeted research communities.
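As a rough illustration of the domain context idea described above, the following minimal Python sketch shows how domain-specific vocabulary might be mapped onto a shared set of field names. It does not use or reflect the yt API: the DomainContext class, its resolve method, and the example vocabularies are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class DomainContext:
    """Hypothetical container for one field of study's vocabulary."""

    name: str
    # Maps a domain-specific term to a shared, canonical field name.
    aliases: dict = field(default_factory=dict)

    def resolve(self, term: str) -> str:
        # Fall back to the term itself when the domain defines no alias.
        return self.aliases.get(term, term)


# Illustrative vocabularies only; these mappings are assumptions of the sketch.
meteorology = DomainContext("meteorology", {"air_temperature": "temperature"})
oceanography = DomainContext(
    "oceanography", {"sea_water_temperature": "temperature"}
)

for context, term in [
    (meteorology, "air_temperature"),
    (oceanography, "sea_water_temperature"),
]:
    print(f"{context.name}: {term} -> {context.resolve(term)}")
```

Under these assumptions, two communities that use different terms for the same quantity are routed to one underlying field, which is one way the terminology barrier mentioned in the abstract could be lowered.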