Untapped data: Initiatives to build ML communities around big data generators

2018-12-06T09:40:37Z (GMT) by Jason Rigby
Machine learning (ML) research often operates within silos, separate from the people who created the data and disconnected from its original purpose. Published datasets add value to the ML community, yet ML research outcomes are rarely incorporated by the data generators themselves.

As one of Australia’s largest data generators, Monash University seeks to cultivate ML communities that are embedded in the data generation process, from experiment design to analysis. By upskilling researchers who have intimate knowledge of their data in applied ML techniques, and by enabling access to technology, we believe that an ML-centric mindset will unlock unseen potential in data insights and operational efficiencies. This presentation will showcase our efforts to raise awareness of GPU-accelerated ML approaches powered by the MASSIVE supercomputer, and novel modes of access via our “Strudel” and “Strudel Web” HPC desktop applications. We show two projects that have benefited from ML approaches to data analysis through proactive outreach: ASpirin in Reducing Events in the Elderly (ASPREE), a joint study between the US and Australia that has attracted over US$50m in funding and is the largest prevention trial ever conducted in Australia; and NHMRC-funded research in X-ray video analysis of rabbit kitten breathing as an analogue to human infant respiratory development.

We will show preliminary outcomes demonstrating how these initiatives build lasting partnerships with researchers, opening dialogue and encouraging novel collaborative ML approaches to research data analysis.

Presented at the NVIDIA AI Conference, Singapore, 24 October 2017