Project orchestration and building self-service tools

presentation
posted on 2024-03-04, 09:40, authored by eRNZ Admin, Daniel Bentall

Project orchestration, defined as the management of the various data sources, pipelines and artifacts in a data project, is a complex and iterative process. A typical machine learning project includes pipelines such as data ingestion, exploratory data analysis, labelling, modelling, and deployment. These pipelines are rarely executed in a waterfall fashion; they evolve continually, with data, configuration, code, and metrics in constant flux. Tracking these moving parts and understanding their interconnections pose significant challenges. Effective project orchestration can greatly reduce this complexity and facilitate the creation of basic self-service tools, further simplifying the process for downstream users.

This presentation introduces a solution comprising a project orchestration command-line interface (CLI), DVC- and Git-based project repositories, and independent pipeline library repositories. The project orchestration CLI abstracts away DVC and Git commands, enabling the creation of reproducible projects composed of generic pipelines and project-specific pipeline configuration. Project repositories manage project data, configuration, code, and artifacts, while the pipeline repositories house collections of related pipelines, such as general data processing pipelines or deep learning pipelines. The pipeline libraries greatly benefit from using Prefect to define and deploy pipelines. Prefect makes it easy to decouple the software environment and code from the projects which utilise the pipelines, and it provides powerful Python-based pipeline definitions, observability, and infrastructure configuration.
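As a rough illustration of what "abstracting away DVC and Git commands" could look like, the sketch below maps a few high-level actions onto the underlying commands. It is not the CLI from the presentation: the program name `orchestrate`, the actions `init`, `track` and `run`, and the function `plan_commands` are all hypothetical, and the sketch only prints the commands rather than executing them. The DVC and Git commands themselves (`dvc init`, `dvc add`, `dvc repro`, `git init`, `git add`, `git commit -m`) are standard.

```python
import argparse
import shlex


def plan_commands(action, target=None, message=None):
    """Return the Git/DVC commands a hypothetical orchestration CLI would run.

    Nothing is executed here; each command is an argument list that could
    later be passed to subprocess.run.
    """
    if action == "init":
        # Put the project under Git first, then enable DVC in the same repo.
        return [["git", "init"], ["dvc", "init"]]
    if action == "track":
        if not target:
            raise ValueError("track needs a target path")
        # `dvc add` writes a small <target>.dvc pointer file; Git versions the
        # pointer while DVC stores the data. (This sketch omits the .gitignore
        # entry that `dvc add` also creates.)
        return [
            ["dvc", "add", target],
            ["git", "add", f"{target}.dvc"],
            ["git", "commit", "-m", message or f"Track {target}"],
        ]
    if action == "run":
        # `dvc repro` re-runs out-of-date stages defined in dvc.yaml and
        # records the results in dvc.lock, which is then committed.
        return [
            ["dvc", "repro"],
            ["git", "add", "dvc.lock"],
            ["git", "commit", "-m", message or "Reproduce pipelines"],
        ]
    raise ValueError(f"unknown action: {action}")


def main(argv):
    """Parse arguments, print the planned commands, and return them."""
    parser = argparse.ArgumentParser(prog="orchestrate")  # hypothetical name
    sub = parser.add_subparsers(dest="action", required=True)
    sub.add_parser("init")
    track = sub.add_parser("track")
    track.add_argument("target")
    track.add_argument("-m", "--message")
    run = sub.add_parser("run")
    run.add_argument("-m", "--message")
    args = parser.parse_args(argv)
    commands = plan_commands(
        args.action,
        target=getattr(args, "target", None),
        message=getattr(args, "message", None),
    )
    for command in commands:
        print(shlex.join(command))  # dry run: show, do not execute
    return commands


# Example: show what tracking a raw-data directory would involve.
main(["track", "data/raw"])
```

Keeping the planning step separate from execution is one possible design choice: the mapping from actions to commands can be inspected and tested without a Git or DVC installation, and a self-service tool could show users the planned commands before anything runs.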

Finally, to demonstrate this solution, we will look at how a user with no programming experience can develop a deep learning computer vision model using a self-service tool built with this system.

ABOUT THE AUTHOR
Daniel Bentall is a data scientist at Plant and Food Research (PFR) with 5 years of experience developing deep learning computer vision projects for horticulture and aquaculture across a variety of scientific fields. He is currently leading a research aim in PFR’s digital twin programme, with a focus on developing methods for extracting information from orchard imagery of planar cordon apple trees.


For more information about eResearch NZ / eRangahau Aotearoa, visit:
https://eresearchnz.co.nz/
