
Datum: A System for TFRecord Dataset Management

Version 2 2021-08-08, 13:41
Version 1 2021-08-08, 07:32
preprint
posted on 2021-08-08, 13:41, authored by Mrinal Haloi and Shashank Shekhar

The efficiency of deep learning model training depends on the performance of the input pipeline. Especially when training very deep neural networks on GPU servers, an efficient input pipeline can significantly reduce overall training time. TensorFlow provides the TFRecord format, which stores data as a sequence of binary records holding serialized protocol buffers, and the tf.data.Dataset API for building input pipelines. TFRecord files are efficient to read and use little disk space, and they can be loaded as a tf.data.Dataset for training neural network models. However, creating and loading a TFRecord dataset requires writing a lot of complex code and is time-consuming. We develop Datum to automate this process.
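
To make the "sequence of binary records" concrete, here is a minimal standard-library sketch of the record framing that TensorFlow documents for TFRecord files: a little-endian length, a masked CRC-32C of the length, the payload, and a masked CRC-32C of the payload. It is an illustration only and is not Datum's code. The helper names (crc32c, masked_crc, write_record, read_records) are hypothetical, and the payloads are plain byte strings standing in for the serialized protocol buffers a real pipeline would store.

```python
import io
import struct

# Constant used by the TFRecord format to mask checksums.
_MASK_DELTA = 0xA282EAD8


def _build_crc32c_table():
    """Build the lookup table for CRC-32C (Castagnoli, reflected polynomial)."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC_TABLE = _build_crc32c_table()


def crc32c(data: bytes) -> int:
    """Compute the CRC-32C checksum of data."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """Rotate the checksum right by 15 bits and add the mask constant."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + _MASK_DELTA) & 0xFFFFFFFF


def write_record(stream, payload: bytes) -> None:
    """Append one framed record to a binary stream."""
    header = struct.pack("<Q", len(payload))
    stream.write(header)
    stream.write(struct.pack("<I", masked_crc(header)))
    stream.write(payload)
    stream.write(struct.pack("<I", masked_crc(payload)))


def read_records(stream):
    """Yield payloads from a binary stream, verifying both checksums."""
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of file
        (length,) = struct.unpack("<Q", header)
        (length_crc,) = struct.unpack("<I", stream.read(4))
        if length_crc != masked_crc(header):
            raise ValueError("corrupted length header")
        payload = stream.read(length)
        (payload_crc,) = struct.unpack("<I", stream.read(4))
        if payload_crc != masked_crc(payload):
            raise ValueError("corrupted payload")
        yield payload


# Usage: round-trip three stand-in payloads through an in-memory buffer.
buffer = io.BytesIO()
for item in [b"first", b"", b"third record"]:
    write_record(buffer, item)
buffer.seek(0)
recovered = list(read_records(buffer))
```

Each record carries 16 bytes of framing around its payload, which is part of why the format is compact and cheap to scan sequentially. In practice the same work is done through TensorFlow's own writer and dataset classes rather than hand-written framing.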
