24 files

Bengalese Finch song repository

dataset
Posted on 08.05.2021, 21:02 by David Nicholson, Jonah E. Queen, Samuel J. Sober
This is a collection of song from four Bengalese finches recorded in the Sober lab at Emory University. The song has been hand-labeled by two of the authors.

Set-up:
We have added a shell script to untar the compressed archives on Unix systems: untar_bfsongrepo.sh
Running this script will produce a directory structure like:
BFSongRepo/
    bird_ID/
        day_1/
        day_2/
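For readers who are not on a Unix system, roughly the same extraction can be done from Python. The sketch below is hypothetical and is not the contents of untar_bfsongrepo.sh. It assumes that the archives use the .tar.gz extension and sit together in one directory. The function name untar_all is our own.

```python
import tarfile
from pathlib import Path


def untar_all(archive_dir, out_dir="BFSongRepo"):
    """Extract every *.tar.gz archive found in archive_dir into out_dir.

    Hypothetical helper: assumes the archives use the .tar.gz extension and
    that each one unpacks to bird_ID/day_N/... as in the tree above.
    Returns the list of archive paths that were extracted.
    """
    archive_dir = Path(archive_dir)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(archive_dir.glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # Newer Python versions support extraction filters; the "data"
                # filter refuses members that would be written outside out_dir.
                tar.extractall(path=out_dir, filter="data")
            else:
                tar.extractall(path=out_dir)
        extracted.append(archive)
    return extracted
```

For example, untar_all("downloads") would unpack every matching archive in a hypothetical downloads/ folder into BFSongRepo/.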

Usage:
To make it easy to work with the dataset, we have created a Python package, "evfuncs", available at https://github.com/soberlab/evfuncs (please see the "References" section below for a direct link).

How to work with the files is described in the README of that library, but we briefly describe the file types here. The sound files have the extension .cbin and were created by EvTAF, an application that runs behavioral experiments and collects data. Each .cbin file has an associated .cbin.not.mat file, created by the song-annotation GUI evsonganaly, that contains the song syllable onsets, offsets, labels, etc. Each .cbin file also has associated .tmp and .rec files, also created by EvTAF. These files are not strictly required to work with this dataset but are included for completeness.
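The relationships between these files can be checked with a short script before loading any audio or annotations. The sketch below uses only the Python standard library and is not part of evfuncs; the name find_song_files is hypothetical. It assumes that the .rec and .tmp files share the .cbin file's base name (e.g. song.rec next to song.cbin), while the annotation file keeps the full name plus .not.mat (song.cbin.not.mat).

```python
from pathlib import Path

# Companion files described above. Only .cbin.not.mat holds the hand labels;
# the base-name convention for .tmp and .rec is an assumption of this sketch.
COMPANION_SUFFIXES = (".cbin.not.mat", ".tmp", ".rec")


def find_song_files(root):
    """Map each .cbin file under root to the companion files that exist.

    Returns {cbin_path: {suffix: companion_path}}; a suffix is absent
    from the inner dict when that companion file was not found.
    """
    root = Path(root)
    table = {}
    for cbin in sorted(root.rglob("*.cbin")):
        stem = cbin.name[: -len(".cbin")]  # "song.cbin" -> "song"
        companions = {}
        for suffix in COMPANION_SUFFIXES:
            candidate = cbin.with_name(stem + suffix)
            if candidate.exists():
                companions[suffix] = candidate
        table[cbin] = companions
    return table
```

A typical use would be listing recordings that lack annotation, e.g. [c for c, comp in find_song_files("BFSongRepo").items() if ".cbin.not.mat" not in comp], and then loading the rest with evfuncs as described in its README.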

We share this collection as a means of testing different machine learning algorithms for classifying the elements of birdsong, known as syllables.
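When comparing a classifier's output against the hand labels, one commonly used score is a syllable error rate: the edit (Levenshtein) distance between the predicted and the hand-labeled syllable sequences, divided by the length of the hand-labeled sequence. The sketch below is a plain-Python illustration of that idea, not the implementation used in any of the works listed below; the function names are our own, and labels may be given as a string or a list.

```python
def levenshtein(source, target):
    """Minimum number of insertions, deletions, and substitutions turning source into target."""
    prev = list(range(len(target) + 1))  # distances from an empty source prefix
    for i, s in enumerate(source, start=1):
        curr = [i]
        for j, t in enumerate(target, start=1):
            cost = 0 if s == t else 1
            curr.append(min(prev[j] + 1,          # delete s
                            curr[j - 1] + 1,      # insert t
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def syllable_error_rate(true_labels, pred_labels):
    """Edit distance between predicted and true labels, normalized by the true length."""
    if len(true_labels) == 0:
        raise ValueError("true_labels must be non-empty")
    return levenshtein(pred_labels, true_labels) / len(true_labels)
```

For example, a prediction that drops one syllable from a four-syllable hand-labeled sequence, syllable_error_rate("abcd", "abd"), gives 0.25. Because it counts insertions and deletions, this score also penalizes segmentation errors, which a per-syllable accuracy on pre-segmented syllables would not.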

Citation:
Please cite the DOI if you use this dataset. If you are developing machine learning algorithms, we ask that you cite our publications and software (see below) and consider benchmarking against the algorithms that we have developed. Our impression is that it will require a community of researchers working together to advance the state of the art in this area.

Works that use this dataset (URLs as links are below in "References"):
Comparison of machine learning methods applied to birdsong element classification
https://conference.scipy.org/proceedings/scipy2016/david_nicholson.html

Latent space visualization, characterization, and generation of diverse vocal communication signals
https://www.biorxiv.org/content/10.1101/870311v1.full.pdf

TweetyNet: A neural network that enables high-throughput, automated annotation of birdsong
https://www.biorxiv.org/content/biorxiv/early/2020/10/13/2020.08.28.272088.full.pdf

The paper above makes use of the following libraries:
https://github.com/yardencsGitHub/tweetynet
https://zenodo.org/record/4662200

https://github.com/NickleDave/vak
https://zenodo.org/record/4718767

https://github.com/NickleDave/crowsetta
https://zenodo.org/record/4584198

https://github.com/NickleDave/hybrid-vocal-classifier
https://zenodo.org/record/4678768

Fast and accurate annotation of acoustic signals with deep neural networks
https://www.biorxiv.org/content/biorxiv/early/2021/03/29/2021.03.26.436927.full.pdf

Please feel free to contact David Nicholson (nicholdav at gmail dot com) with questions and feedback.

Funding

This work was supported by the National Institutes of Health, National Institute of Neurological Disorders and Stroke (R01 NS084844 and F31NS089406).

History