Bongjun Kim, Mark Cartwright, Fatemeh Pishdadian, Bryan Pardo
VimSketch Dataset combines two publicly available datasets created by the Interactive Audio Lab for the task of Query by Vocal Imitation (QBV).
VimSketch contains 542 reference sounds (including a variety of animal sounds, musical snippets, and environmental noise samples) and 12,543 vocal imitations of those reference sounds, with between 13 and 37 imitations per reference. The two datasets included in VimSketch are:
Vocal Imitation Set: a dataset of vocal imitations of sound events selected using the AudioSet ontology.
[pdf] Bongjun Kim, Madhav Ghei, Bryan Pardo, and Zhiyao Duan, “Vocal Imitation Set: a dataset of vocally imitated sound events using the AudioSet ontology,” in Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2018.
VocalSketch Dataset: a dataset containing thousands of vocal imitations of a large set of diverse sounds.
[pdf] Mark Cartwright and Bryan Pardo, “VocalSketch: Vocally Imitating Audio Concepts,” in Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI), 2015.
[pdf] Fatemeh Pishdadian, Bongjun Kim, Prem Seetharaman, and Bryan Pardo, “Classifying Non-speech Vocals: Deep vs Signal Processing Representations,” in Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019.
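For a quick sanity check of the statistics above, here is a minimal Python sketch for tallying imitations per reference sound. The metadata filename and column names (vimsketch_metadata.csv, reference_filename, imitation_filename) are hypothetical stand-ins, since the on-disk layout of the dataset is not described here; adapt them to the actual distribution.

    # Minimal sketch: count vocal imitations per reference sound in VimSketch.
    # ASSUMPTION (hypothetical, not from the dataset docs): a metadata CSV with
    # one row per imitation and columns "reference_filename" / "imitation_filename".
    import csv
    from collections import Counter

    counts = Counter()
    with open("vimsketch_metadata.csv", newline="") as f:
        for row in csv.DictReader(f):
            counts[row["reference_filename"]] += 1

    print(f"reference sounds: {len(counts)}")           # expect 542
    print(f"vocal imitations: {sum(counts.values())}")  # expect 12,543
    print(f"imitations per reference: min {min(counts.values())}, "
          f"max {max(counts.values())}")                # expect 13 and 37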