
VimSketch

[Figure: vocal imitation of a sound]

Bongjun Kim, Mark Cartwright, Fatemeh Pishdadian, Bryan Pardo

The VimSketch Dataset combines two publicly available datasets created by the Interactive Audio Lab for the task of Query by Vocal Imitation (QBV), in which a user retrieves a sound from a collection by vocally imitating it.
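
To make the QBV task concrete, here is a minimal sketch that ranks a folder of reference sounds against a single vocal imitation query. The mean-MFCC features, cosine similarity, file paths, and filenames are all illustrative assumptions; they are not the retrieval systems evaluated in the papers referenced below.

```python
# Minimal QBV sketch: rank reference sounds by similarity to a vocal
# imitation query. MFCC features and cosine similarity are illustrative
# assumptions, not the method used in the referenced papers.
from pathlib import Path

import librosa
import numpy as np


def mean_mfcc(path, n_mfcc=20):
    """Load an audio file and summarize it as a mean MFCC vector."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)


def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))


# Hypothetical paths; adjust to wherever the dataset is unpacked.
query = mean_mfcc("vocal_imitations/example_query.wav")
references = {p.name: mean_mfcc(p) for p in Path("reference_sounds").glob("*.wav")}

# Rank reference sounds by similarity to the imitation query;
# the top-ranked files are the system's best guesses at what was imitated.
ranking = sorted(references, key=lambda name: cosine(query, references[name]), reverse=True)
print(ranking[:10])
```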

VimSketch contains 542 reference sounds (a variety of animal sounds, musical snippets, and environmental noise samples) and 12,543 vocal imitations of those reference sounds, with between 13 and 37 imitations per reference. The two datasets included in VimSketch are:

  1. Vocal Imitation Set: a collection of crowd-sourced vocal imitations of a large, diverse set of sounds drawn from Freesound and curated according to Google’s AudioSet ontology.

  2. VocalSketch Dataset: thousands of vocal imitations of a large and diverse set of audio concepts.
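
The papers below do not fix a single on-disk layout here, so purely as a sketch: assuming each imitation filename begins with the ID of the reference sound it imitates, a reference-to-imitations index can be built with the standard library and checked against the totals stated above.

```python
# Sketch of indexing VimSketch, assuming a hypothetical layout in which
# each imitation filename starts with the ID of its reference sound,
# e.g. reference_sounds/042.wav and vocal_imitations/042_contributor17.wav.
from collections import defaultdict
from pathlib import Path

root = Path("VimSketch")
index = defaultdict(list)

for imitation in (root / "vocal_imitations").glob("*.wav"):
    reference_id = imitation.stem.split("_")[0]  # assumed naming convention
    index[reference_id].append(imitation)

# Sanity checks against the counts stated above: 542 reference sounds,
# 13-37 imitations per reference, and 12,543 imitations in total.
counts = [len(v) for v in index.values()]
assert len(index) == 542
assert min(counts) >= 13 and max(counts) <= 37
assert sum(counts) == 12543
```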

[pdf] Bongjun Kim, Madhav Ghei, Bryan Pardo, and Zhiyao Duan, “Vocal Imitation Set: a dataset of vocally imitated sound events using the AudioSet ontology,” Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2018.

[pdf] Mark Cartwright and Bryan Pardo, “VocalSketch: Vocally Imitating Audio Concepts,” Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI), 2015.

[pdf] Fatemeh Pishdadian, Bongjun Kim, Prem Seetharaman, and Bryan Pardo, “Classifying Non-speech Vocals: Deep vs Signal Processing Representations,” Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019.