Fine-grained Vocal Imitation Set

Vocal imitations of sound events for fine-grained search

Bongjun Kim, Bryan Pardo

This dataset includes 763 crowd-sourced vocal imitations of 108 sound events. The sound event recordings were taken from a subset of the Vocal Imitation Set.

While the Vocal Imitation Set only contains vocal imitations of a single reference recording per class, this new dataset contains vocal imitations of multiple reference recordings per class.

Bongjun Kim, Madhav Ghei, Bryan Pardo, and Zhiyao Duan, “Vocal Imitation Set: a dataset of vocally imitated sound events using the AudioSet ontology,” Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2018.