Effective and Inconspicuous Over-the-Air Adversarial Examples with Adaptive Filtering

Audio Examples

Here we provide 40 audio comparisons between a baseline frequency-masking attack (Qin et al.+; see Qin et al. 2019, Szurley & Kolter 2019, Dörr et al. 2020, and Wang et al. 2020) and our proposed adaptive filtering attack (Adaptive Filtering). For reference, we also include the original unperturbed audio (Original), the spoofed utterance from the target speaker (Target), and a waveform-additive projected gradient descent attack (PGD) that uses the same loss function and optimization procedure as the proposed filtering attack. All attacks are optimized through a set of simulated "over-the-air" environments, which induces large-magnitude perturbations. In the case of the PGD attack, the adversarial perturbation should be clearly audible. By contrast, the Qin et al.+ attack makes its perturbations more subtle through a complex perceptually-inspired loss and a two-stage optimization procedure. Finally, our proposed Adaptive Filtering attack improves on the perceptual quality of the Qin et al.+ attack without requiring either a complex perceptually-inspired loss or a two-stage optimization procedure. In a user study, listeners given a two-way forced choice rated our proposed attack as less conspicuous than Qin et al.+ by 65.9% to 34.1%. For additional details, see our preprint.
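For readers unfamiliar with the PGD reference attack, the following is a minimal sketch of a waveform-additive projected gradient descent loop, not the implementation used for the examples on this page. Everything specific in it is an assumption made for illustration: the L-infinity bound `eps`, the signed-gradient step, the toy linear "embedding" `W` standing in for a real speaker model, and the helper names `toy_loss` and `toy_grad`. It also omits the simulated over-the-air environments described above.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def pgd_attack(x, grad_fn, eps=0.05, step=0.01, iters=20):
    """Additive PGD on a waveform x (list of floats in [-1, 1]).

    grad_fn(x_adv) must return the gradient of the attack loss with
    respect to the perturbed waveform. The L-infinity bound eps is an
    illustrative assumption, not a setting taken from the paper.
    """
    delta = [0.0] * len(x)
    for _ in range(iters):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        g = grad_fn(x_adv)
        # Signed gradient step that decreases the (targeted) loss.
        delta = [di - step * sign(gi) for di, gi in zip(delta, g)]
        # Project the perturbation back onto the eps-ball.
        delta = [max(-eps, min(eps, di)) for di in delta]
        # Keep the perturbed waveform inside the valid sample range.
        delta = [max(-1.0, min(1.0, xi + di)) - xi for xi, di in zip(x, delta)]
    return [xi + di for xi, di in zip(x, delta)]


# Hypothetical stand-in for a speaker model: a 1-D linear "embedding".
W = [1.0, -1.0, 0.5, -0.5]
TARGET = 1.0  # hypothetical embedding value of the target speaker


def toy_loss(x):
    """Squared distance between the toy embedding of x and the target."""
    score = sum(w * xi for w, xi in zip(W, x))
    return (score - TARGET) ** 2


def toy_grad(x):
    """Analytic gradient of toy_loss with respect to x."""
    score = sum(w * xi for w, xi in zip(W, x))
    return [2.0 * (score - TARGET) * w for w in W]


x = [0.0, 0.0, 0.0, 0.0]
x_adv = pgd_attack(x, toy_grad)
```

Because the perturbation is added directly to the waveform samples and constrained only in magnitude, it behaves like broadband noise, which is consistent with the PGD examples being clearly audible.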

Source Speaker	Target Speaker	Source Audio	Target Audio	PGD	Baseline (Qin et al.+)	Proposed (Adaptive Filtering)
5683	1320
1089	61
2300	4970
5105	1580
7021	2830
4507	8224
4970	2300
4077	4446
2961	5142
3729	8463
121	8230
237	5683
1580	7127
8230	1089
1188	672
1284	1995
8463	1284
8455	237
2830	6930
61	4077
6930	7021
672	2094
7176	6829
4446	908
3575	7729
5142	1221
8555	121
2094	2961
908	4992
7127	3575
8224	4507
7729	3570
1320	5639
3570	260
1995	8455
4992	8555
260	5105
1221	3729
5639	1188
6829	7176