As intelligent devices become increasingly prevalent in daily life, the demand for privacy protection has grown significantly. Adversarial attacks have emerged in recent years as one way to address this concern. Initially, adversarial attacks were applied predominantly to image recognition. However, because of the distinct characteristics of audio data, attacks designed for images, such as additive perturbations, may not transfer directly to the audio domain. The goal of this study is to perturb speech signals so that they cannot be recognized by automatic speech recognition (ASR) systems yet remain intelligible to human listeners. We introduce several methods based on noise addition and precision reduction to generate adversarial examples for ASR systems. The proposed approach leverages audio features extracted through filtering and time-frequency transformations. The adversarial examples generated with these methods not only remain intelligible to human listeners but also achieve a 100% success rate in blind attacks against ASR systems whose architectures and parameters are unknown.
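As a rough illustration of the two kinds of operations named above, the sketch below applies additive white Gaussian noise at a chosen signal-to-noise ratio and bit-depth quantization (precision reduction) to a raw waveform. This is a minimal sketch of generic audio operations, not the paper's exact method: the specific noise design, parameter choices, and the feature-based filtering and time-frequency processing are not reproduced here, and the function names and parameters are hypothetical.

```python
import numpy as np

def add_noise(waveform, snr_db=20.0, rng=None):
    """Add white Gaussian noise at a target SNR in dB (hypothetical helper)."""
    rng = rng or np.random.default_rng()
    signal_power = np.mean(waveform ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
    return waveform + noise

def reduce_precision(waveform, bits=8):
    """Quantize the waveform to a lower bit depth (precision reduction)."""
    levels = 2 ** (bits - 1)
    return np.clip(np.round(waveform * levels) / levels, -1.0, 1.0)

# Usage: x is a float waveform scaled to [-1, 1], e.g. loaded with soundfile.
# adversarial = reduce_precision(add_noise(x, snr_db=20.0), bits=8)
```

The SNR and bit-depth values shown are placeholders; in practice they would be tuned so that the perturbed speech still sounds natural to listeners while degrading ASR transcription.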