Deep learning-based speech enhancement methods exploit their nonlinearity to estimate the speech and noise signals, and are especially effective on nonstationary noise. DCCRN, in particular, achieves state-of-the-art performance on speech intelligibility. However, the same nonlinearity raises concerns about the robustness of such methods: novel and unexpected noises can be generated when the noisy input speech falls outside the method's operating conditions. In this paper, we propose a hybrid framework called LDCCRN, which integrates a traditional speech enhancement method, LogMMSE-EM, with DCCRN. The proposed framework leverages the strengths of both approaches to improve robustness in speech enhancement. While DCCRN continues to remove the nonstationary noise in the speech, any novel noises generated by DCCRN are effectively suppressed by LogMMSE-EM. Our experimental results show that the proposed method outperforms the traditional approaches under standard evaluation methods.
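The division of labor described above can be pictured as a chain of enhancement stages, where the output of one stage is the input of the next. The sketch below is only an illustration under stated assumptions: the abstract does not specify how the two methods are connected, so the cascade order is one plausible reading, and `cascade_enhance`, `dccrn_stand_in`, and `logmmse_em_stand_in` are hypothetical names with toy behavior, not implementations of DCCRN or LogMMSE-EM.

```python
from typing import Callable, List, Sequence

Signal = List[float]
Enhancer = Callable[[Signal], Signal]


def cascade_enhance(noisy: Signal, stages: Sequence[Enhancer]) -> Signal:
    """Apply each enhancement stage in order, feeding each output to the next."""
    signal = list(noisy)  # copy so the caller's input is not modified
    for stage in stages:
        signal = stage(signal)
    return signal


# Hypothetical stand-ins with toy behavior. They are NOT DCCRN or LogMMSE-EM;
# they only mark where a neural stage and a statistical stage would plug in.
def dccrn_stand_in(x: Signal) -> Signal:
    # Placeholder for a neural enhancer: simply halves the amplitude.
    return [0.5 * v for v in x]


def logmmse_em_stand_in(x: Signal) -> Signal:
    # Placeholder for a statistical post-filter: zeroes samples below a floor.
    return [v if abs(v) >= 0.1 else 0.0 for v in x]


# Assumed order: neural stage first, statistical stage on its output.
enhanced = cascade_enhance([1.0, 0.1, -0.4], [dccrn_stand_in, logmmse_em_stand_in])
```

Keeping the stages as interchangeable callables means the order, or the stages themselves, can be swapped without touching the chaining logic.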