KEYWORDS: Education and training, Performance modeling, Transformers, Data hiding, Information fusion, Associative arrays, Parallel processing, Data modeling, Matrices, Mathematical optimization
BERT has shown remarkable performance on several natural language processing tasks, but it fails to deliver comparable gains on cross-lingual tasks, particularly machine translation. To address this issue, we propose a BERT-enhanced neural machine translation (BE-NMT) model that optimizes how the NMT model uses the information contained in BERT. Our model comprises three components: (1) a MASKING strategy to mitigate the knowledge forgetting caused by fine-tuning BERT on the NMT task; (2) serial and parallel multi-attention schemes for incorporating BERT into the NMT system; (3) fusion of multiple BERT hidden-layer outputs to supplement the linguistic information missing from the final hidden-layer output. We conducted experiments on several translation tasks: our model outperforms a strong baseline by 1.93 BLEU points on the United Nations Parallel Corpus English→Chinese task and also achieves notable performance on the other translation tasks.
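The third component can be illustrated with a minimal sketch. The code below is not the authors' released implementation; the module name, the choice of softmax-normalized scalar layer weights, and the tensor shapes are assumptions made for illustration of how several BERT hidden-layer outputs might be fused into a single representation for the NMT attention modules.

```python
# Minimal sketch (assumed names/shapes, not the paper's code) of fusing the
# outputs of several BERT hidden layers via learned, softmax-normalized weights.
import torch
import torch.nn as nn


class BertLayerFusion(nn.Module):
    """Weighted sum over the hidden states of k selected BERT layers."""

    def __init__(self, num_layers: int):
        super().__init__()
        # One learnable scalar logit per fused layer.
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))

    def forward(self, hidden_states):
        # hidden_states: list of k tensors, each [batch, src_len, d_model],
        # one per BERT layer selected for fusion.
        stacked = torch.stack(hidden_states, dim=0)            # [k, B, L, D]
        weights = torch.softmax(self.layer_logits, dim=0)      # [k]
        fused = (weights.view(-1, 1, 1, 1) * stacked).sum(0)   # [B, L, D]
        return fused


if __name__ == "__main__":
    # Hypothetical usage: fuse the last 4 BERT layers (hidden size 768).
    fusion = BertLayerFusion(num_layers=4)
    layers = [torch.randn(2, 10, 768) for _ in range(4)]
    print(fusion(layers).shape)  # torch.Size([2, 10, 768])
```

Under these assumptions, the fused tensor would replace the single final-layer BERT output as the key/value source for the serial or parallel BERT-attention modules in the NMT encoder and decoder.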