The cross-language news topic discovery task aims to cluster news texts that are written in different languages but describe the same topic, and to characterize each topic with keywords. At present, most cross-language topic discovery methods bridge the language gap with machine translation or with external resources such as bilingual dictionaries and parallel sentences. However, Vietnamese is a low-resource language, and manually annotating Chinese-Vietnamese bilingual aligned corpora is difficult and expensive. To address this problem, this paper proposes a Chinese-Vietnamese cross-language topic discovery method based on generative adversarial networks (GAN). First, news texts are represented as vectors by BERT; the vectors of the two languages are then mapped into the same semantic space by the GAN. Finally, the k-means clustering algorithm is used to cluster the representation vectors and extract the topics. Experiments on a Chinese-Vietnamese bilingual news topic discovery corpus show that the proposed method outperforms the baseline.
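The last two stages of the pipeline described above (mapping into a shared space, then clustering) can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes small hand-made 2-D vectors in place of BERT embeddings, and a fixed hypothetical matrix `W` in place of the mapping that the GAN would learn. The helper names `apply_linear_map` and `kmeans` are illustrative only.

```python
def apply_linear_map(vec, matrix):
    """Map a vector into the shared space by computing matrix @ vec."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy stand-ins for BERT vectors (assumption): topics alternate A, B, A, B.
zh_vectors = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.0], [0.0, 0.9]]
# Vietnamese vectors live in a rotated space, so they are not directly comparable.
vi_vectors = [[0.1, 1.0], [-1.0, 0.1], [0.0, 0.9], [-0.9, 0.0]]

# Hypothetical mapping (assumption): a rotation standing in for the GAN generator.
W = [[0.0, 1.0], [-1.0, 0.0]]
vi_mapped = [apply_linear_map(v, W) for v in vi_vectors]

# Cluster both languages together in the shared space.
labels, centroids = kmeans(zh_vectors + vi_mapped, k=2)
print(labels)
```

In this toy setting, documents on the same topic receive the same cluster label regardless of language once the Vietnamese vectors are mapped. In a realistic setup the mapping would be learned adversarially rather than fixed by hand, and the first-k-points initialization would typically be replaced by a more robust scheme.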