20 April 2023 Principle research of word vector representation in natural language processing
Shijia Kang, Linggang Kong, Bin Luo, Cuifang Zheng, Jiaju Wu
Proceedings Volume 12602, International Conference on Electronic Information Engineering and Computer Science (EIECS 2022); 1260209 (2023) https://doi.org/10.1117/12.2668487
Event: International Conference on Electronic Information Engineering and Computer Science (EIECS 2022), 2022, Changchun, China
Abstract
Natural language processing (NLP) is a research direction that spans fields such as linguistics, computer science, and data fusion. Word vector representation maps words into a real vector space and is a core technology of many current NLP tasks. This paper surveys several typical word vector representation methods and examines the linguistic and mathematical principles behind them. First, we describe the process of mapping words to vectors, that is, encoding natural language information into word vectors according to semantics. Second, we analyze the information-carrying capacity of several typical methods, including the co-occurrence matrix, Word2Vec, GloVe, and ELMo. Third, building on this analysis of principles, we reproduce the specific process of generating word vectors using singular value decomposition (SVD), neural networks, and other methods. Finally, we compare the performance of word vectors trained by the various methods on word similarity calculation and text sentiment classification tasks. The experiments support the conclusion that different word vector generation methods emphasize different linguistic information and therefore perform differently across tasks.
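As a rough illustration of the co-occurrence matrix and SVD route named in the abstract, the following minimal sketch builds window-based co-occurrence counts, reduces them with a truncated SVD, and compares two words by cosine similarity. The toy corpus, the window size, the embedding dimension, and the helper names (`build_cooccurrence`, `svd_embeddings`, `cosine`) are all assumptions made for illustration; they are not taken from the paper and do not reproduce its experiments.

```python
import numpy as np

# Toy corpus (hypothetical, for illustration only; not the paper's data).
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]


def build_cooccurrence(sentences, window=2):
    """Count how often each word pair appears within `window` tokens."""
    tokenized = [s.split() for s in sentences]
    vocab = sorted({w for sent in tokenized for w in sent})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)), dtype=float)
    for sent in tokenized:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[word], index[sent[j]]] += 1.0
    return vocab, index, counts


def svd_embeddings(counts, dim=2):
    """Keep the top `dim` singular directions as dense word vectors."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :dim] * s[:dim]


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


vocab, index, counts = build_cooccurrence(corpus)
vectors = svd_embeddings(counts, dim=2)
sim = cosine(vectors[index["cat"]], vectors[index["dog"]])
```

Here `sim` holds the cosine similarity between the vectors for "cat" and "dog". On a corpus this small the value mainly reflects shared neighbours such as "the" and "sat", so it should be read as a demonstration of the mechanics rather than as evidence about any method's quality.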
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Shijia Kang, Linggang Kong, Bin Luo, Cuifang Zheng, and Jiaju Wu "Principle research of word vector representation in natural language processing", Proc. SPIE 12602, International Conference on Electronic Information Engineering and Computer Science (EIECS 2022), 1260209 (20 April 2023); https://doi.org/10.1117/12.2668487
KEYWORDS
Education and training
Cooccurrence matrices
Semantics
Artificial intelligence
Analytical research
Data modeling
Singular value decomposition