This PDF file contains the front matter associated with SPIE Proceedings Volume 12943, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
International Workshop on Signal Processing and Machine Learning (WSPML 2023)
It is generally recognized that poetry is somehow related to emotion, and the automatic mining of poetic emotion has been actively discussed in recent years. This paper presents a descriptive analysis of the well-known sad poem When You Are Old from the perspective of phonetics, aiming to clarify the relationship between sound and emotion in poetry. It is found that, in this poem, alveolars form a salient group with respect to place of articulation, and the frequencies of monophthongs and diphthongs differ significantly. However, no significant results are found for voicing, manner of articulation, vowel duration, vowel height, or vowel location. It remains possible that readers can recognize the poet's intended sentiment, since the frequencies of certain phonemes differ so markedly that they may evoke an emotional response. The findings could contribute to automatic emotion recognition, AI poetry therapy, machine translation, and automatic poem creation.
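As a minimal sketch of the kind of frequency significance test such an analysis relies on (the counts below are hypothetical, not the paper's data):

```python
# Illustrative chi-square goodness-of-fit test asking whether monophthongs
# and diphthongs occur with significantly different frequencies in a poem.
from scipy.stats import chisquare

observed = [118, 42]   # hypothetical counts: [monophthongs, diphthongs]
expected = [80, 80]    # null hypothesis: equal frequency

stat, p = chisquare(observed, f_exp=expected)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")  # p < 0.05 -> frequencies differ significantly
```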
This paper considers the smart-city traffic signal control problem. The application of reinforcement learning to smart-city traffic signal control has long been an active research field. However, agents cannot learn good policies in complex environments with a large number of agents. This paper therefore proposes a curriculum learning method that gradually increases the number of agents in a hybrid environment, implements multi-agent curriculum transfer learning based on the MADDPG algorithm, and applies it to smart-city traffic signal control. Experimental results show that the proposed system outperforms widely used traffic signal control algorithms in large-scale intersection environments.
The text posted by users often consists of a mixture of Chinese, English, and emojis, rather than a single language corpus. However, traditional sentiment analysis models have not effectively solved the problem of sentiment calculation in bilingual text with emojis, or even the problem of sentiment fluctuations between different modules. In response to these issues, this paper proposes a centrifugal fusion method to accomplish sentiment fusion of mixed-language phrases, and a centrifugal factor to address sentiment fluctuation in bilingual short text incorporating emojis. The bilingual sentiment analysis experiment, incorporating emojis into the Weibo dataset, yielded promising results.
To address the temperature drift exhibited by pH sensors based on ion-sensitive field-effect transistors (ISFETs) when the temperature changes, a compensation method combining an improved adaptive genetic algorithm (IAGA) with a BP neural network is studied. After the sensor compensation network model is established, the method first uses the improved algorithm to optimize the weights and thresholds of the BP neural network, and then trains the network constructed from these optimized values. Finally, the trained model is applied to compensate the sensor. Experimental results show that the compensation error of the IAGA-BP model is less than 3%, and that, compared with the BP, RBF, and AGA-BP models, the IAGA-BP model has better stability.
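A minimal sketch of the GA-before-BP idea, under simplifying assumptions: a plain genetic algorithm (not the paper's improved adaptive variant) searches the initial weights of a tiny one-hidden-layer network on a toy drift model; gradient training would then start from the best individual found.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy compensation data: inputs (raw pH reading, temperature) -> true pH,
# with a hypothetical linear drift term; not the paper's sensor model.
X = rng.uniform([4.0, 10.0], [9.0, 40.0], size=(200, 2))
y = X[:, 0] + 0.01 * (X[:, 1] - 25.0)

def unpack(w):                                  # 2-8-1 network, 33 parameters
    W1 = w[:16].reshape(2, 8); b1 = w[16:24]
    W2 = w[24:32].reshape(8, 1); b2 = w[32:33]
    return W1, b1, W2, b2

def mse(w):                                     # fitness = training error
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(X @ W1 + b1)
    pred = (h @ W2 + b2).ravel()
    return np.mean((pred - y) ** 2)

pop = rng.normal(0, 0.5, size=(40, 33))         # population of weight vectors
for gen in range(100):
    fit = np.array([mse(ind) for ind in pop])
    parents = pop[np.argsort(fit)[:20]]         # truncation selection (elitism)
    children = parents[rng.integers(0, 20, 20)] + rng.normal(0, 0.1, (20, 33))
    pop = np.vstack([parents, children])        # parents + mutated offspring

best = pop[np.argmin([mse(ind) for ind in pop])]
print("GA-optimized initial MSE:", mse(best))   # BP training would start here
```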
In this study, we propose a method for constructing a Lhasa Tibetan prosodic lexicon based on a continuous speech database, which leads to significant improvements in speech synthesis performance for low-resource and complex languages. The experiment begins by utilizing a 3.95-hour speech database of a Lhasa Tibetan speaker, focusing on the prosodic feature of “tone sandhi” to investigate the phonological features and grammatical functions of Lhasa Tibetan. Drawing inspiration from the “Usage-Based Theory” in cognitive linguistics, we extract prefabs (prefabricated chunks) from 2,526 utterances. According to the prosodic features and grammatical structure of these prefabs, we construct a Prefabs Lexicon consisting of 175 thousand entries. In the comparative experiment, we employ a sequence-to-sequence speech synthesis approach and automatically segment the input sequence using both the Prefabs Lexicon and the conventional Tibetan lexicon. To evaluate the performance, a 56-minute dataset from another professional Lhasa broadcaster is used as a test set. Compared to the conventional Tibetan lexicon, the Prefabs Lexicon achieves an improved F1-score of 0.92. Additionally, in the synthesis experiment for the toneless Amdo Tibetan, the Mean Opinion Score (MOS) increases to 4.17, indicating the universal applicability of the Prefabs Lexicon across dialects.
Total impulse is closely related to the performance of solid rocket motors, and its accurate prediction is essential for both design and operation. However, traditional methods rely heavily on expert knowledge and cannot handle modern, sophisticated equipment. In this paper, a novel deep-learning-based total impulse prediction method is proposed. We establish a CNN-LSTM-Attention deep neural network model that automatically processes highly nonlinear raw data for feature extraction and delivers predictions with high accuracy. Practical rocket data collected during the ignition process are used for validation. We compare the proposed method with other popular algorithms to verify its effectiveness and superiority. The results indicate that the proposed data processing and prediction method achieves promising performance, with an average percentage error under 2%. By applying downsampling during data processing, the dependence of the deep learning method on the amount of data is greatly reduced. The proposed method therefore has good application prospects for engineering problems.
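A minimal sketch of such a CNN-LSTM-Attention regressor in PyTorch; the layer sizes and additive attention pooling are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class CNNLSTMAttention(nn.Module):
    def __init__(self, in_ch=1, conv_ch=32, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(                 # local feature extraction
            nn.Conv1d(in_ch, conv_ch, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.Conv1d(conv_ch, conv_ch, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(conv_ch, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)          # attention energy per step
        self.head = nn.Linear(hidden, 1)           # total-impulse regression

    def forward(self, x):                          # x: (batch, 1, time)
        h = self.conv(x).transpose(1, 2)           # -> (batch, time', conv_ch)
        out, _ = self.lstm(h)                      # -> (batch, time', hidden)
        alpha = torch.softmax(self.score(out), dim=1)  # attention weights
        context = (alpha * out).sum(dim=1)         # weighted pooling over time
        return self.head(context).squeeze(-1)

model = CNNLSTMAttention()
pressure = torch.randn(8, 1, 1024)                 # dummy downsampled signals
print(model(pressure).shape)                       # torch.Size([8])
```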
Word sense disambiguation (WSD) is a critical task in natural language processing (NLP) and artificial intelligence. Supervised methods, such as decision list algorithms, are considered the most accurate machine learning approaches for WSD. However, they suffer from the knowledge acquisition bottleneck: their efficiency depends on the size of the tagged training set, which can be difficult, time-consuming, and costly to prepare. In this paper, we develop a hierarchical decision list algorithm for morphologically rich, low-resource languages, using a statistical method to extract collocations from a large untagged corpus. Our approach identifies the most important collocations, which serve as the features used to create learning hypotheses. We manually construct the decision list based on the priority of the senses, improving the efficiency and accuracy of the algorithm. Our experiments are based on a dataset of 800 sentences covering 20 Setswana polysemous words. Using precision and recall to evaluate our WSD system, we achieve 78% accuracy, compared with 50% for the existing decision list algorithm. The method can be applied to other resource-limited languages. The system still has difficulty determining the appropriate sense of a word when dealing with idiomatic expressions and ambiguous contexts, and the approach could be further enhanced by incorporating additional NLP components, such as a morphological analyzer.
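A sketch of the core decision-list mechanism (Yarowsky-style), under the assumption that collocation features have already been extracted: rules are ranked by smoothed log-likelihood ratio and the first matching rule decides the sense. The feature names and counts are invented for illustration.

```python
import math

# (collocation feature, sense) counts from a hypothetical tagged sample
counts = {("left:tsela", "road"): 30, ("left:tsela", "method"): 2,
          ("right:ya", "method"): 18, ("right:ya", "road"): 5}

senses = {"road", "method"}
rules = []
for feat in {f for f, _ in counts}:
    for s in senses:
        other = sum(counts.get((feat, t), 0) for t in senses - {s})
        llr = math.log((counts.get((feat, s), 0) + 0.1) / (other + 0.1))
        rules.append((llr, feat, s))
rules.sort(reverse=True)                        # strongest evidence first

def disambiguate(features, default="road"):
    for _, feat, sense in rules:
        if feat in features:                    # first matching rule wins
            return sense
    return default                              # fall back to priority sense

print(disambiguate({"left:tsela"}))             # -> "road"
```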
Vocal tone is an important component of language and plays a key role in language comprehension and communication. However, children with hearing loss face challenges in vocal tone recognition due to their hearing impairment. In this study, five deaf children and two children with normal hearing were recruited to compare differences between deaf and normal-hearing children in third- and fourth-tone recognition tasks. The results revealed that (1) some of the brain regions that process vocal tones in deaf children did not work properly due to hearing loss; (2) deaf children may rely on different neural networks when processing vocal tone information; and (3) deaf children show hemispheric characteristics when processing vocal tone information.
Performance indicators such as thrust and pressure of solid rocket motors (SRMs) are essential for rocket monitoring and design. However, measuring these signals entails high economic and time costs, and thrust data are difficult to measure accurately in practice. To address this challenging problem, we propose a deep-learning-based cross-modal data prediction method that uses pressure data to predict the thrust of SRMs. Building on a novel RepVGG deep neural network architecture, it automatically learns features from the original data and predicts time-series data of a different modality. We verified the effectiveness of the proposed method by computing the percentage error between predicted and actual data, which was less than 3%. The predicted data can supplement SRM ground experiment data and reduce the cost of data measurement.
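For orientation, a sketch of a RepVGG-style multi-branch block adapted to 1-D signals; this is an assumed simplification (it omits the per-branch batch normalization of the original RepVGG, and the paper's exact architecture is not specified here). At training time, 3-wide, 1-wide, and identity branches are summed; RepVGG's key property is that these can later be re-parameterized into a single convolution for fast inference.

```python
import torch
import torch.nn as nn

class RepVGGBlock1d(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv3 = nn.Conv1d(ch, ch, kernel_size=3, padding=1)  # main branch
        self.conv1 = nn.Conv1d(ch, ch, kernel_size=1)             # 1x1 branch
        self.act = nn.ReLU()

    def forward(self, x):                       # training-time multi-branch form
        return self.act(self.conv3(x) + self.conv1(x) + x)

block = RepVGGBlock1d(16)
print(block(torch.randn(4, 16, 256)).shape)    # torch.Size([4, 16, 256])
```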
Objective: This study aims to investigate the brain activity patterns of deaf and hearing children during the processing of three different tones (first, second, and third tone) using resting-state functional magnetic resonance imaging (fMRI), and to identify disparities in brain activation regions between deaf and hearing children engaged in the tone processing task. Methods: The study enrolled five deaf children and two hearing children as participants. Resting-state fMRI scans were conducted on these subjects, and the acquired data underwent preprocessing and subsequent analysis to examine patterns of brain activity. Results: During tone recognition tasks, deaf and hearing children exhibit differences in brain activation regions. These discrepancies appear across multiple areas, including the precentral gyrus, superior temporal gyrus, middle occipital gyrus, supplementary motor area, superior parietal lobe, and inferior frontal gyrus, among others. A comparative analysis suggests that the brains of deaf children demonstrate heightened plasticity and compensatory mechanisms. These findings contribute to the comprehension of the neural underpinnings of tone processing, may enhance intervention strategies, and furnish a theoretical foundation for the language development and rehabilitation of deaf children.
This article uses the "academic journals" on CNKI as its data source and employs tools such as CiteSpace to conduct a bibliometric analysis of Chinese word meaning research. The goal is to identify the core authors, keyword hotspots, and keyword bursts in research on the Chinese word meaning system over the past thirty years. Combined with conventional literature surveys, the analysis organizes the main contents of Chinese word meaning system research, which helps map the field of lexical system research, promotes the study of Chinese language history, and contributes to the compilation and revision of Chinese dictionaries.
The construction of a standard knowledge graph reorganizes the knowledge inside traditional standards documents, innovating knowledge storage and the practice of knowledge supply. Technical requirement extraction is a typical relation extraction task in the construction of a standard knowledge graph. However, existing relation extraction models cannot achieve ideal performance due to the unique organizational forms and special writing characteristics of standards texts. This paper therefore proposes a Graph Convolutional Neural Network model based on Syntactic structure and Similarity features (SSGCN), which integrates expert knowledge into syntactic pruning, dynamically adjusts the weight matrix through an attention-based pruning strategy, and makes full use of the semantic similarity features of supervised relation labels. Our experiments compare the model against large language models (LLMs) such as GPT-3; it achieves better performance than other relation extraction models on specialized in-domain datasets, providing a practical solution for standard technical requirement extraction tasks.
In adaptive filtering, the maximum correntropy criterion (MCC) is a strategy that can effectively handle impulse noise. The least mean square (LMS) algorithm for graph signal processing (GSP) based on the MCC (GSP LMS-MCC) performs well against both impulse and Gaussian noise. Nevertheless, the GSP LMS-MCC algorithm has two parameters that must be tuned, the step size and the kernel parameter, and a fixed step size cannot properly balance steady-state error against convergence rate. Exploiting the ability of deep learning to learn parameters from data, we therefore unfold the iteration of the GSP LMS-MCC algorithm into a multilayer network with a trainable step size and kernel parameter at each layer: the input of each layer is the noisy graph signal together with the current estimate, and the output is the next estimate after one iteration. We train the network on a signal dataset using backpropagation. Simulation results show that the proposed method not only avoids manual tuning of the two parameters but also achieves a better compromise between steady-state error and convergence speed.
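A deep-unfolding sketch of this idea, with a strong simplifying assumption: a per-node MCC-weighted LMS update without the graph sampling and bandlimited-projection details of the actual GSP algorithm. Each "layer" is one iteration with its own trainable step size and kernel width.

```python
import torch
import torch.nn as nn

class UnfoldedLMSMCC(nn.Module):
    def __init__(self, n_layers=10):
        super().__init__()
        self.mu = nn.Parameter(torch.full((n_layers,), 0.5))     # per-layer step sizes
        self.sigma = nn.Parameter(torch.full((n_layers,), 1.0))  # per-layer kernel widths

    def forward(self, y, x0):
        x = x0                                   # initial graph-signal estimate
        for k in range(len(self.mu)):
            e = y - x                            # per-node error
            w = torch.exp(-e ** 2 / (2 * self.sigma[k] ** 2))    # correntropy weight
            x = x + self.mu[k] * w * e           # impulse-robust LMS update
        return x

net = UnfoldedLMSMCC()
x_true = torch.randn(64)
y = x_true + 0.1 * torch.randn(64) + 5.0 * (torch.rand(64) < 0.05).float()
x_hat = net(y, torch.zeros(64))
loss = ((x_hat - x_true) ** 2).mean()            # supervised training target
loss.backward()                                  # gradients flow to mu and sigma
```

The correntropy weight w shrinks toward zero for large errors, which is why single impulsive samples barely move the estimate; unfolding simply makes the per-iteration damping learnable.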
To address the low recognition accuracy of traditional classifiers caused by multiple penetration countermeasures and the difficulty of identifying cone-shaped targets, an improved instance-weighted naive Bayes (IWNB) classifier based on classification fuzzy regions is proposed. First, a naive Bayes classifier is constructed from the training data, and lazy instance weighting is applied to make full use of the data and improve classification accuracy. Second, the training data are used to test the classifier's performance, and a classification fuzzy region is established from the samples that are misclassified. Features are then screened by ranking their importance according to the dispersion over the universe of discourse, achieving dimensionality reduction. Test samples are first recognized by the classifier; for samples falling in the fuzzy region, a K-nearest neighbor classifier performs a secondary classification to obtain the final result. Experimental results show that the algorithm effectively improves the classification of different targets in the ballistic midcourse phase, exhibits good robustness, maintains high accuracy with small sample sizes, and performs well across different datasets.
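A sketch of the two-stage decision scheme with stand-in components: a standard GaussianNB replaces the paper's instance-weighted variant, synthetic data replaces radar features, and the fuzzy-region membership rule (distance to the nearest misclassified training sample) is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
nb = GaussianNB().fit(X, y)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

fuzzy = X[nb.predict(X) != y]                   # misclassified -> fuzzy region
probe = NearestNeighbors(n_neighbors=1).fit(fuzzy)

def classify(x):
    x = x.reshape(1, -1)
    dist, _ = probe.kneighbors(x)
    if dist[0, 0] < 1.0:                        # hypothetical radius threshold
        return knn.predict(x)[0]                # secondary KNN decision
    return nb.predict(x)[0]                     # ordinary NB decision

print(classify(X[0]))
```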
Classification of reflected signals from surface sediments can improve our understanding of the properties of these sediments. In this paper, we propose a method for classifying reflection signals using deep learning techniques. The method uses a pulse compression algorithm to convert reflection signals into reflection-compressed data, and then uses a One-Dimensional Convolutional Neural Network - Double Long Short-Term Memory (1DCNN-DLSTM) network to classify these data. The advantage of this method is that pulse compression improves the resolution of the stratigraphic reflection signal, better capturing the details of the signal. Meanwhile, the 1DCNN effectively extracts the spatial features of the reflection-compressed signals and captures the differences between sediment types, while the DLSTM captures the temporal dynamics of the signals, which is very advantageous for modeling temporal information. By fusing these two network structures, deep-sea surface sediments can be categorized in a more comprehensive way. To verify the feasibility of the method, we conducted experiments using reflection data from surface sediments on the South China Sea continental slope. The experimental results show that the method is feasible for classifying the reflection signals of deep-sea surface sediments: we obtain high classification accuracy when training and testing on different types of reflection-compressed data, indicating that the method can effectively distinguish different types of deep-sea surface sediments and help us better understand the deep-sea environment and related geological processes.
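The pulse-compression step can be illustrated with a matched filter, correlating the received echo against the transmitted chirp; the chirp parameters and sampling rate below are assumptions, not the survey's actual settings.

```python
import numpy as np
from scipy.signal import chirp, correlate

fs = 100_000                                    # assumed sampling rate (Hz)
t = np.arange(0, 0.01, 1 / fs)
tx = chirp(t, f0=5_000, f1=15_000, t1=t[-1])    # transmitted LFM pulse

echo = np.zeros(4 * len(t))
echo[1500:1500 + len(t)] += 0.6 * tx            # simulated bottom reflection
echo += 0.05 * np.random.randn(len(echo))       # measurement noise

compressed = correlate(echo, tx, mode="same")   # matched-filter (pulse-compressed) output
print("peak sample:", np.argmax(np.abs(compressed)))
```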
AI speech technology can generate language with diverse accents; however, notable disparities remain between synthesized and human speech. This study compares the perception of six English onset consonants (/n/, /l/, /v/, /r/, /f/, /m/) by Cantonese speakers of English, Mandarin speakers of English, and native English speakers. The results indicate that Cantonese speakers exhibit lower accuracy than Mandarin speakers in perceiving the target sounds. Additionally, listeners perceive /l/ better than /v/ when the vowel is /u/, and /r/ better than /f/ when the vowel is /o/. Similarly, for the vowel /e/, listeners show a higher correctness rate for /m/ than for /v/. Comparable patterns are observed among Mandarin and native English speakers. These findings offer valuable data for future work on speech identification and error correction models, particularly for Cantonese-speaking English learners. As the strategic significance of the Guangdong-Hong Kong-Macao Greater Bay Area (GBA) grows, research involving native Cantonese speakers becomes increasingly important, and these results can inform English language teaching tailored to them. Moreover, the experimental data can serve as a resource for discerning and distinguishing between human and AI speech, contributing to the future advancement of AI.
With the progress of electric trucks, their application at mining sites has attracted increasing attention. Transporting goods with autonomous electric trucks can significantly improve energy savings, in which speed planning plays a key role. This work proposes a method to plan economical speed trajectories. The method first develops a novel state-space model that captures vehicle station (position), speed, acceleration, and jerk; the speed planning problem is then formulated as a quartic problem. By comprehensively exploiting information on road topography and the regenerative braking system, an economical speed profile can be obtained. The proposed method has been tested and validated on an autonomous electric truck in a real mining site environment, and its performance has been evaluated both in terms of the quality of the computed result and the required computing time. The experimental results show that energy consumption is reduced with the proposed method and the regenerative braking system is fully leveraged.
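A much-simplified sketch of the state-space structure with jerk as the control input, using cvxpy; the quadratic cost below only illustrates the formulation pattern and is not the paper's quartic objective, and all bounds are invented.

```python
import cvxpy as cp

N, dt = 50, 0.5
s = cp.Variable(N + 1)          # station (position)
v = cp.Variable(N + 1)          # speed
a = cp.Variable(N + 1)          # acceleration
j = cp.Variable(N)              # jerk (control input)

cons = [s[0] == 0, v[0] == 5, a[0] == 0, s[N] == 200]
for k in range(N):
    cons += [s[k + 1] == s[k] + v[k] * dt,      # kinematic state update
             v[k + 1] == v[k] + a[k] * dt,
             a[k + 1] == a[k] + j[k] * dt]
cons += [v >= 0, v <= 16, cp.abs(a) <= 2]       # hypothetical operating limits

energy = cp.sum_squares(a)                       # crude traction-energy proxy
comfort = cp.sum_squares(j)                      # smoothness (jerk) penalty
cp.Problem(cp.Minimize(energy + 0.1 * comfort), cons).solve()
print("final speed:", v.value[-1])
```

Road grade and regenerative-braking terms would enter the cost and constraints in the same pattern, as additional affine or quadratic expressions in v and a.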
According to the national military standard, fuzes in long-term storage must be inspected and sampled regularly. However, the circuit signals of a fuze are difficult to extract for detection, because the electronic safety and arming device (ESAD) in stored weapons is fully packaged. Monitoring equipment health by detecting vibration signals and analyzing their characteristics is a common and effective approach, but the vibration signal of an electronic circuit system is extremely weak, and electromagnetic interference caused by large transient currents in the high-voltage circuit poses a further problem. Optical fiber sensors offer immunity to electromagnetic interference, long transmission distance, and high sensitivity, while deep learning can automatically extract and classify data features. This paper combines the advantages of optical fiber sensing and deep learning to diagnose the fuze ESAD. A sensing probe formed by a pair of weak-reflectivity FBGs was used to detect the weak vibration signal during ESAD operation, and a deep learning model was constructed to recognize the stop state, the start state, and five typical failure modes of the ESAD, with a recognition accuracy of 99.3%. This provides an effective solution for diagnosing and evaluating the state of the ESAD high-voltage circuit.
During mineral grinding, mill load parameters (MLPs) determine the information carried by the mill's mechanical signals, so online MLP detection is one of the key factors for improving the production efficiency of mineral processing plants. In this paper, the correlation between multichannel mechanical signals and different MLPs is explored through power spectral density analysis. The contribution of the multisource, multicomponent mechanical signals to the MLPs and the mill load is then measured using correlation coefficients. Finally, a prediction model for MLPs can be constructed from an adaptive decomposition strategy and the appropriate sub-signals.
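A minimal sketch of this correlation analysis on simulated data: Welch power spectral densities are computed per experiment, and the power in a frequency band is correlated against the load level. The signal model, band, and sampling rate are assumptions.

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(1)
loads = np.linspace(0.2, 0.8, 10)               # hypothetical mill-load levels
band_power = []
for load in loads:
    x = rng.normal(scale=1 + 2 * load, size=8192)   # toy vibration channel
    f, pxx = welch(x, fs=1000, nperseg=1024)        # power spectral density
    band = (f > 100) & (f < 200)                    # band of interest
    band_power.append(pxx[band].sum())

r = np.corrcoef(loads, band_power)[0, 1]        # this band's contribution rate
print(f"band-power vs. load correlation: r = {r:.2f}")
```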
During mineral grinding, the ball mill generates mechanical signals at different positions that carry rich information about mill loads and mill load parameters (MLPs). Online MLP detection is a key factor in realizing intelligent operation and improving the production efficiency of mineral processing plants. In this paper, using the ensemble empirical mode decomposition (EEMD) technique, the multisource, multicomponent mechanical signals of an experimental ball mill under changing ball, material, and water load conditions are analyzed to obtain physically meaningful sub-signals as intrinsic mode functions.
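A sketch of the decomposition step using the PyEMD package (assumed available as `pip install EMD-signal`): EEMD splits a multi-component signal into intrinsic mode functions (IMFs) that can then be analyzed separately. The two-tone test signal is invented for illustration.

```python
import numpy as np
from PyEMD import EEMD

t = np.linspace(0, 1, 2000)
signal = (np.sin(2 * np.pi * 12 * t)            # low-frequency shell rotation
          + 0.5 * np.sin(2 * np.pi * 150 * t)   # higher-frequency impact component
          + 0.1 * np.random.randn(len(t)))      # broadband noise

eemd = EEMD(trials=50)                          # ensemble of noise-assisted EMDs
imfs = eemd.eemd(signal, t)                     # rows are IMFs, coarse to fine
print("number of IMFs:", imfs.shape[0])
```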
Partial discharge is not only a manifestation of insulation defects in electrical equipment but also aggravates insulation deterioration. Effective partial discharge localization is therefore an important means of monitoring the condition of power equipment. Existing localization methods are affected by the environment, and their accuracy is unstable. This paper proposes a UHF partial discharge localization method based on time fingerprints. The method first uses UHF sensors to record the arrival times of partial discharge signals and forms a time fingerprint for each monitoring point from the arrival time differences between sensors, building a time fingerprint library for the measured area. When a partial discharge occurs, the collected time fingerprint is matched against the library using a neural network to obtain the discharge source location. Experimental results show that the proposed method achieves an average positioning error of 1.78 m over a measured area of 900 square meters, with about 60% of positioning errors within 2 m.
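A sketch of the fingerprint idea: each monitored point is represented by the vector of arrival-time differences (TDOAs) between sensor pairs, and a new discharge is located by matching its measured vector against the library. KNN regression stands in for the paper's neural-network matcher, and the geometry, grid spacing, and noise level are assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

c = 0.3                                          # propagation speed, m/ns
sensors = np.array([[0, 0], [30, 0], [0, 30], [30, 30]], float)

def tdoa_fingerprint(p):
    t = np.linalg.norm(sensors - p, axis=1) / c  # arrival times (ns)
    return t[1:] - t[0]                          # differences vs. sensor 0

grid = np.array([[x, y] for x in range(0, 31, 3) for y in range(0, 31, 3)], float)
library = np.array([tdoa_fingerprint(p) for p in grid])  # fingerprint library

matcher = KNeighborsRegressor(n_neighbors=3).fit(library, grid)
measured = tdoa_fingerprint(np.array([12.0, 7.0])) + np.random.normal(0, 0.5, 3)
print("estimated source:", matcher.predict([measured])[0])
```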
Current state-of-the-art unsupervised person re-identification (Re-ID) algorithms employ clustering to create pseudo labels by grouping unlabeled data for model training, but this inevitably generates noisy pseudo labels, which limits performance on the unsupervised person Re-ID task. To tackle this issue, we propose a contrastive regularization loss that encourages the model to concentrate on learning the representations of correct pseudo labels while ignoring noisy pseudo labels to the greatest extent possible. Under this guidance, training focuses on high-quality, correct pseudo labels and eliminates the negative effects of noisy ones, thus boosting the effectiveness of unsupervised person Re-ID. The proposed method is evaluated on three person Re-ID benchmark datasets; the results confirm its usefulness, and it outperforms other state-of-the-art approaches.
The aim of fusing infrared and visible images is to obtain high-quality images by enhancing textural details and exploiting the complementary strengths of the two modalities. Since the details of visible images are not distinct in low light, current fusion methods struggle to recover complementary contours and texture details. To address the poor quality of infrared-visible fused images under low-light conditions, this study presents a novel fusion method based on generative adversarial networks (referred to as UFIVL). Specifically, pruning is introduced into the existing densely connected decoder to reduce network complexity without quality loss. A new overall optimization objective is designed, combining an adaptive contrast-limited histogram equalization loss, which counters the contrast and brightness degradation of the fused image, with a joint gradient loss, which addresses the difficulty of capturing detailed features in low-light scenes. Experimental results on the LLVIP dataset show that, compared with other state-of-the-art methods, the images fused by the proposed method perform better both subjectively and objectively.
An improved YOLOv8n network model is proposed to address key challenges in road damage detection, including feature extraction, multi-scale feature processing, fusion, and efficiency. By integrating the RepVGG-SSE feature extraction structure and multi-branch downsampling into the backbone, the receptive field of the model is broadened so that it can handle diverse road damage scales. The Efficient-GFPN feature pyramid structure enables effective fusion of multi-scale features, greatly enhancing detection of objects of different sizes and complexities. Additionally, a lightweight convolution module, GPConv, is proposed to replace the 3x3 convolution in the C2f structure of the neck layer, reducing both the parameters and the computational complexity of the network without compromising accuracy and thus balancing efficiency and performance. The improved YOLOv8n network was trained and validated on the RDD-2020 and UAPD datasets; both ablation and comparison experiments demonstrate that it is effective and efficient and outperforms state-of-the-art methods, suggesting it is a promising solution for real-world road damage detection.
In traditional software testing, test cases are usually written from requirements or design documents, but such test cases often cannot cover all possible situations and paths. This article therefore studies an embedded software test case design method based on black-box techniques. It adopts a new design method that combines multiple test case design techniques, including equivalence class partitioning, boundary value analysis, decision table testing, and state transition testing. A comprehensive and effective set of test cases is designed around the actual requirements and scenarios of embedded software testing, taking into account the repeatability, verifiability, and scalability of the test cases. In actual testing of the designed cases, the method achieved correctness rates of 89%, 93%, 91%, and 90% in four experiments. The designed test cases effectively cover the functional and performance requirements of the embedded software and uncover multiple defects and errors. Compared with traditional test case design methods, the designed cases are more comprehensive and accurate, and the testing cost is effectively controlled.
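As a tiny illustration of one of the techniques named above, boundary value analysis for a numeric input range takes test points at and just beyond each boundary plus a nominal value (the 1..100 range is an invented example):

```python
def boundary_values(lo, hi):
    """Classic BVA test points for a valid integer range [lo, hi]."""
    return [lo - 1, lo, lo + 1, (lo + hi) // 2, hi - 1, hi, hi + 1]

# A field accepting 1..100 yields five valid and two invalid test inputs.
print(boundary_values(1, 100))   # [0, 1, 2, 50, 99, 100, 101]
```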
Glioblastoma is a highly malignant tumor. In recent years, many researchers have studied the automatic segmentation of preoperative primary glioblastoma magnetic resonance imaging (MRI) and achieved good results. Automatic segmentation of postoperative residual glioblastoma on MRI plays a crucial role in treatment planning; however, no research has specifically focused on it, owing to limited data and the difficulty of establishing annotation standards. In this study, a large amount of preoperative tumor data was used to pretrain the segmentation model. Postoperative residual tumor MRI data from 53 patients were collected and annotated by medical students specializing in radiology. The pretrained model was then applied to the postoperative data, yielding preliminary segmentation results that roughly indicate the location of the residual tumor. Based on the similarity between these preliminary results and the residual tumor annotations, a simple and effective active learning strategy was designed to select the cases that need to be reannotated. The preliminary segmentation results, together with the postoperative residual tumor data, were fed into a new segmentation network to achieve precise segmentation of the postoperative residual tumor. Ultimately, the proposed network achieved a Dice coefficient of 0.871 on the residual tumor data.
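The reported Dice coefficient measures the overlap between predicted and annotated masks; a minimal reference implementation on toy masks:

```python
import numpy as np

def dice(pred, gt, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return (2.0 * np.logical_and(pred, gt).sum() + eps) / (pred.sum() + gt.sum() + eps)

a = np.zeros((64, 64)); a[10:30, 10:30] = 1    # toy prediction
b = np.zeros((64, 64)); b[15:35, 12:32] = 1    # toy annotation
print(f"Dice = {dice(a, b):.3f}")
```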
As the Internet grows rapidly and smart devices become more popular, huge amounts of data are being generated and stored. These data are diverse, real-time, and massive, providing rich resources and creative inspiration for art and design. Artificial intelligence (AI) technology allows computers to process and analyze large-scale data and to extract useful information and patterns. The integration of big data (BD) and AI with art can bring new ideas and forms of expression to artistic creation, and intelligent environmental art design has emerged in this context. Through an analysis of the impact of BD and AI on intelligent environmental art design, together with an examination of the key technologies and application processes of each, this paper establishes an intelligent environmental art model with user satisfaction prediction and a design optimization index formula. In simulation experiments on six design project samples from different environmental arts, the intelligent optimization model improved quality and efficiency over traditional methods by comprehensive averages of approximately 9.5% and 14.4%, respectively. This indicates that an intelligent environmental art model combining BD and AI performs well in practical applications.
Traditional art design is often plagued by shortcomings such as monotony, high technical barriers, and difficulty for audiences to fully understand. Establishing an interactive art design platform can therefore greatly expand audiences' interest in art. To popularize artworks among audiences who lack professional knowledge, multimedia technology is the main tool, as it can fully stimulate interest through multidimensional displays of sound, figures, and images. To build an interactive art design platform from a multimedia perspective, this article draws on deep learning and information fusion technology. Taking a memorial hall as an example, it observes the attractiveness of the new platform to visitors by installing a new social platform and counting visitor numbers. The results indicate that the number of visitors after installation increased from approximately 800-1000 to 950-1150, a significant increase. The study thus finds that deep learning and information fusion technology can assist interactive art design and enable multidimensional art display.
Because of the wide distribution and complex channel conditions of transmission lines, faults caused by human impact occur frequently. Traditional anti-collision monitoring methods based on manual inspection and laser sensors cannot meet actual business needs in accuracy and efficiency. This paper proposes an online anti-collision monitoring method for transmission lines based on cloud-fog cooperative computing, which mainly identifies man-made impact threats as follows. First, moving targets are automatically detected through a camera module and a microwave ranging module, and three feature values of each identified target are collected by fog computing: the H histogram, the minimum bounding rectangle of the sample, and the microwave ranging result, reflecting the target's color, shape, and distance, respectively. Second, the collected feature values are uploaded to the cloud for database construction and neural network training, with the initial values of the neural network optimized by particle swarm optimization. Third, when a foreign object approaches the line, the cloud-fog cooperative computation determines whether it is a threat and issues a timely warning. Experimental results show that the recognition accuracy of the proposed method reaches 95.97%, providing reliable monitoring results for the operating company.
Existing methods for monitoring foreign objects on transmission lines rely mainly on manual on-site inspection and background video monitoring, which consume substantial manpower and make real-time monitoring difficult. To effectively prevent external damage to transmission lines, an online monitoring system for foreign objects has been developed: by extracting targets around the line and recognizing their features, threatening foreign objects can be warned of in time. However, in the system's foreign object extraction stage, it is difficult to distinguish the actual target when multiple targets are present, resulting in large extraction errors. To solve this problem, this paper proposes a foreign object target extraction method based on affinity propagation clustering: in the extraction stage, pixels are clustered by affinity propagation on their coordinates, so that the pixels of the actual target are accurately extracted, laying the foundation for subsequent pattern recognition. Experimental results show that the proposed target extraction algorithm improves pixel extraction accuracy by about 30% and, in turn, foreign object recognition accuracy by 23.48%. The method works well in practice and effectively improves the online monitoring system for transmission lines.
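A sketch of the extraction step with scikit-learn's AffinityPropagation: foreground pixel coordinates are clustered so that each physical target becomes one cluster. The coordinates are synthetic; real input would come from frame differencing or a similar foreground detector.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
target1 = rng.normal([50, 80], 3, size=(40, 2))    # pixels of one object
target2 = rng.normal([200, 60], 3, size=(40, 2))   # pixels of another object
pixels = np.vstack([target1, target2])

ap = AffinityPropagation(random_state=0).fit(pixels)
print("targets found:", len(ap.cluster_centers_))
print("centers:", ap.cluster_centers_.round(1))     # one center per target
```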
5G is an essential platform for industrial upgrading, and various industries, including manufacturing, have established a wide range of 5G-enabled use cases. Many of the networks used in these use cases are private networks or networks created specifically for business customers, so private 5G networks are expected to carry significant future business traffic. This paper introduces the networking modes of China Telecom's private 5G network technology and provides architecture proposals for an end-to-end customized solution. It then considers critical aspects, particularly the key performance indicators of URLLC capabilities, and experimental performance results show that the requirements can be met.
Multi-document reading comprehension is an important and difficult task in natural language processing. To address the fact that the ELECTRA pre-trained model has an input length limitation and cannot be directly adapted to multi-document reading comprehension, this paper proposes a novel model based on ELECTRA and document sliding windows. In the model, multiple documents are split and merged through sliding windows, a new segmentation embedding is introduced, the answer position within the documents is modeled as a learning target, and ELECTRA is jointly trained on each window. After predictions are obtained for every window, the results are comprehensively ranked to select the optimal answer. Experiments show that the model reaches a Rouge-L of 51.28% on the multi-document reading comprehension dataset MS-MARCO, achieving the current best result.
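A minimal sketch of the sliding-window split (the window and stride sizes are assumptions, not the paper's settings): each long document becomes overlapping windows that fit the encoder's length limit, and per-window predictions are later re-ranked.

```python
def sliding_windows(tokens, window=384, stride=128):
    """Split a token list into overlapping windows covering the whole text."""
    out, start = [], 0
    while True:
        out.append((start, tokens[start:start + window]))
        if start + window >= len(tokens):       # last window reaches the end
            break
        start += stride
    return out

doc = [f"tok{i}" for i in range(1000)]
for start, win in sliding_windows(doc):
    print(start, len(win))                      # offsets and window sizes
```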
Regular exercise refers to the physical activities that people perform in daily life, such as walking, running, swimming, and cycling, usually to stay healthy, build strength, and relax; its purpose is to improve individual physical fitness and health. The red movement refers to political movements conducted in the Chinese mainland and other socialist countries, usually organized by the government or political parties to promote socialist revolution and construction; its purpose is to realize the ideals and goals of socialism through collective action. To better inherit and carry forward this spirit, digital construction must be strengthened. From the perspective of virtual cultural reality, the operational mechanism of the red sports culture system refers to establishing an advanced education model that suits the characteristics of the times and the needs of modern society through scientific analysis of the developments and changes of the new era. This article takes as its theme the study of the red sports culture system from the perspective of VR (Virtual Reality). To deeply understand and explore the value of red culture in sports, it applies the theory of the red sports culture system and, with the help of VR technology, delves into the deep connotations of red sports culture. The significance and prospects of red sports culture are also elaborated, which plays a positive role in promoting research on red sports culture.
The multibeam echo sounder (MBES) is widely used in oceanic exploration for purposes such as submarine topographic surveying, geological investigation, and salvage of submarine wrecks. Multibeam data are prone to measurement errors from factors such as the intricate acoustic environment of the ocean, motion sensor errors, and sonar echo multipath interference, and automatically cleaning MBES datasets remains challenging. This study proposes an automatic cleaning strategy for MBES datasets based on density clustering, identifying and rejecting anomalous data through continuous clustering of actual topographic data. The analysis first characterizes the various types of multibeam data outliers. The study then proposes an improved OPTICS method for detecting them: the reachability distance (RD) computed by OPTICS serves to detect isolated outliers, while topological analysis identifies structural outliers. Finally, the proposed algorithm is tested on MBES data collected by an iBeam 8140 sonar sensor.
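A sketch of reachability-based outlier flagging with scikit-learn's stock OPTICS (not the paper's improved variant): soundings whose reachability distance greatly exceeds the local trend are flagged as isolated outliers. The seabed model, injected spikes, and cutoff rule are assumptions.

```python
import numpy as np
from sklearn.cluster import OPTICS

rng = np.random.default_rng(2)
seabed = np.column_stack([rng.uniform(0, 100, 500),
                          rng.uniform(0, 100, 500),
                          -30 + rng.normal(0, 0.3, 500)])   # x, y, depth soundings
seabed[::50, 2] += rng.choice([-8.0, 8.0], size=10)         # injected depth spikes

opt = OPTICS(min_samples=10).fit(seabed)
rd = opt.reachability_[opt.ordering_]                       # reachability profile
finite = np.isfinite(rd)
threshold = np.median(rd[finite]) + 3 * rd[finite].std()    # assumed cutoff rule
flagged = opt.ordering_[finite & (rd > threshold)]
print("flagged soundings:", len(flagged))
```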
Hashing enables effective storage and retrieval of large-scale images through binary representations. Variable hash code (HC) lengths reflect the trade-off between retrieval speed and accuracy needed when building a hashing framework for practical applications. Consequently, current hashing algorithms must train several frameworks for different HC lengths, decreasing flexibility and increasing training time. Since a sample can be described by several HCs of varying lengths, there are useful correlations among them that can enhance the efficiency of hashing techniques; nevertheless, existing methods do not fully exploit these correlations. We propose a novel method, Asymmetric Supervised Deep Pairwise Hashing (ASDPH), for discriminative learning that trains HCs of various lengths concurrently. The proposed ASDPH approach draws three kinds of information from HCs of multiple lengths, and the samples' original features and labels are used for hash learning. To validate the proposed module, we evaluated its performance at 16, 32, 64, and 128 bits on the NUS-WIDE, CIFAR-10, and MS-COCO datasets, achieving 2%, 7%, and 12% higher mean average precision than other state-of-the-art methods.
As a new learning paradigm that preserves data privacy in distributed machine learning, federated learning is becoming increasingly attractive; Homomorphic Encryption (HE) is used to encrypt the private intermediate data during computation. However, the homomorphic operations in HE impose significant computing overhead on federated learning, so a hardware solution is important for speeding up training. GPUs and FPGAs are the two popular hardware acceleration platforms introduced in this area, but which is the better choice? We design and customize an FPGA implementation of the homomorphic encryption, as well as a GPU version. The experimental results demonstrate that the GPU is more efficient for PHE computations in most cases, outperforming its counterpart in throughput; in some cases, however, the FPGA version achieves better performance than the GPU despite a far lower clock frequency.
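For a feel of the PHE workload being accelerated, a sketch using the pure-Python `phe` (python-paillier) library as a CPU baseline: Paillier encryption plus the homomorphic additions used to aggregate model updates in federated learning. The key size and gradient values are demo assumptions.

```python
import time
from phe import paillier

# Small key for a quick demo only; real deployments use much larger keys.
public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)
gradients = [0.01 * i for i in range(100)]       # toy model-update values

t0 = time.perf_counter()
ciphertexts = [public_key.encrypt(g) for g in gradients]
t1 = time.perf_counter()
total = ciphertexts[0]
for c in ciphertexts[1:]:
    total = total + c                            # homomorphic addition
t2 = time.perf_counter()

print(f"encrypt: {t1 - t0:.2f}s, aggregate: {t2 - t1:.4f}s")
print("decrypted sum:", round(private_key.decrypt(total), 6))
```

The timing gap between the encryption loop and the aggregation loop illustrates why the modular exponentiations inside encryption dominate, and why they are the prime target for GPU or FPGA offload.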
To overcome the limitations of existing algorithms in both efficiency and accuracy, this paper presents an innovative approach to 3D object detection based on a Bird's Eye View (BEV) algorithm. First, we introduce a novel sorted matrix decomposition algorithm inspired by the fixed frustum projection and a height-compression-based prime extraction method to improve the efficiency of BEV pooling; this effectively mitigates redundant BEV pooling and speeds up processing. Second, we propose a channel and spatial adaptive fusion algorithm to improve localization accuracy for distant objects: by intelligently fusing BEV-level LiDAR features and camera features, our method achieves precise detection of objects at great distances. Finally, we validate the proposed algorithm on the nuScenes dataset, demonstrating the efficiency and accuracy improvements it attains. Our approach advances 3D object detection through BEV-based camera/LiDAR fusion, offering substantial gains in both efficiency and accuracy.
This study explores how voice quality is encoded in Zhangzhou Southern Min, a Sinitic dialect spoken in southern Fujian province of mainland China. Empirically, three phonation types (breathy, creaky, and modal) can be identified in its monosyllabic synchronic speech, and their distribution is found to be primarily conditioned by vowel quality and further constrained by pitch contour and syllable coda type. This dynamic encoding is well attested in the acoustic signals, with quantifiably varying waveforms and spectral tilt patterns of H1-H2, H2-H3, and H1-H3. The study contributes a new methodology for demonstrating phonation differences in natural speech. The realisation reflects the critical fact that phonation can be encoded in ways far more complex than is generally assumed, making it imperative to ground linguistic analysis in phonetic reality in order to uncover its nature and improve our knowledge of human languages.
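As an illustration of the spectral tilt measures mentioned above, the following Python sketch estimates H1-H2 for a single voiced frame by reading harmonic amplitudes off an FFT near integer multiples of a supplied f0. The windowing and peak search here are our simplifications; the paper's actual measurement protocol may differ.

```python
import numpy as np

def h1_h2_db(frame, fs, f0):
    """Estimate the H1-H2 spectral tilt (dB) of a voiced frame.

    frame : 1-D array of speech samples (one analysis window)
    fs    : sampling rate in Hz
    f0    : fundamental frequency estimate in Hz
    Harmonic amplitudes are read off as spectral peaks near k*f0.
    """
    n = len(frame)
    spec = np.abs(np.fft.rfft(frame * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, 1.0 / fs)

    def harmonic_amp(k):
        band = (freqs > k * f0 - f0 / 4) & (freqs < k * f0 + f0 / 4)
        return 20 * np.log10(spec[band].max() + 1e-12)

    return harmonic_amp(1) - harmonic_amp(2)

# Synthetic test: a source with a dominant first harmonic, as in breathy voice.
fs, f0, t = 16000, 120.0, np.arange(16000) / 16000
x = np.sin(2 * np.pi * f0 * t) + 0.3 * np.sin(2 * np.pi * 2 * f0 * t)
print(round(h1_h2_db(x[:2048], fs, f0), 1))  # close to 20*log10(1/0.3) ≈ 10.5 dB
```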
In practical applications, factors such as impulsive interference and sensor malfunctions can cause the observation noise to follow a non-Gaussian distribution, which impairs the performance of the classical cubature Kalman filter (CKF). The existing CKF algorithm has limitations in handling complex non-Gaussian noise, and its performance can be inadequate in such scenarios. In this letter, a modified generalized minimum error entropy criterion with fiducial point (GMEEFP) is studied to ensure that the error converges to around zero, and a new CKF algorithm based on this criterion, the GMEEFP-CKF algorithm, is developed. To demonstrate its practicality, several simulations are performed, showing that the proposed GMEEFP-CKF algorithm outperforms existing CKF algorithms under impulsive noise.
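The GMEEFP criterion is the paper's contribution and is not reproduced here, but the cubature machinery it plugs into is standard: a CKF propagates 2n equally weighted spherical-radial cubature points. Below is a minimal numpy sketch of the point generation, with an invented two-dimensional toy state.

```python
import numpy as np

def cubature_points(x, P):
    """Generate the 2n spherical-radial cubature points of a standard CKF.

    x : (n,) state mean,  P : (n, n) state covariance
    Points are x ± sqrt(n) * columns of the Cholesky factor of P,
    each carrying equal weight 1/(2n).
    """
    n = len(x)
    L = np.linalg.cholesky(P)
    offsets = np.sqrt(n) * np.hstack([L, -L])   # shape (n, 2n)
    return x[:, None] + offsets, np.full(2 * n, 1.0 / (2 * n))

x = np.array([0.0, 1.0])
P = np.array([[1.0, 0.2], [0.2, 0.5]])
pts, w = cubature_points(x, P)
# The weighted points reproduce the mean exactly:
print(pts @ w)   # -> [0. 1.]
```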
A biochip is an array of biosensor spots arranged on a durable substrate that can be used to detect and differentiate between biochemical analytes. This paper examines the effectiveness of different supervised learning models at detecting analytes from biochip spot patterns, using the case study of antibiotic pollution detection with models trained on RGB values extracted from a chip with sixteen spots. We evaluate the performance and accuracy of four types of model (Decision Trees, Random Forests, Naïve Bayes, and Neural Networks) by analysing metrics such as processing time, Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). Our analysis shows that each model has its own strengths and weaknesses for reading biochip data. Decision Trees and Naïve Bayes have the advantage of being explainable, so that biologists can understand which particular spot values lead to a given classification, although they are significantly less accurate than the other methods. Random Forests and Neural Networks offer high accuracy but act as black boxes, leaving biologists with little insight into which spot patterns lead to a particular classification, or how far a reading depends on a small change in value or on a small number of spots. Such insight matters for assessing the reliability of a chip reading, for deciding whether further tests are required or subsequent action can be taken, and for helping chip designers determine whether their chip designs need to be improved. We also found that Random Forest classifiers have significantly better computational performance than Neural Networks, which makes them suitable for interfaces that let users re-run classifications to see how changes in spot values alter the outcome. Ultimately, the accuracy and computational performance of Random Forest classifiers make them the preferred option (used with interfaces that can show and test different values) for biochips of the type described in this paper.
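As a sketch of the preferred workflow, the snippet below trains a scikit-learn Random Forest on 48 features (16 spots × 3 RGB channels). The data and labelling rule are synthetic stand-ins for real chip readings, not the paper's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical data: 16 spots x 3 RGB channels = 48 features per chip,
# with a binary label (antibiotic present / absent). Values are synthetic.
rng = np.random.default_rng(42)
X = rng.uniform(0, 255, size=(200, 48))
y = (X[:, :3].mean(axis=1) > 127).astype(int)   # toy labelling rule

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(accuracy_score(y_te, clf.predict(X_te)))
```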
This paper introduces an innovative method that combines computer vision and deep learning to extract headlines from historical newspapers. One of our goals is to use these extracted headlines, together with the newspapers' illustrations, to support digital humanities. The research goes beyond traditional image analysis by exploring how new digital technologies can aid the understanding of newspaper content through visualization across time and place. The experimental results reveal that our recommended approaches, which combine Optical Character Recognition (OCR) with scraping and deep-learning object detection models, can successfully obtain the information required for more advanced analytics. For its distinctive historical and humanities value, we chose "The Hongkong News" from the Hong Kong Early Tabloid Newspaper collection to illustrate the efficacy of our methodology. In addition, we constructed several visualization applications to demonstrate the viability of the suggested approaches.
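The classical half of such a pipeline can be sketched with OpenCV and pytesseract, as below: large, bold headline text is merged into wide blocks by dilation and then passed to OCR. The file name, kernel size, and headline-size thresholds are placeholders, not values from the paper, and the deep object-detection stage is omitted.

```python
import cv2
import pytesseract

# Hypothetical page scan; binarize and dilate so headline text (large,
# bold) merges into wide blocks that can be picked out by size.
img = cv2.imread("newspaper_page.png", cv2.IMREAD_GRAYSCALE)
_, bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
dilated = cv2.dilate(bw, cv2.getStructuringElement(cv2.MORPH_RECT, (25, 9)))

contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if h > 40 and w > 200:                 # crude headline-sized filter
        text = pytesseract.image_to_string(img[y:y + h, x:x + w])
        print(text.strip())
```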
This paper presents several advances in indoor navigation, multi-agent systems, 3D perception, and 360° imaging for multi-robot human collaboration. First, we propose a light-assisted dead reckoning (LiDR) system integrating visible light positioning (VLP) and pedestrian dead reckoning (PDR) for high-accuracy indoor localization, with an average error within 0.7 m. VLP corrects PDR drift using light-emitting diode (LED) lights enabled by visible light communication technology. Second, developments in multi-agent reinforcement learning for robotics are explored, emphasizing path planning and collaboration in partially observable environments, and the impact of field-of-view settings on communication-based coordination is investigated. For 3D perception, a cross-dimensional refinement methodology is introduced that leverages 2D image features to enhance geometric detail in real-time volumetric reconstruction; this joint 3D geometry and semantic prediction addresses limitations of current visual methods. Finally, calibration and pose-estimation solutions are proposed that enable 360° cameras for 3D reconstruction: equirectangular projections are converted to cube maps and then aligned via rigid-body transformations based on robot location. Together, these innovations in indoor navigation, multi-agent systems, 3D perception, and 360° imaging showcase critical technologies for emerging applications in localization, robotics, and immersive analytics, providing comprehensive solutions to key challenges in these domains.
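As a toy illustration of the VLP/PDR integration idea (not the paper's LiDR system), the sketch below advances a dead-reckoned position step by step and blends it toward an absolute VLP fix whenever one is available. The step length, blend weight, and fix schedule are invented values.

```python
import numpy as np

def pdr_step(pos, heading, step_len=0.7):
    """Dead-reckoning update: advance one detected step along the heading."""
    return pos + step_len * np.array([np.cos(heading), np.sin(heading)])

def vlp_correct(pos, vlp_fix, alpha=0.6):
    """Blend the drifting PDR estimate toward an absolute VLP fix."""
    return (1 - alpha) * pos + alpha * np.asarray(vlp_fix)

pos, heading = np.zeros(2), 0.0
for step in range(10):
    pos = pdr_step(pos, heading)
    if step % 5 == 4:                 # an LED fix arrives periodically
        pos = vlp_correct(pos, vlp_fix=(step * 0.7 + 0.1, 0.05))
print(pos)
```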
This work provides an open-source method for extracting relevant information from scanned documents such as bills, bank statements, and invoices. The solution supports documents in 10 languages and can extract data from them irrespective of their template or structure. Pre-existing solutions based on OpenCV and deep learning exist, but none provides a generic solution with high accuracy and support for multiple languages. The proposed method identifies the language of the input document using a pre-trained fastText model. The document is then segmented into text regions using the Run Length Smoothing Algorithm (RLSA), and the RLSA output is passed through a custom pattern recognition algorithm that filters for regions likely to contain relevant invoice or account-statement data. The filtered segments are passed through the Tesseract OCR module for raw text extraction. Based on the identified document language, the extracted raw text is mapped against language-specific entity libraries, and the final key-value pairs are stored in JSON or CSV files. Tested on more than 1000 documents, our proposed solution achieved an average accuracy of 90.27% across documents in all languages.
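RLSA is a classical, easily reproduced step of this pipeline: white runs between black pixels that are shorter than a threshold are filled, so characters and words merge into contiguous text blocks. Below is a minimal horizontal-pass sketch in Python; the threshold value is illustrative, and a full implementation would also run a vertical pass and combine the two.

```python
import numpy as np

def rlsa_horizontal(binary, threshold=20):
    """Horizontal Run Length Smoothing: fill white runs shorter than
    `threshold` pixels between black pixels so words merge into lines.

    binary : 2-D uint8 array, 1 = text (black), 0 = background (white)
    """
    out = binary.copy()
    for row in out:                      # rows are views, edited in place
        black = np.flatnonzero(row)
        for a, b in zip(black[:-1], black[1:]):
            if 1 < b - a <= threshold:   # short white gap between text pixels
                row[a:b] = 1
    return out

page = np.zeros((1, 30), dtype=np.uint8)
page[0, [2, 5, 9, 25]] = 1               # isolated "text" pixels
print(rlsa_horizontal(page, threshold=5)[0])
# Gaps of <=5 pixels are filled (2..9 becomes solid); the long gap to 25 is kept.
```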