This PDF file contains the front matter associated with SPIE Proceedings Volume 11574, including the Title Page, Copyright information, Table of Contents, and Author and Conference Committee lists.
This paper introduces a lightweight convolutional neural network, called ECDet, for real-time, accurate object detection. In contrast to recent lightweight networks, which rely on pointwise convolution to change the number of feature-map channels, ECDet constructs its entire backbone from equal-channel blocks. Meanwhile, we deploy depth-wise convolution to compress the feature pyramid network (FPN) detection head. Experiments show that ECDet has a model size of only 3.19 M and requires only 3.48 B FLOPs for a 416×416 input image. Our method improves accuracy by 5% over YOLO Nano while requiring less computation. Comprehensive experiments demonstrate that our model achieves a promising speed-accuracy trade-off on the PASCAL VOC 2007 dataset.
Convolutional neural networks are currently popular multi-layer neural networks. They differ from traditional neural networks mainly in the introduction of three concepts: weight sharing, receptive fields, and pooling. In this paper, a LeNet network is constructed and trained for recognition on a handwritten digit dataset, and different data augmentation schemes are applied to study and compare the recognition accuracy of the resulting network.
The consensus mechanism, which provides the basic rules of the blockchain, is regarded, together with cryptography and the data storage structure, as one of the three key technologies of blockchain. In recent years, the application areas of blockchain technology have expanded further with the development of smart contracts. Personalized design and improvement of consensus mechanisms, making them more applicable to real industries, has therefore become a research focus in blockchain applications. The background of this paper is a Smart Industry based on an Alliance Blockchain, in which smart contracts are signed among network nodes to automate transaction processes. We propose a new consensus mechanism named Delegate Proof of Job-Relevance (DPoJ). It emphasizes the activity and professionalism of nodes, classifies the application areas of smart contracts, and uses the Job-Relevance of nodes as a parameter in the equity calculation during consensus. We then analyze, from the perspective of game theory and mechanism design, how Job-Relevance influences the growth rate of nodes' equities under DPoJ. Finally, we prove that, in most cases, rational nodes are inclined to increase their equities during consensus by improving their Job-Relevance.
Image processing methods based on feature matching are generally used to detect and recognize pointer meters in substations. Under the influence of environmental factors, such methods suffer from low detection accuracy and low reading success rates when deployed on substation inspection robots. To improve this situation, this paper proposes a new CNN (Convolutional Neural Network) based method for detecting and reading meters, developed by analyzing the existing meter recognition process in the robot's vision subsystem. The new method detects and segments pointer meters with YOLOv3 (You Only Look Once) and U-Net respectively, classifies scale values with AlexNet, and finally estimates readings through post-processing of the CNN outputs. Field experiments show that the proposed method improves the reading success rate by 45% compared with conventional methods, while keeping the deviation within permissible limits.
Cross-domain text classification has broad application prospects in data mining. Because transfer learning allows target-domain data to share and transfer semantic information from existing knowledge domains, it is generally used for cross-domain text processing. On this basis, we propose a cross-domain text classification algorithm, MTrA. The algorithm is based on TrAdaBoost and takes the distribution difference between the source and target domains into account by using the Maximum Mean Discrepancy (MMD) to set the initial weights of the two domains. MTrA also adds a weight backfill factor that considers the source-domain classification accuracy and balances the weight update of the source-domain data. Verified on the 20 Newsgroups dataset, the algorithm improves classification accuracy by 9.4% on average compared with the traditional TrAdaBoost algorithm, demonstrating its effectiveness and advantages.
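As an illustration of how MMD-based initial weights might be computed, the following is a minimal sketch (not the authors' implementation); the RBF kernel, the bandwidth `sigma`, and the mapping from MMD to a source-domain weight are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    # Pairwise squared Euclidean distances between rows of X and Y
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * sigma**2))

def mmd_squared(Xs, Xt, sigma=1.0):
    """Empirical (biased) squared Maximum Mean Discrepancy between
    source samples Xs and target samples Xt (rows = samples)."""
    return (rbf_kernel(Xs, Xs, sigma).mean()
            + rbf_kernel(Xt, Xt, sigma).mean()
            - 2 * rbf_kernel(Xs, Xt, sigma).mean())

# Hypothetical use: a larger distribution gap could translate into a
# smaller initial weight for source-domain instances in the boosting loop.
# w_source_init = 1.0 / (1.0 + mmd_squared(Xs_features, Xt_features))
```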
In this paper, we propose a workflow and a deep learning algorithm for recognizing quadrature amplitude modulation (QAM) signals. The design adopts a convolutional neural network (CNN) and an Extreme Learning Machine (ELM) as its core, leveraging the powerful feature extraction of the CNN and the fast classification learning of the ELM. Spectrogram image features obtained from the signal by the short-time Fourier transform (STFT) are fed into the CNN-ELM hybrid model, and the modulation mode of the QAM signal is finally recognized by the ELM. The algorithm overcomes the shortcomings of traditional methods, and simulation results verify the superiority of the proposed system, whose classification accuracy exceeds 99.86%.
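A minimal sketch of the ELM classification stage, assuming the CNN has already produced a feature matrix `X` (one row per spectrogram); the hidden-layer size, activation, and ridge term are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

class ELMClassifier:
    """Basic Extreme Learning Machine: a random hidden layer followed by
    output weights solved in closed form (ridge-regularized least squares)."""
    def __init__(self, n_hidden=512, reg=1e-3, seed=0):
        self.n_hidden, self.reg = n_hidden, reg
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y_onehot):
        d = X.shape[1]
        self.W = self.rng.standard_normal((d, self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)          # random feature map
        # beta = (H^T H + reg*I)^-1 H^T Y
        self.beta = np.linalg.solve(H.T @ H + self.reg * np.eye(self.n_hidden),
                                    H.T @ y_onehot)
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return np.argmax(H @ self.beta, axis=1)
```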
In computed tomography (CT), segmentation of organs-at-risk (OARs) is a key task in formulating the radiation therapy (RT) plan. However, delineating OARs slice by slice in CT scans takes a lot of time. Deep convolutional neural networks make it possible to segment medical images automatically and effectively. In this work, we propose an improved 2D U-Net to segment multiple OARs, aiming to increase accuracy while reducing complexity. Our method replaces vanilla convolutions with Octave Convolution (OctConv) units to reduce memory use and computation cost without sacrificing accuracy. We further plug a 'Selective Kernel' (SK) block after the encoder to capture multi-scale information and adaptively recalibrate the learned feature maps with an attention mechanism. An in-house dataset covering four chest organs (left lung, right lung, heart, and spinal cord) is used for evaluation. Compared with the naive U-Net, the proposed method improves the Dice score by up to nearly 3% with fewer floating-point operations (FLOPs).
Micro-blog is a platform where users obtain information and convey their own ideas. In recent years, sentiment analysis of micro-blogs has gradually become a hot topic. Micro-blog posts contain not only text; emoticons are also a part that cannot be ignored. Traditional methods ignore the importance of emoticons for the emotional polarity of the text when preprocessing micro-blogs. This paper proposes a text sentiment analysis method that fuses emoticon information. The crawled micro-blog data are preprocessed, emoticons in the text are selected and assigned weights from an emotion dictionary to compute a score, the text is converted into the corresponding word-vector sequence, a Bidirectional Gated Recurrent Unit network models the contextual emotional tendency of the text, and finally a Conditional Random Field judges the polarity of the text. Experimental results show that the accuracy of the proposed method reaches 89%.
Surveillance occupies an important position in intelligent airport development, but storing surveillance video puts much pressure on the airport. Apron surveillance video is a common type of airport surveillance video. In this paper, we propose a video compression approach for apron surveillance in bad weather. In this approach, the video storage structure is reorganized into three layers: a static layer, an object layer and an environment layer. Such a storage structure reduces the video storage space. The approach also stores semantic information that can be used in subsequent research. To validate our approach, we conduct experiments on different apron surveillance videos. Experimental results show that the proposed approach outperforms the widely used video coding standards H.264 and MPEG-4 in terms of the SSIM index and PSNR.
Single-image haze removal has been a challenging problem due to its ill-posed nature. In this paper, we propose a simple but effective image prior: a transmittance fusion of the dark channel prior and the color attenuation prior. By constructing a weight factor for the transmittance, the respective advantages of both priors are preserved, combining the defogging effectiveness of the dark channel algorithm with the broad applicability of the color attenuation prior. Brightness information makes the algorithm more adaptive during dehazing. Extensive experiments show that the proposed method enhances dehazed images effectively.
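A minimal sketch of the two transmittance estimates and a pixel-wise weighted fusion of the kind described, assuming an RGB image `I` normalized to [0, 1], a known atmospheric light `A`, and illustrative parameter values; the authors' exact weight-factor construction is not reproduced here.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def transmission_dark_channel(I, A, omega=0.95, patch=15):
    """Dark channel prior: t = 1 - omega * local min over channels/patch of I/A."""
    dark = minimum_filter(np.min(I / A, axis=2), size=patch)
    return 1.0 - omega * dark

def transmission_color_attenuation(depth, beta=1.0):
    """Color attenuation prior: t = exp(-beta * d), with d a scene-depth estimate."""
    return np.exp(-beta * depth)

def fused_transmission(t_dcp, t_cap, w):
    """Pixel-wise weighted fusion of the two priors (w in [0, 1])."""
    return w * t_dcp + (1.0 - w) * t_cap
```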
A key to dictionary learning is to attain a robust dictionary that alleviates the difference between test samples and training samples of the same class. Such a dictionary yields proper representations of test samples and produces better classification results for them. For face recognition, where facial appearance varies with changing illumination, pose and expression, a robust dictionary is definitely preferred. In this paper, we propose a robust dictionary learning method for face recognition. Robustness is attained in a two-fold way. First, auxiliary faces are produced from the original face images. Second, the dictionary is attained under the condition that label coefficients may deviate from sample coefficients. Auxiliary faces express possible variations of faces, and the difference between auxiliary faces and original training samples of the same class partly reflects the difference between test samples and training samples; using auxiliary faces is therefore beneficial for improving the robustness of the method. The dictionary learning scheme further enhances robustness.
As one of the most critical tasks in natural language processing (NLP), emotion classification has a wide range of applications in many fields. However, restricted by corpora, semantic ambiguity, and other constraints, researchers face many difficulties, and the accuracy of multi-label emotion classification is not ideal. In this paper, to improve the accuracy of multi-label emotion classification, especially when semantic ambiguity occurs, we propose a fusion model for text based on self-attention and topic clustering. We use pre-trained BERT to extract the hidden emotional representations of the sentence, and an improved LDA topic model to cluster topics at different levels of the text. We then fuse the hidden sentence representations and use a classification neural network to calculate the multi-label emotional intensity of the sentence. Extensive experiments on the Chinese emotion corpus Ren_CECPs demonstrate that our model outperforms several strong baselines and related works. The F1-score of our model reaches 0.484, which is 0.064 higher than the best result in similar studies.
Poetry and couplets, as a valuable part of human cultural heritage, carry traditional Chinese culture, and their automatic generation is a challenge for NLP. This paper proposes a new multi-task neural network model for the automatic generation of poetry and couplets. The model uses a seq2seq encoder-decoder structure that combines an attention mechanism, a self-attention mechanism and multi-task parameter sharing. The encoder consists of two BiLSTM networks that learn the shared characteristics of ancient poems and couplets, one encoding the keywords and the other encoding the generated poem or couplet sentences. The decoder parameters are not shared: two LSTM networks decode the outputs for ancient poems and couplets respectively, in order to preserve their different semantic and grammatical features. Poetry and couplets share many characteristics, and multi-task learning can learn more features through related tasks, making the model generalize better; our multi-task model therefore generates poems and couplets significantly better than a single-task model. The model also introduces a self-attention mechanism to learn the dependencies and internal structure of words in sentences. Finally, the effectiveness of the method is verified by automatic and manual evaluations.
The dense deployment of small base stations (BSs) in fifth-generation communication systems can satisfy user demand for high-data-rate transmission. On the other hand, such a scenario also increases the complexity of mobility management. In this paper, we develop a Q-learning framework that exploits the user's radio conditions, namely the reference signal received power (RSRP), the signal-to-interference-plus-noise ratio (SINR) and the transmission distance, to learn the optimal policy for handover triggering. The objective of the proposed approach is to increase user mobility robustness in ultra-dense networks (UDNs) by minimizing redundant handovers and the handover failure ratio. Simulation results show that the proposed triggering mechanism efficiently suppresses the ping-pong handover effect while maintaining handover failure at an acceptable level. Moreover, the proposed mechanism triggers the handover process directly without a handover margin (HOM) or time-to-trigger (TTT), so the response speed of handover triggering is increased.
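A minimal sketch of the tabular Q-learning update that such a handover-triggering agent could use; the state discretization of RSRP/SINR/distance, the two-action set, and the reward definition are illustrative assumptions, not the paper's exact design.

```python
import numpy as np
from collections import defaultdict

class HandoverQAgent:
    def __init__(self, n_actions=2, alpha=0.1, gamma=0.9, eps=0.1):
        # actions: 0 = stay on serving cell, 1 = trigger handover (illustrative)
        self.Q = defaultdict(lambda: np.zeros(n_actions))
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, state):
        # epsilon-greedy exploration over the learned Q-values
        if np.random.rand() < self.eps:
            return np.random.randint(len(self.Q[state]))
        return int(np.argmax(self.Q[state]))

    def update(self, s, a, r, s_next):
        # Standard Q-learning target: r + gamma * max_a' Q(s', a')
        td_target = r + self.gamma * np.max(self.Q[s_next])
        self.Q[s][a] += self.alpha * (td_target - self.Q[s][a])

# Hypothetical state: a quantized (RSRP, SINR, distance) tuple, e.g.
# s = (round(rsrp_dbm / 5), round(sinr_db / 3), round(dist_m / 50))
```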
With the application of Internet of Things technology, fog computing provides computing and storage services near the bottom of the network. As a highly virtualized platform, it can solve the problems of rapid response and bandwidth consumption for delay-sensitive applications at the edge of the local network. However, under large-scale service requests, if the job scheduling problem cannot be solved effectively, service delay increases while resource utilization and user satisfaction decrease. In this paper, we improve the basic ant colony optimization (ACO) and develop a new job scheduling strategy, improved ant colony optimization (IACO). IACO assigns each task to the resource with the lowest total cost, determined by calculating the total cost of the task on each resource. We imitate the foraging process of multiple ants and repeat it iteratively; the optimal task scheduling sequence is obtained from the different pheromone concentrations left by the ants. Experimental results show that the IACO scheduling algorithm outperforms ACO in total cost, completion time and economic cost.
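A minimal sketch of the core ACO steps for assigning tasks to fog resources (the transition rule and pheromone update); the cost matrix, heuristic, and parameters are placeholders, not the paper's IACO specifics.

```python
import numpy as np

def choose_resource(task, pheromone, cost, alpha=1.0, beta=2.0):
    """Pick a resource for `task` with probability proportional to
    pheromone^alpha * (1/cost)^beta (standard ACO transition rule).
    `pheromone` and `cost` are (n_tasks, n_resources) arrays."""
    tau = pheromone[task] ** alpha
    eta = (1.0 / cost[task]) ** beta      # heuristic: cheaper resource is better
    p = tau * eta
    p /= p.sum()
    return np.random.choice(len(p), p=p)

def update_pheromone(pheromone, assignments, total_cost, rho=0.5, Q=1.0):
    """Evaporate, then deposit pheromone on the task-resource pairs used
    by the best ant, proportional to the inverse of its total cost."""
    pheromone *= (1.0 - rho)
    for task, res in assignments:
        pheromone[task, res] += Q / total_cost
    return pheromone
```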
Vessel detection has been widely used in marine surveillance to automatically detect potential threats over huge oceanic areas. However, the uncertainty of ship heading and the interference of environmental factors, such as sea waves and clouds, greatly reduce detection accuracy. To solve these problems and provide robustness to environmental conditions, we exploit the properties of the log-polar coordinate system and propose a method named Dual-Operator Log-Pol Top-Hat filter (DOLPTH) to make better use of the difference information between the vessel and the background. Different situations are designed to compare the performance of our algorithm with other algorithms. The experimental results show that in both cases DOLPTH maintains high accuracy and good detection performance.
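A minimal sketch of a log-polar white top-hat step of the kind the method builds on, using OpenCV's warpPolar (available in recent OpenCV versions); the structuring-element size and the single-operator form are illustrative assumptions, and the paper's dual-operator design is not reproduced.

```python
import cv2
import numpy as np

def logpolar_tophat(gray, max_radius=None):
    """Map a grayscale image to log-polar coordinates, apply a morphological
    white top-hat to highlight small bright structures, and map back."""
    h, w = gray.shape
    center = (w / 2.0, h / 2.0)
    max_radius = max_radius or min(center)
    flags = cv2.WARP_POLAR_LOG + cv2.INTER_LINEAR
    polar = cv2.warpPolar(gray, (w, h), center, max_radius, flags)

    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 9))
    tophat = cv2.morphologyEx(polar, cv2.MORPH_TOPHAT, kernel)

    # Inverse mapping back to Cartesian coordinates
    return cv2.warpPolar(tophat, (w, h), center, max_radius,
                         flags + cv2.WARP_INVERSE_MAP)
```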
Processing infrared images with computer-based algorithms for better application is a frontier field integrating physical technology with computer science. One of the key techniques in infrared image processing is the detection of infrared targets, which is extensively applied in security and defense systems and in search and tracking systems. However, because such targets are small, dim and lack texture, their detection is technically difficult. One strategy is to transform the detection task into a non-convex optimization problem of recovering a low-rank matrix (background) and a sparse matrix (target) from a patch-image matrix (original image) based on the IPI (infrared patch-image) model. When targets are clear and recognizable, the APG (accelerated proximal gradient) algorithm solves this problem effectively. However, when targets become much dimmer and are screened by intricate background texture, the detection results degrade dramatically. To solve this problem, a novel method based on IRNN (iteratively reweighted nuclear norm) is proposed in this paper. Experimental results show that, under different complicated backgrounds, targets with higher SCRG (signal-to-clutter ratio gain) and BSF (background suppression factor) values are acquired by the IRNN algorithm than by the APG algorithm, which means that our method performs better.
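A minimal sketch of the reweighted singular-value thresholding step that an IRNN-style low-rank/sparse decomposition iterates on a patch-image matrix `D`; the weight rule 1/(sigma + eps), the alternating scheme, and the thresholds are illustrative assumptions, not the paper's solver.

```python
import numpy as np

def weighted_svt(D, lam, eps=1e-2):
    """One reweighted nuclear-norm step: shrink each singular value by a
    weight inversely proportional to its magnitude, so large (background)
    components are penalized less than small ones."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    w = 1.0 / (s + eps)                        # reweighting of singular values
    s_shrunk = np.maximum(s - lam * w, 0.0)
    return U @ np.diag(s_shrunk) @ Vt

def separate(D, lam_lowrank=1.0, lam_sparse=0.05, n_iter=20):
    """Alternate a low-rank (background) update with soft-thresholding of
    the residual to obtain the sparse (target) component."""
    B = np.zeros_like(D)   # low-rank background
    T = np.zeros_like(D)   # sparse target
    for _ in range(n_iter):
        B = weighted_svt(D - T, lam_lowrank)
        R = D - B
        T = np.sign(R) * np.maximum(np.abs(R) - lam_sparse, 0.0)
    return B, T
```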
Many studies have been devoted to sports video summarization and content-based video search. However, the semantic importance of the caption box or scorebox (SB) appearing in broadcast sports videos has been almost neglected, even though the SB holds key elements for these research tasks. SB localization is challenging because there is a huge variety of SBs: almost every broadcast sports video contains a different SB with unique geometry, font, colors, location, and texture. Every time a new sports series emerges, it contains a new type of scorebox that resembles no other. SBs are thus evolving with unexpected features and novel challenges, so traditional learning-based methods alone are not suitable for detection. This paper proposes a robust method for detecting and localizing SBs in broadcast sports videos. It automatically learns the SB template and further utilizes it, since the SB may translate from its usual location and may disappear for a short time. Comprehensive experiments on a real-life dataset, SP-1, and comparison with state-of-the-art methods show that the proposed method achieves better performance.
This paper proposes an improved genetic algorithm for path planning of a mobile robot in a grid environment. First, the robot's motion plane is rasterized, a serial-number coding scheme is used, and a heuristic median-insertion method is designed to establish the initial population, ensuring that all initially planned paths are feasible and thereby speeding up convergence. Then, different weights are assigned to path length, path safety, and path energy consumption and combined into a multi-objective fitness function. Finally, some genetic operations are improved to maintain population diversity in later generations and prevent the algorithm from premature convergence. Simulation experiments show that the proposed algorithm can quickly plan a feasible path in the grid environment; the path is not only shorter but also more stable, and the algorithm runs 45.7% faster than other improved algorithms.
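A minimal sketch of a weighted multi-objective fitness of the kind described; the individual safety and energy cost terms and the weights are placeholders, not the paper's definitions.

```python
import numpy as np

def path_length(path):
    """Sum of Euclidean distances between consecutive grid cells."""
    p = np.asarray(path, dtype=float)
    return float(np.sum(np.linalg.norm(np.diff(p, axis=0), axis=1)))

def fitness(path, safety_cost, energy_cost, w=(0.6, 0.2, 0.2)):
    """Weighted combination of length, safety and energy terms.
    Lower total cost -> higher fitness (placeholder formulation)."""
    total = w[0] * path_length(path) + w[1] * safety_cost + w[2] * energy_cost
    return 1.0 / (1.0 + total)
```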
Path planning plays an essential role in mobile robot systems. Aiming at the low efficiency of the A* algorithm during robot path finding, this paper proposes two improvements. The first is to improve the distance calculation formula in the A* algorithm, and the second is to apply weighted processing to its evaluation function. Experimental data show that the comprehensively improved A* algorithm dramatically reduces the number of useless visited nodes and speeds up the search, reducing search time by about 12-14% compared with the standard A* algorithm.
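A minimal sketch of the weighted-evaluation-function idea: f(n) = g(n) + w·h(n) with w > 1 biases the search toward the goal and typically expands fewer nodes. The octile heuristic, the weight value, and the `neighbors` interface are illustrative assumptions, not the paper's exact formulas.

```python
import heapq, itertools, math

def octile(a, b):
    """Grid distance estimate that allows diagonal moves."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return (dx + dy) + (math.sqrt(2) - 2) * min(dx, dy)

def weighted_a_star(start, goal, neighbors, w=1.5):
    """A* with a weighted heuristic: f = g + w * h (w > 1 trades a little
    path optimality for a faster, more goal-directed search)."""
    counter = itertools.count()                  # heap tie-breaker
    open_heap = [(w * octile(start, goal), next(counter), 0.0, start, None)]
    came_from, g_best = {}, {start: 0.0}
    while open_heap:
        _, _, g, node, parent = heapq.heappop(open_heap)
        if node in came_from:
            continue
        came_from[node] = parent
        if node == goal:
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1]
        for nxt, step in neighbors(node):        # neighbors(node) yields (cell, move_cost)
            ng = g + step
            if ng < g_best.get(nxt, float("inf")):
                g_best[nxt] = ng
                heapq.heappush(open_heap,
                               (ng + w * octile(nxt, goal), next(counter), ng, nxt, node))
    return None
```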
In our paper, a game framework based on a motion capture device is designed that supports virtual-real interaction between people and their surroundings. First, we design a method to reduce the noise of the motion capture device using a Kalman filter; our experiments show that this method can estimate marker locations and improve data accuracy. For use in the game, a collision detection method based on object shape is adopted, which prevents object penetration during game play. Finally, the game framework and system are implemented with our capture device. Experimental results show that our method achieves good interaction in different scenes between humans and virtual or real objects.
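A minimal sketch of a constant-velocity Kalman filter for smoothing one coordinate of a marker; the process/measurement noise values and sampling rate are illustrative assumptions, not the paper's tuned parameters.

```python
import numpy as np

class MarkerKalman1D:
    """Constant-velocity Kalman filter for one coordinate of a motion-capture marker."""
    def __init__(self, dt=1 / 120, q=1e-3, r=1e-2):
        self.F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition
        self.H = np.array([[1.0, 0.0]])              # only position is observed
        self.Q = q * np.eye(2)                       # process noise
        self.R = np.array([[r]])                     # measurement noise
        self.x = np.zeros((2, 1))                    # [position, velocity]
        self.P = np.eye(2)

    def step(self, z):
        # Predict
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Update with the noisy marker measurement z
        y = np.array([[z]]) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(2) - K @ self.H) @ self.P
        return float(self.x[0, 0])                   # filtered position
```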
Semantic segmentation using RGB image information is significantly useful for the intelligent perception of robots. However, segmentation based on RGB alone does not perform well for objects of the same color during grasping manipulation. This paper proposes a new semantic segmentation scheme based on the fusion of RGB and height maps transformed from depth information, rather than a simple RGB-D fusion: the height information is adjusted so that different objects of the same color can be distinguished by height. The scheme converges faster than the classical RGB segmentation scheme and achieves 7.42% higher final semantic segmentation performance on manipulator grasping scenes that contain objects of the same color. Because RGB-D information is required, this paper also proposes a method for self-collecting and self-labeling data of manipulator grasping scenes, which reduces manpower cost by making full use of the highly automated equipment and the characteristics of the scene.
In view of the problems of overlapping particle contours and unclear texture in the detection of coal dust explosions, this paper proposes a method based on an improved deep-learning VGG-16 convolutional neural network model to obtain the feature information of particle images. Based on the VGG-16 network, an SE layer is added after the down-sampling of the first two convolutional layers to compress and extract the deep features of the particle image. The original SoftMax classifier is replaced by a binary classifier to optimize the parameter structure of the model, and the weight parameters of the convolution and pooling layers in the pre-trained model are shared via transfer learning to speed up training. Samples were randomly selected from the constructed coal dust images as training and test sets to evaluate the performance of the model. The experimental results show that the proposed method improves recognition accuracy by 2% over conventional methods and achieves a lower loss value, which meets the detection requirements for coal dust particle images.
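A minimal sketch of the squeeze-and-excitation (SE) operation that an SE layer applies to a feature map, written as a plain numpy forward pass; the reduction ratio and the random weights are illustrative, and this is not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feat, W1, W2):
    """feat: (C, H, W) feature map.  Squeeze -> excite -> rescale channels."""
    squeeze = feat.mean(axis=(1, 2))                        # global average pool, shape (C,)
    excite = sigmoid(W2 @ np.maximum(W1 @ squeeze, 0.0))    # FC - ReLU - FC - sigmoid
    return feat * excite[:, None, None]                     # channel-wise reweighting

# Illustrative weights with reduction ratio 16 (C -> C/16 -> C)
C, r = 64, 16
W1 = np.random.randn(C // r, C) * 0.05
W2 = np.random.randn(C, C // r) * 0.05
out = se_block(np.random.rand(C, 32, 32), W1, W2)
```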
To address background interference caused by occlusion, motion blur and deformation in visual object tracking, this paper proposes a background-aware correlation filter tracking algorithm that combines multiple features with scale estimation. On the basis of the original algorithm, the object area is expanded and Histogram of Oriented Gradient (HOG) and Color Names (CN) features are extracted, so that the filter learns more background information and object localization accuracy is improved. On this basis, a binary matrix is constructed to improve the effective response of the filter to the object region while suppressing the background, and a trained scale filter is used to estimate the object size. Experimental and simulation results show that the proposed algorithm can handle background interference from occlusion, motion blur and deformation during tracking. On the OTB-100 dataset, the precision and success rate of the proposed algorithm are improved by 1.3% and 1.4% respectively. On the OTB-100 sequences involving occlusion, motion blur, deformation and background interference, the precision of the proposed algorithm is 1.9%, 4.0%, 4.3% and 3.4% higher, respectively, than that of the Background-Aware Correlation Filters (BACF) algorithm. The frame rate reaches 13.7 FPS, showing that the method has high theoretical and engineering value.
A visual simultaneous localization and mapping (vSLAM) system in a dynamic environment is affected by wrong data associations caused by moving targets, which introduces errors into the pose estimation of the mobile robot. Combining semantic segmentation information to remove dynamic feature points is an effective way to improve the accuracy of the SLAM system. However, existing semantic visual SLAM usually adopts fully supervised methods to segment dynamic scenes, whose accuracy relies on a large amount of training data with annotation information, limiting the application of the SLAM system. To address this issue, a visual semantic SLAM system (vsSLAM) that applies weakly supervised semantic segmentation to dynamic scenes is proposed to broaden the application range of the system. First, the system extracts the feature points of the input image and checks their moving consistency, then segments the dynamic targets with the weakly supervised method. Second, the semantic segmentation results are used to remove the dynamic feature points in the image. Finally, the system uses the remaining stable feature points for pose estimation. This paper also uses the Automatic Color Equalization algorithm to pre-process the input image, which improves the accuracy of weakly supervised semantic segmentation. Experiments were performed on the public TUM datasets and in a lab environment. The results show that the accuracy of the SLAM system based on the weakly supervised network adopted in our work is better than that of the traditional ORB-SLAM2 system, higher than that of a SLAM system built on the weakly supervised network DSRG, and close to that of a fully supervised semantic SLAM system.
Currently, there is a lack of voice samples in the field of speech emotion recognition, which leads to poor recognition rates and over-fitting. Motivated by this, we propose speech emotion recognition based on data augmentation. The Berlin Emotional Corpus is augmented in two directions, the time domain and the frequency domain, and features are extracted from the samples for training. The recognition rates of two classifiers, K-Nearest Neighbor and Support Vector Machine, are studied and analyzed. Experiments show that the results after data augmentation are better.
Seafood growing on the sea floor is difficult to harvest. In this work, an automatic visual servoing manipulator is proposed to collect seafood scattered on the sea floor. Self-grasping is realized through closed-loop feedback control based on visual information acquired by the vision system. The YOLO framework is used as the detection unit to recognize the object from the eye-in-hand module. The center coordinate of the detected object is then transformed into a 3D position using the camera calibration matrix, and the movement of each joint of the manipulator is calculated. The proposed manipulator is lightweight and low-cost and can be mounted on any unmanned underwater vehicle, helping to harvest quickly.
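A minimal sketch of back-projecting a detected object's pixel center to a 3D camera-frame point using the intrinsic calibration matrix; the intrinsic values and the depth source (e.g., a known working distance) are illustrative assumptions.

```python
import numpy as np

def pixel_to_camera(u, v, depth, K):
    """Back-project pixel (u, v) with depth (metres) through intrinsics K
    to a 3D point in the camera frame: X = depth * K^-1 [u, v, 1]^T."""
    uv1 = np.array([u, v, 1.0])
    return depth * (np.linalg.inv(K) @ uv1)

# Illustrative intrinsics (fx, fy, cx, cy) and a detection center
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
target_xyz = pixel_to_camera(u=350, v=260, depth=0.6, K=K)
```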
Substantial progress has been made recently in training context-aware language models. CLOTH is a human-created cloze dataset that can better evaluate machine reading comprehension. Although the authors of CLOTH conducted many experiments on BERT and context2vec, the performance of other models is still worth studying. We applied the CLOTH dataset to other models and evaluated their performance with respect to their different model mechanisms. The results show that ALBERT performs well on the cloze task: its accuracy is 92.24%, which is 6.34% higher than human performance. In addition, we introduce adversarial training into the models. Experiments show that adversarial training significantly improves the robustness and accuracy of the models; on the BERT-large model, accuracy improves by up to 0.15% with adversarial training.
As a tool for expressing the common semantics of objects, language can describe the attributes and locations of objects within the scope of human vision. Searching for the location of an object in the field of vision through natural language is an important human capability, and proposing a mechanism to learn this ability is a major challenge for computer vision. Most existing object localization methods use strongly supervised information from the training set to train the model; however, these models lack interpretability and require expensive labels that are difficult to obtain. Facing these challenges, we propose a new method for locating objects in fine-grained images from natural language descriptions. First, we propose a model that learns the semantically relevant parts between fine-grained images and language, achieving ideal localization accuracy without a strong supervisory signal. In addition, we improve the contrastive loss function so that natural language descriptions better match the target regions of fine-grained images, and multi-scale fusion techniques are used to improve the ability to capture details in fine-grained images. Comprehensive experiments demonstrate that the proposed method achieves ideal localization results on the CUB200-2011 dataset, and the model has strong zero-shot learning ability on untrained data.
The practical value of unmanned aerial vehicles (UAVs) can be improved by using image mosaicking to fill gaps caused by insufficient coverage of UAV aerial images. Research on UAV image mosaicking focuses mainly on improving operation speed and mosaic accuracy. To this end, this study first analyzes the characteristics of UAV aerial images and the causes of ghosting and blurring. A Quick Scale-Invariant Feature Transform (Quick-SIFT) operator is constructed to reduce computation time by reducing the octaves and levels of the Gaussian pyramid and selecting the third level of each octave for feature point extraction. Subsequently, the As-Natural-As-Possible (AANAP) algorithm is used for image registration and projection transformation. The difference region between adjacent sequence images is calculated by the frame difference method; this region is then segmented with a region growing algorithm so that moving objects are sampled only once, while the other regions are processed by linear weighted fusion, thereby eliminating ghosting effectively. Lastly, a dedicated aerial image dataset for image mosaicking is constructed, on which comparative experiments with state-of-the-art mosaic methods are conducted. The experimental results indicate that the matching time of the proposed algorithm is improved by at least 78% compared with SIFT, realizing fast mosaic processing, and that the algorithm effectively eliminates motion ghosting and achieves stable, high-quality results.
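A minimal sketch of the frame-difference step used to find candidate moving regions between two aligned images; the blur size, threshold, and morphological clean-up are illustrative assumptions, and the subsequent region growing is not reproduced.

```python
import cv2
import numpy as np

def moving_region_mask(img_a, img_b, thresh=25, blur=5):
    """Absolute difference of two registered frames, smoothed and thresholded
    to produce a binary mask of candidate moving-object regions."""
    ga = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY)
    gb = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(ga, gb)
    diff = cv2.GaussianBlur(diff, (blur, blur), 0)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    # Morphological opening removes isolated noise pixels
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```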
In the detection of tunnel voids by Ground Penetrating Radar (GPR), the shapes of void defects are complex and data analysis depends on manual recognition. This paper constructs a convolutional neural network that integrates a guided anchoring mechanism to detect tunnel voids. The network consists of four parts: feature extraction, anchor-region proposal generation, region-of-interest pooling, and classification/regression. The feature extraction network extracts defect features from the rich samples; the guided anchoring region proposal network incorporates the GIoU evaluation criterion and predicts anchor shapes through learning; the resulting feature maps are pooled over the regions of interest; finally, the defect features are classified and the bounding boxes are regressed. Compared with existing object detection algorithms, the experimental results show that the improved network achieves 92.61% classification accuracy, and the trained model has good generalization ability and robustness.
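A minimal sketch of the GIoU metric mentioned as the evaluation criterion in the anchor-proposal branch; boxes are assumed to be in (x1, y1, x2, y2) format, and this is a generic implementation rather than the paper's code.

```python
def giou(box_a, box_b):
    """Generalized IoU between two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0
    # Smallest enclosing box
    cx1, cy1 = min(ax1, bx1), min(ay1, by1)
    cx2, cy2 = max(ax2, bx2), max(ay2, by2)
    c_area = (cx2 - cx1) * (cy2 - cy1)
    return iou - (c_area - union) / c_area if c_area > 0 else iou
```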
Based on the Internet of Things, intelligent logistics warehouses are flourishing. As an automatic transport vehicle, the AGV (Automated Guided Vehicle) plays an important role in improving the efficiency of production logistics systems. Since the environment of the intelligent warehouse is dynamic and partially unknown, AGVs need self-learning and adaptive capabilities to cope with environmental changes. Traditional path planning algorithms are difficult to operate in unknown environments without prior map knowledge. To solve this, we propose an end-to-end AGV path planning method that lets the AGV obtain the optimal action directly from the raw visual image and LIDAR information. In addition, a deep reinforcement learning method, combining a prioritized experience replay mechanism and a double deep Q-network with the dueling architecture, is employed to train the AGV so that it has a certain generalization ability for unknown environments and adaptability to dynamic environments. Finally, our simulation experiments show the effectiveness of the proposed method.
The traditional collaborative filtering recommendation algorithm suffers from data sparsity and poor scalability. Aiming at this problem, an improved bisecting k-means collaborative filtering algorithm is proposed. The algorithm first fills in unrated items in the rating matrix using Weighted Slope One preprocessing to reduce its sparsity. The preprocessed rating data are then clustered with the bisecting k-means algorithm, which reduces the nearest-neighbor search space of the target user by grouping similar objects, thereby improving the scalability of the algorithm. Finally, the recommendation algorithm is used to generate the final result. Experimental results show that the improved bisecting k-means algorithm improves the recommendation effect.
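A minimal sketch of the Weighted Slope One prediction used to fill unrated items before clustering; `ratings` is assumed to be a dict of {user: {item: rating}}, which is an illustrative data layout rather than the paper's.

```python
from collections import defaultdict

def slope_one_deviations(ratings):
    """Average rating deviation dev[j][i] between item pairs, together with
    the number of co-rating users freq[j][i] (the weights)."""
    dev = defaultdict(lambda: defaultdict(float))
    freq = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, ri in user_ratings.items():
            for j, rj in user_ratings.items():
                if i == j:
                    continue
                dev[j][i] += rj - ri
                freq[j][i] += 1
    for j in dev:
        for i in dev[j]:
            dev[j][i] /= freq[j][i]
    return dev, freq

def predict(ratings, user, item, dev, freq):
    """Weighted Slope One: average (r_ui + dev[item][i]) weighted by co-rating counts."""
    num = den = 0.0
    for i, r in ratings[user].items():
        if i != item and freq[item].get(i, 0) > 0:
            num += (r + dev[item][i]) * freq[item][i]
            den += freq[item][i]
    return num / den if den > 0 else None
```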
There are many fractures in a fault zone, but simulating the 3D fracture zone is difficult. In this paper, a 3D simulation method for the fracture zone based on fractal theory is presented. Considering the self-similarity of cracks, the fractal dimension is obtained by processing pictures from a similar-material simulation experiment, and the maximum fracture length and aperture of the related fault zones are predicted. The approximate height range of the fracture zone is then calculated by an empirical formula, and a fracture-based wire-frame model is designed to simulate the fractures in combination with the fracture-related parameters. The experimental results show that the maximum crack parameters obtained by the prediction algorithm have smaller errors than those obtained by similar-material simulation tests, and that the 3D model of the fracture zone produces a more realistic simulation, providing a feasible method for the 3D simulation of fracture distribution in the overburden fracture zone.
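A minimal sketch of a box-counting estimate of fractal dimension from a binarized crack image; the box sizes are illustrative assumptions, and the paper's exact image-processing pipeline is not reproduced.

```python
import numpy as np

def box_counting_dimension(binary_img, sizes=(2, 4, 8, 16, 32, 64)):
    """Estimate the fractal (box-counting) dimension of a binary crack map:
    count occupied boxes N(s) at several box sizes s, then fit
    log N(s) ~ -D * log s."""
    counts = []
    h, w = binary_img.shape
    for s in sizes:
        nh, nw = h // s, w // s
        trimmed = binary_img[:nh * s, :nw * s]
        blocks = trimmed.reshape(nh, s, nw, s).any(axis=(1, 3))
        counts.append(max(blocks.sum(), 1))
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -slope
```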
Spacecraft cluster flight, a novel multi-spacecraft flight mode, has become an important research direction for future distributed space systems and has unique advantages in continuous area surveillance. Aiming at a satellite group formed by several satellite clusters, an orbit design method that satisfies long-term, continuous and stable area coverage is proposed. The basic dynamic model is built under the influence of J2 perturbation, a calculation method for the "node period" and "node day" is put forward, and the initialization conditions for long-term continuous and stable group orbits are then established. The orbit design is validated under a typical scenario through STK simulation, and the simulation results show that this design method can achieve fixed-time revisiting and long-term stable coverage of the target area.