Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083601 (2018) https://doi.org/10.1117/12.2516621
This PDF file contains the front matter associated with SPIE Proceedings Volume 10836, including the Title Page, Copyright information, Table of Contents, Author and Conference Committee lists.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks. You are receiving this notice because your organization may not have SPIE eBooks access. Shibboleth/OpenAthens users, please sign in to access your institution's subscriptions. To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083602 (2018) https://doi.org/10.1117/12.2326820
This paper studies berthing-ship target detection in overhead-view images under the condition of few training samples. Because training samples are limited, we first pre-train on a complete data set unrelated to the target detection task to obtain a classification model, then augment the data by a fixed proportion, and finally train the target detection model. We apply the idea of segmentation to the target detection problem, adjusting the configuration of the region proposal network, including the anchor sizes and the non-maximum suppression threshold, according to the target morphology so that the network generates more accurate regions of interest. Finally, the confidence levels, bounding boxes, and image masks of multiple targets are generated concurrently. Experiments on a self-made data set labeled from NWPU VHR-10 produced good results, demonstrating the feasibility of this method for berthing-ship target detection.
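The non-maximum suppression threshold the abstract mentions tuning can be illustrated with a minimal NMS sketch (the boxes, scores, and threshold below are hypothetical; the paper's actual RPN configuration is not given):

```python
def iou(a, b):
    # intersection-over-union of two boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, thresh=0.5):
    # keep the highest-scoring box, drop every box overlapping it by more
    # than `thresh`, and repeat; a lower threshold suppresses more proposals
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep
```

Lowering `thresh` makes the region proposal network emit fewer, less redundant regions of interest, which is the kind of morphology-driven tuning the abstract describes for elongated ship targets.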
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083603 (2018) https://doi.org/10.1117/12.2326993
Person re-identification aims to recognize a person of interest across two non-overlapping cameras. Because of differing camera angles and image quality, it remains a challenging task. Many methods have been proposed to address these problems; metric learning in particular has become a hot research topic because of its effectiveness in matching person images. Most metric-learning approaches use all sample pairs without considering the ratio between the distances of each pair, yet not all pairs are useful for training. We observe that some outliers harm the learning process: a pair whose distance is too large or too small can influence other samples and mislead training, resulting in longer training time and lower accuracy. In this paper, we propose a new method based on eliminating these outliers, called Outlier Sample Elimination. Our method divides negative pairs into three groups using thresholds to find the proper samples for learning; during training, only the best sample pairs contribute to our loss function, and the pairs considered outliers are eliminated. We evaluate the proposed method on the challenging VIPeR dataset, and our experiments show that it achieves competitive performance.
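The three-way split of negative pairs can be sketched as follows (the thresholds are hypothetical; the paper's actual values and distance metric are not given in the abstract):

```python
def split_negative_pairs(distances, low, high):
    # pairs whose distance falls outside [low, high] are treated as outliers;
    # only the middle band is kept for the metric-learning loss
    too_close = [d for d in distances if d < low]
    usable = [d for d in distances if low <= d <= high]
    too_far = [d for d in distances if d > high]
    return too_close, usable, too_far
```

Only the `usable` group enters the loss; the other two groups are the outliers the method eliminates.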
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083604 (2018) https://doi.org/10.1117/12.2326997
Multi-exposure image fusion is an important approach to obtaining a composite high-dynamic-range image. The key is an effective feature measurement to evaluate the exposure degree of the source images. This paper proposes a novel fusion method for multi-exposure images based on a sparsity blur feature. In our algorithm, using sparse representation and image decomposition, the sparsity blur descriptor measures the exposure level of source image patches to obtain an initial decision map, which is then refined with gradient-domain guided filtering. Experimental results demonstrate that the proposed method is competitive with, or even outperforms, state-of-the-art fusion methods in terms of both subjective visual perception and objective evaluation metrics.
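The final fusion step, combining the source exposures under a per-pixel decision map, can be sketched as a weighted average (the sparsity blur descriptor and the guided-filter refinement that produce the maps are beyond this snippet):

```python
def fuse(images, decision_maps):
    # pixel-wise weighted average of the source exposures, where each
    # (refined) decision map gives the weight of its source image
    h, w = len(images[0]), len(images[0][0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            wsum = sum(d[y][x] for d in decision_maps)
            out[y][x] = sum(img[y][x] * d[y][x]
                            for img, d in zip(images, decision_maps)) / wsum
    return out
```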
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083605 (2018) https://doi.org/10.1117/12.2500142
Coin grading means evaluating a coin's physical condition. In this paper, we propose a process for coin grading by quantifying "unexpected elements" such as scratches and dirt marks. We detect significant and tiny unexpected elements with handcrafted filters and deep-learning techniques, respectively. The result of our process is close to that of a manual expert and serves as a useful aid for numismatists.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083606 (2018) https://doi.org/10.1117/12.2502098
Motion blur is caused by relative motion between the camera and objects in the scene. Most existing deblurring algorithms assume uniform motion blur over the entire image; however, this assumption generally does not hold in the real world, so deblurring must involve segmenting the image into regions with different blurs. In this paper, we present an algorithm for multiscale spatially-varying blur detection and extraction. First, singular value decomposition (SVD) is performed on multiscale images, and for each scale a robust singular-value feature is selected as the local blur characteristic. Then a more accurate blur distribution map is calculated for each pixel by normalization and fusion. Finally, the input image is automatically segmented into blurred/clear regions with morphological filtering. The algorithm is tested on locally motion-blurred natural image datasets, and the results show our method is highly consistent with human subjective segmentation.
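A minimal sketch (assuming NumPy) of a singular-value blur feature of the kind the abstract describes: blurred, low-rank patches concentrate their spectral energy in the leading singular values, while sharp texture spreads it across the spectrum. The exact feature used in the paper is not given.

```python
import numpy as np

def blur_feature(patch, k=1):
    # fraction of spectral energy carried by the top-k singular values;
    # values near 1 indicate a low-rank (likely blurred) patch
    s = np.linalg.svd(np.asarray(patch, dtype=float), compute_uv=False)
    return float(s[:k].sum() / (s.sum() + 1e-12))
```

Thresholding this feature per patch, at several scales, yields the kind of blur map the algorithm then normalizes and fuses.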
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083607 (2018) https://doi.org/10.1117/12.2502130
In this work, we tackle nighttime image degradation caused by haze and weak illumination. We propose an improved atmospheric scattering model that achieves single-image haze removal and enhancement simultaneously. The input image is first decomposed into a structure layer and a texture layer based on the image total-variation model: the structure layer contains the main scene, including the haze and brightness, while the texture layer contains detail and noise. To avoid the influence of glow and multiple light sources on the estimation of the atmospheric map, the glow layer is then stripped from the structure layer and the background layer is calculated. After applying our proposed estimation of the atmospheric map and transmission, the structure layer is restored according to the atmospheric scattering model. We finally fuse the restored background layer and the optimized texture to obtain the haze-free, enhanced image. Experimental results demonstrate the efficacy of our proposed model.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083608 (2018) https://doi.org/10.1117/12.2503883
In this paper, we propose a novel classification algorithm based on convolutional neural networks (CNNs) to diagnose the severity of diabetic retinopathy (DR). We adopt a series of preprocessing operations to improve the quality of the dataset, and data augmentation is applied to the training data to tackle the problem of class imbalance. We design a CNN model named DR-Net with a new adaptive cross-entropy loss, which varies the penalty according to the interval into which a training sample is misclassified. We train DR-Net on the publicly available Kaggle dataset. Experimental results show that DR-Net achieves an accuracy of 0.821 and a kappa score of 0.663 on 3,338 test images.
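The idea of penalizing misclassification more the further the predicted grade is from the true one can be sketched as follows (an illustrative form; the paper's exact loss is not given in the abstract):

```python
import math

def adaptive_ce(probs, true_class):
    # standard cross-entropy on the true class, plus the expected grade
    # distance, so placing probability mass on far-away severity levels
    # costs more than on adjacent ones
    ce = -math.log(max(probs[true_class], 1e-12))
    expected_dist = sum(p * abs(c - true_class) for c, p in enumerate(probs))
    return ce + expected_dist
```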
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083609 (2018) https://doi.org/10.1117/12.2504488
The main purpose of this paper is to develop an automatic print-head alignment algorithm for a material-jetting color 3D additive manufacturing system. The printing system has six print heads, filled respectively with five photocurable resins (C, M, Y, K, W) and a support material. First, the printing system prints a standard test platform with its initial parameters, which is used to confirm whether the print heads are well aligned. Next, a scanner captures the printed image, and an image processing algorithm extracts the characteristic points of the captured image and calculates the compensation parameters, which are imported into the printing system to bring it into alignment. The proposed automatic print-head alignment algorithm can be used in any multi-print-head 3D additive manufacturing system. Replacing manual alignment, it saves time and reduces error; it can also run automatically and oversee the system in real time, improving the quality and color performance of the printed products.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360A (2018) https://doi.org/10.1117/12.2504606
Automatic road crack detection from image/video data plays a crucial role in maintaining road service life and improving the driving experience. In this paper, an improved automatic road crack detection system is proposed to reduce false detections under various noisy road-surface conditions and to improve sensitivity to light, thin cracks. The proposed system combines traditional image processing techniques, such as filtering and morphological processing, with scalable and efficient machine learning algorithms. Real road images under various noise conditions are used to evaluate the system. Experimental results show that the proposed system improves detection sensitivity and reduces false detections compared with existing systems, achieving higher detection accuracy.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360B (2018) https://doi.org/10.1117/12.2506418
With rising expectations for quality of life, consumers demand higher-quality food, and food authentication is a technical means of ensuring food quality. One approach is near-infrared spectroscopy, which can, for instance, differentiate organic from non-organic apples; it is effective but time-consuming and expensive. This paper presents a novel approach in which a low-cost device, a smartphone, collects apple images that are analyzed with pattern recognition. We use a smartphone to capture apple images during processing, recording the images as the color changes over time, and convert each image into a feature vector in RGB space that can be analyzed by pattern recognition algorithms. We apply partial least squares discriminant analysis (PLS-DA), k-nearest neighbors (KNN), and support vector machines (SVM) to the data. Experiments carried out with cross-validation on a reasonable collection of apple samples achieved an accuracy of around 90% in separating organic from non-organic apples. We conclude that this approach has the potential to lead to a viable solution that empowers consumers in food authentication.
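The RGB feature and one of the three classifiers (KNN) can be sketched as follows; the pixel values and labels below are hypothetical, not data from the paper:

```python
def rgb_mean_feature(pixels):
    # average (R, G, B) over a list of pixels as a 3-vector
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def knn_predict(train, query, k=3):
    # train: list of (feature, label); majority vote among the k nearest
    # features by squared Euclidean distance
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda t: dist(t[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)
```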
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360D (2018) https://doi.org/10.1117/12.2513881
This paper studies iris recognition under unconstrained conditions, where recognition is challenging because of noise factors such as off-axis imaging, pose variation, image blurring, illumination change, occlusion, and specular highlights. A robust algorithm for localizing non-circular iris boundaries is proposed, which localizes the boundaries more accurately than methods based on the Daugman algorithm. Operating on the filtered iris images, the method determines the outer iris boundary: we first apply the Canny algorithm to detect edges in the segmented image, then run an edge-linking algorithm on the edge map to obtain lists of connected edge points, selecting the longest list, the one with the most points, for outer-boundary localization. Finally, we investigate how to extract highly distinctive features from degraded iris images, presenting a sequential forward selection method that seeks a sub-optimal subset from a family of Gabor filters. Recognition performance is greatly improved with only a very small number of selected filters. Experiments conducted on the UBIRIS.v2 iris database yielded promising results.
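Sequential forward selection itself is generic and can be sketched independently of the Gabor filters (the `score` function standing in for recognition performance on a validation set is hypothetical):

```python
def forward_select(candidates, score, k):
    # greedily add the candidate that most improves the subset score,
    # stopping after k picks; a sub-optimal but cheap search strategy
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```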
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360E (2018) https://doi.org/10.1117/12.2513889
As an important branch of multi-source image fusion, infrared and visible image fusion inherits the basic theory and methods of image fusion while having its own characteristics. Visual saliency detection reflects the significant information in the infrared image, and the saliency feature is consistent with the object information and background details of the infrared and visible images. This paper therefore proposes a novel framework for infrared and visible image fusion using visual saliency and the non-subsampled shearlet transform. Comparing the proposed method with existing algorithms, the experimental results show that it not only highlights object information but also effectively preserves the abundant background information of the visible image.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360F (2018) https://doi.org/10.1117/12.2513980
The Euler number of a binary image is an important topological property for pattern recognition, image analysis, and computer vision. In the proposed algorithm, only three comparisons are needed to process a bit-quad in the given image. Moreover, the algorithm processes three rows simultaneously during scanning, reducing the number of checked pixels per bit-quad from 4 to 1.5 and leading to efficient processing. Experimental results demonstrate that the proposed algorithm significantly outperforms conventional Euler number computing algorithms.
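The abstract does not spell out its bit-quad rules, but the classical bit-quad (Gray) formula it builds on can be sketched by counting 2x2 patterns over a zero-padded image:

```python
def euler_number(img):
    # Gray's bit-quad formula for 4-connectivity:
    #   E = (Q1 - Q3 + 2 * Qd) / 4
    # where Q1/Q3 count quads with one/three foreground pixels and Qd
    # counts the two diagonal patterns
    h, w = len(img), len(img[0])
    pad = [[0] * (w + 2)] + [[0] + list(row) + [0] for row in img] + [[0] * (w + 2)]
    q1 = q3 = qd = 0
    for y in range(h + 1):
        for x in range(w + 1):
            quad = (pad[y][x], pad[y][x + 1], pad[y + 1][x], pad[y + 1][x + 1])
            s = sum(quad)
            if s == 1:
                q1 += 1
            elif s == 3:
                q3 += 1
            elif quad in ((1, 0, 0, 1), (0, 1, 1, 0)):
                qd += 1
    return (q1 - q3 + 2 * qd) // 4
```

The proposed algorithm speeds up exactly this kind of quad counting by reducing comparisons per quad and scanning three rows at once.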
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360G (2018) https://doi.org/10.1117/12.2513984
Since the labels of training samples are attached to bags rather than instances, multiple instance learning (MIL) is a special ambiguous learning paradigm. In this paper, we propose a novel method, BS_ELM, that combines bag space (BS) construction with an extreme learning machine (ELM) for MIL; it captures the bag structure while exploiting the efficiency of ELM. First, sparse subspace clustering is performed to obtain cluster centers and construct a new bag space; then an ELM classifies the bags in the new space. Experiments on benchmark data sets demonstrate the utility and efficiency of the proposed approach compared with other state-of-the-art MIL algorithms.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360H (2018) https://doi.org/10.1117/12.2513987
Fabric defect detection is an important part of quality control in textile enterprises. To effectively improve the detection probability, a fabric defect detection algorithm based on the multifractal spectrum (MFS) and a support vector machine (SVM) is proposed in this paper. The detection process has two main parts, feature extraction and classification, comprising image segmentation, MFS feature extraction, SVM model training, detection and classification, and evaluation of the results. Simulation experiments show that the algorithm performs well in terms of detection rate and false-alarm rate, has a certain robustness, and can be applied in actual production.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360I (2018) https://doi.org/10.1117/12.2513989
As one of the research hotspots of recent years, especially in pattern recognition, the convolutional neural network (CNN) is widely known for its high efficiency. However, some research shows that a CNN can fail to learn certain high-level features. To address this problem, this paper proposes a new kind of image representation, which we call "shape encoding maps". Our experimental results show that, in most cases, the recognition accuracies obtained by feeding shape-encoded maps to a CNN are higher than those obtained by training the CNN directly on the original images without shape encoding.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360J (2018) https://doi.org/10.1117/12.2514014
In this paper, a novel enhancement algorithm for images captured under low-illumination conditions is proposed. More concretely, we first design a method to synthesize low-light images as training datasets. Pre-clustering then separates the training data into several groups via a coupled Gaussian mixture model. For each group, we adopt a coupled dictionary learning approach to jointly train a low-light and normal-light dictionary pair, while the statistical dependency of the sparsity coefficients is simultaneously captured via an extreme learning machine. In addition, a multi-phase dictionary learning strategy enhances the robustness of our method. Experimental results show that the proposed method is superior to existing methods.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360K (2018) https://doi.org/10.1117/12.2514045
Recognizing offline handwritten character images remains a hard challenge today because of the noise produced in the scanning process: noise makes handwritten characters distorted, murky, and blurred, and the resulting images become hard even for humans to read and recognize. In this study, we removed various kinds of noise using the CNN architecture "U-Net", analyzing 607,200 sample images covering 3,036 Japanese characters. Our results indicate that U-Net can efficiently remove noise and enhance the strokes even though the samples include a huge variety of handwriting styles and noise.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360L (2018) https://doi.org/10.1117/12.2514230
Facing massive image data, improving the speed of image retrieval while maintaining accuracy is the focus of this research. In this paper, a distributed CBIR system based on deep convolutional neural networks (DCNN) running on Spark and Alluxio is proposed. Spark is a distributed in-memory computation platform that achieves high computing performance; Alluxio is a high-performance, fault-tolerant, memory-based open-source distributed storage system. Our design lets Spark focus on computing image features and matching images, while the intermediate data of the matching process is stored in Alluxio, thus removing a bottleneck in Spark computation. Extensive comparative experiments show that the proposed system has obvious advantages in both image storage capability and image retrieval speed.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360M (2018) https://doi.org/10.1117/12.2514269
Fast obstacle detection is essential for autonomous driving. In this research, we developed an obstacle detection model using the Single Shot MultiBox Detector (SSD). SSD is a regression-based object detection convolutional neural network that takes an image as input and computes localization and classification at once, so its processing time is dramatically reduced compared with multi-shot detectors. The SSD model was trained using APIs provided by Google under different numbers of classes and with and without transfer learning. Increasing the number of classes tended to decrease the detection rate, while training with transfer learning generally increased the average precision, confirming the effectiveness of transfer learning in image recognition. Average precision also differed across classes.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360N (2018) https://doi.org/10.1117/12.2514444
To improve the accuracy of salient-target recognition in digital images, this paper proposes a saliency detection algorithm that refines a fully convolutional neural network with low-level features. First, a fully convolutional network is constructed and trained on the basis of VGG-16, and its output provides an initial saliency map. The input image is then divided into superpixels, each treated as a vertex of a graph. On the basis of the initial saliency map, superpixel-level saliency division is performed: initial seed points are selected according to a center prior, low-level features such as superpixel RGB values are calculated, and salient regions are merged to obtain a saliency map optimized by low-level features. Finally, the initial saliency map and the optimized map are combined into the final saliency map. Comparative experiments show that the proposed algorithm achieves excellent precision compared with other algorithms, illustrating its effectiveness.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360O (2018) https://doi.org/10.1117/12.2514452
Owing to the demand for high resolution in imaging systems such as infrared imagers and imaging laser radar, compressive sensing can be used to obtain high-quality image information from low-resolution sensors. In this paper, compressive sensing is applied to an imaging system in which a random phase mask is placed on the lens. We analyze the propagation of the light field from the object plane to the lens and then to the image plane, and derive a theoretical formula for the propagation in the form of a Fourier transform, so reconstruction is fast using the fast Fourier transform. The orthogonal wavelet transform and the orthogonal matching pursuit algorithm are employed in the reconstruction. Simulation results demonstrate good reconstruction quality and speed.
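A minimal orthogonal matching pursuit sketch (assuming NumPy; the paper's sensing matrix and wavelet basis are not reproduced here, so a trivial identity matrix stands in):

```python
import numpy as np

def omp(A, y, k):
    # greedily pick the column of A most correlated with the residual,
    # then re-fit the coefficients on the chosen support by least squares
    support = []
    residual = y.astype(float)
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x
```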
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360P (2018) https://doi.org/10.1117/12.2514625
Radiogenomics is a promising recent field in cancer research that associates genomic data with radiographic imaging phenotypes. This study establishes a mapping between quantitative CT image characteristics and gene expression data, based on a publicly available dataset of 26 non-small cell lung cancer (NSCLC) patients. On one hand, a set of 66 features is extracted to quantify the tumor phenotype after segmentation. On the other hand, co-expressed genes are clustered and biologically annotated, with each cluster represented by a metagene, namely its first principal component. Statistical analysis is then performed to assess the relationship between CT imaging features and metagenes, and a predictive model is built to evaluate NSCLC radiogenomics performance. Experiments show 126 significant and reliable pairwise correlations, suggesting that CT-based features are minable and can reflect important biological information about NSCLC patients.
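The metagene representation is simply the first principal component of a cluster's expression matrix; a minimal numpy sketch (our own function name, computed via SVD on centered data):

```python
import numpy as np

def metagene(expr):
    """Represent a cluster of co-expressed genes by its metagene:
    the first principal-component score of the (samples x genes)
    expression matrix."""
    X = expr - expr.mean(axis=0)          # center each gene
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[0]                      # project samples onto PC1
```

Each patient then gets one metagene value per cluster, which can be correlated against each of the 66 imaging features.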
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360Q (2018) https://doi.org/10.1117/12.2514629
Because infrared thermal imaging systems exhibit low contrast and a small dynamic range, this paper proposes a real-time infrared image enhancement algorithm based on Contrast Limited Adaptive Histogram Equalization (CLAHE) and provides its implementation. The algorithm first divides the preprocessed image into several sub-regions of equal size and computes the histogram of each sub-region; the clipping threshold of each histogram is determined from the image gradient information, and the clipped pixels are redistributed evenly across the gray levels. Finally, bilinear interpolation is used to remove the blocking artifacts at sub-region boundaries. Experimental results show that, compared with traditional algorithms, this algorithm suppresses noise and highlights the edges and details of the image while meeting real-time requirements.
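The clip-and-redistribute step applied to each sub-region can be sketched as follows. This is a generic single-tile CLAHE step with a fixed clip limit, not the gradient-adaptive threshold of the paper, and it omits the final bilinear blending between tiles:

```python
import numpy as np

def clipped_equalize(tile, clip_limit=40, n_bins=256):
    """One CLAHE sub-region step: clip the tile histogram at clip_limit,
    redistribute the clipped excess evenly over all bins, then apply the
    equalization mapping built from the clipped histogram."""
    hist, _ = np.histogram(tile, bins=n_bins, range=(0, 256))
    excess = np.maximum(hist - clip_limit, 0).sum()
    hist = np.minimum(hist, clip_limit) + excess // n_bins
    cdf = np.cumsum(hist).astype(float)
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1) * 255
    return cdf[tile.astype(np.uint8)].astype(np.uint8)
```

Clipping bounds the slope of the mapping, which is what keeps noise in flat infrared backgrounds from being amplified.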
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360R (2018) https://doi.org/10.1117/12.2514674
To identify the status of traffic lights in urban traffic scenes effectively, this paper proposes a traffic light recognition method based on the HSV color space model. First, a median filter and a light-compensation algorithm are used to preprocess images of urban traffic scenes. Second, template matching and the Bhattacharyya coefficient are used to detect the traffic light region in the scene images. Finally, the status of the traffic lights is identified using the HSV color space model. Experimental results show that the proposed method outperforms its counterparts based on the RGB and YCbCr color space models: the recognition accuracies for red, green and yellow traffic lights are 96.67%, 95.0% and 88.67%, respectively.
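The final HSV classification step can be sketched with the standard-library `colorsys` module. The hue thresholds below are illustrative round numbers, not the paper's calibrated values:

```python
import colorsys

def light_state(r, g, b):
    """Classify the dominant color of a detected traffic-light region
    from the mean RGB of its lit area, using hue in HSV space."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    deg = h * 360.0
    if deg < 20 or deg >= 330:
        return "red"
    if 40 <= deg < 70:
        return "yellow"
    if 90 <= deg < 180:
        return "green"
    return "unknown"
```

Hue isolates the chromatic decision from brightness, which is why HSV tolerates illumination changes better than thresholding raw RGB.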
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360S (2018) https://doi.org/10.1117/12.2514676
Because correlation filter (CF) trackers cannot effectively handle occlusion and consequently lose the target, many tracking improvements focus on combining more powerful features to enrich the target's appearance model. However, this only helps discriminate the target from the background within a small neighborhood. In this paper, an improved context-aware correlation filtering framework is introduced, which integrates global context information into the correlation filter tracker to deal effectively with fast motion, occlusion and other issues. The average peak-to-correlation energy (APCE) criterion is used to judge the reliability of the tracking result, so that the threshold for model updating is adjusted adaptively. Extensive experiments demonstrate that this framework significantly improves the performance of many CF trackers with only a modest impact on frame rate.
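The APCE criterion has a compact closed form: the squared peak-to-minimum gap of the response map divided by the mean squared fluctuation. A minimal numpy sketch:

```python
import numpy as np

def apce(response):
    """Average peak-to-correlation energy of a CF response map:
    high when the map has one sharp peak (reliable tracking),
    low when the response is multi-modal or flat (e.g. occlusion)."""
    f_max, f_min = response.max(), response.min()
    return (f_max - f_min) ** 2 / np.mean((response - f_min) ** 2)
```

A tracker would skip (or down-weight) the model update whenever the current APCE falls well below its running average.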
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360T (2018) https://doi.org/10.1117/12.2514693
Machine vision is applied in laser marking to match and locate features of objects of interest. To make the features robust to image translation, rotation and scaling, and to improve point-positioning accuracy, a suitable algorithm is required; however, current algorithms are not accurate enough and are slow at feature matching and point location. This paper therefore combines the shape features of the image, defines its angle and scale features, derives a mixed angle-scale descriptor for capturing shape information, and normalizes the descriptor. The local curvature and the maximum polar radius are combined to determine the initial position of the key point on the contour, and the Euclidean distance is used to measure similarity. A Gaussian pyramid performs hierarchical matching to accelerate the algorithm so that it meets industrial real-time requirements. Finally, rigid transformation parameters are fitted to the matched key points to achieve precise positioning of the marking points. Matching and positioning experiments on high-resolution marking images show that the proposed algorithm is highly efficient: the matching accuracy is 94.73%, and the average time consumed is only 11.2% of that of the SURF matching algorithm. This shows that the algorithm can accurately and effectively match the workpiece and locate the marking point while maintaining real-time performance.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360U (2018) https://doi.org/10.1117/12.2514800
An improved non-local means (NLM) filter algorithm is proposed. The standard NLM algorithm considers only the Euclidean distance between pixel values when computing weights, neglecting the spatial relationship of pixels and the similarity of texture details between image blocks; this distorts the image structure after filtering and loses edge information. To solve this problem, the spatial positions of pixels are used to improve the Euclidean distance. At the same time, the structural similarity index measure (SSIM) quantifies the similarity of neighborhood image blocks to obtain a similarity weight, which re-weights the Euclidean distance between image blocks: the weight of blocks with low structural similarity is reduced, while the weight of blocks with high structural similarity is increased, preserving edge information. Experimental results show that the proposed algorithm effectively maintains image edges and details and is superior to the conventional NLM algorithm in terms of the PSNR and SSIM indicators.
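The SSIM-based re-weighting can be sketched as follows. The `ssim` below is the standard global formula on a whole patch, and the way it is combined with the photometric distance in `nlm_weight` is our own illustrative choice; the paper's exact combination rule may differ:

```python
import numpy as np

def ssim(x, y, c1=6.5025, c2=58.5225):
    """Global SSIM between two equally sized patches (no local window),
    using the standard luminance/contrast/structure formula."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def nlm_weight(p, q, h=10.0):
    """NLM patch weight: the usual Gaussian of the mean squared
    photometric distance, scaled down for structurally dissimilar
    patches via SSIM (clamped at zero)."""
    d2 = np.mean((p - q) ** 2)
    return np.exp(-d2 / h ** 2) * max(ssim(p, q), 0.0)
```

Structurally dissimilar patches thus contribute little to the average even when their raw intensities happen to be close, which is what preserves edges.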
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360V (2018) https://doi.org/10.1117/12.2514804
An infrared image enhancement algorithm combining the dark channel prior with adaptive contrast-limited enhancement is proposed. First, using the physical model of infrared atmospheric transmission together with the dark channel prior, the infrared image before atmospheric degradation is restored. The restored image is then decomposed into a base sub-image and a detail sub-image by guided filtering. The base sub-image is further adjusted with the contrast-limited adaptive histogram equalization (CLAHE) algorithm, while the detail sub-image is denoised by non-local means filtering and then processed with a gamma transform. Finally, the two enhanced sub-images are fused using the wavelet transform. Experimental results show that the algorithm effectively improves image contrast and highlights image details.
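The dark channel underlying the prior is the per-pixel channel minimum followed by a local minimum filter; a minimal (unoptimized) numpy sketch:

```python
import numpy as np

def dark_channel(img, patch=15):
    """Dark channel of an image (H x W x C, floats in [0, 1]):
    per-pixel channel minimum followed by a local minimum filter
    over a patch x patch neighborhood."""
    mins = img.min(axis=2)
    h, w = mins.shape
    r = patch // 2
    padded = np.pad(mins, r, mode="edge")
    out = np.empty_like(mins)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out
```

In haze removal the dark channel estimates the transmission map; the restored (dehazed) image is then handed to the guided-filter decomposition described above.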
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360W (2018) https://doi.org/10.1117/12.2514907
High dynamic range (HDR) images are rendered through base-detail separation. The representative detail-preserving algorithm iCAM06 tends to reduce the sharpness of dim-surround images because of the discrete calculation of the fast bilateral filter (FBF). This paper proposes a novel base-detail separation and detail compensation technique using the contrast sensitivity function (CSF) in the segmented frequency domain. Experimental results show that the proposed rendering method yields better sharpness than previous methods, as correlated with the human visual system.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360X (2018) https://doi.org/10.1117/12.2514908
The main obstacle of gait-based gender recognition, compared with other biometrics, is the influence of changes in clothing, carrying condition and walking surface on the pedestrian silhouette. In this paper, we propose a novel gait representation that reduces the influence of the clothing and carrying covariates: the clothing- and carrying-invariant area of the GEI (Inv-GEI). First, we compute the GEI and its synthetic common templates, which include the various features of females and males under different conditions. An improved gait entropy map is then used to automatically obtain a mask that captures features common to females and males. Using this mask, we remove the information in the GEI that is irrelevant to gait and obtain gait features invariant to condition changes, which benefits gender recognition. This paper evaluates Inv-GEI with the state-of-the-art deep convolutional model VGG-16 on the CASIA-B dataset. The experimental results show that the proposed method achieves excellent recognition rates under clothing and carrying conditions.
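A standard gait entropy map treats each GEI pixel value as the probability that the pixel belongs to the silhouette and computes its per-pixel Shannon entropy; a minimal sketch (the paper's "improved" variant presumably modifies this baseline):

```python
import numpy as np

def gait_entropy_map(gei):
    """Per-pixel Shannon entropy of a gait energy image (values in
    [0, 1]). Dynamic regions (p near 0.5, e.g. swinging limbs) score
    high; static regions (p near 0 or 1, e.g. torso) score near zero."""
    p = np.clip(gei, 1e-12, 1 - 1e-12)   # avoid log(0)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)
```

Thresholding this map yields a mask of dynamic areas, which are less affected by clothing and carried objects than the body contour.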
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360Y (2018) https://doi.org/10.1117/12.2515278
Combined with deep learning technologies, fashion landmark detection is an efficient method for visual fashion analysis. Existing works mainly focus on eliminating the effect of scale and background, and require prior knowledge of body structure. In this paper, we propose a fashion pose machine based on the landmark localization method used for human pose estimation. To increase detection accuracy, we use a convolutional neural network to learn the spatial structure among fashion landmarks in a sequential prediction framework, which eliminates the effect of clothing placement and model posture on the fashion landmarks in the image. Our method requires no prior knowledge of human body structure to learn the dependencies between different landmarks. We evaluated our model on the FashionAI dataset, and the results show that it outperforms the state-of-the-art alternative by 25%.
Xianyu Chen, Mingru Jin, Yang Xu, Wenfeng Shen, Feng Qiu
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108360Z (2018) https://doi.org/10.1117/12.2326970
After years of development, video tracking algorithms have solved the problem of complex scenes to some extent. However, traditional video tracking algorithms rely on hand-crafted features, and most target only specific objects and scenarios; they generalize poorly and are not robust enough for intelligent monitoring. Based on a study of video tracking technology, deep learning principles and their applications, the performance of each algorithm under different scenarios is analyzed. Building on this, a video object tracking algorithm combining the deep network model SSD with Camshift is proposed. This method integrates deep learning with a mainstream target tracking framework, makes full use of SSD's powerful feature expression capability, and shows good tracking performance in complex scenes with occlusion, deformation and lighting changes in video sequences, with good robustness and accuracy.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083610 (2018) https://doi.org/10.1117/12.2504317
Deep learning has been widely used in visual tracking because of the strong feature extraction ability of convolutional neural networks (CNNs). Many trackers pre-train a CNN and fine-tune it during tracking, which improves representation ability learned from an off-line database and adapts to appearance variations of the object of interest. However, since target information is limited, the network is likely to overfit to a single target state. In this paper, an update strategy composed of two modules is proposed. First, we fine-tune the pre-trained CNN using active learning, which iteratively emphasizes the most discriminative data. Second, artificial convolutional features generated from an empirical distribution are employed to train the fully connected layers, which makes up for the deficiency of training examples. Experiments on the VOT2016 benchmark show that our algorithm outperforms many state-of-the-art trackers.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083611 (2018) https://doi.org/10.1117/12.2504630
To serve users in different bandwidth environments, the JCT-VC proposed the scalable extension of HEVC (SHVC), which provides scalability along the temporal, spatial and quality dimensions. It encodes one video into one base layer (BL) and several enhancement layer (EL) bitstreams to support scalable video consumption. To speed up coding, SHVC can additionally exploit inter-layer coding mode correlations, in addition to the spatial, temporal and inter-depth coding mode correlations exploited by HEVC. In this research, we investigate coding mode correlations in SHVC code-streams and identify extensive and general inter-layer mode correlation rules. Based on these rules, we propose two fast coding methods: (1) to encode an EL CU quickly, the co-located BL CU depth is consulted to reduce the number of coding depth tests, where the high-speedup approach adopts general rules at some quality cost and the low-speedup approach adopts extensive rules that preserve quality; (2) to encode EL PUs quickly, the co-located BL PU modes together with inter-layer mode correlations and classifications specify which PU modes must be tested. Experiments show that the proposed fast SHVC methods, combining the fast CU and fast PU coding procedures, reduce processing time by 76.71% and 62.7% with the high- and low-speedup approaches, respectively.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083612 (2018) https://doi.org/10.1117/12.2506632
Recently, human action recognition in videos has attracted much attention. This paper proposes a framework for human action recognition based on Procrustes analysis and Fisher vector encoding. First, we extract a pose-based feature from the silhouette image by employing Procrustes analysis and locality preserving projection; it preserves the discriminative shape information and local manifold structure of the human pose and is invariant to translation, rotation and scaling. After the pose feature is extracted, a recognition framework based on Fisher vector encoding and a multi-class support vector machine classifies the human action. Experimental results on benchmarks demonstrate the effectiveness of the proposed method.
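The translation, rotation and scale invariance comes from Procrustes alignment of landmark sets; a minimal numpy sketch of the standard procedure (standardize both shapes, then solve the orthogonal Procrustes problem via SVD):

```python
import numpy as np

def procrustes_align(X, Y):
    """Align landmark set Y (n x 2) to X by the optimal translation,
    scale and rotation; return the residual disparity. Zero disparity
    means the two shapes differ only by a similarity transform."""
    def standardize(A):
        A = A - A.mean(axis=0)           # remove translation
        return A / np.linalg.norm(A)     # remove scale
    X, Y = standardize(X), standardize(Y)
    U, s, Vt = np.linalg.svd(X.T @ Y)
    R = (U @ Vt).T                       # optimal rotation for Y
    Y_aligned = s.sum() * Y @ R
    return np.sum((X - Y_aligned) ** 2)
```

In the pose feature, each silhouette's contour points would be Procrustes-standardized before the locality preserving projection, so the feature depends only on shape.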
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083613 (2018) https://doi.org/10.1117/12.2513848
Text detection in football match scenes is an important step in text recognition. Because of the complex background of football scenes, the arbitrary orientation of targets, the varied aspect ratios of the text and the presence of small targets, text detection in football scenes is a challenging problem. The text detection network TextBoxes is improved to produce a text detector for football scenes, named DTB Net, which can detect text with different aspect ratios as well as small text targets. Experimental comparison shows that DTB Net achieves higher precision and recall in football match text detection, laying a foundation for recognizing text in football scenes.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083614 (2018) https://doi.org/10.1117/12.2513849
Person re-identification is an important task in video surveillance. Large variations in pose, illumination and occlusion can change a person's appearance, which makes person re-identification a challenging problem; developing robust feature descriptors benefits person matching. In this paper, we propose a new multi-feature fusion person re-identification method that combines hand-crafted and deep features. Specifically, we first extract hand-crafted features from both local regions and the global region of each image, so that local similarities cooperate with global similarity to overcome the problems caused by local occlusion. We then train a CNN model on three fused datasets to obtain the deep feature. Finally, we optimize and integrate the re-identification results of the hand-crafted and deep features by selective weighted combination. Experiments on three person re-identification benchmarks, VIPeR, CUHK01 and CUHK03, show that our method significantly outperforms state-of-the-art methods.
Zhigao Cui, Yanzhao Su, Zhenqiang Bao, Jinming Zhang
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083615 (2018) https://doi.org/10.1117/12.2514211
Traditional shadow detection research is mainly based on a stationary camera. Because a dual-PTZ-camera system can obtain both multi-view and multi-resolution information, it has received increasing attention in real surveillance applications; however, few works on shadow detection and removal with such a system are found in the literature. In this paper, we propose a novel framework to automatically detect and remove shadow regions in real surveillance scenes captured by a dual-PTZ-camera system. Our method consists of two stages. (1) In the first stage, initial shadow regions are detected by comparing pixel gray-level similarities between the two camera images after a homography transformation; we demonstrate that corresponding shadow points on a reference plane are related by a time-variant homography constraint as the camera parameters change. (2) In the second stage, shadow region detection is treated as a superpixel classification problem: the shadow candidates predicted in the first stage are fed to a statistical model based on multi-feature fusion. We demonstrate the effectiveness of the proposed shadow detection method by incorporating it into a dual-PTZ-camera tracking system.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083616 (2018) https://doi.org/10.1117/12.2514651
Automatically generating rich natural language descriptions for open-domain videos is among the most challenging tasks in computer vision, natural language processing and machine learning. Within the general encoder-decoder approach, we propose a bidirectional long short-term memory network with spatial-temporal attention over multiple features of objects, activities and scenes; it learns valuable and complementary high-level visual representations and dynamically focuses on the most important context information across frames within different subsets of a video. Experimental results show that our proposed methods achieve performance competitive with or better than the state of the art on the MSVD video dataset.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083617 (2018) https://doi.org/10.1117/12.2500381
Fusing near-infrared and visible imagery is an active topic in robust face recognition. Local binary pattern (LBP) descriptors and sparse representation based classification (SRC) are two significant techniques in face recognition. In this paper, near-infrared and visible face fusion recognition based on LBP and extended SRC is proposed for the single-sample problem. First, local features are extracted by the LBP descriptor to represent infrared and visible faces. Second, extended SRC (ESRC) is applied to the single-sample problem. Finally, to obtain a robust and time-efficient fusion model for unconstrained face recognition with a single sample, the infrared and visible feature fusion problem is resolved by error-level fusion based on ESRC. Experiments on the HITSZ LAB2 database show that the proposed method extracts the complementary features of near-infrared and visible-light images and improves the robustness of unconstrained face recognition in the single-sample situation.
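The basic 3x3 LBP operator used for the local features can be sketched in numpy: each interior pixel is encoded by thresholding its eight neighbours against it and packing the bits into a byte (bit ordering conventions vary; the one below is ours):

```python
import numpy as np

def lbp_3x3(img):
    """Basic 3x3 local binary pattern: for each interior pixel, set one
    bit per neighbour that is >= the center, clockwise from top-left."""
    img = img.astype(float)
    c = img[1:-1, 1:-1]
    # neighbour window offsets, clockwise from top-left
    shifts = [(0, 0), (0, 1), (0, 2), (1, 2),
              (2, 2), (2, 1), (2, 0), (1, 0)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        n = img[dy:dy + c.shape[0], dx:dx + c.shape[1]]
        code |= (n >= c).astype(np.uint8) << bit
    return code
```

The face descriptor is then typically a concatenation of LBP-code histograms over a grid of face sub-regions, computed identically for the NIR and visible images.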
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083618 (2018) https://doi.org/10.1117/12.2500635
Discriminative correlation filter based tracking algorithms exploiting conventional hand-crafted features have achieved impressive results in both accuracy and robustness. In this paper, to achieve efficient tracking performance, we propose a novel visual tracking algorithm based on a complementary ensemble model with multiple features. To improve tracking results and prevent target drift, we introduce an effective fusion method that exploits relative entropy to coalesce all basic response maps into an optimal response. Furthermore, we suggest a simple but efficient update strategy to boost tracking performance. Comprehensive evaluations on two tracking benchmarks demonstrate that our method is competitive with numerous state-of-the-art trackers while achieving a faster speed.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083619 (2018) https://doi.org/10.1117/12.2513845
Aspect extraction plays an important role in aspect-level sentiment analysis. Most existing approaches focus on explicit aspect extraction and either rely heavily on syntactic rules or use neural networks without linguistic knowledge. This paper proposes a linguistic attention-based model (LABM) that performs explicit and implicit aspect extraction together. The linguistic attention mechanism incorporates linguistic knowledge, which has proven very useful in aspect extraction. We also propose a novel unsupervised training approach, distributed aspect learning (DAL); its core idea is that the aspect vector should align closely with the neural word embeddings of nouns that are tightly associated with valid aspect indicators. Experimental results on six datasets demonstrate that our model is explainable and outperforms baseline models on the evaluation tasks.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361A (2018) https://doi.org/10.1117/12.2513868
In this paper, the spatio-temporal graph of Structural-RNN [6] is developed and applied to the action recognition task. We propose a Structural-Attentioned LSTM network by adding joints, changing the specific connection mode in the original spatio-temporal graph, and introducing an attention mechanism that enables the network to automatically select the edges that best represent an action. We conduct multiple experiments on the public JHMDB dataset [10] to verify the validity of our model, and achieve good results even when only limited features are used.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361B (2018) https://doi.org/10.1117/12.2513869
Traffic sign detection is an important part of driverless vehicles, but high-accuracy detection algorithms are difficult to run in real time. In this paper, we propose a detection model that eases this problem effectively. Our model combines three key insights with YOLOv2 to improve the mean average precision (mAP): (1) Focal Loss is used to let the model focus on a sparse set of indistinguishable samples; (2) Inception modules are used to increase the depth and nonlinearity of the network; and (3) ResNet-style cross-layer connections are used to ease the difficulty of training deep convolutional neural networks. On the German Traffic Sign Detection Benchmark (GTSDB), our model achieves high accuracy and real-time traffic sign detection at the same time: the recall is 94.46%, the precision is 96.60%, the AUC is 99.75%, the mAP is 88.23%, and the average time for processing an image is 0.017 s. The results indicate that the modified detection model is competitive with others.
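Focal loss itself is compact enough to sketch. Below is a minimal NumPy version of the binary form, with illustrative α and γ defaults (the paper's exact settings are not stated here):

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights well-classified examples so
    training focuses on hard, indistinguishable samples."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pt = np.where(y == 1, p, 1 - p)          # probability of the true class
    at = np.where(y == 1, alpha, 1 - alpha)  # class-balance factor
    return -at * (1 - pt) ** gamma * np.log(pt)

# A confident correct prediction contributes far less than a hard one.
easy = focal_loss(np.array([0.95]), np.array([1]))[0]
hard = focal_loss(np.array([0.30]), np.array([1]))[0]
```

The `(1 - pt) ** gamma` factor is what suppresses the loss of easy examples; with γ = 0 it reduces to ordinary weighted cross-entropy.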
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361C (2018) https://doi.org/10.1117/12.2514677
In the land battlefield environment, the tracking of personnel targets is mainly affected by complex conditions such as smog, rain, shadows and partial occlusion by woods. These elements make kernelized correlation filter (KCF) tracking based on visible light unsatisfactory, because the brightness, color and rich texture information carried by visible-light imaging are polluted. The infrared imaging system, owing to its sensitivity to heat sources, can perceive targets in the dark and depends little on the surrounding environment. However, limited by its own characteristics, a single infrared imaging system loses visible-light information such as light intensity, texture and color, and its image resolution is low. Since the characteristics of visible-light and infrared imaging are complementary, it is reasonable to fuse the integral channel features (ICF) of the infrared gray image into the HOG features of the visible-light image, and to adjust the model update rate according to the degree of occlusion of the target in the infrared image, achieving a more robust tracking effect.
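The occlusion-gated model update can be sketched as a simple interpolation; the occlusion score is assumed to come from the infrared channel, and the base learning rate below is an illustrative KCF-style default, not the paper's value:

```python
import numpy as np

def update_model(model, observation, occlusion, base_lr=0.02):
    """Linear appearance-model interpolation as used by KCF-style
    trackers. The learning rate shrinks as the occlusion estimate
    (0..1) grows, so a heavily occluded target does not corrupt the
    model. The occlusion measure itself is a hypothetical input here,
    assumed to be derived from the infrared image."""
    lr = base_lr * (1.0 - occlusion)
    return (1.0 - lr) * model + lr * observation
```

With `occlusion=1.0` the model is frozen entirely, which is the desired behavior when the target disappears behind the woods.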
Lei Yuan, Kuangrong Hao, Xuesong Tang, Xin Cai, Yongsheng Ding
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361D (2018) https://doi.org/10.1117/12.2514957
A multi-scale binocular-channels convolutional neural network (MBCNN) is proposed to solve complex scene classification, achieving good accuracy. We use a physiological phenomenon called visual crowding to explain a deficiency of the CNN framework and to justify the effectiveness of the double-flow model. With the help of a novel bilateral-channels network based on global information and locally significant information, together with our multi-scale feature integration method, the proposed MBCNN reduces the identification obstacle caused by visual crowding in the V1 (information input) and V4 (high-level information) areas separately. Experimental results verify that the proposed network performs better on the MIT Indoor 67 and Scene 15 classification datasets.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361E (2018) https://doi.org/10.1117/12.2515147
Detection of premature ventricular contraction (PVC) in children is an important step in the diagnosis of arrhythmia. It not only requires professional knowledge but also occupies clinicians with a large amount of repetitive work. Deep learning based computer models have recently been applied in the clinical field for disease diagnosis. In this study, we built a Long Short-Term Memory (LSTM) recurrent neural network (RNN) model to detect PVC from children's electrocardiograms (ECG). 1019 children with PVC and 1198 without were selected for this study. Lead II of the 12-lead ECG signal of each child was used for diagnosis. In total, 220 studies were selected randomly as the validation set, 222 as the testing set, and the rest as the training set. The best LSTM model achieved a testing F1 score of 0.94 on the PVC classification task. With 10-fold cross-validation, the area under the receiver operating characteristic curve (AUC) reached 0.97±0.01. In conclusion, this is a meaningful step towards large-scale and efficient PVC diagnosis.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361F (2018) https://doi.org/10.1117/12.2326991
With the application of deep learning to medical image analysis, traditional computer-aided detection systems have become less and less competitive for nodule detection. The feature extraction step of nodule detection is crucial to the final result, and this is equally true in the automatic classifier. Pre-trained convolutional neural networks have had a certain degree of success in deep feature extraction. In this paper, we used U-Net, a fully convolutional network, instead of a traditional convolutional neural network to extract features. We used ResNet to construct a classifier for the extracted features and AdaBoost for ensemble learning, ultimately achieving a best accuracy of 72.6%. Compared with traditional classifiers such as SVM and BPNN, the proposed method has advantages in feature extraction and detection speed.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361G (2018) https://doi.org/10.1117/12.2327014
In the field of agriculture, satellite imagery has revolutionized remote sensing for food production, food security and supply-chain management. However, existing solutions are too limited to solve the problems of big remote-sensing AgriData completely, mainly because they are not designed to permit collaboration among stakeholders, to support sharing of data and farming operations in general, or to create useful knowledge bases. Access to real-time AgriData, real-time forecasting and tracking of physical items will significantly change farm management and operations and, in combination with IoT development, will lead to autonomous farm operation. Our focus is to identify, explore and exploit the added value that big remote-sensing AgriData provides in a food-security context. In this sense, the main applications related to cropland mapping are also reviewed and discussed, concentrating on their suitability for mapping crop types on small-scale farms.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361H (2018) https://doi.org/10.1117/12.2504282
Limited annotated training data is a challenging problem in action unit (AU) detection. In particular, for micro-expression AU detection, more training data can help improve detection performance. For data augmentation, this paper employs generative adversarial networks (GAN), which can generate high-quality pictures as a supplement to our limited database. In addition, we propose a simple and effective model for facial micro-expression AU detection based on 3D-CNNs and a Gated Recurrent Unit (GRU) network. The network is composed of six layers: three convolutional layers, each followed by a pooling layer, and a single-layer GRU with 15 hidden nodes. For the task of recognizing AUs, we trained the network on the DISFA dataset with the GAN applied, so as to take full advantage of AU-tagged databases and make the network converge faster and more easily. We show that our model, together with the method of supplementing the labeled-AU database, achieves competitive performance compared with state-of-the-art deep learning methods and with traditional data-expansion methods such as rotating the original images or adding noise to them.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361I (2018) https://doi.org/10.1117/12.2505997
Automated assessment of Chinese subjective questions is a research direction crossing linguistics, natural language processing (NLP) and related disciplines. In this paper, we focus on correcting political subjective questions and, based on an analysis of the manual scoring process, create a novel automatic scoring framework. It mainly includes two parts. First, we represent sentence semantics by an unsupervised model that takes a weighted average of the word vectors. Then, we propose a correction algorithm that combines keyword matching with semantic similarity computation. Comparison between the results produced by our framework and those of teachers demonstrates the reasonableness of the model.
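The weighted-average sentence representation can be sketched in a few lines. The `a / (a + freq)` weighting is a common smooth-inverse-frequency choice and an assumption here — the abstract only says "weighted average" — and the two-dimensional embeddings are toy placeholders:

```python
import numpy as np

def sentence_vector(words, vectors, freq, a=1e-3):
    """Weighted average of word vectors; rarer words get larger
    weights via a / (a + freq). This SIF-style weighting is an
    assumption - the paper states only 'weighted average'."""
    rows = [a / (a + freq.get(w, 1e-4)) * vectors[w] for w in words]
    return np.mean(rows, axis=0)

def cosine(u, v):
    """Semantic similarity between two sentence vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy two-dimensional 'embeddings' for illustration only.
vectors = {"tax": np.array([1.0, 0.0]),
           "policy": np.array([0.9, 0.1]),
           "dog": np.array([0.0, 1.0])}
freq = {"tax": 0.01, "policy": 0.02, "dog": 0.05}
```

A student answer mentioning "policy" then scores much closer to a reference answer about "tax" than an answer about "dog" does, which is the signal the correction algorithm combines with keyword matching.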
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361J (2018) https://doi.org/10.1117/12.2511679
Natural language processing is widely used in real life, and natural language understanding is an important part of making a machine understand a language. This paper proposes an automated scoring system for short-answer subjective tests in the Thai language. A finite state machine and word2vec are used to create the scoring machine. Many challenging issues must be solved, such as peculiarities of the Thai language, the small number of words appearing in an answer, and the flexibility of grading. The proposed method is able to create an automated scoring machine as soon as an exam question exists; unlike many previous works, there is no need to wait for students' answers before building a model. The experimental results show that the proposed method gives scores close to those of human graders.
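The finite-state part of such a scorer can be sketched as a machine that advances one state per required keyword found in order; the keyword sets standing in for word2vec neighborhoods are hypothetical examples, not the paper's data:

```python
def fsm_score(answer_tokens, keyword_sequence):
    """Toy finite-state scorer: the machine advances one state each
    time the next required keyword (or a listed synonym) appears in
    the answer, and the score is the fraction of states reached.
    Each keyword set is a hypothetical stand-in for a word2vec
    nearest-neighbour cluster."""
    state = 0
    for tok in answer_tokens:
        if state < len(keyword_sequence) and tok in keyword_sequence[state]:
            state += 1
    return state / len(keyword_sequence)
```

This shape is what lets the machine be built from the question alone: the states come from the model answer, so no student responses are needed before grading can start.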
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361K (2018) https://doi.org/10.1117/12.2513857
System logs record the daily status of operating systems, application software, firewalls, etc. Analyzing system logs can help to prevent and eliminate information security events in real time. In this paper, we propose to analyze system logs for anomalous event detection based on natural language processing. First, we use the doc2vec algorithm to construct sentence vectors; then we apply several state-of-the-art classification algorithms to the sentence vectors for anomaly detection. System logs generated by the Thunderbird supercomputer are adopted to verify the proposed method. The results show that doc2vec combined with machine learning classification algorithms can not only effectively extract the semantic information of the logs, but also deliver excellent anomaly detection.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361L (2018) https://doi.org/10.1117/12.2513939
Inverted files emerged in computer science as information-indexing tools for large-scale search applications. After decades of computing, only one general method has been found that can deal quickly and efficiently with vast amounts of data: indexing, which is at the heart of both Google search and large-scale DNA processing. However, indexing-based pattern recognition is virtually non-existent. This paper provides a mathematical framework that unifies search and pattern recognition algorithms.
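The inverted-file idea the paper builds on fits in a few lines of Python — a map from each term to the set of documents containing it, so a query touches only the postings lists it needs:

```python
from collections import defaultdict

def build_index(docs):
    """Inverted file: maps each term to the set of document ids that
    contain it. docs is {doc_id: text}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def query(index, terms):
    """Conjunctive query: intersect the postings lists of all terms."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()
```

Query cost depends on the postings lists, not the collection size, which is why the same structure scales from web search to DNA processing.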
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361M (2018) https://doi.org/10.1117/12.2513979
In this paper, mixed data processing methods for Raman spectra, such as background-fluorescence removal, despiking and baseline correction, are studied. In addition, a multi-parameter extraction method is used for data processing. The results show that the methods in this study perform well.
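Despiking in particular has a standard sketch: compare each point with its local median and replace outliers. The window size and threshold below are illustrative defaults, not the paper's settings:

```python
import numpy as np

def despike(spectrum, window=5, threshold=5.0):
    """Median-filter despiking: points far above the local median
    (e.g. cosmic-ray spikes) are replaced by it; the rest of the
    spectrum passes through unchanged."""
    out = spectrum.astype(float).copy()
    half = window // 2
    for i in range(len(spectrum)):
        lo, hi = max(0, i - half), min(len(spectrum), i + half + 1)
        med = np.median(spectrum[lo:hi])
        if abs(spectrum[i] - med) > threshold:
            out[i] = med
    return out
```

The median is robust to the spike itself, which is why this works where a mean filter would smear the spike into its neighbors.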
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361N (2018) https://doi.org/10.1117/12.2514046
Existing target detection algorithms such as Faster R-CNN and the single shot multibox detector (SSD) all suffer from low detection accuracy on small targets. This paper proposes a general and effective target detection algorithm whose detection accuracy is greatly improved for smaller targets. The algorithm is divided into two parts. In the first part, during feature extraction, the feature map produced by the basic feature extraction network is deconvolved and merged with the previous layer's feature map to generate multi-scale feature maps with rich semantics and high resolution, which are then used to generate proposals. In the second part, the generated proposals are sent to the Faster R-CNN network for classification and detection. Experiments show that this algorithm improves not only the recall of the proposals but also the accuracy of target detection, especially for small targets, providing a new idea for small target detection.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361O (2018) https://doi.org/10.1117/12.2514204
Since the labels of training samples are attached to bags rather than instances, multiple instance learning (MIL) is a special ambiguous learning paradigm. In this paper, we propose a novel method, BS_ELM, that combines bag space (BS) construction with an extreme learning machine (ELM) for MIL; it can capture the bag structure while retaining the efficiency of ELM. First, sparse subspace clustering is performed to obtain cluster centers and construct a new bag space. Then an ELM is used to classify bags in the new space. Experiments on data sets demonstrate the utility and efficiency of the proposed approach compared with other state-of-the-art MIL algorithms.
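The ELM component itself is small enough to sketch: random fixed hidden weights and a single pseudo-inverse solve for the output weights. This is the generic ELM, not the paper's full BS_ELM pipeline:

```python
import numpy as np

def elm_train(X, Y, n_hidden=50, rng=None):
    """Extreme learning machine: random input weights, sigmoid hidden
    layer, output weights solved in closed form with the
    Moore-Penrose pseudo-inverse - no iterative training."""
    rng = rng or np.random.default_rng(0)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # hidden activations
    beta = np.linalg.pinv(H) @ Y             # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because only `beta` is solved for, training is a single linear-algebra call — the efficiency the abstract is trading on.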
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361P (2018) https://doi.org/10.1117/12.2514419
The core of overall fuel cell vehicle control is the energy distribution strategy. This study focuses on fuel cell buses and aims to extend fuel cell lifespan, guarantee battery lifespan and enhance the vehicle's overall performance. We propose a fuzzy method for energy distribution with state-of-charge feedback and design a fuel cell bus energy distribution model based on Takagi–Sugeno fuzzy control. Secondary development based on these results was carried out on the ADVISOR simulation platform. Comparison of simulation results shows that the proposed control strategy not only satisfies the vehicle's dynamic performance requirements, but also enhances the fuel cell lifespan and guarantees the battery lifespan, while remaining relatively economically competitive.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361Q (2018) https://doi.org/10.1117/12.2514561
Based on an analysis of rotor faults and their corresponding axial trajectories, the axial trajectory is simulated using MATLAB. For the automatic identification of the axis locus of rotating machinery, a characterization of the axial trajectory based on Hu invariant moments is studied. Using the frequency characteristics of the vibration signal, the search strategy of the matching algorithm is improved and a fast matching method with variable step size is proposed. The method consists of two parts, rough matching and fine matching: rough matching ensures the speed of the algorithm and quick classification of the running state, while fine matching ensures the accuracy of the matching result. The results show that the axis-locus features extracted with Hu invariant moments yield a good recognition rate for rotating machinery and can provide a reference for automatic rotor fault diagnosis.
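The Hu-moment feature can be computed directly from image moments. A sketch of the first two invariants follows; these are unchanged by translating the trajectory image, which is what makes them usable as shape descriptors for an axis locus:

```python
import numpy as np

def hu_first_two(img):
    """First two Hu invariant moments of a 2-D intensity image,
    computed from normalised central moments. Invariant to
    translation and scale (the remaining five invariants follow the
    same pattern with higher-order moments)."""
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00

    def mu(p, q):                         # central moment
        return ((x - xc) ** p * (y - yc) ** q * img).sum()

    def eta(p, q):                        # normalised central moment
        return mu(p, q) / m00 ** (1 + (p + q) / 2)

    h1 = eta(2, 0) + eta(0, 2)
    h2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return h1, h2
```

Because the moments are centered on the centroid, shifting the trajectory anywhere in the frame leaves the feature values untouched.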
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361R (2018) https://doi.org/10.1117/12.2514563
In this paper, we apply Extreme Gradient Boosting (XGBoost), widely used in many areas, to human motion classification. We compare the performance of XGBoost with that of other machine learning methods such as the Support Vector Machine (SVM), Naive Bayes (NB) and k-Nearest Neighbors (k-NN), and make a comprehensive comparison of XGBoost and Random Forest (RF). The experimental results reveal that XGBoost achieves better results in activity classification based on inertial sensors.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361S (2018) https://doi.org/10.1117/12.2514808
Traditional face recognition based on machine learning often adopts batch learning, but in practical applications the training data of a face recognition system cannot be obtained all at once; it arrives sample by sample over time. When new training samples appear, the whole system must be retrained under the batch learning method. To solve this problem, an incremental learning algorithm, the online sequential extreme learning machine (OS-ELM), is applied to face recognition. The algorithm can learn the data not only one sample at a time but also one batch at a time. Experimental results show that this algorithm offers high speed, a high recognition rate and simple parameter selection in face recognition, making it a good choice for the online updating of face recognition systems.
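The sequential update at the heart of OS-ELM is a recursive least-squares step and can be sketched in NumPy. After folding in a new chunk, the output weights match what batch retraining on all the data would give, without revisiting the old samples:

```python
import numpy as np

def hidden(X, W, b):
    """Sigmoid hidden layer with fixed random weights."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def oselm_init(X, Y, W, b):
    """Initial batch: solve the output weights and keep P, the
    inverse of H'H, for later recursive updates."""
    H = hidden(X, W, b)
    P = np.linalg.inv(H.T @ H)
    beta = P @ H.T @ Y
    return beta, P

def oselm_update(beta, P, X_new, Y_new, W, b):
    """Fold a new chunk into beta without touching the old data
    (the standard recursive least-squares / OS-ELM update)."""
    H = hidden(X_new, W, b)
    K = P @ H.T @ np.linalg.inv(np.eye(len(X_new)) + H @ P @ H.T)
    P = P - K @ H @ P
    beta = beta + P @ H.T @ (Y_new - H @ beta)
    return beta, P
```

With a chunk size of one, this degenerates to the sample-by-sample mode the abstract describes; the per-update cost depends only on the chunk and hidden-layer sizes, never on how many faces have already been learned.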
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361T (2018) https://doi.org/10.1117/12.2514827
When the traditional extreme learning machine deals with unbalanced data sets, its classification of the minority class is not ideal. A weighted extreme learning machine based on KFCM is proposed for this problem, in which different penalty factors are assigned according to the proportion of samples in each category. At the same time, considering the impact of outliers, KFCM clustering yields the degree of membership of each sample to its class, and this membership is used to apply quadratic weighting to the penalty factors of the extreme learning machine. Because computing the generalized inverse of the weighted extreme learning machine is expensive, a Cholesky decomposition method is proposed. Simulation results on UCI standard datasets show that the proposed algorithm not only effectively improves the classification accuracy of minority samples, but also achieves optimal performance on the F-measure and G-means indexes, while computing much faster than the ordinary extreme learning machine.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361U (2018) https://doi.org/10.1117/12.2515343
The development of medical decision support systems is a socially and economically significant task, and one of the most important and acute directions in this field of research is decision-support systems for cardiology. This report considers the main requirements for a recognition system based on artificial intelligence methods and used to assess the functional state of the cardiovascular system (CVS). A description of the general scheme of the developed decision support system, based on the identification and classification of CVS states, is given. Since various types of neural networks and other machine-learning classifiers are often used for cardiovascular state identification, the main attention here is paid to the use of convolutional and other deep neural networks for the recognition of cardiological images for diagnostic purposes.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361V (2018) https://doi.org/10.1117/12.2326658
As an important research topic in human-computer interaction, the relationship between the electroencephalogram (EEG) and emotion recognition has attracted wide attention; however, the accuracy of emotion recognition is partly still low. To improve the accuracy rate, the original EEG signal is filtered into five bands (δ, θ, α, β, γ), and the Differential Entropy (DE), Power Spectral Density (PSD), Wavelet Entropy (WE) and Approximate Entropy (ApEn) features are extracted. Finally, we select features and use the support vector machine (SVM), k-NN, Naive Bayes and neural network classifiers for learning. Because the large amount of data generated by 62 channels is inconvenient to process, we train an SVM on the DE feature, obtaining higher accuracy with four different electrode placements. Through this study we found that the overall accuracy is generally higher than the accuracy of any single frequency band, that the high-frequency bands play a more important role in emotional activity than the low-frequency bands, and that smaller sets of bands and channels can still achieve high accuracy.
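The DE feature has a closed form under the usual Gaussian assumption for a band-limited EEG segment, which is what makes it cheap to compute per band and per channel:

```python
import numpy as np

def differential_entropy(band_signal):
    """Differential entropy of a band-filtered segment under a
    Gaussian assumption: 0.5 * log(2 * pi * e * variance). Larger
    band power gives larger entropy, so DE is essentially a
    log-power feature."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(band_signal))
```

Computing this for each of the five bands on each channel yields the DE feature vector the SVM is trained on.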
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361W (2018) https://doi.org/10.1117/12.2326911
In this paper, the consensus problem of mixed-order multi-agent systems composed of first-order and second-order agents subject to input and velocity constraints is investigated. A distributed bounded consensus control law depending on information exchange with adjacent agents is constructed, and the range of the communication gain is calculated using Lyapunov stability theory and the LaSalle invariant set principle. Consensus is achieved under the control laws if the communication topology graph is connected and undirected. Finally, the effectiveness of the proposed theorem is verified by numerical simulation.
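The unconstrained first-order part of such a protocol is easy to simulate. The Laplacian-based law below is the standard textbook form, not the paper's constrained mixed-order version:

```python
import numpy as np

def consensus_step(x, A, gain=0.1):
    """One step of the first-order consensus protocol
    x_i += gain * sum_j a_ij (x_j - x_i) on an undirected graph with
    adjacency matrix A. States converge to the average when the
    graph is connected and the gain is small enough."""
    L = np.diag(A.sum(axis=1)) - A      # graph Laplacian
    return x - gain * L @ x

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)  # complete graph on 3 agents
x = np.array([1.0, 5.0, 9.0])
for _ in range(200):
    x = consensus_step(x, A)
```

Because the Laplacian of an undirected graph has zero row sums, the average of the states is preserved at every step, so the agents settle on the mean of their initial values.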
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361X (2018) https://doi.org/10.1117/12.2326940
Attention-based bidirectional long short-term memory (BLSTM) networks have attracted increasing interest and are widely used in natural language processing tasks. Motivated by the performance of the attention mechanism, various attentive models have been proposed to improve the effectiveness of question answering. However, little research has focused on the impact of positional information on question answering, even though it has proved effective in information retrieval. In this paper, we assume that if a word appears in both the question sentence and the answer sentence, the words close to it should receive more attention, since they are more likely to contain valuable information for the question. Moreover, little research has incorporated part-of-speech information into question answering. We argue that words other than nouns, verbs and pronouns tend to carry less useful information, so their positional impact can be neglected. Based on both assumptions, we propose a part-of-speech and position attention based bidirectional long short-term memory network for question answering, abbreviated as DPOS-ATT-BLSTM, which cooperates with the traditional attention mechanism to obtain attentive answer representations. We experiment on a Chinese medical dataset collected from http://www.xywy.com/ and http://www.haodf.com/, and compare against methods based on the traditional attention mechanism. The experimental results demonstrate the good performance and efficiency of our proposed model.
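The positional assumption can be sketched as a standalone weighting function. The 1/(1 + distance) decay and the uniform fallback when question and answer share no word are illustrative choices, and the part-of-speech filtering step is omitted here:

```python
def position_weights(question, answer):
    """Position-attention sketch: answer words near a word that also appears
    in the question get larger weights, w_k = 1 / (1 + d_k), where d_k is the
    distance to the nearest overlapping word. Uniform weights if no overlap."""
    qset = set(question)
    overlap = [k for k, w in enumerate(answer) if w in qset]
    if not overlap:
        return [1.0 / len(answer)] * len(answer)
    return [1.0 / (1 + min(abs(k - p) for p in overlap))
            for k in range(len(answer))]
```

In the full model these weights would be combined with the BLSTM attention scores rather than used on their own.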
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361Y (2018) https://doi.org/10.1117/12.2504521
In this paper, the knowledge construction technique known as the repertory grid mind tool and a role-playing game (RPG) are used together in a game-based learning activity. The game was designed and developed for students in the Faculty of Science and Technology to practice observing the important features of birds among the frugivores and insectivores. Features such as the foot, leg, beak, crown, and tail are commonly used in bird-observation practice. Through hints given in the RPG and the embedded repertory grid, students can share their knowledge and organize what they have learned during play. To evaluate the effectiveness of the game, the study tested two separate groups: a control group in which students studied and practiced in a conventional learning environment without playing the game, and a second group taught in a collaborative learning environment using the game. The results show that the repertory grid embedded in the RPG can promote students' learning performance.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 108361Z (2018) https://doi.org/10.1117/12.2512460
The development of satellite positioning technology has solved the problem of precise positioning in broad outdoor spaces, and location-based services (LBS) have gradually become an indispensable part of people's life and work. However, because satellite positioning signals are blocked by buildings, satellite positioning technology cannot achieve precise positioning indoors. Most existing indoor positioning systems require additional dedicated devices, which increases the cost of hardware purchase and maintenance, and their maps must be drawn manually, which takes considerable time and manpower. To solve these problems, this paper designs a low-cost, high-efficiency, high-precision indoor positioning system that uses indoor LED lamps as positioning base stations and applies the SLAM techniques common in robotics to produce 2D or 3D indoor maps.
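Positioning against LED base stations typically reduces to trilateration once distances to several lamps of known position have been estimated. A minimal 2-D sketch of that final step (how the distances are obtained from the received light, and the SLAM mapping itself, are out of scope here):

```python
def trilaterate(anchors, dists):
    """2-D position from >= 3 anchors (x_i, y_i) with measured distances d_i.
    Subtracting the first circle equation from the others linearizes the
    system; the 2x2 normal equations are then solved by Cramer's rule."""
    (x0, y0), d0 = anchors[0], dists[0]
    A, b = [], []
    for (xi, yi), di in zip(anchors[1:], dists[1:]):
        A.append((2 * (xi - x0), 2 * (yi - y0)))
        b.append(d0**2 - di**2 + xi**2 - x0**2 + yi**2 - y0**2)
    a11 = sum(r[0] * r[0] for r in A)
    a12 = sum(r[0] * r[1] for r in A)
    a22 = sum(r[1] * r[1] for r in A)
    b1 = sum(r[0] * v for r, v in zip(A, b))
    b2 = sum(r[1] * v for r, v in zip(A, b))
    det = a11 * a22 - a12 * a12
    return ((b1 * a22 - b2 * a12) / det, (b2 * a11 - b1 * a12) / det)
```

With noisy distances the same least-squares form simply averages the error over the extra anchors.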
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083620 (2018) https://doi.org/10.1117/12.2513858
This paper first introduces the definition of and basic approaches to fault diagnosis on board ship, and discusses the current state and development of fault-diagnosis expert systems, including the research work and practical significance of the project. The construction of a fault-diagnosis expert system and its development tools are also introduced.
Secondly, the design of a fault-diagnosis expert system for the ship power plant is described in detail. Advanced database technology is introduced into the expert system: on the basis of an Access 2000 database, the system's knowledge base is built from a rule-precondition table, a rule-conclusion table and a dictionary table. The inference engine is designed on the basis of an uncertain inference model with reliability factors, which reflects the actual fault environment of the equipment more completely and can handle simultaneous problems with multiple faults and multiple symptoms.
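An inference engine over rules with reliability (certainty) factors can be sketched as below. The MYCIN-style combination formula and the example rules are illustrative assumptions, not taken from the paper's knowledge base:

```python
def combine_cf(cf1, cf2):
    """MYCIN-style combination of two certainty factors for one conclusion."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 <= 0 and cf2 <= 0:
        return cf1 + cf2 * (1 + cf1)
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

def diagnose(rules, symptoms):
    """rules: list of (precondition_set, fault, rule_cf). A rule fires when
    all of its precondition symptoms are observed; evidence from several
    rules for the same fault is merged with combine_cf, which is how one
    fault can be supported by multiple symptoms simultaneously."""
    belief = {}
    for pre, fault, cf in rules:
        if pre <= symptoms:
            belief[fault] = combine_cf(belief.get(fault, 0.0), cf)
    return belief
```

Two rules with factors 0.6 and 0.5 supporting the same fault combine to 0.6 + 0.5 × (1 − 0.6) = 0.8, so corroborating evidence raises belief without ever exceeding 1.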
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083621 (2018) https://doi.org/10.1117/12.2513864
With the improvement of living standards, poultry has become a common sight on the table. However, the avian influenza (bird flu) virus not only harms poultry but also endangers human health. Vaccination is the main method of preventing and controlling bird flu epidemics, and the quality of the vaccine is directly related to the safety of poultry and human life. Bird flu vaccine is cultured mainly by inoculating embryonated chicken eggs with avian influenza strains, propagating the strains in the embryonated eggs, and then inactivating them. The detection of embryo viability is therefore an important part of propagating and culturing avian influenza strains. Traditional manual inspection methods suffer from visual fatigue, low detection efficiency, and the subjectivity of human judgment. This paper proposes a tensor-based deep computation model that extends the data from vector space to tensor space, which better reflects the underlying structure of the data. Viability detection of embryonated eggs was performed on real data sets; compared with a convolutional neural network operating in vector space, the proposed algorithm achieves a better recognition rate.
Shaojie Zhang, Hongbin Zhang, Yanqiu Ju, Chi Qi, Huichao Lv
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083622 (2018) https://doi.org/10.1117/12.2514022
This paper proposes a particle swarm optimization (PSO) based sensor-management algorithm for armed helicopters. To solve the problem of efficiently pairing multiple sensors with multiple targets, the proposal defines the sensor-target pairing matrix as a particle and the aggregated performance of a pairing matrix as the fitness function. Iterative updates of the key parameters, including the velocity, the local optimum and the global optimum, are designed, and the optimal aggregated performance is reached through repeated iterations. Simulation results demonstrate that the proposed algorithm outperforms existing non-linear optimization algorithms in computational complexity. Moreover, the proposal adapts to changes in both sensors and targets, which makes it better suited to a dynamic battle environment.
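One way to realize a pairing-matrix particle is to keep it real-valued and decode it into an assignment by row-wise argmax. The sketch below assumes that encoding; the toy performance matrix and standard PSO constants are illustrative, and the fitness ignores any constraint that targets be covered uniquely:

```python
import random

def pso_pairing(perf, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """PSO over sensor-by-target score matrices. decode() turns a particle
    into a pairing (one target per sensor); fitness is summed performance."""
    rng = random.Random(seed)
    ns, nt = len(perf), len(perf[0])
    dim = ns * nt
    def decode(p):
        return [max(range(nt), key=lambda t: p[i * nt + t]) for i in range(ns)]
    def fitness(p):
        return sum(perf[i][t] for i, t in enumerate(decode(p)))
    X = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    P = [x[:] for x in X]                  # personal (local) bests
    pf = [fitness(x) for x in X]
    gf = max(pf)                           # global best fitness
    g = P[pf.index(gf)][:]
    for _ in range(iters):
        for k in range(n_particles):
            for d in range(dim):
                V[k][d] = (w * V[k][d]
                           + c1 * rng.random() * (P[k][d] - X[k][d])
                           + c2 * rng.random() * (g[d] - X[k][d]))
                X[k][d] += V[k][d]
            f = fitness(X[k])
            if f > pf[k]:
                pf[k], P[k] = f, X[k][:]
                if f > gf:
                    gf, g = f, X[k][:]
    return decode(g), gf
```

On a small 2-sensor, 2-target instance the swarm quickly settles on the pairing with the highest summed performance.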
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083623 (2018) https://doi.org/10.1117/12.2514031
In recent years, self-driving technology has attracted wide attention. In self-driving, the most important consideration is safety: to keep driving safe, the vehicle must dodge obstacles on the road, so the computer must control the car properly. This study focuses on avoidance based on "human sense". People preferentially avoid children or elderly people, so humans assign each obstacle a priority for avoidance, but that priority is very ambiguous information. Fuzzy logic is a mathematical logic that can handle vague information, which makes it a useful tool for letting a computer reproduce human sense. To reproduce it more faithfully, we used a genetic algorithm (GA) to optimize the shapes of the fuzzy membership functions. With these methods we built a risk calculator that gives the risk level (0-10) of each obstacle from two inputs: the distance from the driver and the priority of avoidance. We then ran a vision-based self-driving simulation in a 3D environment using the calculator. By controlling the car according to the risk level, the computer can drive more humanly. It turns out that fuzzy logic and GA are good tools for simulating human-like driving.
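A minimal version of such a two-input risk calculator is sketched below, with hand-chosen linear memberships and a four-rule base standing in for the GA-optimized shapes used in the study:

```python
def risk_level(distance, priority):
    """Fuzzy risk in [0, 10] from distance (m) and avoidance priority in
    [0, 1]. Memberships, rule base and the 30 m range are illustrative."""
    near = max(0.0, 1.0 - distance / 30.0)   # fully 'near' at 0 m, gone by 30 m
    far = 1.0 - near
    high, low = priority, 1.0 - priority
    # rule strengths (product t-norm) paired with crisp consequents (0-10)
    rules = [(near * high, 10.0), (near * low, 6.0),
             (far * high, 5.0), (far * low, 1.0)]
    s = sum(w for w, _ in rules)
    return sum(w * r for w, r in rules) / s if s else 0.0
```

The weighted-average defuzzification makes the risk vary smoothly between the rule consequents as distance and priority change, which is the behavior the GA would then tune.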
H. X. Deng, G. H. Wei, J. L. Li, L. Ge, X. Lai, Q. Huang
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083624 (2018) https://doi.org/10.1117/12.2514042
In view of the distortion and hysteresis problems of surface overflow-monitoring methods, measuring downhole near-bit parameters directly and building an overflow early-warning model on them is an effective solution. However, few theories currently address intelligent overflow early-warning models based on downhole parameter measurement. In recent years, rapidly developing artificial intelligence technology has brought opportunities for solving this problem. In this paper, based on a study of overflow parameters and their characterization, an intelligent overflow early-warning model based on a layered fuzzy expert system is proposed, in which drilling experts' knowledge and experience are combined with intelligent overflow characterization to realize intelligent early warning of drilling overflow. A simulation experiment platform is used to verify the system, showing that it can raise warnings quickly and accurately and has good application prospects.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083625 (2018) https://doi.org/10.1117/12.2513948
In this paper, we developed an animated agent that serves as a commentator in small-scale cyber security competitions. The overarching aim of the system is to educate and engage novice spectators and to serve as a practicum for novice security analysts. In much the same way that sports commentators describe the plays and moves of a game, the virtual commentator plays that role in cyber challenge events. By providing an environment that educates those without prior expertise, we make the knowledge more accessible via system cues and interface design. To build a system that behaves like its human counterpart, we observed and identified specific verbal and nonverbal behaviors of professional commentators working with spectators. Based on the observed behaviors, we created tags to embed in text-based agent scripts, which allow the virtual commentator to interact with audiences through facial expressions and corresponding hand gestures. In preliminary studies conducted at Bowie State University, we found that a virtual commentator imbued with the same capabilities as its human counterparts has the same effect of educating novice spectators about the cyber security dangers they may face in their daily lives, with the added benefit of raising awareness of the cyber security field.
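One plausible realization of tags embedded in text-based agent scripts is inline markers that are stripped from the spoken text and routed to the animation layer. The [category:value] syntax below is purely an assumption; the paper only states that tags are embedded in the scripts:

```python
import re

# hypothetical tag syntax: [category:value], e.g. [gesture:point]
TAG = re.compile(r"\[(\w+):(\w+)\]")

def parse_script(line):
    """Split a tagged commentator line into the text to speak and a list of
    (category, value) behaviour cues for facial expressions and gestures."""
    cues = [(m.group(1), m.group(2)) for m in TAG.finditer(line)]
    text = TAG.sub("", line)
    return " ".join(text.split()), cues
```

Keeping the cues in document order lets the animation layer fire each expression or gesture roughly where it appeared in the spoken line.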
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083626 (2018) https://doi.org/10.1117/12.2514036
This paper develops a network traffic prediction application. The application, built on an R-language-oriented distributed stream processing system, uses JDSU micro-probe systems and nProbe to collect network traffic data, makes predictions with an ARIMA model, and provides basic data visualization. Compared with similar systems, our prediction platform has higher scalability and serviceability thanks to the distributed stream processing system, and better development efficiency with the help of R and CRAN.
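ARIMA fitting in practice would go through R's forecast package (or statsmodels in Python). As a self-contained sketch of the idea, the simplest case, an AR(1) model (ARIMA(1,0,0)), can be fitted by least squares and iterated forward for multi-step forecasts:

```python
def fit_ar1(series):
    """Least-squares AR(1) on a mean-centred series: x_t ~ phi * x_{t-1}."""
    mu = sum(series) / len(series)
    x = [v - mu for v in series]
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return mu, num / den

def forecast(series, steps):
    """h-step-ahead forecast: repeatedly apply the fitted AR(1) recursion."""
    mu, phi = fit_ar1(series)
    x = series[-1] - mu
    out = []
    for _ in range(steps):
        x *= phi
        out.append(mu + x)
    return out
```

A full ARIMA(p,d,q) model adds differencing and a moving-average term on top of this autoregressive core.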
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083627 (2018) https://doi.org/10.1117/12.2514261
Chanting (Chinese: song du yin sheng) plays an important part in the daily practice of monks in temples. We collected live recordings of chanting and used selected segments as experimental signals. Before the experiment, we identified emotional words through semantic surveys. A subjective auditory-perception experiment was then performed in which the emotional words were rated using the series category method. The experimental data were analyzed by factor extraction, from which three main emotional components (quietness, religion and drowsiness) were identified according to the characteristics of the chanting sound. This has significance for emotion recognition, annotation and music recommendation for this kind of sound.
Gaopeng Sun, Yanhua Shi, Hui Liu, Yichuan Jiang, Pan Lin, Junfeng Gao, Ruimin Wang, Yue Leng, Yuankui Yang, et al.
Proceedings Volume 2018 International Conference on Image and Video Processing, and Artificial Intelligence, 1083628 (2018) https://doi.org/10.1117/12.2514420
Although the canonical correlation analysis (CCA) algorithm has been applied successfully to steady-state visual evoked potential (SSVEP) detection, artifacts and unrelated brain activities may affect the performance of SSVEP-based brain–computer interface systems. Extracting the characteristic frequency sub-bands is an effective method of enhancing the signal-to-noise-ratio of SSVEP signals. The sinusoid-assisted multivariate extension of empirical mode decomposition (SA-MEMD) algorithm is a powerful method of spectral decomposition. In this study, we propose an SA-MEMD-based CCA method for SSVEP detection. Experimental results suggest that the SA-MEMD-based CCA algorithm is a useful method for the detection of typical SSVEP signals. The SA-MEMD-based CCA algorithm reached a classification accuracy of 88.3% for a window of 4 s and outperformed the standard CCA algorithm by 2.8%.
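For a single EEG channel, the canonical correlation with a bank of sin/cos references at a stimulus frequency reduces to a multiple correlation coefficient. The sketch below illustrates just that scoring step of CCA-based SSVEP detection; the multichannel case and the SA-MEMD sub-band decomposition are omitted:

```python
import math

def _solve(A, b):
    """Gaussian elimination with partial pivoting (for the normal equations)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def cca_score(x, freq, fs, harmonics=2):
    """Correlation of one EEG channel with sin/cos references at freq and its
    harmonics; classification picks the candidate freq with the top score."""
    n = len(x)
    refs = []
    for h in range(1, harmonics + 1):
        refs.append([math.sin(2 * math.pi * h * freq * t / fs) for t in range(n)])
        refs.append([math.cos(2 * math.pi * h * freq * t / fs) for t in range(n)])
    mean = sum(x) / n
    xc = [v - mean for v in x]
    R = [[v - sum(r) / n for v in r] for r in refs]
    # normal equations for the regression of x on the reference columns
    G = [[sum(a * b for a, b in zip(ri, rj)) for rj in R] for ri in R]
    c = [sum(a * b for a, b in zip(ri, xc)) for ri in R]
    w = _solve(G, c)
    sst = sum(v * v for v in xc)
    r2 = sum(ci * wi for ci, wi in zip(c, w)) / sst
    return math.sqrt(max(r2, 0.0))
```

A pure 10 Hz sinusoid scores near 1 against 10 Hz references and near 0 against mismatched frequencies, which is what makes the score usable for frequency classification.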