This PDF file contains the front matter associated with SPIE Proceedings Volume 10649, including the Title Page, Copyright information, Table of Contents, Introduction, and Conference Committee listing.
Fine-grained vehicle classification is the task of classifying the make, model, and year of a vehicle. This is a very challenging task, because vehicles of different types but similar color and viewpoint can often look much more similar than vehicles of the same type but differing color and viewpoint. Vehicle make, model, and year, in combination with vehicle color, are of importance in several applications such as vehicle search, re-identification, tracking, and traffic analysis. In this work we investigate the suitability of several recent landmark convolutional neural network (CNN) architectures, which have shown top results on large-scale image classification tasks, for the fine-grained classification of vehicles. We compare the performance of VGG16, several ResNets, Inception architectures, the recent DenseNets, and MobileNet. For classification we use the Stanford Cars-196 dataset, which features 196 different types of vehicles. We investigate several aspects of CNN training, such as data augmentation and training from scratch vs. fine-tuning. Importantly, we introduce no aspects in the architectures or training process that are specific to vehicle classification. Our final model achieves a state-of-the-art classification accuracy of 94.6%, outperforming all related works, even approaches specifically tailored for the task, e.g. by including vehicle part detections.
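As a concrete illustration of the fine-tuning setup described above, the following sketch fine-tunes a pretrained ResNet-50 (one of the architecture families compared) on a 196-class dataset in PyTorch. The dataset path and all hyperparameters are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn
from torchvision import models, transforms, datasets

NUM_CLASSES = 196  # Stanford Cars-196

# Start from ImageNet weights and replace the classifier head.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Data augmentation of the kind mentioned above (random crops and flips).
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

train_set = datasets.ImageFolder("cars196/train", transform=train_tf)  # hypothetical path
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one epoch of fine-tuning
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()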
The Shor quantum factorization algorithm allows the factorization of large integers in logarithmic squared time, whereas classical algorithms require a time that grows exponentially with the bit length of the number to be factored. A hardware implementation of the Shor algorithm would thus allow the factorization of the very large integers employed by commercial encryption methods. We propose some modifications of the algorithm by simplifying the stage employing the quantum Fourier transform. The quantum Hadamard transform may be used to replace the quantum Fourier transform in certain cases. This would reduce the hardware complexity of the implementation, since phase rotation gates with only two states of 0 and π would be required.
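A small numerical illustration of the simplification: the N-point QFT matrix contains many distinct phase rotations, while the Hadamard transform matrix contains only the phases 0 and π. This is a minimal numpy sketch of the matrices, not a circuit-level implementation.

import numpy as np

def qft_matrix(n_qubits):
    """Quantum Fourier transform on n qubits: entries are N-th roots of unity,
    so a circuit needs fine-grained controlled phase rotations."""
    N = 2 ** n_qubits
    omega = np.exp(2j * np.pi / N)
    k = np.arange(N)
    return omega ** np.outer(k, k) / np.sqrt(N)

def hadamard_matrix(n_qubits):
    """n-qubit Hadamard transform: entries are only +-1/sqrt(N), i.e. the only
    phases are 0 and pi -- the hardware simplification described above."""
    H1 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    H = np.array([[1.0]])
    for _ in range(n_qubits):
        H = np.kron(H, H1)
    return H

F = qft_matrix(3)
H = hadamard_matrix(3)
print(np.unique(np.round(np.angle(F), 3)))  # many distinct rotation angles
print(np.unique(np.round(np.angle(H), 3)))  # only 0 and pi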
Image segmentation is one of the fundamental steps in computer vision. Separating targets from background clutter with high precision is a challenging operation for both humans and computers. Currently, segmenting objects from IR images is done by tedious manual work. The implementation of a Deep Neural Network (DNN) to perform precision segmentation of multi-band IR video images is presented. A customized pix2pix DNN with multiple layers of generative encoder/decoder and discriminator architecture is used in the IR image segmentation process. Real and synthetic images and ground truths are employed to train the DNN. Iterative training is performed to achieve optimum segmentation accuracy using a minimal number of training data. Special training images are created to enhance the missing features and to increase the segmentation accuracy of the objects. Retraining strategies are developed to minimize the DNN training time. Single-pixel accuracy has been achieved in IR target boundary segmentation using DNNs. The segmentation accuracies of the customized pix2pix DNN and of simple thresholding, GraphCut, a simple neural network, and ResNet models are compared.
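For reference, the standard pix2pix generator objective combines an adversarial term with an L1 term toward the ground truth; a minimal sketch is shown below. This is the generic published formulation, with an assumed L1 weight of 100, not the customized network itself.

import torch
import torch.nn.functional as F

def generator_loss(disc_logits_on_fake, fake_seg, real_seg, l1_weight=100.0):
    # Adversarial term: the generator tries to make the discriminator
    # label its (input, output) pair as real.
    adv = F.binary_cross_entropy_with_logits(
        disc_logits_on_fake, torch.ones_like(disc_logits_on_fake))
    # L1 term: keep the output close to the ground-truth segmentation.
    return adv + l1_weight * F.l1_loss(fake_seg, real_seg)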
Flight data recorders (FDRs) play a critical role in determining root causes of military aviation mishaps. Some United States Air Force (USAF) aircraft record limited amounts of information during flight (e.g. the T-1 Jayhawk), while others have no FDR on board the aircraft (the B-52 Stratofortress). This study explores the use of image-based flight data recording to overcome a lack of available digitally recorded FDR data. In this work, images of simulated cockpit gauges were unwrapped vertically, and 2-D cross-correlation was performed on each image of the unwrapped gauge and a template of the unwrapped gauge needle. Points of high correlation between the two images were used to locate the gauge needle, and interpolation and extrapolation were performed (based on known pixel locations of gauge tick marks) to quantify the value to which the gauge needle pointed. Results suggest that image-based flight data recording could provide key support to USAF mishap investigations when aircraft lack sufficient FDR data.
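A minimal sketch of the needle-locating and value-reading steps described above, assuming the gauge image has already been unwrapped; the calibration arrays are hypothetical.

import numpy as np
from scipy.signal import correlate2d
from scipy.interpolate import interp1d

def read_gauge(unwrapped_img, needle_template, tick_cols, tick_values):
    # 2-D cross-correlation of the unwrapped gauge with the needle template;
    # the correlation peak marks the needle's column.
    corr = correlate2d(unwrapped_img - unwrapped_img.mean(),
                       needle_template - needle_template.mean(), mode="same")
    _, peak_col = np.unravel_index(np.argmax(corr), corr.shape)
    # Map pixel column to gauge value from known tick-mark positions,
    # extrapolating past the outermost ticks when needed.
    to_value = interp1d(tick_cols, tick_values, fill_value="extrapolate")
    return float(to_value(peak_col))

# Hypothetical calibration: tick marks at known pixel columns and their readings.
tick_cols = np.array([10, 110, 210, 310])
tick_values = np.array([0.0, 50.0, 100.0, 150.0])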
Many surveillance and security monitoring videos are long and of low quality. Moreover, reviewing and extracting anomalous events in the videos is a lengthy and manually intensive process. In this paper, we present two efficient saliency-based anomaly detection algorithms for detecting anomalous events in low-quality videos. The events' start times and durations are saved in a video summary for later review. The video summary is very short; for example, we have summarized a 14-minute video into a 16-second video summary. Extensive evaluations of the two algorithms clearly demonstrated their feasibility. A user-friendly software tool has also been developed to help human operators review and confirm those events.
In this paper, we summarize a commercial-grade software tool that can review video summaries and confirm the events in the summaries. The video summaries are generated by other tools. Our tool can handle thousands of videos and their summaries. The software runs on low-cost PCs with a user-friendly graphical user interface (GUI).
Osteoporosis is an age-related disease causing a skeletal disorder. It is characterized by low bone mass and weakening of the bone structure, resulting in higher fracture risks. Early identification can help prevent the disease and successfully predict fracture risks. Automated diagnosis of osteoporosis from X-ray images is a very challenging task because the radiographs of healthy subjects and osteoporotic cases show a high degree of resemblance. This study presents an evaluation of osteoporosis identification using the texture descriptors Local Binary Pattern (LBP) and Shifted Local Binary Pattern (SLBP). In contrast with the conventional LBP, the shifted LBP generates a specific number of binary local codes for each pixel position. The discriminating ability of the texture descriptors is evaluated using ten-fold cross-validation and a leave-one-out scheme with different machine learning techniques. The results show that SLBP outperforms the traditional LBP for bone texture characterization.
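For illustration, the following sketch computes a conventional LBP code and a shifted variant that yields several codes per pixel by offsetting the comparison threshold. This is one common formulation of a shifted LBP; the paper's exact SLBP definition may differ.

import numpy as np

OFFSETS = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]

def lbp_code(img, r, c, shift=0):
    # Threshold the 8 neighbours against the centre pixel (plus an offset).
    centre = img[r, c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr, c + dc] >= centre + shift:
            code |= 1 << bit
    return code

def shifted_lbp_codes(img, r, c, shifts=(-2, -1, 0, 1, 2)):
    # Several binary codes per pixel position, as described above.
    return [lbp_code(img, r, c, s) for s in shifts]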
The assessment of osteoporotic subjects from X-ray images poses a significant challenge for pattern recognition and medical diagnostic applications. Textured images of the bone micro-architecture of osteoporotic and healthy subjects exhibit a high degree of similarity, amplifying the difficulty of classifying such textures. This research explores different texture-based methods to segregate osteoporotic subjects from healthy controls. We applied a set of well-evaluated preprocessing models to improve the prospects of drawing a distinct line between the two classes while exercising diverse texture analysis approaches, including the Grey Level Co-occurrence Matrix (GLCM) and two-dimensional and one-dimensional Local Binary Patterns. Finally, we propose a hybrid technique to attain enhanced class distinction. Experiments were conducted on two populations of osteoporotic patients and controls, with a comparative analysis of the results.
Every year, forest and wildland fires affect more than 350 million hectares worldwide, resulting in significant environmental, economic, and social losses. To fight this major risk efficiently, specific actions are deployed. The efficiency of these actions is tightly linked to knowledge of the phenomena and to improving the tools for detecting, predicting, and understanding fire propagation. An important step for vision-based fire analysis is the detection of fire pixels. In this work, we propose Deep-Fire, a deep convolutional neural network for fire pixel detection and fire segmentation. The proposed technique is tested on a database of wildland fires. The obtained results show that the proposed architecture gives very high performance for the segmentation of wildland and forest fire areas in outdoor, non-structured scenarios.
Object recognition and tracking are key research areas in image processing and computer vision. This paper presents a novel technique that efficiently recognizes an object based on full boundary detection using the affine scale-invariant feature transform (ASIFT). ASIFT is an improvement over the SIFT algorithm, as it provides invariance to six parameters, including the longitude and latitude angles. The six parameters comprise translation (2 parameters), rotation, camera axis orientation (2 parameters), and zoom. Key points, commonly referred to as feature points, are then obtained using the mentioned parameters to recognize the object efficiently. Furthermore, a region merging technique is used for object recognition and detection in the remote scene environment using the ASIFT technique. A short pictorial comparison between SIFT and ASIFT is also presented based on feature point calculation. After recognition with ASIFT, an algorithm is presented for tracking the recognized object using a modified particle filter. The particle filter uses a proximal gradient (PG) approach for tracking the recognized object in subsequent images. In case an object drastically varies its position w.r.t. any of the six parameters mentioned above, ASIFT is called again for object recognition.
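A sketch of the ASIFT idea under stated assumptions: the two camera-axis parameters are simulated by affine warps (rotation for longitude, one-axis compression for latitude/tilt), and ordinary SIFT, which covers the remaining four parameters, is run on each warp. The tilt and angle grids are illustrative, and OpenCV's SIFT is used as the detector.

import cv2
import numpy as np

sift = cv2.SIFT_create()

def asift_keypoints(img, tilts=(1.0, 1.4, 2.0), angles=range(0, 180, 36)):
    keypoints, descriptors = [], []
    for t in tilts:
        for phi in (angles if t > 1.0 else [0]):
            # Rotate by phi (longitude), then compress one axis by tilt t (latitude).
            M = cv2.getRotationMatrix2D((img.shape[1] / 2, img.shape[0] / 2), phi, 1.0)
            warped = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))
            warped = cv2.resize(warped, None, fx=1.0, fy=1.0 / t)
            kps, desc = sift.detectAndCompute(warped, None)
            if desc is not None:
                keypoints.extend(kps)
                descriptors.append(desc)
    return keypoints, np.vstack(descriptors) if descriptors else None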
This paper addresses the problem of motion analysis performed on digital data captured by a network of motion sensors scattered over a field of interest where 3D+T motion analysis is performed. Motion analysis, as referred to here for digital signals, proceeds through consecutive steps of detection, motion-oriented classification, parameter estimation, and tracking. The scheme proposed in this paper is relevant to applications in medicine, earth science, surveillance, and defense. The major challenges involved in the feasibility of this network are as follows: signal sampling from a sensor network, photodetection, and an optimal strategy to cope with energy harvesting and wireless communication capabilities. The motion sensors implement wireless communications to a gateway or data sink that relays the collected information to a remote central station. Motion sensors are assigned to catch motion with highly sensitive, sparsely distributed sensors and to build the trajectories. Other sensors can be added to the system for specific purposes, such as video cameras. Video cameras are assigned to capture high-resolution images or videos with densely and regularly distributed sensors to perform pattern classification and recognition. The central station implements the motion analysis algorithm. Motion analysis is performed as a dual control referring to both an accurate model based on theoretical mechanics and an adaptive learning system based on a supervised neural network. This paper describes the effective components of the system, namely the sensor layer, the telecommunication layer, and the application layer.
Surveillance images downlinked from unmanned air vehicles (UAVs) may have corrupted pixels due to channel interference from an adversary's jammer. Moreover, the images may be deliberately downsampled in order to conserve the scarce bandwidth in UAVs. As a result, automatic target recognition (ATR) performance may degrade significantly because of poor image quality due to corrupted and missing pixels. In this paper, we present some preliminary results of a novel approach to automatic target recognition based on corrupted images. First, we present a new matrix completion algorithm to reconstruct missing pixels in electro-optical (EO) images. Second, we extensively evaluated our algorithm using many EO images with different missing rates; the recovery performance in terms of peak signal-to-noise ratio (PSNR) was very good. Third, we compared with a state-of-the-art algorithm and found that our performance is superior. Finally, experiments using an ATR algorithm showed that the target detection performance (precision and recall) improved after applying our algorithm, compared to results generated from interpolated images.
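The completion algorithm itself is new; as a generic point of reference, a minimal singular-value-thresholding-style baseline for filling missing pixels looks like the sketch below. This is a standard technique, not the authors' algorithm.

import numpy as np

def complete_matrix(img, mask, rank=20, iters=100):
    """img: 2-D array with arbitrary values at missing pixels.
    mask: boolean array, True where the pixel was observed."""
    X = np.where(mask, img, img[mask].mean())   # initialise gaps with the mean
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s[rank:] = 0.0                          # hard-threshold to a low-rank estimate
        X = (U * s) @ Vt
        X[mask] = img[mask]                     # re-impose the observed pixels
    return X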
This paper summarizes a preliminary study on anomaly detection in low-quality traffic monitoring videos. An optical flow based anomaly detection algorithm is proposed to detect anomalies in videos. The algorithm is efficient, and preliminary experiments demonstrate that it is feasible and performs well. The anomaly detection algorithm can also be used to generate video summaries in which the start and end times of anomalies are recorded. In addition, we developed a user-friendly tool that helps operators review video summaries.
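A minimal sketch of an optical-flow anomaly cue of this kind, using OpenCV's Farneback flow and a running z-score on the mean flow magnitude; the threshold and scoring rule are illustrative assumptions, not the paper's exact algorithm.

import cv2
import numpy as np

def flow_anomaly_scores(frames, z_thresh=3.0):
    scores, history = [], []
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag = np.linalg.norm(flow, axis=2).mean()   # mean motion in this frame
        history.append(mag)
        mu, sigma = np.mean(history), np.std(history) + 1e-6
        scores.append((mag - mu) / sigma > z_thresh)  # True = anomalous frame
        prev = gray
    return scores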
Motion detection and estimation is an important task in several applications of image analysis, including scenarios such as satellite cross-cueing or detecting small shifts in terrain. One widely employed technique for estimating the amount of motion between two images is Normalized Cross-Correlation (NCC), although its computational cost is often prohibitively high for time-sensitive applications. In this work, a previously developed algorithm that uses sum tables to calculate the NCC efficiently for 1-D ultrasound traces is adapted to work for 2-D radar images. The performance of the sum-table algorithm is quantified both theoretically and with Synthetic Aperture Radar (SAR) data from the RADARSAT-2 satellite, and is shown to provide time savings of 97% or more compared to the direct method. The algorithm described herein could be used to provide more timely intelligence in situations where it is desirable to detect and estimate the motion of targets using remote sensing.
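The core of the sum-table idea is that integral images of f and f² make the local mean and variance under the template O(1) per offset, leaving only the correlation numerator (typically computed via FFT). A minimal 2-D sketch, not the authors' exact implementation.

import numpy as np
from scipy.signal import fftconvolve

def fast_ncc(image, template):
    th, tw = template.shape
    n = th * tw
    # Integral images (sum tables) of the image and its square.
    s  = np.pad(np.cumsum(np.cumsum(image, axis=0), axis=1), ((1, 0), (1, 0)))
    s2 = np.pad(np.cumsum(np.cumsum(image.astype(float) ** 2, axis=0), axis=1),
                ((1, 0), (1, 0)))
    H, W = image.shape
    oh, ow = H - th + 1, W - tw + 1
    # Windowed sums via four sum-table lookups per offset.
    win_sum  = s[th:th+oh, tw:tw+ow]  - s[:oh, tw:tw+ow]  - s[th:th+oh, :ow]  + s[:oh, :ow]
    win_sum2 = s2[th:th+oh, tw:tw+ow] - s2[:oh, tw:tw+ow] - s2[th:th+oh, :ow] + s2[:oh, :ow]
    win_var = win_sum2 - win_sum ** 2 / n        # unnormalised local variance
    # Numerator via FFT correlation with the zero-mean template.
    tz = template - template.mean()
    num = fftconvolve(image, tz[::-1, ::-1], mode="valid")
    den = np.sqrt(np.maximum(win_var, 1e-12) * (tz ** 2).sum())
    return num / den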
Wide Area Motion Imagery (WAMI) systems used on surveillance aircraft may suffer from system calibration errors associated with frequent re-installation. These geo-coding errors corrupt the mapping of the tracked objects, 'movers', from the image frame into the world reference frame. In this study, an automated system for calibration of the imagery captured with a six-camera WAMI array has been developed. Automatic calibration was achieved by a system of several multi-scale feature classifiers adaptively applied to an image captured by the camera array, depending on feature availability and classifier accuracy. The feature extraction and association modules were designed to operate interchangeably on a frame from any given camera. The choice of module was performed automatically using a decision tree designed as part of the system architecture. Calculation of the per-frame corrections to mitigate the localisation error of the movers was performed by associating the features detected in each individual camera with features extracted from available satellite imagery used as a datum. The effects of the distance to the feature and the choice of the feature extraction module on mover localisation accuracy were evaluated on 300 frames (6 images each) captured with the WAMI array. A significant reduction in the magnitude of the geo-coding error (from 15.77-36.54 m to 5.42-8.55 m on average) was achieved and can be seen in the improved alignment of the features projected into the frame as well as the reliable mapping of the mover trajectories across frames. Unlike similar systems focusing on post-processing, the WAMI calibration system presented in this paper was designed for continuous parameter estimation in real time.
Object trackers for full-motion video (FMV) need to handle object occlusions (partial and short-term full), rotation, scaling, illumination changes, complex background variations, and perspective variations. Unlike traditional deep learning trackers that require extensive training time, the proposed Progressively Expanded Neural Network (PENNet) tracker utilizes a modified variant of the extreme learning machine, which encompasses polynomial expansion and state-preserving methodologies. This reduces the training time significantly for online training of the object. The proposed algorithm is evaluated on the DARPA Video Verification of Identity (VIVID) dataset, wherein the selected high-value targets (HVTs) are vehicles.
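For context, a minimal sketch of the generic extreme learning machine that PENNet modifies: a fixed random hidden layer with only the output weights solved in closed form, which is why online training is fast. The polynomial-expansion and state-preserving extensions are the paper's own and are not shown here.

import numpy as np

class ELM:
    def __init__(self, n_inputs, n_hidden, rng=np.random.default_rng(0)):
        # Hidden layer weights are random and never trained.
        self.W = rng.standard_normal((n_inputs, n_hidden))
        self.b = rng.standard_normal(n_hidden)
        self.beta = None

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, Y, reg=1e-3):
        H = self._hidden(X)
        # Ridge-regularised least squares for the output weights only --
        # one linear solve instead of iterative backpropagation.
        self.beta = np.linalg.solve(H.T @ H + reg * np.eye(H.shape[1]), H.T @ Y)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta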
In this paper we present a novel approach for road sign identification and geolocation based on a Joint Transform Correlator (JTC) and the VIAPIX module. The proposed method is divided into three parts: identification, gathering, and geolocation. The first part detects and identifies road signs in images acquired by the VIAPIX module [1] developed by our company, ACTRIS [2], relying on our method for road sign identification described in [3]. The second part gathers the identified road signs using the JTC technique [4]. Since the VIAPIX® module provides images at an interval of one image per meter, we identify each road sign by finding the number of images in which it has been recognized, while computing its corresponding pixel coordinates in each of these images. Finally, each road sign is geolocated using its pixel coordinates in several images. At this stage, we rely on the axial stereovision method [5]. Using the pixel coordinates and the distance between different images, we compute the 3D coordinates of each road sign. GPS coordinates can then be found from the GPS position of the vehicle using Vincenty's formulae [6].
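For the geolocation step, the axial-stereovision geometry admits a simple depth relation when the camera translates along its optical axis: a point seen at radial image distances r1 and then r2 (with r2 > r1) lies at depth Z = d·r2/(r2 − r1) from the first viewpoint, where d is the forward displacement. A sketch with hypothetical numbers; the paper's full 3D computation is not reproduced.

def axial_depth(r1, r2, d):
    """r1, r2: radial distances (pixels) of the same point from the image
    centre in two views separated by a forward displacement d (metres).
    Returns depth from the first viewpoint: Z = d * r2 / (r2 - r1)."""
    return d * r2 / (r2 - r1)

# Hypothetical example: the vehicle moved 1 m between images and a sign corner
# moved from 120 px to 130 px from the image centre -> Z = 13 m.
print(axial_depth(120.0, 130.0, 1.0))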
Osteoporosis is an age-related disease causing a skeletal disorder. It is characterized by decreased bone mass and weakening of the bone structure, resulting in higher fracture risks. Early identification can help prevent the disease and successfully predict fracture risks. Automated diagnosis of osteoporosis from X-ray images is a very challenging task because the radiographs of healthy patients and osteoporotic cases show a great resemblance. Texture representation is done using two types of methods: appearance-based methods and feature-based methods. This study describes two systems, one based on PCA and one based on LDA. Each system contains two stages: the first is PCA- or LDA-based feature computation, and the second is the classification stage. Classification has been done using the classifiers k-NN, naive Bayes (NB), and SVM. The discriminating power of the texture descriptors is assessed using a ten-fold cross-validation scheme with different machine learning techniques. The scope of the study is to support therapists in osteoporosis prediction, avoiding unnecessary further bone testing.
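A minimal sketch of the two-stage pipeline (PCA features, then a classifier) with ten-fold cross-validation in scikit-learn; the data shapes, component count, and k are illustrative placeholders, not the study's settings.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data: one flattened bone-texture ROI per row; 0 = healthy, 1 = osteoporotic.
X = np.random.rand(58, 64 * 64)
y = np.arange(58) % 2

# Stage 1: PCA feature computation; stage 2: classification (k-NN here).
pipeline = make_pipeline(PCA(n_components=30), KNeighborsClassifier(n_neighbors=5))
print(cross_val_score(pipeline, X, y, cv=10).mean())  # ten-fold cross-validation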
Biometric identification methods assess characteristics of human behavior by identifying their different parameters. Gait recognition is an active biometric research topic with many security and surveillance applications, and it can also help in the early diagnosis of medical conditions such as Parkinson's disease. Psychological studies have concluded that people have a slight but substantial capability to distinguish individuals by their gait characteristics. There are different techniques to perform gait recognition, which can be achieved by analyzing data from either imagery or radar sensors. This particular research project involves the correct identification of a person from their gait, using images/video taken at different distances, viewing angles, and walking speeds. The CASIA Gait Recognition Dataset used in this project contains gait energy images (GEIs). These images are extracted from frame sequences of a walking subject, with the camera positioned relative to the subject at increments of 18 degrees. The lower part of the GEI is used in feature extraction, as it carries the most dynamic information. Gait signatures created from the gait energy images are used to train an artificial neural network model to correctly classify the subject. Two backpropagation algorithms are compared in terms of performance, with cross-entropy and ROC curves used as the performance criteria for both training algorithms. Our system performs very well in terms of minimization of cross-entropy and classification rate.
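For clarity, a gait energy image is the pixelwise mean of the aligned binary silhouettes over a walking sequence; a minimal sketch of the GEI and the lower-part feature described above (the fraction kept is an assumption).

import numpy as np

def gait_energy_image(silhouettes):
    """silhouettes: list of aligned binary HxW frames from one walking sequence.
    The pixelwise mean over the sequence is the gait energy image (GEI)."""
    return np.mean(np.stack(silhouettes).astype(float), axis=0)

def lower_part_feature(gei, fraction=0.5):
    # Keep the lower portion (legs), which carries most of the dynamic
    # information, and flatten it into a feature vector for the classifier.
    h = gei.shape[0]
    return gei[int(h * (1 - fraction)):, :].ravel()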
In recent years, developing surveillance systems for security has been one of the most active research fields across many applications. These systems are used to adjust, enhance, and improve security. Among them, the face recognition system is an efficient and very important tool for several applications, despite the existence of other biometric modalities such as hand geometry, iris scans, and fingerprints, because it is natural, non-intrusive, and inexpensive. For the past two decades, various face recognition methods have been proposed to reduce the amount of computation and improve the recognition rate. These methods can be categorized into three significant categories: local feature approaches, subspace learning approaches, and correlation filter approaches. In this paper, we discuss and compare some common face recognition algorithms. The objective of this work is to demonstrate the effectiveness and feasibility of the best methods for face recognition in terms of design, implementation, and application.
Object recognition and semantic segmentation have been the two most common problems of traditional scene understanding in the computer vision domain. Major breakthroughs have been reported in the last few years because of the increased use of deep learning, which offers a convincing alternative by learning problem-specific features on its own. In this paper, a summary of the most frequently used framework, convolutional neural networks (CNNs), is given. Accordingly, a categorization scheme is proposed to analyze the deep networks developed for image segmentation. Under this scheme, thirteen methods from the literature are reviewed, classified on the basis of how they perform the segmentation operation: semantic segmentation, instance segmentation, and hybrid approaches. These methods are reviewed from different aspects, such as their category, the novelty of their architecture, and their special features in contrast with traditional approaches. This review and analysis of segmentation approaches, which have provided outstanding results compared to ordinary systems, reveals that deep learning is increasingly becoming an important part of image segmentation, and that improvements in deep learning algorithms could resolve open computer vision problems.
Modified Forward Backward Linear Prediction (MFBLP) is an effective method for data dimensionality reduction; combined with eigenvector and eigenvalue techniques, significant improvements in signal isolation have been shown and discussed in previous notes on this technique. In the present work, a stochastic gradient descent technique is utilized to limit the dimensionality reduction of the MFBLP, and the results are compared with an application of the eigenvector/eigenvalue technique for the same purpose. Using a correlation metric, we discuss the measure of goodness of the new implementation of the MFBLP, its potential, and some of its applications. The processing approach is for active sensor systems and is discussed for comparison.
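For reference, a sketch of the classical forward-backward linear prediction system and a rank-limited solution via truncated SVD, which is where the eigen-based or SGD-based dimensionality limiting discussed above would act; the MFBLP modifications themselves are not reproduced here.

import numpy as np

def fblp_system(x, order):
    """Stack forward and conjugated/reversed backward prediction equations
    for a 1-D complex data record x (classical FBLP setup)."""
    N, M = len(x), order
    rows_f = [x[m:m + M][::-1] for m in range(N - M)]             # predicts x[m+M]
    rows_b = [np.conj(x[m + 1:m + 1 + M]) for m in range(N - M)]  # predicts conj(x[m])
    A = np.vstack([rows_f, rows_b])
    b = np.concatenate([x[M:], np.conj(x[:N - M])])
    return A, b

def solve_rank_limited(A, b, rank):
    # Truncated-SVD pseudo-inverse: keeping only the leading 'rank'
    # components is the dimensionality-limiting step.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.where(np.arange(len(s)) < rank, 1.0 / s, 0.0)
    return (Vt.conj().T * s_inv) @ (U.conj().T @ b)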
In machine learning, a good predictive model is one that generalizes well over future unseen data. In general, this problem is ill-posed. To mitigate it, a predictive model can be constructed by simultaneously minimizing an empirical error over training samples and controlling the complexity of the model. Thus, regularized least squares (RLS) was developed. RLS requires matrix inversion, which is expensive, and as such its "big data" applications can be adversely affected. To address this issue, we have developed an efficient machine learning algorithm for pattern recognition that approximates RLS. The algorithm does not require matrix inversion and achieves competitive performance against the RLS algorithm. It has been shown mathematically that RLS is a sound learning algorithm; therefore, a definitive statement about the relationship between the new algorithm and RLS lays a solid theoretical foundation for the new algorithm. A recent study shows that the spectral norm of the kernel matrix in RLS is tightly bounded above by the size of the matrix. This spectral norm becomes a constant when the training samples have independent centered sub-Gaussian coordinates; typical sub-Gaussian random vectors such as the standard normal and Bernoulli satisfy this assumption. Basically, each sample is drawn from a product distribution formed from some centered univariate sub-Gaussian distributions. These new results allow us to establish a bound between the new algorithm and RLS in finite samples and to show that the new algorithm converges to RLS in the limit. Experimental results are provided that validate the theoretical analysis and demonstrate the new algorithm to be very promising in solving "big data" classification problems.
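A minimal sketch contrasting exact kernel RLS with an inversion-free gradient-descent approximation of the same objective; this generic sketch is related to, but not identical to, the new algorithm described above.

import numpy as np

def rls_fit(K, y, lam):
    """Exact kernel RLS: solve (K + lam*I) alpha = y (costly for large n)."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

def rls_fit_inversion_free(K, y, lam, iters=500):
    """Approximate the same solution without inversion: gradient descent on
    J(a) = 0.5 * a^T (K + lam*I) a - y^T a, whose minimizer is the RLS
    solution. The step size is set from the spectral norm -- the quantity
    whose bound is discussed above."""
    A = K + lam * np.eye(K.shape[0])
    lr = 1.0 / np.linalg.norm(A, 2)
    a = np.zeros_like(y, dtype=float)
    for _ in range(iters):
        a -= lr * (A @ a - y)   # gradient of J at a
    return a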
Firefighters face a variety of life-threatening risks, including line-of-duty deaths, injuries, and exposure to hazardous substances. Support for reducing these risks is important. We built a partially occluded object reconstruction method on augmented reality glasses for first responders. We used a deep learning approach based on conditional generative adversarial networks to train associations between various images of flammable and hazardous objects and their partially occluded counterparts. Our system then reconstructed an image of a new flammable object. Finally, the reconstructed image was superimposed on the input image to provide "transparency". The system imitates human learning about the laws of physics through experience by learning the shapes of flammable objects and the characteristics of flames.
One of the major challenges in deep learning is retrieving sufficiently large labeled training datasets, which can be expensive and time consuming to collect. A unique approach to training segmentation is to use Deep Neural Network (DNN) models with a minimal amount of initially labeled training samples. The procedure involves creating synthetic data and using image registration to calculate affine transformations to apply to the synthetic data. The method takes a small dataset and generates a high-quality augmented-reality synthetic dataset with strong variance while maintaining consistency with real cases. Results illustrate segmentation improvements for various target features and increased average target confidence.
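A sketch of the registration-driven augmentation step under stated assumptions: an affine transform registering a synthetic image to a real case is estimated from matched keypoints (ORB is used here purely for illustration) and applied identically to the synthetic image and its label mask, keeping the pair consistent.

import cv2
import numpy as np

def register_and_augment(synthetic, synthetic_mask, real):
    # Estimate an affine warp from matched keypoints (illustrative choice).
    orb = cv2.ORB_create()
    k1, d1 = orb.detectAndCompute(synthetic, None)
    k2, d2 = orb.detectAndCompute(real, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches])
    dst = np.float32([k2[m.trainIdx].pt for m in matches])
    M, _ = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC)
    h, w = real.shape[:2]
    # Warp image and mask identically so the training pair stays consistent.
    return (cv2.warpAffine(synthetic, M, (w, h)),
            cv2.warpAffine(synthetic_mask, M, (w, h)))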
Traditionally, homodyne and heterodyne detection use a combination of a frequency mixer followed by a low-pass filter. Mixing two signals of frequencies f1 and f2 generates sum and difference frequency signals and their integer multiples. Both multiplicative heterodyne and phase-sensitive detection have been demonstrated optically using photorefractive four-wave mixing (FWM). The multiplicative characteristic of FWM is used for mixing, and the response time of the photorefractive medium is used for low-pass filtering. If one of the input beams is both spatially and temporally modulated using a wobbling rotating mirror, then depending on which mode is heterodyned, one can generate an orthogonal set of Bessel band-pass filters. This scheme can be integrated within parallel data acquisition systems for applications involving nondestructive testing.
Real-time holography has been used in numerous analog computing applications; most important were applications of real-time holography in four-wave mixing for performing correlation and convolution. Correlation algorithms and devices are among the important areas of pattern recognition. By controlling the beam ratio in four-wave mixing, it is possible to convert matched filtering to inverse filtering, and correlation and convolution to deconvolution. We demonstrate that it is possible to optically differentiate images simply by deconvolving them with the Fourier transform of a step function. A perspective on image differentiation and its limitations, along with applications of optical differential correlation, is discussed in this paper.
Under development by the Joint Collaborative Team on Video Coding (JCT-VC), the recent standardization proposal beyond HEVC, known as post-HEVC or Future Video Coding (FVC), is designed to improve coding efficiency compared to HEVC. In order to respond to users' requests for high-quality 4K and 8K videos, this standard is planned to cover a wide range of multimedia applications. Compared with its predecessor, the post-HEVC standard should be capable of providing a bit rate reduction of approximately 30% at the same subjective quality given by HEVC. However, this efficiency comes with additional complexity which must be taken into consideration. In fact, FVC encoders should be capable of trading off complexity and coding efficiency in order to meet real-time requirements, or non-real-time requirements when greater coding efficiency is needed. More specifically, the increased complexity incurred by the transform module should be reconsidered. This complexity stems from the introduction of a new approach called Adaptive Multiple Transform (AMT), involving five different transform types from the DCT/DST family for sizes ranging from 4x4 to 128x128. This can be a real issue, since the majority of image and video transmission and processing applications are subject to real-time constraints. Hence, to meet these requirements, a large amount of effort has been devoted to eliminating multiplication operations in order to ensure a low-complexity transform. In this context, the approximation technique can provide accurate estimations at low complexity. The contributions of this method can mainly be seen in the reduction of device utilization and power consumption, in addition to lower computational complexity. This gain comes from the elimination of multiplications, which require a large number of logical resources. Furthermore, it has been demonstrated in the literature that some approximate transforms can dramatically decrease the hardware resources with only slight degradation of video quality. Accordingly, the approximation technique is an efficient solution offering an adequate compromise between precision and complexity. This work benefits from the DCT-II approximations proposed in previous works in order to minimize the computation time of the transform module as well as its complexity. This idea follows from a statistical analysis of 4K videos at different quantization parameters, which shows that DCT-II utilization can reach 60% of all possible transform types. In the final version of the paper, we will provide extensive statistical studies on the compressed bitstream. We will then use these to detect bottlenecks in the transformations, in order to specify the computational elements that should be optimized and approximated.
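To make the approximation idea concrete, the sketch below replaces the exact 8-point DCT-II kernel with its signs (in the spirit of the classic signed-DCT approximation), so the forward transform needs only additions plus one diagonal scaling; the DCT-II approximations actually chosen in the paper may differ, and the sign matrix is only approximately orthogonal, which is part of the accuracy/complexity trade-off.

import numpy as np

N = 8
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
C[0, :] = np.sqrt(1.0 / N)              # exact orthonormal DCT-II matrix

T = np.sign(np.cos(np.pi * (2 * n + 1) * k / (2 * N))).astype(int)
S = np.eye(N) / np.sqrt(N)              # row normalisation (one diagonal scaling)

x = np.random.randn(N)
y_exact = C @ x                          # multiplications and additions
y_approx = S @ (T @ x)                   # additions only, then one scaling
print(np.linalg.norm(y_exact - y_approx))  # approximation error for this input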
In a remote scene environment consisting of multiple objects and miscellaneous scenarios, detecting an object of interest is a troublesome task, especially while tracking the object over successive frames. Numerous methods have been proposed over the years for efficient detection of an object of interest in a remote scene environment while discarding everything that is not of interest and is thus considered noise. It is still one of the most actively researched areas in the field of image processing and computer vision. In this paper, a method is proposed that not only detects a fixed-shape object in a remote scene environment but also tracks it over successive frames. An additional methodology is also proposed to detect the object under changes of viewing angle, e.g. in scenarios like rotation of the object, zooming, etc. First, the Scale Invariant Feature Transform (SIFT) is presented, which provides invariance to four parameters, i.e. rotation, translation, and zoom. In the second phase, ASIFT is used, which provides invariance to six parameters, i.e. translation, rotation, zoom, and the camera axis orientations. After both algorithms are presented, a detailed comparison between them is given. Detection of the object is performed with both SIFT and ASIFT, and a comparison is made based on feature points. Finally, tracking is performed using proximal gradient particle filters, which further strengthens the comparison between SIFT and ASIFT once the object being tracked changes its course of motion or zoom. Experimental results show which of the two approaches is more efficient.
Texture information has shown a significant contribution to pattern recognition in hyperspectral image (HSI) analysis. In this paper, a multi-component volumetric directional pattern (MC-VDP) is proposed for HSI classification. The original VDP operator extracts a three-dimensional texture feature from three consecutive bands by applying eight directional Kirsch filters to the raw intensity values. However, the local sign and local magnitude components, which are generated by a local difference sign-magnitude transform, are not incorporated before Kirsch masking. In this work, we first compute the local sign and local magnitude components, apply the VDP operator to them, and then combine them with the original VDP feature to form MC-VDP. By analyzing the local sign and local magnitude components, two volumetric texture features are obtained, namely VDP-Sign (VDP-S) and VDP-Magnitude (VDP-M). Thus, the MC-VDP operator is constituted of VDP-S, VDP-M, and the original VDP features. In detail, VDP-S and VDP-M preserve additional discriminant information to describe the volumetric local structures in HSI, and they can be readily fused since their schemes are constructed in the same fashion. Experimental results show that a fusion of VDP-S, VDP-M, and the original VDP coded maps provides more discriminant information, and thus better classification accuracy, than other popular spatial feature extraction methods.
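A minimal sketch of the local difference sign-magnitude transform feeding MC-VDP, shown for one pixel of one band; the VDP Kirsch filtering applied on top of these components is not reproduced here.

import numpy as np

OFFSETS = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]

def sign_magnitude_components(band, r, c):
    # Local differences d_p = g_p - g_c against the centre pixel,
    # split into a binary sign component and a magnitude component.
    centre = float(band[r, c])
    diffs = np.array([float(band[r + dr, c + dc]) - centre for dr, dc in OFFSETS])
    signs = (diffs >= 0).astype(int)    # local sign component
    magnitudes = np.abs(diffs)          # local magnitude component
    return signs, magnitudes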
Today, communications security, i.e. the discipline of preventing unauthorized interceptors from accessing telecommunications in an intelligible form while still delivering content to the intended recipients, is a major issue in our modern society. In this paper, attention is drawn to the importance and relevance of optical correlation techniques for detecting and tracking people. In order to be efficient, these techniques need pre- or post-processing steps to take the environmental conditions into account. The aim of this work is to improve the performance of the optical correlation method, based on a new decision process that reduces the false detection rate. To realize this, we propose a method using a VanderLugt correlator with a phase-only filter for face recognition, using two decision-making criteria based on the values of the peak-to-correlation energy and the energy distribution in different parts of the correlation plane. In the three-step algorithm, the first stage consists of dividing the correlation plane into nine equal sub-planes. In the second stage, the energy of each sub-plane is computed, while in the last stage the classification criterion is applied and the recognition rate is calculated. Numerous tests were performed using the Pointing Head Pose Image Database. They show the effectiveness of the method in terms of face recognition rate, without a pre-processing phase and with 0% false detections.
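A minimal sketch of the two decision values: a phase-only-filter correlation plane, its peak-to-correlation energy (one common definition of PCE), and the energy distribution over a 3x3 split of the plane.

import numpy as np

def pof_correlation(scene, reference):
    R = np.fft.fft2(reference)
    pof = np.conj(R) / (np.abs(R) + 1e-12)       # phase-only filter
    return np.abs(np.fft.ifft2(np.fft.fft2(scene) * pof)) ** 2

def decision_features(corr_plane):
    pce = corr_plane.max() / corr_plane.sum()    # peak-to-correlation energy
    h, w = corr_plane.shape
    # Energy of each of the nine equal sub-planes, normalised by total energy.
    subplane_energy = [corr_plane[i*h//3:(i+1)*h//3, j*w//3:(j+1)*w//3].sum()
                       for i in range(3) for j in range(3)]
    return pce, np.array(subplane_energy) / corr_plane.sum()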
Naïve Bayesian belief network modeling is applied to direct numerically simulated imagery of oscillatory sediment-laden flow to illustrate the feasibility of creating a system model that captures the statistical interrelationship of the surface-layer sediment concentration, pressure, and vertical velocity eddy scales with the sub-surface Reynolds stress. From a prognostic reasoning viewpoint, preliminary model results suggest that large sediment concentration eddy scales may result from the application of large positive Reynolds stress. However, from a diagnostic reasoning viewpoint, initial results suggest that robustly inferring sub-surface boundary layer stress from surface sediment concentration eddy scales may be a difficult task. The model formalism used allows flow structure at depth to be statistically characterized from observations taken across a surface boundary layer, making the results relevant to image analysis at the air-sea interfacial boundary layer in large-scale coastal and riverine systems.
Synthetic aperture radar (SAR) benefits from persistent imaging capabilities that are not reliant on factors such as weather or time of day. One area that may benefit from readily available imaging capabilities is the detection and assessment of road damage resulting from disasters such as earthquakes, sinkholes, or mudslides. This work investigates the performance of a pre-screener for an automatic detection system used to identify locations and quantify the severity of road damage present in SAR imagery. The proposed pre-screener comprises two components: advanced image processing and classification. Image processing is used to condition the data, removing non-pertinent information from the imagery, which helps the classifier achieve better performance. Specifically, we utilize shearlets; these are powerful filters that capture anisotropic features with good localization and high directional sensitivity. Classification is achieved through the use of a convolutional neural network, and performance is reported as classification accuracy. Experiments are conducted on satellite SAR imagery; specifically, we investigate Sentinel-1 imagery containing both damaged and non-damaged roads.
Tracking a moving object in a video sequence is a critical problem in wide-area surveillance system applications. Our proposed system provides high-resolution motion imagery surveillance at a low frame rate over a city-sized region, within which multiple moving objects are tracked simultaneously in real time. The system faces multiple challenges, including significant camera motion, strong parallax, tracking many moving objects with few pixels per target, single-channel data, and a low video frame rate. In this work, we propose a new method for parallax rectification and stabilization using a wavelet-based scale-invariant feature transform (SIFT) flow technique. The results were compared with existing state-of-the-art methods. Various challenges associated with the detection and tracking of multiple objects in wide-area surveillance systems are discussed.
In this paper, we benchmark five state-of-the-art trackers on aerial platform videos: the Multi-Domain Convolutional Neural Network (MDNET) tracker, which won the VOT2015 tracking challenge; the Fully Convolutional Neural Network Tracker (FCNT); the Spatially Regularized Correlation Filter (SRDCF) tracker; the Continuous Convolution Operator Tracker (CCOT), which won the VOT2016 challenge; and the Tree-structure Convolutional Neural Network (TCNN) tracker. We assess performance in terms of both tracking accuracy and processing speed on two sets of videos: a subset of the OTB dataset in which the cameras are located at a high vantage point, and a new dataset of aerial videos captured by a moving platform. Our results indicate that these trackers performed as expected on the videos in the OTB subset; however, tracker performance degraded significantly on aerial videos due to target size, camera motion, and target occlusions. The CCOT tracker yielded the best overall performance in terms of accuracy, while the SRDCF tracker was the fastest.