Explanations are generated to accompany a model decision, indicating the features of the input data that were most relevant to that decision. Explanations are important not only for understanding the decisions of deep neural networks, which despite their huge success in multiple domains operate largely as opaque black boxes, but also for other model classes such as gradient boosted decision trees. In this work, we propose methods, using both Bayesian and non-Bayesian approaches, to augment explanations with uncertainty scores. We believe that uncertainty-augmented saliency maps can help calibrate the trust between human analysts and machine learning models.
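The following is a minimal sketch of how a saliency map can carry an uncertainty score. It illustrates the idea under stated assumptions and is not the authors' method. Consider a logistic-regression model whose weights have an assumed Gaussian approximate posterior with hypothetical mean `w_mean` and standard deviation `w_std`. The hypothetical function `saliency_with_uncertainty` samples weights and computes the input-gradient saliency of the predicted probability for each sample. It then reports the per-feature mean as the saliency and the per-feature standard deviation as its uncertainty.

```python
import numpy as np

def saliency_with_uncertainty(x, w_mean, w_std, b=0.0, n_samples=200, seed=0):
    """Gradient saliency of p = sigmoid(w . x + b) with respect to x, plus a
    per-feature uncertainty from an assumed Gaussian posterior over w."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Monte Carlo samples from the (hypothetical) posterior N(w_mean, diag(w_std**2)).
    w = rng.normal(w_mean, w_std, size=(n_samples, x.size))
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))      # predicted probability per sample
    grads = (p * (1.0 - p))[:, None] * w         # dp/dx = p (1 - p) w for each sample
    # Saliency map = mean gradient; uncertainty score = its spread across samples.
    return grads.mean(axis=0), grads.std(axis=0)

saliency, uncertainty = saliency_with_uncertainty(
    x=[1.0, -2.0, 0.5], w_mean=[0.8, 0.1, -1.2], w_std=[0.05, 0.5, 0.05])
print(saliency, uncertainty)
```

In the example, the second feature has the widest assumed posterior, so its saliency tends to show the largest spread. A non-Bayesian variant could replace the posterior samples with independently trained ensemble members, for example models fit on bootstrap resamples.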
Face images are an important source of information for biometric recognition and intelligence gathering. While face
recognition research has made significant progress over the past few decades, recognition of faces at extended ranges is
still highly problematic. Recognition of a low-resolution probe face image from a gallery database, typically containing
high-resolution facial imagery, results in lower performance than that of traditional face recognition techniques. Learning and
super-resolution based approaches have been proposed to improve face recognition at extended ranges; however, the
resolution threshold for face recognition has not been examined extensively. Establishing a threshold resolution
corresponding to the theoretical and empirical limitations of low resolution face recognition will allow algorithm
developers to avoid focusing on improving performance where no distinguishable information for identification exists in
the acquired signal. This work examines the intrinsic dimensionality of facial signatures and seeks to estimate a lower
bound for the size of a face image required for recognition. We estimate a lower bound for face signatures in the visible
and thermal spectra by conducting eigenanalysis using principal component analysis (PCA) (i.e., the eigenfaces approach).
We seek to estimate the intrinsic dimensionality of facial signatures, in terms of reconstruction error, by maximizing the
amount of variance retained in the reconstructed dataset while minimizing the number of reconstruction components.
Extending this approach, we also examine the identification error to estimate the dimensionality lower bound for low-resolution
to high-resolution (LR-to-HR) face recognition performance. Two multimodal face datasets are used for this
study to evaluate the effects of dataset size and diversity on the underlying intrinsic dimensionality: 1) 50-subject
NVESD face dataset (containing visible, MWIR, LWIR face imagery) and 2) 119-subject WSRI face dataset (containing
visible and MWIR face imagery).
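The variance-retention criterion can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code. It assumes the face images are registered and flattened into the rows of `X`, and that a variance threshold `retained` (a hypothetical parameter) defines the intrinsic dimensionality. `components_for_variance` returns the smallest number of eigenfaces that reaches that threshold. `reconstruction_error` reports the mean squared error of rebuilding the faces from the top-k eigenfaces. The identification-error analysis is not shown.

```python
import numpy as np

def components_for_variance(X, retained=0.95):
    """Smallest k whose top-k principal components retain `retained` of the variance.
    X: (n_images, n_pixels), one flattened, registered face image per row."""
    Xc = X - X.mean(axis=0)                      # center the data (mean face removed)
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, largest first
    cum = np.cumsum(s**2) / np.sum(s**2)         # cumulative explained-variance ratio
    return int(np.searchsorted(cum, retained) + 1), cum

def reconstruction_error(X, k):
    """Mean squared error when each face is rebuilt from the top-k eigenfaces."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    V = Vt[:k].T                                 # (n_pixels, k) eigenface basis
    X_hat = (X - mu) @ V @ V.T + mu
    return float(np.mean((X - X_hat) ** 2))

rng = np.random.default_rng(0)
faces = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 64))   # synthetic rank-3 "faces"
k, _ = components_for_variance(faces, retained=0.99)
print(k, reconstruction_error(faces, k))
```

On the synthetic rank-3 data, three components reconstruct the data essentially exactly. On real face images the cumulative curve typically rises more gradually, so the chosen threshold largely determines the estimated dimensionality.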
KEYWORDS: Surveillance, Video, Video surveillance, Data modeling, Binary data, Video compression, Process modeling, Machine vision, Computer vision technology, Statistical analysis
Surveillance cameras have become ubiquitous in society, used to monitor areas such as residential blocks, city streets, university campuses, industrial sites, and government installations. Surveillance footage, especially of public areas, is frequently streamed online in real time, providing a wealth of data for computer vision research. The focus of this work is the detection of anomalous patterns in surveillance video data recorded over periods of months to years. We propose an anomaly detection technique based on support vector data description (SVDD) to detect anomalous patterns in video footage of a university campus scene recorded over a period of months. SVDD is a kernel-based anomaly detection technique that models the normalcy data in a high-dimensional feature space using an optimal enclosing hypersphere; samples that lie outside this boundary are detected as outliers or anomalies. Two types of anomaly detection are conducted in this work: track-level analysis to determine which individual tracks are anomalous, and day-level analysis using aggregate scene-level feature maps to determine which days exhibit anomalous activity. Experimentation and evaluation are conducted using a scene from the Global Webcam Archive.
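The following is a minimal NumPy sketch of SVDD, not the authors' implementation. It assumes a Gaussian RBF kernel, for which K(x, x) = 1. It solves the dual problem: maximize Σ α_i K(x_i, x_i) − Σ α_i α_j K(x_i, x_j) subject to Σ α_i = 1 and 0 ≤ α_i ≤ C. The solver is a simple projected-gradient loop; a production system would use a dedicated QP or SMO solver. Three choices here are assumptions:

- the parameterization C = 1/(νn), where ν is an upper bound on the fraction of training samples left outside the sphere;
- the value of `gamma`;
- the 2D training data, which stand in for the track-level or day-level feature vectors.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # K[i, j] = exp(-gamma * ||A_i - B_j||^2)
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def project_capped_simplex(v, C, iters=60):
    # Euclidean projection onto {a : 0 <= a_i <= C, sum(a) = 1} by bisection on a shift tau.
    lo, hi = v.min() - C, v.max()
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, C).sum() > 1.0:
            lo = tau
        else:
            hi = tau
    return np.clip(v - 0.5 * (lo + hi), 0.0, C)

class SVDD:
    """Minimal SVDD (RBF kernel) solved by projected gradient descent on the dual."""

    def __init__(self, nu=0.05, gamma=0.05, iters=500):
        self.nu, self.gamma, self.iters = nu, gamma, iters

    def fit(self, X):
        self.X = np.asarray(X, dtype=float)
        n = len(self.X)
        C = 1.0 / (self.nu * n)                  # nu: upper bound on the outlier fraction
        K = rbf_kernel(self.X, self.X, self.gamma)
        step = 0.5 / np.linalg.eigvalsh(K)[-1]   # 1 / Lipschitz constant of grad = 2 K a
        a = np.full(n, 1.0 / n)
        for _ in range(self.iters):
            # RBF gives K_ii = 1, so the dual reduces to: minimize a^T K a on the capped simplex.
            a = project_capped_simplex(a - step * 2.0 * (K @ a), C)
        self.alpha, self.aKa = a, float(a @ K @ a)
        d2 = self._dist2(K)
        # Boundary support vectors (0 < alpha < C) lie on the sphere and define R^2.
        on_sphere = (a > 1e-8) & (a < C * (1.0 - 1e-6))
        if on_sphere.any():
            self.R2 = np.average(d2[on_sphere], weights=a[on_sphere])
        else:
            self.R2 = np.quantile(d2, 1.0 - self.nu)
        return self

    def _dist2(self, K_xz):
        # Squared RKHS distance to the center: k(z, z) - 2 sum_i a_i k(x_i, z) + a^T K a.
        return 1.0 - 2.0 * (self.alpha @ K_xz) + self.aKa

    def decision_function(self, Z):
        """Positive: inside the hypersphere (normal). Negative: outlier / anomaly."""
        K_xz = rbf_kernel(self.X, np.asarray(Z, dtype=float), self.gamma)
        return self.R2 - self._dist2(K_xz)

rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 2))              # stand-in for "normal" feature vectors
model = SVDD(nu=0.05, gamma=0.05).fit(X_train)
print(model.decision_function([[0.0, 0.0], [6.0, 6.0]]))
```

Because `gamma` sets the kernel width, a value that is too large relative to the spread of the data can carve holes inside the normal region. In practice it is normally tuned on held-out data.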
Recognizing faces acquired in the thermal spectrum from a gallery of visible face images is a desired capability for the
military and homeland security, especially for nighttime surveillance and intelligence gathering. However, thermal-to-visible
face recognition is a highly challenging problem, due to the large modality gap between thermal and visible
imaging. In this paper, we propose a thermal-to-visible face recognition approach based on multiple kernel learning
(MKL) with support vector machines (SVMs). We first subdivide the face into non-overlapping spatial regions or
blocks using a method based on coalitional game theory. For comparison purposes, we also investigate uniform spatial
subdivisions. Following this subdivision, histogram of oriented gradients (HOG) features are extracted from each block
and utilized to compute a kernel for each region. We apply sparse multiple kernel learning (SMKL), which is an MKL-based
approach that learns a set of sparse kernel weights, as well as the decision function of a one-vs-all SVM classifier
for each of the subjects in the gallery. We also apply equal kernel weights (non-sparse) and obtain one-vs-all SVM
models for the same subjects in the gallery. Only visible images of each subject are used for MKL training, while
thermal images are used as probe images during testing. With subdivisions generated by game theory, we achieved a
Rank-1 identification rate of 50.7% for SMKL and 93.6% for equal kernel weighting using a multimodal dataset of 65
subjects. With uniform subdivisions, we achieved a Rank-1 identification rate of 88.3% for SMKL and 92.7% for equal
kernel weighting.
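The kernel-combination and scoring steps can be sketched as below, under assumptions not stated in the abstract. Each block's HOG features are taken as already extracted, each block uses an RBF kernel with a hypothetical width `gamma`, and the helper names are illustrative.

- `combined_kernel` forms K = Σ_m β_m K_m. Equal kernel weighting corresponds to passing identical weights. SMKL would instead learn sparse β jointly with the one-vs-all SVMs, which is not shown.
- `rank1_rate` computes the Rank-1 identification rate from a probe-by-subject score matrix.

```python
import numpy as np

def uniform_blocks(face, rows, cols):
    """Split a 2D face image into rows x cols non-overlapping blocks
    (trailing pixels that do not divide evenly are dropped)."""
    h, w = face.shape[0] // rows, face.shape[1] // cols
    return [face[r * h:(r + 1) * h, c * w:(c + 1) * w]
            for r in range(rows) for c in range(cols)]

def combined_kernel(feats_a, feats_b, weights, gamma=1.0):
    """K = sum_m beta_m * K_m with one RBF kernel per block.
    feats_a[m]: (n_a, d_m) features of block m; feats_b[m]: (n_b, d_m)."""
    beta = np.asarray(weights, dtype=float)
    beta = beta / beta.sum()                     # kernel weights on the simplex
    K = np.zeros((len(feats_a[0]), len(feats_b[0])))
    for b, A, B in zip(beta, feats_a, feats_b):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        K += b * np.exp(-gamma * d2)             # blocks with b == 0 contribute nothing
    return K

def rank1_rate(scores, probe_labels, gallery_labels):
    """scores[i, j]: classifier score of probe i for gallery subject j (higher is better).
    Rank-1 rate = fraction of probes whose top-scoring subject is the true one."""
    predicted = np.asarray(gallery_labels)[np.argmax(scores, axis=1)]
    return float(np.mean(predicted == np.asarray(probe_labels)))
```

For example, `combined_kernel(feats, feats, weights=np.ones(len(feats)))` gives the equal-weighting kernel for a list `feats` of per-block feature arrays.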
Correlative interferometric imaging from sensor arrays relies on reconstructing the source intensity from the cross-correlations of near-field or far-field measurements across multiple sensor elements. The reconstruction problem is often ill-posed, resulting in unrealistic reconstructions of signals and images. This paper examines the consequences of using extremal entropy metrics in the reconstruction. These consequences range from inducing sparsity to closer conformance of the reconstruction boundaries to the support of the actual signal source. Situations involving far-field interferometric imaging of extended sources are considered, and experimental results are provided.
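A minimal one-dimensional sketch of entropy-regularized reconstruction follows. It illustrates the idea under stated assumptions and is not the paper's algorithm. It assumes the far-field measurements are samples of the Fourier transform of the source intensity (the van Cittert-Zernike relation) at a few spatial frequencies. It then minimizes ||A I − V||² + λ Σ I log I over I ≥ ε by projected gradient descent. Minimizing Σ I log I maximizes entropy. The values of `lam` and `eps` and the sampled frequencies are hypothetical.

```python
import numpy as np

def fourier_sampling_matrix(freqs, n):
    """Rows map a length-n intensity profile to its Fourier coefficients at `freqs`
    (assumed far-field model: visibilities sample the source's Fourier transform)."""
    return np.exp(-2j * np.pi * np.outer(freqs, np.arange(n)) / n)

def objective(I, A, V, lam):
    r = A @ I - V
    return float(np.vdot(r, r).real + lam * np.sum(I * np.log(I)))

def maxent_reconstruct(A, V, lam=1e-3, eps=1e-3, iters=500):
    """Projected gradient descent on ||A I - V||^2 + lam * sum(I log I) over I >= eps.
    Minimizing sum(I log I) maximizes entropy, i.e. a maximum-entropy regularizer."""
    # Step 1/L, with L bounding the gradient's Lipschitz constant on {I >= eps}.
    L = 2.0 * np.linalg.norm(A, 2) ** 2 + lam / eps
    I = np.ones(A.shape[1])
    for _ in range(iters):
        grad = 2.0 * (A.conj().T @ (A @ I - V)).real + lam * (np.log(I) + 1.0)
        I = np.maximum(I - grad / L, eps)        # gradient step, then project onto I >= eps
    return I

n = 32
truth = np.full(n, 1e-3)
truth[10:14] = 1.0                               # a small extended source
A = fourier_sampling_matrix(np.arange(8), n)     # only 8 low spatial frequencies measured
V = A @ truth
I_hat = maxent_reconstruct(A, V)
```

Reversing the sign of the entropy term gives a minimum-entropy objective. That problem is non-convex, so this loop could then stop at a local minimum rather than the global one.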
In this paper, we present our implementation of a cascaded Histogram of Oriented Gradients (HOG) based pedestrian detector. Most human detection algorithms can be implemented as a cascade of classifiers to decrease computation time while maintaining approximately the same performance. Although cascaded versions of Dalal and Triggs's HOG detector already exist, we aim to provide a more detailed explanation of an implementation than is currently available. We also use Asymmetric Boosting instead of AdaBoost to train the cascade stages, and we show that this reduces the number of weak classifiers needed per stage. We present the results of our detector on the INRIA pedestrian detection dataset and compare them to Dalal and Triggs's results.
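The early-rejection logic of a cascade can be sketched as follows. This is a generic illustration, not the authors' detector. It assumes each stage is a boosted sum of decision stumps compared against a stage threshold. The classes `Stump` and `Stage` are hypothetical, as are the feature indices, thresholds and weights in the example. How Asymmetric Boosting chooses the stump weights and stage thresholds is not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

@dataclass
class Stump:
    feature: int       # index into the window's feature vector (e.g. one HOG block value)
    threshold: float
    polarity: int      # +1: fires when x >= threshold; -1: fires when x < threshold
    alpha: float       # vote weight assigned by the booster

    def vote(self, x: Sequence[float]) -> float:
        v = x[self.feature]
        fires = v >= self.threshold if self.polarity > 0 else v < self.threshold
        return self.alpha if fires else -self.alpha

@dataclass
class Stage:
    stumps: List[Stump]
    threshold: float   # stage threshold, typically set low so few pedestrians are rejected

def cascade_predict(stages: List[Stage], x: Sequence[float]) -> Tuple[bool, int]:
    """Return (is_pedestrian, number of stages evaluated)."""
    for i, stage in enumerate(stages, start=1):
        score = sum(s.vote(x) for s in stage.stumps)
        if score < stage.threshold:
            return False, i        # early rejection: the remaining stages are skipped
    return True, len(stages)

stages = [Stage([Stump(0, 0.5, +1, 1.0)], threshold=0.0),
          Stage([Stump(1, 0.2, +1, 1.0), Stump(2, 0.0, -1, 0.5)], threshold=0.0)]
print(cascade_predict(stages, [0.9, 0.5, -1.0]))   # (True, 2)
```

Most background windows fail an early, cheap stage, which is where a cascade saves computation.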
In this paper, an unsupervised pedestrian detection algorithm is proposed. An input image is first divided into overlapping detection windows in a sliding fashion, and Histogram of Oriented Gradients (HOG) features are collected over each window using non-overlapping cells. A distance metric is used to determine the distance between the histograms of corresponding cells in each detection window and the average pedestrian HOG template (determined a priori). These distances over a group of cells are concatenated to obtain the feature vector pertaining to a block of cells. The feature vectors over overlapping blocks of cells are concatenated to form the distance feature vector of a detection window. Each window provides a data sample, and the data samples extracted from the whole image are then modeled as a normalcy class using Support Vector Data Description (SVDD). The benefit of using SVDD to model the normalcy class is that the fraction of outliers can be controlled by setting an upper limit on the permissible outliers during the modeling process. Assuming that most of the image is covered by background, the outliers detected during the modeling of the normalcy class can be hypothesized to be detection windows that contain pedestrians. Detections are obtained at different scales to account for the different sizes of pedestrians. The final pedestrian detections are generated by applying non-maximum suppression to all the detections at all scales. The system is tested on the INRIA pedestrian dataset, and its performance is analyzed with respect to accuracy and detection rate.
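The distance-feature construction and the final non-maximum suppression step can be sketched as follows. The abstract does not name the distance metric or the block layout. The chi-square histogram distance and the 2 × 2-cell blocks with a one-cell stride are therefore assumptions. The SVDD modeling that turns these vectors into outlier (pedestrian) scores is not shown, and the box scores passed to `nms` are hypothetical.

```python
import numpy as np

def chi2(h, t, eps=1e-10):
    # Chi-square histogram distance along the last axis (an assumed choice of metric).
    return 0.5 * np.sum((h - t) ** 2 / (h + t + eps), axis=-1)

def distance_feature_vector(cell_hists, template, block=2):
    """cell_hists, template: (cells_y, cells_x, bins) HOG cell histograms of one window
    and of the average pedestrian template. Blocks of block x block cells overlap with a
    one-cell stride; each block contributes its cells' distances to the feature vector."""
    d = chi2(cell_hists, template)                  # (cells_y, cells_x) per-cell distances
    cy, cx = d.shape
    return np.concatenate([d[y:y + block, x:x + block].ravel()
                           for y in range(cy - block + 1)
                           for x in range(cx - block + 1)])

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; boxes: (n, 4) as x1, y1, x2, y2."""
    boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order, keep = np.argsort(scores)[::-1], []
    while order.size:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        w = np.clip(np.minimum(boxes[i, 2], boxes[rest, 2]) - np.maximum(boxes[i, 0], boxes[rest, 0]), 0, None)
        h = np.clip(np.minimum(boxes[i, 3], boxes[rest, 3]) - np.maximum(boxes[i, 1], boxes[rest, 1]), 0, None)
        iou = w * h / (areas[i] + areas[rest] - w * h)
        order = rest[iou <= iou_thresh]             # drop boxes overlapping the kept one
    return keep
```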
Airborne laser scanning light detection and ranging (LiDAR) systems are used for remote sensing of topography and bathymetry. The most common data collection technique used in LiDAR systems employs linear-mode scanning. The resulting scan data form a non-uniformly sampled 3D point cloud. To interpret and further process the 3D point cloud data, these raw data are usually converted to digital elevation models (DEMs). To obtain DEMs in a uniform, upsampled raster format, the elevation information from the available non-uniform 3D point cloud data is mapped onto uniform grid points. After the mapping is done, the grid points with missing elevation information are filled by using interpolation techniques. In this paper, a partial differential equation (PDE)-based approach is proposed to perform the interpolation and to upsample the 3D point cloud onto a uniform grid. Owing to the desirable effects of using higher-order PDEs, smoothness is maintained over homogeneous regions, while sharp edge information in the scene is well preserved. The proposed algorithm reduces the draping effects near the edges of distinctive objects in the scene. Such draping effects are commonly associated with existing point cloud rendering algorithms. Simulation results are presented in this paper to illustrate the advantages of the proposed algorithm.
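The two steps, mapping points to the grid and filling empty cells with a PDE, can be sketched as follows. The abstract does not specify its higher-order PDE. This minimal NumPy illustration therefore uses an assumed fourth-order smoothness energy: it minimizes ||L u||² for the grid Laplacian L with the known cells held fixed. Unlike the method described above, this generic fill does not by itself preserve sharp edges. It also uses dense matrices for clarity, so it suits only small grids; a real DEM would need a sparse solver.

```python
import numpy as np

def grid_points(x, y, z, cell, shape):
    """Average the elevations of the points falling in each cell of a (rows, cols) grid.
    x, y are assumed to be offsets from the grid origin. Empty cells are NaN."""
    ix, iy = np.floor(x / cell).astype(int), np.floor(y / cell).astype(int)
    ok = (ix >= 0) & (ix < shape[1]) & (iy >= 0) & (iy < shape[0])
    sums, counts = np.zeros(shape), np.zeros(shape)
    np.add.at(sums, (iy[ok], ix[ok]), z[ok])
    np.add.at(counts, (iy[ok], ix[ok]), 1.0)
    known = counts > 0
    grid = np.full(shape, np.nan)
    grid[known] = sums[known] / counts[known]
    return grid, known

def grid_laplacian(h, w):
    """Graph Laplacian of the 4-connected h x w grid (free, Neumann-like borders)."""
    idx = np.arange(h * w).reshape(h, w)
    L = np.zeros((h * w, h * w))
    pairs = list(zip(idx[:, :-1].ravel(), idx[:, 1:].ravel())) + \
        list(zip(idx[:-1, :].ravel(), idx[1:, :].ravel()))
    for i, j in pairs:
        L[[i, j], [i, j]] += 1      # degree terms on the diagonal
        L[i, j] -= 1
        L[j, i] -= 1
    return L

def fill_fourth_order(grid, known):
    """Fill empty cells by minimizing ||L u||^2 with the known cells held fixed. The
    optimality condition, (L^T L u)_i = 0 at every empty cell i, is a discrete
    fourth-order (biharmonic-type) equation. Needs at least one known cell."""
    B = grid_laplacian(*grid.shape)
    B = B.T @ B
    k, u = known.ravel(), ~known.ravel()
    vals = grid.ravel().copy()
    vals[u] = np.linalg.solve(B[np.ix_(u, u)], -B[np.ix_(u, k)] @ vals[k])
    return vals.reshape(grid.shape)
```

A constant set of known elevations is filled with that same constant, since a constant surface has zero energy.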
In this paper, a Support Vector Machine (SVM) based method to jointly exploit spectral and spatial information
from hyperspectral images to improve classification performance is presented. In order to optimally exploit this
joint information, we propose a novel idea: embedding a local distribution of the input hyperspectral data
into a Reproducing Kernel Hilbert Space (RKHS). A Hilbert space embedding called the mean map is utilized
to map a group of neighboring pixels of a hyperspectral image into the RKHS and then calculate the empirical
mean of the mapped points in the RKHS. SVM-based classification performed on the mean-mapped points can
fully exploit the spectral information as well as ensure spatial continuity among neighboring pixels. The proposed
technique showed significant improvement over the existing composite kernels on two hyperspectral image data
sets.
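The mean-map kernel between two pixel groups reduces to an average of pairwise kernel values. This is because the inner product of two empirical mean embeddings is ⟨μ_P, μ_Q⟩ = (1/(|P||Q|)) Σ_{p∈P} Σ_{q∈Q} k(p, q). The sketch below assumes an RBF base kernel and a square (2r+1) × (2r+1) neighborhood. Both are illustrative choices rather than the paper's settings. The resulting matrix could be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.

```python
import numpy as np

def rbf(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def neighborhood(cube, r, c, radius=1):
    """Spectra of the pixels in the (2*radius+1)^2 window around pixel (r, c),
    truncated at the image border. cube: (rows, cols, bands)."""
    r0, r1 = max(r - radius, 0), min(r + radius + 1, cube.shape[0])
    c0, c1 = max(c - radius, 0), min(c + radius + 1, cube.shape[1])
    return cube[r0:r1, c0:c1].reshape(-1, cube.shape[2])

def mean_map_kernel(groups_a, groups_b, gamma=1.0):
    """K[i, j] = <mu_i, mu_j> in the RKHS, where mu is the empirical mean embedding of
    a pixel group; this equals the average of k(p, q) over all pairs p in P_i, q in Q_j."""
    K = np.empty((len(groups_a), len(groups_b)))
    for i, P in enumerate(groups_a):
        for j, Q in enumerate(groups_b):
            K[i, j] = rbf(P, Q, gamma).mean()
    return K
```

With `radius=0`, each group holds a single pixel and the mean-map kernel reduces to the ordinary pixel-wise RBF kernel.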
Non-O157:H7 Shiga toxin-producing Escherichia coli (STEC) strains such as O26, O45, O103, O111, O121 and O145
are recognized as serious causes of outbreaks of human illness due to their toxicity. Conventional microbiological methods
for cell counting are laborious and take a long time to produce results. Since optical detection methods are promising for real-time,
in-situ foodborne pathogen detection, an acousto-optical tunable filter (AOTF)-based hyperspectral microscopic
imaging (HMI) method has been developed for identifying pathogenic bacteria because of its capability to resolve
both the spatial and spectral characteristics of each bacterial cell from microcolony samples. Using the AOTF-based HMI
method, 89 contiguous spectral images could be acquired within approximately 30 seconds with a 250 ms exposure time.
In this study, we successfully developed a protocol for live-cell immobilization on glass slides to acquire
quality spectral images from STEC bacterial cells using the modified dry method. Among the contiguous spectral
images between 450 and 800 nm, the intensities of the spectral images at 458, 498, 522, 546, 570, 586, 670 and 690 nm were
distinctive for STEC bacteria. With two different classification algorithms, Support Vector Machine (SVM) and Sparse
Kernel-based Ensemble Learning (SKEL), STEC serotype O45 could be classified with 92% detection accuracy.
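Selecting the reported wavelengths from the 89-band stack and forming per-cell features can be sketched as follows. The band layout is an assumption. The abstract gives only the band count and the approximate 450 to 800 nm range; 4 nm steps starting at 450 nm are consistent with all eight listed wavelengths. Cell segmentation is assumed to be given as a label image. The resulting feature vectors would then be passed to a classifier such as an SVM; the classifiers themselves are not shown.

```python
import numpy as np

# Assumed band layout: 89 bands, 4 nm apart, starting at 450 nm.
BAND_CENTERS = 450.0 + 4.0 * np.arange(89)
KEY_WAVELENGTHS = [458, 498, 522, 546, 570, 586, 670, 690]   # distinctive bands reported above

def band_indices(centers, targets):
    """Index of the band whose center wavelength is closest to each target (nm)."""
    centers = np.asarray(centers, dtype=float)
    return [int(np.argmin(np.abs(centers - t))) for t in targets]

def cell_features(cube, labels, bands):
    """Mean intensity of each segmented cell at the selected bands.
    cube: (rows, cols, n_bands); labels: (rows, cols) ints, 0 = background."""
    ids = [int(i) for i in np.unique(labels) if i != 0]
    feats = np.array([cube[labels == i][:, bands].mean(axis=0) for i in ids])
    return feats, ids
```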
KEYWORDS: 3D modeling, Buildings, 3D image processing, Image segmentation, LIDAR, Video, Visualization, Information visualization, Data modeling, Imaging systems
In this paper, a semiautomated system for modeling 3D objects, especially buildings, from aerial video of a semi-urban scene is presented. First, the video frames are preprocessed to minimize the rotational effects of camera motion. The 3D translational coordinates of the sensor are used to stitch the video frames into nadir and stereo mosaics. The features extracted from the stereo mosaics, such as elevation, edges and corners, visual entropy, and color information, are employed in a Bayesian framework to identify the 3D objects in the scene, such as buildings and trees. The initial 3D building models are further optimized by projecting them onto individual video frames. A novel method has been designed for setting the input parameters of the vision algorithms required for feature extraction, using data-driven probabilistic inference in Bayesian networks. This method automates the 3D object identification process and eliminates the need for manual intervention. Improvements that can increase the accuracy of the 3D models when Lidar data are fused with aerial video during the object identification process are also discussed.
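One way to combine such features in a Bayesian framework is a naive Bayes model, the simplest Bayesian network, in which the features are conditionally independent given the object class. The sketch below is only an illustration. The abstract does not describe the network structure. The feature names, the class-conditional Gaussian parameters in `PARAMS` and the priors are all hypothetical values chosen for the example.

```python
import math

# Hypothetical class-conditional Gaussians (mean, std) per feature; illustrative only.
PARAMS = {
    "building": {"elevation": (8.0, 3.0), "entropy": (3.0, 1.0), "greenness": (0.1, 0.1)},
    "tree":     {"elevation": (6.0, 3.0), "entropy": (6.0, 1.0), "greenness": (0.6, 0.15)},
    "ground":   {"elevation": (0.0, 0.5), "entropy": (4.0, 1.5), "greenness": (0.3, 0.2)},
}
PRIORS = {"building": 0.3, "tree": 0.3, "ground": 0.4}   # hypothetical priors

def log_gauss(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))

def posterior(features):
    """Naive Bayes posterior over object classes for one segment's feature dict."""
    logp = {c: math.log(PRIORS[c]) + sum(log_gauss(features[f], *PARAMS[c][f]) for f in features)
            for c in PRIORS}
    m = max(logp.values())
    unnorm = {c: math.exp(v - m) for c, v in logp.items()}   # subtract max for stability
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

print(posterior({"elevation": 9.0, "entropy": 2.5, "greenness": 0.05}))
```

For the tall, low-entropy, non-green segment in the example, the building class receives nearly all of the posterior mass under these made-up parameters.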
KEYWORDS: Hyperspectral imaging, Niobium, Detection and tracking algorithms, Target detection, Sensors, Data modeling, Digital imaging, Roads, Image classification, Binary data
In this paper, sparse kernel-based ensemble learning for hyperspectral anomaly detection is proposed. The
proposed technique aims to optimize an ensemble of kernel-based one-class classifiers, such as Support Vector
Data Description (SVDD) classifiers, by estimating optimal sparse weights. In this method, hyperspectral
signatures are first randomly sub-sampled into a large number of spectral feature subspaces. An enclosing
hypersphere that defines the support of spectral data, corresponding to the normalcy/background data, in the
Reproducing Kernel Hilbert Space (RKHS) of each respective feature subspace is then estimated using regular
SVDD. The enclosing hypersphere represents the spectral characteristics of the background data in the
respective feature subspace. The joint hypersphere is learned by optimally combining the hyperspheres from the
individual RKHSs, while imposing an l1 constraint on the combining weights. The joint hypersphere, representing
the optimal compact support of the local hyperspectral data in the joint feature subspaces, is then used
to test each pixel in hyperspectral image data to determine if it belongs to the local background data or not.
The outliers are considered to be targets. The performance comparison between the proposed technique and the
regular SVDD is provided using the HYDICE hyperspectral images.
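The random-subspace structure and the l1-weighted combination can be sketched as follows. This is not the SKEL optimization itself. To keep the example short, each subspace's hypersphere center is the plain kernel mean of the background samples. That is the special case of SVDD in which every α_i equals 1/n, obtained with C = 1/n. The combining weights are supplied as hypothetical inputs rather than learned under an l1 constraint. The RBF kernel, `gamma` and the subspace size are assumptions.

```python
import numpy as np

def rbf(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def random_subspaces(n_bands, n_subspaces, size, seed=0):
    """Randomly sub-sample the spectral bands into feature subspaces."""
    rng = np.random.default_rng(seed)
    return [np.sort(rng.choice(n_bands, size=size, replace=False)) for _ in range(n_subspaces)]

def subspace_scores(background, pixels, subspaces, gamma=1.0):
    """Squared RKHS distance of each test pixel to the background center in each subspace.
    The center is the plain kernel mean, i.e. an SVDD whose alphas all equal 1/n
    (the special case C = 1/n), used here instead of a fully trained SVDD."""
    out = []
    for s in subspaces:
        Bk, Pk = background[:, s], pixels[:, s]
        out.append(1.0 - 2.0 * rbf(Pk, Bk, gamma).mean(axis=1) + rbf(Bk, Bk, gamma).mean())
    return np.array(out)                            # (n_subspaces, n_pixels)

def ensemble_score(scores, weights):
    """Weighted combination with nonnegative weights scaled to unit l1 norm; a zero
    weight removes that subspace, so sparse weights also prune the ensemble."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)
    return (w / w.sum()) @ scores
```

Pixels with a large combined score lie far from the background support and would be flagged as targets by thresholding. The threshold is not specified in the abstract.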
Recently, an SVM-based ensemble learning technique was introduced by the authors for hyperspectral plume
detection/classification. The SVM-based ensemble learning consists of a number of SVM classifiers and the
decisions from these sub-classifiers are combined to generate a final ensemble decision. The SVM-based ensemble
technique first randomly selects spectral feature subspaces from the input data. Each individual classifier
then independently conducts its own learning within its corresponding spectral feature space. Each classifier
constitutes a weak classifier. These weak classifiers are combined to make an ensemble decision. The ensemble
learning technique provides better performance than the conventional single SVM in terms of error rate. Various
aggregating techniques like bagging, boosting, majority voting and weighted averaging were used to combine
the weak classifiers, of which majority voting was found to be most robust. Yet, the ensemble of SVMs is suboptimal.
Techniques that optimally weight the individual decisions from the sub-classifiers are strongly desirable
to improve ensemble learning performance. In the proposed work, a recently introduced kernel learning technique
called Multiple Kernel Learning (MKL) is used to optimally weight the kernel matrices of the sub-SVM classifiers.
MKL iteratively performs l2 optimization on the Euclidean norm of the normal vector of the separating
hyperplane between the classes (background and chemical plume) defined by the weighted kernel matrix followed
by gradient descent optimization on the l1 regularized weighting coefficients of the individual kernel matrices.
Due to l1 regularization on the weighting coefficients, the optimized weighting coefficients become sparse. The
proposed work utilizes the sparse weighting coefficients to combine decision results of the SVM-based ensemble
technique. A performance comparison between two aggregation techniques, MKL and majority voting, as applied
to hyperspectral chemical plume detection is presented in the paper.
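The difference between the two aggregation rules can be sketched in a few lines. The sub-classifier decisions and the sparse weights are hypothetical inputs. In the paper, the weights come from MKL over the sub-SVM kernel matrices, which is not shown.

```python
import numpy as np

def majority_vote(decisions):
    """decisions: (n_classifiers, n_samples) array of +1 (plume) / -1 (background).
    Ties are resolved toward +1 here, an arbitrary choice."""
    return np.where(np.sum(decisions, axis=0) >= 0, 1, -1)

def weighted_vote(decisions, weights):
    """Combine the same decisions with nonnegative, e.g. sparse MKL-style, weights;
    classifiers with zero weight have no influence on the result."""
    w = np.asarray(weights, dtype=float)
    return np.where((w / w.sum()) @ decisions >= 0, 1, -1)

D = np.array([[1, -1], [1, -1], [-1, 1]])      # 3 sub-classifiers, 2 pixels
print(majority_vote(D), weighted_vote(D, [0.0, 0.0, 1.0]))
```

With sparse weights, a single reliable sub-classifier can overrule a majority of unreliable ones, which majority voting cannot do.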
In this paper we present an algorithm to extract the Digital Elevation Map or Model (DEM) of a scene from a pair of
parallel-perspective stereo mosaics. Stereo mosaics are built from an aerial video of a scene using a parallel ray
interpolation technique. In addition to the left and right stereo pair, we also build a mosaic along the nadir view to
distinguish between the apparent visible and occluded surfaces. This distinction helps us in fitting vertical planes to the
occluded surfaces (in the nadir view) and developing a complete CAD model of each object (especially buildings in an urban
scene). Buildings and trees are two important classes of objects observed in urban scenes. Therefore, the nadir mosaic is
segmented, and building and tree regions (segments) are identified and separated from the terrain. The edges and corners of
different surfaces of a building are identified. We match the control points along these edges to their corresponding
points in the left and right mosaics. Using the disparity between the corresponding points in the mosaics, an elevation map is
developed at these points. Optimal surfaces are fit to each of the segments through the edge points. Parts of a DEM of a
scene, such as buildings, extracted from real airborne video are presented in this paper.
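Fitting a surface through a segment's edge points can be sketched as a least-squares plane fit, under the assumption of planar facets (flat or sloped roofs). The abstract says only that optimal surfaces are fit, so the planar model is an illustrative choice. The elevations of the edge points are assumed to come from the disparity step, which is not shown.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane z = a*x + b*y + c through (x, y, z) edge points of one segment."""
    P = np.asarray(points, dtype=float)
    A = np.column_stack([P[:, 0], P[:, 1], np.ones(len(P))])
    (a, b, c), *_ = np.linalg.lstsq(A, P[:, 2], rcond=None)
    return a, b, c

def plane_elevation(coeffs, x, y):
    """Elevation of the fitted plane at (x, y), e.g. to fill the segment's interior."""
    a, b, c = coeffs
    return a * np.asarray(x) + b * np.asarray(y) + c
```

Non-planar segments such as trees would need a different surface model; a plane is only the simplest choice.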
In this paper, we propose an improved triangulation method for building parallel-perspective stereo mosaics using ray
interpolation. The currently available fast PRISM (Parallel Ray Interpolation for Stereo Mosaicing) uses a constant
triangulation technique that does not take advantage of the spatial information. This works well when the inter-frame
displacement in the video is small. However, for large inter-frame displacements, and hence large motion parallax, we
observe visual artifacts in the mosaics when the source triangles in the video frames are warped into destination triangles
in the mosaics using affine transformations. Our method uses the edge information present in the scene to ensure that
triangles do not cross over from one planar facet to the other. The video frames are segmented and the edges are obtained
from the boundaries of different segments. The motion over the entire region in each triangle is constant. Hence, this
method avoids unwanted warping of objects with large motion parallax in the scene and reduces the visual artifacts. The
performance of our proposed technique is demonstrated on a series of simple to complex video sequences collected using
cameras and airborne sensors.
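Warping a source triangle to its destination triangle requires the affine map defined by the three vertex correspondences, which can be sketched as below. The function `within_one_segment` is a hypothetical, simple stand-in for the constraint that triangles must not cross from one planar facet to another. It samples points inside a triangle and checks that they all carry the same segment label. Points are (x, y) pairs, and the label image is indexed as labels[y, x].

```python
import numpy as np

def affine_from_triangles(src, dst):
    """2x3 matrix M with dst_i = M @ [x_i, y_i, 1] for the three vertex pairs."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    S = np.column_stack([src, np.ones(3)])       # (3, 3): rows [x, y, 1]
    # Solve S @ M.T = dst for M.T (3 x 2); exact for a non-degenerate triangle.
    return np.linalg.solve(S, dst).T

def warp_points(M, pts):
    """Apply the affine map M to an (n, 2) array of (x, y) points."""
    pts = np.asarray(pts, float)
    return pts @ M[:, :2].T + M[:, 2]

def within_one_segment(labels, tri, samples=10):
    """True if sampled points inside the triangle all carry the same segment label."""
    tri = np.asarray(tri, float)
    ids = set()
    for i in range(samples + 1):
        for j in range(samples + 1 - i):
            u, v = i / samples, j / samples      # barycentric weights of vertices 1 and 2
            x, y = (1 - u - v) * tri[0] + u * tri[1] + v * tri[2]
            ids.add(int(labels[int(round(y)), int(round(x))]))
    return len(ids) == 1
```

Sampling can miss a facet boundary that passes between sample points, so this check is weaker than an exact edge-intersection test.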