This PDF file contains the front matter associated with SPIE Proceedings Volume 12096, including the Title Page, Copyright information, Table of Contents, and Conference Committee listings.
Nonparametric Spatio-temporal Activity Learning from Overhead Imagery
Analysis of imagery from low-Earth-orbit satellites has a long history, going back to the middle of the Cold War era. For a long time, this imagery was provided by military satellites and used mainly by Department of Defense (DoD) and Intelligence Community (IC) analysts. Since the mid-1990s, the international constellation of commercial satellites has been growing in both temporal and spatial resolution; the Maxar constellation, for example, currently collects nearly four million square kilometers per day, translating into a staggering 100 TB of imagery every day. These satellites thus enable tremendous opportunities for various government and commercial tasks. In this paper, we present our software framework, which combines state-of-the-art object detection and change identification algorithms with statistical learning techniques to detect various objects of interest (permanent and semi-permanent structures and vehicles) and learn their behaviors. Our approach is applicable to detecting both macro- and micro-scale changes, turning the vast amount of imagery collected by commercial satellites into information, and information into actionable insight.
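As a rough illustration of nonparametric activity learning (not the authors' implementation), time-stamped detections from an object detector can be modeled with a kernel density estimate so that new activity can be scored for how typical it is; the variable names and bandwidth below are assumptions.

```python
# Minimal sketch: nonparametric spatio-temporal activity model from detections.
# Each detection is (x, y, hour_of_week); names and bandwidth are illustrative.
import numpy as np
from sklearn.neighbors import KernelDensity

detections = np.array([
    [512.0, 304.0, 32.0],   # (pixel x, pixel y, hour of week) from a detector
    [515.0, 301.0, 33.5],
    [120.0, 880.0, 150.0],
])

kde = KernelDensity(kernel="gaussian", bandwidth=5.0).fit(detections)

# Score a new detection: unusually low log-density suggests anomalous activity.
new_detection = np.array([[510.0, 300.0, 34.0]])
log_density = kde.score_samples(new_detection)
print("log-density of new activity:", log_density[0])
```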
Aided target detection in infrared data has proven to be an important area of investigation for both military and civilian applications. While target detection at the object or pixel level has been explored extensively, existing approaches require precisely annotated data, which is often expensive or difficult to obtain. Leveraging advancements in weakly supervised semantic segmentation, this paper explores the feasibility of learning a pixel-level classification scheme given only image-level label information. Specifically, we investigate the use of class activation maps to inform feature selection for binary, pixel-level classification tasks. Results are given on four infrared aided target recognition datasets of varying difficulty and are quantitatively evaluated using common approaches from the literature.
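For reference, a minimal class activation map computation in the style of Zhou et al. is sketched below; the backbone, layer names, and input are placeholders, not the paper's model.

```python
# Minimal CAM sketch: weight the final conv feature maps by the classifier
# weights of the predicted class (backbone and names are illustrative only).
import torch
import torchvision.models as models

model = models.resnet18(weights=None).eval()
features = {}

def hook(module, inp, out):
    features["conv"] = out                    # (1, 512, 7, 7) for a 224x224 input

model.layer4.register_forward_hook(hook)

x = torch.randn(1, 3, 224, 224)               # stand-in for an infrared image chip
logits = model(x)
cls = logits.argmax(dim=1).item()

fc_w = model.fc.weight[cls]                    # (512,) classifier weights for that class
cam = torch.einsum("c,chw->hw", fc_w, features["conv"][0])
cam = torch.relu(cam)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # normalize to [0, 1]
print(cam.shape)   # coarse 7x7 map; upsample to image size for pixel-level labels
```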
In this study, the performance of online and offline tracking algorithms that are frequently used in the literature was compared on a set of defined datasets. To this end, six different datasets were prepared, each consisting of consecutive frames; in each dataset the target has different motion characteristics and the background type differs from the others. A total of six well-known algorithms were used for comparison: the KCF, MOSSE, CSRT, TLD, GOTURN, and Siamese tracking algorithms. In conclusion, especially in cases where the target is small and the SNR value is low, the highest performance is obtained with the KCF algorithm. On the other hand, when the target is large and the SNR value is high, the Siamese algorithm is observed to handle changes in target shape better. In this context, considering real scenarios, it may be possible to use the algorithms in a hybrid way to obtain better performance.
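For context, several of the classical trackers compared here ship with OpenCV (opencv-contrib-python); a minimal comparison loop of the kind described might look like the sketch below, where the video path and initial bounding box are placeholders.

```python
# Minimal sketch comparing OpenCV trackers on a frame sequence.
# Requires opencv-contrib-python; the path and initial box are placeholders.
import cv2

def make_trackers():
    return {
        "KCF":   cv2.TrackerKCF_create(),
        "CSRT":  cv2.TrackerCSRT_create(),
        "MOSSE": cv2.legacy.TrackerMOSSE_create(),
        "TLD":   cv2.legacy.TrackerTLD_create(),
    }

cap = cv2.VideoCapture("sequence.avi")     # placeholder dataset
ok, frame = cap.read()
init_box = (100, 100, 40, 40)              # (x, y, w, h) of the target in frame 0

trackers = make_trackers()
for t in trackers.values():
    t.init(frame, init_box)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    for name, t in trackers.items():
        success, box = t.update(frame)
        # Compare `box` against ground truth (e.g., IoU) to score each tracker.
        print(name, success, box)
```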
Evaluating tracking model performance is a complicated task, particularly for non-contiguous, multi-object trackers that are crucial in defense applications. While various excellent tracking benchmarks are available, this work expands them to quantify the performance of long-term, non-contiguous, multi-object, and detection-model-assisted trackers. We propose a suite of MONCE (Multi-Object Non-Contiguous Entities) image tracking metrics that provide both objective tracking model performance benchmarks and diagnostic insight for driving tracking model development, in the form of Expected Average Overlap, Short/Long Term Re-Identification, Tracking Recall, Tracking Precision, Longevity, Localization, and Absence Prediction.
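As a rough sketch of the kind of overlap-based scoring such metrics build on (not the MONCE definitions themselves), per-frame IoU between predicted and ground-truth boxes yields tracking recall and precision; the boxes and threshold below are illustrative.

```python
# Illustrative IoU-based tracking recall/precision over one sequence.
# Boxes are (x, y, w, h); data and threshold are placeholders, not MONCE.
def iou(a, b):
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

gt   = [(10, 10, 20, 20), (12, 11, 20, 20), None]           # None = target absent
pred = [(11, 10, 20, 20), None,             (50, 50, 20, 20)]

hits = sum(1 for g, p in zip(gt, pred) if g and p and iou(g, p) >= 0.5)
recall = hits / sum(1 for g in gt if g)          # fraction of present frames tracked
precision = hits / sum(1 for p in pred if p)     # fraction of reported boxes correct
print(recall, precision)
```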
Deep learning models are pervasive for a multitude of tasks, but the complexity of these models can limit interpretation and inhibit trust in their estimates of confidence. For the classification task, we investigate the geometric relationships induced between the class-conditioned data distributions and the deep learning model's output weight vectors. We propose a simple statistic, which we call Angular Margin, to characterize the "confidence" of the model given a new input. We compare and contrast our statistic with Angular Visual Hardness and Softmax outputs. We demonstrate that Angular Margin provides a better statistic than standard Softmax predictions for detecting minimum-perturbation adversarial attacks and misclassified images.
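As one plausible illustration of an angular-margin-style statistic (the paper's exact definition may differ), the gap between the angles from a penultimate-layer feature to its two closest output weight vectors can serve as a confidence score; all shapes and names below are assumptions.

```python
# Illustrative angular-margin-style confidence statistic.
# feat: penultimate-layer feature; W: output-layer weight vectors, one per class.
import numpy as np

def angular_margin(feat, W):
    cos = W @ feat / (np.linalg.norm(W, axis=1) * np.linalg.norm(feat) + 1e-12)
    angles = np.arccos(np.clip(cos, -1.0, 1.0))   # angle to each class weight vector
    a = np.sort(angles)
    return a[1] - a[0]        # small gap -> ambiguous / potentially adversarial input

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 64))        # 10 classes, 64-d features (placeholder)
feat = rng.normal(size=64)
print(angular_margin(feat, W))
```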
The Common Data Format (CDF) was established to reconcile issues related to the use of various data formats and storage schemes throughout the US Army C5ISR Research and Technology Integration (RTI) directorate. This paper describes the CDF and its usage for streamlining data sharing for Aided Target Recognition (AiTR) consumption. The CDF is based on the well-established Hierarchical Data Format 5 (HDF5). The CDF structure can contain collection imagery, the corresponding frame-synchronous and asynchronous metadata, and the related labeling information in a single file. The CDF is specifically designed to simplify data sharing.
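For illustration only (the actual CDF layout is defined by the RTI directorate and is not reproduced here), HDF5 makes it straightforward to keep imagery, frame-synchronous metadata, and labels in one file; the group and dataset names below are placeholders.

```python
# Illustrative HDF5 layout with imagery, frame-synchronous metadata and labels
# in a single file; group/dataset names are placeholders, not the CDF spec.
import h5py
import numpy as np

frames = np.zeros((100, 512, 640), dtype=np.uint16)        # 100 IR frames
timestamps = np.arange(100, dtype=np.float64)               # frame-synchronous metadata
labels = np.array([[0, 10, 20, 30, 40]], dtype=np.int32)    # (frame, x, y, w, h)

with h5py.File("collection.h5", "w") as f:
    f.create_dataset("imagery/frames", data=frames, compression="gzip")
    f.create_dataset("metadata/frame_time", data=timestamps)
    f.create_dataset("labels/boxes", data=labels)
    f["imagery/frames"].attrs["sensor"] = "MWIR"            # static/asynchronous metadata
```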
Person re-identification is a critical component of target identification and tracking in perception systems, particularly when long-term target tracking is required and kinematic tracks may not be reliable. Identifying an individual independent of their outward visual appearance is a particularly challenging problem that plagues many re-identification models in use today. One growing area of research for appearance-agnostic identification is the use of time-sequence images to identify an individual's gait. Several methods of performing gait identification exist today, but they require image preprocessing such as human pose estimation, which in turn requires either human keypoint labels or a pre-trained model that may not be optimized for the type of data variance observed in a new scene (for example, an aerial perspective). We propose an architecture that performs gait classification for person re-identification without the need for additional labels during training, using the concept of pose transfer. Our framework learns human pose estimation landmarks simultaneously with a gait encoder that may be used as a time-sequence fingerprint of a person in long-term tracking systems.
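As a loose sketch of the general idea (not the proposed pose-transfer architecture), a sequence encoder can turn per-frame features into a single gait embedding that is compared by cosine similarity for re-identification; the architecture and dimensions are assumptions.

```python
# Loose sketch: encode a time sequence of per-frame features into one gait
# embedding; architecture and dimensions are illustrative, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaitEncoder(nn.Module):
    def __init__(self, feat_dim=128, emb_dim=64):
        super().__init__()
        self.gru = nn.GRU(feat_dim, emb_dim, batch_first=True)

    def forward(self, seq):                 # seq: (batch, time, feat_dim)
        _, h = self.gru(seq)
        return F.normalize(h[-1], dim=-1)   # unit-norm embedding per person

enc = GaitEncoder()
a = enc(torch.randn(1, 30, 128))            # 30-frame walking sequence, person A
b = enc(torch.randn(1, 30, 128))            # another sequence, person B
print(F.cosine_similarity(a, b).item())     # high similarity -> likely same identity
```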
Detection of moving objects at very long distances using infrared sensors is a challenging problem due to the small object size and heavy background clutter. To mitigate these problems, we propose to employ a convolutional neural network (CNN) with a mean squared error (MSE) loss and show that this network detects the small objects with a lower false alarm rate than frame differencing methods. Furthermore, we modify a U-Net architecture (introduced in [1]) and use both a weighted Hausdorff distance (WHD) loss and an MSE loss, which jointly achieve higher recall and a lower false alarm rate. We compare our proposed method with state-of-the-art methods on a publicly available dataset of infrared images from the Night Vision and Electronic Sensors Directorate (NVESD) for the detection of small moving targets. We also show the effectiveness of our loss function on the Mall dataset reported in [1]. Our method achieves 5% and 2% higher recall on the NVESD and Mall datasets, respectively, while reducing the false alarm rate by 0.3 and 1 false alarms per frame on these datasets, respectively.
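To make the combined objective concrete, the simplified sketch below adds an MSE term to one direction of a weighted-Hausdorff-style term that pulls predicted probability mass toward the ground-truth points; it is illustrative only, not the exact loss used in the paper.

```python
# Simplified sketch of MSE plus one direction of a weighted Hausdorff-style
# term; illustrative only, not the paper's exact WHD loss.
import torch

def combined_loss(prob_map, target_map, gt_points, alpha=1.0):
    # prob_map, target_map: (H, W); gt_points: (N, 2) pixel coordinates (row, col)
    mse = torch.mean((prob_map - target_map) ** 2)

    H, W = prob_map.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([ys, xs], dim=-1).float().reshape(-1, 2)     # (H*W, 2)
    d = torch.cdist(pix, gt_points.float())                        # (H*W, N)
    p = prob_map.reshape(-1)
    whd_term = (p * d.min(dim=1).values).sum() / (p.sum() + 1e-6)  # mass near GT points

    return mse + alpha * whd_term

prob = torch.rand(64, 64, requires_grad=True)
target = torch.zeros(64, 64)            # placeholder target map
pts = torch.tensor([[20, 30], [45, 10]])
loss = combined_loss(prob, target, pts)
loss.backward()
print(loss.item())
```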
The capabilities and sales volume of present-day UAVs (unmanned aerial vehicles) create a strong demand for counter-UAV systems in many applications to protect facilities or areas from misused or threatening drones. In order to reach maximum detection and information-gathering performance, such systems need to combine different detection subsystems, i.e., those based on visual optical, radar, and radio sensors. But available systems on the market are very expensive, typically costing well over half a million dollars. Therefore, a far more cost-efficient solution has been developed, which is presented in this paper. Four high-resolution visual optical cameras offer full 360-degree observation at distances up to several hundred meters. As soon as UAVs are visible in an image as small dots, they are detected and tracked with a GPU-based point target detector. Radar and radio sensor subsystems detect UAVs at greater distances. A full HD camera on a pan-and-tilt unit successively focuses on each found object so that a convolutional neural network (CNN) can classify it at a higher local image resolution, identifying UAVs and discarding false alarms, e.g., from birds. Furthermore, drone type and payload are also determined with CNNs, and a laser rangefinder on the pan-and-tilt unit measures the object distance. All information is collected and visualized in a 2D or 3D environmental map or situation representation based on geo-coordinates computed from an RTK GNSS sensor's self-localization. All software and hardware components are described in detail. The overall system is powerful, modular, scalable, and cost-efficient.
Performing many simultaneous tasks on a resource-limited device is challenging because of the constrained computational budget. Efficient and universal model architectures are key to solving this problem. Existing sub-fields of machine learning, such as Multi-Task Learning (MTL), have shown that learning multiple tasks with a single neural network architecture is possible, has the potential to improve sample and memory efficiency, and can be less prone to overfitting. In Visual Question Answering (VQA), a model ingests multi-modal input to produce text-based responses in the context of an image. Our proposed architecture merges the MTL and VQA concepts to form TaskNet. TaskNet solves the visual MTL problem by using an input task to provide context to the network and guide its attention mechanism toward a relevant response. Our approach saves memory without sacrificing performance relative to naively training independent models. TaskNet efficiently provides multiple fine-grained classifications on a single input image and seamlessly incorporates context-specific metadata to further boost performance in a world of high variance.
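As a rough sketch of the task-conditioning idea (the layer sizes and fusion scheme are assumptions, not the TaskNet design), an input task embedding can steer a shared backbone toward the relevant output.

```python
# Rough sketch: a shared image backbone conditioned on a task embedding,
# producing a task-specific classification. Dimensions are illustrative.
import torch
import torch.nn as nn

class TaskConditionedNet(nn.Module):
    def __init__(self, n_tasks=4, n_classes=10, dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim),
        )
        self.task_emb = nn.Embedding(n_tasks, dim)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, image, task_id):
        feat = self.backbone(image)
        gate = torch.sigmoid(self.task_emb(task_id))   # task-driven gating of features
        return self.head(feat * gate)

net = TaskConditionedNet()
logits = net(torch.randn(2, 3, 224, 224), torch.tensor([0, 2]))
print(logits.shape)    # (2, 10): one fine-grained prediction per (image, task) pair
```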
Real radar returns from four small-scale commercial aircraft models are used to train and test a convolutional neural network target recognition system. Many target recognition systems convert the one-dimensional stepped-frequency features into two-dimensional representations using tools such as spectrograms and scalograms, and thereby utilize a two-dimensional CNN. In this paper, a one-dimensional convolutional neural network is used. The unknown target's azimuth position may be known completely or only within a certain range. The recognition performance is compared with that of an optimal Bayesian classifier assuming complete statistical knowledge. A discussion of the advantages and disadvantages of using a 1D CNN is presented.
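For illustration (not the authors' exact network), a 1D CNN over a stepped-frequency return can be as simple as the sketch below; the number of frequency steps, channels, and classes are placeholders.

```python
# Minimal 1D CNN over a stepped-frequency radar return; the number of
# frequency steps, channels, and classes below are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool1d(2),
    nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(32, 4),                  # four aircraft models
)

returns = torch.randn(8, 1, 64)        # batch of 8 returns, 64 frequency steps each
print(model(returns).shape)            # (8, 4) class logits
```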
Deep neural networks have recently demonstrated state-of-the-art accuracy on public Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) benchmark datasets. While attaining competitive accuracy on benchmark datasets is a necessary feature, it is important to characterize other facets of new SAR ATR algorithms. We extend this recent work by demonstrating not only improved state-of-the-art accuracy, but that contemporary deep neural networks can achieve several algorithmic traits beyond competitive accuracy which are necessitated by operational deployment scenarios. First, we employ several saliency map algorithms to provide explainability and insight into understanding black-box classifier decisions. Second, we collect and implement numerous data augmentation routines and training improvements both from the computer vision literature and specific to SAR ATR data in order to further improve model domain adaptation performance from synthetic to measured data, achieving a 99.26% accuracy on SAMPLE validation with a simple network architecture. Finally, we survey model reproducibility and performance variability under domain adaptation from synthetic to measured data, demonstrating potential consequences of training on only synthetic data.
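For reference, the simplest member of the saliency-map family (vanilla gradient saliency) is sketched below; the backbone and input chip are placeholders, not the networks evaluated in the paper.

```python
# Vanilla gradient saliency sketch: gradient of the predicted-class score
# with respect to the input pixels. Model and input chip are placeholders.
import torch
import torchvision.models as models

model = models.resnet18(weights=None).eval()
x = torch.randn(1, 3, 224, 224, requires_grad=True)   # stand-in image chip

logits = model(x)
score = logits[0, logits.argmax()]                     # score of the predicted class
score.backward()
saliency = x.grad.abs().max(dim=1).values[0]            # (224, 224) attribution map
print(saliency.shape)
```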
In most image classification applications, the task is assumed to be "closed-set", in which the only classifications the model expects to make are of examples that it was originally trained on. However, the real world presents a much more complex "open-set", in which a given model may encounter examples it was not trained to classify. Open-Set Recognition is the practice of enabling classifiers to recognize when they have encountered a given example that they were not previously trained to classify. Typically, these Open-Set Recognition techniques can be grouped into two categories: those that require a feature space, and those that learn a feature space. However, finding a suitable feature space is difficult, and so it is often necessary that one is learned. To accomplish this, one can leverage "Out of Distribution" examples, or examples that exist outside of the training data. This effort explores the various methods of obtaining Out of Distribution examples and how they compare. Additionally, based on our findings, we make practical recommendations for obtaining Out of Distribution examples to enable Open-Set Recognition techniques for overhead imagery and Synthetic Aperture Radar (SAR) applications.
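As a minimal baseline illustrating the idea (not a specific method from the paper), a softmax-confidence threshold calibrated on in-distribution data can reject unfamiliar inputs as "unknown"; the logits and threshold quantile below are placeholders.

```python
# Minimal open-set baseline: reject inputs whose maximum softmax probability
# falls below a threshold chosen from in-distribution calibration data.
import numpy as np

def max_softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return (e / e.sum(axis=1, keepdims=True)).max(axis=1)

in_dist_logits = np.random.randn(1000, 10) * 4.0   # placeholder: confident in-distribution scores
ood_logits     = np.random.randn(1000, 10) * 1.0   # placeholder: Out of Distribution scores

# Choose a threshold that keeps 95% of in-distribution examples.
tau = np.quantile(max_softmax(in_dist_logits), 0.05)

def predict_open_set(logits):
    conf = max_softmax(logits)
    return np.where(conf >= tau, logits.argmax(axis=1), -1)   # -1 = "unknown class"

print((predict_open_set(ood_logits) == -1).mean())   # fraction of OOD inputs rejected
```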
The problem of identifying 3D objects that have drastically different 2D representations in different views is challenging because features that are important for matching are not always view-invariant and may not be visible from certain perspectives. This research seeks to infer the 3D geometry of specific landmarks such that predictions of a viewpoint's orientation about the landmark can be made from 2D images. For our dataset, we use Google Earth to visit four well-known landmark sites and capture 2D images from a range of perspectives about them. The landmarks are chosen to be sensitive to parallax in order to ensure wide variance in our training images. We implement a 5-layer autoencoder network that takes 224x224x3 images, encodes them into 3136-element vectors, and then replicates the input image from the encoding vector. We use the bottleneck encodings to generate predictions of the camera's azimuth, elevation, and range relative to the landmark. We then compare input images with predicted parameters and replicated decoded images to measure the accuracy of our model. Our experimentation shows that a simple autoencoder network is capable of learning enough of the 3D geometry of a landmark to accurately predict viewpoint orientations from 2D images.
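A minimal sketch of this kind of setup, assuming a convolutional encoder that reduces a 224x224x3 image to a 3136-element code (64 channels at 7x7) plus a small head regressing azimuth, elevation, and range; the exact layers are assumptions, not the paper's network.

```python
# Sketch: conv autoencoder with a 3136-d bottleneck (64 x 7 x 7) plus a head
# regressing azimuth, elevation and range. Layer choices are illustrative.
import torch
import torch.nn as nn

class LandmarkAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                 # 224 -> 7 via five stride-2 convs
            nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(),     # 112
            nn.Conv2d(16, 32, 3, 2, 1), nn.ReLU(),    # 56
            nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU(),    # 28
            nn.Conv2d(64, 64, 3, 2, 1), nn.ReLU(),    # 14
            nn.Conv2d(64, 64, 3, 2, 1), nn.ReLU(),    # 7 -> 64*7*7 = 3136
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=32), nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
        )
        self.pose_head = nn.Linear(3136, 3)           # azimuth, elevation, range

    def forward(self, x):
        z = self.encoder(x)
        recon = self.decoder(z)
        pose = self.pose_head(z.flatten(1))
        return recon, pose

model = LandmarkAE()
recon, pose = model(torch.randn(1, 3, 224, 224))
print(recon.shape, pose.shape)    # (1, 3, 224, 224), (1, 3)
```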
To improve cutting-edge deep learning techniques for more relevant defense applications, we extend our well-established port monitoring ATR techniques from generic ship classes to a pair of newly curated datasets: aircraft carriers and other military ships. We explore several techniques for data augmentation and dataset splits to represent different deployment regimes, such as revisiting known military ports and new observations of never-before-seen ports and ships. We see reliable results (F1 > 0.9) detecting and classifying aircraft carriers by type (and, by proxy, nationality), as well as encouraging preliminary results (mAP > 0.7) detecting and differentiating military ships by sub-class.
Automatic Target Recognition (ATR) is a valuable application of computer vision that traditionally requires copious and tedious labeling through supervised learning. This research explored whether ATR can be performed on satellite imagery at an accuracy comparable to a fully supervised baseline model with a considerably smaller subset of the data labeled, on the order of 10%, using a recently developed semi-supervised technique, contrastive learning. Supervised contrastive loss was explored and compared to traditional cross-entropy loss. Supervised contrastive loss was found to perform significantly better with a subset of the data labeled on the xView dataset, a publicly available dataset of satellite imagery captured at 0.3-meter ground sample distance. The limiting cases in which none or all of the data is labeled were additionally explored.
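For reference, a compact implementation of the supervised contrastive loss (after Khosla et al.) on L2-normalized embeddings is sketched below; the batch size, embedding dimension, and temperature are placeholders.

```python
# Compact supervised contrastive loss on L2-normalized embeddings.
# Batch size, embedding dimension, and temperature are placeholders.
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.1):
    z = F.normalize(features, dim=1)                       # (N, D)
    sim = z @ z.T / temperature                            # (N, N) scaled similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # log-softmax over all other samples (exclude the anchor itself)
    exp_sim = torch.exp(sim).masked_fill(self_mask, 0.0)
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True) + 1e-12)

    n_pos = pos_mask.sum(dim=1).clamp(min=1)
    return -(log_prob * pos_mask).sum(dim=1).div(n_pos).mean()

feats = torch.randn(16, 128, requires_grad=True)   # embeddings from a projection head
labels = torch.randint(0, 4, (16,))                # class labels for the labeled subset
loss = supcon_loss(feats, labels)
loss.backward()
print(loss.item())
```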
Camouflage is an art of deception that is often used in the animal world. It is also used on the battlefield to hide military assets. Camouflaged objects hide within their environments by taking on colors and textures that are similar to their surroundings. In this work, we explore the classification and localization of camouflaged enemy assets, including soldiers. We address two major challenges: a) how to overcome the paucity of domain-specific labeled data, and b) how to perform camouflaged object detection using edge devices. To address the first challenge, we develop a deep neural style transfer model that blends content images of objects such as soldiers, tanks, and mines/improvised explosive devices with style images depicting deserts, jungles, and snow-covered regions. To address the second challenge, we develop combined depth-guided deep neural network models that fuse image features with depth features. Previous research suggests that depth features not only contain local information about object geometry but also provide information on position and shape useful for camouflaged object identification and localization. In this work, we use a precomputed monocular depth estimation method to generate the depth maps. The fusion-based architecture provides an efficient representation learning space for object detection. In addition, we perform ablation studies to measure the performance of depth versus RGB features in detecting camouflaged objects. We also demonstrate how such a model can be deployed on edge devices for real-time object identification and localization.
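A rough sketch of the depth-guided fusion idea follows; the branch architectures and fusion point are assumptions, not the paper's model.

```python
# Rough sketch: fuse RGB features with features from a precomputed monocular
# depth map before a classification head. Layer choices are illustrative.
import torch
import torch.nn as nn

class DepthGuidedNet(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.rgb_branch = nn.Sequential(
            nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU(),
        )
        self.depth_branch = nn.Sequential(
            nn.Conv2d(1, 32, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Conv2d(128, 64, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, n_classes),
        )

    def forward(self, rgb, depth):
        fused = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        return self.head(fused)

net = DepthGuidedNet()
logits = net(torch.randn(1, 3, 256, 256), torch.randn(1, 1, 256, 256))
print(logits.shape)    # (1, 4) camouflaged-object class scores
```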
To carry out the technical expertise of the IR defense systems that equip the French Armed Forces, DGA Information Superiority relies on simulation and uses the SE-Workbench-EO software to model the operational battlefield as viewed by an optronic system and to generate synthetic images of military targets in their environment in animated scenarios. The simulated functions mainly comprise, on the sensor side, intelligence, detection/observation, homing, and image processing, and, on the target side, low detectability/stealth and self-protection. In recent years, DGA IS has experimented with SE-Workbench-EO both for computing image data sets to train an automatic object acquisition capability in IR imaging using machine learning and for exploiting the software in the visible color domain. In both cases, the software must meet high requirements regarding physical realism, image quality for both human observers and algorithmic perception, domain coverage and variability, calculation time, ergonomics, and implementation efficiency. These experiences have brought many insights, but the requirements are higher than ever, and it is now necessary to undertake a major evolution of the current software to cover the needs of the years to come, including machine learning and particularly automatic detection and recognition of targets. The main stakes are increasing the entropy of the synthetic images to approach that of real images, constituting large volumes of data in acceptable time, and finding the complementarity and the right balance with real images. This article presents the recent experiments, the current state of the software and the needs not yet covered, and then the major evolution in preparation. It aims to demonstrate the expected contribution of EO/IR image synthesis to specifying, designing, evaluating, and qualifying an imaging system that uses machine learning.