Computer vision (CV) algorithms have improved tremendously with the application of neural network-based approaches. For instance, Convolutional Neural Networks (CNNs) achieve state-of-the-art performance on Infrared (IR) detection and identification (e.g., classification) problems. Training such algorithms, however, requires a tremendous quantity of labeled data, which is less available in the IR domain than for "natural imagery," and less available still for CV-related tasks. Recent work has demonstrated that synthetic data generation techniques provide a cheap and attractive alternative to collecting real data, despite a "realism gap" that exists between synthetic and real IR data.
In this work, we train deep models on a combination of real and synthetic IR data, and we evaluate model performance on real IR data. We focus on the tasks of vehicle and person detection, object identification, and vehicle parts segmentation. We find that for both detection and object identification, training on a combination of real and synthetic data performs better than training only on real data. This classification improvement demonstrates an advantage to using synthetic data for computer vision. Furthermore, we believe that the utility of synthetic data – when combined with real data – will only increase as the realism gap closes.
Achieving state-of-the-art performance with CNNs (Convolutional Neural Networks) on IR (infrared) detection and classification problems requires significant quantities of labeled training data. Real data in this domain can be both expensive and time-consuming to acquire. Synthetic data generation techniques have made significant gains in efficiency and realism in recent work, and they provide an attractive and much cheaper alternative to collecting real data. However, the salient differences between synthetic and real IR data still constitute a "realism gap," meaning that synthetic data is not as effective for training CNNs as real data. In this work we explore the use of image compositing techniques to combine real and synthetic IR data, improving realism while retaining many of the efficiency benefits of the synthetic data approach. In addition, we demonstrate the importance of controlling the object size distribution (in pixels) of synthetic IR training sets. By evaluating synthetically trained models on real IR data, we show notable improvement over previous synthetic IR data approaches and suggest guidelines for enhanced performance with future training dataset generation.
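The abstract above does not specify how the object size distribution of a synthetic training set would be controlled; one plausible sketch is to bootstrap-sample synthetic target sizes from the empirical pixel-size distribution of a real dataset. Everything here (function name, jitter amount, example sizes) is an illustrative assumption, not the authors' actual pipeline.

```python
import numpy as np

def sample_matched_scales(real_heights_px, n_synthetic, rng=None):
    """Draw target heights (in pixels) for synthetic objects by
    bootstrap-sampling the empirical distribution of real object heights,
    with small multiplicative jitter so sizes are not exact copies."""
    rng = rng or np.random.default_rng(0)
    real_heights_px = np.asarray(real_heights_px, dtype=float)
    draws = rng.choice(real_heights_px, size=n_synthetic, replace=True)
    jitter = rng.normal(0.0, 0.05, size=n_synthetic)  # ~±5% size jitter
    return draws * (1.0 + jitter)

# Hypothetical real object heights (px) measured from a labeled IR dataset.
real = [18, 22, 25, 31, 40, 44, 52, 60]
synth = sample_matched_scales(real, 1000)
```

The sampled heights could then drive the rendering or paste scale of each synthetic target, so the trained model sees the same apparent-size statistics it will face at test time.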
Large amounts of labeled imagery are needed to sufficiently train Deep Neural Network (DNN) based classification algorithms. In many cases, collecting an adequate training dataset requires excessive amounts of time and money. The limited-data problem is exacerbated when military-relevant imagery requirements are imposed: such imagery must often be collected in the infrared (IR) band and must depict military-relevant targets, adding difficulty due to the scarcity of sensors, targets, and personnel able to capture the data. To mitigate these problems, this study evaluates the effectiveness of synthetic data, supplemented with small amounts of real data, for training DNN-based classifier algorithms. It analyzes the efficacy of the YOLOv3 algorithm at detecting common household objects after training on synthetic data created through an image chipping and insertion method: a set of image chips is created by extracting objects from a green-screen background, and these chips are then used to generate synthetic training examples by pasting them onto a variety of new backgrounds. The impact of background variety, and of adding small amounts of real data, on trained algorithm performance is analyzed.
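The chip-and-insert approach described above can be sketched minimally: extract an object from a green-screen frame with a simple chroma-key mask, then composite the chip onto a new background. The thresholds, toy image contents, and function names below are illustrative assumptions, not the study's actual implementation.

```python
import numpy as np

def chroma_key_mask(rgb, g_thresh=150, margin=40):
    """True where a pixel is NOT green screen (i.e., part of the object)."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    is_green = (g > g_thresh) & (g - r > margin) & (g - b > margin)
    return ~is_green

def paste_chip(background, chip, mask, top, left):
    """Composite the masked chip pixels onto a copy of the background."""
    out = background.copy()
    h, w = chip.shape[:2]
    region = out[top:top + h, left:left + w]
    region[mask] = chip[mask]          # view assignment writes into `out`
    return out

# Toy data: a 4x4 pure-green frame with a 2x2 gray "object" in one corner.
frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[..., 1] = 255                    # green screen
frame[:2, :2] = 128                    # gray object pixels
mask = chroma_key_mask(frame)
bg = np.full((8, 8, 3), 30, dtype=np.uint8)   # dark new background
composite = paste_chip(bg, frame, mask, top=3, left=3)
```

In a full pipeline the paste location, scale, and background image would be randomized per training example to build background variety, as the abstract describes.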
Experimental results are presented from an investigation that evaluated the effects of introducing degraded imagery into the training and test sets of an algorithm. Degradation consisted of various applied modulation transfer functions (MTFs, i.e., blur) and noise profiles. The hypothesis was that introducing degraded imagery into the training set would increase the algorithm's accuracy when degraded imagery was present in the test set. Preliminary experimentation confirmed this hypothesis, with some additional observations regarding robustness and feature selection for degraded imagery. Further investigations are suggested to advance this work, including an increased variety of objects for classification, additional wavebands, and randomized degradations.
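The degradation procedure described above might be approximated as follows: model the MTF with a separable Gaussian blur, then add sensor-like Gaussian noise. The kernel size, sigma, and noise level here are illustrative assumptions rather than the study's measured profiles.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    """Normalized 1-D Gaussian kernel of half-width `radius`."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur(image, sigma=1.5):
    """Separable Gaussian blur (rows, then columns) as a crude MTF proxy."""
    radius = int(3 * sigma)
    k = gaussian_kernel1d(sigma, radius)
    padded = np.pad(image, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def degrade(image, sigma=1.5, noise_std=5.0, rng=None):
    """Apply blur then additive Gaussian noise, clipped to 8-bit range."""
    rng = rng or np.random.default_rng(0)
    noisy = blur(image.astype(float), sigma) + rng.normal(0, noise_std, image.shape)
    return np.clip(noisy, 0, 255)

img = np.zeros((32, 32))
img[12:20, 12:20] = 255.0              # bright square "target"
deg = degrade(img)
```

Randomizing `sigma` and `noise_std` per training image would correspond to the "randomized degradations" the abstract proposes for future work.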