Automatic Target Recognition (ATR) often confronts intricate visual scenes, necessitating models capable of discerning subtle nuances. Real-world datasets such as the Defense Systems Information Analysis Center (DSIAC) ATR database are unimodal, which limits performance, and provide no contextual information for each frame. To address these limitations, we enrich the DSIAC dataset by algorithmically generating captions and propose new train/test splits, creating a rich multimodal training landscape. To leverage these captions effectively, we integrate a vision-language model, Contrastive Language-Image Pre-training (CLIP), which combines visual perception with linguistic descriptors. At the core of our methodology lies a homotopy-based multi-objective optimization technique designed to balance model precision, generalizability, and interpretability. Our framework, built on PyTorch Lightning with Ray Tune for distributed hyperparameter optimization, produces models that meet the intricate demands of practical ATR applications. All code and data are available at https://github.com/sabraha2/ATR-CLIP-Multi-Objective-Homotopy-Optimization.
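To make the homotopy idea concrete, the following is a minimal sketch, not the released code: a scalarized loss that anneals between the CLIP contrastive objective and a secondary objective as a blend parameter t moves from 0 to 1. The weight-decay term here is a hypothetical placeholder, since the abstract does not specify the generalizability/interpretability terms.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over matched image/caption pairs."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

def homotopy_loss(image_emb, text_emb, model, t):
    """Scalarized objective H(theta, t) = (1 - t) * task + t * secondary.

    The L2 penalty is a stand-in for the paper's secondary objectives;
    scheduling t from 0 to 1 makes the optimizer follow a path between
    the objectives instead of jumping between them.
    """
    task = clip_contrastive_loss(image_emb, text_emb)
    secondary = sum(p.pow(2).sum() for p in model.parameters()) * 1e-6
    return (1.0 - t) * task + t * secondary

# Inside a LightningModule's training_step, t could be scheduled as
#   t = self.current_epoch / max(1, self.trainer.max_epochs - 1)
# so early epochs optimize the task term and later epochs blend in the
# secondary objective along the homotopy path.
```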
Deep learning has driven important breakthroughs in research and commercial applications for next-generation technologies across many domains, including Automatic Target Recognition (ATR). The success of these models in a specific application is often attributed to well-tuned hyperparameters: user-configured values that control the model's ability to learn from data. Tuning hyperparameters, however, remains a difficult and computationally expensive task, and poor tuning contributes to ATR models falling short of performance requirements. We show that our hyperparameter optimization method boosts the effectiveness and performance of a given search strategy. Specifically, we use a generalized additive model (GAM) surrogate homotopy strategy that approximates regions of interest and traces minimal points over promising regions of the hyperparameter space, rather than inefficiently evaluating the entire hyperparameter surface. We integrate our approach into SHADHO (Scalable Hardware-Aware Distributed Hyperparameter Optimization), a framework that computes the relative complexity of each search space and monitors the performance of the learning task over trials. We demonstrate that our approach effectively finds optimal hyperparameters for object detection by conducting a model search over multiple object detection algorithms on a subset of the DSIAC ATR Algorithm Development Image Database, finding models that achieve comparable or lower validation loss in fewer iterations than standard techniques and manual tuning.
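The surrogate-homotopy strategy can be illustrated with a minimal NumPy sketch under stated assumptions: a backfit additive polynomial surrogate stands in for the GAM, and minima are traced along a homotopy from a trivially convex objective to the fitted surrogate. This is an illustration of the technique, not SHADHO's implementation.

```python
import numpy as np

def fit_additive_surrogate(X, y, degree=3, sweeps=5):
    """Backfit an additive surrogate f(x) = mean + sum_j f_j(x_j)."""
    n, d = X.shape
    mean = y.mean()
    coeffs = [np.zeros(degree + 1) for _ in range(d)]
    contrib = np.zeros((n, d))
    for _ in range(sweeps):                       # backfitting sweeps
        for j in range(d):
            resid = y - mean - contrib.sum(axis=1) + contrib[:, j]
            coeffs[j] = np.polyfit(X[:, j], resid, degree)
            contrib[:, j] = np.polyval(coeffs[j], X[:, j])
    def f(x):
        return mean + sum(np.polyval(c, x[j]) for j, c in enumerate(coeffs))
    return f

def homotopy_minimize(f, x0, steps=10, iters=200, lr=0.01, eps=1e-4):
    """Trace minima of H(x, t) = (1 - t)*||x - x0||^2 + t*f(x) as t -> 1."""
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    for t in np.linspace(0.0, 1.0, steps):
        h = lambda z: (1 - t) * np.sum((z - x0) ** 2) + t * f(z)
        for _ in range(iters):
            grad = np.zeros_like(x)
            for j in range(x.size):               # central-difference gradient
                e = np.zeros_like(x)
                e[j] = eps
                grad[j] = (h(x + e) - h(x - e)) / (2 * eps)
            x -= lr * grad
    return x

# Example: 40 trials over two hyperparameters (log learning rate, dropout).
rng = np.random.default_rng(0)
X = rng.uniform([-5, 0.0], [-1, 0.9], size=(40, 2))
y = (X[:, 0] + 3) ** 2 + 4 * (X[:, 1] - 0.3) ** 2 + rng.normal(0, 0.1, 40)
f = fit_additive_surrogate(X, y)
x_best = homotopy_minimize(f, x0=X[y.argmin()])   # start from best trial
```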
As the use of biometrics becomes more widespread, the privacy concerns it raises are becoming more apparent. As the use of mobile devices grows, so does the desire to build biometric identification into them, and the large majority of mobile devices in use are mobile phones. While work has been done to bring various biometric modalities to mobile phones, such as photo-based biometrics, voice is a more natural choice. The idea of voice as a biometric identifier has been around for a long time; one of the major concerns with using voice as an identifier is its instability. We have developed a protocol that addresses these instabilities while preserving privacy. This paper describes a novel protocol that allows a user to authenticate by voice on a mobile/remote device without compromising their privacy. We first discuss the Vaulted Verification protocol, recently introduced in the research literature, and describe its limitations. We then introduce a novel adaptation and extension of Vaulted Verification to voice, dubbed Vaulted Voice Verification (V3). We follow with a performance evaluation and conclude with a discussion of security and future work.
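The challenge-response core of a Vaulted Verification-style scheme can be shown as a toy sketch: random vectors stand in for voice feature templates, and the server checks that the client can tell real templates from chaff. The encryption, key binding, and voice-specific handling of the actual V3 protocol are deliberately omitted; everything below is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(7)
DIM, PAIRS = 64, 16

def similarity(a, b):
    """Cosine similarity between two template vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Enrollment: the server stores (real, chaff) template pairs per user.
real = [rng.normal(size=DIM) for _ in range(PAIRS)]
chaff = [rng.normal(size=DIM) for _ in range(PAIRS)]

# Challenge: each pair is presented in a secret random order.
order = rng.integers(0, 2, size=PAIRS)            # 0: real first, 1: chaff first
challenges = [(r, c) if o == 0 else (c, r)
              for r, c, o in zip(real, chaff, order)]

# Response: the client matches a fresh capture of the user's voice
# against each pair and answers with the index of the closer template.
fresh = [r + rng.normal(scale=0.3, size=DIM) for r in real]  # noisy re-capture
responses = [0 if similarity(f, a) > similarity(f, b) else 1
             for f, (a, b) in zip(fresh, challenges)]

# Verification: correct responses point at the real template in each pair;
# a small error budget absorbs the instability of voice.
accepted = sum(resp == o for resp, o in zip(responses, order)) >= PAIRS - 2
print("authenticated:", accepted)
```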
Describable visual attributes are a powerful way to label aspects of an image and, taken together, build a detailed representation of a scene's appearance. Attributes enable highly accurate approaches to a variety of tasks, including object recognition, face recognition, and image retrieval. An important consideration not previously addressed in the literature is the reliability of attribute classifiers as image quality degrades. In this paper, we introduce a general framework for conducting reliability studies that assesses attribute classifier accuracy as a function of image degradation. This framework allows us to bound, in a probabilistic manner, the input imagery deemed acceptable for consideration by the attribute system without requiring ground-truth attribute labels. We introduce a novel differential probabilistic model for accuracy assessment that leverages a strong normalization procedure based on statistical extreme value theory. To demonstrate the utility of our framework, we present an extensive case study using 64 unique facial attributes computed on data derived from the Labeled Faces in the Wild (LFW) dataset. We also show that such reliability studies can yield significant compression benefits for mobile applications.
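A sketch of the kind of extreme-value normalization involved, assuming SciPy is available: fit a Weibull distribution to the tail of non-match classifier scores and read a test score's CDF value as a calibrated reliability estimate (the general "w-score" recipe; the paper's differential model is not reproduced here).

```python
import numpy as np
from scipy.stats import weibull_min

def fit_tail(nonmatch_scores, tail_size=50):
    """Fit a Weibull to the largest non-match scores (the extreme tail)."""
    tail = np.sort(nonmatch_scores)[-tail_size:]
    shift = tail.min()                            # anchor the tail at zero
    c, loc, scale = weibull_min.fit(tail - shift + 1e-6, floc=0)
    return c, scale, shift

def w_score(score, c, scale, shift):
    """Probability that a score exceeds the non-match extreme-value model."""
    return float(weibull_min.cdf(score - shift + 1e-6, c, loc=0, scale=scale))

# Example with synthetic non-match scores and one candidate decision score.
rng = np.random.default_rng(1)
c, scale, shift = fit_tail(rng.normal(0.0, 1.0, 5000))
print("estimated reliability:", w_score(3.5, c, scale, shift))
```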
The issues of applying facial recognition at significant distances are non-trivial and often subtle. This paper summarizes seven years of effort on face at a distance (FAAD), which for us is far more than a fad. Our effort started under the DARPA Human Identification at a Distance (HID) program. Of all the programs under HID, only a few demonstrated face recognition at greater than 25 ft, and only one, led by Dr. Boult, studied face recognition at distances greater than 50 meters. Two issues were explicitly studied. The first was atmospherics/weather, which can have a measurable impact at these distances. The second was sensor issues, including resolution, field of view, and dynamic range. This paper starts with a discussion and some results on sensor-related issues, including resolution, FOV, dynamic range, and lighting normalization. It then discusses the "Photohead" technique developed to analyze the impact of weather/imaging and atmospherics at medium distances. The paper presents experimental results showing the limitations of existing systems at significant distances and under non-ideal weather conditions, and presents some reasons for the weak performance. It ends with a discussion of our FASST™ (failure prediction from similarity surface theory) and RandomEyes™ approaches, combined into the FIINDER™ system, and how they improved FAAD.