We present a method for maritime platform defense using constrained deep reinforcement learning (DRL), showing how the competing objectives of reliably defending a fleet and conserving countermeasure inventory can be managed through a dual optimization strategy. Against persistent and variable raids of threats, our agents minimize inventory expenditure subject to a constraint on the average time before a threat impacts the fleet being defended. Critically, the additional inventory consideration is introduced only after the agent has learned to defend the fleet well enough to consistently satisfy the constraint. In evaluations against a realistic simulation environment and with variable multi-ship geometries, we find that our strategy may be tuned to either (1) enable the agent to make significant gains in efficiency while losing very little in terms of reliability or (2) closely track specified reliability constraints while reducing inventory expenditure even further. The result is an agent with considerably stronger long-term viability, since the conserved inventory may be used for future engagements. We speculate on the potential of this method to provide a tunable, trustworthy artificial assistant to human decision-makers tasked with defense scheduling.
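As an illustration of the kind of dual optimization strategy this abstract describes, the sketch below shows a gated Lagrangian reward shaping: the inventory objective is switched on only once the defense constraint is satisfied, and a dual variable is adapted to keep the constraint met. All names, the gating rule, and the update equations are illustrative assumptions, not the paper's actual method.

```python
class GatedLagrangianShaper:
    """Reward shaping for 'minimize inventory s.t. avg time-to-impact >= t_min'.

    Simplified sketch: the gate flips on at the first satisfying episode,
    whereas the paper requires the constraint to be met consistently.
    """

    def __init__(self, t_min, dual_lr=1e-3, lmbda_init=1.0):
        self.t_min = t_min          # reliability constraint level
        self.dual_lr = dual_lr      # step size for the dual-variable update
        self.lmbda = lmbda_init     # Lagrange multiplier on the defense term
        self.inventory_on = False   # inventory cost gated off until reliable

    def end_of_episode_update(self, avg_time_to_impact):
        # Gate: introduce the inventory objective only after the agent
        # satisfies the defense constraint.
        if avg_time_to_impact >= self.t_min:
            self.inventory_on = True
        if self.inventory_on:
            # Dual ascent: lambda grows while the constraint is violated and
            # shrinks (favoring inventory conservation) when there is slack.
            violation = self.t_min - avg_time_to_impact
            self.lmbda = max(0.0, self.lmbda + self.dual_lr * violation)

    def reward(self, defense_reward, inventory_cost):
        if not self.inventory_on:
            return defense_reward                        # phase 1: learn to defend
        return self.lmbda * defense_reward - inventory_cost  # phase 2: conserve
```

The shaped reward would feed an off-the-shelf policy-gradient learner; only the multiplier update and the gate are specific to the constrained formulation.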
We explore strategies for improving the versatility of deep reinforcement learning (DRL) agents trained for maritime platform defense, in an effort to avoid impractical retraining when conditions change. DRL platform defense agents must schedule countermeasures effectively, under operational constraints and against incoming raids of threats of uncertain type. Here we provide methods, centered on domain randomization, threat representation, and neural network architecture modification, for addressing changes in the relative locations and orientations of ships in the fleet, the number of ships and their inventories, and the distribution of threats. Testing our interventions in a realistic simulator, we show that a base DRL agent may be extended to account for a wide variety of changes in the operational scenario, with little degradation of performance.
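One concrete way to realize the domain randomization this abstract describes is to re-draw the scenario at every episode reset. The sketch below is a minimal example; its parameter ranges, field names, and threat types are assumptions rather than the paper's configuration.

```python
import random

def randomize_scenario(rng: random.Random):
    """Draw a new fleet layout, inventory, and raid composition per episode."""
    n_ships = rng.randint(1, 4)
    ships = []
    for _ in range(n_ships):
        ships.append({
            # ship position (meters) and heading within a patrol box
            "position": (rng.uniform(-5e3, 5e3), rng.uniform(-5e3, 5e3)),
            "heading_deg": rng.uniform(0.0, 360.0),
            # per-ship countermeasure inventories, drawn independently
            "inventory": {w: rng.randint(4, 24)
                          for w in ("decoy", "jammer", "interceptor")},
        })
    # raid of 2-10 threats with uncertain seeker types
    raid = [rng.choice(["seeker_a", "seeker_b", "unknown"])
            for _ in range(rng.randint(2, 10))]
    return {"ships": ships, "raid": raid}

# Example: a seeded draw for a reproducible training episode.
scenario = randomize_scenario(random.Random(0))
```

Training across such draws pushes the agent to learn policies that do not depend on any one fleet geometry, inventory level, or raid composition.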
Naval intelligence plays a critical role in multi-domain operations by identifying and tracking vessels of interest, especially suspected “dark ships” operating in an emissions-controlled (EMCON) state. While applying machine learning (ML) to maritime satellite imagery could enable an automated open-ocean search capability for dark ships, ensuring the robustness of ML models to environmental variations in the maritime domain remains a challenge because training sets do not encapsulate all possible environmental conditions. To address the challenge of unsupervised domain adaptation (UDA) in ship classification, i.e., transferring an ML model from a labeled source domain to an unlabeled target domain, we propose employing combinations of semi-supervised learning (SSL) techniques with standalone UDA approaches. Specifically, we incorporate combinations of FixMatch, minimum class confusion, gradient reversal, and mixup augmentation into the standard cross-entropy supervised loss function. These interventions were compared in two domain shift settings: one in which the source and target domains are both composed of simulated data, and another in which the source domain consists only of simulated data and the target domain consists only of real data. Experimental results comparing the combinations of interventions to a regularized fine-tuning baseline demonstrate that the greatest improvements in model robustness were achieved when combinations of our SSL strategy (FixMatch) and UDA algorithms were incorporated into training.
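As a sketch of how such interventions can be folded into a single objective, the PyTorch fragment below combines FixMatch-style consistency and a minimum-class-confusion term with the supervised cross-entropy loss; gradient reversal and mixup are omitted for brevity. The loss weights, confidence threshold, temperature, and model interface are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def combined_loss(model, x_src, y_src, x_tgt_weak, x_tgt_strong,
                  tau=0.95, w_fm=1.0, w_mcc=1.0, temperature=2.5):
    # Standard supervised cross-entropy on the labeled source domain.
    loss = F.cross_entropy(model(x_src), y_src)

    # FixMatch: pseudo-label confident weakly augmented target samples,
    # then enforce consistency on their strongly augmented views.
    logits_weak = model(x_tgt_weak)
    probs = F.softmax(logits_weak.detach(), dim=1)
    conf, pseudo = probs.max(dim=1)
    mask = (conf >= tau).float()   # keep only high-confidence pseudo-labels
    per_sample = F.cross_entropy(model(x_tgt_strong), pseudo, reduction="none")
    loss = loss + w_fm * (mask * per_sample).mean()

    # Minimum class confusion: penalize off-diagonal mass of the
    # temperature-scaled class-correlation matrix on target predictions.
    probs_t = F.softmax(logits_weak / temperature, dim=1)
    confusion = probs_t.t() @ probs_t
    confusion = confusion / confusion.sum(dim=1, keepdim=True)
    mcc = (confusion.sum() - torch.diagonal(confusion).sum()) / probs_t.size(1)
    loss = loss + w_mcc * mcc
    return loss
```

Because every term is additive, individual interventions can be ablated by zeroing their weights, which matches the combinatorial comparison the abstract describes.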
Existing artificial intelligence (AI) agents are most successful on narrow, well-defined tasks, where training data are plentiful, well-labeled, and match the deployment scenarios. It is also possible to train an AI agent to do multiple tasks (such as identifying IEDs from image data as well as recognizing faces), given sufficient time to craft the training regime and network architecture. However, data and time are often in short supply: multi-domain operations involve rapidly shifting and adaptive compositions of capabilities, against adversaries that will likely be adapting on the fly. We suggest that to be robust in such scenarios, AIs need to be capable of learning in the field with opportunistically available data, limited human oversight, and limited or no access to ground truth. These challenges also apply to reinforcement learning agents, with the added difficulty that their bad decisions cascade, further complicating the learning of new tasks in the field. We present an overview of the challenges in enabling AI systems for multi-domain operations, current algorithmic approaches for developing lifelong learning agents, and potential techniques for evaluating them.
We present a method for applying deep reinforcement learning to maritime platform defense, showing how to successfully train agents to schedule countermeasures for defending a fleet of ships against stochastic raids in a simulated environment. Our Schedule Evaluation Simulation (SEvSim) environment was developed using extensive input from subject matter experts and contains realistic threat characteristics, weapon efficacies, and constraints among weapons. Our approach is novel in both the representation of the system state and the neural network architecture: threats are represented as vectors containing information on the projected effect of different scheduling actions on their viability, and are fed to network input “slots” in randomized locations. Agents are trained using Proximal Policy Optimization, a state-of-the-art method for model-free learning. We evaluate the performance of our approach, finding that it learns scheduling strategies that both reliably neutralize threats and conserve inventory. We subsequently discuss the remaining challenges involved in bringing neural-network-based control to realization in this application space, including the need to integrate humans into the loop, provide safety assurances, and enable continual learning.
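To make the randomized-slot threat representation concrete, the sketch below shows one plausible encoding in which each live threat's feature vector is written to a randomly chosen input slot. The slot count, feature dimension, and feature semantics are assumptions, not the paper's specification.

```python
import numpy as np

N_SLOTS, FEAT_DIM = 8, 6   # max simultaneous threats, features per threat

def encode_state(threat_features, rng):
    """threat_features: list of length-FEAT_DIM arrays, one per live threat.

    Assigning threats to slots at random discourages the network from
    attaching meaning to slot position, which helps the policy generalize
    across raid sizes and threat orderings.
    """
    state = np.zeros((N_SLOTS, FEAT_DIM), dtype=np.float32)  # empty slots stay zero
    slots = rng.choice(N_SLOTS, size=len(threat_features), replace=False)
    for slot, feats in zip(slots, threat_features):
        state[slot] = feats
    return state.ravel()   # flattened observation for the policy network

# Example: two live threats placed into random slots.
rng = np.random.default_rng(0)
obs = encode_state([np.ones(FEAT_DIM), np.full(FEAT_DIM, 0.5)], rng)
```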