In the U.S. Army, the variables of the Operational Environment (OE), defined as Political, Military, Economic, Social, Information, Infrastructure, Physical Environment, and Time (PMESII-PT), are used for mission analysis and course of action development. In this research, we discuss how to modify existing simulators to add non-kinetic operational variables and investigate how these variables may shape and influence mission outcomes. We use the StarCraft II (SC2) Learning Environment (LE), which provides an interface for artificial intelligence (AI) agents to control game entities, gather observations, and adjust actions algorithmically. We develop a military-relevant scenario in SC2LE that is perturbed by the emergence of non-kinetic challenges that affect the simulation outcome. Finally, we investigate reward functions that leverage the OE context to discover an optimal course of action. By integrating PMESII-PT variables into simulators, this research significantly enhances the realism of military simulations, facilitating the development of algorithms that improve operational planning and decision-making for real-world scenarios.
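To make the idea of an OE-aware reward concrete, the sketch below (Python, with hypothetical indicator names and weights; not the reward function used in this work) combines a kinetic mission score with penalties derived from non-kinetic PMESII-PT indicators.

    # Hypothetical sketch: an OE-aware reward that blends kinetic mission progress
    # with non-kinetic PMESII-PT penalties. Names and weights are illustrative only.
    from dataclasses import dataclass

    @dataclass
    class OEState:
        kinetic_score: float          # e.g., enemy attrition minus friendly losses
        civil_unrest: float           # Political/Social indicator in [0, 1]
        infrastructure_damage: float  # Infrastructure indicator in [0, 1]
        info_ops_exposure: float      # Information indicator in [0, 1]
        time_elapsed: float           # fraction of allotted mission time in [0, 1]

    def oe_reward(s: OEState,
                  w_kinetic: float = 1.0,
                  w_political: float = 0.5,
                  w_infra: float = 0.3,
                  w_info: float = 0.2,
                  w_time: float = 0.1) -> float:
        """Reward the kinetic outcome while penalizing non-kinetic OE costs."""
        return (w_kinetic * s.kinetic_score
                - w_political * s.civil_unrest
                - w_infra * s.infrastructure_damage
                - w_info * s.info_ops_exposure
                - w_time * s.time_elapsed)

Adjusting the weights shifts the discovered course of action between purely kinetic objectives and non-kinetic considerations.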
The performance of decision-making algorithms depends on a vast number of factors, including hyperparameters, which can make some solutions difficult to find. In our previous work (the MARDOC paradigm),1 we found that even in a simple environment2 (i.e., a small map with few obstacles), merely changing the initial conditions or the doctrine-guided policy (MARDOC) had a significant impact on the converged behavior. Further, we found that not all policies were useful or desirable for military applications. In this paper, we focus on a complex environment (i.e., a larger map with a greater number of heterogeneous assets and a stronger adversarial force) to analyze the impact of different doctrinal control parameters on the performance and behavior of fixed doctrinal policies. In particular, we prioritize Red force assets for targeted maneuvers and attacks. We hypothesize that asset type and the corresponding coordination will have a significant impact on the performance of the Blue force. Our preliminary experiments in this complex environment showed that performance varies substantially depending on asset capability and coordination between teams.
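As a rough illustration of what a doctrinal control parameter for target prioritization might look like, here is a minimal sketch; the asset types, priority weights, and scoring rule are assumptions for exposition, not the fixed doctrinal policy studied here.

    # Hypothetical sketch of a fixed doctrinal policy parameterized by target
    # priorities over Red force asset types. Names and weights are illustrative.
    from typing import Dict, List, Tuple

    # Doctrinal control parameters: higher value = engage first.
    TARGET_PRIORITY: Dict[str, float] = {
        "artillery": 3.0,
        "armor": 2.0,
        "infantry": 1.0,
    }

    def select_target(red_assets: List[Tuple[str, float]]) -> int:
        """Pick the index of the Red asset to engage.

        Each asset is (type, distance); the score trades off doctrinal priority
        against range so that closer, higher-priority assets are engaged first.
        """
        scores = [TARGET_PRIORITY.get(kind, 0.0) / (1.0 + dist)
                  for kind, dist in red_assets]
        return max(range(len(scores)), key=scores.__getitem__)

    # Example: under these weights, artillery at range 5 outranks infantry at range 2.
    print(select_target([("infantry", 2.0), ("artillery", 5.0)]))  # -> 1

Sweeping such priority weights is one way to expose how coordination and asset capability drive Blue force performance.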
A barrier to developing novel AI for complex reasoning is the lack of appropriate wargaming platforms for training and evaluating AI agents in a multiplayer setting that combines collaborative and adversarial reasoning under uncertainty with game theory and deception. An appropriate platform has several key requirements, including flexible scenario design and exploration, extensibility across all five elements of Multi-Domain Operations (MDO), and support for human-human and human-AI collaborative reasoning and data collection, to aid the development of AI reasoning and the warrior-machine interface. Here, we describe the ARL Battlespace testbed, which fulfills the above requirements for AI development, training, and evaluation. ARL Battlespace is offered as an open-source software platform (https://github.com/USArmyResearchLab/ARL_Battlespace). We present several example scenarios implemented in ARL Battlespace that illustrate different kinds of complex reasoning for AI development. We focus on ‘gap’ scenarios that simulate bridgehead and crossing tactics, and we highlight how they address key platform requirements, including coordinated MDO actions, game theory, and deception. We describe the process of reward shaping for these scenarios that will incentivize an agent to perform command and control (C2) tasks informed by human commanders’ courses of action, as well as the key challenges that arise. The intuition presented will enable AI researchers to develop agents that provide optimal policies for complex scenarios.
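A minimal sketch of reward shaping for such a gap-crossing scenario is given below, assuming hypothetical subgoal events (securing the bridgehead, crossing units, capturing a flag) and illustrative weights; it is not the ARL Battlespace reward itself.

    # Hypothetical sketch of reward shaping for a 'gap' crossing scenario.
    # Event names and weights are illustrative, not the ARL Battlespace API.
    def shaped_reward(bridgehead_secured: bool,
                      units_crossed: int,
                      units_lost: int,
                      flag_captured: bool,
                      total_units: int) -> float:
        """Dense intermediate rewards for C2 subgoals, plus a terminal bonus."""
        r = 0.0
        r += 0.5 if bridgehead_secured else 0.0   # secure the near bank first
        r += 0.1 * (units_crossed / total_units)  # reward each unit that crosses
        r -= 0.2 * (units_lost / total_units)     # penalize attrition
        r += 1.0 if flag_captured else 0.0        # terminal objective
        return r

Intermediate terms of this kind encode a commander's course of action as subgoals, which is where the key shaping challenges (e.g., unintended incentives) tend to arise.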
Traditionally, learning from human demonstrations via direct behavior cloning can yield high-performance policies, provided the algorithm has access to large amounts of high-quality data covering the scenarios most likely to be encountered when the agent is operating. In real-world settings, however, expert data is limited, and it is desirable to train an agent that learns a behavior policy general enough to handle situations that were not demonstrated by the human expert. An alternative is to learn these policies without supervision via deep reinforcement learning; however, such algorithms require a large amount of computing time to perform well on complex tasks with high-dimensional state and action spaces, such as those found in StarCraft II. Automatic curriculum learning is a recent mechanism comprising techniques designed to speed up deep reinforcement learning by adjusting the difficulty of the current task according to the agent's current capabilities. Designing a proper curriculum, however, can be challenging for sufficiently complex tasks, and thus we leverage human demonstrations as a way to guide agent exploration during training. In this work, we aim to train deep reinforcement learning agents that can command multiple heterogeneous actors, where starting positions and the overall difficulty of the task are controlled by a curriculum automatically generated from a single human demonstration. Our results show that an agent trained via automated curriculum learning can outperform state-of-the-art deep reinforcement learning baselines and match the performance of the human expert in a simulated command and control task in StarCraft II modeled on a real military scenario.
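One common way to realize a demonstration-driven curriculum of this kind is a reverse curriculum over the demonstrated trajectory, sketched below with hypothetical class and parameter names; this illustrates the general mechanism, not the authors' implementation.

    # Hypothetical sketch of a reverse curriculum built from a single demonstration:
    # the agent starts near the end of the demonstrated trajectory, and the start
    # state moves earlier as its success rate improves. Names are illustrative.
    import random

    class DemoCurriculum:
        def __init__(self, demo_states, success_target=0.8, window=0.1):
            self.demo_states = demo_states    # states along the human trajectory
            self.frac = 0.9                   # start near the demonstrated goal
            self.success_target = success_target
            self.window = window
            self.recent = []

        def reset_state(self):
            """Sample a starting state from the current slice of the demonstration."""
            lo = int(self.frac * len(self.demo_states))
            hi = min(lo + int(self.window * len(self.demo_states)) + 1,
                     len(self.demo_states))
            return random.choice(self.demo_states[lo:hi])

        def report(self, success: bool):
            """Move the start earlier once the agent masters the current slice."""
            self.recent.append(success)
            if len(self.recent) >= 20:
                if sum(self.recent) / len(self.recent) >= self.success_target:
                    self.frac = max(0.0, self.frac - 0.1)
                self.recent.clear()

Because early tasks begin close to the demonstrated goal, the agent receives useful reward signals long before it could solve the full task from the original starting positions.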
Future Multi-Domain Operations (MDO) wargaming will rely on Artificial Intelligence/Machine Learning (AI/ML) algorithms to aid and accelerate complex command and control decision-making. This requires an interdisciplinary effort to develop new algorithms that can operate in dynamic environments with changing rules, uncertainty, individual biases, and changing cognitive states, as well as the capability to rapidly mitigate unexpected hostile capabilities and exploit friendly technological capabilities. Building on recent advancements in AI/ML algorithms, we believe that new algorithms for learning, reasoning under uncertainty, game theory with three or more players, and interpretable AI can be developed to aid complex MDO decision-making. To achieve these goals, we developed a new, flexible MDO warfighter-machine interface game, Battlespace, to investigate and understand how human decision-making principles can be leveraged by and synergized with AI. We conducted several experiments with human vs. random players operating in a fixed environment with fixed rules, where the overall goal of the human players was to collaborate to either capture the opponents’ flags or eliminate all of their units. Then, we analyzed the evolution of the games and identified key features that characterized the human players’ strategies and their overall goal. We then followed a Bayesian approach to model the human strategies and developed heuristic strategies for a simple AI agent. Preliminary analysis revealed that following the human agents’ strategy in the capture-the-flag games produced the greatest winning percentage and may be useful for gauging the value of intermediate game states when developing coordinated action planning for reinforcement learning algorithms.
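The Bayesian treatment of human strategies can be illustrated with a simple posterior update over candidate goals (capture the flag versus eliminate units); the likelihood values below are purely illustrative and not taken from the experiments.

    # Hypothetical sketch: maintain a posterior over a small set of candidate
    # strategies given the actions observed in a game. Values are illustrative.
    def update_posterior(prior: dict, likelihoods: dict) -> dict:
        """One Bayes update: P(strategy | action) is proportional to
        P(action | strategy) * P(strategy)."""
        unnormalized = {s: prior[s] * likelihoods[s] for s in prior}
        z = sum(unnormalized.values())
        return {s: v / z for s, v in unnormalized.items()}

    # Example: an observed move toward the opponent's flag favors the
    # capture-the-flag hypothesis over the attrition hypothesis.
    prior = {"capture_flag": 0.5, "eliminate_units": 0.5}
    likelihoods = {"capture_flag": 0.7, "eliminate_units": 0.3}
    print(update_posterior(prior, likelihoods))

Posteriors of this form can serve as heuristic value estimates for intermediate game states when bootstrapping reinforcement learning agents.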