We present a framework for developing software agents via Machine Learning (ML) entitled Curriculum-Heavy Accelerated Learning in a Competitive Environment (CHALICE). CHALICE is designed to train and deploy intelligent agents capable of executing strategies for air-ground combat as embodied in AFRL’s MIST turn-based wargame system. Such agents can be used to suggest courses of action in real-time to operational planners and to provide an adversarial opponent for evaluation of proposed courses of action. CHALICE uses state-of-the-art Deep Neural Networks (DNNs) to represent the state of the environment and Deep Reinforcement Learning (DRL) to train each agent via repeated feedback from outcomes of the MIST Stratagem game. Unlike recent DRL approaches for strategy games such as Go or StarCraft [1, 2], CHALICE minimizes dependence on existing corpora of human gameplay and trains efficiently with low computational resources and short convergence time (hours to days rather than weeks to months). Over the course of four government-led competitions, CHALICE produced agents that continually improved their performance, resulting in competitive play against human and automated opposing agents at relatively low training cost and time. In this paper, we motivate the operational problem and technical challenges, provide an overview of our technical approach, elaborate on our vision-based and graph-based DNN architecture design and agent training procedure, and present results from the most recent Stratagem competition. We close with a discussion of future research recommendations.
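The training paradigm described above, in which a policy improves from repeated game-outcome feedback rather than a corpus of human play, can be sketched minimally as follows. All names here (MistEnv, PolicyNet, update) are illustrative assumptions for a generic outcome-driven RL loop, not CHALICE's actual API.

```python
# Toy sketch of outcome-driven policy training: play a game episode,
# observe the final return, and nudge the policy accordingly.
import random

class MistEnv:
    """Stand-in for a turn-based wargame environment (hypothetical)."""
    def reset(self):
        return 0.0                       # initial state observation
    def step(self, action):
        reward = random.uniform(-1, 1)   # outcome feedback for this turn
        done = random.random() < 0.1     # game eventually ends
        return 0.0, reward, done

class PolicyNet:
    """Stand-in for a DNN policy; here reduced to a biased coin."""
    def __init__(self):
        self.bias = 0.5
    def act(self, state):
        return 0 if random.random() < self.bias else 1
    def update(self, episode_return):
        # Toy update rule: shift behavior toward higher-return episodes.
        self.bias = min(1.0, max(0.0, self.bias + 0.01 * episode_return))

def train(episodes=100):
    env, policy = MistEnv(), PolicyNet()
    for _ in range(episodes):
        state, total, done = env.reset(), 0.0, False
        while not done:
            state, reward, done = env.step(policy.act(state))
            total += reward
        policy.update(total)             # learn from the game outcome only
    return policy
```

The key property the sketch illustrates is that no human gameplay data enters the loop: the only supervision signal is the episode return.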
Many defense problems are time-dominant: attacks progress at speeds that outpace human-centric systems designed for
monitoring and response. Despite this shortcoming, these well-honed and ostensibly reliable systems pervade most
domains, including cyberspace. The argument that often prevails when considering the automation of defense is that
while technological systems are suitable for simple, well-defined tasks, only humans possess sufficiently nuanced
understanding of problems to act appropriately under complicated circumstances. While this perspective is founded in
verifiable truths, it does not account for a middle ground in which human-managed technological capabilities extend
well into the territory of complex reasoning, thereby automating more nuanced sense-making and dramatically
increasing the speed at which it can be applied. Snort and platforms like it enable humans to build, refine, and deploy
sense-making tools for network defense. Shortcomings of these platforms include a reliance on rule-based logic, which
confounds analyst knowledge of how bad actors behave with the means by which bad behaviors can be detected, and a
lack of feedback-informed automation of sensor deployment. We propose an approach in which human-specified
computational models hypothesize bad behaviors independent of indicators and then allocate sensors to estimate and
forecast the state of an intrusion. State estimates and forecasts inform the proactive deployment of additional sensors
and detection logic, thereby closing the sense-making loop. All the while, humans are on the loop, rather than in it,
permitting nuanced management of fast-acting automated measurement, detection, and inference engines. This paper
motivates and conceptualizes analytics to facilitate this human-machine partnership.
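The closed sense-making loop proposed above, in which hypothesized behaviors drive sensor allocation and state estimates in turn drive further sensor deployment, can be sketched roughly as follows. The function names, fusion rule, and threshold are illustrative assumptions, not the paper's actual analytics.

```python
# Toy sketch of a closed sense-making loop: hypotheses about bad
# behaviors are scored by sensor evidence, and credible hypotheses
# proactively attract additional sensing.
def sense_making_loop(hypotheses, deploy_sensor, iterations=3, threshold=0.5):
    """hypotheses: dict mapping behavior name -> prior belief in [0, 1].
    deploy_sensor: callable(name) -> evidence in [0, 1] (hypothetical)."""
    beliefs = dict(hypotheses)
    for _ in range(iterations):
        for name, belief in list(beliefs.items()):
            evidence = deploy_sensor(name)                 # measurement
            beliefs[name] = 0.5 * belief + 0.5 * evidence  # simple fusion
        # Close the loop: focus further sensing on credible intrusions.
        for name in [n for n, b in beliefs.items() if b > threshold]:
            beliefs[name] = 0.5 * beliefs[name] + 0.5 * deploy_sensor(name)
    return beliefs
```

A human would remain on the loop by supplying the hypothesis models and reviewing the evolving beliefs, while the measurement and inference steps run at machine speed.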