In this paper, we propose a novel approach for real-time human action recognition (HAR) on resource-constrained UAVs. Our approach tackles the limited availability of labeled UAV video data (compared to ground-based datasets) by incorporating synthetic data augmentation to improve the performance of a lightweight action recognition model. This combined strategy offers a robust and efficient solution for UAV-based HAR. We evaluate our method on the RoCoG-v2 and UAV-Human datasets, showing a notable increase in top-1 accuracy across all scenarios on RoCoG: a 9.1% improvement when training with synthetic data only, 6.9% with real data only, and the largest improvement, 11.8%, with a combined approach. Additionally, using an X3D backbone further improves accuracy on the UAV-Human dataset by 5.5%. Our models deployed on a Qualcomm Robotics RB5 platform achieve real-time predictions at approximately 10 frames per second (fps) and demonstrate a superior trade-off between performance and inference rate on both low-power edge devices and high-end desktops.
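As a rough illustration of the kind of lightweight video backbone described above, the following Python sketch loads a pretrained X3D-S model from PyTorchVideo via torch.hub and swaps its classification head for a custom number of action classes. The clip dimensions, class count, and head attribute name are assumptions for illustration; the paper's synthetic-data training pipeline and RB5 deployment are not reproduced here.

# Minimal sketch, assuming PyTorchVideo's hub X3D-S model and its ResNetBasicHead layout.
import torch
import torch.nn as nn

NUM_CLASSES = 7  # hypothetical number of action/gesture classes

model = torch.hub.load("facebookresearch/pytorchvideo", "x3d_s", pretrained=True)

# Replace the final projection layer with one sized for our classes
# (attribute path follows the PyTorchVideo X3D head; verify against the installed version).
in_features = model.blocks[-1].proj.in_features
model.blocks[-1].proj = nn.Linear(in_features, NUM_CLASSES)

# Dummy clip: (batch, channels, frames, height, width); 13 x 182 x 182 is the usual X3D-S input.
clip = torch.randn(1, 3, 13, 182, 182)
with torch.no_grad():
    logits = model(clip)
print(logits.shape)  # expected: torch.Size([1, NUM_CLASSES])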
The output of a semantic segmentation model on an off-road dataset can provide an accurate description of the terrain and the obstacles it contains. This output can be leveraged to determine the presence of barriers in an image. An obstacle is anything that may obstruct a portion of the region of traversal, whereas we define a barrier as something that bisects the region of traversal, creating two disjoint regions that would otherwise be connected if not for its presence. Detecting instances of barriers requires more than learning the correct label for a standard 2D semantic segmentation model. This paper presents an approach to detect the presence of barriers/barricades in a scene by using the traversability of the non-traversable semantic classes, and the pose of those classes relative to the other non-traversable classes in the scene, to determine whether an object constitutes a barricade/barrier.
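The barrier-versus-obstacle distinction above can be made concrete with a connectivity test on the segmentation output. The Python sketch below, a simplified illustration rather than the authors' algorithm, flags a candidate class as a barrier when removing it from the label mask merges traversable regions that the class had split apart; the class IDs and mask are assumptions.

# Minimal sketch, assuming a 2D integer label mask and a known set of traversable class IDs.
import numpy as np
from scipy import ndimage

def count_traversable_regions(seg_mask: np.ndarray, traversable_ids: set) -> int:
    """Count connected components of traversable terrain in a label mask."""
    traversable = np.isin(seg_mask, list(traversable_ids))
    _, num_regions = ndimage.label(traversable)
    return num_regions

def has_barrier(seg_mask: np.ndarray, traversable_ids: set, candidate_id: int) -> bool:
    """A candidate class is a barrier if treating it as free space reduces the
    number of disjoint traversable regions, i.e. it bisects the region of traversal."""
    regions_with_candidate = count_traversable_regions(seg_mask, traversable_ids)
    relabeled = np.where(seg_mask == candidate_id,
                         next(iter(traversable_ids)), seg_mask)
    regions_without_candidate = count_traversable_regions(relabeled, traversable_ids)
    return regions_with_candidate > regions_without_candidate

# Toy example: class 0 = trail (traversable), class 2 = fallen log spanning the trail.
mask = np.zeros((6, 6), dtype=int)
mask[:, 3] = 2  # candidate object cuts the trail into left/right halves
print(has_barrier(mask, traversable_ids={0}, candidate_id=2))  # True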
Using a semantic hierarchy as a labeling scheme can provide object detection systems with more robust expert knowledge of the relationships between object classes. This knowledge can be used to improve object class prediction in cases where an object detector encounters an object of a class on which it was not trained, known as zero-shot object detection or open-set recognition. Datasets that are useful for a particular application, domain, or task may not have their object labels organized into an appropriate semantic hierarchy. For example, the Scene UNderstanding (SUN) Database has image scenes organized into a hierarchy, but no such organization exists for the object labels. Objects in the images of this dataset were annotated in a crowd-sourced manner that allowed annotators to define the polygons bounding the objects as well as assign the labels. The challenge taken up by the method presented in this paper was to take the original object labels of the SUN Database and create a semantic hierarchy such that each child-parent pair of object classes demonstrates an "IS-A" relationship. By associating common labels within the dataset with the most relevant and fine-grained WordNet synonym, this approach produced a multi-layered semantic hierarchy for SUN Database object labels. The result is a tree-structured graph in which each node is a WordNet synonym of the original label and a node's parent is determined by its WordNet hypernym. Other ontological frameworks, such as the Basic Formal Ontology and the Operational Environment Ontology Suite, are also discussed.
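The hypernym-based tree construction described above can be sketched with NLTK's WordNet interface. The Python example below is a simplified stand-in: it naively takes the first noun synset for each label, whereas the paper's label-to-synset association is more careful, and the label list is hypothetical.

# Minimal sketch, assuming NLTK with the WordNet corpus installed (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def build_is_a_tree(labels):
    """Return {child_synset_name: parent_synset_name} edges forming an IS-A tree."""
    edges = {}
    for label in labels:
        synsets = wn.synsets(label.replace(" ", "_"), pos=wn.NOUN)
        if not synsets:
            continue  # unmapped labels would need manual curation
        node = synsets[0]  # naive choice; the paper selects the most relevant synset
        # Walk up the hypernym chain, adding child -> parent edges until the root.
        while node.hypernyms():
            parent = node.hypernyms()[0]
            edges.setdefault(node.name(), parent.name())
            node = parent
    return edges

edges = build_is_a_tree(["chair", "armchair", "lamp"])
print(edges.get("armchair.n.01"))  # expected: 'chair.n.01'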
Head-mounted displays (HMDs) may prove useful for synthetic training and augmentation of military C5ISR decision-making. Motion sickness caused by such HMD use is detrimental, resulting in decreased task performance or total user dropout. The genesis of sickness symptoms is often measured using paper surveys, which are difficult to deploy in live scenarios. Here, we demonstrate a new way to track sickness severity using machine learning on data collected from heterogeneous, non-invasive sensors worn by users who navigated a virtual environment while remaining stationary in reality. We discovered that two models, one trained on heterogeneous sensor data and another trained only on electroencephalography (EEG) data, were able to classify sickness severity with over 95% accuracy and were statistically comparable in performance. Greedy feature optimization was used to maximize accuracy while minimizing the feature subspace. We found that, across models, the features with the most weight were previously reported in the literature as being related to motion sickness severity. Finally, we discuss how models constructed on heterogeneous vs. homogeneous sensor data may be useful in different real-world scenarios.
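The greedy feature optimization mentioned above can be illustrated by a forward-selection loop that repeatedly adds the single feature yielding the largest cross-validated accuracy gain. The Python sketch below uses a generic classifier and synthetic placeholder data; the study's actual sensors, features, classifier, and 95% result are not reproduced.

# Minimal sketch, assuming a feature matrix X (samples x features) and severity labels y.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def greedy_feature_selection(X, y, max_features=10):
    selected, remaining = [], list(range(X.shape[1]))
    best_score = 0.0
    while remaining and len(selected) < max_features:
        scores = []
        for f in remaining:
            clf = RandomForestClassifier(n_estimators=100, random_state=0)
            scores.append(cross_val_score(clf, X[:, selected + [f]], y, cv=5).mean())
        best_idx = int(np.argmax(scores))
        if scores[best_idx] <= best_score:
            break  # stop once no remaining feature improves accuracy
        best_score = scores[best_idx]
        selected.append(remaining.pop(best_idx))
    return selected, best_score

# Usage with synthetic placeholder data (e.g., per-channel EEG band powers):
X = np.random.randn(200, 32)
y = np.random.randint(0, 3, size=200)  # low / medium / high sickness severity
features, score = greedy_feature_selection(X, y, max_features=5)
print(features, round(score, 3))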
Intelligent agents are devices, software, and simulations that perceive the environment and take actions to achieve a goal through the use of artificial intelligence. These AI agents are increasingly incorporated into every aspect of our lives. This is particularly true for soldiers and analysts, who must increasingly perform tasks in varied, dynamic, and fast-paced operational environments. There is a common idea that, in the future, the pace of operations will far exceed soldiers' or analysts' ability to react to extreme, complex activities. Accelerated decision making in Army operations will rely on AI agents and enabling technologies such as autonomous systems and simulations. However, what happens when the decisions from these AI agents are wrong, produce results contrary to expectations, or simply disagree with a person? Explanations can help resolve these issues. Any errors or uncertainty from the AI agent in an accelerated environment will present unique and unforeseen challenges that may inhibit analysts' or soldiers' ability to make decisions effectively and efficiently. Providing explanations for AI outputs, predictions, or behaviors is challenging. Algorithms or techniques frequently obfuscate which features matter and how actions are decided. In addition, results from these systems do not always include uncertainty information related to the factors that influenced the actions or decisions. Therefore, including uncertainty information explicitly in the explanation is necessary. We explore the use of abductive reasoning to provide explanations for situations where an agent's answers are not in line with human assessment and do not provide the uncertainty information needed for human interpretation of those answers. The primary goal of this work is to strengthen the communication of information and increase the effectiveness of interactions between humans and non-human agents.
Collaborative decision-making remains a significant research challenge that is made even more complicated in real-time or tactical problem contexts. Advances in technology have dramatically improved the ability of computers and networks to support the decision-making process (i.e., intelligence, design, and choice). In the intelligence phase of decision making, mixed reality (MxR) has shown a great deal of promise through implementations for simulation and training. However, little research has focused on an implementation of MxR to support the entire scope of the decision cycle, let alone collaboratively and in a tactical context. This paper presents a description of the design and initial implementation of the Defense Integrated Collaborative Environment (DICE), an experimental framework for supporting theoretical and empirical research on MxR for tactical decision-making support.
Tone mapping operators compress high dynamic range images to improve picture quality on a digital display when the dynamic range of the display is lower than that of the image. However, tone mapping operators have largely been designed and evaluated based on the aesthetic quality of the resulting displayed image or on how perceptually similar the compressed image appears relative to the original scene. They also often require per-image tuning of parameters depending on the content of the image. In military operations, however, the amount of information that can be perceived is more important than the aesthetic quality of the image, and any parameter adjustment needs to be as automated as possible regardless of image content. We have conducted two studies to evaluate the perceivable detail of a set of tone mapping algorithms, and we apply our findings to develop and test an automated tone mapping algorithm that demonstrates a consistent improvement in the amount of perceived detail. An automated, and thereby predictable, tone mapping method enables a consistent presentation of perceivable features, can reduce the bandwidth required to transmit the imagery, and can improve the accessibility of the data by reducing the expertise needed by the analyst(s) viewing the imagery.
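To make the idea of a tuning-free global operator concrete, the Python sketch below applies a Reinhard-style compression curve whose exposure is set from the image's own log-average luminance, so no per-image parameter adjustment is required. This is a generic illustration under that assumption, not the automated operator developed and evaluated in the paper.

# Minimal sketch, assuming a floating-point HDR RGB image of arbitrary range.
import numpy as np

def automated_tonemap(hdr: np.ndarray, key: float = 0.18, eps: float = 1e-6) -> np.ndarray:
    """Compress an HDR RGB image to the [0, 1] display range."""
    lum = 0.2126 * hdr[..., 0] + 0.7152 * hdr[..., 1] + 0.0722 * hdr[..., 2]
    log_avg = np.exp(np.mean(np.log(lum + eps)))   # scene's log-average luminance
    scaled = key * lum / (log_avg + eps)           # exposure derived from image statistics
    compressed = scaled / (1.0 + scaled)           # global Reinhard-style compression curve
    ratio = compressed / (lum + eps)               # preserve color by scaling RGB with luminance
    return np.clip(hdr * ratio[..., None], 0.0, 1.0)

# Usage: ldr = automated_tonemap(hdr_image); scale by 255 to display or save.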