2D multi-person pose estimation is a well-studied problem in understanding humans in images. It involves keypoint detection: detecting and localizing the points of interest (human joints). Multi-person pose estimation remains challenging because of occluded body parts, the non-rigidity of the human body, and the variable number and scale of persons in an image. The most common approach to keypoint detection is heatmap-based regression, but it has several drawbacks: precision is limited by the resolution of the output heatmap; pre- and post-processing of high-resolution heatmaps is computationally costly; and overlapping heatmap signals from spatially close keypoints cannot be distinguished. Heatmap-free pose estimation emerged to tackle these problems, with KAPAO and YOLO-Pose as representative methods; both build on YOLO for keypoint detection, since YOLO is an extremely fast object detection method with high accuracy. A graph consists of a collection of nodes and a collection of edges that connect the nodes, and a human pose can be viewed as a graph in which the human joints are nodes and their connections draw the pose. Graph neural networks (GNNs) are designed for data with such graph structure. Inspired by these observations, we introduce a YOLO-based GNN, a heatmap-free approach to 2D multi-person pose estimation: a YOLO-based network detects keypoints, and the detected keypoints and connections are then re-arranged and refined by a GNN. We tested our framework on the COCO-2017 dataset, and preliminary results show superior performance in accuracy and efficiency.
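As a rough illustration of the refinement stage, the sketch below runs one round of message passing over a skeleton graph of detected joints. The layer sizes, bone list, and residual-offset head are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch: refining YOLO-detected keypoints with one round of
# message passing over the human-skeleton graph (hypothetical layer,
# not the paper's exact design).
import torch
import torch.nn as nn

class KeypointGNNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)   # message from a joint pair
        self.upd = nn.Linear(2 * dim, dim)   # node update

    def forward(self, x, edges):
        # x: (num_joints, dim) node features; edges: list of (i, j) bones
        agg = torch.zeros_like(x)
        for i, j in edges:
            agg[i] += torch.relu(self.msg(torch.cat([x[i], x[j]])))
            agg[j] += torch.relu(self.msg(torch.cat([x[j], x[i]])))
        return torch.relu(self.upd(torch.cat([x, agg], dim=-1)))

# Example: 17 COCO joints, features = (x, y, score) embedded to 32-d.
embed = nn.Linear(3, 32)
refine = KeypointGNNLayer(32)
head = nn.Linear(32, 2)                       # regress refined offsets
kpts = torch.rand(17, 3)                      # dummy YOLO detections
skeleton = [(5, 7), (7, 9), (6, 8), (8, 10)]  # subset of COCO bones
offsets = head(refine(embed(kpts), skeleton))
refined_xy = kpts[:, :2] + offsets            # residual refinement
```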
KEYWORDS: Ice, Image segmentation, Education and training, Databases, Convolution, Performance modeling, Data modeling, Tunable filters, Solar radiation models, Solar radiation
With increasing global temperatures due to anthropogenic climate change, seasonal sea ice in the Arctic has experienced rapid retreat, accompanied by an increasing areal extent of meltponds on the surface of the retreating ice. Because meltponds have a much lower albedo than sea ice or snow, more solar radiation is absorbed by the underlying water, further accelerating the melting of sea ice. However, the dynamic nature of meltponds, which exhibit complex shapes and boundaries, makes manual analysis of their effects on underlying light and water temperatures tedious and taxing. Several classical image processing approaches have been used extensively to detect meltpond regions in the Arctic. We propose a Convolutional Neural Network (CNN) based multiclass segmentation model, termed NABLA-N (∇N), for automated detection and segmentation of meltponds. The NABLA-N architecture consists of an encoding unit and multiple decoding units that decode from several latent spaces. Fusing multiple feature spaces in the decoding units enables better feature representation by combining low- and high-level feature maps. The proposed model is evaluated on high-resolution aerial photographs of Arctic sea ice obtained during the Healy-Oden Trans Arctic Expedition (HOTRAX) in 2005 and on NASA's Operation IceBridge DMS L1B Geolocated and Orthorectified image data from 2016. These images are classified into three classes: meltpond, open water, and sea ice. We find that NABLA-N demonstrates superior performance on meltpond segmentation compared to other state-of-the-art networks such as UNet and Recurrent Residual UNet (R2UNet).
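To make the multi-decoder fusion idea concrete, here is a minimal sketch assuming a toy two-level encoder: one decoding path upsamples the deep latent space, another keeps the shallow feature space, and the two are concatenated before a three-class head. All module names and dimensions are illustrative, not the published NABLA-N code.

```python
# Illustrative sketch of decoding from multiple latent spaces.
import torch
import torch.nn as nn

class TinyNablaN(nn.Module):
    def __init__(self, classes=3):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.MaxPool2d(2),
                                  nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.decA = nn.ConvTranspose2d(32, 16, 2, stride=2)  # deep latent path
        self.decB = nn.Conv2d(16, 16, 3, padding=1)          # shallow path
        self.head = nn.Conv2d(32, classes, 1)  # fuse low + high level maps

    def forward(self, x):
        f1 = self.enc1(x)            # shallow, high-resolution features
        f2 = self.enc2(f1)           # deep, low-resolution features
        fused = torch.cat([self.decA(f2), self.decB(f1)], dim=1)
        return self.head(fused)      # per-pixel logits: pond/water/ice

logits = TinyNablaN()(torch.rand(1, 3, 64, 64))  # -> (1, 3, 64, 64)
```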
KEYWORDS: Data modeling, Diffusion, Image processing, Ice, Education and training, Colorimetry, RGB color model, Model based design, Statistical modeling, Image segmentation
As global warming drives climate change, extreme weather has become more common, posing a significant threat to life on Earth. One important indicator of climate change is the formation of melt ponds in the Arctic region. The scarcity of large amounts of annotated Arctic sea ice data is a major challenge in training a deep learning model to predict melt pond dynamics. In this work, we use a diffusion model, a class of generative model, to generate synthetic Arctic sea ice data for further analysis of meltponds. Given training data, diffusion models can generate new, realistic samples that are not present in the original dataset by learning a mapping between a simple distribution and the complex data distribution. During training, the complex data distribution is gradually transformed into a simple one (e.g., a Gaussian) by adding noise over a series of steps. Once trained, the model generates new samples by starting from the simple distribution and diffusing back to the complex data distribution, capturing the underlying features of the data. During inference, conditioning information is provided as input alongside the starting noise vector, guiding the diffusion process to produce samples that adhere to the specified conditions. We used high-resolution aerial photographs of the Arctic region obtained during the Healy-Oden Trans Arctic Expedition (HOTRAX) in 2005 and NASA's Operation IceBridge DMS L1B Geolocated and Orthorectified data acquired in 2016 for the initial training of the generative model. The original and synthetic images are assessed on their chromatic similarity, using an evaluation metric known as the Chromatic Similarity Index (CSI).
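As a sketch of the noising process described above (standard DDPM-style forward diffusion, with an assumed linear schedule; not the authors' code):

```python
# Closed-form forward (noising) process of a DDPM-style diffusion model.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # assumed noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative signal kept

def q_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0) in closed form."""
    eps = torch.randn_like(x0)                   # Gaussian noise
    a = alpha_bar[t].sqrt()
    s = (1.0 - alpha_bar[t]).sqrt()
    return a * x0 + s * eps, eps                 # noisy image, target noise

x0 = torch.rand(1, 3, 64, 64)                    # a sea-ice image patch
xt, eps = q_sample(x0, t=500)                    # halfway-noised sample
# A denoiser is trained to predict `eps` from (xt, t); sampling runs the
# process in reverse from pure noise, optionally with conditioning input.
```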
The massive shift in temperatures in the Arctic region has amplified the albedo effect: as ice and snow melt, the darker exposed surface absorbs a higher amount of solar energy. This continuous regional warming results in further melting of glaciers and loss of sea ice. Arctic melt ponds are important indicators of Arctic climate change. High-resolution aerial photographs are invaluable for identifying different sea ice features and are a great source for validating, tuning, and improving climate models. Due to the complex shapes and unpredictable boundaries of melt ponds, manually analyzing these remote sensing data is extremely tedious, taxing, and time-consuming, which leads to the need to automate the technique. Deep learning is a powerful tool for semantic segmentation, and one of the most popular deep learning architectures for feature cascading and effective pixel classification is UNet. We introduce an automatic and robust technique to predict bounding boxes for melt ponds using a Multiclass Recurrent Residual UNet (R2UNet) with UNet as the base model. R2UNet consists of two main architectural components, namely a residual connection and a recurrent block in each layer. The residual learning approach prevents vanishing gradients in deep networks by introducing shortcut connections, and the recurrent block, which provides a feedback connection in a loop, allows the outputs of a layer to be influenced by subsequent inputs to the same layer. The algorithm is evaluated on the Healy-Oden Trans Arctic Expedition (HOTRAX) dataset containing melt ponds photographed during helicopter flights between 5 August and 30 September 2005. The testing and evaluation results show that R2UNet provides improved, superior performance compared to UNet, Residual UNet (Res-UNet), and Recurrent UNet (R-UNet).
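A minimal sketch of a recurrent residual block in this spirit, assuming a single shared convolution unrolled for two feedback steps; the exact R2UNet configuration may differ:

```python
# Recurrent residual block: a residual shortcut around a convolution
# that re-sees its own output for a fixed number of feedback steps.
import torch
import torch.nn as nn

class RecurrentResidualBlock(nn.Module):
    def __init__(self, ch, steps=2):
        super().__init__()
        self.steps = steps
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        h = torch.relu(self.conv(x))
        for _ in range(self.steps):          # recurrent feedback loop:
            h = torch.relu(self.conv(x + h)) # output feeds the same layer
        return x + h                         # residual shortcut connection

y = RecurrentResidualBlock(16)(torch.rand(1, 16, 64, 64))
```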
Automatic ship detection against complex backgrounds, during both day and night, in infrared images is an important task. Additionally, we want the capability to detect ships of various scales, orientations, and shapes. In this paper, we propose the use of neural network technology for this purpose. The algorithm used for this task is the Deep Neural Machine (DNM), which contains three parts: a backbone, a neck, and a head. Combining these three stages, the algorithm extracts features, creates prediction layers from different scales of the backbone, and outputs object predictions at multiple scales. Experimental results show that our algorithm is robust and efficient in detecting ships against complex backgrounds.
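A generic backbone-neck-head skeleton of the kind described, with every module choice assumed purely for illustration (the actual DNM layers are not specified in the abstract):

```python
# Hypothetical backbone/neck/head layout for single-channel IR frames.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
neck = nn.Conv2d(32, 32, 1)                 # re-mix backbone features
head = nn.Conv2d(32, 5 + 1, 1)              # box (x, y, w, h, obj) + 1 class

ir_frame = torch.rand(1, 1, 256, 256)       # single-channel infrared image
pred = head(neck(backbone(ir_frame)))       # per-cell ship predictions
```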
This Conference Presentation, “Learning classical image registration features using a deep learning architecture,” was recorded at SPIE Photonics West held in San Francisco, California, United States.
This Conference Presentation, “Deep neural machine for multimodal information fusion,” was recorded at SPIE Photonics West held in San Francisco, California, United States.
Multi-object tracking in wide-area motion imagery (WAMI) has attracted great interest in the image processing field and leads to numerous real-world applications. Among them, aircraft and unmanned aerial vehicles (UAVs) with real-time, robust visual trackers for long-term aerial maneuvering are currently attracting attention and have remarkably broadened the scope of object tracking applications. In this paper, we present a novel attention-based feature fusion strategy that effectively combines the template and search-region features. Our results demonstrate the efficacy of the proposed system on the CLIF and UNICORN datasets.
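A minimal sketch of such fusion using standard cross-attention, where each search-region token attends to the template tokens; the token counts and embedding size are assumptions, not the paper's settings:

```python
# Cross-attention fusion of template and search-region features.
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
template = torch.rand(1, 49, 64)     # 7x7 template patch tokens
search = torch.rand(1, 256, 64)      # 16x16 search-region tokens

# Each search token queries the template to localize the target.
fused, weights = attn(query=search, key=template, value=template)
print(fused.shape)                   # torch.Size([1, 256, 64])
```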
Automated monitoring of low resolution, deep-space objects in wide field of view (WFOV) imaging systems can benefit from the improved performance of deep learning object detectors. The PANDORA sensor array, located in Maui at the Air Force Maui Optical and Supercomputing Site, is an exemplar of a scalable imaging architecture that can detect dim deep-space objects while maintaining a WFOV. The PANDORA system captures 20°×120° images of the night sky oriented along the GEO belt at a rate of two frames per minute. Prior work has established a baseline performance for the detection of Geosynchronous Earth Orbit (GEO) satellite objects using classical, feature-based detectors. This work extends GEO object detection and tracking methodologies by implementing a spatio-temporal deep learning architecture (GEO-SPANN), further improving the state of the art in GEO satellite object detection and tracking. GEO-SPANN consists of a learned spatial detector coupled with a tracking algorithm to detect and re-identify space objects in temporal sequences. We present the detection and tracking results of GEO-SPANN on an annotated PANDORA dataset, reporting an overall maximum F1 point of 0.814, corresponding to 0.766 precision and 0.868 recall. GEO-SPANN advances strategies for autonomous detection and tracking of GEO satellites, enabling the PANDORA sensor system to be leveraged for satellite orbit catalog maintenance and anomaly detection.
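As a consistency check on the reported operating point, the standard F1 definition reproduces the stated value:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.766 \times 0.868}{0.766 + 0.868}
    \approx 0.814
```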
Aerial object detection is one of the most important applications in computer vision. We propose a deep learning strategy for detecting and classifying objects on pipeline rights-of-way by analyzing aerial images captured from aircraft or drones. Because sufficient aerial datasets for accurately training deep learning systems are limited, an efficient methodology for augmenting the object data in the training set is necessary to achieve robust performance under various environmental conditions. Another limitation is the computing hardware that can be installed on the aircraft, especially on a drone; hence a balance between the effectiveness and efficiency of the object detector must be considered. We propose an efficient weighted IOU NMS (intersection over union non-maxima suppression) method to speed up post-processing and satisfy the onboard processing requirement. Weighted IOU NMS utilizes the confidence scores of all proposed bounding boxes to regenerate a mean box, in parallel. It processes bounding-box scores at the same instant, without removing boxes or decreasing their scores. We perform both quantitative and qualitative evaluations of our network architecture on multiple aerial datasets. The experimental results show that our proposed framework achieves better accuracy than state-of-the-art methods for aerial object detection in various environmental conditions.
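A vectorized sketch of confidence-weighted box merging in the spirit of this description; the clustering threshold and exact weighting scheme are assumptions:

```python
# Confidence-weighted box merging over all pairs at once: every box is
# replaced by the score-weighted mean of its overlapping cluster, with
# no box removed and no score decayed.
import numpy as np

def weighted_iou_nms(boxes, scores, iou_thr=0.5):
    """boxes: (N, 4) as [x1, y1, x2, y2]; scores: (N,)."""
    x1, y1, x2, y2 = boxes.T
    area = (x2 - x1) * (y2 - y1)
    # Pairwise IoU for all boxes, computed in parallel.
    ix1 = np.maximum(x1[:, None], x1); iy1 = np.maximum(y1[:, None], y1)
    ix2 = np.minimum(x2[:, None], x2); iy2 = np.minimum(y2[:, None], y2)
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    iou = inter / (area[:, None] + area - inter)
    # Score-weighted mean over each box's overlap cluster.
    w = (iou >= iou_thr) * scores
    return (w @ boxes) / w.sum(axis=1, keepdims=True)

boxes = np.array([[10, 10, 50, 50], [12, 11, 52, 49]], dtype=float)
scores = np.array([0.9, 0.6])
print(weighted_iou_nms(boxes, scores))  # one mean box per input box
```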
Much research has been done on implementing deep learning architectures for detection and recognition tasks. Current work on auto-encoders and generative adversarial networks suggests the ability to recreate scenes from previously trained data. It can be assumed that the ability to recreate information implies the ability to differentiate it. We propose a convolutional auto-encoder both for recreating the scene and for detecting vehicles within it. In essence, the auto-encoder creates a low-dimensional representation of the data projected into a latent space, which can also be used for classification. The convolutional neural network is based on the concept of receptive fields created by the network, which are part of the detection process. The proposed architecture includes a discriminator network connected in the latent space and trained for vehicle detection. Work in multi-task learning shows that learning multiple representations of the data from different tasks helps improve task performance. To test and evaluate the network, we use standard aerial vehicle datasets such as Vehicle Detection in Aerial Imagery (VEDAI) and Columbus Large Image Format (CLIF). We observe that the neural network is able to create features representative of the data and to classify the imagery into vehicle and non-vehicle regions.
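A minimal sketch of the shared-latent multi-task layout described above, with illustrative encoder/decoder sizes and a hypothetical two-class discriminator head:

```python
# Convolutional auto-encoder with a classifier attached to the latent
# space: one encoder feeds both the reconstruction and detection tasks.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
decoder = nn.Sequential(nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
                        nn.ConvTranspose2d(16, 3, 2, stride=2))
discriminator = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                              nn.Linear(32, 2))   # vehicle / non-vehicle

patch = torch.rand(8, 3, 64, 64)      # aerial image patches
z = encoder(patch)                    # shared latent representation
recon = decoder(z)                    # reconstruction task
logits = discriminator(z)             # detection task (multi-task)
```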
KEYWORDS: 3D modeling, Cameras, RGB color model, 3D surface sensing, Robotics, Environmental sensing, Motion models, 3D visualizations, Reconstruction algorithms, Sensors, 3D metrology
A new methodology for 3D change detection that can support effective robot sensing and navigation in a reconstructed indoor environment is presented in this paper. We register RGB-D images acquired with an untracked camera into a globally consistent and accurate point-cloud model. The system robustly estimates the camera position across multiple RGB video frames using both photometric error and a feature-based method, and utilizes the iterative closest point (ICP) algorithm to establish geometric constraints between the point clouds as they become aligned. For change detection, a bag-of-words (DBoW) model is used to match the current frame against previous key frames based on RGB images with Oriented FAST and Rotated BRIEF (ORB) features. The key-frame transformation and ICP are then combined to align the current point cloud with the reconstructed 3D scene and localize the robot. Meanwhile, the camera position and orientation are used to aid robot navigation. After preprocessing the data, we create an OctoMap model to measure scene changes. Experimental evaluations of our algorithm show that the robot's location and orientation are accurately determined, with promising change detection results that indicate all object changes at a very limited false-alarm rate.
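For reference, the rigid-alignment step at the heart of each ICP iteration can be written as the closed-form Kabsch/SVD solution over matched point pairs (a textbook sketch, not this system's implementation):

```python
# Least-squares rigid transform between matched 3D point sets.
import numpy as np

def best_rigid_transform(src, dst):
    """Find R, t so that R @ p + t ≈ q for matched rows of src, dst (N, 3)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)            # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cd - R @ cs
    return R, t

# Full ICP alternates this step with nearest-neighbor correspondence
# search until the two point clouds converge.
```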
KEYWORDS: RGB color model, 3D modeling, Robotics, 3D surface sensing, Detection and tracking algorithms, Environmental sensing, Clouds, Video, Free space, Image processing, Sensors, 3D image processing, Robot vision, Data modeling
3D scene change detection is a challenging problem in robotic sensing and navigation, with several unpredictable aspects. We propose a change detection method that can support various applications under varying environmental conditions. Point-cloud models are acquired from an RGB-D sensor, which provides the required color and depth information, and change detection is performed on the robot-view point-cloud model. A bilateral filter smooths surfaces and fills holes in the depth image while preserving edge details. Registration of the point-cloud model is implemented using the Random Sample Consensus (RANSAC) algorithm, with surface normals estimated in a preceding stage to identify the ground and walls. After preprocessing the data, we create a point-voxel model that labels each voxel as surface or free space, and a color model that assigns each occupied voxel the mean color of the points it contains. Preliminary changes are detected by an XOR subtraction on the point-voxel model. Next, the eight neighbors of each center voxel are examined: if they are neither all 'changed' nor all 'unchanged' voxels, a histogram over location and the hue color channel is estimated. Experimental evaluations of our algorithm show promising results for change detection, indicating all changing objects with a very limited false-alarm rate.
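A toy sketch of the XOR subtraction on two occupancy voxel grids; the grid resolution and object placement are assumed purely for illustration:

```python
# XOR-based preliminary change detection between a reference occupancy
# grid and the current robot-view occupancy grid.
import numpy as np

ref = np.zeros((64, 64, 32), dtype=bool)     # reference scene occupancy
cur = np.zeros((64, 64, 32), dtype=bool)     # current robot-view occupancy
ref[10:20, 10:20, 0:5] = True                # an object in the reference
cur[30:40, 10:20, 0:5] = True                # ...moved in the current view

changed = np.logical_xor(ref, cur)           # occupied in only one model
print(changed.sum(), "candidate change voxels")
# Each candidate voxel would then be checked against its eight neighbors
# and, where ambiguous, a location/hue histogram, as described above.
```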