This PDF file contains the front matter associated with SPIE Proceedings Volume 11398, including the Title Page, Copyright information, and Table of Contents.
After a disaster, teams are commonly sent to collect data on damaged buildings. The images are often taken with a handheld camera and a drone, a collection process that is largely manual, slow, and can place the photographers in hazardous conditions. To address this, a drivable, omnidirectional camera can produce images that could potentially be combined with drone images to create a functioning three-dimensional model while drastically reducing data collection time. This paper discusses methods and applications for processing Applied Streetview images in the Pix4D modeling software. The Applied Streetview images went through several processing stages, and the resulting models were merged with UAV data. The merged data sets were then compared visually for aesthetics and accuracy. The research uses images collected around the University of Washington's campus.
3D building reconstruction is an important problem with applications in urban planning, emergency response, and disaster planning. This paper presents a new pipeline for 3D reconstruction of buildings from RGB imagery captured by a drone. We leverage the commercial software Pix4D to construct a 3D point cloud from the RGB drone imagery, which is then used in conjunction with image processing and geometric methods to extract a building footprint. The footprint is then extruded vertically based on the heights of the segmented rooftops. Footprint extraction involves two main steps: line segment detection and polygonization of the lines. To detect line segments, we project the point cloud onto a regular grid, detect preliminary lines using the Hough transform, refine them via RANSAC, and convert them into line segments by checking the density of the points surrounding each line. In the polygonization step, we convert the detected line segments into polygons by constructing and completing partial polygons, and then filter them by checking for support in the point cloud. The polygons are then merged based on their respective height profiles. We have tested our system on two buildings of several thousand square feet in Alameda, CA, and obtained F1 scores of 0.93 and 0.95, respectively, compared to the ground truth.
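To make the line-detection step concrete, the following minimal Python sketch (not the authors' code) projects a point cloud onto a regular occupancy grid, detects candidate roof-edge lines with the probabilistic Hough transform, and keeps only segments supported by densely occupied cells; the cell size, vote threshold, and density cutoff are illustrative assumptions, and the paper's RANSAC refinement is replaced here by a simple density check.

import numpy as np
import cv2

def detect_footprint_segments(points_xy, cell=0.25, min_votes=50):
    """points_xy: (N, 2) array of point-cloud x/y coordinates in meters."""
    # Rasterize the points onto a regular occupancy grid.
    mins = points_xy.min(axis=0)
    cols = ((points_xy[:, 0] - mins[0]) / cell).astype(int)
    rows = ((points_xy[:, 1] - mins[1]) / cell).astype(int)
    grid = np.zeros((rows.max() + 1, cols.max() + 1), dtype=np.uint8)
    grid[rows, cols] = 255

    # Preliminary lines via the probabilistic Hough transform.
    lines = cv2.HoughLinesP(grid, rho=1, theta=np.pi / 180, threshold=min_votes,
                            minLineLength=20, maxLineGap=5)
    if lines is None:
        return []

    # Keep only segments whose surrounding cells are densely occupied
    # (a simple stand-in for the paper's RANSAC refinement and density check).
    kept = []
    for x0, y0, x1, y1 in lines[:, 0]:
        support = np.zeros_like(grid)
        cv2.line(support, (int(x0), int(y0)), (int(x1), int(y1)), 255, thickness=3)
        if grid[support > 0].mean() / 255.0 > 0.5:
            kept.append(((int(x0), int(y0)), (int(x1), int(y1))))
    return kept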
In recent years, imagery from airborne sensors has become available at low cost, due to the advent of affordable drone systems. Such imagery can be used to address many different tasks in various fields of application. While the imagery itself bears all of the information required for some tasks, other tasks require the imagery to be georeferenced to certain accuracy requirements. If that is not the case, registering the imagery to reference images that come with a satisfactory georeference allows us to transfer this georeference to the imagery. Many registration approaches described in the literature require the image and the reference to be of sufficiently similar appearance in order to work properly. To address registration problems in more dissimilar cases, we have been developing a registration method based on contour matching. In a nutshell, this method comprises two main steps: extracting contour points from both the image and the reference, and matching them. To optimize the overall performance of our registration method, we strive to improve the performance of each step individually, both by implementing new algorithms and by fine-tuning relevant parameters. The scope of this work is the implementation of a novel contour point extraction algorithm to improve the first step of our method, as well as its evaluation in the context of our registration method. Line-shaped objects exceeding a certain length, such as road networks, are likely to be present in both the image and the reference despite their possible dissimilarity in appearance. The novel contour point extraction algorithm capitalizes on this by focusing on the extraction of contour points representing such line-shaped objects.
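As an illustration of the idea (not the authors' implementation), the sketch below extracts contour points lying on long, line-shaped structures such as roads by combining Canny edges with the probabilistic Hough transform; all thresholds are assumptions.

import numpy as np
import cv2

def line_contour_points(gray, min_length=120, step=10):
    """Return contour points sampled along long straight structures in a
    grayscale image (e.g. an aerial image or a rasterized reference map)."""
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                            minLineLength=min_length, maxLineGap=10)
    points = []
    if lines is not None:
        for x0, y0, x1, y1 in lines[:, 0]:
            # Sample contour points at regular intervals along each long segment.
            n = max(2, int(np.hypot(x1 - x0, y1 - y0) // step))
            for t in np.linspace(0.0, 1.0, n):
                points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return np.array(points)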
Image matching has been a critical research topic in many computer vision applications such as stereo vision, feature tracking, motion tracking, image registration and mosaicing, object recognition, and 3D reconstruction. Normalized Cross Correlation (NCC) is a template-based image matching approach that is invariant to linear brightness and contrast variations. As a first step in mosaicing, we rely heavily on NCC for matching images, which is an expensive and time-consuming operation. We therefore implement NCC on the GPU and on multiple CPUs in order to improve execution time for real-time applications. Finally, we compare the performance and timing gains obtained by moving the NCC implementation from the CPU to the GPU.
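For reference, a zero-mean NCC match can be computed on the CPU with OpenCV as sketched below; this only illustrates the matching score in question, not the paper's GPU or multi-CPU implementation (OpenCV's CUDA module offers GPU template matching when built with CUDA support).

import cv2

def best_match(image_gray, template_gray):
    # TM_CCOEFF_NORMED subtracts the patch means, so the score is invariant
    # to linear brightness and contrast changes.
    scores = cv2.matchTemplate(image_gray, template_gray, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(scores)
    return max_loc, max_val  # top-left corner of the best match and its NCC score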
With the widespread use of multirotor UAS/drones in the civilian and commercial sector, the skies are going to get crowded. Under this scenario, it is reasonable to anticipate issues with enforcing flight rules and regulations on these drones. Such issues might arise from failures on the drones themselves (loss of communication, sensor or actuator failures) or, in some cases, from deliberately uncooperative drones. Therefore, in order to implement effective Counter UAS (C-UAS) measures, it is important to fully characterize the uncooperative drone, particularly its capabilities; the first step in this process is identifying the geometry of the drone. In this paper, we present the preliminary results of an effort to characterize the geometry of a drone using feeds from fixed video cameras. Preliminary results indicate that it is feasible to identify the general geometry of the drone, such as whether it is a quadcopter or another configuration.
Object detection for computer vision systems continues to be a complicated problem in real-world situations. For instance, autonomous vehicles need to operate with very small margins of error as they encounter safety-critical scenarios such as pedestrian and vehicle detection. The increased use of unmanned aerial vehicles (UAVs) by both government and private citizens has created a need for systems that can reliably detect UAVs in a large variety of conditions and environments. To achieve small margins of error, object detection systems, especially those relying on deep learning methods, require large amounts of annotated data. Synthetic datasets provide a way to alleviate the need to collect annotated data. Unfortunately, the nature of synthetic dataset generation introduces a gap between reality and simulation that hinders an object detector's ability to generalize to real-world data. Domain randomization is a technique that generates a variety of scenarios in a randomized fashion, both to close this gap and to augment a hand-crafted dataset. In this paper, we combine the AirSim simulation environment with domain randomization to train a robust object detector. As a final step, we fine-tune our object detector on real-world data and compare it with object detectors trained solely on real-world data.
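As a minimal, simulator-agnostic sketch of domain randomization (the parameter names and ranges below are illustrative assumptions, not the paper's AirSim configuration), each synthetic training image is rendered under independently sampled scene conditions:

import random

def sample_scene_params():
    # Each training image is rendered under an independently sampled condition set.
    return {
        "sun_elevation_deg": random.uniform(10, 80),
        "cloud_density": random.uniform(0.0, 1.0),
        "camera_distance_m": random.uniform(5, 120),
        "camera_yaw_deg": random.uniform(0, 360),
        "drone_texture": random.choice(["matte_black", "white", "camouflage"]),
        "background": random.choice(["urban", "forest", "open_sky"]),
    }

# A rendering loop would pass each sampled dictionary to a simulator-specific
# apply/render function (hypothetical), e.g. render_scene(sample_scene_params()).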
Recent breakthroughs in deep net processing have shown the ability to compute solutions to physics-based problems, such as the three-body problem, many orders of magnitude faster. In this paper, we show how a deep autoencoder, trained on paths generated using a dynamical, physics-based model, can generate comparable routes much faster. The auto-generated routes have all the properties of a physics-based model without the computational burden of explicitly solving the dynamical equations. This result is useful for planning and for multi-agent reinforcement learning simulations. In addition, the fast route planning capability may prove useful in real-time situations such as collision avoidance or fast dynamic targeting response.
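A minimal sketch of the idea, assuming fixed-length 2D routes flattened into vectors (the architecture sizes and route length are illustrative, not the authors' network):

import torch
import torch.nn as nn

ROUTE_LEN = 100  # waypoints per route (assumption)

class RouteAutoencoder(nn.Module):
    def __init__(self, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(ROUTE_LEN * 2, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, ROUTE_LEN * 2),
        )

    def forward(self, routes):  # routes: (batch, ROUTE_LEN * 2) flattened x/y pairs
        return self.decoder(self.encoder(routes))

# Training minimizes reconstruction error against physics-generated routes;
# new routes are then produced by decoding sampled or interpolated latent vectors.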
The continuous evolution of commercial Unmanned Aerial Systems (UAS) is fuelling rapid advancement in network edge-communication applications for smart agriculture, smart traffic management, and border security. A common problem in UAS (a.k.a. drone systems) research and development is the cost of deploying and running realistic testbeds. Due to the constraints of safe operation, limited energy resources, and government regulations, UAS testbed building is time-consuming and not easily configurable for large-scale experiments. In addition, experimenters have a hard time creating repeatable and reproducible experiments to test major hypotheses. In this paper, we present a design for performing trace-based NS-3 simulations that can be helpful for realistic UAS simulation experiments. We run experiments with real-world UAS traces including various mobility models, geospatial link information, and video analytics measurements. Our experiments assume a hierarchical UAS platform with low-cost/high-cost drones cooperating via a geo-location service in order to provide a 'common operating picture' for decision makers. We implement a synergized drone and network simulator that features three main modules: (i) a learning-based optimal scheme selection module, (ii) an application environment monitoring module, and (iii) a trace-based simulation and visualization module. Simulations generated from our implementation can integrate different drone configurations, wireless communication links (air-to-air, air-to-ground), as well as mobility routing protocols. Our approach is beneficial for evaluating network-edge orchestration algorithms pertaining to, e.g., energy consumption management, video analytics performance, and networking protocol configuration.
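As one possible way to feed such traces into NS-3 (an illustration, not the paper's implementation), a drone GPS log can be converted into the ns-2 mobility format that NS-3's Ns2MobilityHelper replays; the CSV column names and constant-speed approximation below are assumptions.

import csv
import math

def csv_to_ns2_mobility(csv_path, out_path, node_id=0):
    """Convert rows of (time_s, x_m, y_m, z_m) into an ns-2 mobility trace."""
    with open(csv_path) as f:
        rows = [(float(r["time_s"]), float(r["x_m"]), float(r["y_m"]), float(r["z_m"]))
                for r in csv.DictReader(f)]
    lines = [f"$node_({node_id}) set X_ {rows[0][1]}",
             f"$node_({node_id}) set Y_ {rows[0][2]}",
             f"$node_({node_id}) set Z_ {rows[0][3]}"]
    for (t0, x0, y0, _), (t1, x1, y1, _) in zip(rows, rows[1:]):
        # Approximate each leg with a constant-speed setdest command.
        speed = math.hypot(x1 - x0, y1 - y0) / max(t1 - t0, 1e-6)
        lines.append(f'$ns_ at {t0} "$node_({node_id}) setdest {x1} {y1} {speed}"')
    with open(out_path, "w") as f:
        f.write("\n".join(lines) + "\n")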
In this paper, we present a pipeline and prototype vision system for near-real-time semantic segmentation and classification of objects such as roads, buildings, and vehicles in large, high-resolution, wide-area, real-world aerial LiDAR point-cloud and RGBD imagery. Unlike previous works, which have focused on exploiting ground-based sensors or narrowed the scope to detecting the density of large objects, here we address the full semantic segmentation of aerial LiDAR and RGBD imagery by exploiting crowd-sourced labels that densely canvas each image in the 2015 Dublin dataset [1]. Our results indicate important improvements to detection and segmentation accuracy with the addition of aerial LiDAR over RGB imagery alone, which has important implications for civilian applications such as autonomous navigation and rescue operations. Moreover, the prototype system can segment and search geographic areas as large as 1 km^2 in a matter of seconds on commodity hardware with high accuracy (approximately 90%), suggesting the feasibility of real-time scene understanding on small aerial platforms.
Fast, efficient, and robust algorithms are needed for real-time visual tracking that can also run smoothly on airborne embedded systems. The flux tensor can be used to provide motion-based cues in visual tracking. In order to apply any object motion detection to a raw image sequence captured by a moving platform, the motion caused by the camera movement must first be stabilized. Using feature points to estimate the homography between frames is a simple registration method that can be used for stabilization. To obtain a good homography estimate, most of the feature points should lie on the same plane in the images. However, when the scene has complex structure it becomes very challenging to estimate a good homography. In this work, we propose a robust video stabilization algorithm that allows flux-based motion detection to efficiently identify moving objects. Our experiments show satisfactory results where other methods have been shown to fail on the same type of raw videos.
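A minimal sketch of the standard feature-based stabilization step (the baseline recipe, not the paper's robust variant): ORB features are matched between consecutive frames, a homography is estimated with RANSAC, and the current frame is warped into the previous frame's coordinates so that motion detection sees a static background.

import cv2
import numpy as np

orb = cv2.ORB_create(2000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def stabilize(prev_gray, curr_gray):
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:500]
    # Map points in the current frame (train) to the previous frame (query).
    src = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    h, w = prev_gray.shape
    return cv2.warpPerspective(curr_gray, H, (w, h))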
Motion imagery with geospatial metadata provides information about the observed scene. Given the comparably high speed and agility of the sensor platform (usually some kind of aircraft), the metadata has to be synchronized very accurately to each individual recorded image to yield accurate results. The quality of the geospatial metadata can be evaluated by a 3D reconstruction of a motion imagery sequence (with software such as Agisoft Metashape [1] or COLMAP [2, 3]) and comparison of the reconstructed camera poses with the camera poses derived from the metadata. The results obtained so far suggest that a synchronization mismatch between the video frames and the metadata is often one of the largest sources of inaccuracy in the geospatial metadata, and one of the easiest to avoid. For this reason, we assembled our own system from a commercially available image sensor and metadata module that can be attached to a small aircraft, and evaluated the quality of its metadata on a test flight. This article describes the system used and the results of the metadata calibration [4] performed to evaluate the quality of the metadata and its synchronization to the image frames.
In this study, a rapid training data and ground truth generation tool has been implemented for visual tracking. The proposed tool's plugin structure allows the integration, testing, and validation of different trackers. The tracker can be paused, resumed, fast-forwarded, rewound, and re-initialized on the fly after it loses the object, which is a needed step in training data generation. This tool has been implemented to help researchers rapidly generate ground truth and training data, fix annotations, and run and visualize their own single-object trackers or existing object tracking techniques.
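The tool's plugin interface is not public; the sketch below shows one plausible shape for such an interface (class and method names are assumptions), with an example plugin wrapping an off-the-shelf OpenCV tracker.

from abc import ABC, abstractmethod
import cv2  # TrackerCSRT requires the opencv-contrib-python build

class TrackerPlugin(ABC):
    @abstractmethod
    def init(self, frame, bbox):
        """Initialize (or re-initialize) the tracker on a frame with an (x, y, w, h) box."""

    @abstractmethod
    def update(self, frame):
        """Return (success, bbox) for the next frame."""

class CSRTPlugin(TrackerPlugin):
    """Example plugin wrapping an off-the-shelf OpenCV tracker."""
    def init(self, frame, bbox):
        self.tracker = cv2.TrackerCSRT_create()
        self.tracker.init(frame, bbox)

    def update(self, frame):
        return self.tracker.update(frame)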
Synthetic data has been shown to be an effective proxy for real data for training computer vision algorithms when acquiring labeled data is costly or impossible. Ship detection and classification from satellite imagery and surveillance video is one such area, and images generated using gaming engines such as Unity3D have been used successfully to circumvent the need for annotated real data. However, there is a lack of understanding of the effect of the rendering quality of 3D models on algorithms that use synthetic data. In this work, we investigate how the level of detail (LOD) of objects in a maritime scene affects ship classification algorithms. To study this systematically, we create datasets featuring objects with varying LODs and observe their significance for computer vision algorithms. Specifically, we evaluate the impact of mismatched LOD datasets on classification algorithms, and investigate the effect of low- or high-LOD datasets on a model's ability to transfer to real data. The LOD of the 3D objects is quantified using image quality metrics, while the performance of the computer vision algorithms is compared using accuracy metrics.
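As an illustration of quantifying LOD with image quality metrics (the specific metrics here are our choice, not necessarily the paper's), a reduced-LOD rendering can be compared against the highest-LOD rendering of the same viewpoint:

from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def lod_quality(high_lod_img, low_lod_img):
    """Both inputs are HxWx3 uint8 renderings of the same scene and viewpoint."""
    ssim = structural_similarity(high_lod_img, low_lod_img, channel_axis=-1)  # scikit-image >= 0.19
    psnr = peak_signal_noise_ratio(high_lod_img, low_lod_img)
    return {"ssim": ssim, "psnr_db": psnr}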
There are several factors that should be considered for robust terrain classification. We address the issue of high pixel-wise variability within terrain classes from remote sensing modalities, when the spatial resolution is less than one meter. Our proposed method segments an image into superpixels, makes terrain classification decisions on the pixels within each superpixel using the probabilistic feature fusion (PFF) classifier, then makes a superpixel-level terrain classification decision by the majority vote of the pixels within the superpixel. We show that this method leads to improved terrain classification decisions. We demonstrate our method on optical, hyperspectral, and polarimetric synthetic aperture radar data.
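A minimal sketch of the superpixel majority vote (the PFF classifier itself is not reproduced; any per-pixel label map can be plugged in):

import numpy as np
from skimage.segmentation import slic

def superpixel_majority_vote(image, pixel_labels, n_segments=1000):
    """image: HxWxC array; pixel_labels: HxW map of non-negative class ids
    from a per-pixel classifier such as PFF. Returns a smoothed HxW class map."""
    segments = slic(image, n_segments=n_segments, compactness=10, start_label=0)
    out = np.empty_like(pixel_labels)
    for seg_id in np.unique(segments):
        mask = segments == seg_id
        out[mask] = np.bincount(pixel_labels[mask]).argmax()  # majority class wins
    return out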
The objective of this paper is to detect the type of vegetation so that a more accurate Digital Terrain Model (DTM) can be generated by excluding the vegetation from the Digital Surface Model (DSM) based on its type (such as trees). Many different inpainting methods can then be applied to restore the terrain information at the removed vegetation pixels in the DSM and obtain a more accurate DTM. We trained three DeepLabV3+ models on three different datasets collected at different resolutions. Among the three DeepLabV3+ models, the model trained on the dataset whose image resolution is closest to that of the test images provided the best performance, and the semantic segmentation results with this model look highly promising.
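As a hedged stand-in for the training setup (torchvision ships DeepLabV3 rather than the DeepLabV3+ variant used in the paper, and the class count below is an assumption):

import torch
from torchvision.models.segmentation import deeplabv3_resnet50  # torchvision >= 0.13 API

NUM_CLASSES = 2  # vegetation vs. other (assumption)
model = deeplabv3_resnet50(weights=None, num_classes=NUM_CLASSES)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

def train_step(images, masks):
    """images: (B, 3, H, W) float tensor; masks: (B, H, W) long tensor of class ids."""
    optimizer.zero_grad()
    logits = model(images)["out"]  # (B, NUM_CLASSES, H, W)
    loss = criterion(logits, masks)
    loss.backward()
    optimizer.step()
    return loss.item()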
To accurately extract a digital terrain model (DTM), it is necessary to remove heights due to vegetation, such as trees and shrubs, and man-made structures, such as buildings and bridges, from the digital surface model (DSM). The resulting DTM can then be used for construction planning, land surveying, and similar applications. Normally, the process of extracting a DTM involves two steps. First, accurate land cover classification is required. Second, an image inpainting process is needed to fill in the missing pixels due to trees, buildings, bridges, etc. In this paper, we focus on the second step of using image inpainting algorithms for terrain reconstruction. In particular, we evaluate seven conventional and deep-learning-based inpainting algorithms from the literature on two datasets. Both objective and subjective comparisons were carried out. It was observed that some algorithms yielded slightly better performance than others.
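As an illustration of one conventional baseline (not necessarily among the seven algorithms evaluated), OpenCV's Telea inpainting can fill masked vegetation and building pixels in a DSM after coarse 8-bit scaling:

import cv2
import numpy as np

def inpaint_dsm(dsm, mask):
    """dsm: HxW float array of heights; mask: HxW uint8, 255 where pixels belong
    to vegetation/buildings and must be reconstructed. The 8-bit quantization
    makes this a coarse baseline only."""
    lo, hi = float(np.nanmin(dsm)), float(np.nanmax(dsm))
    dsm8 = np.uint8(255 * (dsm - lo) / max(hi - lo, 1e-6))
    filled = cv2.inpaint(dsm8, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
    return lo + filled.astype(np.float32) / 255.0 * (hi - lo)  # back to height units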
The interpretability of an image indicates the potential intelligence value of the data. Historically, the National Imagery Interpretability Rating Scale (NIIRS) has been the standard for quantifying the intelligence potential based on image analysis by human observers. Empirical studies have demonstrated that spatial resolution is the dominant predictor of the NIIRS level of an image. Today, the value of imagery is no longer simply determined by spatial resolution, since additional factors such as spectral diversity and temporal sampling are significant. Furthermore, analyses are performed by machines as well as humans. Consequently, NIIRS no longer accurately quantifies potential intelligence value for an image or set of images. We are exploring new measures of information potential based on mutual information. Our research suggests that new measures of image “quality” based on information theory can provide meaningful standards that go beyond NIIRS. In our approach, mutual information provides an objective method for quantifying divergence across objects and activities in an image. This paper presents the rationale for our approach, the technical description, and the results of early experimentation to explore the feasibility of establishing an information-theoretic standard for quantifying the intelligence potential of an image.
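For reference, the underlying quantity can be estimated from a joint histogram of two co-registered images as sketched below; this illustrates only the basic mutual information estimate, not the paper's full measure.

import numpy as np

def mutual_information(img_a, img_b, bins=64):
    """Mutual information (in nats) between two co-registered images."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0  # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))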
Generating imagery using gaming engines has become a popular method to augment or completely replace the need for real data. This is largely because gaming engines, such as Unity3D and Unreal, can produce novel scenes and ground-truth labels quickly and at low cost. However, there is a disparity between rendering imagery in the digital domain and testing in the real domain on a deep learning task. This disparity, commonly known as domain mismatch or domain shift, renders synthetic imagery impractical and ineffective for deep learning tasks unless it is addressed. Recently, Generative Adversarial Networks (GANs) have shown success at generating novel imagery and overcoming this gap between two different distributions by performing cross-domain transfer. In this research, we explore the use of state-of-the-art GANs to perform a domain transfer from a rendered synthetic domain to a real domain. We evaluate the data generated using an image-to-image translation GAN on a classification task as well as by qualitative analysis.
Maritime situational awareness depends on accurate knowledge of the locations, types, and activities of ocean-bound vessels. Such data can be gathered by analyzing the motion patterns of vessel tracks collected using coastal radar, visual identification, and Automatic Identification System (AIS) reports. We have developed a technique for predicting the types of vessels from abstract representations of their motion patterns. Our approach involves constructing multiple state sequences which represent activities syntactically. From these sequences, we generate multi-state transition matrices, which are the central feature used to train a support-vector machine classifier. Applying this technique to historical AIS data, our model successfully predicts vessel type even in cases where vessels do not follow known routes. Using only location information as the base feature for our model, we circumvent classification issues that arise from vessels' non-compliance with AIS regulations as well as the inability to visually identify vessels.
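A hedged sketch of the described feature construction (the grid size and kernel choice are assumptions): each track is discretized into grid-cell states, a normalized transition matrix is built, and the flattened matrices are used to train an SVM.

import numpy as np
from sklearn.svm import SVC

N_CELLS = 8  # lat/lon discretized into an 8x8 grid over the area of interest (assumption)

def track_to_feature(lats, lons, lat_range, lon_range):
    rows = np.clip(((lats - lat_range[0]) / (lat_range[1] - lat_range[0]) * N_CELLS).astype(int), 0, N_CELLS - 1)
    cols = np.clip(((lons - lon_range[0]) / (lon_range[1] - lon_range[0]) * N_CELLS).astype(int), 0, N_CELLS - 1)
    states = rows * N_CELLS + cols
    trans = np.zeros((N_CELLS * N_CELLS, N_CELLS * N_CELLS))
    for s0, s1 in zip(states[:-1], states[1:]):
        trans[s0, s1] += 1  # count state-to-state transitions along the track
    return (trans / max(trans.sum(), 1.0)).ravel()

# features = np.stack([track_to_feature(*t) for t in tracks])
# clf = SVC(kernel="rbf").fit(features, vessel_types)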
The effective detection of urban development is the basis for understanding urban sustainability. Although various studies have concentrated on long-time-series analysis of urban development, the resolution of the images used was too low to focus on individual objects. In this paper, we provide a long-time-series analysis of built-up areas at an annual frequency in Beijing, China, from 2000 to 2015, based on automatic building extraction from high-resolution satellite images. We propose a deep-learning-based method to extract buildings and employ an ensemble learning method to improve the localization of boundaries. The time-series results for built-up areas are analyzed using two schemes, i.e., change detection over the past fifteen years and evaluation of the whole region in three selected years. Our proposed method achieves an average overall accuracy (OA) of 93%. The results reveal that Beijing developed more rapidly during 2001-2008 than in other periods in terms of both the density and the number of buildings.