Multispectral imaging is an attractive sensing modality for small unmanned aerial vehicles (UAVs) in numerous military and civilian applications such as reconnaissance, target detection, and precision agriculture. Cameras based on patterned filters in the focal plane, such as conventional colour cameras, represent the most compact architecture for spectral imaging, but image reconstruction becomes challenging at higher band counts. We consider a camera configuration where six bandpass filters are arranged in a periodically repeating pattern in the focal plane. In addition, a large unfiltered region permits conventional monochromatic video imaging that can be used for situational awareness (SA), including estimating the camera motion and the 3D structure of the ground surface. As the platform moves, the filters are scanned over the scene, capturing an irregular pattern of spectral samples of the ground surface. By estimating the camera trajectory and the 3D scene structure, it is still possible to assemble a spectral image by fusing all measurements in software. The repeated sampling of each band enables spectral consistency testing, which can improve spectral integrity significantly. The result is a truly multimodal camera sensor system able to produce a range of image products. Here, we investigate its application in tactical reconnaissance by pushing towards on-board real-time spectral reconstruction based on visual odometry (VO) and full 3D reconstruction of the scene. The results are compared with offline processing based on estimates from visual simultaneous localisation and mapping (VSLAM) and indicate that the multimodal sensing concept has clear potential for use in tactical reconnaissance scenarios.
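As an illustration of the spectral consistency test described above, the sketch below fuses repeated samples of one band at one ground point, assuming the samples have already been associated with that point via the estimated camera trajectory; the function name, the median-based outlier rule, and the tolerance value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fuse_band_samples(samples, rel_tol=0.15):
    """Fuse repeated spectral samples of one band at one ground point.

    samples: 1-D array of radiance values observed for the same band as
    the filter pattern sweeps over the ground point.
    Returns the fused value, or None if the samples are inconsistent.
    """
    samples = np.asarray(samples, dtype=float)
    med = np.median(samples)
    # Keep samples close to the median; outliers typically stem from
    # pose errors, parallax, or moving objects in the scene.
    keep = samples[np.abs(samples - med) <= rel_tol * max(abs(med), 1e-9)]
    if keep.size < 2:  # require repeated agreement before trusting the value
        return None
    return float(keep.mean())
```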
Image segmentation is one of the key components in systems performing computer vision recognition tasks. Various algorithms for image segmentation have been developed in the literature. Among them, deep learning algorithms have recently been remarkably successful in performing this task. A downside of deep neural networks for segmentation is that they require large labeled datasets for training. This prerequisite is one of the main reasons researchers have adopted data augmentation approaches that minimize manual labeling effort while maintaining highly accurate results. This paper uses classical non-deep learning methods for background extraction to increase the size of the dataset used to train deep learning attention segmentation algorithms when images are presented to the model as time series. The presented method adopts Gaussian mixture-based (MOG2) foreground-background segmentation followed by dilation and erosion to create the masks necessary to train the deep learning models. It is applied in the context of planktonic images captured in situ as time series. Various evaluation metrics and visual inspection are used to compare the performance of the deep learning algorithms. Experimental results show that the deep learning algorithms achieve higher accuracy on time-series image attention segmentation when the proposed data augmentation methodology is used to enlarge the training dataset.
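A minimal sketch of the mask-generation step named above (MOG2 followed by dilation and erosion), using OpenCV's background subtractor; the history, variance-threshold, kernel size, and iteration counts are illustrative assumptions rather than the paper's tuned settings.

```python
import cv2

# Parameter values here are illustrative, not the paper's settings.
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16,
                                                detectShadows=False)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

def make_training_mask(frame):
    """Return a binary foreground mask for one frame of the time series."""
    fg = subtractor.apply(frame)                 # MOG2 foreground estimate
    fg = cv2.dilate(fg, kernel, iterations=2)    # close small gaps in objects
    fg = cv2.erode(fg, kernel, iterations=2)     # restore original extent
    _, mask = cv2.threshold(fg, 127, 255, cv2.THRESH_BINARY)
    return mask
```

Applying `make_training_mask` to each frame of an in-situ sequence yields the binary masks used as additional training targets for the attention segmentation models.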
Deep convolutional neural networks have proven effective in computer vision, especially in the task of image classification. Nevertheless, this success has been limited to supervised learning approaches, which require extensive amounts of labeled training data and thus time-consuming manual effort. Unsupervised deep learning methods were introduced to overcome this challenge; the gap to the classification accuracy of supervised learning, however, remains significant. This paper presents a deep learning framework for images of planktonic organisms that requires no ground truth or manually labeled data. This work combines feature extraction methods using state-of-the-art unsupervised training schemes with clustering algorithms to minimize the labeling effort while improving the classification process based on the essential features learned by the deep learning model. The models utilized in the framework are tested on existing planktonic data sets. Empirical results show that unsupervised approaches that cluster the data based on the deep learning model's feature-space representations improve the classification task and can identify classes that were not seen during the learning process.
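The sketch below shows the generic shape of such a pipeline, clustering feature vectors from an unsupervised encoder to produce pseudo-labels; the function name, the choice of k-means, and the normalisation step are assumptions for illustration, not the specific combination evaluated in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def pseudo_label(features, n_clusters=10):
    """Cluster deep feature vectors (one row per image) without labels.

    features: (N, D) array taken from an encoder trained with an
    unsupervised (e.g. contrastive) objective.
    Returns per-image cluster assignments usable as pseudo-labels.
    """
    # L2-normalise so Euclidean k-means approximates cosine similarity.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    return KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=0).fit_predict(feats)
```

Setting `n_clusters` larger than the number of expected taxa is one way such a scheme can surface groups, and hence classes, never seen during training.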
Frequent inspection of salmon cage integrity is essential for the early detection and prevention of farmed salmon escapes, minimizing the risk of negative impact on the remaining wild salmon stock. Current state-of-the-art computer vision-based approaches can detect net irregularities under "optimal" net and illumination conditions but might fail under real-world conditions. In this paper, we present a novel modularized processing framework based on advanced computer vision and machine learning approaches to effectively detect potential net damage in video recordings from cleaner robots traversing the net cages. The framework includes a deep learning-based approach to segmenting interpretable net structure from the background, transfer learning-facilitated classification that separates potential holes from irrelevant objects, and computer vision-based modules for irregularity detection, filtering, and tracking. Filtering and classification are vital steps to ensure that temporally consistent holes within the net structure are reported and that irrelevant objects such as passing fish are ignored. We evaluate our approach on representative videos from real cleaning operations and show that it can cope with the difficult lighting conditions typical of aquaculture environments.
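To make the temporal-consistency idea concrete, here is a minimal sketch of a persistence filter over tracked hole candidates; the data layout, function name, and threshold are hypothetical, standing in for the paper's filtering and tracking modules.

```python
from collections import defaultdict

def temporally_consistent(tracks, min_frames=5):
    """Keep only hole candidates that persist over time.

    tracks: iterable of (track_id, frame_idx) pairs emitted by the
    tracker for every candidate detection.
    Returns the ids of tracks observed in at least `min_frames` distinct
    frames, suppressing transient false alarms such as passing fish.
    """
    frames_per_track = defaultdict(set)
    for track_id, frame_idx in tracks:
        frames_per_track[track_id].add(frame_idx)
    return {tid for tid, frames in frames_per_track.items()
            if len(frames) >= min_frames}
```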
Procedurally defined implicit functions, such as CSG trees and recent neural shape representations, offer compelling benefits for modeling scenes and objects, including infinite resolution, differentiability, and trivial deformation, at a low memory footprint. The common approach to fitting such models to measurements is to solve an optimization problem involving the function evaluated at points in space. However, the computational cost of evaluating the function makes it challenging to use visibility information from range sensors and 3D reconstruction systems. We propose a method that uses visibility information and requires a number of function evaluations at each iteration that is proportional to the scene area. Our method builds on recent results for bounded Euclidean distance functions, introducing a coarse-to-fine mechanism that avoids the requirement for correct bounds. This makes our method applicable to a greater variety of implicit modeling techniques for which deriving the Euclidean distance function or appropriate bounds is difficult.
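The sketch below illustrates only the general idea behind coarse-to-fine evaluation with bounded distance functions: octree-style culling of cells that provably cannot contain the surface, so evaluations concentrate near the surface and scale with its area rather than the volume. It assumes a correct lower bound on the distance and therefore does not reproduce the paper's mechanism for tolerating incorrect bounds; all names are illustrative.

```python
import numpy as np

def candidate_cells(f, lo, hi, depth, max_depth, out):
    """Recursively collect cells that may intersect the zero set of f.

    f: implicit function whose magnitude is assumed here to be a lower
    bound on the Euclidean distance to the surface; a cell is culled
    only when that bound exceeds the cell's half-diagonal.
    """
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    centre = 0.5 * (lo + hi)
    half_diag = 0.5 * float(np.linalg.norm(hi - lo))
    if abs(f(centre)) > half_diag:   # cell conservatively classified empty
        return
    if depth == max_depth:
        out.append((lo, hi))
        return
    for corner in np.ndindex(2, 2, 2):  # split the cell into 8 octants
        o = np.asarray(corner)
        candidate_cells(f, np.where(o, centre, lo), np.where(o, hi, centre),
                        depth + 1, max_depth, out)
```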