KEYWORDS: Convolution, Field programmable gate arrays, Digital signal processing, Windows, Object detection, Design, Convolutional neural networks, Computer hardware, Detection and tracking algorithms, Deep learning
Most deep learning models for object detection are designed based on convolutional neural networks, requiring powerful computing and storage capabilities typically provided by hardware platforms such as GPUs and CPUs. In contrast, FPGAs offer low power consumption and strong computational capabilities; however, deploying neural network models directly on FPGA embedded platforms is challenging. To address these issues, this paper takes the YOLO-V3 target detection algorithm as an example, introducing the hierarchical structure of the YOLO-V3 network, analyzing acceleration methods for each layer in the YOLO-V3 network, designing a convolutional neural network accelerator, and comparing its performance with that of GPUs. The designed accelerator effectively utilizes FPGA hardware computing resources, achieving an overall average performance of 192.229 GOP/s.
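The GOP/s figure quoted above is a throughput: operations executed divided by elapsed time. As a rough illustration of how such a number is usually derived for a convolution layer, the sketch below counts each multiply-accumulate as two operations and ignores bias and activation. The helper names, the 416x416 input size, and the 2 ms latency are assumptions chosen for illustration and are not taken from the paper.

```python
def conv_ops(kernel, c_in, c_out, h_out, w_out):
    """Operations in one convolution layer, counting each multiply-accumulate as 2 ops."""
    macs = kernel * kernel * c_in * c_out * h_out * w_out
    return 2 * macs


def throughput_gops(ops, seconds):
    """Throughput in GOP/s for a given operation count and latency."""
    return ops / seconds / 1e9


# First YOLO-V3 layer, assuming a 416x416 input: 3x3 kernel, 3 -> 32 channels, stride 1.
first_layer_ops = conv_ops(3, 3, 32, 416, 416)

# Hypothetical latency, chosen only to show the arithmetic (not a measured value).
example_gops = throughput_gops(first_layer_ops, 2e-3)
```

Summing `conv_ops` over all layers and dividing by the measured end-to-end latency would give an overall average of the kind reported in the abstract.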
Tracking specific objects in images or videos is one of the most attractive problems in visual tasks. It is widely employed in security monitoring, autonomous driving, military operations, and other scenarios. Recently, object trackers based on convolutional neural networks, especially Siamese networks, have achieved high accuracy and have been studied in depth. However, in practical visual tracking scenarios, when the background is cluttered or the object is occluded, tracking accuracy drops rapidly, and in extreme cases the tracker loses the target. It is therefore particularly necessary to relocate the target quickly and accurately. To this end, an anti-interference tracker based on a Siamese convolutional neural network is developed. Benefiting from an adaptive tracking confidence parameter, the tracker corrects the object location as soon as tracking quality drops significantly. Experimental results show that the proposed method can effectively relocate and track the target after occlusion or loss.
With the development of computer vision technologies, fine-grained visual classification (FGVC) has become an intriguing research field due to its broad application prospects. The major challenges of fine-grained classification are two-fold: localizing discriminative regions and extracting fine-grained features. The attention mechanism is a common choice for current state-of-the-art (SOTA) methods in FGVC and can significantly improve performance in distinguishing among fine-grained categories. Attention modules of various designs are used to capture the discriminative region, and region-based feature representations encode subtle inter-class differences. However, an attention mechanism without proper supervision may not learn to provide informative guidance toward the discriminative region, and thus may be ineffective in FGVC tasks that lack part annotations. We propose a weakly-supervised attention mechanism that integrates visual explanation methods to address the ambiguity in discriminative region localization caused by the absence of supervision, while avoiding labor-intensive bounding box and part annotations. We employ Score-CAM, a novel post-hoc visual explanation method based on class activation mapping, to provide supervision and constrain the attention module. We conduct extensive experiments and show that the proposed method outperforms the current SOTA methods in three fine-grained classification tasks on CUB Birds, FGVC Aircraft, and Stanford Cars.
Compared with traditional lidar, photon-counting laser radar uses a detection mechanism with a high repetition rate and low pulse energy. When the echo is very weak, the echo signal can be extracted by increasing the counting time of time-correlated single-photon counting (TCSPC). However, it becomes difficult to detect fast-moving targets within a long counting time, especially under high background noise. In this article, we propose a dynamic ranging algorithm based on target motion parameter matching for long-distance moving-target ranging. The algorithm addresses fast, real-time ranging of small, remote, moving targets in noisy environments and measures distance and speed simultaneously. Further, to improve the practical performance of the dynamic ranging algorithm, we ported it to an FPGA platform.
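One way to picture why a long counting time hurts for moving targets, and how matching a motion parameter can help, is the sketch below. It is an illustrative interpretation and not the paper's algorithm: for each candidate radial velocity it removes the range walk that velocity would cause, accumulates a time-of-flight histogram, and keeps the candidate with the sharpest peak. The function name, the candidate grid, the 1 ns bin width, and the synthetic data are all assumptions.

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s


def match_velocity(pulse_times, tof, candidates, bin_width=1e-9):
    """Return (velocity, range at t=0) whose motion-compensated histogram peaks highest."""
    best = (None, None, -1)
    for v in candidates:
        # Remove the range walk that a target moving at radial velocity v would cause.
        corrected = tof - 2.0 * v * pulse_times / C
        bins = np.floor(corrected / bin_width).astype(np.int64)
        values, counts = np.unique(bins, return_counts=True)
        peak = counts.max()
        if peak > best[2]:
            # Convert the centre of the fullest bin back to a one-way range.
            r0 = (values[np.argmax(counts)] + 0.5) * bin_width * C / 2.0
            best = (v, r0, peak)
    return best[0], best[1]


# Synthetic check: 2000 pulses at 10 kHz, target starts at 1500 m and recedes at 300 m/s.
rng = np.random.default_rng(0)
t = np.arange(2000) * 1e-4
signal = 2.0 * (1500.0 + 300.0 * t) / C
noise = rng.uniform(0.0, 20e-6, size=t.size)  # one background count per pulse
pulse_times = np.concatenate([t, t])
tof = np.concatenate([signal, noise])
v_est, r_est = match_velocity(pulse_times, tof, np.array([0.0, 100.0, 200.0, 300.0, 400.0]))
```

Without compensation (the 0 m/s candidate), the signal counts in this toy data are smeared over roughly 400 bins, which is the effect the abstract describes for long counting times.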
Visual object tracking is one of the most attractive issues in computer vision. Recently, deep neural networks have been widely applied to object tracking and have shown high accuracy. In general, tracking accuracy decreases dramatically when the background becomes complex or the target is occluded. Thus, a robust tracking method based on a convolutional neural network and an anti-occlusion mechanism is presented. Benefiting from the adaptive tracking confidence parameter T, tracking quality is evaluated during tracking. Once the target is occluded, the location of the target object is corrected immediately. Experimental results demonstrate that the proposed framework achieves state-of-the-art performance.
Dynamic range compression and contrast enhancement are key steps in infrared imaging. Reasonable dynamic range compression should not destroy the gray-level distribution relationship between adjacent pixels. Most existing dynamic range compression algorithms do not take preserving this relationship as a basic design principle. After compression, the gray-level distribution of adjacent pixels may no longer be consistent with that before compression, which can lead to gradient reversal and edge halos; some algorithms also produce an image that is smooth overall but has lost much of its detail. An infrared image dynamic range compression algorithm that preserves the neighborhood gray-level distribution is proposed. The algorithm builds on the commonly used segmented linear transformation. To minimize the loss of image detail during compression, local factors are introduced into the global transformation. Specifically, a descriptor of the gray-level distribution of adjacent pixels is added to the calculation of the transformation parameters. The algorithm effectively improves image detail and achieves a good display effect for original infrared images with high dynamic range. Experimental results show that the algorithm outperforms the segmented linear transformation in displaying original infrared images with high dynamic range.
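For reference, the baseline that the proposed method builds on, a segmented (piecewise) linear transformation, can be sketched as follows. This shows only the global baseline, not the neighborhood-aware extension described in the abstract. The function name, the breakpoints, and the 14-bit sample values are assumptions chosen for illustration.

```python
import numpy as np


def piecewise_linear_compress(img, in_breaks, out_breaks):
    """Map raw gray levels to 8-bit display levels with a segmented linear curve."""
    # With increasing in_breaks and non-decreasing out_breaks the mapping is
    # non-decreasing, so the ordering of any two gray levels is never reversed.
    out = np.interp(img.astype(np.float64), in_breaks, out_breaks)
    return np.round(out).astype(np.uint8)


raw = np.array([[0, 4000], [8000, 16383]], dtype=np.uint16)  # 14-bit sample values
in_breaks = [0, 4000, 8000, 16383]   # hypothetical segment boundaries
out_breaks = [0, 64, 224, 255]       # hypothetical display levels
display = piecewise_linear_compress(raw, in_breaks, out_breaks)
```

Here the middle segment receives most of the output range, which is the usual way such a curve trades contrast between gray-level intervals. Because one global curve is applied to every pixel, nearby gray levels in a steeply compressed segment can merge, which is the detail loss the abstract sets out to reduce.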
Visual object tracking is one of the most attractive issues in computer vision. Recently, deep neural networks have been widely applied to object tracking and have shown high accuracy. In general, tracking accuracy decreases dramatically when the background becomes complex or the target is occluded. Here, we propose an end-to-end lightweight Siamese convolutional neural network to achieve fast and robust target tracking, especially for infrared targets. The network replaces hand-crafted features with multi-layer deep convolutional features of the target, so that higher precision can be achieved. Specifically, the object location is updated in every frame by refreshing a response map. However, the success rate of tracking decreases dramatically when the background becomes complex or the target is occluded. Consequently, a simple and robust anti-occlusion tracking method is presented. Tracking accuracy is evaluated during tracking by computing tracking confidence parameters. The parameters consist of two parts: the target confusion degree, which indicates the degree of background interference, and the target occlusion degree, which indicates the degree of target occlusion. Once the target is occluded, the location of the target object is corrected immediately. Experimental results demonstrate that the proposed framework achieves state-of-the-art performance on the popular OTB50 and OTB100 benchmarks.
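The response map and a per-frame confidence check can be sketched as below. In a Siamese tracker both inputs would be CNN feature maps; plain 2-D arrays are used here to keep the example self-contained. The abstract does not define its confusion and occlusion degrees, so the peak-to-sidelobe ratio is used as a stand-in confidence score, and the function names and toy data are assumptions.

```python
import numpy as np


def response_map(search, template):
    """Dense cross-correlation of a template over a search region (valid positions only)."""
    sh, sw = search.shape
    th, tw = template.shape
    out = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(search[i:i + th, j:j + tw] * template)
    return out


def peak_to_sidelobe_ratio(resp, exclude=1):
    """Stand-in confidence score: how far the peak stands above the rest of the map."""
    pi, pj = np.unravel_index(np.argmax(resp), resp.shape)
    mask = np.ones(resp.shape, dtype=bool)
    # Leave out a small window around the peak so only the sidelobe is measured.
    mask[max(pi - exclude, 0):pi + exclude + 1, max(pj - exclude, 0):pj + exclude + 1] = False
    side = resp[mask]
    return (resp[pi, pj] - side.mean()) / (side.std() + 1e-12)


search = np.zeros((10, 10))
search[4:7, 5:8] = 1.0            # toy target placed at row 4, column 5
template = np.ones((3, 3))
resp = response_map(search, template)
peak = np.unravel_index(np.argmax(resp), resp.shape)
confidence = peak_to_sidelobe_ratio(resp)
```

A tracker following the idea in the abstract would compare such a score against a threshold each frame and trigger relocation when it falls too low; the threshold itself would have to be tuned.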
A panoramic monitoring system is designed to achieve continuous monitoring of the surrounding environment. The image acquisition module of the system is composed of five fixed-focal-length cameras and one variable-focal-length camera, which together provide 360-degree environmental monitoring. The background in continuous photography usually changes due to fluctuations in ambient light, humidity, and wind. Therefore, a dynamic adaptive threshold is used to update the background template so that it better accommodates changing weather. Further, a motion-aware algorithm based on background updates is applied to detect whether an intruding target exists and to determine the direction of the target. Once an intruding target is found, the deep convolutional neural network YOLO is employed to recognize the target quickly, offering low computational cost and good detection accuracy. In addition, according to the preset warning level, when an intruding target warrants an alarm, the target orientation is transmitted to the platform through the central control processing unit, so that the variable-focal-length camera can take real-time snapshots. We propose an end-to-end lightweight Siamese convolutional neural network to achieve fast and robust target tracking. The network replaces hand-crafted features with multi-layer deep convolutional features of the target, so that higher precision can be achieved. Experimental results show that the panoramic surveillance system can effectively and robustly perform security tasks such as panoramic imaging, target recognition, and fast target tracking.
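A background template that adapts to slow scene changes while flagging intruding motion could look roughly like the sketch below. The abstract does not give the update rule, so the running-average blend, the mean-plus-k-sigma threshold, and the parameter names `alpha` and `k` are assumptions used only to illustrate the idea.

```python
import numpy as np


def update_background(background, frame, alpha=0.05, k=3.0):
    """Return (new_background, motion_mask) for one grayscale frame."""
    diff = np.abs(frame.astype(np.float64) - background)
    # Adaptive threshold: scales with the overall difference level of this frame,
    # so a global brightness drift raises the bar instead of flagging every pixel.
    threshold = diff.mean() + k * diff.std()
    motion = diff > threshold
    # Blend the frame into the background only where no motion was detected.
    blended = (1.0 - alpha) * background + alpha * frame
    new_background = np.where(motion, background, blended)
    return new_background, motion


background = np.full((8, 8), 100.0)
frame = background.copy()
frame[2:4, 2:4] = 200.0           # toy intruder covering four pixels
new_background, motion = update_background(background, frame)
```

The centroid of `motion` within each camera's field of view would then give a coarse direction for the target, which is the information the abstract says is passed on to the variable-focal-length camera.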
Pedestrian detection is a major task of many infrared surveillance systems. Due to the technical limitations of sensors or the high cost of advanced hardware, the resolution of infrared images is usually low and cannot meet the quality requirements of various applications. Compressed sensing, which captures and represents compressible signals at a sampling rate significantly below the Nyquist rate, is considered a new framework for signal reconstruction based on sparsity and compressibility. Thus, compressed sensing theory suggests a computational way to reconstruct a high-resolution image from a sparse signal, i.e., the low-resolution image. The proposed method uses low-resolution and high-resolution infrared pedestrian images to train an over-complete dictionary through the K-SVD algorithm, under which pedestrians are sparsely well represented. Two distant infrared cameras in the same scene are used to capture high- and low-resolution images, ensuring that the same pedestrian pair is sparsely represented under the over-complete dictionary. In this way, the similarities between input low-resolution image patches and high-resolution image patches are learned. The popular greedy algorithm Orthogonal Matching Pursuit (OMP) is utilized for sparse reconstruction, providing strong performance while keeping computational cost and storage low. We evaluate the quality of the reconstructed images using root mean square error (RMSE) and peak signal-to-noise ratio (PSNR). The experimental results show that the reconstructed images preserve rich pedestrian detail and have low RMSE and high PSNR, outperforming traditional super-resolution methods.
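The sparse-coding step and the two quality metrics named above can be sketched in a few lines of NumPy. This is a generic textbook form of OMP, not the paper's implementation; the toy dictionary, the function names, and the 8-bit peak value of 255 are assumptions, and dictionary training with K-SVD is not shown.

```python
import numpy as np


def omp(D, y, n_nonzero):
    """Greedy Orthogonal Matching Pursuit: approximate y with n_nonzero atoms of D."""
    residual = y.astype(np.float64).copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # Re-fit all selected atoms by least squares, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x


def rmse(a, b):
    """Root mean square error between two images."""
    return float(np.sqrt(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)))


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB (undefined when the images are identical)."""
    return 20.0 * np.log10(peak / rmse(a, b))


# Toy over-complete dictionary: three unit vectors plus one normalized mixed atom.
D = np.column_stack([np.eye(3), np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)])
y = np.array([2.0, 0.0, 3.0])
x = omp(D, y, n_nonzero=2)
error = rmse(np.array([0.0, 0.0]), np.array([3.0, 4.0]))
```

In a patch-based super-resolution pipeline of the kind described, the coefficients found for a low-resolution patch would be applied to the paired high-resolution dictionary to synthesize the output patch.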