In real-world scenarios, a target tracking system can be severely compromised by interactions, i.e., influences from the proximity and/or behavior of other targets or background objects. Closely spaced targets are difficult to distinguish, and targets may be partially or totally invisible for uncontrolled durations when occluded by other objects. These situations are likely to degrade tracking performance or cause the tracker to fail because the system may use invalid target observations to update the tracks. To address these issues, we propose an integrated multitarget tracking system. A background-subtraction–based method is used to automatically detect moving objects in video frames captured by a moving camera. The data association method evaluates the overlap rates between newly detected objects (observations) and already-tracked targets and decides whether a target is interacting with other targets and whether it has a valid observation. According to the association results, distinct strategies are employed to update and manage the tracks of interacting versus well-isolated targets. This system has been tested with real-world airborne videos from the DARPA Video Verification of Identity program database and demonstrated excellent track continuity in the presence of occlusions and multiple target interactions, a very low false alarm rate, and real-time operation on an ordinary general-purpose computer.
In real-world target tracking scenarios, interactions among multiple moving targets can severely compromise the performance of the tracking system. Targets involved in interactions are typically closely spaced and are often partially or entirely occluded by other objects. In these cases, valid target observations are unlikely to be available. To address this issue, we present an integrated multi-target tracking system. The data association method evaluates the overlap rates between newly detected objects (target observations) and already-tracked targets, and decides whether a target is interacting with other targets and whether it has a valid observation. Thus, the system is capable of recognizing target interactions and will reject invalid target observations. According to the association results, distinct strategies are adopted to update and manage the tracks of interacting versus well-isolated targets. Testing results on real-world airborne video sequences demonstrate the excellent performance of the proposed system for tracking targets through multiple target interactions. Moreover, the system operates in real time on an ordinary desktop computer.
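To make the overlap-rate test described above concrete, the following sketch computes an overlap rate between a detected bounding box and a tracked target's box and flags interactions. The exact overlap-rate definition, the thresholds, and the function names are illustrative assumptions, not taken from the papers.

```python
import numpy as np

def overlap_rate(box_a, box_b):
    """Overlap rate between two axis-aligned boxes (x, y, w, h).

    Defined here as intersection area over the smaller box area; the papers'
    exact definition may differ (this is an illustrative assumption).
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    return inter / min(aw * ah, bw * bh) if inter > 0 else 0.0

def associate(track_box, detections, accept_thresh=0.5, interact_thresh=0.2):
    """Return (best_detection_index, interacting_flag) for a single track.

    A detection is accepted as a valid observation when it overlaps the track
    strongly; a track that overlaps several detections above a lower threshold
    is flagged as interacting, so its observation would be treated with a
    different update strategy.
    """
    rates = [overlap_rate(track_box, d) for d in detections]
    if not rates or max(rates) < accept_thresh:
        return None, False                      # no valid observation
    interacting = sum(r > interact_thresh for r in rates) > 1
    return int(np.argmax(rates)), interacting
```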
KEYWORDS: Digital signal processing, Target detection, Detection and tracking algorithms, Video, Signal processing, System on a chip, Motion estimation, Cameras, Video processing, Optical flow
In this paper, we propose a real-time embedded video target tracking algorithm for use with real-world airborne video. The proposed system is designed to detect and track multiple targets from a moving camera in complicated motion scenarios such as occlusion, closely spaced targets passing in opposite directions, move-stop-move, etc. In our previous work, we developed a robust motion-based detection and tracking system, which achieved real-time performance on a desktop computer. In this paper, we extend our work to real-time implementation on a Texas Instruments OMAP 3730 ARM + DSP embedded processor by replacing the previous sequential motion estimation and tracking processes with a parallel implementation. To achieve real-time performance on the heterogeneous-core ARM + DSP OMAP platform, the C64x+ DSP core is utilized as a motion estimation preprocessing unit for target detection. Following the DSP-based motion estimation step, the descriptors of potential targets are passed to the general-purpose ARM Cortex A8 for further processing. Simultaneously, the DSP begins preprocessing the next frame. By maximizing the parallel computational capability of the DSP, and operating the DSP and ARM asynchronously, we reduce the average processing time for each video frame by up to 60% as compared to an ARM-only approach.
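The asynchronous ARM/DSP split described above amounts to a two-stage pipeline: while the tracking stage processes the detections from frame k, the motion-estimation stage is already working on frame k+1. Below is a minimal, platform-independent sketch of that pattern, with Python threads standing in for the two cores; the stage functions are placeholders supplied by the caller, not the paper's implementation.

```python
import queue
import threading

def run_pipeline(frames, motion_estimate, track_update, depth=2):
    """Two-stage pipeline: stage 1 (motion estimation, the DSP's role) feeds
    stage 2 (tracking, the ARM's role) through a bounded queue so the two
    stages overlap in time instead of running back to back."""
    handoff = queue.Queue(maxsize=depth)

    def stage1():
        for k, frame in enumerate(frames):
            handoff.put((k, motion_estimate(frame)))   # detections for frame k
        handoff.put(None)                              # end-of-stream marker

    producer = threading.Thread(target=stage1)
    producer.start()
    results = []
    while (item := handoff.get()) is not None:
        k, detections = item
        results.append(track_update(k, detections))    # runs while stage 1 works on frame k+1
    producer.join()
    return results
```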
In real-world outdoor video, moving targets such as vehicles and people may be partially or fully occluded by
background objects such as buildings and trees, which makes tracking them continuously a very challenging
task. In this work, we present a system that addresses the problem of tracking targets through occlusions in
a motion-based target detection and tracking framework. For an existing track that is fully occluded, a Kalman
filter is applied to predict the target's current position based upon its previous locations. However, the prediction
may drift from the target's true trajectory due to accumulated prediction errors, especially when the occlusion
is of long duration. To address this problem, each disappeared track is checked with an additional data
association procedure that evaluates its potential association with new detections, any one of which may be a
previously tracked target just emerging from occlusion. Another issue that arises with motion-based
tracking is that the algorithm may consider the visible part of a partially occluded target as the entire
target region. This is problematic because an inaccurate target motion model will be built, causing the Kalman
filter to generate inaccurate position predictions and the track to diverge from the true target trajectory.
Accordingly, we present a method that provides reasonable estimates of the
partially-occluded target centers. Experimental results conducted on real-world unmanned air vehicle (UAV)
video sequences demonstrate that the proposed system significantly improves the track continuity in various
occlusion scenarios.
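To make the occlusion handling above concrete, here is a minimal constant-velocity Kalman filter sketch: during full occlusion the track is propagated by prediction only, and a reappearing detection is tested against the prediction before the filter is corrected. The matrices, noise levels, and gating threshold are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np

dt = 1.0                                    # one frame between updates
F = np.array([[1, 0, dt, 0],                # constant-velocity state transition
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1, 0, 0, 0],                 # only the (x, y) center is observed
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 1e-2                        # process noise (assumed)
R = np.eye(2) * 1e-1                        # measurement noise (assumed)

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def correct(x, P, z):
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

def step(x, P, detection, gate=25.0):
    """One tracking step: always predict; correct only when a detection is
    available and lies within a simple distance gate of the prediction."""
    x, P = predict(x, P)
    if detection is not None and np.linalg.norm(np.asarray(detection) - H @ x) < gate:
        x, P = correct(x, P, np.asarray(detection, dtype=float))
    return x, P
```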
In video tracking systems that use image subtraction for motion detection, the global motion is usually estimated to compensate for the camera motion. The accuracy and robustness of this global motion compensation critically affect the performance of the target tracking process. The global motion between video frames can be estimated by matching features from the image background. However, features from moving targets contain both camera and target motion and should not be used to calculate the global motion. Sparse optical flow is a classical image matching method; however, the features it selects may come from moving targets, and some of the resulting matches may be inaccurate, which degrades video tracking performance. Least Median of Squares (LMedS) is a popular robust linear regression model and has been applied to real-time video tracking systems implemented in hardware that process up to 7.5 frames/second. In this paper, we use a robust regression method to select features only from the image background for robust global motion estimation, and we develop a real-time (10 frames/second), software-based video tracking system that runs on an ordinary Windows-based general-purpose computer. The software optimization and parameter tuning required for real-time execution are discussed in detail. The tracking performance is evaluated on real-world Unmanned Air Vehicle (UAV) video, and we demonstrate improved accuracy and robustness of the global motion estimation.
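The robust feature selection described above can be sketched with standard OpenCV building blocks: sparse optical flow proposes correspondences, and a robust regression (LMedS, via cv2.LMEDS) rejects the correspondences that fall on moving targets when fitting the global motion model. The parameter values below are illustrative, not the paper's tuned settings.

```python
import cv2

def estimate_global_motion(prev_gray, curr_gray):
    """Estimate the camera-induced (global) motion between two grayscale frames.

    Corner features are matched with pyramidal Lucas-Kanade optical flow, then a
    global affine model is fitted with Least Median of Squares so that matches
    on moving targets are treated as outliers and excluded from the fit.
    """
    pts_prev = cv2.goodFeaturesToTrack(prev_gray, maxCorners=400,
                                       qualityLevel=0.01, minDistance=8)
    if pts_prev is None:
        return None
    pts_curr, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                   pts_prev, None)
    good = status.ravel() == 1
    pts_prev, pts_curr = pts_prev[good], pts_curr[good]
    if len(pts_prev) < 3:
        return None
    # Robust regression: LMedS discards matches coming from moving targets.
    model, inliers = cv2.estimateAffine2D(pts_prev, pts_curr, method=cv2.LMEDS)
    return model   # 2x3 affine matrix describing the global motion
```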
In this paper, a novel system is presented to detect and track multiple targets in Unmanned Air Vehicle
(UAV) video sequences. Since the system detects targets based on their motion, we first segment foreground
moving areas from the background in each video frame using background subtraction. To stabilize the video, a
multi-point-descriptor-based image registration method is performed where a projective model is employed to
describe the global transformation between frames. For each detected foreground blob, an object model is used
to describe its appearance and motion information. Rather than immediately classifying the detected objects as
targets, we track them for a certain period of time and only those with qualified motion patterns are labeled as
targets. In the subsequent tracking process, a Kalman filter is assigned to each tracked target to dynamically
estimate its position in each frame. Blobs detected at a later time are used as observations to update the state
of the tracked targets to which they are associated. The proposed overlap-rate-based data association method
considers the splitting and merging of the observations, and therefore is able to maintain tracks more consistently.
Experimental results demonstrate that the system performs well on real-world UAV video sequences. Moreover,
the careful design of each component makes the proposed system feasible for real-time applications.
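A condensed sketch of the detection front end described above: the previous frame is warped into the current frame's coordinates with the estimated projective (homography) model, and frame differencing then exposes the moving foreground blobs. The thresholds and morphology settings are assumptions for illustration, not the paper's values.

```python
import cv2
import numpy as np

def detect_moving_blobs(prev_gray, curr_gray, H, diff_thresh=25, min_area=50):
    """Motion-based detection after global-motion compensation.

    H is the 3x3 projective transform mapping the previous frame into the
    current frame's coordinates (obtained from image registration). The
    compensated difference image is thresholded and cleaned up, and connected
    regions above a minimum area are reported as candidate target blobs.
    """
    h, w = curr_gray.shape
    stabilized = cv2.warpPerspective(prev_gray, H, (w, h))
    diff = cv2.absdiff(curr_gray, stabilized)
    _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area]
```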
Image mosaicking is the process of piecing together multiple video frames or still images from a moving camera
to form a wide-area or panoramic view of the scene being imaged. Mosaics have widespread applications in many
areas such as security surveillance, remote sensing, geographical exploration, agricultural field surveillance, virtual
reality, digital video, and medical image analysis, among others. When mosaicking a large number of still images
or video frames, the quality of the resulting mosaic is compromised by projective distortion. That is, during the
mosaicking process, the image frames that are transformed and pasted to the mosaic become significantly scaled
down and appear out of proportion with respect to the mosaic. As more frames continue to be transformed,
important target information in the frames can be lost because the transformed frames become too small, which
eventually makes it impossible to continue mosaicking. Some projective distortion correction techniques make
use of prior information such as GPS information embedded within the image, or camera internal and external
parameters. Alternatively, this paper proposes a new algorithm to reduce the projective distortion without
using any prior information whatsoever. Based on the analysis of the projective distortion, we approximate the
projective matrix that describes the transformation between image frames using an affine model. Using singular
value decomposition, we can deduce the scaling factor of the affine model, which is usually very close to 1. By
resetting this scale factor to 1, the transformed image size remains unchanged. Even though the proposed
correction introduces some error in the image matching, this error is typically acceptable and more importantly,
the final mosaic preserves the original image size after transformation. We demonstrate the effectiveness of this
new correction algorithm on two real-world unmanned air vehicle (UAV) sequences. The proposed method is
shown to be effective and suitable for real-time implementation.
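The scale-reset step can be illustrated as follows: the 2x2 linear part of the affine approximation is decomposed with SVD, its overall scale is taken as the geometric mean of the singular values, and the linear part is divided by that scale so the pasted frame keeps its original size. This is one reading of the description above, offered as a sketch rather than the paper's exact formulation.

```python
import numpy as np

def reset_affine_scale(affine):
    """Remove the overall scaling from a 2x3 affine transform.

    The 2x2 linear part is factored with SVD; its singular values s1, s2
    capture the scaling along two orthogonal directions, and sqrt(s1 * s2)
    serves as the single scale factor (usually very close to 1). Dividing
    the linear part by this factor keeps rotation and shear but forces the
    transformed frame to retain its original size.
    """
    affine = np.asarray(affine, dtype=float)
    A, t = affine[:, :2], affine[:, 2:]
    _, s, _ = np.linalg.svd(A)
    scale = float(np.sqrt(s[0] * s[1]))
    return np.hstack([A / scale, t])
```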