Much pedestrian detection research has focused on improving detection accuracy without considering detection speed, which makes the resulting algorithms unsuitable for real-world applications that require real-time processing. To address this problem, we first propose a pre-processing method, Hierarchical HOG Matrices, to replace the traditional integral histogram of gradients; it stores more data in the pre-processing phase to reduce computation time. We also propose a matrix-based detection computation structure, which organizes the massive data computations of the scanning detection process into matrix operations to optimize the overall speed. We then add multiple instance learning to the fast pedestrian detection algorithm to further enhance its accuracy. Experiments demonstrate that the proposed fast and robust pedestrian detection algorithm based on the multiple instance feature achieves accuracy comparable to the latest algorithms, with the best speed among algorithms at the same accuracy level.
By transferring prior knowledge from source domains and combining it with new knowledge extracted from the target domain, learning performance can be improved when the target domain has insufficient training data. In this paper we propose a new method to transfer a deformable part model (DPM) for object detection, using sharable filters from offline-trained auxiliary DPMs of similar categories together with new filters learnt from the target training samples to improve the performance of the target object detector. A DPM consists of a collection of root and part filters. The filters of the auxiliary detectors capture the sharable appearance features and can be used as prior knowledge. The new detector employs the sharable filters with a coefficient reweighting algorithm so that they fit the target object better. Meanwhile, the target object still has some distinct local appearance features that the part filters in the auxiliary filter pool cannot represent. Hence, new part filters are learnt from the training samples of the target object and added to the filter pool as a complement. The final learnt model is an assembly of transferred auxiliary filters and additional target filters. With a latent transfer learning algorithm, appropriate local features are extracted for the transfer of the auxiliary filters and the description of the distinct target filters. Our experiments demonstrate that the proposed strategy outperforms some state-of-the-art methods.
Face recognition in surveillance is a hot topic in computer vision because of the strong demand for public security, and it remains a challenging task owing to large variations in camera viewpoint and illumination. In surveillance, image sets obtained by incorporating tracking are the most natural form of input. Recent advances in set-based matching also show its great potential for exploring the feature space for face recognition by making use of multiple samples of each subject. In this paper, we propose a novel method that exploits salient features (such as the eyes, nose, and mouth) in set-based matching. To represent image sets, we adopt the affine hull model, which can generate unseen appearances in the form of affine combinations of sample images. In our proposal, a robust part detector is first used to find four salient parts in each face image: the two eyes, the nose, and the mouth. For each part, we construct an affine hull model from the local binary pattern histograms of multiple samples of the part. We also construct an affine hull model for the whole face region. Then, we find the closest distance between corresponding affine hull models to measure the similarity between parts or face regions, and a weighting scheme combines the five distances (four parts and the whole face region) into the final distance between two subjects. In the recognition phase, a nearest neighbor classifier is used. Experiments on the public ChokePoint dataset and our own dataset demonstrate the superior performance of our method.
High-performance pedestrian detection that is both accurate and fast is an important yet challenging task in computer vision. We design a novel feature named pair normalized channel feature (PNCF), which simultaneously combines and normalizes two channel features in image channels, achieving high discriminative power and computational efficiency. PNCF applies to both gradient channels and color channels, so that shape and appearance information are described and integrated in the same feature. To efficiently explore the formidably large PNCF feature space, we propose a statistics-based feature learning method that selects a small number of potentially discriminative candidate features, which are then fed into the boosting algorithm. In addition, channel compression and a hybrid pyramid are employed to speed up multiscale detection. Experiments illustrate the effectiveness of PNCF and its learning method. Our proposed detector outperforms the state of the art on several benchmark datasets in both detection accuracy and efficiency.
The Gaussian Mixture Model (GMM) for background subtraction (BGS) is widely used for detecting and tracking objects in video sequences. Although the GMM can provide good results, its low processing speed has become a bottleneck for real-time applications. We propose a novel method to accelerate the GMM algorithm on a graphics processing unit (GPU). As the GPU excels at massively parallel operations, the novelty lies in how various optimization strategies are adopted to fully exploit the GPU's resources. The parallel design consists of three levels. Building on the first-level implementation, we apply techniques such as memory access coalescing and memory address saving in the second-level optimization and the third-level modification, which greatly reduce the time cost and increase the bandwidth. Experimental results demonstrate that the proposed method achieves 145 frames per second (fps) for VGA (640×480) video and 505 fps for QVGA (320×240) video, outperforming its CPU counterparts by 24× and 23×, respectively. The resulting surveillance system can process five VGA videos simultaneously with strong robustness and high efficiency.
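As a rough sanity check on the figures quoted above, the reported throughput and speedup imply the CPU baselines and per-stream frame rate computed below. This is back-of-the-envelope arithmetic only; it assumes that the five VGA streams share the 145 fps budget equally, which the abstract does not state.

```python
# Figures quoted in the abstract.
gpu_fps_vga, speedup_vga = 145.0, 24.0
gpu_fps_qvga, speedup_qvga = 505.0, 23.0

# Implied CPU baselines: reported GPU rate divided by reported speedup.
cpu_fps_vga = gpu_fps_vga / speedup_vga
cpu_fps_qvga = gpu_fps_qvga / speedup_qvga

# Assumption (not from the abstract): five VGA streams split the
# GPU throughput equally.
streams = 5
fps_per_stream = gpu_fps_vga / streams

print(f"implied CPU rate, VGA:  {cpu_fps_vga:.1f} fps")
print(f"implied CPU rate, QVGA: {cpu_fps_qvga:.1f} fps")
print(f"per-stream rate for {streams} VGA streams: {fps_per_stream:.1f} fps")
```

Under that assumption each stream would run at roughly 29 fps, which is consistent with the claim that five VGA videos can be handled at once.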
Video sequences captured by a handheld digital camera need to be stabilized to eliminate the annoying effects caused by undesirable camera shake or jiggle. The key issue in video stabilization is estimating the global motion parameters between two successive frames. In this paper, a novel circular block matching algorithm is proposed to estimate the global motion parameters. The algorithm can handle not only translational motion but also large rotational motion. For a designated circular block in the current frame, a four-dimensional rotation-invariant feature vector is first extracted and used to judge whether the block is an effective block. Circular block matching based on the rotation-invariant features is then performed to find the best matching blocks in the reference frame for the effective blocks. From the matching results of any two effective blocks, a two-dimensional motion model is constructed to produce one group of frame motion parameters. A statistical method is proposed to calculate the estimated global motion parameters from all groups of motion parameters. Finally, using the estimated motion parameters as initial values, an iterative algorithm is introduced to obtain the refined global motion parameters. The experimental results show that the proposed algorithm performs well in stabilizing frames even with abrupt global translational and rotational motion.
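The step from pairs of matched blocks to one global estimate can be sketched as follows. The abstract specifies neither the form of the two-dimensional motion model nor the statistical method, so this sketch assumes a rotation-plus-translation model and a per-parameter median; the function names are hypothetical.

```python
import cmath
import math
from itertools import combinations
from statistics import median

def apply_motion(theta, tx, ty, p):
    """Rotate point p by theta (radians) about the origin, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

def motion_from_two_matches(p1, q1, p2, q2):
    """One group of parameters (theta, tx, ty) from two block matches p -> q."""
    zp1, zp2 = complex(*p1), complex(*p2)
    zq1, zq2 = complex(*q1), complex(*q2)
    # Ratio of the two displacement vectors encodes rotation (and scale).
    theta = cmath.phase((zq2 - zq1) / (zp2 - zp1))
    rot = cmath.rect(1.0, theta)      # keep the rotation, discard the scale
    t = zq1 - rot * zp1
    return theta, t.real, t.imag

def global_motion(matches):
    """Median of each parameter over all pairs of matches (assumed statistic).

    Note: a plain median of angles is only safe away from the +/- pi wrap.
    """
    groups = [motion_from_two_matches(p1, q1, p2, q2)
              for (p1, q1), (p2, q2) in combinations(matches, 2)]
    return tuple(median(g[i] for g in groups) for i in range(3))

# Four consistent block matches plus one wrong match (an outlier).
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 3.0)]
matches = [(p, apply_motion(0.3, 5.0, -2.0, p)) for p in centers[:4]]
matches.append((centers[4], (100.0, 100.0)))
print(global_motion(matches))
```

The median is used here because it tolerates a minority of wrong block matches; the refinement iteration mentioned in the abstract is not shown.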
Background modeling and estimation is essential in motion segmentation and object tracking for videos captured by stationary cameras with fixed focal lengths. Gaussian Mixture Models (GMMs) are extensively adopted to deal with non-monomodal background pixels. To model each of the non-stationary stochastic pixel processes, the GMMs have to be properly updated, especially for outdoor surveillance applications. Varying illumination conditions and uncertain noise are the main factors to which background subtraction algorithms should adapt. Filtering methods such as Wiener prediction, the Kalman Filter (KF), and the adaptive KF have been proposed to solve this problem. However, they rely on critically tuned parameters and are too time consuming to be applied to a whole frame. We developed a novel adaptive Kalman filter that adjusts the steady-state Kalman gain depending on the normalized correlation of the innovation sequence. It is used to accurately update gradually changing background models in real time without empirical parameter selection. To avoid statistically accumulated errors in the subtraction stage, the threshold for each pixel is adapted to the condition of its neighborhood based on a Markov random field (MRF) model. Experiments on real-world video data yield satisfactory results and show that our scheme is robust, accurate, and efficient.
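The idea of steering a steady-state gain by the correlation of the innovations can be illustrated for a single pixel as below. This is only a sketch under stated assumptions: the update rule, the forgetting factor, and the `step`, `gain_min`, and `gain_max` parameters are hypothetical choices for illustration, not the filter described in the paper. It relies on the standard property that a well-tuned Kalman filter produces uncorrelated innovations, so positively correlated innovations suggest the gain is too low.

```python
def track_background(pixels, gain=0.05, step=0.02, forget=0.9,
                     gain_min=0.01, gain_max=0.9):
    """Track one pixel's background level with a correlation-driven gain.

    Returns a list of (estimate, gain) pairs, one per processed sample.
    """
    estimate = float(pixels[0])
    prev_innov = 0.0
    cross = 0.0     # exponentially weighted sum of e[t] * e[t-1]
    power = 1e-9    # exponentially weighted sum of e[t] ** 2 (floor avoids 0/0)
    history = []
    for z in pixels[1:]:
        innov = z - estimate                      # innovation e[t]
        cross = forget * cross + innov * prev_innov
        power = forget * power + innov * innov
        # Normalized lag-1 correlation, clamped to [-1, 1].
        rho = max(-1.0, min(1.0, cross / power))
        # Positive correlation: the filter lags, so raise the gain.
        gain = min(gain_max, max(gain_min, gain + step * rho))
        estimate += gain * innov
        prev_innov = innov
        history.append((estimate, gain))
    return history

# A slow illumination ramp: the gain should rise so the estimate keeps up.
ramp = [100.0 + 0.5 * t for t in range(200)]
final_estimate, final_gain = track_background(ramp)[-1]
print(final_estimate, final_gain)
```

On a constant input the innovations are zero and the gain stays at its initial value; on the ramp the gain grows until the tracking lag becomes small.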
In recent years, various spatial error concealment techniques have been proposed, but each considers only some of the following four factors: 1) image continuity, 2) edge preservation, 3) texture recovery, and 4) concealment complexity. As a result, either the edge and texture recovery is unsatisfactory or the computational load is too heavy to be acceptable. To overcome these problems, this paper introduces a strap-based framework in place of the block-based framework, where a strap is a set of horizontally consecutive macroblocks. Within this framework, the content of each corrupted strap is classified into one of four classes: smooth area, edge area, low detail area, and high detail area. A suitable method is then selected for each class: bilinear interpolation, directional interpolation, best neighborhood match, and Markov Random Field (MRF) model-based maximum a posteriori (MAP) estimation, respectively. During content classification, the gradient information of the pixels near the corrupted strap is calculated and represented as gradient points on a unit circle. The centroid, scatter degree, and average gradient magnitude of those gradient points are calculated and used to classify the corrupted content. Our experimental results demonstrate the efficiency of the proposed method and show impressive improvements in both objective and subjective measures.
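The classification features named above (centroid, scatter degree, and average gradient magnitude of gradient points on a unit circle) can be sketched as follows. The abstract does not define the scatter degree or the decision rule, so this sketch assumes scatter is one minus the length of the centroid vector, and the thresholds in `classify` are hypothetical values chosen only for illustration.

```python
import math

def gradient_features(gradients):
    """gradients: non-empty list of (gx, gy) near the corrupted strap.

    Returns (centroid, scatter, mean_magnitude).
    """
    mags = [math.hypot(gx, gy) for gx, gy in gradients]
    mean_mag = sum(mags) / len(mags)
    # Project each non-zero gradient onto the unit circle.
    pts = [(gx / m, gy / m) for (gx, gy), m in zip(gradients, mags) if m > 0]
    if not pts:
        return (0.0, 0.0), 1.0, mean_mag
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    # Assumed definition: 0 when all directions agree, 1 when fully spread.
    scatter = 1.0 - math.hypot(cx, cy)
    return (cx, cy), scatter, mean_mag

def classify(gradients, smooth_mag=2.0, edge_scatter=0.2, detail_mag=20.0):
    """Toy four-way decision with hypothetical thresholds."""
    _, scatter, mean_mag = gradient_features(gradients)
    if mean_mag < smooth_mag:
        return "smooth"
    if scatter < edge_scatter:
        return "edge"
    return "high detail" if mean_mag >= detail_mag else "low detail"

print(classify([(30.0, 0.0)] * 8))                                      # one dominant direction
print(classify([(30.0, 0.0), (-30.0, 0.0), (0.0, 30.0), (0.0, -30.0)])) # strong, spread out
```

Each class would then be handed to the corresponding concealment method listed in the abstract; those methods are not shown here.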
The selective enhancement mechanism of Fine-Granular-Scalability (FGS) in MPEG-4 can enhance specific objects under bandwidth variation. We propose a novel technique for self-adaptive enhancement of regions of interest based on the Motion Vectors (MVs) of the base layer. It is suited to video sequences with a still background in which only the moving objects are of interest, such as news broadcasting, video surveillance, and Internet education. Motion vectors generated during base layer encoding are obtained and analyzed. A Gaussian model is introduced to describe non-moving macroblocks that may have non-zero MVs caused by random noise or luminance variation. The MVs of these macroblocks are set to zero to prevent them from being enhanced. A region-growing segmentation algorithm based on MV values is used to separate foreground from background. Post-processing is needed to reduce the influence of burst noise so that only the moving regions of interest are left. In our experiments, applying the result to selective enhancement during enhancement layer encoding significantly improves the visual quality of the regions of interest in such video transmitted at different bit rates.
Error concealment is becoming increasingly important because of the growing interest in multimedia transmission over unreliable channels such as wireless channels. At present, each concealment method has its own advantages and its own limits of applicability, and it achieves different concealment results in different cases. In this paper, we present a novel feature-based image error detection and error concealment algorithm to improve the quality of images degraded during transmission over a wireless channel. First, a simulation channel based on the Rayleigh model is implemented to emulate an actual wireless fading channel characterized by fading, multipath, and Doppler frequency shift. The damaged image blocks are detected by exploring contextual information in the image, such as consistency and edge continuity. The statistical characteristics of missing blocks are then estimated from the types of their surrounding blocks (e.g., smooth, texture, and edge). Finally, different error concealment strategies are applied to different types of blocks to achieve better visual quality. Instead of assuming random errors in packets, we simulate the errors of the wireless channel based on the Rayleigh model. The proposed algorithm is tested on a number of still images. Simulation results demonstrate that it is effective in terms of visual quality assessment and PSNR.
In this paper, we present a novel scheme to deliver scalable video over a priority network. First, we describe the background of scalable video transmission over networks and the motivation for this research. Second, a new scheme is proposed, covering bitstream classification, prioritization, and packetization. Third, we present a simple and effective mechanism for rate control and adaptation that selectively drops packets with the aim of minimizing end-to-end distortion. We also describe a transmission framework. Simulations show that our scheme is also effective in video multicast scenarios.
KEYWORDS: Video, Internet, Error control coding, Distortion, Forward error correction, Video compression, Video coding, Scalable video coding, Monte Carlo methods, Control systems
This paper proposes a novel robust framework for scalable video over the Internet. The main contribution of our work is a simplified rate-distortion theory developed specifically for scalable bitstreams over the network, together with the corresponding bit allocation, which determines the set of channel rates for each video layer. Compared with the traditional iterative optimal bit allocation, which has O(n^L) time complexity, simulations show that our scheme achieves high-quality video transmission with a much lower O(L × n) time complexity, with a difference of no more than 0.2 dB under different network conditions (different bandwidths and different packet loss cases). In addition, our error control scheme can be naturally combined with congestion control and error-resilient techniques to enhance the performance of the overall system.
This paper presents a novel algorithm for face rendering applications. Keeping algorithm complexity low when rendering virtual humans over VLBR networks is at the heart of our new facial rendering system. The system differs from others such as parametric animation models and interpolation solutions. Its novelties include a dual segment growing algorithm and a heat diffusion rendering method. The extraction process takes into account both gradient-domain information and topographic features. Segments are used to carry this information, which greatly reduces the transmitted packet size. Face rendering is based on these segments and is carried out like a heat diffusion process. Experimental results, reported in the following, support the proposed system. Furthermore, this scheme can be extended to more general video or image analysis and synthesis systems.
With the emergence of third generation (3G) wireless technology, the delivery of digital media such as images and video over wireless channels is increasingly in demand. In this paper, quality metrics for wireless images are proposed and a QoS-guaranteed error control scheme is presented that combines unequal error protection (UEP) with Forward Error Correction (FEC) and Automatic Repeat reQuest (ARQ), aiming at high-quality image transmission with short delay and low energy consumption. Simulation results show that our scheme can achieve good reconstructed images with few retransmissions and a small bit budget under different channel conditions, which can reduce the energy consumed in the network interface.
Rate control is an important component of a video encoder for data storage or real-time visual communication. In this paper, we discuss rate control in an MPEG encoder for real-time video communication over a Variable Bit Rate (VBR) channel. In interactive video communication, video transmission is subject to both channel rate constraints and end-to-end delay constraints. Our goal is to modify the rate control in the MPEG-2 encoder so that it satisfies the rate constraints, and to study how to improve video quality in the VBR transmission scenario. We employ a leaky bucket to describe the traffic parameters and monitor the encoder's output. Based on rate-distortion models that we developed, we present a rate control algorithm that achieves almost uniform distortion both within a frame and between frames in a scene. With adaptive rate-distortion models and an additional scene detection function, our method can robustly deal with scenes of different statistical characteristics. Compared with MPEG-2 TM5 in real-time video communication, our method keeps the buffer delay constant while keeping the decoded image quality stable. Furthermore, the bit allocation in our algorithm is more reasonable and controllable. Our method therefore realizes the advantages offered by VBR video communication, such as small end-to-end delay, consistent image quality, and high channel efficiency.
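A leaky bucket used as a traffic monitor can be sketched in a few lines. This is a generic textbook form, not the policing model from the paper: the per-frame drain rate, the bucket capacity, and the function name are assumptions made for illustration.

```python
def leaky_bucket(frame_bits, drain_per_frame, capacity):
    """Monitor an encoder's output against a leaky-bucket contract.

    frame_bits:      bits produced by the encoder for each frame
    drain_per_frame: bits the channel removes per frame interval
    capacity:        bucket size in bits (the allowed burst)
    Returns (fullness after each frame, True if no frame overflowed).
    """
    fullness = 0.0
    trace, conforming = [], True
    for bits in frame_bits:
        fullness += bits                 # encoder writes the coded frame
        if fullness > capacity:
            conforming = False           # this frame exceeds the allowed burst
        fullness = max(0.0, fullness - drain_per_frame)  # channel drains
        trace.append(fullness)
    return trace, conforming

# A large frame (e.g. at a scene change) overflows a small bucket.
trace, ok = leaky_bucket([120, 40, 40, 200, 40], drain_per_frame=80, capacity=150)
print(trace, ok)
```

A rate controller would typically react before overflow, for example by coarsening quantization as the fullness approaches the capacity; that feedback loop is outside this sketch.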
Imaging apparatus inevitably imposes undesirable noise on acquired images during the real imaging process. Usually this noise is too faint to cause unpleasant visual effects; however, it degrades image fidelity and significantly lowers the compression ratio of lossless coding. Worse, in this case there is little room for traditional noise filtering methods to work. This paper introduces some of our efforts to weaken the effect of such micro noise during near-lossless compression. Experimental results on ISO test images and micro Gaussian noise demonstrate that, with the ability to filter micro noise, an improved near-lossless coder can not only achieve a clearly higher compression ratio but also provide better image fidelity (measured by mean squared error) than lossless coding.
Image segmentation consists of dividing an image into non-intersecting, dissimilar but meaningful regions (object and background). Thresholding is a commonly employed technique for segmenting images. Many methods for automatic threshold selection use an optimization process in which specific criterion functions are defined. Recently, several thresholding methods based on minimizing the cross-entropy function of images have been proposed. Cross-entropy measures the information discrepancy between two probability distributions. Derived from cross-entropy, fuzzy divergence measures the dissimilarity between two fuzzy sets. In this paper, we present four new algorithms for optimal threshold selection based on different criteria integrating cross-entropy and fuzzy divergence. The first is a minimum cross-entropy algorithm based on the hypothesis of a uniform probability distribution. The second is a maximum between-class cross-entropy algorithm using a posteriori probability. The third is a modified version of an existing method based on maximum between-class fuzzy divergence. The last is a minimum fuzzy divergence algorithm. For the last two algorithms, in line with the requirements of image thresholding, we construct a new fuzzy membership function that takes into account the gray-level probability distributions of object pixels and background pixels about their mean values. The effectiveness and generality of the proposed algorithms are compared with some recent techniques based on related principles and evaluated on real images using a uniformity measure and a shape measure. Results showing the superiority of the proposed algorithms are presented.
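For context, the classic minimum cross-entropy criterion that this line of work builds on can be sketched as below. This is the widely known baseline form (each class is replaced by its mean gray level and the resulting cross-entropy is minimized), not any of the four algorithms proposed in the paper; the function name and the exhaustive search are choices made for illustration.

```python
import math

def min_cross_entropy_threshold(hist):
    """hist[g] = number of pixels with gray level g.

    Returns threshold t such that levels < t form one class and levels >= t
    the other, minimizing
        -sum_{g<t} g*h(g)*log(mu_lo) - sum_{g>=t} g*h(g)*log(mu_hi),
    where mu_lo and mu_hi are the class mean gray levels.
    """
    best_t, best_cost = None, math.inf
    levels = len(hist)
    for t in range(1, levels):
        lo_n = sum(hist[:t])
        hi_n = sum(hist[t:])
        lo_m = sum(g * hist[g] for g in range(t))          # first moment, low class
        hi_m = sum(g * hist[g] for g in range(t, levels))  # first moment, high class
        # Skip splits with an empty class or a zero mean (log undefined).
        if lo_n == 0 or hi_n == 0 or lo_m == 0:
            continue
        cost = -lo_m * math.log(lo_m / lo_n) - hi_m * math.log(hi_m / hi_n)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

# Synthetic bimodal histogram: dark levels 40 and 60, bright levels 190 and 210.
hist = [0] * 256
for level in (40, 60, 190, 210):
    hist[level] = 50
print(min_cross_entropy_threshold(hist))
```

On this synthetic histogram the selected threshold falls in the gap between the dark and bright groups. The exhaustive search recomputes the sums for every candidate, which is simple but quadratic in the number of gray levels.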
In this paper, a boundary-control point (BCP) based motion representation scheme is proposed. In this scheme, the dense motion field is described by the object boundary and the motion vectors located at predefined grid points (the control points). Our scheme differs from the conventional block-based motion representation scheme in that the motion field has more degrees of freedom. It can represent complex motion such as translation, rotation, zooming, and deformation. The motion field is generally continuous, with discontinuity only at the object boundary, so the distortion after motion compensation is mainly geometrical deformation, to which the human eye is relatively insensitive compared with the blocking effect caused by the conventional block-based scheme. A pixel threshold criterion is also proposed to evaluate the BCP-based motion compensated prediction (MCP) image and to determine whether the MCP error needs to be transmitted. Finally, a BCP-based video encoder is constructed. With nearly the same decoded signal-to-noise ratio, a bit-rate saving of about 20-55% can be achieved compared with MPEG-I, while the subjective quality of the BCP-based scheme is better than that of MPEG-I. The new scheme is also quite unlike the model-based scheme, since it needs no complex scene analysis. Some promising experimental results are shown.
This paper presents a method for vectorizing line-structured engineering drawings based on window feature extraction. A line-structured engineering drawing is composed of straight lines and curves (which may have different widths) as well as their ends, corners, and crosses (which we call feature points). We use two-dimensional black run lengths to trace and separate different lines, and we present feature point extraction criteria to detect feature points. When a feature point is detected, a small rectangular window is opened around it. After an adaptive window enlargement algorithm is applied, a window of proper size and position, which we call a Window Feature, is obtained. In this way, we can tell whether the point is really a feature point (end, corner, or cross) or just noise. We define 40 window features, with which we can process and vectorize all the complicated cross points in mechanical drawings. Finally, we give some experimental examples of vectorizing mechanical drawings.
Motion estimation and compensation are important tasks in image sequence coding. In this paper, we present a motion estimation scheme with a multiresolution tree structure and hierarchical motion vector search. Experiments and analysis show that this scheme is not only computationally efficient but also robust. The multiresolution tree structure is further utilized in a variable block size image sequence coding scheme that incorporates visual spatio-temporal characteristics. Both DCT and quadtree approaches are used to encode the motion compensated prediction error. Although the signal-to-noise ratio of the quadtree coded image is a little lower, the subjective quality around sharp edges is much better. Compared with the MPEG-1 standard, our extensive simulations give quite promising results.