Accurate building footprint extraction from optical remote sensing images remains challenging because of the diverse appearance of buildings and the complexity of the imaged scenes. Although recent deep learning-based methods greatly improve the accuracy of building footprint extraction, vanilla deep networks still produce ambiguous predictions for edge pixels. Building edges carry abundant location and shape information, which is important for downstream applications such as building positioning and area measurement, so the problem of inaccurate edge prediction needs to be resolved urgently. To this end, we propose a novel edge-guided network (EGNet) that makes ample use of the edge prior in an end-to-end manner. First, an edge extraction module (EEM) extracts the building edge map. Then, an edge-guidance module (EGM) uses the edge map to guide each encoder block in extracting edge-related features. Furthermore, a multi-scale context aggregation module (MCAM) enhances the feature representation by aggregating contextual semantics over different receptive fields. EGNet effectively mines edge semantics and guides the representation learning of boundaries, achieving 75.21% and 91.16% IoU on the Massachusetts and WHU datasets, respectively. Experimental results demonstrate that EGNet compares favorably with current state-of-the-art (SOTA) methods in both accuracy and efficiency.
Morphological examination of bone marrow cells is crucial for diagnosing blood diseases, but manual classification of these cells is time-consuming and subjective, so an automatic classification method is needed. Although deep learning models such as ResNet50 are commonly used for cell classification, they do not exploit characteristics specific to bone marrow cells, such as shape, even though the shapes of the cell and its nucleus play a significant role in distinguishing cell types. In this paper, we propose a Shape and Texture Feature Blending Network (STFB-Net) for bone marrow cell classification based on auxiliary learning. We use ResNeXt50 as the backbone of STFB-Net because of its strong ability to extract texture features from cells. In addition, we propose a Shape Feature Extraction Module (SFEM) to strengthen the backbone's ability to capture shape features. SFEM shares part of its parameters with the backbone; it extracts features at multiple scales, up-samples them, and fuses the multi-scale features to predict cell shape. Experiments on two bone marrow cell datasets show that STFB-Net effectively extracts texture and shape features and outperforms other cell classification methods. Visualizing the features extracted by STFB-Net with Grad-CAM confirms the reliability and effectiveness of STFB-Net in extracting cell shape features.
The detection of small, dim targets in infrared image sequences is a key technology in infrared search and tracking systems. Optimization methods based on low-rank and sparse decomposition are the mainstream approach in this field. Most existing optimization methods operate on a single frame, exploiting only spatial information and ignoring the time dimension. Multi-frame detection methods make full use of temporal and spatial information to achieve higher accuracy; however, existing multi-frame methods are very slow because they rely on slow decomposition schemes. To solve this problem, we propose an infrared sequence tensor model for multi-frame detection. First, we select the tensor average rank to describe the low-rank property of the infrared sequence; it can be computed quickly. Second, we introduce a new approach to extract prior information as a prior weight tensor, which highlights the target and suppresses noise and strong background edges. Third, we formulate an optimization equation based on the tensor average rank and the prior weight, which can be solved efficiently with tensor robust principal component analysis (TRPCA). Experiments show that the proposed method achieves high detection accuracy, is much faster than current multi-frame optimization methods, and is comparable in speed to single-frame optimization methods.
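The low-rank step in TRPCA-style models can be made concrete. Below is a minimal sketch of tensor singular value thresholding, the proximal operator associated with the tensor average rank: soft-threshold the singular values of every frontal slice of the sequence tensor in the Fourier domain along the temporal axis. The function name and parameters are illustrative, not taken from the paper.

```python
import numpy as np

def tsvt(X, tau):
    """Tensor singular value thresholding.
    X: (H, W, T) infrared sequence tensor; tau: threshold.
    Soft-thresholds the singular values of each frontal slice
    in the Fourier domain along the temporal (3rd) axis."""
    Xf = np.fft.fft(X, axis=2)
    Yf = np.empty_like(Xf)
    for k in range(X.shape[2]):
        U, s, Vh = np.linalg.svd(Xf[:, :, k], full_matrices=False)
        s = np.maximum(s - tau, 0.0)        # shrink singular values toward zero
        Yf[:, :, k] = (U * s) @ Vh
    return np.real(np.fft.ifft(Yf, axis=2))
```

Within a TRPCA solver, this low-rank background update alternates with an elementwise (prior-weighted) soft-threshold that recovers the sparse target component.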
Group detection is a crucial component of intelligent video surveillance: it captures crowd motion and can be applied directly to emergency security in complex scenes, so it has attracted plenty of attention in related fields. However, existing works do not fully exploit deep, precise features of the crowd. Recently, with the rapid development of deep learning and the availability of challenging datasets, crowd density estimation has reached the desired accuracy on single images. Since density maps provide high-level semantic information about the crowd, this paper proposes a density map-assisted scene analysis method to detect groups in crowd scenes. The main contributions of this study are threefold: (1) a density map-based super-pixel segmentation method is used to obtain multiple image patches, which become the objects of subsequent analysis; (2) a group detection method based on multi-view clustering is proposed, in which density maps are used to construct similarity graphs from the aspects of interaction, spatial distribution, motion distribution and motion pattern; (3) a post-processing strategy is designed to merge highly related groups into the final groups. Experimental results show that the method accurately detects groups in image sequences and, compared with existing methods, achieves better performance on the CUHK Crowd Dataset.
Owing to the complex battlefield environment, it is difficult to establish a complete knowledge base for practical vehicle recognition algorithms, and infrared vehicle recognition, which plays an important role in remote sensing, remains difficult and challenging. In this paper we propose a new unsupervised feature learning method based on K-features to recognize vehicles in infrared images. First, a saliency-based target detection algorithm is applied to the input image. Then, unsupervised feature learning based on K-features (generated by a K-means clustering algorithm that learns a visual dictionary from a large number of unlabeled samples) is used to suppress false alarms and improve accuracy. Finally, the recognition result is refined by post-processing. Extensive experiments demonstrate that the proposed method achieves satisfactory effectiveness and robustness for vehicle recognition in infrared images with complex backgrounds, and improves the reliability of recognition.
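The K-feature idea, a K-means visual dictionary learned from unlabeled patches, can be sketched as follows. The function names, the farthest-point initialization, and the triangle-style encoding are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def learn_dictionary(patches, k, iters=20, seed=0):
    """K-means on unlabeled image patches; the k centroids act as a
    learned visual vocabulary. Uses farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centers = [patches[rng.integers(len(patches))]]
    for _ in range(1, k):  # pick the patch farthest from existing centers
        d = np.min([np.linalg.norm(patches - c, axis=1) for c in centers], axis=0)
        centers.append(patches[int(np.argmax(d))])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):  # standard Lloyd iterations
        d = np.linalg.norm(patches[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = patches[labels == j].mean(axis=0)
    return centers

def encode(patch, centers):
    """'Triangle' activation: how much closer each centroid is
    than the mean centroid distance (zero if farther)."""
    d = np.linalg.norm(centers - patch, axis=1)
    return np.maximum(d.mean() - d, 0.0)
```

Patches extracted from candidate regions can then be encoded against the dictionary, and the pooled codes used to reject false alarms.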
An efficient automatic small-target detection algorithm for infrared images is proposed. Based on non-linear histogram equalization, a coarse-to-fine segmentation divides the IR image into target candidates. Genuine targets are then selected using a contrast-based confidence measure and an empirical size constraint. Experimental results demonstrate that the presented method is efficient, accurate and robust.
Detection of dim, small infrared targets is an important task in many applications, such as automatic target detection, target search and tracking, and early warning. A dim small-target detection algorithm is presented that combines block-based background reconstruction with a min-cut on a non-balanced graph. First, a background reconstruction based on a new model is presented. Second, the background is suppressed by subtracting the reconstructed image from the original image. Finally, the background-suppressed image is segmented by the min-cut on a non-balanced graph to obtain a binary image containing the target; the optimal segmentation threshold is selected by a heuristic search based on the optimal min-cut. Experimental results show that the proposed method suppresses background noise and clutter effectively and detects small infrared targets accurately.
This letter presents an E-Centrist descriptor for pedestrian recognition in image sequences with a slowly moving background. Using motion information detected from the sequences, the recognition algorithm combines regions of interest (ROIs) that probably contain pedestrians with an enhanced contour-based descriptor. Experimental results demonstrate that the presented method improves both the speed and the accuracy of pedestrian recognition on the test sequences.
In this paper, a new automatic and adaptive aircraft detection algorithm for high-resolution airport synthetic aperture radar (SAR) images is proposed. First, region segmentation is used to detect the apron area, which provides the potential region where aircraft may exist and reduces the search range. Second, pre-segmentation is applied within the apron area to label possible target points. Third, the constant false alarm rate (CFAR) detector is improved to cope with multi-target situations: clutter pixels in the sliding detection window are removed automatically based on the pre-segmentation result, so more structural features of the targets are preserved. Finally, to eliminate false targets and to merge targets that are split into several disconnected areas, a new joint algorithm based on area recognition factors and distance clustering is presented. Real airborne SAR images of an airport are used to verify the algorithm; the results indicate that it detects aircraft targets precisely and decreases the false alarm rate.
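The CFAR stage can be illustrated with a basic cell-averaging detector; the paper's improved version additionally censors clutter pixels in the window using the pre-segmentation. This 1-D sketch (names and parameters are illustrative) shows the core adaptive thresholding:

```python
import numpy as np

def ca_cfar_1d(x, guard=2, train=8, pfa=1e-3):
    """Cell-averaging CFAR on a 1-D power profile. For each cell under
    test, the noise level is estimated from `train` cells on each side,
    excluding `guard` cells around the cell under test."""
    n = 2 * train
    alpha = n * (pfa ** (-1.0 / n) - 1.0)   # threshold factor for the desired Pfa
    det = np.zeros_like(x, dtype=bool)
    half = guard + train
    for i in range(half, len(x) - half):
        left = x[i - half:i - guard]          # training cells left of the guard band
        right = x[i + guard + 1:i + half + 1]  # training cells right of the guard band
        noise = np.concatenate([left, right]).mean()
        det[i] = x[i] > alpha * noise
    return det
```

A 2-D version slides a window over the image instead; removing pre-segmented target/clutter pixels from the training cells, as the paper does, keeps the noise estimate from being inflated by neighboring targets.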
In this paper, we propose a hyperspectral image segmentation algorithm that combines classification and segmentation within a conditional random field (CRF) framework. The classification step is implemented using a Gaussian process, which yields class probabilities for each pixel. The classification result is treated as the single-pixel (unary) model of the CRF, through which classification and segmentation are combined and spatial and spectral constraints on pixel labels are exploited. Experimental results on real hyperspectral imagery show that segmentation precision is much improved.
Feathering is the most widely used method for seamless satellite image mosaicking. A simple but effective algorithm, double-region growing (DRG), which exploits the shape of the images' valid regions, is proposed to generate a robust feathering line before feathering. It works without any human intervention, and experiments on real satellite images show the advantages of the proposed method.
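Once a feathering line is fixed, the blend itself is a distance-weighted average across the overlap. A minimal 1-D (horizontal) sketch, with illustrative names and a straight vertical seam instead of the DRG-generated line:

```python
import numpy as np

def feather_rows(left, right, overlap):
    """Blend two horizontally adjacent strips with a linear feather
    across an `overlap`-pixel-wide band: the left image's weight falls
    from 1 to 0 across the overlap, the right image's rises from 0 to 1."""
    w = np.linspace(1.0, 0.0, overlap)                       # weight of the left image
    blend = w * left[:, -overlap:] + (1 - w) * right[:, :overlap]
    return np.hstack([left[:, :-overlap], blend, right[:, overlap:]])
```

In the full method the weights would follow distance to the feathering line rather than a fixed linear ramp, but the blending rule is the same.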
We present a new scheme based on multiple-cue integration for visual tracking within a Gaussian particle filter framework. The proposed method integrates the color, shape, and texture cues of an object to construct a hybrid likelihood model. During the measurement step, the likelihood model is switched adaptively according to environmental changes, which improves the object representation under complex disturbances such as appearance changes, partial occlusions, and significant clutter. Moreover, the confidence weights of the cues are adjusted online through particle filter estimation, which ensures tracking accuracy and reliability. Experiments on several real video sequences demonstrate that the proposed method can effectively track objects in complex scenarios. In quantitative and qualitative comparisons with previous similar approaches, the proposed method performs better in terms of tracking robustness and precision.
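The adaptive cue weighting can be sketched as a confidence-weighted product of cue likelihoods, with the confidences pulled toward each cue's measured reliability over time. All names and the specific update rule here are illustrative assumptions:

```python
import numpy as np

def fused_likelihood(cue_scores, weights):
    """Hybrid likelihood as a confidence-weighted (geometric-mean style)
    product of per-cue likelihoods, e.g. color / shape / texture."""
    w = np.asarray(weights, float)
    w = w / w.sum()                     # normalize confidences
    return float(np.prod(np.asarray(cue_scores, float) ** w))

def update_weights(weights, reliabilities, lr=0.3):
    """Move cue confidences toward their recently measured reliabilities."""
    w = (1 - lr) * np.asarray(weights, float) + lr * np.asarray(reliabilities, float)
    return w / w.sum()
```

A cue that becomes unreliable (say, color under an illumination change) then contributes less to each particle's likelihood at the next measurement step.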
A new method for super-resolution reconstruction based on a Gaussian kernel is presented. Each pixel is modeled as a Gaussian distribution, and the reconstruction is iterated with an adaptively updated image-weighting parameter. The parallelism of this real-valued, grid-model-based algorithm enables better integration of the information in low-resolution images of the same scene. Experiments show that, compared with bi-cubic interpolation, the proposed algorithm achieves a gain of over 1.0 dB. Visually, the presented algorithm recovers spatial frequencies above the band limit and reduces ringing artifacts relative to bi-cubic interpolation, and it achieves better objective and subjective quality by preserving the sharpness of edges.
In this paper, a machine learning method for classifying four types of moving objects (vehicle, human, motorcycle and bicycle) in surveillance video is presented. The method comprises three steps: feature selection, training of support vector machine (SVM) classifiers, and performance evaluation. First, a feature vector representing the discriminability of an object is constructed. From the object profile, the width-to-height ratio and the trisection width-to-height ratios are adopted as distinctive features. The object mask is also approximated by its external (bounding) rectangle, yielding a rectangle-degree feature: the ratio of the object area to the area of the external rectangle. To achieve invariance to scale, rotation and so on, Hu moment invariants, Fourier descriptors and dispersedness are extracted as further features. Second, a multi-class classifier is designed from two-class SVMs, based on the idea that multi-class classification can be converted into a combination of two-class classifications; for our four classes, the final decision is the vote of six two-class classifiers. Third, the precise feature selection is determined by experiment, with different features chosen for each two-class classifier according to the classification results. The true positive rate, false positive rate and a discriminative index are used to evaluate the classifier. Experimental results show that the classifier achieves good classification precision on real and test data.
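The one-vs-one voting scheme behind the six two-class classifiers can be sketched independently of the SVM training itself. Here `pairwise` is assumed to map each class pair to a trained decision function; a positive score is read as a vote for the first class of the pair (names are illustrative):

```python
from itertools import combinations

def ovo_vote(pairwise, classes, x):
    """One-vs-one decision: each of the C(C, 2) two-class classifiers
    votes for one class of its pair; the class with most votes wins.
    pairwise[(i, j)] is a decision function; f(x) > 0 votes for class i."""
    votes = {c: 0 for c in classes}
    for i, j in combinations(classes, 2):
        votes[i if pairwise[(i, j)](x) > 0 else j] += 1
    return max(votes, key=votes.get)
```

With four classes this yields exactly six two-class votes, matching the classifier structure described above; ties could be broken by the margins of the pairwise decisions.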
A new supervised classifier based on image fusion of hyperspectral data is proposed. The technique first selects suitable bands as candidates for fusion; these bands are then fused into several components using the curvelet transform. The fused hyperspectral components serve as extracted features and are fed into a supervised classifier based on a Gaussian mixture model (GMM). After the GMM is estimated with expectation maximization (EM), pixels are classified by the Bayesian decision rule. One requirement of the technique is that training samples be provided from the hyperspectral data under analysis. The merits of the new method are twofold: novel feature extraction based on the curvelet transform, which makes full use of the spectral properties of the hyperspectral data, and low computational complexity obtained by significantly reducing the data dimension. Experimental results on real hyperspectral data demonstrate that the proposed technique is practically useful and possesses encouraging advantages.
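The final classification stage, class-conditional densities plus the Bayesian decision rule, can be sketched as follows. For brevity each class density is a single Gaussian fitted in closed form (the paper fits a full GMM with EM); all names are illustrative:

```python
import numpy as np

def fit_class_gaussians(X, y):
    """Per-class density estimate: mean, covariance and prior for each
    class (a 1-component stand-in for the EM-fitted GMM)."""
    models = {}
    for c in np.unique(y):
        Xc = X[y == c]
        cov = np.cov(Xc.T) + 1e-6 * np.eye(X.shape[1])   # ridge for stability
        models[c] = (Xc.mean(axis=0), cov, len(Xc) / len(X))
    return models

def bayes_classify(models, x):
    """Bayesian decision rule: argmax_c log p(x | c) + log P(c)."""
    def logpdf(x, mu, cov):
        d = x - mu
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (d @ np.linalg.solve(cov, d) + logdet
                       + len(x) * np.log(2 * np.pi))
    return max(models, key=lambda c: logpdf(x, *models[c][:2])
                                     + np.log(models[c][2]))
```

Replacing each class density with a multi-component mixture fitted by EM recovers the classifier described in the abstract; the decision rule is unchanged.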
KEYWORDS: Image segmentation, RGB color model, Video surveillance, Video, Image processing, Data modeling, Video processing, Machine vision, Computer vision technology, Visual process modeling
Robust and efficient foreground segmentation is a crucial topic in many computer vision applications. In this paper, we propose an improved foreground segmentation method based on the Gaussian mixture model (GMM) for video surveillance. The number of mixture components is estimated from the frequency of pixel-value changes, and the performance of the GMM is enhanced through modified background learning and updating, a new rule for generating Gaussian components, and shadow detection. To improve efficiency, illumination assessment decides whether shadows are present in a given image; shadow suppression is then applied based on morphological reconstruction. Detection of sudden illumination changes and background updating are also presented. Results on different real-world scenarios show the robustness and efficiency of the approach.
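The per-pixel mixture at the core of such methods can be sketched for a single grayscale pixel. The adaptive component count, shadow suppression and illumination logic described above are omitted, and all names and constants are illustrative:

```python
import numpy as np

class PixelGMM:
    """Minimal per-pixel background mixture: k Gaussians over intensity,
    updated online; a match to a dominant component means background."""
    def __init__(self, k=3, alpha=0.05, thresh=2.5):
        self.k, self.alpha, self.t = k, alpha, thresh
        self.mu = np.zeros(k)
        self.var = np.full(k, 15.0)
        self.w = np.full(k, 1.0 / k)

    def update(self, v):
        """Return True if intensity v is foreground; adapt the mixture."""
        d = np.abs(v - self.mu) / np.sqrt(self.var)
        m = int(np.argmin(d))
        if d[m] < self.t:                        # matched an existing component
            self.w = (1 - self.alpha) * self.w
            self.w[m] += self.alpha
            diff = v - self.mu[m]
            self.mu[m] += self.alpha * diff
            self.var[m] += self.alpha * (diff * diff - self.var[m])
            fg = self.w[m] < max(self.w)         # low-weight match -> foreground
        else:                                    # no match: replace weakest component
            j = int(np.argmin(self.w))
            self.mu[j], self.var[j], self.w[j] = v, 15.0, 0.05
            fg = True
        self.w /= self.w.sum()
        return fg
```

A stable pixel value quickly accumulates weight in one component and is labeled background; a sudden new value matches nothing and is flagged foreground, exactly the behavior the segmentation stage relies on.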