Group convolution can significantly reduce computational cost by dividing the feature-map channels into groups and applying the convolution within each group. However, evenly dividing the channels isolates the groups from one another, i.e., there is no interaction between groups. To address this channel-isolation problem, we propose flow group convolution (FGConv), which uses different but overlapping sets of input channels to compute the output channels, enhancing the interaction between groups. FGConv can easily be applied to existing networks to reduce their computational cost. We replace the original convolutions with FGConv in ResNets and validate them on the CIFAR-10 and CIFAR-100 benchmarks. Experimental results demonstrate that FGConv performs better than existing group convolution techniques.
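The overlapped-grouping idea can be sketched as follows (the sliding-window overlap scheme and the function name `overlapped_groups` are illustrative assumptions, not the paper's exact design): each output group reads a window of input channels, and choosing a window stride smaller than the group size makes adjacent groups share channels.

```python
def overlapped_groups(in_channels, group_size, stride):
    """Return, for each group, the list of input-channel indices it reads.

    With stride < group_size, consecutive groups overlap, so information
    can flow between groups (the motivation behind FGConv); with
    stride == group_size this degenerates to standard group convolution.
    Channel indices wrap around so every group has the same size.
    (Illustrative sketch, not the paper's exact grouping rule.)
    """
    groups = []
    start = 0
    while start < in_channels:
        groups.append([(start + k) % in_channels for k in range(group_size)])
        start += stride
    return groups

# Standard group convolution: two disjoint groups of 4 channels.
print(overlapped_groups(8, 4, 4))  # → [[0, 1, 2, 3], [4, 5, 6, 7]]
# Overlapped grouping: each group shares 2 channels with its neighbour.
print(overlapped_groups(8, 4, 2))
```

With `stride == group_size` the function reproduces the disjoint groups of ordinary group convolution, which makes the contrast with the overlapped case explicit.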
The L1 loss function and Intersection over Union (IoU) are commonly used in object detection. However, minimizing the loss function during training does not necessarily maximize IoU. The L1 loss simply assigns equal weights to the differences in width, height, and center point between a predicted box and a ground-truth box, paying little attention to the contribution of each shape property. Motivated by this observation, we propose a scaling loss, easily embedded in convolutional neural networks, that mitigates the gap between IoU and the loss function. The key insight is to add to the loss function adaptive weights for width, height, and center point that encode the shape properties of the bounding box. The contribution of each shape property is adaptively adjusted according to the difference between the predicted box and the ground-truth box, i.e., increasing the weight assigned to a badly regressed shape property. In this way, the scaling loss yields more accurate predicted boxes. The proposed scaling loss was embedded in Faster R-CNN and SSD and validated on PASCAL VOC 2007. Experimental results verify that the proposed scaling loss improves detection accuracy over the smooth L1 loss and Softer-NMS.
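One plausible reading of the adaptive weighting can be sketched as follows (the weight formula below is our assumption for illustration; the paper defines its own scheme): each coordinate's smooth-L1 term is scaled by that coordinate's share of the total error, so the worst-regressed shape property contributes most.

```python
def smooth_l1(x):
    """Standard smooth L1 (Huber) on a scalar difference."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5

def scaling_loss(pred, gt):
    """Sketch of a shape-aware box regression loss.

    pred, gt: boxes as (cx, cy, w, h).  Each coordinate's smooth-L1 term
    is weighted by that coordinate's share of the total absolute error,
    so a badly regressed shape property is emphasised.  This particular
    weighting is an illustrative assumption, not the paper's formula.
    """
    diffs = [p - g for p, g in zip(pred, gt)]
    total = sum(abs(d) for d in diffs) or 1.0  # avoid division by zero
    weights = [abs(d) / total for d in diffs]
    return sum(w * smooth_l1(d) for w, d in zip(weights, diffs))

# A box that is perfect except for width: all the weight lands on width.
print(scaling_loss((0, 0, 2, 1), (0, 0, 1, 1)))  # → 0.5
```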
This paper presents an approach to estimating the point spread function (PSF) from low-resolution (LR) images. Existing techniques usually rely on accurate detection of the end points of the profile normal to edges. In practice, however, it is often a great challenge to accurately localize edge profiles in an LR image, which leads to a poor estimate of the PSF of the lens that captured the image. To estimate the PSF precisely, this paper proposes first estimating a 1-D PSF kernel from straight lines, and then robustly recovering the 2-D PSF from the 1-D kernel using least-squares techniques and random sample consensus (RANSAC). The Canny operator is applied to the LR image to obtain edges, and the Hough transform is then used to extract straight lines of all orientations. Estimating the 1-D PSF kernel from straight lines effectively alleviates the influence of inaccurate edge detection on PSF estimation. The proposed method is evaluated on both natural and synthetic images. Experimental results show that it outperforms the state of the art and does not rely on accurate edge detection.
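The "least squares plus RANSAC" recovery step can be sketched under the simplifying assumption of an isotropic Gaussian PSF (the Gaussian model, tolerances, and function names here are our illustration, not the paper's exact formulation): random minimal samples of 1-D kernel points each hypothesise a sigma, the hypothesis with the most inliers wins, and those inliers are refined by least squares in the log domain.

```python
import math, random

def fit_sigma_ransac(samples, trials=200, tol=0.05, seed=0):
    """Estimate sigma of an unnormalised Gaussian y = exp(-x^2 / (2 s^2))
    from noisy (x, y) samples, robust to outliers.

    Each RANSAC trial hypothesises sigma from a single sample point and
    counts inliers; the best hypothesis is then refined by least squares
    on its inliers using ln y = a * x^2 with a = -1 / (2 s^2).
    The isotropic-Gaussian PSF is a simplifying assumption for
    illustration only.
    """
    rng = random.Random(seed)
    usable = [(x, y) for x, y in samples if y > 0 and x != 0]
    best_inliers = []
    for _ in range(trials):
        x, y = rng.choice(usable)
        if y >= 1.0:
            continue  # cannot hypothesise sigma from this point
        s = math.sqrt(-x * x / (2.0 * math.log(y)))
        inliers = [(xi, yi) for xi, yi in usable
                   if abs(math.exp(-xi * xi / (2 * s * s)) - yi) < tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Least-squares slope fit in the log domain over the inliers.
    num = sum((x * x) * math.log(y) for x, y in best_inliers)
    den = sum((x * x) ** 2 for x, y in best_inliers)
    return math.sqrt(-1.0 / (2.0 * num / den))

# Synthetic 1-D kernel from sigma = 2 plus two gross outliers.
pts = [(x, math.exp(-x * x / 8.0)) for x in range(-5, 6) if x != 0]
pts += [(1, 0.1), (2, 0.9)]
print(round(fit_sigma_ransac(pts), 3))  # close to 2.0
```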
This paper proposes a method for removing mismatched lines in multispectral images. The inaccurate detection of end points poses a great challenge for line matching, since corresponding lines may not be extracted in their entirety; as a result, lines are often mismatched by descriptor-based matching. To eliminate the mismatched lines, we employ a modified RANSAC (random sample consensus) consisting of two steps: (1) randomly pick three line matches and determine their intersections, which are used to compute a transformation; (2) select the best transformation by ranking the matching scores of the line matches, and declare the inliers to be the correct matches. Experimental results show that the proposed method effectively removes incorrect matches in multispectral images.
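Step (1) can be sketched as follows (the `(a, b, c)` line representation and the Cramer's-rule solver are our illustration): three line matches give three pairwise intersections, i.e. three point correspondences, which determine an affine transformation.

```python
def intersect(l1, l2):
    """Intersection of two lines given as (a, b, c) with a*x + b*y = c."""
    (a1, b1, c1), (a2, b2, c2) = l1, l2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None  # parallel lines
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def affine_from_line_triplet(test_lines, ref_lines):
    """Affine transform ((a, b, tx), (c, d, ty)) mapping test points to
    reference points, determined by three line matches via their pairwise
    intersections.  Returns None for degenerate (parallel or collinear)
    configurations.  Illustrative sketch of step (1) of the modified RANSAC.
    """
    pairs = [(0, 1), (1, 2), (0, 2)]
    src = [intersect(test_lines[i], test_lines[j]) for i, j in pairs]
    dst = [intersect(ref_lines[i], ref_lines[j]) for i, j in pairs]
    if None in src or None in dst:
        return None
    A = [[x, y, 1.0] for x, y in src]
    dA = det3(A)
    if abs(dA) < 1e-12:
        return None  # intersections are collinear
    rows = []
    for k in (0, 1):  # k = 0 solves for x', k = 1 for y'
        rhs = [p[k] for p in dst]
        coeffs = []
        for col in range(3):  # Cramer's rule, one column at a time
            M = [row[:] for row in A]
            for r in range(3):
                M[r][col] = rhs[r]
            coeffs.append(det3(M) / dA)
        rows.append(tuple(coeffs))
    return tuple(rows)

# Same lines in both images → identity transform.
lines = [(1, 0, 0), (0, 1, 0), (1, 1, 1)]
print(affine_from_line_triplet(lines, lines))
```

In the full method, step (2) would score each such transformation against all line matches and keep the inliers of the best one.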
This paper presents a cascade of classifiers with a "resurrection" mechanism for building reliable keypoint matches. In many existing image registration solutions, overly strict rules are likely to remove correct keypoint mappings. To avoid this and obtain accurate results, we propose a multi-step cascade framework that removes incorrect keypoint mappings. To further reduce the rate of misjudging correct mappings at each step, we introduce "resurrection" into the cascade structure. Keypoint mappings are initially built from their associated descriptors; then, at each step, some mappings are judged incorrect and deleted outright, while mappings that perform relatively poorly remain undetermined, their fate decided in the next step according to their performance. In this way, the multi-step cascade is used efficiently and misjudgment of correct mappings is reduced. Experimental results show that the presented cascade structure robustly removes outlier keypoint mappings and achieves accurate image registration.
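The reject / accept / undetermined logic can be sketched with per-stage score thresholds (the two-threshold rule and the data layout are illustrative assumptions, not the paper's classifiers):

```python
def cascade_filter(mappings, stages):
    """Cascade with a 'resurrection' mechanism (illustrative sketch).

    mappings: dict name -> list of per-stage scores.
    stages:   list of (reject_below, accept_above) threshold pairs.
    At each stage, a mapping scoring below reject_below is deleted for
    good, one at or above accept_above is accepted, and anything in
    between stays undetermined — 'resurrected' into the next stage
    instead of being discarded outright.
    """
    survivors = set(mappings)
    accepted = set()
    for stage, (lo, hi) in enumerate(stages):
        undecided = set()
        for name in survivors:
            score = mappings[name][stage]
            if score < lo:
                continue             # judged incorrect: removed permanently
            elif score >= hi:
                accepted.add(name)   # judged correct
            else:
                undecided.add(name)  # fate decided at the next stage
        survivors = undecided
    # Mappings still undecided after the last stage are kept conservatively.
    return accepted | survivors

m = {"good": [0.9, 0.9], "border": [0.5, 0.8], "bad": [0.1, 0.9]}
print(cascade_filter(m, [(0.3, 0.7), (0.3, 0.7)]))
```

Note how the borderline mapping survives: a single-threshold cascade at 0.7 would have deleted it at the first stage, whereas the resurrection mechanism defers the decision until its second-stage score clears the bar.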
This work addresses the high computational complexity of image registration. A hierarchical multiresolution strategy speeds up SIFT processing by starting at a low-resolution octave, where an initial affine transformation model is obtained. In each subsequent octave, the affine model from the coarser octave is propagated to the current octave and combined with the geometric distribution of matched keypoints to remove further incorrect mappings and update the affine model. The strategy ends with the best affine transformation model at the bottom octave (the full-size image). Experimental results show that the proposed method achieves comparable accuracy with less computation than the original SIFT.
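Propagating the affine model from a coarse octave to a finer one can be sketched as follows (a minimal sketch, assuming octaves related by a factor of 2 and a `((a, b, tx), (c, d, ty))` model layout): the linear part of the transform is scale invariant, so only the translation is rescaled.

```python
def propagate_affine(model, octaves_down=1):
    """Propagate an affine model ((a, b, tx), (c, d, ty)) estimated at a
    coarse octave to an image 'octaves_down' octaves finer.

    Coordinates double per octave, so x' = M x implies (s x') = M (s x)
    for the linear part, while the translation must be multiplied by s.
    (Illustrative sketch of the coarse-to-fine update, not the paper's code.)
    """
    s = 2 ** octaves_down
    (a, b, tx), (c, d, ty) = model
    return ((a, b, tx * s), (c, d, ty * s))

# A model found two octaves up: rotation part unchanged, shift scaled by 4.
print(propagate_affine(((1, 0, 3), (0, 1, -2)), 2))  # → ((1, 0, 12), (0, 1, -8))
```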
This paper proposes a multimodal image registration algorithm that searches for the best-matched keypoints using global information. Keypoints are detected in both the reference and test images. For each test keypoint, a certain number of reference keypoints are chosen as mapping candidates. A triplet of keypoint mappings determines an affine transformation, which is then evaluated with a similarity metric between the reference image and the test image transformed by that transformation. An iterative process is conducted over triplets of keypoint mappings, and the best-matched reference keypoint for each test keypoint is updated and stored. The similarity metric is defined as the number of overlapping edge pixels over the entire images, allowing global information to be incorporated when evaluating triplets of mappings. Experimental results show that the proposed algorithm provides more accurate registration than existing methods on EO-IR images.
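The edge-overlap similarity metric can be sketched directly (representing edge maps as sets of integer pixel coordinates and rounding warped coordinates to the nearest pixel are our simplifications):

```python
def edge_overlap(ref_edges, test_edges, transform):
    """Number of reference edge pixels coinciding with a transformed test
    edge pixel — the global similarity metric used to score a candidate
    affine transform.

    ref_edges, test_edges: sets of (x, y) integer pixel coordinates.
    transform: affine ((a, b, tx), (c, d, ty)) mapping test to reference.
    (Pixel-set representation and rounding are illustrative simplifications.)
    """
    (a, b, tx), (c, d, ty) = transform
    warped = {(round(a * x + b * y + tx), round(c * x + d * y + ty))
              for x, y in test_edges}
    return len(ref_edges & warped)

# Edges related by a one-pixel horizontal shift: the shift transform
# scores a full overlap, the identity scores none.
ref = {(0, 0), (1, 1), (2, 2)}
test = {(-1, 0), (0, 1), (1, 2)}
print(edge_overlap(ref, test, ((1, 0, 1), (0, 1, 0))))  # → 3
print(edge_overlap(ref, test, ((1, 0, 0), (0, 1, 0))))  # → 0
```

In the full algorithm, each triplet-induced transform would be scored this way and, per test keypoint, the reference keypoint from the best-scoring triplet would be retained.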
This paper proposes an affine image registration technique formulated in a hypothesis-test framework. The technique is based on a mapping algorithm that matches curves and junctions (edge corners) to achieve registration. A similarity metric based on partitioned (short) curves is used to narrow the space of possible junction mappings (hypotheses), and the transformation matrices of the junction and curve mappings are analyzed (testing) to find the best affine registration of the two images. Experimental results show that the proposed algorithm effectively aligns images.
This paper addresses a fundamental problem in computer vision: curve matching. Curve matching and comparison play a key role in various applications; high-level vision problems usually require comparing curves, and the quality of their solutions depends heavily on the underlying curve matching techniques. Our goal is to define a distance on the space of plane (space) curves. The space of curves is treated as a manifold (topological space), and we consider Riemannian metrics on this manifold. The distance induced by a Riemannian metric is a metric which, if not trivial, can serve as a similarity measure. This work also addresses partial curve matching, given that the starting points of the curves are known. Dynamic programming is used to implement partial matching, giving an efficient computational method. Experiments are conducted to verify that the distance is invariant to translation and scaling.
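The dynamic-programming step for partial matching with known starting points can be sketched as an edit-distance-style recursion over sampled curve points (the squared-distance cost and the open-end handling below are our illustrative choices, not the paper's Riemannian metric):

```python
def partial_match_cost(curve_a, curve_b):
    """DP for partial curve matching with known, aligned starting points.

    curve_a, curve_b: sequences of 2-D points sampled along each curve.
    D[i][j] is the cheapest alignment of the first i points of A with the
    first j points of B; fixing D[0][0] = 0 pins the starting points, and
    taking the minimum over the last row/column makes the match partial
    (one curve may end early).  The squared-distance point cost is an
    illustrative stand-in for the paper's metric.
    """
    n, m = len(curve_a), len(curve_b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (xa, ya), (xb, yb) = curve_a[i - 1], curve_b[j - 1]
            cost = (xa - xb) ** 2 + (ya - yb) ** 2
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Partial match: best alignment that consumes all of one curve.
    return min(min(D[n][j] for j in range(1, m + 1)),
               min(D[i][m] for i in range(1, n + 1)))

a = [(0, 0), (1, 0), (2, 0)]
print(partial_match_cost(a, a[:2]))  # → 0.0  (a prefix matches for free)
```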
This paper proposes a line-segment-based image registration method. Edges are detected and partitioned into line segments, and line fitting is applied to every segment to rule out those with high fitting error. For each segment in the reference image, putative matching segments in the test image are picked using constraints obtained by analyzing affine transformations. Putative segment correspondences yield correspondences between segment intersections, which are used as matching points. An affine matrix is derived from these point correspondences and evaluated with a similarity metric. The segment correspondences yielding the highest similarity are used to compute the final transformation. Experimental results show that the proposed method is robust, especially when salient points cannot be detected accurately.
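The line-fitting filter can be sketched with a total-least-squares residual (the RMS-distance criterion and function name are our illustration of the "high fitting error" test):

```python
import math

def line_fit_error(points):
    """RMS perpendicular distance of edge points to their best-fit line,
    used to discard segments that are not well approximated by a line.

    The smallest eigenvalue of the 2x2 scatter matrix of the points equals
    the sum of squared perpendicular distances to the total-least-squares
    line, so no explicit line parameters are needed.
    (Illustrative sketch of the segment-rejection step.)
    """
    n = len(points)
    mx = sum(x for x, y in points) / n
    my = sum(y for x, y in points) / n
    sxx = sum((x - mx) ** 2 for x, y in points)
    syy = sum((y - my) ** 2 for x, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    lam = (tr - math.sqrt(max(tr * tr - 4 * det, 0.0))) / 2.0
    return math.sqrt(lam / n)

# A perfectly collinear segment fits with zero error; a bent one does not.
print(line_fit_error([(0, 0), (1, 1), (2, 2)]))  # → 0.0
print(line_fit_error([(0, 0), (1, 1), (2, 0)]))
```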
This paper proposes a junction detection method that detects junctions as points where edges join or intersect. The edges forming a junction are searched for in a square neighbourhood, and the angles subtended between them are computed from edge orientations. The local edge orientation at a pixel is estimated from the edge points close to that pixel. Based on the subtended angles, the pixel is classified as a junction candidate or not. Each actual junction is then accurately localized by suppressing candidates of non-minimum orientation difference. The proposed method analyzes real cases of extracted edges and estimates the change of orientation of edge segments in digital images. Experimental results show that the proposed algorithm robustly detects junctions in digital images.
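The subtended-angle test can be sketched as follows (the angle threshold and function name are illustrative assumptions; edge orientations are undirected, so angles are compared modulo pi):

```python
import math

def is_junction_candidate(orientations, min_angle_deg=20.0):
    """Decide whether the edge orientations found in a pixel's square
    neighbourhood make it a junction candidate (illustrative sketch).

    orientations: local edge orientations in radians.  The pixel is a
    candidate when at least two edges subtend an angle of at least
    min_angle_deg; nearly parallel (or anti-parallel, i.e. collinear)
    edges do not form a junction.
    """
    min_angle = math.radians(min_angle_deg)
    for i in range(len(orientations)):
        for j in range(i + 1, len(orientations)):
            d = abs(orientations[i] - orientations[j]) % math.pi
            if min(d, math.pi - d) >= min_angle:
                return True
    return False

print(is_junction_candidate([0.0, math.pi / 2]))  # → True  (perpendicular)
print(is_junction_candidate([0.0, math.pi]))      # → False (collinear)
```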
This paper proposes an edge-based multimodal image registration approach. It aims to address image registration as a whole rather than tackling each of its elements independently. One-pixel-wide curves are first extracted from the images, and junctions along the curves are detected. Each curve is then divided into subsegments, which serve as matching primitives. A similarity metric based on the number of matched pairs of subsegments is proposed. Experimental results show that the presented approach is a robust and effective tool for multimodal image registration.