This PDF file contains the front matter associated with SPIE Proceedings Volume 9405, including the Title Page, Copyright information, Table of Contents, Introduction, and Conference Committee listing.
Hyperspectral imaging (HSI) sensors provide abundant spectral information to uniquely identify materials by their reflectance spectra, and this information has been used effectively for object detection and identification applications. Joint transform correlation (JTC) based object detection techniques for HSI have been proposed in the literature, such as the spectral fringe-adjusted joint transform correlation (SFJTC) and its several improvements. However, to our knowledge, the SFJTC-based techniques were designed to detect only similar patterns in a hyperspectral data cube, not dissimilar patterns. Thus, in this paper, a new deterministic object detection approach using SFJTC is proposed to perform multiple dissimilar target detection in hyperspectral imagery. In this technique, input spectral signatures from a given hyperspectral image data cube are correlated with the multiple reference signatures using the class-associative technique. To achieve better correlation output, the concept of SFJTC and the modified Fourier-plane image subtraction technique are incorporated into the multiple-target detection process. The output of this technique provides sharp and high correlation peaks for a match and negligible or no correlation peaks for a mismatch. Test results using a real-life hyperspectral data cube show that the proposed algorithm can successfully detect multiple dissimilar patterns with high discrimination.
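For readers less familiar with fringe-adjusted joint transform correlation, the following is a minimal 1-D sketch of the underlying idea applied to spectral signatures; the zero-padding gap, the filter constant `eps`, and the function name are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def sfjtc_1d(ref, sig, eps=1e-6):
    """Toy fringe-adjusted JTC on two 1-D spectral signatures of equal length."""
    n = len(ref)
    # Joint input: reference and test signatures side by side, separated by a gap.
    joint = np.concatenate([ref, np.zeros(n), sig])
    jps = np.abs(np.fft.fft(joint)) ** 2        # joint power spectrum
    R = np.fft.fft(ref, len(joint))
    h = 1.0 / (np.abs(R) ** 2 + eps)            # fringe-adjusted filter
    return np.abs(np.fft.ifft(h * jps))         # correlation output
```

A matching signature yields a pair of sharp correlation peaks symmetric about the center of the output, while a mismatch yields only negligible peaks, mirroring the behavior described above.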
Marine microfossils provide a useful record of the Earth's resources and prehistory via biostratigraphy. To study hydrocarbon reservoirs and prehistoric climate, geoscientists visually identify the species of microfossils found in core samples. Because microfossil identification is labour intensive, automation has been investigated since the 1980s. With the initial rule-based systems, users still had to examine each specimen under a microscope. While artificial neural network systems showed more promise for reducing expert labour, they also did not displace manual identification, for a variety of reasons that we aim to overcome. In our human-based computation approach, the most difficult step, namely taxon identification, is outsourced via a frontend website to human volunteers. A backend algorithm, called dynamic hierarchical identification, uses unsupervised, supervised, and dynamic learning to accelerate microfossil identification. Unsupervised learning clusters specimens so that volunteers need not identify every specimen during supervised learning. Dynamic learning means that interim computation outputs prioritize subsequent human inputs. Using a dataset of microfossils identified by an expert, we evaluated correct and incorrect genus and species rates versus simulated time, where each specimen identification defines a moment. The proposed algorithm accelerated microfossil identification effectively, especially compared to benchmark results obtained using a k-nearest neighbour method.
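The cluster-then-label step can be sketched as follows; `KMeans`, the `ask_volunteer` callback, and one-label-per-cluster propagation are simplifying assumptions standing in for the paper's dynamic hierarchical identification:

```python
import numpy as np
from sklearn.cluster import KMeans

def prioritized_labeling(features, n_clusters, ask_volunteer):
    """Cluster specimens, have a volunteer label one representative per cluster,
    and propagate that label to the whole cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(features)
    labels = np.empty(len(features), dtype=object)
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        # Representative: the specimen closest to the cluster centroid.
        d = np.linalg.norm(features[members] - km.cluster_centers_[c], axis=1)
        labels[members] = ask_volunteer(members[np.argmin(d)])
    return labels
```

In the actual system, interim outputs would re-rank which specimens are shown to volunteers next, which is what makes the learning dynamic.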
Optical surveillance is an important part of monitoring environmental changes in various ecological settings. Although remote sensing provides extensive data, its resolution is not yet sufficient for scientific research focusing on small-scale landscape variations. We are interested in exploiting high-resolution image data to observe and investigate landscape variations along a small arctic corridor in Barrow, AK, as part of the DOE Next-Generation Ecosystem Experiments (NGEE-Arctic). A 35 m transect was continuously imaged by two separate pole-mounted consumer-grade stationary cameras, one capturing in the NIR and the other in the visible range, from June to August 2014. Surface and subsurface features along this 35 m transect were also sampled by electrical resistivity tomography (ERT), temperature loggers, and water content reflectometers. We track behavioral change along this transect by collecting samples from the pole images and looking for a relation between the image features and electrical conductivity. Results show that the correlation coefficient between inferred vegetation indices and soil electrical resistivity (closely related to water content) increased during the growing season, reaching 0.89 at the peak of the vegetation. To extrapolate such results to a larger scale, we use a high-resolution RGB map of a 500x40 m corridor at this site, occasionally obtained using a low-altitude kite-mounted consumer-grade (RGB) camera. We introduce a segmentation algorithm that operates on the mosaic generated from the kite images to classify the landscape features of the corridor.
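With co-registered NIR and visible frames, the normalized difference vegetation index (NDVI) is one common way to infer such vegetation indices; a minimal sketch under that assumption (the abstract does not spell out its exact index):

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """NDVI from co-registered NIR and visible (red channel) frames;
    values near 1 indicate dense, healthy vegetation."""
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + eps)
```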
This paper reports the results of a feasibility study for the development of a hyperspectral image recovery (reconstruction) technique using an RGB color camera and regression analysis in order to detect and classify colonies of foodborne pathogens. The target bacterial pathogens were the six representative non-O157 Shiga-toxin producing Escherichia coli (STEC) serogroups (O26, O45, O103, O111, O121, and O145) grown on Rainbow agar in Petri dishes. The purpose of the feasibility study was to evaluate whether a DSLR camera (Nikon D700) could be used to predict hyperspectral images in the wavelength range from 400 to 1,000 nm, and even to predict the types of pathogens using a previously developed hyperspectral STEC classification algorithm. Unlike many other studies that use color charts with known and noise-free spectra for training reconstruction models, this work used hyperspectral and color images measured separately by a hyperspectral imaging spectrometer and the DSLR color camera. The color images were calibrated (i.e., normalized) to relative reflectance, subsampled, and spatially registered to match counterpart pixels in the hyperspectral images, which were also calibrated to relative reflectance. Polynomial multivariate least-squares regression (PMLR) was previously developed with simulated color images. In this study, partial least squares regression (PLSR) was also evaluated as a spectral recovery technique to minimize multicollinearity and overfitting. The two spectral recovery models (PMLR and PLSR) and their parameters were evaluated by cross-validation. QR decomposition was used to find a numerically more stable solution of the regression equation. The preliminary results showed that PLSR was more effective than PMLR, especially with higher-order polynomial regressions. The best classification accuracy measured with an independent test set was about 90%. The results suggest the potential of cost-effective color imaging with hyperspectral image classification algorithms for rapidly differentiating pathogens on agar plates.
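As an illustration of the PLSR recovery step, a minimal sketch with stand-in data follows; the band count, component count, and random data are assumptions (in practice X and Y are registered pixel pairs, and the component count is chosen by cross-validation as described above):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.random((500, 3))     # calibrated RGB relative reflectance per pixel
Y = rng.random((500, 61))    # hyperspectral relative reflectance, 400-1000 nm

# Higher-order polynomial terms of R, G, B could be appended to X here.
pls = PLSRegression(n_components=3).fit(X, Y)
Y_hat = pls.predict(X)       # recovered spectrum for each pixel
```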
This contribution addresses the task of searching for faces in large video datasets. Despite vast progress in the field, face recognition remains a challenge for uncontrolled large-scale applications like searching for persons in surveillance footage or internet videos. Current production systems focus on the best-shot approach, where only one representative frame from a given face track is selected, thus sacrificing recognition performance; systems achieving state-of-the-art recognition performance, like the recently published DeepFace, ignore recognition speed, which makes them impractical for large-scale applications. We suggest a set of measures to address the problem. First, considering the feature location allows collecting the extracted features into corresponding sets. Second, the inverted index approach, which became popular in the area of image retrieval, is applied to these feature sets. A face track is thus described by a set of local indexed visual words, which enables a fast search. In this way, all information from a face track is preserved, which allows better recognition performance than best-shot approaches, while the inverted index permits consistently high search speeds. Evaluation on a dataset of several thousand videos shows the validity of the proposed approach.
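The inverted index idea maps each quantized local feature (visual word) to the set of face tracks containing it, so a query only touches tracks that share words with it; a minimal sketch (class and method names are illustrative):

```python
from collections import defaultdict

class InvertedFaceIndex:
    def __init__(self):
        self.index = defaultdict(set)   # visual word -> face-track IDs

    def add_track(self, track_id, words):
        for w in words:                 # words: quantized local face features
            self.index[w].add(track_id)

    def query(self, words):
        votes = defaultdict(int)
        for w in words:
            for t in self.index.get(w, ()):
                votes[t] += 1           # tracks sharing more words rank higher
        return sorted(votes, key=votes.get, reverse=True)
```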
Mobile devices such as smartphones are going to play an important role in professional image processing tasks. However, mobile systems were not designed for such applications, especially in terms of image processing requirements like stability and robustness. One major drawback is the automatic white balance that comes with the devices. It is necessary for many applications, but of no use when applied to shiny surfaces. Such an issue appears when image acquisition takes place under differently coloured illumination caused by different environments, resulting in inhomogeneous appearances of the same subject. In this paper we show a new approach for handling the complex task of generating a low-noise and sharp image without spatial filtering. Our method analyzes the spectral and saturation distribution of the channels. Furthermore, the RGB space is transformed into a more convenient space, a particular HSI space. We generate the greyscale image by a control procedure that takes the colour channels into account. This results in an adaptive colour mixing model with reduced noise. The optimized images are used to show how, e.g., image classification benefits from our colour adaptation approach.
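The paper transforms RGB into "a particular HSI space"; for orientation, one standard RGB-to-HSI variant is sketched below (the authors' space may differ in detail):

```python
import numpy as np

def rgb_to_hsi(rgb):
    """Classic RGB -> HSI conversion; rgb is an (..., 3) array."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i = (r + g + b) / 3.0                               # intensity
    s = 1.0 - rgb.min(axis=-1) / (i + 1e-9)             # saturation
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-9
    h = np.arccos(np.clip(num / den, -1.0, 1.0))        # hue in radians
    h = np.where(b > g, 2.0 * np.pi - h, h)
    return h, s, i
```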
Medical radiography is the use of radiation to "see through" a human body without breaching its integrity (surface). With computed tomography (CT)/cone beam computed tomography (CBCT), three-dimensional (3D) imaging can be produced. Such imaging not only facilitates disease diagnosis but also enables computer-aided surgical planning/navigation. In dentistry, the common method for transferring the virtual surgical plan to the patient (reality) is the use of a surgical stent, either with a preloaded plan (static), such as a channel, or with real-time surgical navigation (dynamic) after registration with fiducial markers (RF). This paper describes using the corner of a cube as a radiopaque fiducial marker on an acrylic (plastic) stent. This RF allows robust calibration and registration of Cartesian (x, y, z) coordinates, linking the patient (reality) with the imaging (virtuality) so that the surgical plan can be transferred in either a static or a dynamic way. The accuracy of computer-aided implant surgery was measured with reference to these coordinates. In our preliminary model surgery, a dental implant was planned virtually and placed with a preloaded surgical guide. The deviation of the placed implant apex from the plan was x = +0.56 mm [more right], y = -0.05 mm [deeper], z = -0.26 mm [more lingual], which was within the clinical 2 mm safety range. For comparison with the virtual plan, the physically placed implant was CT/CBCT scanned, a step which may itself introduce errors. The difference between the actual implant apex and the virtual apex was x = 0.00 mm, y = +0.21 mm [shallower], z = -1.35 mm [more lingual], and this should be borne in mind when interpreting the results.
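From the reported per-axis deviations, the overall 3D deviation of the placed apex follows directly; a quick check with the numbers above:

```python
import numpy as np

dev = np.array([0.56, -0.05, -0.26])          # x, y, z apex deviations in mm
print(round(float(np.linalg.norm(dev)), 2))   # 0.62 mm, well inside 2 mm
```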
Lambertian photometric stereo (PS) is a seminal computer vision method. However, using depth maps in the image formation model, instead of surface normals as in PS, reduces the model parameters by a third, making it preferred from an information-theoretic perspective. The Akaike information criterion (AIC) quantifies this trade-off between goodness of fit and overfitting. Obtaining superior AIC values requires an effective maximum likelihood (ML) depth-map & albedo estimation method. Recently, the authors published an ML estimation method that uses a two-step approach based on PS. While effective, approximations of noise distributions and the decoupling of depth-map & albedo estimation have limited its accuracy. Overcoming these limitations, this paper presents an ML method operating directly on images. The previous two-step ML method provides a robust initial solution, which kick-starts a new nonlinear estimation process. An innovative formulation of the estimation task, including a separable nonlinear least-squares approach, reduces the computational burden of the optimization process. Experiments demonstrate visual improvements under noisy conditions by avoiding overfitting. In addition, a comprehensive analysis shows that the refined depth maps & albedos produce superior AIC metrics and enjoy better predictive accuracy than literature methods. The results indicate that the new method is a promising means for depth-map & albedo estimation with superior information-theoretic performance.
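For reference, for a least-squares fit with i.i.d. Gaussian residuals the AIC reduces to the familiar closed form below (a generic helper, not the paper's code; k is the number of model parameters, which is what the depth-map formulation reduces by a third):

```python
import numpy as np

def aic_gaussian(residuals, k):
    """AIC = 2k - 2 ln L; for Gaussian ML this is 2k + n*ln(RSS/n) up to a constant.
    Lower values indicate a better fit-versus-complexity trade-off."""
    n = residuals.size
    rss = float(np.sum(residuals ** 2))
    return 2 * k + n * np.log(rss / n)
```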
This paper proposes a two-stage algorithm for streaming video segmentation. In the first stage, shot boundaries are detected within a window of frames by comparing the dissimilarity between 2-D segmentations of each frame. In the second stage, the 2-D segments are propagated across the window of frames in both the spatial and temporal directions. The window is moved across the video to find all shot transitions and obtain spatio-temporal segments simultaneously. As opposed to techniques that operate on the entire video, the proposed approach consumes significantly less memory and enables segmentation of lengthy videos. We tested our segmentation-based shot detection method on the TRECVID 2007 video dataset and compared it with a block-based technique. Cut detection results on the TRECVID 2007 dataset indicate that our algorithm achieves results comparable to the best of the block-based methods. The streaming video segmentation routine also achieves promising results on a challenging video segmentation benchmark database.
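A windowed cut detector can be sketched as below, with a simple histogram dissimilarity standing in for the paper's comparison of 2-D segmentations (the threshold and bin count are assumptions):

```python
import numpy as np

def hist_dissimilarity(f1, f2, bins=64):
    """Chi-square-like distance between gray-level histograms of two frames."""
    h1, _ = np.histogram(f1, bins=bins, range=(0, 255), density=True)
    h2, _ = np.histogram(f2, bins=bins, range=(0, 255), density=True)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + 1e-9))

def detect_cuts(frames, thresh=0.25):
    """Flag a shot boundary wherever consecutive-frame dissimilarity spikes."""
    return [i + 1 for i in range(len(frames) - 1)
            if hist_dissimilarity(frames[i], frames[i + 1]) > thresh]
```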
We propose an edge-based depth-from-focus technique for high-precision non-contact industrial inspection and metrology applications. In our system, an objective lens with a large numerical aperture is chosen to resolve the edge details of the measured object. By motorizing this imaging system, we capture the high-resolution edges within every narrow depth of field. We can therefore extend the measurement range while keeping a high resolution. Yet, on surfaces with a large depth variation, a significant amount of data around each measured point is out of focus within the captured images. It is then difficult to extract valuable information from these out-of-focus data due to the depth-variant blur. Moreover, these data impede the extraction of continuous contours for the measured objects in high-level machine vision applications. The proposed approach, however, makes use of the out-of-focus data to synthesize a depth-invariant smoothed image, and then robustly locates the positions of high-contrast edges based on non-maximum suppression and hysteresis thresholding. Furthermore, by focus analysis of both the in-focus and the out-of-focus data, we reconstruct high-precision 3D edges for metrology applications.
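The classic depth-from-focus core, on which the edge-based method builds, picks for each pixel the focal slice with the strongest local contrast; a minimal sketch (the Laplacian focus measure is one common choice, not necessarily the authors'):

```python
import numpy as np
from scipy.ndimage import laplace

def depth_from_focus(stack, z_positions):
    """stack: list of images captured at the depths in z_positions.
    Returns a per-pixel depth map in the stage's units."""
    focus = np.stack([laplace(s.astype(float)) ** 2 for s in stack])
    best = np.argmax(focus, axis=0)        # index of the sharpest slice per pixel
    return np.asarray(z_positions)[best]
```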
In solar wafer manufacturing processes, measuring implant mask wear over time is important to maintain wafer quality and overall yield. Mask wear can be estimated by measuring the width of the lines the mask implants on the substrate. Previous image analysis methods for detecting and measuring these lines have been shown to perform well on polished wafers. Although it is easier to capture images of textured wafers, the contrast between foreground and background is extremely low. In this paper, an improved technique to detect and measure implant line widths on textured solar wafers is proposed. As a pre-processing step, a fast non-local means method is used to denoise the image, taking advantage of the repeated patterns of textured lines in the image. Following image enhancement, the previously proposed line integral method is used to extract the position of each line in the image. A Full-Width One-Third Maximum approximation is then used to estimate the line widths in pixel units. These widths are converted into real-world metric units using a photogrammetric approach involving the Sampling Distance. The proposed technique is evaluated using real images of textured wafers and compared with the state-of-the-art using identical synthetic images to which varying amounts of noise were added. Precision, recall, and F-measure values are calculated to benchmark the proposed technique. The proposed method is found to be more robust to noise, with the critical SNR value reduced by 10 dB in comparison with the existing method.
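The Full-Width One-Third Maximum step admits a compact sketch: given a line's intensity profile, the width is the extent of the region above one third of the peak height (sub-pixel interpolation, which a production system would add, is omitted):

```python
import numpy as np

def fw_third_max(profile):
    """Line width in pixels at one third of the peak height."""
    p = np.asarray(profile, dtype=float)
    p -= p.min()
    above = np.where(p >= p.max() / 3.0)[0]
    return int(above[-1] - above[0] + 1) if above.size else 0
```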
In this paper, we present an industrial application of multispectral imaging for the density measurement of colorants in photographic paper. We designed and developed a 9-band LED-illumination-based multispectral imaging system specifically for this application, in collaboration with FUJIFILM Manufacturing Europe B.V., Tilburg, Netherlands. Unlike a densitometer, which is a spot density measurement device, the proposed system enables fast density measurement over a large area of photo paper. Densities of the four colorants (CMYK) at every surface point in an image are calculated from the spectral reflectance image. Fast density measurements facilitate automatic monitoring of density changes (which are proportional to thickness changes), which helps control the manufacturing process for quality and consistent output. Experimental results confirm the effectiveness of the proposed system.
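Densitometric density relates to reflectance as D = -log10(R), so per-band densities follow directly from the calibrated spectral reflectance image (a sketch; how the four CMYK colorant densities are separated from the spectra is specific to the paper):

```python
import numpy as np

def optical_density(reflectance, eps=1e-6):
    """Per-pixel, per-band density from a spectral reflectance image in [0, 1]."""
    return -np.log10(np.clip(reflectance, eps, 1.0))
```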
This paper proposes a self-calibration method for a monocular vision system based on planar points. Using the proposed method, we can easily obtain initial values for the three-dimensional (3D) coordinates of the feature points in the scene, although an unknown nonzero scale factor remains between the initial values and the true 3D coordinates. From different viewpoints, we capture different images and calculate their initial external parameters. Finally, through a global optimization, we obtain all the parameters, including the internal parameters, the distortion parameters, the external parameters of each image, and the 3D coordinates of the feature points. According to the experimental results, in a field of view of about 100 mm × 200 mm, the mean error and the variance of the 3D coordinates of the feature points are less than 10 μm.
This paper presents a comparative study of outlier detection (OD) for large-scale traffic data. Traffic data nowadays are massive in scale and are collected every second throughout any modern city. In this research, the traffic flow dynamics are collected from one of the busiest four-armed junctions in Hong Kong over a 31-day sampling period (764,027 vehicles in total). The traffic flow dynamics are expressed in a high-dimensional spatio-temporal (ST) signal format (i.e., 80 cycles), which exhibits a high degree of similarity within a signal and across different signals in one direction. A total of 19 traffic directions are identified at this junction, and 874 ST signals are collected over the 31-day period. To reduce dimensionality, the ST signals first undergo principal component analysis (PCA) and are represented as (x, y) coordinates. These PCA (x, y) coordinates are then assumed to be Gaussian distributed. Under this assumption, the data points are further evaluated by (a) a correlation study with three variant coefficients, (b) a one-class support vector machine (SVM), and (c) kernel density estimation (KDE). The correlation study could not give any explicit OD result, while the one-class SVM and KDE provide average DSRs of 59.61% and 95.20%, respectively.
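The PCA-plus-KDE route can be sketched as follows; the bandwidth and the low-density quantile threshold are assumptions, and the DSR figures above come from the paper's own evaluation protocol:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KernelDensity

def kde_outliers(st_signals, quantile=0.05):
    """Project ST signals to 2-D with PCA, fit a Gaussian KDE, and flag the
    lowest-density points as outlier candidates."""
    xy = PCA(n_components=2).fit_transform(st_signals)
    logp = KernelDensity(bandwidth=0.5).fit(xy).score_samples(xy)
    return logp < np.quantile(logp, quantile)   # True => outlier candidate
```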
In this paper, we report on the vibration and displacement monitoring of civil engineering structures using a state-of-the-art image assisted total station (IATS) and passive target markings. Using the telescope camera of the total station, it is possible to capture video streams in real time at 10 fps with an angular resolution of approximately 2″/px. Due to the high angular resolution resulting from the 30x optical magnification of the telescope, large distances to the monitored object are possible. The laser distance measurement unit integrated in the total station allows the camera's focus position to be set precisely and relates the angular quantities obtained from image processing to units of length. To accurately measure the vibrations and displacements of civil engineering structures, we use circular target markings rigidly attached to the object. The targets' centers are computed by a least-squares adjustment of an ellipse according to the Gauß-Helmert model, from which the parameters of the ellipse and their standard deviations are derived. In laboratory experiments, we show that movements can be detected with an accuracy better than 0.2 mm for single frames at distances up to 30 m. For static applications, where many video frames can be averaged, accuracies better than 0.05 mm are possible. In a field test on a life-size footbridge, we compare the vibrations measured by the IATS to reference values derived from accelerometer measurements.
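The target-center step can be approximated with an off-the-shelf ellipse fit; the paper's rigorous Gauß-Helmert least-squares adjustment (which also yields standard deviations) is replaced here by OpenCV's fitEllipse purely for illustration:

```python
import cv2

def target_center(gray):
    """Estimate the center of a circular target marking in a grayscale image."""
    _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    c = max(contours, key=cv2.contourArea)   # assume the target dominates the view
    (cx, cy), _, _ = cv2.fitEllipse(c)       # ellipse center in pixel coordinates
    return cx, cy
```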
Building and road detection from aerial imagery has many applications in a wide range of areas, including urban design, real-estate management, and disaster relief. Extracting buildings and roads from aerial imagery has traditionally been performed manually by human experts, making it a costly and time-consuming process. Our goal is to develop a system for automatically detecting buildings and roads directly from aerial imagery. Many attempts at automatic aerial imagery interpretation have been proposed in the remote sensing literature, but much of the early work uses local features to classify each pixel or segment into an object label, so this kind of approach needs some prior knowledge of object appearance or the class-conditional distribution of pixel values. Furthermore, some works also need a segmentation step as pre-processing. We therefore use Convolutional Neural Networks (CNNs) to learn a mapping from raw pixel values in aerial imagery to three object labels (buildings, roads, and others); in other words, we generate three-channel maps from raw aerial imagery input. We take a patch-based semantic segmentation approach: we first divide large aerial images into small patches and then train the CNN with those patches and the corresponding three-channel map patches. Finally, we evaluate our system on a large-scale, publicly available road and building detection dataset.
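A patch-based setup of this kind might look as follows in PyTorch; the architecture, patch size, and stand-in tensors are illustrative assumptions, not the paper's network:

```python
import torch
import torch.nn as nn

class PatchSegNet(nn.Module):
    """Toy per-pixel labeler: 64x64 RGB aerial patch in, 3-class logit map out."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 1),             # building / road / other logits
        )

    def forward(self, x):
        return self.net(x)

model = PatchSegNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 3, 64, 64)           # stand-in aerial patches
y = torch.randint(0, 3, (8, 64, 64))    # stand-in per-pixel labels
opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
opt.step()
```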
In materials science and bio-medical domains, the quantity and quality of microscopy images are rapidly increasing, and there is a great need to automatically detect, delineate, and quantify particles, grains, cells, neurons, and other functional "objects" within these images. These are challenging problems for image processing because of the variability in object appearance that inevitably arises in real-world image acquisition and analysis. One of the most promising (and practical) ways to address these challenges is interactive image segmentation. These algorithms are designed to incorporate input from a human operator to tailor the segmentation method to the image at hand. Interactive image segmentation is now a key tool in a wide range of applications in microscopy and elsewhere. Historically, interactive image segmentation algorithms have tailored segmentation on an image-by-image basis, and information derived from operator input is not transferred between images. Recently, however, there has been increasing interest in using machine learning in segmentation to provide interactive tools that accumulate and learn from operator input over longer periods of time. These new learning algorithms reduce the need for operator input over time and can potentially provide a more dynamic balance between customization and automation for different applications. This paper reviews the state of the art in this area, provides a unified view of these algorithms, and compares the segmentation performance of various design choices.
With the transition towards renewable energies, electricity suppliers are faced with huge challenges. In particular, the increasing integration of solar power systems into the grid is becoming more and more complicated because of their dynamic feed-in capacity. To assist the stabilization of the grid, the feed-in capacity of a solar power system over the next hours, minutes, and even seconds should be known in advance. In this work, we present a consumer-camera-based system for forecasting the feed-in capacity of a solar system over a horizon of 10 seconds. A camera is aimed at the sky, and clouds are segmented, detected, and tracked. A quantitative prediction of the insolation is performed based on the tracked clouds. Image data as well as ground-truth data for the feed-in capacity were synchronously collected at 1 Hz using a small solar panel, a resistor, and a measuring device. Preliminary results demonstrate both the applicability and the limits of the proposed system.
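Cloud segmentation from sky images is often bootstrapped with a red-blue ratio heuristic, since clear sky is much bluer than cloud; the sketch below is an assumption for illustration, as the abstract does not detail the segmentation step:

```python
import numpy as np

def cloud_mask(rgb, ratio_thresh=0.8):
    """True where a sky pixel is likely cloud (red/blue ratio near 1)."""
    r = rgb[..., 0].astype(float)
    b = rgb[..., 2].astype(float)
    return (r / (b + 1e-9)) > ratio_thresh
```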
This paper introduces the concept of three-dimensional (3D) barcodes. A 3D barcode is composed of an array of 3D cells, called modules, each of which can be either filled or empty, corresponding to the two possible values of a bit.
These barcodes have great theoretical promise thanks to their very large information capacity, which grows as
the cube of the linear size of the barcode, and in addition are becoming practically manufacturable thanks to
the ubiquitous use of 3D printers.
In order to make these 3D barcodes practical for consumers, it is important to keep the decoding simple using
commonly available means like smartphones. We therefore limit ourselves to decoding mechanisms based only
on three projections of the barcode, which imply specific constraints on the barcode itself. The three projections
produce the marginal sums of the 3D cube, which are the counts of filled-in modules along each Cartesian axis.
In this paper we present some of the theoretical aspects of the 2D and 3D cases, and describe the resulting
complexity of the 3D case. We then describe a method that reduces these complexities to arrive at a practical application.
The method features an asymmetric coding scheme, where the decoder is much simpler than the encoder. We
close by demonstrating 3D barcodes we created and their usability.
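The three projections are exactly the marginal sums of the module array, which makes the decoder's input trivial to state in code:

```python
import numpy as np

cube = np.random.rand(4, 4, 4) > 0.5   # True = filled module

proj_x = cube.sum(axis=0)   # counts of filled modules along the x axis
proj_y = cube.sum(axis=1)   # along y
proj_z = cube.sum(axis=2)   # along z
# The encoder must emit only cubes that are uniquely recoverable
# from (proj_x, proj_y, proj_z), which is what constrains the code.
```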
Face images from video sequences captured in unconstrained environments usually contain several kinds of variations, e.g., pose, facial expression, illumination, image resolution, and occlusion. Motion blur and compression artifacts also deteriorate recognition performance. Moreover, in various practical systems such as law enforcement, video surveillance, and e-passport identification, only a single still image per person is enrolled as the gallery set. Many existing methods may fail to work due to variations in face appearance and the limited number of available gallery samples. In this paper, we propose a novel approach for still-to-video face recognition in unconstrained environments. By assuming that faces from still images and video frames share the same identity space, a regularized least squares regression method is utilized to tackle the multi-modality problem. Regularization terms based on heuristic assumptions are introduced to avoid overfitting. To deal with the single-image-per-person problem, we exploit face variations learned from training sets to synthesize virtual samples for the gallery samples. We adopt a learning algorithm that combines an affine/convex hull-based approach with regularizations to match image sets. Experimental results on a real-world dataset consisting of unconstrained video sequences demonstrate that our method clearly outperforms state-of-the-art methods.
In the context of face modeling, probably the most well-known approach to representing 3D faces is the 3D Morphable Model (3DMM). When a 3DMM is fitted to a 2D image, the shape as well as the texture and illumination parameters are estimated simultaneously. However, if the real facial texture is needed, texture extraction from the 2D image is necessary. This paper addresses the problems in texture extraction from a single image caused by self-occlusion. Unlike common approaches that leverage the symmetry of the face by mirroring the visible facial part, which is sensitive to inhomogeneous illumination, this work first generates a virtual texture map for the skin area iteratively by averaging the color of neighboring vertices. Although this step creates unrealistic, overly smoothed texture, the illumination stays consistent between the real and virtual textures. In a second pass, the mirrored texture is gradually blended with the real or generated texture according to visibility. This scheme ensures gentle handling of illumination while still yielding realistic texture. Because the blending area covers only non-informative regions, the main facial features retain a unique appearance in the two face halves. Evaluation results reveal realistic rendering in novel poses, robust to challenging illumination conditions and small registration errors.
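The visibility-driven blend at the heart of the second pass can be written in one line; the linear ramp below is a simplification of the paper's gradual weighting scheme:

```python
import numpy as np

def blend_texture(generated, mirrored, visibility):
    """Per-vertex blend: fully visible vertices keep the real/generated texture,
    occluded ones fade toward the mirrored texture."""
    w = np.clip(visibility, 0.0, 1.0)[..., None]   # visibility in [0, 1]
    return w * generated + (1.0 - w) * mirrored
```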
In order to control riots in crowds, it is helpful to get ringleaders under control and pull them out of the crowd once they have become offenders. The capability of observing the crowd and its ringleaders automatically using cameras greatly supports these tasks; it also allows better conservation of evidence in riot control. A ringleader who has become an offender should be tracked across, and recognized by, several cameras, regardless of whether the cameras' fields of view overlap. We propose a context-based approach for the handover of persons between different camera fields of view. This approach can be applied to overlapping as well as non-overlapping fields of view, so that fast and accurate identification of individual persons in camera networks is feasible. Within the scope of this paper, the approach is applied to the handover of persons between single images without any temporal information. It is particularly developed for semi-automatic video editing and the handover of persons between cameras in order to improve conservation of evidence. The approach was developed on a dataset collected during a Crowd and Riot Control (CRC) training of the German armed forces, which comprises three levels of escalation: first, the crowd started with a peaceful demonstration; later, there were violent protests; and finally, the riot escalated and offenders bumped into the chain of guards. One result of the work is a reliable context-based method for person re-identification between single images of different camera fields of view in crowd and riot scenarios. Furthermore, a qualitative assessment shows that the use of contextual information can additionally support this task: it can decrease the time needed for handover and the number of confusions, which supports the conservation of evidence in crowd and riot scenarios.
Computed tomography (CT) is a medical imaging technology that uses computer-processed X-ray projections to acquire tomographic images, or slices, of specific organs of the body. Motion artifacts caused by patient motion are a common problem in CT systems and may introduce undesirable distortions into CT images. This paper analyzes the critical problems in motion artifacts and proposes a new CT system for motion artifact compensation. We employ depth cameras to capture the patient's motion and account for it in the CT image reconstruction. In this way, we achieve a significant improvement in motion artifact compensation that is not possible with previous techniques.
In this paper, we propose a new method that classifies human actions using Procrustes shape theory. First, we extract a pre-shape configuration vector of landmarks from each frame of an image sequence representing an arbitrary human action, and derive the Procrustes fit vector for each pre-shape configuration vector. Second, we extract a set of pre-shape vectors from training samples stored in a database and compute a Procrustes mean shape vector for these pre-shape vectors. Third, we extract a sequence of pre-shape vectors from the input video and project this sequence onto the tangent space with respect to the pole, taken as the sequence of mean shape vectors corresponding to a target video. We then calculate the Procrustes distance between the sequence of pre-shape vectors projected onto the tangent space and the mean shape vectors. Finally, we classify the input video into the human action class with the minimum Procrustes distance. We assess the performance of the proposed method on one public dataset, namely the Weizmann human action dataset. Experimental results reveal that the proposed method performs very well on this dataset.
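The nearest-class decision can be sketched with SciPy's Procrustes analysis; summing per-frame disparities is a simplification of the tangent-space projection described above:

```python
from scipy.spatial import procrustes

def classify_action(query_shapes, class_mean_shapes):
    """query_shapes: list of per-frame landmark matrices; class_mean_shapes:
    dict mapping action label -> list of mean-shape matrices."""
    dists = {}
    for label, means in class_mean_shapes.items():
        dists[label] = sum(procrustes(m, q)[2]       # Procrustes disparity
                           for m, q in zip(means, query_shapes))
    return min(dists, key=dists.get)                 # minimum-distance class
```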
This paper presents a study of the effect of input video source quality on computer vision system robustness, and shows how to use the findings to create a framework that generates a set of recommendations or rules for researchers and developers in the field. The study is particularly important for cloud-based computer vision platforms, where transmission of raw uncompressed video is not possible; a sweet spot is therefore desired where bandwidth usage is optimal while a high recognition rate is maintained. Experimental results showed that creating such rules is possible and that integrating them into an end-to-end cloud-based computer vision service is beneficial.
Capturing a clean video from a source camera is crucial for accurate results in a computer vision system. In particular, blurry images can considerably affect detection, tracking, and pattern matching algorithms. This paper presents a framework for applying quality control by monitoring captured video, detecting whether the camera is out of focus, identifying blurry defective images, and providing a feedback channel to the camera to adjust the focal length. The framework relies on a no-reference objective quality metric for the loopback channel that adjusts the camera focus. The experimental results show how the framework reduces unnecessary computation and thus enables more power-efficient cameras.
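A common no-reference sharpness proxy for such a loopback channel is the variance of the Laplacian; the sketch below is an assumption for illustration, as the abstract does not name the exact metric used:

```python
import cv2

def focus_score(frame_gray):
    """Higher = sharper; a score below a calibrated threshold would trigger
    the feedback channel to adjust the camera focus."""
    return cv2.Laplacian(frame_gray, cv2.CV_64F).var()
```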
The reduction of end-of-life (EOL) concrete disposal in landfills, together with lower exploitation of primary raw materials, generates strong interest in developing, setting up, and applying innovative technologies to maximize the conversion of Construction and Demolition Waste (C&DW) into useful secondary raw materials. Such a goal can be reached starting from an efficient, targeted in-situ characterization of the objects to dismantle, in order to plan demolition actions and set up innovative mechanical-physical processes to recover the different materials and products for recycling. In this paper, an innovative recycling-oriented characterization strategy based on HyperSpectral Imaging (HSI) is described for identifying aggregates and mortar in drill core samples from end-of-life concrete. To reach this goal, concrete drill cores from a demolition site were systematically investigated by HSI in the short-wave infrared range (1000-2500 nm). The results showed that this technology can be successfully applied to analyze the quality and characteristics of C&DW both before dismantling and as a final product to be reused after demolition-milling-classification actions. The proposed technique and the related recognition logic, through spectral signature detection of finite physical domains (i.e., a concrete slice and/or particle) of different nature and composition, allow (i) the development of characterization procedures able to quantitatively assess the compositional/textural characteristics of end-of-life concrete, and (ii) the setup of innovative sorting strategies to qualify the different materials constituting the drill core samples.
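One common recognition logic for such SWIR spectral signatures is a spectral-angle comparison against reference signatures of aggregate and mortar; the sketch below is an illustrative assumption, not necessarily the paper's classifier:

```python
import numpy as np

def spectral_angle(spectrum, reference):
    """Angle (radians) between a measured spectrum and a reference signature;
    smaller angles indicate the same material class."""
    cos = np.dot(spectrum, reference) / (
        np.linalg.norm(spectrum) * np.linalg.norm(reference) + 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0))
```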
Although people and object tracking in uncontrolled environments has been widely addressed in the literature, the accurate localization of a subject with respect to a reference ground plane remains a major issue. This study describes an early prototype for the tracking and localization of pedestrians with a handheld camera. One application envisioned here is to analyze the trajectories of blind people crossing long crosswalks while following different audio signals as a guide. This kind of study is generally conducted manually, with an observer following a subject and logging his/her current position at regular time intervals with respect to a white grid painted on the ground. This study aims at automating the manual logging activity: with a marker attached to the subject's foot, a video of the crossing is recorded by a person following the subject, and a semi-automatic tool analyzes the video and estimates the trajectory of the marker with respect to the painted markings. Challenges include robustness to variations in lighting conditions (shadows, etc.), occlusions, and changes in camera viewpoint. Results are promising when compared to GNSS measurements.
In this work, we present a new model of visual saliency that combines results from existing methods, improving upon their performance and accuracy. By fusing pre-attentive and context-aware methods, we highlight the strengths of state-of-the-art models while compensating for their deficiencies. We put this theory to the test in a series of experiments, comparatively evaluating the visual saliency maps and employing them for content-based image retrieval and thumbnail generation. We find that, on average, our model yields definitive improvements in recall and F-measure with comparable precision. In addition, we find that all image searches using our fused method return more correct images and rank them higher than searches using the original methods alone.
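A simple fusion rule, min-max normalizing each saliency map and averaging, captures the basic idea; equal weighting of the pre-attentive and context-aware maps is an assumption here:

```python
import numpy as np

def fuse_saliency(maps):
    """maps: list of 2-D saliency maps of equal shape; returns the fused map."""
    norm = [(m - m.min()) / (m.max() - m.min() + 1e-9) for m in maps]
    return np.mean(norm, axis=0)
```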
A significant increase in the availability of high-resolution hyperspectral images has led to the need for pertinent image analysis techniques, such as classification. Hyperspectral images that are correlated spatially and spectrally provide ample information across the bands to serve this purpose. Conditional Random Fields (CRFs) are discriminative models that carry several advantages over conventional techniques: no requirement of the independence assumption for observations, flexibility in defining local and pairwise potentials, and independence between the modules of feature selection and parameter learning. In this paper we present a framework for classifying remotely sensed imagery based on CRFs. We apply a Support Vector Machine (SVM) classifier to raw remotely sensed imagery data in order to generate more meaningful feature potentials for the CRF model. This approach produces promising results when tested on the publicly available AVIRIS Indian Pines imagery.
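The SVM-to-CRF handoff amounts to turning class probabilities into unary potentials; a minimal sketch with stand-in data (the band count and class count are placeholders):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.random((200, 220))      # per-pixel spectra (220 bands)
y_train = rng.integers(0, 5, 200)     # stand-in class labels
X_all = rng.random((1000, 220))

svm = SVC(probability=True).fit(X_train, y_train)
# Negative log class probabilities serve as the CRF's unary potentials;
# pairwise potentials over neighboring pixels are defined separately.
unary = -np.log(svm.predict_proba(X_all) + 1e-9)
```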
When designing image processing applications in hardware and software today, whether cameras, FPGA-based embedded processing, or PC applications, selecting the right interfaces in terms of function and bandwidth for the complete system is a major task. This paper presents existing specifications that are well established in the market and can help in building such systems.
The CRISM imaging spectrometer orbiting Mars has been producing a vast amount of data in the visible to infrared wavelengths in the form of hyperspectral data cubes. These data, compared with those obtained from previous remote sensing techniques, yield an unprecedented level of spectral detail in addition to an ever increasing level of spatial information. A major challenge brought about by these data is the burden of processing and interpreting the datasets and extracting the relevant information from them. This research approaches the challenge by exploring machine learning methods, especially unsupervised learning, to achieve cluster density estimation and classification, and ultimately to devise an efficient means of identifying minerals. A set of software tools has been constructed in Python to access and experiment with CRISM hyperspectral cubes selected from two specific Mars locations. A machine learning pipeline is proposed, and unsupervised learning methods were applied to pre-processed datasets. The resulting data clusters are compared with the published ASTER spectral library and browse data products from the Planetary Data System (PDS). The results demonstrate that this approach is capable of processing the huge amount of hyperspectral data and can potentially provide guidance to scientists for more detailed studies.
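The unsupervised-clustering stage can be sketched with k-means over per-pixel spectra; the cluster count and the use of KMeans are assumptions, and the resulting cluster means are what would be compared against ASTER library spectra:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_spectra(cube, n_clusters=8):
    """cube: (rows, cols, bands) hyperspectral array.
    Returns a per-pixel cluster-label map and the cluster mean spectra."""
    h, w, b = cube.shape
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(cube.reshape(-1, b))
    return km.labels_.reshape(h, w), km.cluster_centers_
```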