This PDF file contains the front matter associated with SPIE
Proceedings Volume 7872, including the Title Page, Copyright
information, Table of Contents, and the Conference Committee listing.
In the past two years the processing power of video graphics cards has quadrupled and is approaching supercomputer
levels. State-of-the-art graphics processing units (GPUs) boast theoretical computational performance in the range of
1.5 trillion floating-point operations per second (1.5 teraflops). This processing power is readily accessible to the
scientific community at relatively small cost. High-level programming languages are now available that expose
the internal architecture of the graphics card, allowing greater algorithm optimization. This research takes the memory-access-intensive
portions of an image-based iris identification algorithm and hosts them on a GPU using the C++-compatible
CUDA language. The selected segmentation algorithm uses basic image processing techniques such as image inversion,
value squaring, thresholding, dilation, erosion and memory/computationally intensive calculations such as the circular
Hough transform. Portions of the iris segmentation algorithm were accelerated by a factor of 77 over the 2008 GPU
results. Some parts of the algorithm ran at speeds that were over 1600 times faster than their CPU counterparts. Strengths
and limitations of the GPU Single Instruction Multiple Data architecture are discussed. Memory access times, instruction
execution times, programming details and code samples are presented as part of the research.
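The paper itself presents code samples; purely as a rough illustration of the kind of kernel such a port involves (not the paper's code), the sketch below shows a minimal circular Hough transform voting step for a single candidate radius, with hypothetical array and parameter names.

```cuda
// Minimal sketch (not the paper's code): one thread per pixel votes into a
// circular Hough accumulator for one candidate radius. Edge pixels are assumed
// to be non-zero in `edges`; names and parameters are illustrative.
__global__ void houghCircleVote(const unsigned char* edges, int width, int height,
                                int radius, unsigned int* accumulator)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height || edges[y * width + x] == 0) return;

    // Vote for every candidate center lying on a circle of `radius` around (x, y).
    for (int t = 0; t < 360; ++t) {
        float theta = t * 3.14159265f / 180.0f;
        int cx = x - __float2int_rn(radius * __cosf(theta));
        int cy = y - __float2int_rn(radius * __sinf(theta));
        if (cx >= 0 && cx < width && cy >= 0 && cy < height)
            atomicAdd(&accumulator[cy * width + cx], 1u);
    }
}
```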
In this paper a consistent, efficient, and convenient system for parallel computer vision, and also for
real-time actuator control, is proposed. The system implements the multi-agent paradigm and a
blackboard information store. This, in combination with a generic interface for hardware abstraction
and integration of external software components, is set up on the basis of the Message Passing Interface
(MPI).
The system allows for data- and task-parallel processing and supports both synchronous communication,
where data exchange is triggered by events, and asynchronous communication, where data are polled on
demand. In addition, redundant processing through duplication of processing units (agents) is possible,
to achieve greater robustness. Because the system automatically distributes task units to available
resources, and a monitoring concept allows tasks to be combined and composed into complex
processes, efficient parallel vision and robotics applications can be developed quickly. Multiple vision-based
applications have already been implemented, ranging from academic research to prototypes
for industrial automation. The system has recently been released as open source for the scientific community.
HP's digital printing presses consume a tremendous amount of data. The architectures of the Digital Front Ends (DFEs)
that feed these large, very fast presses have evolved from basic, single-RIP (Raster Image Processor) systems to multi-rack,
distributed systems that can take a PDF file and deliver data in excess of 3 gigapixels per second to keep the
presses printing at 2000+ pages per minute. This paper highlights some of the more interesting parallelism features of
our DFE architectures.
The high-performance architecture developed over the last 5+ years can scale up to HP's largest digital press, out to
multiple mid-range presses, and down into a very low-cost single box deployment for low-end devices as appropriate.
Principles of parallelism pervade every aspect of the architecture, from the lowest-level elements of jobs to parallel
imaging pipelines that feed multiple presses.
From cores to threads to arrays to network teams to distributed machines, we use a systematic approach to move
bottlenecks. The ultimate goals of these efforts are: to take the best advantage of the prevailing hardware options at our
disposal; to reduce power consumption and cooling requirements; and to ultimately reduce the cost of the solution to our customers.
Advances in the image processing field have brought new methods which are able to perform complex tasks robustly.
However, in order to meet constraints on functionality and reliability, imaging application developers often
design complex algorithms with many parameters which must be finely tuned for each particular environment.
The best approach for tuning these algorithms is to use an automatic training method, but the computational
cost of this kind of training is prohibitive, making it infeasible even on powerful machines. The same
problem arises when designing testing procedures. This work presents methods to train and test complex image
processing algorithms in parallel execution environments. The approach proposed in this work is to use existing
resources in offices or laboratories, rather than expensive clusters. These resources are typically non-dedicated,
heterogeneous and unreliable. The proposed methods have been designed to deal with all these issues. Two methods are proposed: intelligent training based on genetic algorithms and PVM, and a full factorial design based on grid computing, which can be used for training or testing. These methods are capable of harnessing the available computational resources, giving more work to more powerful machines while taking their unreliable nature into account. Both methods have been tested using real applications.
We describe here a system consisting of multiple, relatively inexpensive marking engines. The marking engines are
interconnected using highly reconfigurable paper paths. The paths are composed of hypermodules (bidirectional nip
assemblies and sheet director assemblies) each of which has its own computation, sensing, actuation, and
communications capabilities. Auto-identification is used to inform a system level controller of the potential paths
through the system as well as module capabilities. Motion control of cut sheets, which of necessity reside physically
within multiple hypermodules simultaneously, requires a new abstraction, namely a sheet controller which coordinates
control of a given sheet as it moves through the system. Software/hardware co-design has provided a system architecture
that is scalable without requiring user relearning. We describe here the capabilities of an exemplary system consisting
of 160 modular entities and four marking engines. The throughput of the system is very nearly four times that of a single print engine.
Many image recognition tasks are well-suited to parallel processing. The most obvious example is that many imaging
tasks require the analysis of multiple images. From this standpoint, then, parallel processing need be no more
complicated than assigning individual images to individual processors. However, there are three less trivial categories of
parallel processing that will be considered in this paper: parallel processing (1) by task; (2) by image region; and (3) by
meta-algorithm. Parallel processing by task allows the assignment of multiple workflows, as diverse as optical
character recognition [OCR], document classification and barcode reading, to parallel pipelines. This can substantially
decrease time to completion for the document tasks. For this approach, each parallel pipeline is generally performing a
different task. Parallel processing by image region allows a larger imaging task to be sub-divided into a set of parallel
pipelines, each performing the same task but on a different data set. This type of image analysis is readily addressed by a
map-reduce approach. Examples include document skew detection and multiple face detection and tracking. Finally,
parallel processing by meta-algorithm allows different algorithms to be deployed on the same image simultaneously.
This approach may result in improved accuracy.
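As a minimal illustration of the by-region case described above (not code from the paper), the CUDA sketch below assigns one thread block per image tile and computes a per-tile gray-level histogram in shared memory; merging the per-tile results afterwards would be the "reduce" step of a map-reduce style pipeline. Names and tile sizes are hypothetical.

```cuda
// Illustrative sketch only: one block per image tile, each block building a
// 256-bin histogram for its region in shared memory. Per-tile results can be
// merged afterwards in a reduce step.
__global__ void tileHistogram(const unsigned char* image, int width, int height,
                              int tileW, int tileH, unsigned int* tileHists)
{
    __shared__ unsigned int hist[256];
    for (int i = threadIdx.x; i < 256; i += blockDim.x) hist[i] = 0;
    __syncthreads();

    int x0 = blockIdx.x * tileW;
    int y0 = blockIdx.y * tileH;
    int pixels = tileW * tileH;

    // Threads of this block stride over the pixels of its tile.
    for (int p = threadIdx.x; p < pixels; p += blockDim.x) {
        int x = x0 + p % tileW;
        int y = y0 + p / tileW;
        if (x < width && y < height)
            atomicAdd(&hist[image[y * width + x]], 1u);
    }
    __syncthreads();

    // Write this tile's histogram to global memory.
    unsigned int* out = tileHists + (blockIdx.y * gridDim.x + blockIdx.x) * 256;
    for (int i = threadIdx.x; i < 256; i += blockDim.x) out[i] = hist[i];
}
```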
This work presents a framework for fast texture analysis in computer vision. The speedup is obtained using General-
Purpose Processing on Graphics Processing Units (GPGPU technology). For this purpose, we have selected the
following texture analysis techniques: LBP (Local Binary Patterns), LTP (Local Ternary Patterns), Laws texture kernels
and Gabor filters. GPU optimizations are compared to CPU optimizations using MMX-SSE technologies and Multicore
parallel programming. The experimental results show a significant increase in the performance of the proposed
algorithms when GPGPU is used, particularly for large image sizes.
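As a sketch of the per-pixel work that LBP involves (illustrative only, not the paper's implementation), the kernel below computes the basic 8-neighbor LBP code for each interior pixel; names are hypothetical.

```cuda
// Illustrative sketch: basic 3x3 Local Binary Pattern code per interior pixel.
// Each of the 8 neighbors contributes one bit, set when the neighbor is >= the
// center value. Border pixels are skipped for simplicity.
__global__ void lbp8(const unsigned char* src, unsigned char* dst,
                     int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 1 || y < 1 || x >= width - 1 || y >= height - 1) return;

    // Clockwise neighbor offsets starting at the top-left pixel.
    const int dx[8] = { -1, 0, 1, 1, 1, 0, -1, -1 };
    const int dy[8] = { -1, -1, -1, 0, 1, 1, 1, 0 };

    unsigned char center = src[y * width + x];
    unsigned char code = 0;
    for (int i = 0; i < 8; ++i) {
        unsigned char n = src[(y + dy[i]) * width + (x + dx[i])];
        code |= (n >= center) << i;
    }
    dst[y * width + x] = code;
}
```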
Zernike polynomials are a well-known set of functions that find many applications in image or pattern characterization
because they allow the construction of shape descriptors that are invariant to translation, rotation, and scale changes. The
concepts behind them can be extended to higher-dimensional spaces, making them suitable for describing volumetric data as well. They
have been used less than their properties might suggest because of their high computational cost.
We present a parallel implementation of 3D Zernike moments analysis, written in C with CUDA extensions, which
makes it practical to employ Zernike descriptors in interactive applications, yielding a performance of several frames per
second on voxel datasets about 200³ in size.
In our contribution, we describe the challenges of implementing 3D Zernike analysis in a general-purpose GPU. These
include how to deal with numerical inaccuracies, due to the high precision demands of the algorithm, or how to deal with
the high volume of input data so that it does not become a bottleneck for the system.
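The precision concern mentioned above typically arises when summing contributions from hundreds of thousands of voxels. The sketch below (illustrative only, not the paper's code) shows one way such an accumulation could be organized: each block reduces a double-precision partial sum of voxel values weighted by a precomputed, hypothetical basis volume, and the host adds the per-block partials.

```cuda
// Illustrative sketch: accumulate one 3D moment as the sum over voxels of
// f(x,y,z) * basis(x,y,z). The basis volume (e.g. one real or imaginary
// component of a 3D Zernike function) is assumed to be precomputed. Launch
// with blockDim.x a power of two and blockDim.x * sizeof(double) bytes of
// dynamic shared memory; the host sums the per-block partial results.
__global__ void momentPartialSums(const float* voxels, const float* basis,
                                  int n, double* partials)
{
    extern __shared__ double sdata[];
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;

    // Grid-stride loop so an arbitrary voxel count is covered.
    double acc = 0.0;
    for (int i = idx; i < n; i += gridDim.x * blockDim.x)
        acc += (double)voxels[i] * (double)basis[i];
    sdata[tid] = acc;
    __syncthreads();

    // Standard shared-memory tree reduction within the block.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) partials[blockIdx.x] = sdata[0];
}
```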
This paper addresses the problem of airport runway segmentation in satellite images with complex background
clutter. To this end, we propose a novel ensemble-learning-based parallel runway segmentation algorithm. The
contributions of our work can be summarized as follows: (a) we propose the concept of priority directional region
growing; (b) we introduce Bresenham's line generation algorithm into our segmentation task to better exploit
structural priors; (c) we adopt a two-stage strategy to better segment the regions corresponding to the
airport runway, applying the traditional region growing method and our priority directional (two orthogonal
directions in our problem) region growing method sequentially; (d) in our runway segmentation algorithm, an
ensemble-learning strategy is used to combine the growing results of each detected line segment. In addition,
thin side branches with significantly different widths are eliminated. To evaluate the effectiveness
of our algorithm, extensive simulations are carried out on test images obtained from Google Maps. Our
experimental results show that the proposed algorithm effectively and efficiently segments the airport
region, generates relatively clean runway boundaries, and clearly outperforms state-of-the-art
methods.
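Bresenham's algorithm, referenced in contribution (b), traces the integer pixels along a straight segment using only integer arithmetic. The sketch below is a generic version (not the authors' code) written as a host/device helper; in a pipeline like the one described, the visited pixels could seed region growing along a detected runway direction.

```cuda
#include <cstdio>

// Generic Bresenham line traversal (illustrative). Visits every integer pixel
// on the segment (x0,y0)-(x1,y1); the visitor here simply prints coordinates.
__host__ __device__ void bresenham(int x0, int y0, int x1, int y1)
{
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;  // combined error term

    for (;;) {
        printf("(%d, %d)\n", x0, y0);           // visit the current pixel
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }  // step in x
        if (e2 <= dx) { err += dx; y0 += sy; }  // step in y
    }
}

int main() { bresenham(0, 0, 7, 3); return 0; }
```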
This paper discusses the experimental results of our visualization model for data extracted from sensors. The
objective is to find a computationally efficient method to produce a real-time rendering visualization
for a large amount of data. We develop a visualization method to monitor the temperature variation of a data center.
Sensors are placed on three layers and do not cover the entire room. We use a particle paradigm to interpolate
the sensor data: particles model the "space" of the room. In this work we partition the particle set using two
mathematical methods, Delaunay triangulation and Voronoï cells, as presented by Avis and Bhattacharya.
Particles provide information on the room temperature at different coordinates over time. To
locate and update particle data we define a computational cost function. To evaluate this function efficiently,
we use a client-server paradigm: the server computes the data and the client displays them on different kinds of
hardware. This paper is organized as follows. The first part presents related algorithms used to visualize large
flows of data. The second part presents the different platforms and methods that were evaluated in order to
determine the best solution for the proposed task. The benchmark measures the computational cost of our algorithm,
based on locating particles relative to sensors and on updating particle values. The benchmark was
run on a personal computer using CPU, multi-core, GPU, and hybrid GPU/CPU programming.
GPU programming is a growing method in this research field; it allows real-time rendering
instead of precomputed rendering. To improve our results, we also ran our algorithm on a High Performance
Computing (HPC) cluster; this benchmark was used to improve the multi-core method. HPC is commonly used in data
visualization (astronomy, physics, etc.) to improve rendering and achieve real-time performance.
In this paper, a highly effective and efficient ensemble-learning-based parallel impulse noise detection algorithm is
proposed. The contribution of this paper is three-fold. First, we propose a novel intensity homogeneity metric, the
Directional Homogeneity Descriptor (DHD), which has very powerful discriminative ability, as demonstrated
in our previous work. Second, the proposed algorithm is highly parallel in the feature extraction, classifier
training, and testing stages, and the proposed architecture is well suited to distributed processing. Finally,
instead of manually tuning the thresholds for each feature, as most work in this research area does, we
use a Random Forest to make decisions, since it has been shown to have better generalization ability and
classification performance comparable to SVMs or boosting. Another important reason we adopt
Random Forest is that it has a naturally parallel structure and a very significant performance advantage (e.g., the
overhead of training and testing the model is very low) over other popular classifiers such as SVMs or boosting.
To the best of our knowledge, this is the first time ensemble-learning strategies have been used in the area
of switching median filtering. Extensive simulations are carried out on several of the most common standard test
images. The experimental results show that our algorithm achieves zero missed detections while keeping the
false alarm rate at a rather low level, and that it clearly outperforms other state-of-the-art methods.
Tetrahedral interpolation is commonly used to implement continuous color space conversions from sparse 3D and 4D
lookup tables. We investigate the implementation and optimization of tetrahedral interpolation algorithms for GPUs, and
compare to the best known CPU implementations as well as to a well known GPU-based trilinear implementation. We
show that a $500 NVIDIA GTX-580 GPU is 3x faster than a $1000 Intel Core i7 980X CPU for 3D interpolation, and 9x
faster for 4D interpolation.
Performance-relevant GPU attributes are explored including thread scheduling, local memory characteristics, global
memory hierarchy, and cache behaviors. We consider existing tetrahedral interpolation algorithms and tune based on the
structure and branching capabilities of current GPUs. Global memory performance is improved by reordering and
expanding the lookup table to ensure optimal access behaviors. Per multiprocessor local memory is exploited to
implement optimally coalesced global memory accesses, and local memory addressing is optimized to minimize bank
conflicts. We explore the impacts of lookup table density upon computation and memory access costs.
Also presented are CPU-based 3D and 4D interpolators, using SSE vector operations that are faster than any previously
published solution.
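For readers unfamiliar with the technique, the sketch below shows the core arithmetic of single-channel tetrahedral interpolation in a 3D lookup table: the fractional position within a LUT cell selects one of six tetrahedra, and the output is a weighted sum of four cell corners. This is a generic, illustrative kernel, not the optimized implementation described in the paper (no table reordering, coalescing, or shared-memory staging).

```cuda
// Illustrative single-channel tetrahedral interpolation in an N x N x N LUT.
// Input coordinates are normalized to [0,1].
__device__ float lutAt(const float* lut, int N, int i, int j, int k)
{
    return lut[(i * N + j) * N + k];
}

__global__ void tetraInterp3D(const float* in, float* out, int count,
                              const float* lut, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= count) return;

    // Scale each normalized input channel into LUT cell coordinates.
    float x = in[3 * idx + 0] * (N - 1);
    float y = in[3 * idx + 1] * (N - 1);
    float z = in[3 * idx + 2] * (N - 1);
    int i = min((int)x, N - 2), j = min((int)y, N - 2), k = min((int)z, N - 2);
    float fx = x - i, fy = y - j, fz = z - k;

    // Corner samples of the enclosing cell (first index along x, then y, then z).
    float c000 = lutAt(lut, N, i, j, k),         c100 = lutAt(lut, N, i + 1, j, k);
    float c010 = lutAt(lut, N, i, j + 1, k),     c001 = lutAt(lut, N, i, j, k + 1);
    float c110 = lutAt(lut, N, i + 1, j + 1, k), c101 = lutAt(lut, N, i + 1, j, k + 1);
    float c011 = lutAt(lut, N, i, j + 1, k + 1), c111 = lutAt(lut, N, i + 1, j + 1, k + 1);

    // Choose one of six tetrahedra by ordering the fractional parts.
    float r;
    if (fx > fy) {
        if (fy > fz)      r = (1 - fx) * c000 + (fx - fy) * c100 + (fy - fz) * c110 + fz * c111;
        else if (fx > fz) r = (1 - fx) * c000 + (fx - fz) * c100 + (fz - fy) * c101 + fy * c111;
        else              r = (1 - fz) * c000 + (fz - fx) * c001 + (fx - fy) * c101 + fy * c111;
    } else {
        if (fz > fy)      r = (1 - fz) * c000 + (fz - fy) * c001 + (fy - fx) * c011 + fx * c111;
        else if (fz > fx) r = (1 - fy) * c000 + (fy - fz) * c010 + (fz - fx) * c011 + fx * c111;
        else              r = (1 - fy) * c000 + (fy - fx) * c010 + (fx - fz) * c110 + fz * c111;
    }
    out[idx] = r;
}
```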
Retinex is an image restoration method that can recover an image's original appearance. The Retinex algorithm uses a
Gaussian blur convolution with a large kernel to compute the center/surround information. Log-domain
processing between the original image and the center/surround information is then performed pixel-wise. The final step of the
Retinex algorithm is to normalize the results of the log-domain processing to an appropriate dynamic range.
This paper presents GPURetinex, a data-parallel algorithm devised by parallelizing Retinex on GPGPU/CUDA.
GPURetinex exploits the GPGPU's massively parallel architecture and hierarchical memory to improve efficiency,
using hierarchical threads and data distribution. The parallel implementation is designed and optimized to take full
advantage of the properties of GPGPU/CUDA computing.
In our experiments, a GT200 GPU and CUDA 3.0 are employed. The experimental results show that
GPURetinex can achieve a 30x speedup compared with a CPU-based implementation on images with 2048 x 2048
resolution. Our experimental results indicate that using CUDA can provide enough acceleration to reach real-time performance.
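The pixel-wise log-domain step described above maps naturally onto one GPU thread per pixel. The sketch below (illustrative, not the paper's code) computes a single-scale Retinex response from an input image and a precomputed Gaussian-blurred surround; normalization to the output dynamic range is assumed to follow in a separate pass.

```cuda
#include <math.h>

// Illustrative single-scale Retinex log-domain step: one thread per pixel.
// `surround` is assumed to hold the large-kernel Gaussian blur of `image`.
// The +1 offsets avoid log(0); dynamic-range normalization happens later.
__global__ void retinexLogDomain(const float* image, const float* surround,
                                 float* response, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int idx = y * width + x;
    response[idx] = logf(image[idx] + 1.0f) - logf(surround[idx] + 1.0f);
}
```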
In the last few years, a variety of multicore architectures have been used to parallelize image processing applications.
In this paper, we focus on assessing the parallel speed-ups of different Canny edge detection parallelization
strategies on the Tile64, a tiled multicore architecture developed by the Tilera Corporation. Included in these
strategies are different ways Canny edge detection can be parallelized, as well as differences in data management.
The two parallelization strategies examined were loop-level parallelism and domain decomposition. Loop-level
parallelism is achieved through the use of OpenMP, and it parallelizes across the range of values
over which a loop iterates. Domain decomposition is the process of breaking down an image into subimages,
where each subimage is processed independently, in parallel. The results of the two strategies show that, for the
same number of threads, programmer-implemented domain decomposition exhibits higher speed-ups than the
compiler-managed, loop-level parallelism implemented with OpenMP.
Due to the growing popularity of portable multimedia display devices and wide availability of high-definition video
content, the transcoding of high-resolution videos into lower resolution ones with different formats has become a crucial
challenge for PC platforms. This paper presents our study on the leveraging of the Unified Video Decoder (UVD)
provided by the graphics processing unit (GPU) for achieving high-speed video transcoding with low CPU usage. Our
experimental results show that off-loading video decoding and video scaling to the GPU can double the transcoding speed with
only half the CPU usage, compared to in-box software decoders, when transcoding 1080p (1920x1080) video content on an
AMD Vision processor with an integrated graphics unit.
Two-dimensional image deconvolution is an important and well-studied problem with applications to image deblurring
and restoration. Most of the best deconvolution algorithms use natural image statistics that act as priors to regularize the
problem. Recently, Krishnan and Fergus provided a fast deconvolution algorithm that yields results comparable to the
current state of the art. They use a hyper-Laplacian image prior to regularize the problem. The resulting optimization
problem is solved using alternating minimization in conjunction with a half-quadratic penalty function. In this paper, we
provide an efficient CUDA implementation of their algorithm on the GPU. Our implementation leverages many well-known
CUDA optimization techniques, as well as several others that have a significant impact on this particular
algorithm. We discuss each of these, as well as make a few observations regarding the CUFFT library. Our experiments
were run on an Nvidia GeForce GTX 260. For a single channel image of size 710 x 470, we obtain over 40 fps, while on
a larger image of size 1900 x 1266, we get almost 6 fps (without counting disk I/O). In addition to linear performance,
we believe ours is the first implementation to perform deconvolutions at video rates. Our running times also
demonstrate that our GPU implementation is over 27 times faster than the original CPU implementation.
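In algorithms of this kind the quadratic sub-problem of the alternating minimization is typically solved in the Fourier domain, which is presumably where the CUFFT library enters. The fragment below is a rough, simplified illustration (not the paper's code, and reduced to a single auxiliary variable): a forward 2D FFT of the auxiliary variable, a pointwise frequency-domain update in a small kernel, and an inverse FFT. The numerator, filter spectrum, and denominator arrays are hypothetical, precomputed terms.

```cuda
#include <cufft.h>

// Complex multiply helper for cufftComplex (float2) values.
__device__ cufftComplex cmul(cufftComplex a, cufftComplex b)
{
    cufftComplex r;
    r.x = a.x * b.x - a.y * b.y;
    r.y = a.x * b.y + a.y * b.x;
    return r;
}

// Illustrative pointwise x-update in the frequency domain:
//   xhat = (num + beta * conj(Ghat) * what) / denom
// num, Ghat (derivative-filter spectrum) and denom are assumed precomputed;
// what is the FFT of the current auxiliary variable w. Names are hypothetical.
__global__ void freqUpdate(cufftComplex* xhat, const cufftComplex* num,
                           const cufftComplex* Ghat, const cufftComplex* what,
                           const float* denom, float beta, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    cufftComplex gconj = { Ghat[i].x, -Ghat[i].y };
    cufftComplex t = cmul(gconj, what[i]);
    xhat[i].x = (num[i].x + beta * t.x) / denom[i];
    xhat[i].y = (num[i].y + beta * t.y) / denom[i];
}

// Host-side sketch of one x-update pass using CUFFT. CUFFT's inverse transform
// is unnormalized, so a separate 1/n scaling pass would follow.
void xUpdate(cufftComplex* d_w, cufftComplex* d_what, cufftComplex* d_xhat,
             cufftComplex* d_x, const cufftComplex* d_num,
             const cufftComplex* d_Ghat, const float* d_denom,
             float beta, int width, int height)
{
    int n = width * height;
    cufftHandle plan;
    cufftPlan2d(&plan, height, width, CUFFT_C2C);
    cufftExecC2C(plan, d_w, d_what, CUFFT_FORWARD);
    freqUpdate<<<(n + 255) / 256, 256>>>(d_xhat, d_num, d_Ghat, d_what, d_denom, beta, n);
    cufftExecC2C(plan, d_xhat, d_x, CUFFT_INVERSE);
    cufftDestroy(plan);
}
```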
This paper addresses the problem of stitching Giga Pixel images from airborne images acquired over multiple flight
paths of Costa Rica in 2005. The set of input images contains about 10,158 images, each of size around 4072x4072
pixels, with very coarse georeferencing information (latitude and longitude of each image). Given the spatial coverage
and resolution of the input images, the final stitched color image is 294,847 by 269,195 pixels (79.3 Giga Pixels) and
corresponds to 238.2 GigaBytes. An assembly of such large images requires either hardware with large shared memory
or algorithms using disk access in tandem with available RAM providing data for local image operation. In addition to
I/O operations, the computations needed to stitch together image tiles involve at least one image transformation and
multiple comparisons to place the pixels into a pyramid representation for fast dissemination. The motivation of our
work is to explore the utilization of multiple hardware architectures (e.g., multicore servers, computer clusters) and
parallel computing to minimize the time needed to stitch Giga pixel images.
Our approach is to utilize the coarse georeferencing information for initial image grouping followed by an intensity-based
stitching of groups of images. This group-based stitching is highly parallelizable. The stitching process results in
image patches that can be cropped to fit a tile of an image pyramid frequently used as a data structure for fast image
access and retrieval. We report our experimental results obtained when stitching a four Giga Pixel image from the input
images at one fourth of their original spatial resolution using a single core on our eight core server and our preliminary
results for the entire 79.3 Gigapixel image obtained using a 120 core computer cluster.
This paper formalizes a major insight into a class of algorithms that relate parallelism and performance. The purpose of
this paper is to define a class of algorithms that trades off parallelism for quality of result (e.g. visual quality,
compression rate), and we propose a similar method for algorithmic classification based on NP-Completeness
techniques, applied toward parallel acceleration. We will define this class of algorithm as "GPU-Complete" and will
postulate the necessary properties of the algorithms for admission into this class. We will also formally relate this
algorithmic space to the space of imaging algorithms. This concept is based upon our experience in the print production area,
where GPUs (Graphics Processing Units) have shown a substantial cost/performance advantage within the context of HP-delivered
enterprise services and commercial printing infrastructure. While CPUs and GPUs are converging in their
underlying hardware and functional blocks, their system behaviors are clearly distinct in many ways: memory system
design, programming paradigms, and massively parallel SIMD architecture. There are applications that are clearly suited
to each architecture: for the CPU, language compilation, word processing, operating systems, and other applications that are
highly sequential in nature; for the GPU, video rendering, particle simulation, pixel color conversion, and other problems
clearly amenable to massive parallelization. While GPUs are establishing themselves as a second computing
architecture distinct from CPUs, their end-to-end system cost/performance advantage in certain parts of computation informs the
structure of algorithms and their efficient parallel implementations. While GPUs are merely one type of architecture for
parallelization, we show that their introduction into the design space of printing systems demonstrates the trade-offs
against competing multi-core, FPGA, and ASIC architectures. While each architecture has its own optimal application,
we believe that the selection of architecture can be defined in terms of properties of GPU-Completeness. For a well-defined
subset of algorithms, GPU-Completeness is intended to connect parallelism, algorithms, and efficient architectures in a unified framework, showing that multiple layers of parallel implementation are guided by the same underlying trade-off.
In this paper, we investigate the suitability of the GPU for a parallel implementation of the pinwheel error
diffusion. We demonstrate a high-performance GPU implementation by efficiently parallelizing and unrolling the
image processing algorithm. Our GPU implementation achieves a 10 - 30x speedup over a two-threaded CPU
error diffusion implementation with comparable image quality. We have conducted experiments to study the
performance and quality tradeoffs for different image block sizes. We also present a performance analysis
at the assembly level to understand the performance bottlenecks.
With the release of an eight core Xeon processor by Intel and a twelve core Opteron processor by AMD in the spring of
2010, the increase of multiple cores per chip package continues. Multiple-core processors are commonplace in most
workstations sold today and are an attractive option for increasing imaging performance. Visual attention models are
very compute intensive, requiring many imaging algorithms to be run on images, such as large difference-of-Gaussian
filters, segmentation, and region finding. In this paper we present our experience in optimizing the performance of a
visual attention model on standard multi-core Windows workstations.
Graphics Processing Unit (GPU) architectures are widely used for resource-intensive
computation. Initially dedicated to imaging, vision and graphics, these architectures nowadays serve a wide range
of general-purpose applications. The GPU structure, however, does not suit all applications, which can lead to
performance shortfalls. The aim of this work is to analyze GPU structures for image
analysis applications in multispectral to ultraspectral imaging. The algorithms used for the experiments are
multispectral and hyperspectral imaging algorithms dedicated to art authentication. Such algorithms use a large amount of
spatial and spectral data, with both a high number of memory accesses and a need for large storage
capacity. Timing performance is compared with a CPU architecture and a global analysis is made with respect to
the algorithms and the GPU architecture. This paper shows that GPU architectures are suitable for complex image
analysis algorithms in multispectral imaging.
We have investigated the computational scalability of image pyramid building needed for dissemination of very large
image data. The sources of large images include high resolution microscopes and telescopes, remote sensing and
airborne imaging, and high resolution scanners. The term 'large' is understood from a user perspective which means
either larger than a display size or larger than a memory/disk to hold the image data. The application drivers for our
work are digitization projects such as the Lincoln Papers project (each image scan is about 100-150 MB, or about
5000x8000 pixels, with the total number expected to be around 200,000) and the UIUC library scanning project for historical maps
from the 17th and 18th centuries (a smaller number of larger images). The goal of our work is to understand the computational
scalability of web-based dissemination using image pyramids for these large image scans, as well as the preservation
aspects of the data. We report our computational benchmarks for (a) building image pyramids to be disseminated using
the Microsoft Seadragon library, (b) a computation execution approach using hyper-threading to generate image
pyramids and to utilize the underlying hardware, and (c) an image pyramid preservation approach using various hard
drive configurations of Redundant Array of Independent Disks (RAID) drives for input/output operations. The benchmarks are obtained with a map (334.61 MB, JPEG format, 17591x15014 pixels). The discussion combines the speed and preservation objectives.
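Each pyramid level is typically produced by a 2x2 reduction of the level below. The paper's benchmarks run this on CPU threads with hyper-threading; purely to illustrate the per-level arithmetic, and to stay consistent with the other sketches in this volume, the fragment below expresses one reduction step as a CUDA kernel with hypothetical names.

```cuda
// Illustrative 2x2 box-filter reduction producing one pyramid level from the
// level below (grayscale). The paper benchmarks a CPU/hyper-threaded pipeline;
// this kernel only illustrates the per-level arithmetic.
__global__ void pyramidReduce(const unsigned char* src, int srcW, int srcH,
                              unsigned char* dst, int dstW, int dstH)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= dstW || y >= dstH) return;

    // Clamp so odd source dimensions are handled at the right/bottom edge.
    int sx = min(2 * x, srcW - 1), sx1 = min(2 * x + 1, srcW - 1);
    int sy = min(2 * y, srcH - 1), sy1 = min(2 * y + 1, srcH - 1);

    int sum = src[sy * srcW + sx]  + src[sy * srcW + sx1]
            + src[sy1 * srcW + sx] + src[sy1 * srcW + sx1];
    dst[y * dstW + x] = (unsigned char)((sum + 2) / 4);  // rounded average
}
```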
We present real-time 3D image processing of flash ladar data using our recently developed GPU parallel processing
kernels. Our laboratory and airborne experiences with flash ladar focal planes have shown that per laser flash, typically
only a small fraction of the pixels on the focal plane array actually produce a meaningful range signal. Therefore, to
optimize overall data processing speed, the large quantity of uninformative data are filtered out and removed from the
data stream prior to the mathematically intensive point cloud transformation processing. This front-end pre-processing,
which largely consists of control flow instructions, is specific to the particular type of flash ladar focal plane array being
used and is performed by the computer's CPU. The valid signals along with their corresponding inertial and navigation
metadata are then transferred to a GPU device to perform range-correction, geo-location, and ortho-rectification on each
3D data point so that data from multiple frames can be properly tiled together either to create a wide-area map or to
reconstruct an object from multiple look angles. GPU parallel processing kernels were developed using OpenCL. Post-processing
to perform fine registration between data frames via complex iterative steps also benefits greatly from this
type of high-performance computing. The performance improvements obtained using GPU processing to create
corrected 3D images and for frame-to-frame fine-registration are presented.
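The per-point range-correction and geo-location step is an independent transform per detection, which is what makes it a good GPU fit. The paper's kernels are written in OpenCL; as a rough CUDA-flavored illustration (hypothetical names, deliberately simplified model), the sketch below applies a range scale factor, a sensor-to-world rotation, and a translation to each valid return.

```cuda
// Illustrative per-point transform (not the paper's OpenCL kernels): apply a
// range correction, rotate from sensor frame to world frame with a 3x3 matrix
// (row-major, derived from the inertial/navigation metadata), and translate by
// the platform position. One thread per valid ladar return.
struct Point3 { float x, y, z; };

__global__ void geolocatePoints(const Point3* sensorPts, const float* rangeScale,
                                const float* R,     // 3x3 rotation, row-major
                                float tx, float ty, float tz,
                                Point3* worldPts, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Range-correct the sensor-frame point.
    float s = rangeScale[i];
    float px = sensorPts[i].x * s, py = sensorPts[i].y * s, pz = sensorPts[i].z * s;

    // Rotate into the world frame and translate by the platform position.
    worldPts[i].x = R[0] * px + R[1] * py + R[2] * pz + tx;
    worldPts[i].y = R[3] * px + R[4] * py + R[5] * pz + ty;
    worldPts[i].z = R[6] * px + R[7] * py + R[8] * pz + tz;
}
```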
In this paper, we present a fast iterative magnetic resonance imaging (MRI) reconstruction algorithm taking advantage of
the prevailing GPGPU programming paradigm. In clinical environment, MRI reconstruction is usually performed via
fast Fourier transform (FFT). However, imaging artifacts (i.e. signal loss) resulting from susceptibility-induced magnetic
field inhomogeneities degrade the quality of reconstructed images. These artifacts must be addressed using accurate
modeling of the physics of the system coupled with iterative reconstruction. We have developed a reconstruction
algorithm with improved image quality at the expense of computation time and hence an implementation on GPUs
achieving significant speedup. In this work, we extend our previous work on GPU implementation by adding several
new features. First, we enable Sensitivity Encoding for Fast MRI (SENSE) reconstruction (from data acquired using a
multi-receiver coil array), which can reduce the acquisition time. In addition, we have implemented GPU-based total
variation regularization in our SENSE reconstruction framework. In this paper, we describe the different optimizations
employed at the levels of the algorithm, program code structure, and architecture-specific performance tuning, covering
both our MRI reconstruction algorithm and GPU hardware specifics. Results show that the current GPU implementation produces accurate image estimates while significantly accelerating the reconstruction.
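Total variation regularization penalizes the magnitude of the spatial gradient of the image estimate, and that per-pixel magnitude computation is itself trivially parallel. The sketch below (illustrative only, not the paper's implementation) computes the forward-difference gradient magnitude per pixel for a real-valued image; a reduction over the output would give the TV penalty term.

```cuda
#include <math.h>

// Illustrative total-variation term: per-pixel forward-difference gradient
// magnitude sqrt(dx^2 + dy^2 + eps). Summing the output gives the (smoothed)
// TV penalty of the current image estimate.
__global__ void tvMagnitude(const float* img, float* tv, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int idx = y * width + x;
    float dx = (x + 1 < width)  ? img[idx + 1]     - img[idx] : 0.0f;
    float dy = (y + 1 < height) ? img[idx + width] - img[idx] : 0.0f;
    tv[idx] = sqrtf(dx * dx + dy * dy + 1e-8f);  // small eps keeps it differentiable
}
```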
The future multi-modal user interfaces of battery-powered mobile devices are expected to require computationally
costly image analysis techniques. Graphics Processing Units are well suited to parallel processing, and the addition of
programmable stages and high-precision arithmetic provides opportunities
to implement complete, energy-efficient algorithms. At the moment the first mobile graphics accelerators
with programmable pipelines are available, enabling the GPGPU implementation of several image processing
algorithms. In this context, we consider a face tracking approach that uses efficient gray-scale invariant texture
features and boosting. The solution is based on the Local Binary Pattern (LBP) features and makes use of the
GPU on the pre-processing and feature extraction phase. We have implemented a series of image processing
techniques in the shader language of OpenGL ES 2.0, compiled them for a mobile graphics processing unit and
performed tests on a mobile application processor platform (OMAP3530). In our contribution, we describe the
challenges of designing on a mobile platform, present the performance achieved and provide measurement results
for the actual power consumption in comparison to using the CPU (ARM) on the same platform.
Traditional multi-view stereo reconstruction via volumetric graph cuts formulates the 3D reconstruction problem as a
computationally tractable global optimization using graph cuts. It benefits from a volumetric scene representation, with
discrete photo-consistency defined as an edge cost on a weighted graph. Because the discrete voxels are independent of one
another, it is natural to estimate photo-consistency in parallel on multi-core CPUs or a GPU; however, once the
photo-consistency has been estimated, a parallel optimization method is still needed to obtain the optimized labeling for
each voxel. In this paper, we use a parallel volumetric graph cuts method to solve this problem. Our algorithm has two main
steps: a clustering step and a parallel graph cuts optimization step. We also introduce an approach for enhancing the
accuracy and speed of existing multi-view 3D reconstruction methods based on volumetric graph cuts. The main idea is to
decompose the collected photos into overlapping sets while the voxels are likewise clustered. Voxel consistency estimation
and surface labeling with graph cuts are then processed in parallel; the labels of overlapping voxels may in general have
multiple solutions, and they are constrained to be equal in the parallel graph cuts optimization step to obtain a unique
solution.
As commercial printing presses become faster, cheaper and more efficient, so too must the Raster Image Processors
(RIP) that prepare data for them to print. Digital press RIPs, however, have been challenged to meet, on the one hand, the
ever-increasing print performance of the latest digital presses and, on the other hand, to process increasingly complex
documents with transparent layers and embedded ICC profiles.
This paper explores the challenges encountered when implementing a GPU accelerated driver for the open source
Ghostscript Adobe PostScript and PDF language interpreter targeted at accelerating PDF transparency for high speed
commercial presses. It further describes our solution, including an image memory manager for tiling input and output
images and documents, a PDF compatible multiple image layer blending engine, and a GPU accelerated ICC v4
compatible color transformation engine. The result, we believe, is the foundation for a scalable, efficient, distributed RIP
system that can meet current and future RIP requirements for a wide range of commercial digital presses.
This paper presents a low-cost FPGA-based solution for a real-time infrared small-target tracking system. A specialized
architecture is presented, based on a soft RISC processor capable of running a kernel-based mean shift tracking algorithm.
The mean shift tracking algorithm is realized on a NIOS II soft core using SOPC (System on a Programmable Chip) technology.
Although the mean shift algorithm is widely used for target tracking, the original algorithm cannot be applied directly
to infrared small-target tracking, because an infrared small target carries only intensity information; an improved mean shift
algorithm is therefore presented in this paper. How the target is described determines whether it can be tracked by the mean shift
algorithm. Because color targets are tracked well by mean shift, we imitate the color image representation: a spatial
component and a temporal component are introduced to describe the target, forming a pseudo-color image. To
improve processing speed, parallel and pipeline techniques are employed. Two RAMs store
images alternately using a ping-pong scheme, and a flash memory stores bulk temporary data. The experimental results show
that infrared small targets are tracked stably against complicated backgrounds.