Deep Neural Networks (DNNs) have emerged as the reference processing architecture for the implementation of multiple computer vision tasks. They achieve much higher accuracy than traditional algorithms based on shallow learning. However, this accuracy comes at the cost of a substantial increase in computational resources. This constitutes a challenge for embedded vision systems performing edge inference as opposed to cloud processing. In such a demanding scenario, several open-source frameworks have been developed, e.g. Caffe, OpenCV, TensorFlow, Theano, Torch or MXNet. All of these tools enable the deployment of various state-of-the-art DNN models for inference, though each one relies on particular optimization libraries and techniques, resulting in different performance behavior. In this paper, we present a comparative study of some of these frameworks in terms of power consumption, throughput and precision for some of the most popular Convolutional Neural Network (CNN) models. The benchmarking system is a Raspberry Pi 3 Model B, a low-cost embedded platform with limited resources. We highlight the advantages and limitations associated with the practical use of the analyzed frameworks. Some guidelines are provided for the suitable selection of a specific tool according to prescribed application requirements.
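As an illustration of the kind of measurement behind such a comparison, the sketch below times repeated forward passes of a CNN with OpenCV's dnn module. The model files, input image and number of runs are hypothetical placeholders; the paper's actual benchmarking harness is not reproduced here.

    # Minimal throughput-measurement sketch (hypothetical model and image files).
    import time
    import cv2

    net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "model.caffemodel")
    image = cv2.imread("test.jpg")
    blob = cv2.dnn.blobFromImage(image, scalefactor=1.0, size=(224, 224),
                                 mean=(104, 117, 123))

    runs = 50
    start = time.perf_counter()
    for _ in range(runs):
        net.setInput(blob)
        out = net.forward()          # one inference pass
    elapsed = time.perf_counter() - start
    print("throughput: %.2f inferences/s" % (runs / elapsed))

Power consumption would be measured externally (e.g. with an instrumented supply) while a loop like this runs; precision is evaluated separately on a labeled dataset.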
This paper describes a prototype smart imager capable of adjusting the photo-integration time of multiple regions of interest concurrently, automatically and asynchronously with a single exposure period. The operation is supported by two intertwined photo-diodes at pixel level and two digital registers at the periphery of the pixel matrix. These registers divide the focal-plane into independent regions within which automatic concurrent adjustment of the integration time takes place. At pixel level, one of the photo-diodes senses the pixel value itself whereas the other, in collaboration with its counterparts in a particular ROI, senses the mean illumination of that ROI. Additional circuitry interconnecting both photo-diodes enables the asynchronous adjustment of the integration time for each ROI according to this sensed illumination. The sensor can be reconfigured on-the-fly according to the requirements of a vision algorithm.
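A purely behavioral sketch of the adjustment idea, not a model of the pixel-level circuitry, might look as follows; the inverse-proportional control law, the maximum exposure t_max and the two-region synthetic scene are assumptions made only for illustration.

    import numpy as np

    def roi_integration_times(scene, roi_masks, t_max=1.0, eps=1e-6):
        """Assign each ROI an integration time roughly inverse to its mean
        sensed illumination (assumed control law, behavioral model only)."""
        return {name: t_max / (scene[mask].mean() + eps)
                for name, mask in roi_masks.items()}

    # Synthetic scene with a bright left half and a dark right half.
    scene = np.zeros((144, 176))
    scene[:, :88], scene[:, 88:] = 0.9, 0.1
    masks = {"left": np.s_[:, :88], "right": np.s_[:, 88:]}
    print(roi_integration_times(scene, masks))  # shorter time for the bright ROI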
The aim of this article is to guide image sensor designers in optimizing the analog-to-digital conversion of pixel outputs. The most common ADC topologies for image sensors are presented and discussed. The specific ADC requirements of these sensors are analyzed and quantified. Finally, we present relevant recent contributions of specific ADCs for image sensors and we compare them using a novel figure of merit (FOM).
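The novel FOM proposed in the article is not reproduced here; as a point of reference only, the conventional Walden figure of merit, which relates power, effective resolution and sampling rate, can be computed as in the sketch below (the example numbers are hypothetical).

    def walden_fom(power_w, enob_bits, sample_rate_hz):
        """Classical Walden FOM in joules per conversion step:
        FOM = P / (2**ENOB * fs). Lower is better."""
        return power_w / (2.0 ** enob_bits * sample_rate_hz)

    # e.g. a hypothetical column ADC: 50 uW, 10-bit ENOB, 1 MS/s
    print(walden_fom(50e-6, 10, 1e6))   # ~4.9e-14 J, i.e. ~48.8 fJ/conversion-step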
In computer vision, local descriptors make it possible to summarize relevant visual cues through feature vectors. These vectors constitute inputs for trained classifiers, which in turn enable different high-level vision tasks. While local descriptors certainly alleviate the computational load of subsequent processing stages by preventing them from handling raw images, they still have to deal with individual pixels. Feature vector extraction can thus become a major limitation for conventional embedded vision hardware. In this paper, we present a power-efficient sensing-processing array conceived to provide the computation of integral images at different scales. These images are intermediate representations that speed up feature extraction. In particular, the mixed-signal array operation is tailored for the extraction of Haar-like features. These features feed the cascade of classifiers at the core of the Viola-Jones framework. The processing lattice has been designed for the standard UMC 0.18μm 1P6M CMOS process. In addition to integral image computation, the array can be reprogrammed to deliver other early vision tasks: concurrent rectangular area sum, block-wise HDR imaging, Gaussian pyramids and image pre-warping for subsequent reduced kernel filtering.
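To make the role of the intermediate representation concrete, the following NumPy sketch builds an integral image and uses it to evaluate a simple two-rectangle Haar-like feature. It mirrors the textbook Viola-Jones formulation rather than the mixed-signal array itself; the feature geometry and test image are arbitrary.

    import numpy as np

    def integral_image(img):
        """Zero-padded cumulative sums so any rectangle sum needs only 4 lookups."""
        ii = img.cumsum(axis=0).cumsum(axis=1)
        return np.pad(ii, ((1, 0), (1, 0)))

    def rect_sum(ii, y, x, h, w):
        """Sum of img[y:y+h, x:x+w] read from the padded integral image ii."""
        return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

    def haar_two_rect(ii, y, x, h, w):
        """Two-rectangle Haar-like feature: left half minus right half."""
        return rect_sum(ii, y, x, h, w // 2) - rect_sum(ii, y, x + w // 2, h, w // 2)

    img = np.random.rand(64, 64)
    ii = integral_image(img)
    print(haar_two_rect(ii, 8, 8, 16, 16))

Each feature evaluation costs a fixed handful of memory accesses regardless of the rectangle size, which is precisely why the integral image is the representation worth computing at the focal plane.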
Early vision stages represent a considerable computational load: a huge amount of data needs to be processed under strict timing and power requirements. Conventional architectures usually fail to meet these specifications in many application fields, especially when autonomous vision-enabled devices are to be implemented, as in lightweight UAVs, robotics or wireless sensor networks. A bio-inspired architectural approach can be employed, consisting of a hierarchical division of the processing chain that conveys the highest computational demand to the focal plane. There, distributed processing elements, concurrent with the photosensitive devices, influence the image capture and generate a pre-processed representation of the scene where only the information of interest for subsequent stages remains. These focal-plane operators are implemented by analog building blocks, which may individually be somewhat imprecise, but as a whole render the appropriate image processing very efficiently. As a proof of concept, we have developed a 176x144-pixel smart CMOS imager that delivers lighter but enriched representations of the scene. Each pixel of the array contains a photosensor and some switches and weighted paths allowing reconfigurable resolution and spatial filtering. An energy-based image representation is also supported. These functionalities greatly simplify the operation of the subsequent digital processor implementing the high-level logic of the vision algorithm. The resulting figures, 5.6 mW @ 30 fps, permit the integration of the smart image sensor with a wireless interface module (Imote2 from Memsic Corp.) for the development of vision-enabled WSN applications.
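A software analogue of the reconfigurable-resolution feature is simple block binning, sketched below for the 176x144 (QCIF) format; the block sizes and the averaging readout are illustrative assumptions, not a model of the in-pixel switch and weighted-path circuitry.

    import numpy as np

    def focal_plane_binning(img, block=2):
        """Average non-overlapping block x block neighborhoods, emulating the
        grouping of pixels into lower-resolution cells (behavioral model only)."""
        h, w = img.shape
        assert h % block == 0 and w % block == 0
        return img.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

    qcif = np.random.rand(144, 176)
    print(focal_plane_binning(qcif, 2).shape)   # (72, 88)
    print(focal_plane_binning(qcif, 4).shape)   # (36, 44)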
Single-photon avalanche diodes (SPADs) are compatible with standard CMOS, which means that photo-multipliers for scintillation detectors in nuclear medicine (e.g. PET, SPECT) can be built in inexpensive technologies. These silicon photo-multipliers consist of arrays of, usually passively-quenched, SPADs whose output current is sensed by some analog readout circuitry. In addition to the implementation of photosensors that are sensitive to single-photon events, analog, digital and mixed-signal processing circuitry can be included in the same CMOS chip. For instance, the SPAD can be employed as an event detector and, with the help of some in-pixel circuitry, a digitized photo-multiplier can be built in which every single-photon detection event is summed up by a counter. Moreover, this concurrent processing circuitry can be employed to realize low-level image processing tasks, which can be efficiently implemented by this architecture given their intrinsic parallelism. Our proposal is to operate on the light-induced signal at the focal plane in order to obtain a more elaborate record of the detection, for instance by providing some characterization of the light spot. Information about the depth of interaction in scintillation detectors can be derived from the position and shape of the scintillation light distribution. This will ultimately have an impact on the spatial resolution that can be achieved. We present the CMOS design of an array of detector cells. Each cell contains a SPAD, an MOS-based passive quenching circuit and drivers for the column and row detection lines.
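As a back-end illustration of what such an elaborate record could provide, the sketch below takes a 2D map of per-cell photon counts and extracts the centroid and RMS widths of the light spot, from which position and shape (and hence depth-of-interaction cues) could be derived. The array size, synthetic spot and moment-based characterization are assumptions made only for illustration.

    import numpy as np

    def light_spot_stats(counts):
        """Centroid and RMS widths of a per-cell photon-count map."""
        y, x = np.indices(counts.shape)
        total = counts.sum()
        cy, cx = (counts * y).sum() / total, (counts * x).sum() / total
        sy = np.sqrt((counts * (y - cy) ** 2).sum() / total)
        sx = np.sqrt((counts * (x - cx) ** 2).sum() / total)
        return (cy, cx), (sy, sx)

    # Synthetic spot: Gaussian-distributed detection events on a 32x32 cell array.
    rng = np.random.default_rng(0)
    hits = rng.normal(loc=(14.0, 20.0), scale=(2.0, 3.0), size=(5000, 2))
    counts, _, _ = np.histogram2d(hits[:, 0], hits[:, 1],
                                  bins=32, range=[[0, 32], [0, 32]])
    print(light_spot_stats(counts))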
Gaussian filtering is a basic tool for image processing. Noise reduction, scale-space generation and edge detection are examples of tasks where different Gaussian filters can be successfully utilized. However, their implementation in a conventional digital processor by applying a convolution kernel throughout the image is quite inefficient. Not only is the value of every single pixel considered successively, but contributions from its neighbors must also be taken into account. Processing of the frame is serialized and memory access is intensive and recurrent. The result is a low operation speed or, alternatively, a high power consumption. This inefficiency is especially noticeable for filters with large variance, as the kernel size increases significantly. In this paper, a different approach to achieve Gaussian filtering is proposed. It is oriented to applications with very low power budgets. The key point is a reconfigurable focal-plane binning. Pixels are grouped according to the targeted resolution by means of a division grid. Then, two consecutive shifts of this grid in opposite directions carry out the spread of information to the neighborhood of each pixel in parallel. The outcome is equivalent to the application of a 3×3 binomial filter kernel, which in turn is a good approximation of a Gaussian filter, on the original image. The variance of the closest Gaussian filter is around 0.5. By repeating the operation, Gaussian filters with larger variances can be achieved. A rough estimate of the energy necessary for each repetition until reaching the desired filter is below 20 nJ for a QCIF-size array. Finally, experimental results of a QCIF proof-of-concept focal-plane array manufactured in 0.35μm CMOS technology are presented. A maximum RMSE of only 1.2% is obtained by the on-chip Gaussian filtering with respect to the corresponding equivalent ideal filter implemented off-chip.
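The equivalence mentioned above can be checked in software: repeatedly convolving with the 3×3 binomial kernel accumulates variance in steps of about 0.5, so n repetitions approach a Gaussian of variance 0.5n. The sketch below, using SciPy convolution off-chip, is only a numerical check of that reasoning, not a model of the focal-plane charge redistribution; the number of repetitions and the test frame are arbitrary.

    import numpy as np
    from scipy.ndimage import convolve, gaussian_filter

    # 3x3 binomial kernel = outer product of [1, 2, 1]/4 with itself (variance ~0.5).
    binomial = np.outer([1, 2, 1], [1, 2, 1]) / 16.0

    img = np.random.rand(144, 176)           # QCIF-size test frame
    n = 4                                     # number of repetitions
    approx = img.copy()
    for _ in range(n):
        approx = convolve(approx, binomial, mode="nearest")

    reference = gaussian_filter(img, sigma=np.sqrt(0.5 * n), mode="nearest")
    rmse = np.sqrt(np.mean((approx - reference) ** 2)) / (img.max() - img.min())
    print("normalized RMSE vs. ideal Gaussian: %.4f" % rmse)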
Stand-alone applications of vision are severely constrained by their limited power budget. This is one of the main reasons why vision has not yet been widely incorporated into wireless sensor networks. In such networks, image processing should be confined to the sensor node in order to reduce network traffic and its associated power consumption. In this scenario, operating the conventional acquisition-digitization-processing chain is unfeasible under tight power limitations. A bio-inspired scheme can be followed to meet the timing requirements while maintaining a low power consumption. In our approach, part of the low-level image processing is conveyed to the focal plane, thus speeding up system operation. Moreover, if a moderate accuracy is permissible, signal processing is realized in the analog domain, resulting in a highly efficient implementation. In this paper we propose a circuit to realize dynamic texture segmentation based on focal-plane spatial bandpass filtering of image subdivisions. By appropriate binning, we introduce some constraints on the spatial extent of the targeted texture. By running time-controlled linear diffusion within each bin, a specific band of spatial frequencies can be highlighted. By measuring the average energy of the components in that band at each image bin, the presence of a targeted texture can be detected and quantified. The resulting low-resolution representation of the scene can then be employed to track the texture along an image flow. An application-specific chip, based on this analysis, is being developed for the monitoring of natural spaces by means of a network of low-power vision systems.
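Although the chip carries this out with time-controlled diffusion in the analog domain, the underlying measurement can be sketched numerically: diffusion for a given time is approximated by Gaussian smoothing with a sigma growing with the square root of that time, so the difference of two diffusion states isolates a band of spatial frequencies, and its mean squared value per bin yields the low-resolution texture-energy map. The two sigmas and the bin size below are illustrative assumptions.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def texture_energy_map(img, sigma_short=1.0, sigma_long=3.0, bin_size=8):
        """Bandpass the image as the difference of two 'diffusion states'
        (Gaussian smoothings), then average the squared response per bin."""
        band = gaussian_filter(img, sigma_short) - gaussian_filter(img, sigma_long)
        h, w = band.shape
        h, w = h - h % bin_size, w - w % bin_size           # crop to whole bins
        blocks = band[:h, :w].reshape(h // bin_size, bin_size,
                                      w // bin_size, bin_size)
        return (blocks ** 2).mean(axis=(1, 3))              # per-bin band energy

    frame = np.random.rand(144, 176)
    print(texture_energy_map(frame).shape)   # (18, 22) low-resolution map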