This PDF file contains the front matter associated with SPIE Proceedings Volume 10670, including the Title Page, Copyright information, Table of Contents, Introduction (if any), and Conference Committee listing.
This work is oriented toward time optimization of hyperspectral image classification. These images carry an immense computational cost during processing, particularly in tasks such as feature extraction and classification. Indeed, numerous techniques in the state of the art have suggested reducing the dimensionality of the information. Nevertheless, real-time applications require fast information shrinkage with built-in feature extraction in order to enable agile classification. To address this problem, this study compares the time and algorithmic complexity of three different transformations: the Fast Fourier Transform (FFT), the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT). Furthermore, three feature selection criteria are likewise analyzed: the Jeffries-Matusita Distance (JMD), the Spectral Angle Mapper (SAM) and the unsupervised algorithm N-FINDR. An application based on this study is developed using the parallel programming paradigm in multicore mode on a cluster of two Raspberry Pi units, and it is compared in time and algorithmic complexity with the sequential paradigm. Moreover, a Support Vector Machine (SVM) is incorporated in the application to perform the classification. The images employed to test the algorithms were acquired by the Hyperion sensor, the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS), and the Reflective Optics System Imaging Spectrometer (ROSIS).
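As a minimal sketch of the kind of pipeline described above, the snippet below reduces the spectral dimension of a hyperspectral cube with a DCT and classifies the reduced pixels with an SVM. Array names, the number of retained coefficients, and the SVM settings are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch: DCT-based spectral reduction followed by SVM classification.
# Assumes `cube` is a (rows, cols, bands) array, `train_mask` is a boolean (rows, cols)
# mask, and `train_labels` holds a class index for every pixel where the mask is True.
import numpy as np
from scipy.fft import dct
from sklearn.svm import SVC

def classify_cube(cube, train_mask, train_labels, n_coeffs=20):
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(float)
    # DCT along the spectral axis; keep only the first n_coeffs coefficients as features
    features = dct(pixels, axis=1, norm='ortho')[:, :n_coeffs]
    clf = SVC(kernel='rbf')
    clf.fit(features[train_mask.ravel()], train_labels)
    return clf.predict(features).reshape(rows, cols)
```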
In this paper, a novel approach to the enhancement of color images corrupted by impulsive noise is presented. The proposed algorithm first calculates, for every image pixel, the distances in the RGB color space to all elements belonging to the filtering window. Then, a sum of a specified number of smallest distances, which serves as a measure of pixel similarity, is calculated. This generalization of the Rank-Ordered Absolute Difference (ROAD) is robust to outliers, as the largest distances are not considered when calculating this measure. Next, for each pixel, a neighbor with the smallest ROAD value is searched for. If such a pixel is found, the filtering window is moved to a new position and again a neighbor with a ROAD measure lower than the initial value is looked for. If one is encountered, the window is moved again; otherwise the process is terminated and the starting pixel is replaced with the last pixel in the path formed by the iterative window-shifting procedure. Comparison with filters intended for the removal of noise in color images revealed excellent properties of the new enhancement technique. It is very fast, as the ROAD values can be pre-computed, and the formation of the paths requires only comparisons of scalar values. The proposed technique can be applied to the restoration of color images distorted by impulsive noise and can also be used as a method of edge sharpening. Its low computational complexity also allows for its application in the processing of video sequences.
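The sketch below illustrates only the generalized ROAD measure described above: for each pixel, sum the smallest distances to the other pixels in its filtering window. The iterative window-shifting step is omitted, the window radius and the number of summed distances are assumptions, and the loop-based form is for clarity rather than speed.

```python
# Minimal sketch of the generalized ROAD measure on an RGB image.
import numpy as np

def road_map(img, radius=1, k=4):
    h, w, _ = img.shape
    road = np.full((h, w), np.inf)
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            center = img[y, x].astype(float)
            window = img[y - radius:y + radius + 1,
                         x - radius:x + radius + 1].reshape(-1, 3).astype(float)
            dists = np.linalg.norm(window - center, axis=1)
            dists.sort()
            # skip the zero distance to the pixel itself, sum the k smallest of the rest
            road[y, x] = dists[1:k + 1].sum()
    return road
```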
Video is fast becoming the most common medium for media content in the present era. It is especially helpful in security situations for the detection of criminal or threat-related activity. Police routinely use videos as evidence in the analysis of criminal cases. In such applications it is important to obtain a high-quality still image from such videos. However, there are situations where the images are blurred and contain artifacts, as they are extracted from moving video repositories. A practical solution to this problem is to sharpen these images using advanced processing techniques to obtain higher display quality. Due to the vast amount of data, it is extremely important that any such enhancement technique satisfy real-time processing constraints in order for it to be usable by the end user. In this paper, a blind image sharpness metric is proposed using a combination of edge and textural features. Edges can be detected using different methods, such as Canny, Sobel, Prewitt and Roberts, that are commonly accepted in the image processing literature. The Canny edge detection method typically provides better results due to its extra processing steps and can be effectively used as a model feature extractor for the image. Wavelet processing based on the db2, sym4 and Haar wavelets is also utilized to extract texture features. The normalized luminance coefficients of natural images are known to obey the generalized Gaussian probability distribution. Consequently, this characteristic is utilized to extract statistical features in the regions of interest (ROI) and the regions of non-interest, respectively. The extracted features are then merged together to obtain the sharpened image. The principle behind image formation is to merge the wavelet decompositions of the two original images using fusion methods applied to the approximation and detail coefficients. The two images must be of the same size and are assumed to be associated with indexed images on a common color map. It is worth noting that the image fusion results are more consistent with human subjective visual perception of image quality, ground truth data for which is obtained from publicly available databases. Popular standard images such as Cameraman and Lena are used for the experiments. The results also show that the proposed method provides better objective quality than competing methods.
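A hedged sketch of the wavelet-fusion idea mentioned above, using PyWavelets: decompose two equally sized grayscale images, average the approximation coefficients, keep the detail coefficients with the larger magnitude, and reconstruct. The particular fusion rules, wavelet ('db2') and decomposition level are illustrative assumptions, not necessarily those used in the paper.

```python
import numpy as np
import pywt

def fuse_images(img1, img2, wavelet='db2', level=2):
    c1 = pywt.wavedec2(img1.astype(float), wavelet, level=level)
    c2 = pywt.wavedec2(img2.astype(float), wavelet, level=level)
    fused = [(c1[0] + c2[0]) / 2.0]                       # average the approximations
    for d1, d2 in zip(c1[1:], c2[1:]):
        fused.append(tuple(np.where(np.abs(a) >= np.abs(b), a, b)
                           for a, b in zip(d1, d2)))       # max-magnitude details
    return pywt.waverec2(fused, wavelet)
```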
One of the major concerns within the health system in the United States is the high number of infant deaths (23,215 in 2014). This has triggered the need for intensive care facilities for at-risk infants and especially for neonates. Premature infants in Neonatal Intensive Care Units (NICU) need constant monitoring due to conditions like bradycardia, apnea and hypoxia that can lead to dangerous outcomes, including death. A contact-less method to record critical vital signs is of particular interest for neonates, as their skin is fragile and easily damaged by traditional sensors. Video monitoring of infants in the NICU can be one solution to this problem. Automated analysis of video feeds, as opposed to manual approaches, can render promising results. One of the important cues to detect the occurrence of conditions such as bradycardia is the pulse rate of the subject. In this paper, we present an approach to monitor a patient's pulse rate using video processing algorithms. A multi-step procedure was designed and tested on several subjects. Video from the frontal facial pose was captured and a region of interest (ROI) was selected. Statistical features such as the gray-level average were extracted from the ROI in each frame and plotted as a function of time after Gaussian smoothing. The feature signals were then de-noised using the Maximal Overlap Discrete Wavelet Transform (MODWT). Filter banks tuned to the application were designed using bandpass cutoff frequencies and applied to the signal. The output signal resembled the actual pulse rate to a high degree of accuracy. Using the Welch approximation, the Power Spectral Density (PSD) of the output signal was determined to display the pulse rate. Further work to perform the signal processing steps in the spatial domain is planned so that real-time display of the pulse rate will be possible.
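The sketch below covers the later stages of such a pipeline: band-pass filter the per-frame ROI mean-intensity signal around plausible pulse frequencies and estimate the dominant frequency with a Welch PSD. The MODWT denoising step is omitted, and the cutoff frequencies, filter order and names are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, welch

def pulse_rate_bpm(roi_means, fs, low_hz=0.75, high_hz=3.0):
    # roi_means: 1-D array of per-frame ROI gray-level averages; fs: frame rate in Hz
    b, a = butter(3, [low_hz / (fs / 2), high_hz / (fs / 2)], btype='band')
    filtered = filtfilt(b, a, roi_means - np.mean(roi_means))
    freqs, psd = welch(filtered, fs=fs, nperseg=min(256, len(filtered)))
    return freqs[np.argmax(psd)] * 60.0    # dominant frequency in beats per minute
```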
The human visual system registers electromagnetic waves lying in the 390 to 700 nm wavelength range. While visible light provides humans with sufficient guidance for everyday activities, a large amount of information remains unregistered. However, electromagnetic radiation outside the visible range can be registered using cameras and sensors. Due to the multiplexing of visible light and additional wavelengths, the resolution drops significantly. To improve the resolution, we propose a GPU-based joint method for demosaicking, denoising and super-resolution. In order to interpolate missing pixel values for all four wavelengths, we first extract high-pass image features from all types of pixels in the mosaic. Using this information we perform directional interpolation to preserve the continuity of edges present in all four component images. After the initial interpolation, we introduce high spatial-frequency content from the other bands, giving preference to original over interpolated edges. Moreover, we perform refinement and upsampling of the demosaicked image by introducing information from previous frames. Motion compensation relies on a subpixel block-based motion estimation algorithm using all four chromatic bands, and performs regularization to reduce estimation errors and related artifacts in the interpolated images. We perform experiments using a mosaic consisting of red, green, blue and near-infrared (850 nm) pixels. The proposed algorithm is implemented on the Jetson TX2 platform, achieving 120 fps at QVGA resolution. It operates recursively, requiring only one additional frame buffer for the previous results. The results of the proposed method compare favorably to state-of-the-art multispectral demosaicking methods.
In this study, we introduce a novel local image descriptor that is very efficient to compute densely. We also present an algorithm to compute dense depth maps from image pairs using the designed descriptor. The novel descriptor is based on visual primitives and the relations between them, namely coplanarity, cocolority, distance, and angle. The designed feature descriptor thus covers both geometric and appearance information. The depth map estimation performance is evaluated using the established bad-matched-pixel metric. An analysis of the feature descriptor employing a parallel programming paradigm is included to explore a possible real-time mode. This is performed with the help of hardware based on multi-core processors and a GPU platform, using an NVIDIA GeForce GT640 graphics card and MATLAB on a Windows 10 PC.
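For reference, a small sketch of the bad-matched-pixel metric used for the evaluation: the fraction of valid pixels whose disparity error exceeds a threshold. Array names and the default threshold are assumptions.

```python
import numpy as np

def bad_matched_pixels(disp_est, disp_gt, threshold=1.0):
    valid = disp_gt > 0                       # ignore pixels without ground truth
    errors = np.abs(disp_est[valid] - disp_gt[valid])
    return np.mean(errors > threshold)        # proportion of badly matched pixels
```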
Blood vessel segmentation in fundus images is a required step in order to detect retinopathies. A high-performing segmentation was proposed in [12]. It consists of three dependent stages: producing two binary images to extract wide vessels, computing features of the remaining pixels in the binary images in order to extract fine vessels, and then combining both wide and fine vessels. The segmentation execution time is about 3-12 seconds when it is performed on fundus images with resolutions between 768×584 and 999×960. These resolutions are considerably smaller than those provided by current retinographs, which leads to a further rise in execution time. In this paper, we propose a parallelization strategy for the segmentation approach, targeting implementation on a Shared Memory Parallel Machine (SMPM). First, both binary images are produced in parallel. Thereafter, feature processing is split according to the computational complexity of the features. In the final stage, the wide-vessel and fine-vessel images are subdivided appropriately so that they can be combined in parallel. The parallel strategy is implemented using OpenCV and then assessed on the STARE public data sets. Experimental analyses of execution time and efficiency are presented and discussed.
All High Energy Physics (HEP) experiments in modern accelerators can be seen as real-time imaging devices whose main target is the reconstruction of the trajectories of the particles generated directly in beam interactions, e.g. proton-proton interactions in the Large Hadron Collider (LHC) at CERN, or in the decay of other particles. Silicon imaging sensors are segmented into pixel arrays (50 μm × 50 μm pixels) and are bonded to custom Front-End (FE) ASICs. The generated data rate amounts to hundreds of Gbps for each FE ASIC. A similar scenario characterizes the arrays of detectors in nuclear medicine systems such as PET (Positron Emission Tomography) scanners. Within this scenario, the paper presents the L1-trigger processor, which acts as an image compression processor with a compression factor of 40:1. The paper also presents a silicon photonic Mach-Zehnder modulator with the relevant high-speed driver to transfer the multi-Gbps data rate with a tolerance to radiation damage of up to 1 Grad.
This paper presents a computationally efficient pipeline to achieve 3D point cloud reconstruction from video sequences. The pipeline involves a key-frame selection step that improves computational efficiency by generating reliable depth information from pair-wise frames. An outlier removal step is then applied in order to further improve computational efficiency. The reconstruction is achieved based on a new absolute camera pose recovery approach in a computationally efficient manner. The pipeline is devised for both sparse and dense 3D reconstruction. The results obtained from video sequences show that the introduced pipeline exhibits higher computational efficiency and lower re-projection errors than existing pipelines.
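As a hedged sketch of how the reported re-projection error can be computed, the snippet below projects recovered 3D points with an estimated camera pose and measures the mean distance to the observed image points, using OpenCV. Argument names and layout are illustrative assumptions.

```python
import numpy as np
import cv2

def mean_reprojection_error(points_3d, points_2d, rvec, tvec, K, dist_coeffs=None):
    # points_3d: (N, 3) float array; points_2d: (N, 2) observed pixel coordinates;
    # rvec, tvec: camera pose; K: 3x3 intrinsic matrix; dist_coeffs: distortion or None
    projected, _ = cv2.projectPoints(np.asarray(points_3d, dtype=np.float64),
                                     rvec, tvec, K, dist_coeffs)
    projected = projected.reshape(-1, 2)
    return float(np.mean(np.linalg.norm(projected - points_2d, axis=1)))
```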
Computed tomography (CT) imaging has become an indispensable modality exploited across a vast spectrum of clinical indications for diagnosis and follow-up, alongside various image-guided procedures, especially in patients with lung cancer. Accurate lung segmentation from whole-body CT scans is an initial, yet extremely important step in such procedures. Therefore, fast and robust (against low-quality data) segmentation techniques are being actively developed. In this paper, we propose a new real-time algorithm for segmenting lungs from entire-body CT scans. Our method benefits from both 2D and 3D analysis of CT images, coupled with several fast pruning strategies to remove false-positive tissue areas, including the trachea and bronchi. We also developed a new approach for separating the lungs which exploits spatial analysis of lung candidates. Our algorithms were implemented in Adaptive Vision Studio (AVS), a visual-programming software suite based on the data-flow paradigm. Although AVS is extensively used in machine-vision industrial applications (it is equipped with a range of highly optimized image-processing routines), we show it can be easily utilized in general data analysis applications, including medical imaging. An experimental study performed on a benchmark dataset manually annotated by an experienced reader revealed that our algorithm is very fast (the average processing time for an entire CT series is less than 1.5 seconds), and it is competitive with the state of the art, delivering high-quality and consistent results (DICE was above 0.97 for both lungs; 0.96 for the left and 0.95 for the right lung after separation). The quantitative analysis was backed up with a thorough qualitative investigation (including 2D and 3D visualizations) and statistical tests.
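For reference, a small sketch of the DICE coefficient reported above, comparing a binary lung mask produced by an algorithm with a manually annotated reference mask.

```python
import numpy as np

def dice(mask_pred, mask_ref):
    pred = mask_pred.astype(bool)
    ref = mask_ref.astype(bool)
    intersection = np.logical_and(pred, ref).sum()
    return 2.0 * intersection / (pred.sum() + ref.sum())
```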
Cortical surface extraction from magnetic resonance (MR) scans is a preliminary, yet crucial step in brain segmentation and analysis. Although there are many algorithms that address this problem, they often sacrifice execution speed for accuracy, or they depend on many parameters that have to be tuned manually by an experienced practitioner. Therefore, fast, accurate and autonomous cortical surface extraction algorithms are in high demand, and they are being actively developed to enable clinicians to appropriately plan a treatment pathway and quantify response in patients with brain lesions based on precise image analysis. In this paper, we present an automated approach for cortical surface extraction from MR images based on 3D image morphology, connected component labeling and edge detection. Our technique allows for real-time processing of MR scans: an average study of 102 slices, each 512×512 pixels, takes approximately 768 ms to process (about 7 ms per slice) with known parameters. To automate the process of tuning the algorithm parameters, we developed a genetic algorithm for this task. An experimental study performed using real-life MR brain images revealed that the proposed algorithm offers very high-quality cortical surface extraction, works in real time, and is competitive with the state of the art.
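The sketch below illustrates, in a generic form, the kind of 3D morphology and connected-component operations named above: threshold an MR volume, apply a 3D morphological opening, and keep the largest connected component as a brain-mask candidate. The threshold, structuring element and overall simplicity are assumptions; the paper's parameters are tuned by its genetic algorithm.

```python
import numpy as np
from scipy import ndimage

def largest_component_mask(volume, threshold):
    binary = volume > threshold
    opened = ndimage.binary_opening(binary, structure=np.ones((3, 3, 3)))
    labels, n = ndimage.label(opened)             # 3-D connected component labeling
    if n == 0:
        return np.zeros_like(binary)
    sizes = ndimage.sum(opened, labels, index=range(1, n + 1))
    return labels == (np.argmax(sizes) + 1)       # keep the largest component
```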
In biomedical research, acquiring an image of a large area or a whole section while retaining microscopic resolution is highly desirable. Scanning Electron Microscopy (SEM) typically provides a resolution of 2 nm to 10 nm, but SEM is limited in terms of throughput, which leads to a small Field Of View (FOV), typically 5×5 µm. The Multi-Beam Scanning Electron Microscope (MBSEM) was developed to maintain this resolution while increasing throughput by using 196 parallel beams. Each individual beam creates one tile in a large composite image, and the individual tiles need to be stitched. Usually this is done based on information in the images themselves, but that takes a lot of computation time. We developed a real-time MBSEM image stitching algorithm for our experimental setup, based on prior information about the position and contrast of the tiles. First, a calibration performed before the scan provides an estimate of the corner positions of the tiles with respect to the sample. In order to map tiles to the desired output frame, we apply an affine transformation that uses the coordinate information from the calibration step. Finally, blending is applied in the overlap regions, generating a seamless composite image. Our fully automated, high-speed algorithm demonstrates that the method is very robust to illumination, rotation and zoom changes in the images.
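A hedged sketch of the tile-placement step described above: map each (grayscale) tile into the composite frame with an affine transform computed from three calibrated corner positions, accumulating a weight image for blending in the overlaps. Uses OpenCV; the calibration data layout and blending rule are assumptions.

```python
import numpy as np
import cv2

def place_tile(canvas, weight, tile, src_corners, dst_corners):
    # src_corners / dst_corners: three matching (x, y) corners from the calibration step
    M = cv2.getAffineTransform(np.float32(src_corners), np.float32(dst_corners))
    h, w = canvas.shape
    warped = cv2.warpAffine(tile.astype(np.float32), M, (w, h))
    mask = cv2.warpAffine(np.ones(tile.shape, np.float32), M, (w, h))
    canvas += warped                     # accumulate warped tile intensities
    weight += mask                       # accumulate per-pixel contribution counts
    return canvas, weight                # composite = canvas / np.maximum(weight, 1e-6)
```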
Deep Neural Networks (DNNs) have emerged as the reference processing architecture for the implementation of multiple computer vision tasks. They achieve much higher accuracy than traditional algorithms based on shallow learning. However, this comes at the cost of a substantial increase in computational resources. This constitutes a challenge for embedded vision systems performing edge inference as opposed to cloud processing. In such a demanding scenario, several open-source frameworks have been developed, e.g. Caffe, OpenCV, TensorFlow, Theano, Torch and MXNet. All of these tools enable the deployment of various state-of-the-art DNN models for inference, though each one relies on particular optimization libraries and techniques, resulting in different performance behavior. In this paper, we present a comparative study of some of these frameworks in terms of power consumption, throughput and precision for some of the most popular Convolutional Neural Network (CNN) models. The benchmarking system is a Raspberry Pi 3 Model B, a low-cost embedded platform with limited resources. We highlight the advantages and limitations associated with the practical use of the analyzed frameworks. Some guidelines are provided for the suitable selection of a specific tool according to prescribed application requirements.
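A framework-agnostic sketch of how throughput can be measured in such a comparison: time repeated forward passes of an arbitrary `infer` callable on a fixed input and report inferences per second. The warm-up and iteration counts are assumptions, and this does not reflect the paper's exact benchmarking harness.

```python
import time

def measure_throughput(infer, input_data, warmup=5, iterations=50):
    for _ in range(warmup):              # warm-up runs excluded from timing
        infer(input_data)
    start = time.perf_counter()
    for _ in range(iterations):
        infer(input_data)
    elapsed = time.perf_counter() - start
    return iterations / elapsed          # inferences per second
```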
Fundus image processing is becoming widely used in retinopathy detection. Detection approaches typically begin by identifying the retinal components, of which the optic disk is one of the principal ones. It is characterized by a higher brightness compared to the eye fundus, a circular shape, and the convergence of blood vessels on it. As a consequence, different approaches for optic disk detection have been proposed. To ensure higher-performing detection, these approaches vary in the set of characteristics chosen to detect the optic disk. Even though the performances are only slightly different, we observe a significant gap in computational complexity and hence in execution time. This paper presents a survey of approaches for optic disk detection. To identify an efficient approach, it is relevant to explore the chosen characteristics and the processing proposed to locate the optic disk. For this purpose, we analyze the computational complexity of each detection approach. Then, we propose a classification of the approaches in terms of computational efficiency. In this comparison study, we identify a relation between computational complexity and the characteristic set used for OD detection.
Although Dynamic Adaptive Streaming over HTTP (DASH) has developed into one of the most suitable technologies for the transmission of live and on-demand audio and video content over any IP network, the design of the video segment size is an important aspect, as it varies from one technology to another. We propose a method to investigate the effect of changing the buffer size, which was configured to adapt dynamically to the segment size. Our proposed method also retrieves the most appropriate video representation based on the available bandwidth compared to the size of the video representations. We present an empirical study for different segment sizes (i.e. 1, 2, 5, 10, 15 and 20 seconds), striving for the best available quality. An objective evaluation was carried out to study the impact of the segment size while streaming video. From the tests carried out, the larger the segment size, the better the PSNR value; however, it also produces a higher initial delay. In our results, the 20-second segment size has the highest PSNR value at 45.7 dB, whereas the 1-second segment size has the lowest initial delay at 1.2 seconds.
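A hedged sketch of the rate-selection rule described above: pick the highest-bitrate representation whose declared bandwidth does not exceed the measured available bandwidth, falling back to the lowest when none fits. The representation structure and units are assumptions.

```python
def select_representation(representations, available_bandwidth_bps):
    # representations: list of dicts such as {'id': '720p', 'bandwidth': 2_500_000}
    feasible = [r for r in representations if r['bandwidth'] <= available_bandwidth_bps]
    if not feasible:
        return min(representations, key=lambda r: r['bandwidth'])  # fall back to lowest
    return max(feasible, key=lambda r: r['bandwidth'])
```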
This paper exploits the video cameras available on board vehicles for public transport, such as trains, coaches and ferryboats, to implement advanced services for passengers. The idea is to implement not only surveillance systems, but also passenger services such as people counting, smoke and/or fire alarms, automatic climate control, and e-ticketing. For each wagon, an embedded acquisition and processing unit is used, composed of a video multiplexer and an image/video signal processor that implements real-time algorithms for advanced services such as smoke detection, to give an early alarm in case of a fire, people detection for people counting, and fatigue detection for the driver. The alarm is then transmitted to the train information system, to be displayed to passengers or the crew.
This paper discusses a real-time kinematic system for the accurate geolocalization of images acquired through stereoscopic cameras mounted on a robot, in particular teleoperated machinery. A teleoperated vehicle may be used to explore an unsafe environment and to acquire stereoscopic images in real time through two cameras mounted on top of it. Each camera has a visible image sensor. For night operation, or in case temperature is an important parameter, each camera can be equipped with both visible and infrared image sensors. One of the main issues in telerobotics is the real-time and accurate geolocalization of the images, where an accuracy of a few cm is required. This is much better than that provided by GPS (Global Positioning System), which is in the order of a few meters. To this aim, a real-time kinematic system is proposed which acquires the GPS signal of the vehicle plus, through an RF channel, the GPS signal of a reference base station geolocalized with cm-level accuracy. To improve the robustness of the differential GPS system, the data of an Inertial Measurement Unit are also used. Another issue addressed in this paper is the real-time implementation of a stereoscopic image-processing algorithm to recover the 3D structure of the scene. The focus is on the 3D reconstruction of the scene to obtain the reference trajectory for the actuation carried out by a robotic arm with a proper end-effector.
In this paper, a hybrid underwater drone maneuvering front-end joining background subtraction and stereovision is presented. A novel formulation of median-based background subtraction allows for fast and reliable foreground/background scene segmentation based on drone-environment relative motion analysis. The following stereovision block performs matching of the foreground objects detected by the background subtraction module. Based on this, the drone can be provided with information on the relative distance to the nearest objects in order to avoid collisions. The system does not assume any prior calibration and can operate in real time.
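The sketch below shows a plain median-based background subtraction step of the kind referred to above: maintain a buffer of recent grayscale frames, take the per-pixel median as the background model, and mark pixels far from it as foreground. The buffer length and threshold are illustrative assumptions; the paper's formulation handles drone-environment relative motion and may differ.

```python
import numpy as np
from collections import deque

class MedianBackgroundSubtractor:
    def __init__(self, history=25, threshold=30):
        self.frames = deque(maxlen=history)
        self.threshold = threshold

    def apply(self, gray_frame):
        self.frames.append(gray_frame.astype(np.float32))
        background = np.median(np.stack(self.frames), axis=0)
        return np.abs(gray_frame - background) > self.threshold   # foreground mask
```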
At present, in the field of person re-identification, the commonly used supervised learning algorithms require a large number of labelled samples, which hinders model generalization. On the other hand, the accuracy of unsupervised learning algorithms is lower than that of supervised algorithms due to the lack of discriminant information. To address these issues, in this paper we make use of a small number of labelled samples to add discriminant information to the basic dictionary learning. Moreover, the sparse coefficients of dictionary learning are decomposed into a projection problem on the original features, and the projection matrix is trained with the labelled samples, turning it into a metric learning problem. The approach thus integrates the advantages of the two methods by combining dictionary learning and metric learning. After training, the new projection matrix is used to project the unlabeled features into a new feature subspace and the labels of these samples are reconstructed. The semi-supervised learning problem is then transformed into a supervised learning problem with a Laplacian term. Experiments on different public pedestrian datasets, such as VIPeR, PRID, iLIDS and CUHK01, show that the recognition accuracy of our method is better than that of several existing person re-identification methods.
At present, the level of urbanization in China has exceeded 50% and the number of cars has reached 140 million. The consequent problem of traffic congestion has become increasingly prominent. It is therefore increasingly important to obtain basic vehicle information accurately and in real time so that traffic departments can manage the vehicles at specific road sections and intersections in a timely manner. At present, some related methods and algorithms have high real-time performance but low accuracy, or the contrary. Accordingly, this paper proposes a method for automatic vehicle detection based on the YOLOv2 framework that is both real-time and accurate. The method improves the YOLOv2 framework model, optimizes the important parameters in the model, expands the grid size, and improves the number and sizes of the anchors in the model, which allows it to automatically learn vehicle features and achieve real-time, high-precision automatic vehicle detection and vehicle class identification. The evaluation on a home-made dataset shows that, compared with YOLOv2 and Faster R-CNN, the accuracy rate is raised to 91.80% and the recall rate to 63.86%.
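One common way to choose anchor sizes for YOLOv2-style models, sketched below, is k-means clustering of ground-truth box widths and heights with an IoU-based distance. This is a standard recipe offered only as an illustration of anchor selection; it is not claimed to be the paper's exact procedure, and all names are assumptions.

```python
import numpy as np

def iou_wh(boxes, anchors):
    # boxes: (N, 2) widths/heights; anchors: (K, 2); IoU assuming shared box centers
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, None].prod(2) + anchors[None, :].prod(2) - inter
    return inter / union

def kmeans_anchors(boxes, k=5, iters=100):
    anchors = boxes[np.random.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)    # nearest anchor by IoU
        anchors = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                            else anchors[i] for i in range(k)])
    return anchors
```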
Focusing on person re-identification across non-overlapping camera views and the high-dimensional features extracted from the images, a novel person re-identification algorithm is proposed. The algorithm obtains the semantic information of each camera view by sparse learning, and then Canonical Correlation Analysis (CCA) is used to carry out the high-level feature projection transformation. The algorithm aims to avoid the curse of dimensionality caused by operating on high-dimensional features by improving the feature matching ability. In the end, the feature distance between different views can be compared. The advantage of this method is that it learns a robust pedestrian image feature representation; it also builds a person re-identification model with the block-structure features of the pedestrian dataset, and the associated optimization problem is solved by utilizing the alternating directions framework in order to improve the performance of person re-identification. Finally, the experimental results show that the proposed method has higher recognition efficiency on three benchmark datasets: PRID 2011, iLIDS-VID and VIPeR.
A novel discriminative deep transfer learning method called DDTML is proposed for cross-scenario person re-identification (Re-ID). Using a deep neural network, DDTML learns a set of hierarchical nonlinear transformations for cross-scenario person re-identification by transferring discriminative knowledge from the source domain to the target domain. Meanwhile, taking into account the inherent characteristics of Re-ID data sets, in order to reduce the distribution divergence between the source data and the target data, DDTML minimizes a new maximum mean discrepancy based on class distribution, called MMDCD, at the top layer of the network. Experimental results on widely used re-identification datasets show the effectiveness of the proposed method.
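For orientation, the sketch below computes a plain, class-agnostic maximum mean discrepancy with an RBF kernel between source and target feature batches. The paper's MMDCD additionally weights the terms by class distribution; this simpler form, with an assumed kernel bandwidth, only illustrates the underlying idea.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-d2 / (2 * sigma**2))

def mmd(source, target, sigma=1.0):
    # source, target: (n, d) and (m, d) feature matrices
    k_ss = rbf_kernel(source, source, sigma).mean()
    k_tt = rbf_kernel(target, target, sigma).mean()
    k_st = rbf_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2 * k_st
```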