This PDF file contains the front matter associated with SPIE Proceedings Volume 12317, including the Title Page, Copyright information, Table of Contents, and Conference Committee Page.
We review recent deep learning reconstruction algorithms for spectral snapshot compressive imaging (SCI), which uses a single-shot measurement to capture a three-dimensional (3D: x, y, λ) spectral image. In recent years, deep learning has become the dominant reconstruction approach owing to its high speed and accuracy. Various frameworks, such as end-to-end neural networks, deep unfolding, and plug-and-play networks, have been developed; untrained neural networks have also been used. In this paper, we review these diverse deep learning methods for spectral SCI. In addition to the aforementioned frameworks, different backbones and network structures, including the most recent Transformers, are reviewed. Simulation and real-data results are presented to compare these methods.
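As a sketch of the sensing model these algorithms invert, the following NumPy snippet simulates an SD-CASSI-style snapshot: each spectral band is masked by a coded aperture, sheared by one pixel per band, and summed into a single 2D measurement. The cube size, random mask, and unit shear are illustrative assumptions, not parameters of any specific system reviewed here.

import numpy as np

# Toy spectral cube: H x W pixels, L spectral bands (illustrative sizes).
H, W, L = 64, 64, 8
cube = np.random.rand(H, W, L)

# Random binary coded aperture shared by all bands (an assumption).
mask = (np.random.rand(H, W) > 0.5).astype(float)

# Forward model: mask each band, shift it one pixel per band along x,
# and sum the sheared stack into one 2D snapshot measurement.
meas = np.zeros((H, W + L - 1))
for l in range(L):
    meas[:, l:l + W] += mask * cube[:, :, l]

print(meas.shape)  # (64, 71): the single-shot measurement y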
Single-pixel imaging uses a single-pixel detector to capture all photons emitted from a two-dimensional scene, and then reconstructs the two-dimensional image of the target scene from the one-dimensional measurement data and the corresponding illumination coding, using reconstruction methods such as linear superposition, compressed sensing, or deep learning. Compared with traditional cameras, single-pixel imaging offers a high signal-to-noise ratio and broad spectral coverage, and has therefore been widely used in multispectral imaging. However, traditional single-pixel reconstruction methods suffer from low resolution, long reconstruction times, and poor reconstruction quality. In this paper, we propose a neural-network-based single-pixel image reconstruction method that achieves better reconstruction quality at lower sampling rates than traditional methods. Specifically, we first use a small set of optimized patterns to simulate a single-pixel camera and sample the image to obtain the measured values, and then extract multi-channel high-dimensional semantic features from the sampled values through a feature extraction network. A feature-pyramid up-sampling module, built from multi-scale residual network blocks, then up-samples the high-dimensional semantic features. During training, the network parameters and the patterns are jointly optimized to obtain the optimal network model and patterns. With the help of large-scale pre-training, our reconstructed images have higher resolution, shorter reconstruction time, and better quality.
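A minimal sketch of the jointly optimized sampling-plus-decoding idea described above, in PyTorch; the pattern count, decoder layers, and sizes are placeholders rather than the authors' actual architecture:

import torch
import torch.nn as nn

H = W = 32          # scene resolution (illustrative)
M = 64              # number of patterns -> sampling rate M/(H*W) ~ 6%

class SinglePixelNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Illumination patterns are ordinary parameters, so they are
        # jointly optimized with the decoder during training.
        self.patterns = nn.Parameter(torch.randn(M, H * W) * 0.01)
        self.decoder = nn.Sequential(
            nn.Linear(M, 256), nn.ReLU(),
            nn.Linear(256, H * W))

    def forward(self, img):                 # img: (B, H*W), flattened scene
        y = img @ self.patterns.t()         # simulated single-pixel readings
        return self.decoder(y).view(-1, H, W)

net = SinglePixelNet()
recon = net(torch.rand(4, H * W))           # (4, 32, 32) reconstructions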
In Fourier single-pixel imaging (FSI), acquisition time is tied to the number of modulations, leading to a tradeoff between efficiency and accuracy. This work reports a mathematical analytic tool for efficient sparse FSI sampling: an efficient and adjustable sampling strategy that captures more information about a scene with fewer modulations. Specifically, we first rank the Fourier coefficients of natural images by statistical importance. We then design a sparse sampling strategy for FSI with a polynomially decaying sampling probability over this ranking; the sparsity of the captured Fourier spectrum can be adjusted by altering the polynomial order. We utilize a compressive sensing (CS) algorithm for sparse FSI reconstruction. From quantitative results, we obtain empirical rules for the optimal sparsity for FSI under different noise levels and sampling ratios.
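The sampling strategy lends itself to a short sketch. Assuming the Fourier coefficients have already been ranked by statistical importance, the snippet below draws a sparse subset with a polynomially decaying probability over the ranking; the ranking, sampling ratio, and order are illustrative inputs:

import numpy as np

def sparse_fsi_indices(ranking, ratio, order, rng=None):
    """Pick which Fourier coefficients to acquire.

    ranking : array of coefficient indices, most important first
    ratio   : fraction of coefficients to acquire
    order   : polynomial order controlling how fast the sampling
              probability decays down the ranking
    """
    rng = rng or np.random.default_rng()
    n = len(ranking)
    p = 1.0 / (np.arange(1, n + 1) ** order)   # polynomially decaying
    p /= p.sum()
    m = int(ratio * n)
    chosen = rng.choice(n, size=m, replace=False, p=p)
    return ranking[chosen]

ranking = np.arange(4096)          # e.g. a 64x64 spectrum, already ranked
idx = sparse_fsi_indices(ranking, ratio=0.1, order=1.5)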
Fringe projection profilometry using a uniaxial MEMS micro-vibration mirror is becoming popular in three-dimensional (3D) reconstruction owing to its fast projection, small size, low cost, and lack of focusing optics. The calibration method is crucial and directly affects the accuracy of 3D reconstruction. Conventional phase-height calibration methods require recalibration of the system parameters whenever the maximum fringe frequency differs between the calibration and reconstruction stages. In this paper, fringe projection is realized by a MEMS mirror with a 1.15 kHz resonant frequency and a line laser. The voltage of the line laser is modulated according to the scanning position, which depends on the vibration characteristics of the MEMS mirror. On this basis, a uniaxial MEMS-based 3D reconstruction system is constructed. We propose a novel calibration method for this system, derived from the scanning characteristics of a uniaxial vibration mirror and accounting for camera distortion. The proposed method is free from the recalibration problem and from installation constraints. Experimental results show that the proposed method can reconstruct the 3D shape of a target at high resolution, verifying the feasibility of the system.
Deep learning shows great potential for super-resolution microscopy, offering visualization of biological structures with unprecedented detail and high flexibility. An effective pathway toward this goal is structured illumination microscopy (SIM) augmented by deep learning, owing to its ability to double the resolution beyond the diffraction limit in real time. Although deep-learning-based SIM works effectively, it is generally a black box whose latent principles are difficult to explain; the generated super-resolution structures may therefore contain information too unreliable for clinical diagnosis. This limitation impedes further applications in safety-critical fields such as medical imaging. In this paper, we report a reliable deep-learning-based SIM technique with uncertainty maps. These uncertainty maps characterize imperfections arising from various disturbances, such as measurement noise, model error, incomplete training data, and out-of-distribution testing data. Specifically, we employ a Bayesian convolutional neural network to quantify uncertainty and explore its application in SIM. The backbone of the reported network combines U-Net and ResNet, taking three low-resolution images from different structured-illumination angles as inputs. The outputs are high-resolution images with double the resolution permitted by the numerical aperture, together with pixel-wise confidence intervals for the reconstructed images. A series of simulations and experiments validates that the reported framework offers reliable uncertainty maps and high-fidelity super-resolution images. Our work may promote practical applications of deep-learning-based super-resolution microscopy.
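The paper uses a Bayesian convolutional neural network; a commonly used practical approximation of that idea is Monte Carlo dropout, sketched below for any dropout-bearing PyTorch model. This is a substitute illustration of pixel-wise uncertainty estimation, not the authors' exact network:

import torch
import torch.nn as nn

def mc_uncertainty(model, x, n_samples=32):
    """Predictive mean and pixel-wise std via Monte Carlo dropout."""
    model.eval()
    for m in model.modules():          # keep only the dropout layers stochastic
        if isinstance(m, nn.Dropout):
            m.train()
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(0), preds.std(0)  # reconstruction, uncertainty map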
Poor lighting conditions in the real world may lead to ill-exposed captured images, which suffer from compromised aesthetic quality and information loss for post-processing. Recent exposure correction works address this problem by learning the mapping from images of multiple exposure intensities to well-exposed images. However, this requires a large number of paired training images, which is hard to obtain in data-inaccessible scenarios. This paper presents a highly robust exposure correction method based on self-supervised learning. Specifically, two sub-networks are designed to deal with under- and over-exposed regions of ill-exposed images, respectively; this hybrid architecture enables adaptive ill-exposure correction. A fusion module then fuses the under-exposure-corrected and over-exposure-corrected images into a well-exposed image with vivid color and clear textures. Notably, training is guided by histogram-equalized images through a histogram equalization prior (HEP), meaning the presented method requires only ill-exposed images as training data. Extensive experiments on real-world image datasets validate the robustness and superiority of this technique.
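A sketch of the histogram equalization prior (HEP) guidance: the pseudo-target is simply a histogram-equalized version of the ill-exposed input, so no paired data are needed. Equalizing only the luminance channel, as below with OpenCV, is one reasonable choice and an assumption on our part:

import cv2

def hep_target(img_bgr):
    """Histogram-equalized guidance image (the HEP pseudo-label).
    Only the luminance channel is equalized, so color stays plausible."""
    ycrcb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb)
    y, cr, cb = cv2.split(ycrcb)
    ycrcb = cv2.merge((cv2.equalizeHist(y), cr, cb))
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)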
Due to limited spatial bandwidth, one has to compromise between a large field of view (FOV) and high spatial resolution in both photography and microscopy. This dilemma largely prevents revealing fine details and global structures of a target scene simultaneously. A recent mainstream approach employs multiple sensors for synchronous high-resolution acquisition across different sub-FOVs and stitches the patches according to the spatial positions of the cameras. Various inpainting algorithms have been proposed to eliminate intensity discontinuities, but conventional optimization methods are prone to misalignment, seam artifacts, or long processing times, and thus cannot achieve dynamic gap elimination. Taking advantage of the strength of generative adversarial networks (GANs) in image generation and padding, we propose a conditional-GAN-based deep neural network for seamless gap inpainting. Specifically, a short series of displaced images is acquired to characterize the system configuration, from which we generate patch pairs with and without gaps for network training. After supervised learning, we achieve seamless inpainting in gap regions. To validate the proposed approach, we apply it to real data captured by large-scale imaging systems and demonstrate that the missing information at gaps can be retrieved successfully. We believe the proposed method holds potential for all-round observation in various fields, including urban surveillance and systems biology.
The broadband multispectral filter array (BMSFA) has emerged as an attractive alternative for spectral imaging due to its compactness and high light throughput. It compresses the multispectral data cube into 2D measurements, from which the data cube is reconstructed using the pre-calibrated spectral response. In practice, the BMSFA spectral response is usually calibrated wavelength by wavelength using ultra-narrow-band filters, which is costly; moreover, the process introduces noise and inter-spectral crosstalk that severely degrade reconstruction quality. In this work, we report a novel deep-learning-based calibration technique, termed deep calibration. The technique generates varied spectral illumination and collects sets of BMSFA camera measurements together with the corresponding true spectra, yielding a more accurate characterization of the BMSFA's spatial-spectral modulation. Furthermore, a reconstruction network with a hybrid CNN-ViT architecture is employed to learn the demodulation process from the collected dataset. Using this network as a decoder, the scene's hyperspectral data can be accurately reconstructed from the measurements. Extensive experiments validate that the reported technique achieves high efficiency and accuracy in both calibration and reconstruction for BMSFA.
We propose a novel joint compressive imaging system that combines the merits of the single-pixel camera (SPC) and the coded aperture snapshot spectral imaging (CASSI) system, enabling us to capture multi- or hyperspectral information with a single-pixel detector. The desired 3D image cube is reconstructed by concatenating a deep-unfolding-based algorithm and a plug-and-play algorithm with a deep-learning-based denoiser. We demonstrate the feasibility of the proposed system in both simulation and experiments. With advanced algorithms, the joint compressive imaging system outputs hyperspectral images comparable to those of the existing SD-CASSI system. Moreover, by adopting ultra-broad-spectrum photodiodes, the proposed system can easily be extended to the near- and mid-infrared bands, offering a low-cost approach to IR spectroscopy.
Metasurfaces and metalenses have drawn great attention because they can manipulate wavefronts versatilely in a miniaturized, ultrathin configuration. Here we propose and numerically verify a tunable bifocal metalens with two continuously zoomable foci. The device uses two cascaded circular metasurface layers; through a combination of geometric phase and propagation phase, each layer imparts different phase distributions on incidences of opposite helicities. By rotating the two layers relative to each other, the focal lengths of both foci can be tuned continuously, with the zoom range of each focus designed deliberately, and the relative intensity of the two foci can be adjusted by changing the polarization state of the incidence. The proposed device is anticipated to find applications in polarization imaging, depth estimation, multi-plane imaging, optical data storage, and more.
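For intuition, the hyperbolic focusing phase underlying such a bifocal design can be written in a few lines of NumPy; the wavelength, aperture size, and the two focal lengths below are illustrative assumptions, and the actual device encodes these profiles through geometric and propagation phase across two rotating layers, which is not modeled here:

import numpy as np

wavelength = 532e-9                      # assumed design wavelength
x = y = np.linspace(-50e-6, 50e-6, 256)  # assumed aperture extent
X, Y = np.meshgrid(x, y)
r2 = X**2 + Y**2

def lens_phase(f):
    """Hyperbolic focusing phase profile for focal length f."""
    return -2 * np.pi / wavelength * (np.sqrt(r2 + f**2) - f)

# Two target phase profiles, one per focus (illustrative focal lengths);
# the bifocal device superposes them with polarization-dependent weights.
phase_a = lens_phase(200e-6)
phase_b = lens_phase(350e-6)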
This conference presentation was prepared for the Optoelectronic Imaging and Multimedia Technology IX conference at Photonics Asia, 2022.
Robust infrared small target detection is essential for infrared search and track systems. To detect low signal-to-clutter ratio (SCR) targets under the interference of high-intensity structural backgrounds, we propose an infrared small target detection method using multidirectional derivatives and local contrast difference (MDLCD). Noting that an infrared small target tends to have a 2D Gaussian-like shape, we present a new multidirectional derivative model that reflects this distribution in each direction and effectively enhances the target. Additionally, the adjacent background is used to construct a local contrast difference model that further suppresses high-intensity structural clutter. The MDLCD map is then obtained by weighting the two filtered maps, followed by an adaptive segmentation operation to extract the target. Experimental results verify that MDLCD achieves satisfactory performance in terms of SCR gain (SCRG) and background suppression factor (BSF).
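In the spirit of the multidirectional derivative model (not the paper's exact filter), the sketch below computes, for each pixel, a second-difference contrast along several directions and keeps the minimum: a 2D Gaussian-like target is bright in every direction, while an edge fails in at least one, so clutter is suppressed:

import numpy as np
from scipy.ndimage import convolve

def multidir_derivative_map(img, step=2):
    """Minimum directional second-difference response (illustrative)."""
    img = img.astype(float)
    dirs = [(0, 1), (1, 0), (1, 1), (1, -1)]   # 4 symmetric directions
    resp = []
    for dy, dx in dirs:
        k = np.zeros((2 * step + 1, 2 * step + 1))
        k[step, step] = 2.0                    # center pixel
        k[step + dy * step, step + dx * step] = -1.0   # one neighbor
        k[step - dy * step, step - dx * step] = -1.0   # opposite neighbor
        resp.append(convolve(img, k, mode='nearest'))
    # Keep the weakest directional contrast; clip negatives (clutter).
    return np.minimum.reduce(resp).clip(min=0)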
Defect inspection is an indispensable process in manufacturing, and automatic optical inspection (AOI) has been rapidly applied in various areas. In AOI, deep learning methods based on artificial intelligence (AI) are increasingly advantageous in many fields. However, training high-performance deep learning algorithms requires a large amount of data, while defect samples are often scarce; the small-sample problem has thus become one of the key obstacles to the industrial application of deep learning. Transfer learning enables us to exploit knowledge from source domains to improve performance on a target domain, which intuitively suits the small-sample problem. Therefore, this paper proposes a defect inspection network based on one transfer learning technique, domain adaptation: a multi-source, multi-scale weighted domain adaptation network built on adversarial learning. Firstly, three adversarial domain adaptation modules are proposed to align feature distributions between the multi-source domains and the target domain at three scales, making the backbone extract domain-invariant features; the weight of the domain adaptation module at each scale is set accordingly. Secondly, to reduce the effect of negative transfer, a novel similarity weight applied to the domain adaptation modules is proposed. Finally, experiments prove the effectiveness of our method: it improves the mean average precision (mAP) from 62.3 to 78.5 with only 40 samples available across 4 defect categories, surpassing its counterparts.
The neural radiance field (NeRF) constructs an implicit representation function to substitute for traditional 3D representations such as point clouds, meshes, and voxels, enabling consistent and efficient image rendering from any desired viewpoint. However, NeRF requires dense sampling in 3D space to build the continuous representation function, and the huge number of sampling points occupies intensive computing resources, which hinders NeRF from being integrated into lightweight systems. In this paper, we present a learning-based sampling strategy that samples densely in regions with rich texture and sparsely elsewhere, greatly reducing computation and accelerating learning. Furthermore, to alleviate the additional overhead introduced by the proposed strategy, we present a distributed structure that makes sampling decisions individually. This distributed design relieves the computational burden on each device, enabling deployment of the proposed strategy in practical systems.
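The allocation idea can be sketched independently of the learned component. Below, a per-ray texture score in [0, 1] (in the paper this decision is learned; here a hand-crafted score such as local image gradient magnitude is assumed) is mapped to a per-ray sample budget, so textured regions get dense sampling and flat regions get sparse sampling:

import numpy as np

def allocate_samples(texture_score, n_min=16, n_max=128):
    """Per-ray sample budget proportional to local texture richness.

    texture_score : per-ray score in [0, 1], e.g. normalized gradient
                    magnitude around the pixel the ray passes through.
    """
    n = n_min + (n_max - n_min) * texture_score
    return np.round(n).astype(int)

scores = np.random.rand(1024)          # stand-in for real texture scores
budgets = allocate_samples(scores)     # dense where texture is rich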
This conference presentation was prepared for the Optoelectronic Imaging and Multimedia Technology IX conference at Photonics Asia, 2022.
Asteroid detection is of great significance for studying the origin and evolution of the solar system, collision warning, space resource development, and more. Through a networked micro-satellite constellation, large-scale and high-frequency observations of asteroids can be achieved. Aiming at these requirements, and through collaborative optimization of the imaging chain, this paper develops a 5 kg high-performance visible camera for detecting weak space targets. It achieves a detection capability better than magnitude 13 (Mv), equivalent to detecting a target 10 cm in diameter at a distance of 1000 km, and can be carried on a 10 kg micro-satellite. Networking such micro-satellites exploits the advantages of cluster detection, realizes high-sensitivity, high-frequency detection of asteroids, and provides highly real-time data support for asteroid research and early warning.
To solve the problem of inaccurate scale optimization in visual-inertial odometry (VIO) under uniform motion, this paper presents a visual-inertial-encoder tightly-coupled odometry (VIETO) algorithm and formulates VIETO initialization as an optimal estimation problem in the maximum-a-posteriori (MAP) sense. Firstly, encoder pre-integration theory is introduced, so that scale and velocity information can be obtained from the encoder pre-integration measurements during visual MAP estimation, providing a good initial value for the optimal estimation of the IMU parameters. Secondly, an encoder error term and a random plane constraint are introduced into the visual-inertial optimization framework to further constrain pose estimation. Finally, we apply VIETO to the monocular-inertial ORB-SLAM3 system. Comparison with similar algorithms on the DS dataset proves the effectiveness of the system.
It is difficult for a normal CCD or CMOS camera to obtain high-quality images under extremely low-light conditions, such as under a new moon or quarter moon, because the photons arriving at the detector are so few that the signal-to-noise ratio (SNR) is far below what is needed to resolve fine details in a nighttime scene. To solve this problem, an intensified CCD or CMOS camera is adopted, amplifying the few photons and greatly improving the SNR. However, the intensifier mainly consists of a cathode, a micro-channel plate (MCP), and a fluorescent screen; this complex structure, together with the multiple photoelectric conversions during photon amplification, leads to a large equivalent pixel pitch that degrades spatial resolution. In this manuscript, we therefore propose a super-resolution reconstruction algorithm that improves the classical iterative back projection (IBP) algorithm. By fusing multiple very noisy low-light images with sub-pixel displacements between them, both spatial resolution and SNR can be enhanced. In in-lab experiments, spatial resolution was increased to nearly 1.8 times the original, and SNR gains greater than 6 dB and 9 dB were obtained under quarter-moon and new-moon conditions, respectively. Outdoor experiments show similar results; moreover, by fusing sub-pixel-shifted low-light images taken under different low-light conditions, the reconstructed high-resolution images achieve even better visual quality.
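A simplified sketch of classical iterative back projection, assuming known sub-pixel shifts, bilinear resampling, and no explicit PSF model (the paper's improved variant differs in these details):

import numpy as np
from scipy.ndimage import shift, zoom

def ibp_superres(low_imgs, shifts, scale=2, n_iter=20, beta=0.2):
    """Classical IBP from shifted low-resolution frames.

    low_imgs : list of (h, w) frames with known sub-pixel `shifts`
               (dy, dx), expressed in high-resolution pixel units.
    """
    h, w = low_imgs[0].shape
    hr = zoom(np.mean(low_imgs, axis=0), scale, order=1)  # initial guess
    for _ in range(n_iter):
        for img, (dy, dx) in zip(low_imgs, shifts):
            # Simulate observing the current estimate...
            sim = zoom(shift(hr, (dy, dx), order=1),
                       1.0 / scale, order=1)[:h, :w]
            # ...and back-project the residual onto the HR grid.
            err = zoom(img - sim, scale, order=1)
            hr += beta * shift(err, (-dy, -dx), order=1)
    return hr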
This conference presentation was prepared for the Optoelectronic Imaging and Multimedia Technology IX conference at Photonics Asia, 2022.
A learner's emotional state has an important impact on affective and cognitive processes in classroom teaching, so it is necessary to detect the learner's emotional state unobtrusively during teaching and learning. A fast facial expression recognition algorithm is presented to detect the emotional state of the learner in a real learning environment. A Gabor convolutional network (GCN) is used to classify facial expressions. Images extracted from the teaching and learning environment are preprocessed to accelerate expression recognition: a skin-color segmentation model based on a generalized Gaussian mixture distribution (GGMD), fitted with the expectation-maximization (EM) algorithm, rapidly detects the facial area. A fast facial expression recognition algorithm is then built from the skin-color model and the GCN. Experimental results show satisfactory accuracy and excellent runtime performance.
With the wide application of colored point clouds (CPCs), the amount of data is increasing, and efficient compression is required in practice. The most advanced compression technology for static CPCs is G-PCC, proposed by MPEG. However, quantization errors from G-PCC-1 (octree) coding can cause grid-hole distortion, which seriously impacts the user's perceived visual quality. This paper therefore proposes a point-cloud-projection-based hole-distortion repair method for light-to-medium G-PCC-1 coding (denoted P-GHDR). The distorted CPC is projected from 3D space onto 2D planes, the G-PCC-1 distortion is repaired by combining multi-view color and geometry projection maps, and the repaired CPC is finally reconstructed by reverse projection. Experiments show that the proposed method effectively improves the geometric and visual objective metrics of G-PCC-1-coded CPCs and significantly improves the quality of CPCs reconstructed from light-to-medium G-PCC-1 codes.
Spectral imaging technology has been widely applied in many fields such as remote sensing, biology, and food testing. However, most traditional imaging spectrometers must scan in either the spatial or the spectral domain to obtain full spatial-spectral information, sacrificing temporal resolution. In comparison, snapshot spectral imagers capture a 2D spatial and 1D spectral data cube in a single frame, giving them great advantages in monitoring the spectra of dynamic targets. In this paper, we report a video-rate hyperspectral imager with 60 bands based on a telecentric light field imaging system (LFIS). We analyze the degradation of spectral image quality caused by the chromatic aberration of the fore-optics and propose a tighter constraint on the chromatic focal shift as a design metric; we then design and customize fore-optics that fulfill this constraint. The spectral filter array (SFA) is densely arranged and coated on sapphire glass to increase the number of spectral channels and eliminate the effect of uneven glue. Imaging experiments demonstrate the imager's ability to acquire spectral information of dynamic scenes: a burning flame was captured at a maximum acquisition rate of 50 fps. The results indicate that our prototype has important application prospects in target identification, moving-target tracking, dynamic spectrum monitoring, and more.
Color is one of the most salient features of vehicles and can be used for vehicle recognition, where deep learning holds clear advantages over traditional algorithms. In this paper, we present a vehicle color recognition method using GoogLeNet with Inception v1. Inception v1 increases the width and depth of the network while reducing the parameter count to save computing resources, and uses sparse connectivity to avoid the redundancy of traditional neural networks. We use a public dataset to train and validate GoogLeNet and a self-made dataset to test the method. The method recognizes eight common vehicle colors with accuracy stable at 90%-95%. We then discuss the impact of different datasets on the method and the possible reasons. In the future, we will combine GoogLeNet with the YOLO network structure for further research on vehicle color recognition.
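Adapting GoogLeNet to an eight-class color task reduces to replacing the classifier head, as sketched below with torchvision; the class list, learning rate, and optimizer are illustrative assumptions:

import torch
import torch.nn as nn
from torchvision import models

NUM_COLORS = 8  # e.g. black, white, silver, red, blue, green, yellow, brown

model = models.googlenet(weights="DEFAULT")   # ImageNet-pretrained
model.aux_logits = False                      # skip the auxiliary heads
model.fc = nn.Linear(model.fc.in_features, NUM_COLORS)  # new classifier

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
# ...then a standard training loop over the labeled vehicle-color images...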
Dental and oral disease is one of the most prevalent diseases worldwide, according to a medical analysis published in The Lancet in 2022 [1]. The most common oral diseases worldwide are dental caries (cavities), periodontal disease, tooth loss, and overdevelopment of the jaw caused by excessive unilateral chewing. Dental radiography plays a very important role in clinical diagnosis, treatment, and surgery, and automatic segmentation of medical lesions is a prerequisite for efficient clinical analysis; accurate positioning of anatomical landmarks is therefore a crucial technique for clinical diagnosis and treatment planning. In this paper, we propose a novel deep network to detect anatomical landmarks. It consists of a multi-scale feature aggregation module with channel attention and a deep network for feature refinement. To demonstrate its superiority, we train several popular networks on the same dataset for comparison: our network outperforms them in both mean radial error (MRE) and successful detection rate (SDR).
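The two reported metrics are straightforward to compute; a sketch, assuming predicted and ground-truth landmarks as (N, 2) arrays and illustrative SDR radii:

import numpy as np

def mre_sdr(pred, gt, thresholds=(2.0, 2.5, 3.0, 4.0)):
    """Mean radial error (same unit as the coordinates) and the
    successful detection rate at each of the given radii."""
    d = np.linalg.norm(pred - gt, axis=-1)       # per-landmark distance
    mre = d.mean()
    sdr = {t: float((d <= t).mean()) for t in thresholds}
    return mre, sdr

pred = np.random.rand(19, 2) * 100   # 19 landmarks (illustrative count)
gt = pred + np.random.randn(19, 2)
print(mre_sdr(pred, gt))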
Spectral imaging captures the 2D spatial and 1D spectral information of a target scene. This 3D data has important applications in a wide range of fields, including the military, medicine, and agronomy. Spectral imagers combined with compressive sensing can significantly reduce the amount of detection data and the detection time, and have therefore been widely studied. The coded aperture snapshot spectral imager (CASSI) was the first spectral imager to incorporate compressive sensing theory; however, it encodes the incident light with a dispersive prism, which makes the system very complex. In this paper, a spectral imager that encodes the incident light with a dual spectral filter array is proposed, avoiding dispersive elements. The dual spectral filter array is divided into a series of macro-pixels, each composed of 3×3 filters. In the first filter array, each macro-pixel consists of three low-pass, three band-pass, and three high-pass filters; in the second, each macro-pixel consists of 9 filters with different transmission curves to achieve the coding. In addition, a beam splitter in front of the objective lens divides the optical path into two: one serves as the detection arm for spectral imaging, and the other as a reference arm to improve recovery.
Metasurfaces, composed of two-dimensional arrays of subwavelength optical scatterers, are regarded as powerful substitutes for conventional diffractive and refractive optics. With their powerful wavefront manipulation capabilities, metasurfaces can steer the phase, amplitude, and polarization of light, offering the potential for joint optimization with algorithms by encoding and decoding light fields. In this paper, we propose an end-to-end computational imaging system in which metaoptics and neural networks are jointly optimized starting from a designed initial phase. We construct the forward model from unit cell to optical response and the inverse mapping from optical response to unit cell, making the front-end metaoptics differentiable. With an appropriate initial phase, the framework converges faster; the proposed system should promote the further development of metaoptics and computational imaging.
The development of large-aperture telescopes employing monolithic mirrors has been greatly limited by technical constraints and the difficulty of processing and manufacturing. A sparse-aperture imaging system, in which multiple small sub-apertures are arranged and combined on a co-phased surface, can achieve resolution equivalent to a fully filled aperture, bringing new research ideas to astronomical observation and ground survey. However, the sparsity of the apertures blurs the imaging. In this paper, we focus on high-resolution imaging from geostationary orbit and propose a restoration method for blurred images from a sparse-aperture system with a 12-sub-aperture annular-like structure. SASDeblurNet, containing U-shaped structures and skip connections, is proposed to rapidly restore blurred images end-to-end. MAE, MSE, DSSIM, Charbonnier, and edge loss functions are evaluated when training on a small dataset in pursuit of better imaging results. Simulation results show that the proposed method improves PSNR by an average of 11 dB and SSIM from 0.77 to 0.94, achieving resolution comparable to a full-aperture optical system. Compared with traditional non-blind deconvolution algorithms, SASDeblurNet effectively removes artifacts. Our work shows that the proposed method has good real-time performance, generalization ability, and noise immunity, and can provide data support for on-orbit, real-time observation with sparse-aperture imaging systems.
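Two of the listed losses, Charbonnier and edge loss, are simple to sketch in PyTorch; the Laplacian kernel used for the edge term and the weighting in the trailing comment are our assumptions, not necessarily SASDeblurNet's exact choices:

import torch
import torch.nn.functional as F

def charbonnier(pred, target, eps=1e-3):
    """Smooth L1-like penalty, robust to outliers."""
    return torch.mean(torch.sqrt((pred - target) ** 2 + eps ** 2))

def edge_loss(pred, target):
    """Charbonnier loss on Laplacian-filtered images (edge emphasis)."""
    k = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]],
                     device=pred.device).view(1, 1, 3, 3)
    k = k.repeat(pred.shape[1], 1, 1, 1)          # one filter per channel
    ep = F.conv2d(pred, k, padding=1, groups=pred.shape[1])
    et = F.conv2d(target, k, padding=1, groups=pred.shape[1])
    return charbonnier(ep, et)

# total = charbonnier(p, t) + 0.05 * edge_loss(p, t)   # weight assumed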
Image super-resolution technology successfully overcomes the limitation of excessively large pixel size in infrared detectors and meets the increasing demand for high-resolution infrared image information. In this paper, super-resolution reconstruction of infrared images based on a convolutional neural network with a prior on high-frequency information is reported. The main network structure is based on residual blocks; BN layers, which are unsuitable for the super-resolution task, are removed, and residual learning reduces computational complexity and accelerates network convergence. Multiple convolution layers and deconvolution layers respectively extract and restore the features of infrared images. Images are divided into high-frequency and low-frequency parts: the low-frequency part is the down-sampled image, while the high-frequency information is assumed to obey a simple case-agnostic distribution. This is equivalent to giving the super-resolution network a prior on the high-frequency information, which captures knowledge about the lost information in the form of its distribution and embeds it into the model's parameters to mitigate ill-posedness. Compared with previously proposed methods for infrared information restoration, our method shows clear advantages in recovering high-resolution details.
Traditional hyperspectral imagers rely on scanning either the spectral or the spatial dimension of the hyperspectral cube, with spectral filters or line scanning, which is time-consuming and generally requires precise moving parts, increasing complexity. More recently, snapshot techniques have emerged that capture the full hyperspectral data cube in a single shot; however, some of these snapshot systems are bulky and complicated, making real-world application difficult. This paper therefore proposes a compact snapshot hyperspectral imaging system based on compressive theory, consisting of an imaging lens, a light splitter, a microlens array, a metasurface-covered sensor, and an RGB camera. Light from the object first passes through the imaging lens, and a splitter then divides it equally into two paths. In one path, the light passes through the microlens array and is modulated by a metasurface on the imaging sensor; in the other, the light is received directly by the RGB camera. This system has the following advantages: first, the metasurface supercell can be carefully designed and arranged to optimize the transfer matrix of the system; second, the microlens array guarantees that light is incident on the metasurface at small angles, which eliminates the transmittance error introduced by the incidence angle; third, the RGB camera provides side information that helps ease the reconstruction.
Spectral imaging can simultaneously capture the spatial and spectral data of target objects, providing a multidimensional technique for analysis and recognition in many fields, including remote sensing, agriculture, and biomedicine. To increase the efficiency of data acquisition, compressed sensing (CS) methods have been introduced into spectral imaging systems, especially single-pixel spectral imaging systems. However, the traditional CS single-pixel spectral imaging system is not stable enough and has a complex structure, so we propose a novel macro-pixel segmentation method based on broadband multispectral filter arrays. In this system, structured illumination and broadband multispectral filter arrays provide spatial and spectral modulation, respectively, to modulate the 3D data cube of a scene: the macro-pixel units of the patterns capture spatial information, while the sub-regions within each macro-pixel unit capture spectral information. The filter arrays can be designed and fabricated according to specific requirements; by changing the number of sub-regions in each macro-pixel unit and the transmittance curve of each sub-region, the imaging spectrum can be flexibly changed and the anti-noise performance of the system greatly improved. A CS algorithm effectively recovers the 3D data cube from the one-dimensional signal collected by the single-pixel detector. Compared with array detectors (e.g., CCD or CMOS), single-pixel detectors have potential in invisible-band and low-light applications. Moreover, without mechanical or dispersive structures, our strategy has great advantages for the miniaturization and integration of spectral imaging equipment.
Image correlation is an important computer vision technique for object detection and localization applications, for example in super-resolution optical imaging. Cosine similarity is an easy-to-use measure for correlation calculation. In this paper, we show that a fast algorithm is available for cosine similarity and that its performance can be affected by the background. To avoid this effect, the gradient of the object image can be employed for robust object detection and localization. Simulations show that the proposed cosine-similarity image correlation algorithm yields high-quality correlation maps for noisy images with strong backgrounds, making it attractive for high-performance object detection and localization.
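A sketch of the fast algorithm: both the patch-template dot products and the per-patch energies can be computed with FFT-based convolutions, giving a cosine similarity map in O(N log N); the gradient-based, background-robust variant mentioned above is shown in the trailing comment:

import numpy as np
from scipy.signal import fftconvolve

def cosine_similarity_map(image, template):
    """Cosine similarity between the template and every image patch,
    computed with FFT-based convolutions (the fast algorithm)."""
    # Correlation = convolution with the flipped template.
    num = fftconvolve(image, template[::-1, ::-1], mode='valid')
    # Sliding sum of squares gives each patch's energy.
    energy = fftconvolve(image ** 2, np.ones_like(template), mode='valid')
    denom = np.sqrt(np.clip(energy, 0, None)) * np.linalg.norm(template)
    return num / (denom + 1e-12)

# Background-robust variant: correlate gradient maps instead of intensities.
# gx, gy = np.gradient(image); tx, ty = np.gradient(template)
# score = cosine_similarity_map(gx, tx) + cosine_similarity_map(gy, ty)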
Array cameras can effectively balance the contradiction between large field of view and high resolution in existing imaging systems, and can effectively solve high-density crowd counting in large scenes such as stadiums and parks. To make the array camera intelligent, a crowd counting method based on YOLOv5 is proposed, which is of great significance for intelligent public security management. From the RTSP video streams of the array sub-cameras, the coordinate transformation between each sub-image and the panorama is established, and the panoramic image is output in real time to meet video surveillance needs. Each RTSP stream is then preprocessed: the TensorRT-accelerated YOLOv5 detector identifies targets in each sub-image, the target coordinates are converted into panorama coordinates, and NMS filters out duplicate targets in the overlap regions between sub-images. Finally, the results are aggregated to output the target count. The system can be deployed on Windows 10, Linux, and embedded systems, working reliably with high precision in practical applications.
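The two post-detection steps, mapping sub-image boxes into panorama coordinates and removing duplicates in the overlap regions, can be sketched as follows (the box format (x1, y1, x2, y2, score) and the IoU threshold are assumptions):

import numpy as np

def to_panorama(boxes, cam_offset):
    """Shift one camera's detections into panorama coordinates,
    given that camera's (dx, dy) position in the stitched image."""
    dx, dy = cam_offset
    out = boxes.copy()
    out[:, [0, 2]] += dx
    out[:, [1, 3]] += dy
    return out

def nms(boxes, iou_thr=0.5):
    """Drop duplicate detections from the sub-image overlap regions."""
    order = boxes[:, 4].argsort()[::-1]          # highest score first
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_o = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_o - inter + 1e-12)
        order = order[1:][iou < iou_thr]
    return boxes[keep]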
Transfer learning (TL) is a promising method for AOI applications, since it can significantly shorten sample collection time and improve efficiency in today's smart manufacturing. However, related research has enhanced network models by applying TL without considering the domain similarity among datasets or the long-tailedness of the source data, and has mainly used linear transformations to mitigate the lack of samples. This research applies relation-based TL via domain similarity to improve overall performance, with data augmentation in both target and source domains to enrich data quality and reduce imbalance. Given a group of source datasets from similar industrial processes, we determine which is most related to the target through a domain discrepancy score and the number of samples each dataset has. We then transfer the chosen pre-trained backbone weights to train and fine-tune the target network. Our experiments suggest increases of up to 20% in the F1 score and the PR curve compared with TL using benchmark datasets.
The scanning electron microscope (SEM) with feature-analysis software has been used for micro-scale surface measurement for many years owing to its fast, massive acquisition of nano-scale features, non-contact operation, and automatic data processing. Full surface information is often needed in inspection fields such as vertical engine part monitoring, cleanliness analysis, and melted-bead inspection. Depending on the measured feature, the depth mode, resolution mode, and analysis mode of the SEM must be determined before use. It is therefore important to give users an easy operating mode and a deeper understanding of geometric features, offering a significantly enhanced user experience and higher measurement accuracy; common aspects of operator behavior that can yield larger measurement errors should also be tested. In this paper, experimental acquisition of full information from multi-scale pitch and step-height samples was performed on a commercial SEM, and the influence of the depth, resolution, and analysis modes on edge features was discussed. The experimental results should be helpful to others performing similar measurements.
3D data visualization is a non-trivial effort; however, high-quality data processing and visualization are crucial in all spheres of computer vision, especially for tasks that involve real environments and require precise results. Many industries can benefit from automated object detection and analysis. Effective retrieval and digitization of environment information open up great prospects in robotics and in the design of systems that require scene reconstruction into point clouds. This solution also offers new possibilities for mixed reality systems: for example, with restored scene data we can add a virtual light source and illuminate the room, or cast reflections of virtual objects in mirrors. A breakthrough in training neural networks on point clouds occurred with the implementation of the PointNet architecture, and the trend of working with 3D data continues to grow. The current research implements an interior object recognition and 3D reconstruction approach that works with interior scenes and low-quality, incomplete lidar data. The method selects interior objects from the scene and determines their locations and dimensions. A PointNet neural network trained on the ScanNet dataset was used to annotate and segment the point cloud, and the "Total3D understanding" network was employed to create the triangle mesh. As a result, an interior reconstruction method using RGB images and point clouds as input data was built. A simple room interior reconstruction example is provided, along with an assessment of the result quality.
This work investigates the applicability of depth determination methods to the problem of building 3D models of rooms. The authors propose a combined method of disparity estimation and statistical signal processing using auxiliary data obtained from laser illumination of the scene. Solutions used for forming 3D models of objects or the surrounding space are considered, leading to the choice of stereo reconstruction as the most appropriate method for building the scanning system. Disparity estimation by a naive gradient descent method is presented, along with results from the scanning system.
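A minimal sketch of disparity estimation by naive gradient descent on the per-pixel photometric error, assuming rectified grayscale images and nearest-neighbor resampling (the authors' combined method additionally uses laser-illumination cues):

import numpy as np

def disparity_gradient_descent(left, right, d0=None, n_iter=200, lr=0.5):
    """Per-pixel disparity d minimizing E(d) = (L(x) - R(x - d))^2 / 2."""
    h, w = left.shape
    d = np.zeros((h, w)) if d0 is None else d0.copy()
    xs = np.arange(w, dtype=float)[None, :].repeat(h, axis=0)
    gx = np.gradient(right, axis=1)              # dR/dx
    for _ in range(n_iter):
        xi = np.clip(xs - d, 0, w - 1).astype(int)   # nearest-neighbor sample
        r = np.take_along_axis(right, xi, axis=1)    # R(x - d)
        g = np.take_along_axis(gx, xi, axis=1)       # R'(x - d)
        err = left - r
        d -= lr * err * g        # dE/dd = (L - R(x - d)) * R'(x - d)
    return d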
There are two main types of mixed reality systems: optical see-through and video see-through devices. The main difference between them is how the user observes the real world. In an optical see-through system, the user observes the real world directly; in a video see-through system, the user observes the real world projected from LCD screens onto the retina. As a result, in the first case the vergence of the eyes corresponds to the point where the gaze is directed, while in a video see-through system this is not the case, because the direction of the user's gaze may not correspond to the fixed direction of the device cameras. Moreover, these cameras may be shifted toward the top or bottom of the device, or replaced with a single RGBD camera, which can cause additional discomfort. Therefore, when the real world is observed through a video see-through system, additional image processing is required to reduce the vergence disagreement between the device cameras and the user's eyes, based on data from the device's eye-tracking system. In the current research, the authors present results on reducing the vergence disagreement by restoring a rough depth map and processing the images from the device cameras as a point cloud. Various external camera setups and the corresponding vergence-correction results are compared.
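The point-cloud reprojection step can be sketched as follows: back-project each pixel through the rough depth map, translate the points into the eye's frame reported by the eye tracker, and re-render. The intrinsics K, the 3 cm vertical eye offset, and the naive forward splat without a z-buffer are illustrative assumptions, not the paper's pipeline.

# Re-render a camera image from a virtual viewpoint using a rough depth map.
import numpy as np

def reproject(image, depth, K, eye_offset):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project pixels to 3D camera coordinates: X = depth * K^-1 [u, v, 1]^T
    rays = np.linalg.inv(K) @ np.stack([u.ravel(), v.ravel(), np.ones(h * w)])
    points = rays * depth.ravel()           # (3, h*w) point cloud
    points = points - eye_offset[:, None]   # translate into the eye's frame
    proj = K @ points
    uv = (proj[:2] / proj[2]).round().astype(int)   # perspective divide
    out = np.zeros_like(image)
    ok = (uv[0] >= 0) & (uv[0] < w) & (uv[1] >= 0) & (uv[1] < h)
    out[uv[1, ok], uv[0, ok]] = image.ravel()[ok]   # forward splat (no z-buffer)
    return out

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # assumed intrinsics
image = np.random.rand(480, 640)
depth = np.full((480, 640), 2.0)            # flat scene at 2 m for the demo
corrected = reproject(image, depth, K, np.array([0.0, 0.03, 0.0]))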
A study was made of the causes of the vergence-accommodation conflict of human vision in virtual and mixed reality systems. Technical and algorithmic approaches to reducing and eliminating this conflict in virtual reality systems are considered. As a technical solution, an approach was chosen that adaptively focuses the eyepiece of the virtual reality system on the convergence point of the user's eyes, determined by a pupil-tracking system. Possible algorithmic solutions are considered that focus the virtual reality image in accordance with the expected accommodation of the eyes. The main solutions are the classical one, in which the image is filtered according to the defocus caused by natural accommodation at a given distance, and one in which the corresponding filtering is performed with neural network techniques. As a correctness criterion, the authors used a visual comparison of the defocused images with a solution obtained by physically correct rendering based on a model of the human eye; the rendering used bidirectional stochastic ray tracing with backward photon maps. The paper analyzes the advantages and disadvantages of the proposed solutions.
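For the classical filtering solution, a common defocus model (assumed here for illustration; the abstract does not state the exact blur kernel used) is the thin-lens circle of confusion,

c = A \cdot \frac{|s_o - s_f|}{s_o} \cdot \frac{f}{s_f - f},

where s_f is the accommodation distance inferred from the pupil tracker, s_o the depth of the pixel being shaded, A the pupil aperture, and f the focal length of the eye's optics; each pixel is then blurred with a kernel of diameter c, so points at the accommodation distance stay sharp and defocus grows with depth disparity.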
The article presents an approach to improving the accuracy of restoring the boundaries of objects used to create 3D structures. The paper proposes a data processing algorithm that performs primary processing operations in order to identify the main parameters and fuse multi-range data into a single image. When forming the composite image, the operator can first enter parameters that correct the mixing ratio. As a noise reduction algorithm for the different ranges, a multi-criteria filtering method is used, based on minimizing the sum of the squared deviations between the input signal and the generated estimate, together with the sum of the squared differences of the obtained estimates. An adjustment factor sets the degree of influence of each criterion on the resulting processing. The same method also makes it possible to detect object boundaries: the boundary search is based on analyzing frequency components and looking for sharp changes in color gradation. The applicability of the approach to various data types is shown on the example of processing parallel streams. For the initial construction of regions of significance, an algorithm for changing the range of clusters and an object-complexity analyzer are used; the analyzer computes the weighted number of color-gradient transitions per unit area. To visually improve the quality of the data, a color space conversion algorithm based on alpha blending is used. Pairs of test images are used to evaluate effectiveness, obtained from fixed sensors with a resolution of 1024x768 (8-bit, color) and a far-IR sensor with a resolution of 320x240 (8-bit, color). Images of simple shapes are used as the analyzed objects.
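In one dimension, the multi-criteria filter can be read as a penalized least-squares problem; the sketch below solves it in closed form, with the adjustment factor lam weighting the smoothness criterion against fidelity to the input. The 1-D formulation and the test signal are assumptions for illustration, not the paper's exact filter.

# Minimize sum_i (x_i - y_i)^2 + lam * sum_i (y_i - y_{i-1})^2 for estimate y.
import numpy as np

def multicriteria_filter(x, lam=4.0):
    n = len(x)
    D = np.diff(np.eye(n), axis=0)   # first-difference operator, shape (n-1, n)
    # Normal equations of the combined criterion: (I + lam * D^T D) y = x
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, x)

noisy = np.sin(np.linspace(0, 6, 200)) + 0.2 * np.random.randn(200)
smooth = multicriteria_filter(noisy, lam=10.0)   # larger lam -> smoother estimate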
We propose an approach for estimating the shape and geometric parameters of observed objects from a perspective image, based on typed elements, perspective geometry methods, and convolutional neural networks. The method assumes that the object under study is rigid. A procedure is proposed for restoring a 3-D model of an observed object from one perspective image using reference objects and typed elements. Semantic segmentation of the typed elements makes it possible to set the photogrammetric parameters of the coordinate system attached to points in the image. From the computed parameters and the segmentation of the observed object in the image, its parameters and a 3-D model are estimated. The developed method is applicable to computing 3-D models from a single perspective image in the vicinity of road and railway infrastructure, where typed elements are abundant.
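The underlying geometric relation is the standard pinhole projection,

s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = K \, [R \mid t] \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix},

so typed elements of known real-world size constrain the camera matrix K[R | t], after which the dimensions of a rigid object can be recovered from a single image up to the remaining scale ambiguity. (The notation here is generic; the paper does not give its exact parameterization.)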
We introduce a new approach to reservoir computing (RC) in which a single nonlinear device, a semiconductor optical amplifier, replaces the entire nonlinear reservoir to perform computations. To study the performance of the proposed scheme, we use it for the benchmark prediction task of learning the Mackey-Glass chaotic attractor. A mildly chaotic attractor with tau = 17 and wilder chaotic behavior with tau = 30 are considered.
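For reference, the Mackey-Glass benchmark is generated by the delay differential equation

\frac{dx}{dt} = \frac{\beta \, x(t-\tau)}{1 + x(t-\tau)^{n}} - \gamma \, x(t),

where the standard parameter choice beta = 0.2, gamma = 0.1, n = 10 is assumed here (the abstract states only the delays); with these values the dynamics are mildly chaotic at tau = 17 and considerably wilder at tau = 30, which is why these two delays serve as benchmark settings of increasing difficulty.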