This PDF file contains the front matter associated with SPIE Proceedings Volume 12634 including the Title Page, Copyright information, Table of Contents, and Conference Committee Page.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks. You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users: please sign in to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
Optical Communication and Target Detection Technology
A phase-correction method based on the multi-frequency heterodyne principle is proposed for the phase-jump errors caused by multi-frequency heterodyne phase retrieval. The traditional phase-shifting method and multi-frequency heterodyne are used to calculate the wrapped phase and the unwrapped phase. To address the jump errors in the unwrapped phase, the unwrapped phase is used to locate the error positions, and the wrapped phase is used as a reference to correct them, yielding a new unwrapped phase. Experimental results show that the method can effectively detect the jumps and, after eliminating them, obtain an unwrapped phase free of jump errors, greatly improving the unwrapping accuracy of the fringe pattern.
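The abstract does not give the correction formula, but a common scheme of this kind rebuilds the fringe order from a coarse unwrapped estimate and the trusted wrapped phase; a minimal sketch (function name and the assumption that the coarse estimate's error stays below π are ours, not the paper's):

```python
import numpy as np

def correct_phase_jumps(wrapped, coarse_unwrapped):
    """Rebuild a jump-free unwrapped phase from a wrapped phase map.

    The fringe order is re-estimated from a coarse (possibly erroneous)
    unwrapped phase, then the wrapped phase serves as the reference.
    Valid when the coarse estimate deviates from the truth by < pi.
    """
    two_pi = 2.0 * np.pi
    # Integer fringe order implied by the coarse unwrapped estimate.
    order = np.round((coarse_unwrapped - wrapped) / two_pi)
    # Rebuild the unwrapped phase from the trusted wrapped phase.
    return wrapped + two_pi * order
```

Because the fringe order is an integer, rounding snaps any sub-π error in the coarse estimate back onto the correct order, removing the 2π jumps.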
The uneven distribution of brightness on the surface of citrus fruit makes it difficult to segment surface-defect areas directly. To address this shortcoming, a citrus surface-defect segmentation method based on brightness correction is proposed. The Otsu algorithm segments the background in the HSI color space, and a mask template is obtained after hole filling. Point-wise multiplication of the mask template with the I component yields a background-free I-component image. This image is then convolved with a multi-scale Gaussian function to obtain the incident (illumination) component image, and point-wise division by the incident component yields the corrected I-component image. Finally, a single-threshold method extracts the citrus surface defects. The overall accuracy of the method is 94.7%, with recognition rates of 95% for normal fruit and 96.5% for defective fruit. The main causes of misjudgment were defects whose color resembled normal fruit, defect areas eliminated during denoising because they were too small, and surface bulges or folds mistaken for defects. The proposed multi-scale Gaussian brightness-correction algorithm effectively corrects citrus surface brightness and provides technical support for defect segmentation.
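The blur-then-divide step above is Retinex-style illumination correction; a numpy-only sketch is below (the scale values, weights, and function names are illustrative assumptions, not the paper's parameters):

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    # Normalized 1-D Gaussian kernel.
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    # Separable Gaussian blur with edge padding (numpy only).
    r = int(3 * sigma)
    k = gaussian_kernel(sigma, r)
    pad = np.pad(img, ((r, r), (0, 0)), mode="edge")
    img = np.apply_along_axis(lambda col: np.convolve(col, k, mode="valid"), 0, pad)
    pad = np.pad(img, ((0, 0), (r, r)), mode="edge")
    return np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 1, pad)

def correct_brightness(i_channel, sigmas=(15, 80, 250)):
    """Estimate the incident (illumination) component as the mean of
    multi-scale Gaussian blurs, then divide it out of the I channel."""
    incident = np.mean([blur(i_channel, s) for s in sigmas], axis=0)
    return i_channel / np.maximum(incident, 1e-6)
```

Dividing by the smoothed illumination flattens slow brightness gradients while preserving the small-scale contrast that the subsequent threshold relies on.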
Vortex beams carrying orthogonal orbital angular momentum (OAM) modes have potential applications in optical computing. By exploiting the learning capability of deep neural networks and the complex light-field manipulation ability of multilayer diffractive layers, the spatial position of the vortex beam, which serves as the logic-gate input, is manipulated by a five-layer diffractive deep neural network. The logic-gate result is expressed as the light intensity in the output plane. Simulation results show that the accuracies of the AND gate and the OR gate are 0.9912 and 0.9887, respectively, demonstrating that implementing logic gates with a diffractive deep neural network modulating OAM is feasible.
Light scattering by bubbles is critical for underwater exploration. Current methods for calculating light scattering lack computational efficiency for large bubbles, a regime in which the geometric-optics approximation performs better than other traditional methods. The waist radius of a Gaussian beam is an important factor in the calculated light-scattering distribution. In this paper, the light-scattering distribution of a large bubble in water illuminated by an on-axis Gaussian beam is calculated using the geometric-optics approximation, and the effect of the beam waist radius on the scattering distribution is analyzed.
As a high-precision target detection model, YOLOX still suffers from slow detection speed and is difficult to apply in scenarios with limited computing resources. Thus, this paper proposes an efficient target detection network, YOLOX-Lite, that balances detection speed and accuracy. First, a mixed efficient channel attention module is designed to adaptively refine the spatial and channel features in the network, improving the feature-extraction ability of YOLOX-Lite. Second, an optimized MobileNetV3 replaces Darknet53 as the backbone network, significantly reducing the computational complexity of feature extraction. Finally, an efficient down-sampler with focus is designed to integrate the low-dimensional details from the backbone with the high-dimensional semantic information in the neck layer. In addition, depthwise separable convolutions are combined with PANet when constructing the neck layer, greatly reducing the computational overhead caused by excessive use of standard convolutions. Experimental results on the PASCAL VOC and TT100K datasets show that YOLOX-Lite achieves mAP values of 84.3% and 88.1%, respectively, with an FPS of 56.7. While maintaining detection accuracy, YOLOX-Lite improves detection speed by about 17% over the original YOLOX.
With the rapid development of marine resource exploitation, planktonic microorganisms have gradually become a research focus in the field of machine vision. To improve the detection of small targets in images of planktonic microorganisms, this paper proposes an improved YOLOv5s model. The SE attention mechanism allows the network to focus on the target feature area and suppress useless feature information, and the PANet feature module is replaced with the weighted bidirectional pyramidal BiFPN feature-fusion network to achieve efficient bidirectional cross-scale connectivity and weighted feature-map fusion. The results show that combining the SE attention mechanism with BiFPN feature fusion improves the mAP value by 6.96%, increases the precision by 11.45%, and reduces the loss rate by 1.62%. The proposed method effectively alleviates the false detections, missed detections, and low accuracy of existing models on small targets.
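The SE (squeeze-and-excitation) mechanism mentioned above pools each channel to a scalar, passes the result through a small bottleneck, and rescales the channels; a numpy sketch of the idea (weight shapes and names are illustrative, not the paper's implementation):

```python
import numpy as np

def se_block(feature_map, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) feature map.

    w1: (C//r, C) reduction weights; w2: (C, C//r) expansion weights,
    where r is the bottleneck reduction ratio.
    """
    # Squeeze: global average pool each channel to one scalar.
    squeeze = feature_map.mean(axis=(1, 2))           # (C,)
    # Excitation: bottleneck MLP with ReLU then sigmoid.
    hidden = np.maximum(w1 @ squeeze, 0.0)            # (C//r,)
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # (C,) in (0, 1)
    # Rescale each channel by its learned importance.
    return feature_map * scale[:, None, None]
```

Because the per-channel scale is a sigmoid output in (0, 1), informative channels are passed through nearly unchanged while uninformative ones are attenuated.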
Deep learning is now widely applied in autonomous driving, where vehicle classification is an important task. Although deep learning methods have proven more effective at classifying vehicles than traditional machine learning methods, in practice the similarity among different vehicle types means the accuracy of deep-learning-based vehicle classification is still not high enough. To improve the effectiveness of deep learning for vehicle classification, this paper approaches the problem from the data side, proposing a novel data augmentation method tailored to the characteristics of vehicles and combining it with a ResNet34 model. On the test set, the ResNet34 model with the proposed data augmentation achieves a classification accuracy of 80.0%, compared with 75.42% without augmentation. These results show that the proposed data augmentation method is effective for vehicle classification and can be used in conjunction with deep learning models.
A modulation transfer function (MTF) test system that can test miniature lenses quickly is designed and compared with the traditional pattern-contrast test method. This paper presents the design of an auxiliary imaging lens for the MTF test system. The optical system has a long entrance-pupil distance and near-perfect image quality to minimize its impact on test results. We built a test platform and developed MTF calculation algorithms based on image processing to compute the tangential and sagittal MTF of a lens effectively and quickly.
LiDAR systems have demonstrated their effectiveness for Earth observation, but heavy weight and high cost have greatly hindered their wide use in large-scale scenarios. This paper presents an ultralight, low-cost UAV LiDAR system for flexible low-altitude observation. It weighs only about 0.98 kg and compactly integrates multiple low-cost sensors, including a panoramic laser scanner, a downward-looking camera, and a consumer-grade IMU. Details of the developed system, the data-processing pipeline, and system self-calibration are introduced. A comprehensive evaluation covering four aspects (planar noise, consistency among multiple strips, absolute precision, and coloring accuracy) is designed and carried out. Road signs and rectangular targets were used in the evaluation, and test data were acquired at two flight altitudes (50 m and 100 m). Results show that the plane-fitting accuracy is 1.8 cm at 50 m and 2.4 cm at 100 m for single-strip data, and 3.7 cm and 4.6 cm respectively for multiple strips, illustrating the high performance of the scanner and the whole system. Consistency between multiple strips is improved from 12.7 cm to 7.3 cm by strip adjustment at a height of 50 m. The absolute precision of the system is about 2.8 cm and 4.3 cm for the two flight cases, and the coloring accuracy is about 2 cm to 4 cm. These results demonstrate great potential in both hardware and overall performance, and we believe the system will find applications in many fields.
Deep neural networks usually suffer from significant catastrophic forgetting when faced with class-incremental tasks. Existing methods mostly use generative models to synthesize pseudo-samples or save old exemplars to overcome forgetting, but they struggle in memory-constrained scenarios. In this paper we propose a Multi-Level Distillation and Continual Normalization (MLDCN) method built on the framework of the exemplar-free method PASS. We first analyze two bottlenecks in PASS: the prototype-mismatch problem and the normalization preference for the statistical properties of the current task. MLDCN therefore contains a multi-level distillation framework that improves the model's ability to retain old knowledge, and a continual normalization layer is introduced in the backbone to further enhance the model's stability. Experimental results on CIFAR-100 and ImageNet-sub show that our method effectively alleviates catastrophic forgetting without saving old exemplars and better preserves the knowledge of old categories during the incremental process, outperforming many exemplar-free methods and several exemplar-replay methods.
Switchable surfactants are key to the stability of titanium dioxide dispersions. In this study, we used dual-trap optical tweezers to measure the inter-particle force dynamics of titanium dioxide particles in dispersions with different surfactants, and fitted the measured interactions with the DLVO theoretical model. The experiments show that the interaction force between bare titanium dioxide particles was almost zero, consistent with theory, while the inter-particle force changed significantly in the presence of different surfactants. The surfactants' stabilizing effects on the dispersions ranked as SHMP > SDBS > CTAB > F68 > SDS > OP-10. In addition, the effects of mixed surfactants and salt solutions on the stability of titanium dioxide dispersions were compared, and the results show that the different mixing ratios had similar stabilizing effects.
Kidney tumors are among the ten most common tumors in humans, and precise resection of renal tumors has become an essential means of treatment. Accurate kidney segmentation in CT images is a prerequisite for surgery, yet segmenting kidneys and kidney tumors is challenging. Most current segmentation methods use traditional convolutional neural networks. This paper uses a vision transformer to replace the encoder of the network and innovatively adds a new attention mechanism, the encoder-decoder transformer (EDformer), to the skip connections to learn local features. We also adopt a new type of skip connection to integrate low-level and high-level semantic features as much as possible, and name the resulting method TAU-Net3+. On CT images of 300 patients, the proposed method detects kidneys and renal tumors with the highest accuracy: the mean Dice coefficients for kidney and kidney tumor are 0.9885 and 0.8638, respectively, higher than the three other advanced segmentation methods compared.
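The Dice coefficient reported above measures the overlap between a predicted and a reference mask; a minimal sketch of the standard formula (the smoothing constant is a common convention, not taken from the paper):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity between two binary masks: 2|A∩B| / (|A| + |B|).

    Returns 1.0 for identical masks and approaches 0.0 for disjoint ones;
    eps avoids division by zero when both masks are empty.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```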
To help prevent fatigued driving, driver-fatigue detection is studied by extracting facial fatigue feature parameters. An optimized SSD extracts facial features, PFLD detects facial key points, and the key points and spatial attitude angles of the eyes, mouth, and head are tracked. The facial fatigue feature parameters are computed over a time series, and the resulting matrix is input to a GRU for fatigue-driving detection. Compared with eight other methods under limited computing power, the approach achieves high accuracy and detection speed, meeting the needs of a fatigue-driving detection system.
Image registration is an important topic in image processing. Since traditional registration methods struggle to register low-texture images accurately, this paper proposes a high-precision image registration algorithm based on line-segment features. To handle incorrect matches, the RANSAC (random sample consensus) algorithm is used when estimating the transformation model, and the transformation parameters are then refined with ICP (iterative closest point) iterations. Experimental results show that the proposed method is robust to scale, rotation, and illumination changes, and improves registration accuracy.
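RANSAC's hypothesize-and-verify loop can be illustrated on the simplest transformation model, a 2-D translation between matched points (the model choice, thresholds, and names here are illustrative; the paper estimates a richer transformation from line-segment matches):

```python
import numpy as np

def ransac_translation(src, dst, n_iters=200, thresh=1.0, seed=0):
    """Estimate a 2-D translation from noisy matches src[i] -> dst[i],
    rejecting outlier matches by consensus.

    One match fully determines a translation, so the minimal sample
    size is 1; the final model is refit on the best inlier set.
    """
    rng = np.random.default_rng(seed)
    best_t, best_count = None, -1
    for _ in range(n_iters):
        i = rng.integers(len(src))               # minimal sample: 1 match
        t = dst[i] - src[i]                      # hypothesized translation
        residuals = np.linalg.norm(src + t - dst, axis=1)
        inliers = residuals < thresh             # verify against all matches
        if inliers.sum() > best_count:
            best_count = inliers.sum()
            best_t = (dst[inliers] - src[inliers]).mean(axis=0)  # refit
    return best_t
```

The refit-on-inliers step plays the same role as the ICP refinement in the paper: the consensus set fixes which matches are trusted, and a least-squares fit over them sharpens the parameters.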
Most laser-interference image quality assessment algorithms require the reference image or partial information about it. In practice, however, the reference image and its related information are difficult to obtain, which greatly limits the application scenarios of such algorithms. To solve this problem, this paper starts from predicting the obscured information and improves the Markov Random Field (MRF) estimation algorithm to estimate the obscured-area information in real time. It then proposes a no-reference image quality assessment method based on occlusion-area information estimation and natural scene statistics (IENSS), which analyzes the statistical characteristics of laser-interfered images in natural scenes; the model is trained by machine learning. Finally, simulation experiments verify the effectiveness of the proposed method.
Ventricular arrhythmia is a common arrhythmia that poses a severe threat to human health. The electrocardiogram (ECG) carries abundant pathological information about a patient's heart activity and is widely used to diagnose arrhythmia. Although several models have been proposed for automatic arrhythmia classification from ECG features, their performance is limited because the extracted features are relatively monotonous and simple. To improve classification performance, an ensemble neural network framework named CLSTM-Transformer, combining convolutional neural networks (CNN), a Transformer, and long short-term memory (LSTM), is proposed for automatic heartbeat classification under the inter-patient paradigm. Notably, we use the Transformer, which is rarely applied to medical signals, to attend to the important heartbeats. Compared with most current single-network models, CLSTM-Transformer extracts features at three different levels, capturing more hidden information in the heartbeats. Experimental results show that our model achieves 98.56% accuracy, 93.45% sensitivity, 93.45% specificity, and a 98.54% positive predictive value. Compared with other models, it offers better classification performance, making it more applicable to the diagnosis of arrhythmia.
GPS and machine vision are commonly used for traditional UAV landing. Owing to meteorological conditions and positioning deviations, as well as limits on the detection accuracy and efficiency of visual landing signs, a UAV often cannot land accurately on the target position. To address this problem, a novel machine-vision method for autonomous precise UAV landing is proposed. First, a "Hui"-shaped landing sign was designed. Then, image-processing algorithms were used to identify and locate the landing sign, and the relative position between the UAV and the center of the sign was computed to control the landing through a PID algorithm. Finally, an autonomous landing test was conducted. The results show that the method achieves accurate landing from 40-50 meters, with an average detection time of 3.58 ms and an average landing accuracy of 102 mm.
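The PID control law driving the UAV toward the sign center can be sketched as follows (gains, class name, and the single-axis simplification are illustrative; the paper's controller and tuning are not specified):

```python
class PID:
    """Minimal positional PID controller for one axis of the
    UAV-to-sign-center error."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        # Accumulate the integral term and differentiate the error.
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv
```

In the landing loop, the error is the pixel (or metric) offset between the UAV and the sign center on each axis, and the controller output becomes the velocity command; the derivative term damps overshoot as the UAV closes on the target.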
Semantic segmentation is one of the fundamental problems in computer vision. Many learning paradigms have been proposed to alleviate the annotation burden of the semantic segmentation task, such as weakly-supervised, one-shot, and few-shot learning, but these approaches still require collecting a large amount of labeled data for training. In this paper, we are motivated to release the model from collecting such training samples and propose to retarget off-the-shelf self-supervised models for semantic segmentation. We observe that self-supervised transformers yield representations suitable for evaluating both patch affinities and semantic meanings. By leveraging the patch-level affinities, accurate segmentation masks can be obtained, while semantic assignments can be obtained by comparing pixel representations with precomputed prototypes. Assembling these components, semantic segmentation results can be derived from the model without any fine-tuning. With only a single example per class, our approach achieves up to 51.4% mIoU on the challenging PASCAL VOC 2012 val set without any training on masks, demonstrating the effectiveness of the approach.
Vision measurement, a fast and high-precision measurement technology, has great application potential in aerospace tasks such as on-orbit assembly and maintenance. The On-orbit Multi-view Photogrammetry System (OMPS) lacks sufficient known spatial reference information to assist camera orientation. To solve this problem, this paper proposes a method to calibrate all cameras' external parameters (CEP) of the OMPS using stars and scale bars. The method first establishes the imaging model of stars and scale bars, in which a relative-position model is proposed to solve the unconstrained-position problem of the reference camera. Then, based on the error equations of the star image points, scale-bar target image points, and scale-bar length, a multi-data-fusion bundle adjustment algorithm is proposed to achieve high-precision CEP calibration. Practical experiments show that the image-plane errors of stars and scale-bar targets are 1/7 pixel (1σ) and 1/16 pixel (1σ), respectively, and the scale-bar length error is 0.045 mm (1σ). Taking the measurements of the V-star System (VS) as ground truth, the OMPS measurement errors of spatial targets in the X, Y, and Z directions are 0.45 mm (3σ), 0.12 mm (3σ), and 0.15 mm (3σ), respectively. The method provides an algorithmic and data reference for CEP calibration in on-orbit photogrammetry (PG) applications.
To enable a robot to automatically identify and locate weld seams, this paper proposes a weld-seam recognition and localization method based on the RealSense depth camera and an improved YOLOv5. First, the original YOLOv5 model is improved by inserting a coordinate attention module, and the center point of the weld in the pixel plane is obtained from the prediction box. The actual position of the weld seam is then calculated by combining the depth information acquired by the RealSense depth camera. Test results show that the mAP of the trained model improves from 82.3% to 90.8%, significantly better than the model before the improvement. The maximum error is 2.9 mm when locating an object at a distance of 300 mm, and the error stays within 2% when locating objects at distances of 0.3 m to 2 m. The method establishes the relationship between the detected weld and the robot position, compensating for the need to manually move a weld-tracking sensor into its working range, and is of reference value for welding-robot automation.
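Turning the detected pixel center plus a depth reading into a 3-D position is the standard pinhole back-projection; a minimal sketch (the intrinsic values in the usage are placeholders, real values come from the camera's calibration):

```python
import numpy as np

def pixel_to_point(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with a depth reading (same units as the
    returned point) into a 3-D point in the camera frame.

    fx, fy: focal lengths in pixels; cx, cy: principal point.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])
```

The resulting camera-frame point still needs the hand-eye (camera-to-robot) transform before it can be used as a robot target pose.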
Compared with rasterization, ray tracing can improve an image's visual quality and make it look more realistic, but real-time ray tracing demands very high GPU computing power. When the number of GPUs is limited and a single GPU's performance cannot be fully utilized, rendering latency becomes high. In this paper, we propose Multi-Queue Concurrent Pipeline Rendering (MQCPR), a novel ray-tracing parallel rendering scheme based on GPU multi-queue execution. The scheme divides the image into multiple regions and uses multiple GPU queues so that the computation and transfer tasks in the rendering pipeline execute simultaneously, maximizing single-GPU performance and improving rendering speed. MQCPR keeps the GPU busy to make full use of its resources. Experiments show that on a single GPU, compared with a single-queue serial rendering scheme, the number of frames per second (FPS) increases by 1.5 times with MQCPR.
Planetary rovers are becoming increasingly crucial as deep-space exploration missions develop, and accurate terrain segmentation can increase a rover's capacity for autonomous detection. At present, Mars terrain segmentation is difficult and the computing resources on a rover are limited. This paper proposes an improved PSPNet algorithm, M-PSPNet, for Mars terrain segmentation. The lightweight MobileNetV2 replaces ResNet as the feature-extraction backbone, and the channel attention mechanism SENet is integrated into the network structure to improve its feature-extraction capability. Experimental results show that the parameter count and computation of M-PSPNet are greatly reduced compared with the traditional PSPNet, while its recall on soil, bedrock, sand, and big rock reaches 92%, 91%, 86%, and 72%, respectively. From an engineering standpoint, M-PSPNet is competitive.
This paper uses jTessBoxEditorFX combined with Tesseract-OCR to build a handwriting font library of nearly 2,500 Chinese characters, and completes program development on the server and mobile terminals. The application realizes recognition of handwritten Chinese characters on the mobile terminal and provides a construction method for the application scenarios and the font library.
Abnormal target detection is widely used in intelligent monitoring, but its application to housing demolition sites is still in its infancy. Therefore, after clearly defining the time, space, and category of potential abnormal targets at housing demolition sites, this paper proposes a detection algorithm for such targets. To meet real-time requirements, several lightweight improvements are made to YOLOX. First, the backbone network is optimized to reduce redundant computation, and an ECA attention module is introduced to improve information exchange between channels. Second, a DAM module is designed to enlarge the receptive field of the effective feature map, and the Ghost module replaces the standard convolution block. The experimental dataset combines the MOCS dataset with 3,000 images actually taken at housing demolition sites. Experimental results on this dataset show that the model reaches 57.7 FPS and 83.9% mAP, fully meeting the practical needs of site monitoring.
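The ECA module mentioned above is generally understood to replace SE's fully connected layers with a small 1-D convolution across the per-channel descriptors, keeping the cost almost negligible. A rough sketch, using an illustrative fixed averaging kernel where the real module learns its weights:

```python
import math

def eca_attention(channel_means, k=3):
    """Efficient Channel Attention sketch: a size-k 1-D convolution over the
    per-channel global-average descriptors, followed by a sigmoid gate."""
    pad = k // 2
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    weight = [1.0 / k] * k  # illustrative fixed kernel; learned in practice
    gates = []
    for c in range(len(channel_means)):
        s = sum(w * padded[c + i] for i, w in enumerate(weight))
        gates.append(1.0 / (1.0 + math.exp(-s)))  # one gate per channel
    return gates
```

Each channel's feature map would then be multiplied by its gate, exactly as in the SE pattern but with far fewer parameters, which suits the lightweight design goal stated above.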
In weld defect detection, differences in sample distribution mean that single-threshold object detection algorithms may achieve low accuracy when locating and identifying defects in x-ray images. To address this problem, we propose a weld defect detection method based on a cascaded structure model. Specifically, we improve Cascade Mask R-CNN with deformable convolution, a feature pyramid network, efficient global context modeling, and self-set aspect ratios for the anchors. In addition, we introduce flipping and crop-paste data augmentations to enlarge the dataset. Experiments show that the improved Cascade Mask R-CNN achieves significantly better detection accuracy than other classic two-stage object detection models, especially for minor defects such as round defects and cracks, and verify that it partially counteracts the effects of differences in the defect samples' distribution.
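Crop-paste augmentation, as commonly understood, copies a labelled defect patch to new positions so that rare defect classes appear more often during training. A simplified sketch on grayscale images stored as nested lists; the function and its signature are illustrative, not the paper's code:

```python
import random

def crop_paste(image, defect_box, n_copies=2, rng=None):
    """Crop the patch inside defect_box=(x, y, w, h) and paste n_copies of it
    at random positions; return the augmented image and all bounding boxes."""
    rng = rng or random.Random(0)
    x, y, w, h = defect_box
    patch = [row[x:x + w] for row in image[y:y + h]]  # crop the defect
    H, W = len(image), len(image[0])
    aug = [row[:] for row in image]                   # copy, keep original intact
    boxes = [defect_box]
    for _ in range(n_copies):
        px = rng.randrange(0, W - w + 1)              # random paste position
        py = rng.randrange(0, H - h + 1)
        for dy in range(h):
            aug[py + dy][px:px + w] = patch[dy][:]    # paste the patch
        boxes.append((px, py, w, h))
    return aug, boxes
```

A real pipeline would also blend patch borders and avoid pasting over other annotations, but the essential effect, multiplying the number of defect instances without new x-ray acquisitions, is the same.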
Videos of winter sports are a valuable resource for learning about these sports in China. In this paper, we address the problems of a small winter sports video dataset and low classification accuracy by proposing a fast video classification approach that combines a transformer with a convolutional neural network. The 3D-Video Swin Transformer model is built from a resnet3D feature-extraction component and the Video Swin Transformer, and improves local and global modeling through a multi-headed self-attention mechanism. Convolutional operations at the network front end make up for the Transformer's lack of inductive bias, strengthening the network's local modeling and reducing its dependence on large amounts of data. Experimental results show that the 3D-Video Swin Transformer achieves an accuracy of 76.43% on the winter sports video dataset developed in this paper, 1.15 percentage points higher than the Video Swin Transformer network. Additionally, we design and implement a winter sports video classification system based on the Milvus database to support user interaction and enable the submission, categorization, and recommendation of winter sports videos.
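The multi-headed self-attention at the heart of such Swin-style models is built from scaled dot-product attention, softmax(QKᵀ/√d)·V. A single-head, pure-Python sketch of that core operation (real implementations run it per window, per head, on tensors):

```python
import math

def scaled_dot_product_attention(Q, K, V):
    """Single-head scaled dot-product attention on lists of row vectors:
    each output row is a softmax-weighted average of the rows of V."""
    d = len(Q[0])  # feature dimension, used for the 1/sqrt(d) scaling
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)                       # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]   # softmax over the keys
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Multi-head attention runs this with several independent projections of Q, K, and V and concatenates the results, which is what lets the model mix local and global cues across video frames.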
Snapshot Compressive Spectral Imaging (SCPI) is a computational imaging technique that reconstructs a three-dimensional (3D) spectral datacube from two-dimensional (2D) compressive measurements. The dual-disperser coded aperture snapshot spectral imaging (DD-CASSI) system is one prototype implementing this technique: it simultaneously acquires and compresses the spectral images of a target scene, after which the spectral images are reconstructed from the compressive measurements. Image priors such as the Deep Image Prior (DIP), sparsity prior, low-rank prior, and Total Variation (TV) prior can be used to improve the performance of SCPI reconstruction algorithms. In this paper, we compare spectral image reconstruction approaches based on the split Bregman algorithm combined with these image priors. The algorithms are assessed on both simulation data and an experimental DD-CASSI testbed. Simulation and experimental results show that the DIP prior achieves better reconstruction performance than the other three image priors.
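A much-simplified version of the forward model behind such systems: each 2-D measurement pixel sums the spectral bands of the scene after modulation by a coded aperture. The sketch below assigns one binary mask per band and ignores the per-band shear that the dispersers introduce in the physical DD-CASSI optics:

```python
def compressive_measure(datacube, code):
    """Simplified snapshot forward model: the 2-D measurement y is the sum
    over spectral bands of the scene modulated by per-band binary masks.
    datacube: list of bands, each a H x W nested list.
    code:     list of same-shaped binary masks, one per band."""
    H, W = len(datacube[0]), len(datacube[0][0])
    y = [[0.0] * W for _ in range(H)]
    for band, mask in zip(datacube, code):
        for i in range(H):
            for j in range(W):
                y[i][j] += band[i][j] * mask[i][j]
    return y
```

Reconstruction algorithms such as split Bregman then invert this many-to-one mapping, and the image priors compared in the paper supply the extra information that makes the inversion well-posed.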
In this paper, a fast defect detection method for oranges is studied from the perspective of machine vision. First, to eliminate the interference of orange stems in defect detection, a shape-based template matching method extracts the stem region, which is then filled with the image's average pixel value. Second, contrast enhancement and edge extraction are performed on the stem-free image, and the background is removed by combining morphological processing and filling techniques to obtain the orange region. Finally, after channel separation of the orange region image, Otsu threshold segmentation is applied to the red channel according to the color characteristics of oranges and their defects, and defect extraction and contour fitting are completed using area features. Experiments show that the proposed method quickly and accurately detects orange defects, providing a core detection technology for automatic orange sorting.
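The Otsu segmentation step selects the gray-level threshold that maximizes the between-class variance of the image histogram, separating defect pixels from healthy peel in the red channel. A self-contained sketch of the classic algorithm (not the paper's implementation):

```python
def otsu_threshold(gray, levels=256):
    """Return the threshold t maximizing between-class variance;
    pixels with value <= t form the background class."""
    hist = [0] * levels
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * hist[i] for i in range(levels))
    w_bg = sum_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_bg += hist[t]                 # background pixel count
        if w_bg == 0:
            continue
        w_fg = total - w_bg             # foreground pixel count
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:      # keep the best separating threshold
            best_var, best_t = var_between, t
    return best_t
```

Because the method needs only one pass over the histogram, it suits the real-time constraints of an automatic sorting line.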