This PDF file contains the front matter associated with SPIE Proceedings Volume 13288, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Computer Vision and Information Recognition Technique
Optical music recognition aims to automatically extract musical information such as notes and beats from printed or handwritten score images using computer vision, and it holds significant value in fields such as music information retrieval and sheet music digitization. This paper brings the YOLOv8 object detection algorithm to music symbol recognition and proposes an improved model, YOLO-Score, based on YOLOv8s. The model introduces SPD-Conv into the backbone feature network to enhance recognition of small targets; incorporates the LSK selective attention mechanism to focus on more meaningful feature information using broad contextual information; redesigns the detection layers, adding a small-target detection branch and removing the large-target branch, to strengthen the network's feature fusion capability; and employs Shape-IoU as the bounding-box regression loss function to improve convergence accuracy. The experimental results show an 11.2% increase in precision, a 33.0% increase in recall, a 26.6% increase in mAP, and a 1.8 MB reduction in weight file size compared with the YOLOv8s model.
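For readers unfamiliar with SPD-Conv, the sketch below shows the core idea in PyTorch: a space-to-depth rearrangement followed by a non-strided convolution, so that downsampling moves fine spatial detail into channels instead of discarding it. This is a minimal illustration of the published SPD-Conv building block, not the YOLO-Score authors' code; the channel sizes are arbitrary.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Space-to-depth followed by a non-strided convolution, replacing a
    stride-2 conv so detail from small objects survives downsampling.
    Sketch after Sunkara & Luo (2022); sizes are illustrative."""
    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        self.scale = scale
        self.conv = nn.Conv2d(in_ch * scale * scale, out_ch,
                              kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        s = self.scale
        # rearrange each s x s spatial block into the channel dimension
        x = torch.cat([x[..., i::s, j::s]
                       for i in range(s) for j in range(s)], dim=1)
        return self.conv(x)

feat = torch.randn(1, 64, 80, 80)
print(SPDConv(64, 128)(feat).shape)  # torch.Size([1, 128, 40, 40])
```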
Machine-picked fresh tea leaves are complex in form and difficult to grade consistently, and manual grading introduces subjective factors. This paper therefore proposes a geometric-feature-based grading method for fresh tea, classifying the three types commonly encountered in factories: single bud (A0), one bud with one leaf (A1), and one bud with two leaves (A2). First, the collected samples of fresh tea leaves are preprocessed by target cropping, shadow removal, and image denoising. The image is then converted to grayscale and binarized to extract the leaf contour. Next, the Douglas-Peucker algorithm approximates the edge contour with a polygon, and the contour edges are extended beyond a given vertex; the position of an arbitrary point on the extension line relative to the polygon contour determines whether that vertex is convex or concave. The initial key points of the contour are then selected from the convex vertices of the polygon based on geometric distance, and the remaining key points are found by combining the initial key points with the concave vertices. Finally, the leaves are classified by the number of key points. The results show that the grading accuracy reaches 97.08%, effectively classifying fresh tea and providing a reference for objective grading.
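The contour approximation step can be pictured with OpenCV's implementation of Douglas-Peucker. The sketch below approximates a leaf mask's contour with a polygon and labels each vertex convex or concave by the sign of a cross product; this cross-product test is a common stand-in for the paper's extension-line test, not the authors' exact procedure, and the epsilon ratio is an assumption.

```python
import cv2
import numpy as np

def contour_key_vertices(binary_mask, epsilon_ratio=0.01):
    """Polygon-approximate the largest contour (Douglas-Peucker) and
    label vertices convex/concave. Sign convention assumes a
    counter-clockwise contour ordering."""
    contours, _ = cv2.findContours(binary_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea)
    eps = epsilon_ratio * cv2.arcLength(contour, True)
    poly = cv2.approxPolyDP(contour, eps, True).reshape(-1, 2)

    labels = []
    n = len(poly)
    for i in range(n):
        prev_pt, pt, next_pt = poly[i - 1], poly[i], poly[(i + 1) % n]
        # positive cross product -> left turn -> convex vertex
        cross = np.cross(pt - prev_pt, next_pt - pt)
        labels.append("convex" if cross > 0 else "concave")
    return poly, labels
```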
Most current SLAM methods are evaluated in static environments, yet most practical application scenarios are dynamic, which increases the fitting error of dynamic feature points and reduces the accuracy of pose estimation. To eliminate this error, this paper proposes YRG-SLAM, a method for dynamic environments built around the RealSense D435i camera. A YOLOv5s object detection thread is added to the original ORB-SLAM3 pipeline, with the detector running on the GPU, and a dynamic-feature-point rejection module is added to the tracking thread. After an image is acquired, YOLOv5s first extracts target features, recognizes object regions in the dynamic environment, and marks them as dynamic targets. Feature points falling inside the dynamic target boxes are removed with GPU acceleration, and feature matching is then performed on the remaining static feature points to estimate the camera's position and orientation. Validation on the public TUM dataset shows that the proposed method reduces the root-mean-square absolute trajectory error by an average of 82.75% in highly dynamic environments compared with the original ORB-SLAM3 algorithm, significantly improving localization accuracy and confirming the feasibility of the approach.
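The rejection step itself is simple to picture: any ORB keypoint that falls inside a detected dynamic-object box is discarded before matching. A minimal sketch, assuming boxes arrive from YOLOv5 as (x1, y1, x2, y2, label) tuples and keypoints are OpenCV KeyPoint objects; the paper's GPU-accelerated version follows the same logic.

```python
def reject_dynamic_points(keypoints, boxes, dynamic_classes={"person"}):
    """Drop ORB keypoints inside bounding boxes of dynamic objects.
    Illustrative sketch; the box format and class set are assumptions."""
    keep = []
    for kp in keypoints:
        x, y = kp.pt
        inside = any(x1 <= x <= x2 and y1 <= y <= y2
                     for (x1, y1, x2, y2, label) in boxes
                     if label in dynamic_classes)
        if not inside:
            keep.append(kp)
    return keep
```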
With the continuous development of transportation systems and the rise of autonomous driving, road traffic sign recognition is becoming increasingly important in intelligent transportation. The task demands both high accuracy and high speed, which places stringent requirements on recognition models. Most current research focuses on accuracy alone and rarely compares model speed, and although convolutional networks are widely acknowledged to train effective traffic sign recognizers, the application of the ResNet18 model to traffic sign image recognition has been largely overlooked. This paper therefore constructs and improves a ResNet18 network model, training and evaluating it on GTSRB with the aim of improving speed while maintaining high accuracy. After multiple experiments, the recognition model reached 99.60% accuracy at 0.26 ms per image. Comparative experiments on the GTSRB dataset against models such as Single-linkage+CNN and VGG16 validate the performance advantages of the improved model proposed in this paper (the ResNet18_final model).
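As a baseline for this setup, adapting an ImageNet-pretrained ResNet18 to GTSRB's 43 classes takes only a few lines in PyTorch. This is a generic starting point, not the paper's improved ResNet18_final architecture; the optimizer settings are illustrative.

```python
import torch
import torch.nn as nn
from torchvision import models

# Swap the ImageNet head for the 43 GTSRB classes and fine-tune.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 43)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
criterion = nn.CrossEntropyLoss()
```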
This article takes the visual semantic segmentation of soccer robots as its setting, reviews the development of robot soccer, and explains the importance of machine vision. It first introduces the dataset and the data preprocessing and augmentation functions, then describes the model construction and architecture. The focus of the article is on raising image recognition accuracy by replacing the optimizer, changing the model architecture, adding modules, and related methods, with explanations and comparisons of several models and optimizers. The operation and significance of augmentation methods, attention mechanisms, and layer freezing are explained, the aim being to improve training effectiveness through these operations; as expected, the accuracy is improved.
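Layer freezing, one of the tuning strategies discussed, looks like this in PyTorch. The segmentation model and class count here are illustrative assumptions, not the article's actual network.

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# Hypothetical setup: freeze the encoder, train only the head.
model = deeplabv3_resnet50(num_classes=3)  # e.g. ball / field / background
for p in model.backbone.parameters():      # freeze the backbone
    p.requires_grad = False
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=3e-4)
```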
Virtual reality is rapidly being deployed across many industries and scenarios. In important fields such as industrial production and safety emergency response, numerical simulations based on mechanistic models are often used to reproduce how the properties of physical objects evolve, for example the true diffusion of harmful gases or the assessment of an accident's affected area, either for presentation in virtual scenes or to support decision making. However, the dynamic visual presentation in existing virtual reality applications lacks information feedback grounded in mechanistic-model analysis, so real changes in physical properties are not faithfully rendered in the virtual environment. This paper proposes fusing finite element simulation with data-driven algorithms to quickly embed real physical property changes into virtual reality scenes and to enhance the immersion of industrial virtual reality applications.
Vision is a crucial means for humans to perceive external information. As an artificial device based on functional electrical stimulation, a visual prosthesis can induce phosphenes by electrically stimulating the retina, optic nerve, or visual cortex, helping implant recipients recover part of their visual perception. To investigate object recognition by prosthesis wearers, this experiment employed virtual reality technology: a simulated prosthetic-vision scene was constructed in Unity, and the effects of deletion, resolution, and dot size on target recognition under prosthetic vision were analyzed. The findings show that subjects recognized scenes with 50% of phosphenes missing less reliably than scenes with 30% missing or standard scenes, and that scenes at 128×128 resolution were recognized at higher rates than those at 64×64 or 48×48. Significant differences were observed between the small-dot condition and both the standard and large-dot conditions, while standard and large dots did not differ significantly; recognition rates were also higher for larger object shapes. This study offers ideas for virtual reality research on simulated prosthetic vision and a theoretical basis for improving the training and daily-living abilities of prosthesis wearers in the future.
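A simulated prosthetic-vision renderer of the kind used in such studies can be sketched as follows: downsample the scene to a phosphene grid, randomly delete a fraction of phosphenes, and draw each survivor as a disc whose radius sets the dot size. This is an illustrative approximation of the three factors studied (deletion, resolution, dot size), not the authors' Unity implementation; the canvas size is arbitrary.

```python
import cv2
import numpy as np

def simulate_phosphenes(gray, grid=64, dropout=0.3, dot_scale=0.8):
    """Render a simulated prosthetic-vision view of a 2D uint8 image:
    grid x grid phosphene array, random deletion, disc-shaped dots."""
    h = w = 512
    small = cv2.resize(gray, (grid, grid), interpolation=cv2.INTER_AREA)
    keep = np.random.rand(grid, grid) >= dropout   # deletion mask
    canvas = np.zeros((h, w), np.uint8)
    step = h // grid
    radius = max(1, int(step * dot_scale / 2))     # dot size
    for i in range(grid):
        for j in range(grid):
            if keep[i, j]:
                center = (j * step + step // 2, i * step + step // 2)
                cv2.circle(canvas, center, radius, int(small[i, j]), -1)
    return canvas
```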
The rapid development of digital media has driven technological change, and mixed reality (MR) technology has gradually emerged. Even as people rely on digital media for information, growing dependence on electronic devices has led many to call for putting the screens down and stepping back into the real world. Against this backdrop, exhibition hall culture has become popular, though it has shortcomings of its own: poor timeliness and high cost. In this context, integrating MR technology with exhibition spaces can bring audiences a new kind of experience, and some showrooms have already begun to adopt MR. On this basis, this paper introduces the impact and application of MR technology in exhibition spaces, as well as future trends in the integration of this emerging technology with exhibition design.
The geometric stylization of 3D shapes has gained significant attention in computer graphics for its unique aesthetic appeal. This paper presents a novel method utilizing face normals to achieve regular dodecahedral stylization, capable of transforming the input shape into the regular dodecahedral style while preserving the content of the original shape. Implementing our method is straightforward because it incurs a cost only in solving several linear systems. Extensive testing demonstrates the effectiveness of the algorithm in generating aesthetically pleasing geometric shapes with a regular dodecahedral style across varying levels of complexity and topological structures.
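The per-face target of such a stylization can be sketched directly: each mesh face normal is snapped to the nearest of the twelve face normals of a regular dodecahedron, which point along the cyclic permutations of (0, ±1, ±φ), normalized. The global solve that deforms the mesh to respect these targets, the linear systems mentioned above, is not shown; this illustrates the target assignment only.

```python
import numpy as np

PHI = (1 + 5 ** 0.5) / 2
# 12 face normals of a regular dodecahedron: cyclic perms of (0, ±1, ±phi)
DODECA_NORMALS = []
for a, b in [(1, PHI), (1, -PHI), (-1, PHI), (-1, -PHI)]:
    DODECA_NORMALS += [(0, a, b), (a, b, 0), (b, 0, a)]
DODECA_NORMALS = np.array(DODECA_NORMALS)
DODECA_NORMALS /= np.linalg.norm(DODECA_NORMALS, axis=1, keepdims=True)

def snap_to_dodecahedron(face_normals):
    """Assign each unit face normal its closest target direction.
    The mesh-consistency solve that follows is omitted here."""
    sims = face_normals @ DODECA_NORMALS.T
    return DODECA_NORMALS[np.argmax(sims, axis=1)]
```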
This article introduces an automated tool for splitting PDF files and extracting drawing information. It splits PDF documents in designated folders into individual pages, saves each page as a separate file, and uses text extraction or image recognition to obtain the product model, group number, drawing number, and other information. Drawing names are standardized according to the research institute's conventions, improving the efficiency and accuracy of drawing management. The tool can process large batches of PDF files quickly and effectively, providing convenience for the institute's work.
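The page-splitting stage can be reproduced with the pypdf library in a few lines. This sketch covers only the splitting, not the text/image extraction or the institute's naming rules; the output file-name pattern is an assumption.

```python
from pathlib import Path
from pypdf import PdfReader, PdfWriter

def split_pdf(pdf_path, out_dir):
    """Write each page of pdf_path as its own single-page PDF."""
    reader = PdfReader(pdf_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)
        # hypothetical naming scheme: <stem>_p001.pdf, <stem>_p002.pdf, ...
        with open(out / f"{Path(pdf_path).stem}_p{i:03d}.pdf", "wb") as f:
            writer.write(f)
```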
To address the low accuracy, false detections, and missed detections of mainstream traffic sign detectors on small targets in complex environments, this paper presents an improved traffic sign detection algorithm based on YOLOv9. AKConv replaces the Conv module in RepNCSPELAN4, maintaining detection accuracy while reducing weight. Focal-EIoU Loss replaces the original CIoU regression loss; by splitting the aspect-ratio loss term into the differences between the minimum enclosing box's width and height and the predicted width and height, it accelerates convergence and raises regression accuracy. In addition, the Convolutional Block Attention Module (CBAM) is added to further strengthen the network's feature extraction capability and detection accuracy. On the TT100K traffic sign dataset, the improved algorithm achieves 92.3% precision and 91.5% mAP@0.5, which are 3.1% and 3.7% higher than the original YOLOv9 algorithm. False and missed detections in complex environments are significantly reduced, and the overall detection performance is clearly higher than that of the comparison algorithms, giving the method substantial practical value.
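CBAM is a standard, self-contained module: channel attention computed from pooled descriptors, followed by spatial attention computed from channel-wise statistics. Below is a compact PyTorch rendering of the published formulation; where exactly it is inserted into YOLOv9 follows the paper, not this sketch.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module (Woo et al., 2018)."""
    def __init__(self, ch, reduction=16, kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(ch, ch // reduction, 1, bias=False), nn.ReLU(),
            nn.Conv2d(ch // reduction, ch, 1, bias=False))
        self.spatial = nn.Conv2d(2, 1, kernel, padding=kernel // 2,
                                 bias=False)

    def forward(self, x):
        # channel attention from global average- and max-pooled descriptors
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        # spatial attention from channel-wise average and max maps
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa
```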
Depth and disparity estimation have become important topics in computer graphics and computer vision in recent years. Light field imaging is widely used for this task because it records both the direction and intensity of light, enabling dense depth estimation. This paper proposes SROACC-Net for light field structured-light disparity estimation, built on OACC-Net with its occlusion-aware cost constructor, to which a squeeze-and-excitation residual (SE-ResNet) module is added to improve accuracy. A Huber-SSIM loss function is also designed to boost the model's performance. The experimental results show that SROACC-Net outperforms OACC-Net in light field structured-light depth prediction, offering a promising route for depth estimation in computer graphics and computer vision.
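A Huber-SSIM loss of the kind described can be sketched as a weighted sum of a Huber term and a structural term. The weighting factor and the use of the third-party pytorch_msssim package are assumptions for illustration, not the paper's exact recipe.

```python
import torch.nn.functional as F
from pytorch_msssim import ssim  # third-party SSIM implementation

def huber_ssim_loss(pred, target, alpha=0.85, delta=1.0):
    """Hybrid loss on (N, C, H, W) tensors scaled to [0, 1]:
    alpha * Huber + (1 - alpha) * (1 - SSIM). Alpha is assumed."""
    huber = F.huber_loss(pred, target, delta=delta)
    structural = 1.0 - ssim(pred, target, data_range=1.0)
    return alpha * huber + (1 - alpha) * structural
```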
Traffic light detection algorithms are crucial to the mobility of blind individuals, but existing algorithms are limited in detection efficiency, cost, and suitability for mobile devices. In response to these challenges, a novel lightweight traffic light detection algorithm is proposed that enhances YOLOv8. First, a two-layer routing attention mechanism is introduced at the end of the backbone and in the neck network to strengthen feature extraction and suppress interference from irrelevant features. Second, the C2fGhost module is used in the YOLOv8 neck to reduce floating-point computation during feature-channel fusion and lower the number of model parameters while improving feature expression. Experiments show that the enhanced algorithm improves mAP50 by 9.3% on the S2TLD dataset, reduces the number of parameters by 6.7% to 2.7 million, and reaches a detection speed of 102 FPS, enabling real-time detection of traffic signal targets. Comparisons with other mainstream target detection algorithms demonstrate the method's effectiveness and superiority.
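The Ghost convolution underlying C2fGhost generates half of its output channels with a cheap depthwise convolution, which is where the FLOP and parameter savings come from. Below is a minimal sketch of the standard Ghost module (Han et al., 2020), not the paper's full C2fGhost block; an even output channel count is assumed.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """A small primary convolution plus cheap depthwise 'ghost'
    features, concatenated to form the full output."""
    def __init__(self, in_ch, out_ch, ratio=2):
        super().__init__()
        primary = out_ch // ratio
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary, 1, bias=False),
            nn.BatchNorm2d(primary), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(primary, out_ch - primary, 3, padding=1,
                      groups=primary, bias=False),
            nn.BatchNorm2d(out_ch - primary), nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)
```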
This research explores the design and realization of a mobile 3D shooting fishing game based on augmented reality (AR) technology. The paper first introduces the basic principles of AR and proposes the key technologies and methods for designing a mobile 3D shooting game with AR. It then designs the game's functional modules and their implementation process, covering scene modeling, UI interface design, user interaction, and other aspects. Finally, the feasibility and effectiveness of the proposed approach are demonstrated through the production and play-testing of the game, providing a reference for the design and realization of 3D shooting games on mobile devices.
Apples adapt well to most climates and are grown all over the world, but several diseases are encountered during cultivation, and these problems often manifest as changes in the leaves. This paper aims to improve the recognition accuracy for various apple leaf diseases by improving the VGG19 network and applying transfer learning. We introduce a transfer learning approach together with a learning rate decay strategy that converges more stably toward the optimum, using the capabilities of the VGG19 architecture to classify apple leaf diseases efficiently. After training, the model's accuracy on the validation set exceeded 99%, and a series of detailed evaluations on the test set confirmed the excellent accuracy of the developed model.
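The two ingredients named above, transfer learning on VGG19 and a learning rate decay schedule, combine in PyTorch roughly as follows. The class count and schedule constants are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn
from torchvision import models

# Reuse ImageNet features; retrain only the classifier head.
model = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
for p in model.features.parameters():
    p.requires_grad = False
model.classifier[6] = nn.Linear(4096, 4)  # e.g. healthy + 3 diseases

optimizer = torch.optim.SGD(model.classifier.parameters(),
                            lr=1e-3, momentum=0.9)
# step decay: halve the learning rate every 10 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                            step_size=10, gamma=0.5)
```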
Multi-modality fusion of LiDAR and camera data has become a mainstream approach to 3D object detection in the Bird's Eye View (BEV) space. To address the poor accuracy and loss of height information in current LiDAR point cloud processing, a 3D object detection algorithm based on Multi-scale Voxel Sampling and Bird's-eye-view Fusion (MVSBF) is proposed. First, the raw LiDAR point clouds are voxelized, and voxels at different heights are randomly sampled at multiple scales. Second, LiDAR-BEV features are generated by incorporating a random voxel sampling layer into the Sparsely Embedded Convolutional Detection (SECOND) network. Third, the extracted camera image features are processed with depth estimation to generate the corresponding camera-BEV features. Finally, the two BEV feature branches are fused by a module designed to integrate BEV features from the two streams. Experiments show that MVSBF achieves a mean Average Precision (mAP) of 70.1% and a NuScenes Detection Score (NDS) of 73.5% on the NuScenes test set, outperforming the baseline models by at least 0.9% mAP and 1.7% NDS, respectively.
Given the rapid progress of artificial intelligence technology, object detection, as a significant research direction in the realm of computer vision, has made remarkable progress. However, existing algorithms still face challenges in terms of insufficient detection accuracy, poor robustness, and difficulties in balancing model performance and efficiency in complex backgrounds. To address these issues, this study proposes an innovative object detection algorithm, CGR-YOLO. This algorithm introduces Context Guided down-sampling technology based on YOLOv8s, which effectively filters out irrelevant information and retains valuable data by comprehensively considering the surrounding contextual information, thus markedly enhancing the model's detection accuracy. Furthermore, the CGR-YOLO algorithm employs Repconv to substitute the conventional convolution operations in the Bottleneck module, combining re-parameterization techniques, which not only enhances the model's operational efficiency but also improves overall performance. A sequence of experiments conducted on the Pascal VOC dataset confirms the superiority of the CGR-YOLO algorithm. Specifically, CGR-YOLO improves the mAP50 accuracy by 1.8% and the mAP50-95 accuracy by 2.2% compared to YOLOv8s. These results show that the CGR-YOLO algorithm provides higher accuracy in the object detection domain while enhancing the robustness and accuracy of the model, which provides an effective solution to the limitations of existing algorithms.
In recent years, the rapid advancement of automotive and computer vision technologies has made self-driving cars a focal point of interest. A critical factor in the safe and efficient operation of self-driving cars is their ability to accurately recognize traffic signs. Consequently, traffic sign recognition has become essential for autonomous driving systems. This study introduces an optimized YOLOv8 method to enhance traffic sign recognition (TSR) performance. Recognizing that most targets are small and pose a challenge for model accuracy, we employ three different data augmentation techniques on the input images. Additionally, we improve the YOLOv8 loss function by incorporating the Wasserstein distance, which enhances the model's efficiency in detecting small targets. To validate the proposed method's effectiveness, we conducted comparative and ablation experiments. Experimental evaluations on the TT100K dataset indicate that the mAP and precision improved by 1.3% and 0.90%, compared to the standard YOLOv8. These results confirm the proposed method's superior performance in traffic sign detection.
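A Wasserstein-based box similarity for small targets is commonly computed by modeling each box (cx, cy, w, h) as a 2D Gaussian and exponentiating the negative Gaussian-Wasserstein distance. The sketch below follows that published formulation (Wang et al., 2021); how this paper weights the term inside the YOLOv8 loss is not reproduced, and the constant C is dataset-dependent and assumed here.

```python
import torch

def wasserstein_box_similarity(b1, b2, C=12.8):
    """Normalized Wasserstein similarity between (..., 4) box tensors
    in (cx, cy, w, h) form: exp(-W2 / C), with the boxes treated as
    Gaussians N(center, diag(w^2/4, h^2/4))."""
    d2 = ((b1[..., 0] - b2[..., 0]) ** 2 +
          (b1[..., 1] - b2[..., 1]) ** 2 +
          ((b1[..., 2] - b2[..., 2]) / 2) ** 2 +
          ((b1[..., 3] - b2[..., 3]) / 2) ** 2)
    return torch.exp(-torch.sqrt(d2) / C)
```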
Objective: To design a distance measurement method for the seated forward bend test using binocular stereo vision technology, and to verify its accuracy. Methods: A binocular camera acquired test images of the subjects; OpenPose, an open-source framework for human pose estimation, intelligently detected the key points of the hands, and the 3D coordinates of the key points were reconstructed to complete the distance measurement. Results: A total of 240 sets of binocular images of seated forward bending were collected; the maximum error was 2.1 mm, the minimum error 0.1 mm, the average error within 0.5 mm, and the standard deviation 0.29. Conclusion: The method improves the accuracy of seated forward bend measurement and can be applied to measuring the seated forward bend in school physical fitness tests.
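After OpenPose locates the same hand keypoint in both rectified views, depth follows from the classic stereo relation Z = f·B/d, where d is the horizontal disparity. A minimal sketch under the assumption of a calibrated, rectified pair; keypoint detection and calibration are assumed done.

```python
import numpy as np

def triangulate(pt_left, pt_right, fx, fy, cx, cy, baseline):
    """3D position of a matched keypoint from a rectified stereo pair.
    pt_* are (u, v) pixel coordinates; intrinsics and baseline come
    from calibration."""
    disparity = pt_left[0] - pt_right[0]  # positive for points in front
    Z = fx * baseline / disparity
    X = (pt_left[0] - cx) * Z / fx
    Y = (pt_left[1] - cy) * Z / fy
    return np.array([X, Y, Z])
```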
Text recognition for Tibetan historical document images is the automated extraction and identification of text from such images using image processing, computer vision, and natural language processing techniques. Its goal is to analyze, identify, and convert the text in traditional Tibetan historical documents into editable, searchable, and analyzable forms that readers, researchers, and scholars can study, preserve, and pass on. With the rapid development of artificial intelligence, the effectiveness and performance of Tibetan historical document text recognition have advanced significantly in recent years. However, low image quality, non-standard layouts, and large inconsistencies in font style persist in this field, so the accuracy and generality of current recognizers remain low. This article first surveys the basic knowledge of Tibetan historical documents, the research objects of the field, to better summarize prior work and support future research. It then organizes, categorizes, summarizes, and introduces methods that integrate deep learning with conventional techniques around four subtasks: dataset construction, text detection, text recognition, and layout analysis, collating the relevant statistics and evaluation metrics and summarizing the state and progress of the research. Finally, future technical trends are projected based on the primary open problems in the field that require immediate attention.
To address the limitations of gait recognition based on a single feature, complementary dynamic and static information is integrated for more accurate identification. First, the Procrustes mean shape is used to extract static features of the gait silhouette contour. Then the gait energy image (GEI) is computed and passed through a Fan-Beam transformation, and two-dimensional principal component analysis reduces the feature space dimensionality to obtain frequency-domain dynamic features of the moving target. Finally, the Euclidean distances of the two feature types are fused to produce the ultimate recognition outcome. Experimental verification on Dataset B from the Chinese Academy of Sciences (CASIA-B) demonstrates the expected recognition performance.
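The GEI at the heart of the dynamic branch is simply the pixel-wise mean of aligned binary silhouettes over a gait cycle (Han & Bhanu, 2006); the Fan-Beam and 2DPCA steps described above operate on this image.

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Average a list of aligned, binarized (uint8, 0/255) silhouette
    frames into a single float GEI in [0, 1]."""
    stack = np.stack([s.astype(np.float32) / 255.0 for s in silhouettes])
    return stack.mean(axis=0)
```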
Currently, target detection technology is becoming mature, but detecting small targets remains a challenging area of research. To address the issues of small targets on water surfaces, which have less feature information, small coverage areas, and are more prone to occlusion and missed detections, this paper introduces a water surface small target detection model named DSW-YOLOv7. First, the backbone network incorporates deformable convolutions with multiple groups and shared convolutional weights to overcome the shortcomings of insufficient sampling in fixed rectangular structures, expanding the receptive field and enhancing the ability of the model to focus on targets of different sizes. Subsequently, the SimAM attention module is used to enhance network responsiveness to smaller targets while reducing noise effects. Finally, the WIoU v3 loss function is employed, incorporating a dynamic focusing mechanism to boost both the convergence speed and the regression precision of the model. Experimental comparisons are conducted on the FloW-Img sub-dataset publicly released by Orcauboat. The results indicate that the mAP50 of the DSW-YOLOv7 network model can reach 86.5%, which is a 6.4% enhancement relative to the baseline model. Moreover, the detection speed is increased by 19.8%. For images depicting small targets, ultra-small targets, and densely concentrated scenarios, the DSW-YOLOv7 network model shows significant improvement in terms of false positives and missed detection compared to the original network.
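SimAM is parameter-free and fits in a few lines: each activation is weighted by an energy term measuring its deviation from the channel mean, passed through a sigmoid gate. This is the standard published formulation (Yang et al., 2021), shown for reference rather than the paper's integrated network.

```python
import torch

def simam(x, e_lambda=1e-4):
    """Parameter-free SimAM attention on an (N, C, H, W) tensor."""
    n = x.shape[2] * x.shape[3] - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
    v = d.sum(dim=(2, 3), keepdim=True) / n       # channel variance
    e_inv = d / (4 * (v + e_lambda)) + 0.5        # inverse energy
    return x * torch.sigmoid(e_inv)
```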
Image Classification and Feature Extraction Technology
Small sample (few-shot) learning aims to address the scarcity of labeled data and to improve model classification performance when labeled data are limited. This paper first introduces small sample learning, then presents and compares transfer learning, incremental learning, and meta-learning methods, and finally sets out the challenges facing small sample learning and future research directions. With continued technical progress and deeper research, image classification based on small sample learning is expected to provide more efficient and accurate solutions for practical applications.
Multi-stage classification for lung lesion detection on CT images employs a hierarchical approach with sequential stages for accurate identification and classification. The methodology integrates medical image processing techniques, including segmentation, feature extraction, feature selection, and classification, to improve the detection performance and reliability of lung lesion diagnosis. Medical imaging is advancing rapidly and transforming prevention, diagnosis, and treatment; in lung cancer diagnosis, computed tomography (CT) scans play a crucial role, and accurate identification of masses is essential, since misdiagnosis can lead to incorrect treatment. Detecting and delineating masses within lung tissue is a critical diagnostic challenge. In this work, a segmentation system based on image processing techniques is applied for detection, and a novel lung cancer detection algorithm is presented and validated through simulation on CT images using multilevel thresholding. The proposed technique consists of segmentation, feature extraction, and feature selection and classification; after feature extraction, the features carrying useful information are selected. Eventually, the output lung cancer image is obtained with 96.3% accuracy and 87.25%. The purpose of feature extraction in the proposed approach is to transform the raw data into a form more usable for subsequent statistical processing. Future work will apply the current feature extraction method to obtain more accurate result images with further details, enabling machine vision systems to recognise objects in lung CT scan images.
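Multilevel thresholding of the kind described is available off the shelf, for example scikit-image's multi-Otsu. Below is a sketch of the segmentation stage alone; the class count is an assumption, and the downstream feature extraction, selection, and classification are not reproduced.

```python
import numpy as np
from skimage.filters import threshold_multiotsu

def segment_lung_ct(slice_img, classes=3):
    """Partition a 2D CT slice into `classes` intensity regions using
    multi-Otsu thresholds; returns an integer label map."""
    thresholds = threshold_multiotsu(slice_img, classes=classes)
    return np.digitize(slice_img, bins=thresholds)
```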
Few-shot image classification involves classifying a large number of unlabeled samples using only a limited number of labeled ones. Previous approaches commonly fit an efficient initial embedding network into the meta-learning process, which strongly influences model performance, but they do not address the overfitting that arises in few-shot learning (FSL) from the scarcity of available data, and traditional convolutional neural networks struggle to extract information effectively from such limited samples. This paper therefore proposes AmdimNet as the embedding network: by maximizing mutual information among samples, it captures detailed feature information from each individual sample within this constrained setting. Additionally, we apply simple data augmentation to the support set to increase its size and mitigate overfitting, and we adopt a suitable evaluation metric to enhance classification accuracy. Experimental results demonstrate significant improvements achieved by our proposed method on mainstream few-shot image classification benchmark datasets.
Oracle bone script is an important form of ancient Chinese writing, and its study is significant for understanding ancient society, culture, and the development of language. With technological advancements, particularly in computer vision and deep learning, the digital study of oracle bone script has become feasible. This research aims to enhance the efficiency of automatic segmentation and recognition of oracle bone script images through deep learning technology. First, this paper investigates the unique challenges presented by oracle bone script images, such as dotted noise, artificial textures, and intrinsic patterns, and applies image preprocessing techniques such as gray-scaling, binarization, and noise reduction to optimize image quality. Second, convolutional neural networks (CNNs) are employed for image segmentation to accurately isolate individual characters within the oracle bone script images. Then, recurrent neural networks (RNNs) are used to automatically recognize the segmented characters, achieving the conversion from images to text. Finally, tests on original rubbings validate the model's high accuracy and reliability in automatic single-character segmentation and recognition. This study not only improves the level of automation in the image processing of oracle bone script but also provides methods and references for the image processing of other ancient artifacts.
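The preprocessing chain named above (gray-scaling, denoising, binarization) maps directly onto standard OpenCV calls. Parameter choices here are illustrative, not the paper's tuned values.

```python
import cv2

def preprocess_rubbing(path):
    """Load an oracle bone rubbing image, suppress dotted noise with a
    median filter, and binarize with Otsu's threshold."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    denoised = cv2.medianBlur(gray, 3)
    _, binary = cv2.threshold(denoised, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```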
During the evaporation of an ink droplet, solutes readily deposit as a "coffee ring" because the solvent evaporates at uneven rates across the droplet. The Marangoni effect can effectively suppress the coffee ring phenomenon and improve film uniformity. In this paper, multi-field coupled simulation in COMSOL models the heat conduction of different solvents, such as water, ethanol, ethyl ether, and ethyl acetate, under different temperature differences. The simulation analysis indicates that the Marangoni effect is related to solvent volatility; to enhance the Marangoni effect and suppress the coffee ring phenomenon, solvents of low volatility should be preferred.
This research explores the challenges and problems deep learning faces in image style transformation. Conventional neural networks often produce stylized simulated images with blurred edge margins, texture details that lack three-dimensionality, and distorted lines. We investigate applying convolutional neural networks combined with semantic segmentation to image style transformation, with the aim of generating higher-quality image data. Neural style transfer is combined with FCN-CRF semantic segmentation to obtain the corresponding binary masks for each image; the content image and its masks are fed into the CNN for network analysis to perform the style transformation, generating an initial simulated image in a full embroidery style. Finally, the edge-contour-enhanced image is integrated with the embroidered fabric and the initial simulated image, yielding a superior visual result. With this optimization, the layering of foreground and background in the stylized result is enhanced, and the edge definition and clarity of the style simulation are improved, creating a better visual performance than traditional algorithms.
When images are contaminated by heavy noise and contain complex features, targets are difficult to segment accurately, and traditional CNN segmentation methods struggle to fully extract detail information. This paper therefore proposes a segmentation network based on an improved MaskFormer. A lightweight mask attention mechanism replaces the Transformer decoder's attention, constraining cross-attention to the foreground region of each predicted mask, which reduces interference from contaminated regions and enhances the extraction of local features. GAMS, a multi-scale feature fusion module based on a gating mechanism, captures the semantic information of the image at different scales and improves the model's feature discrimination. Swapping the order of self-attention and lightweight mask attention reduces network computation and improves training efficiency. In segmentation experiments on the ADE20K and COCO-Stuff-10k datasets after noise contamination, the improved MaskFormer obtains the best values of evaluation metrics such as MIoU, ssMIoU, and msMIoU, outperforming the other networks compared.
Investigation has shown that the Qin bamboo slips unearthed in China are severely damaged, deformed, and corroded. The slow progress of Qin bamboo slip restoration has prompted the exploration of artificial intelligence in the domain of image-based text, offering a promising avenue for the automated restoration of ancient texts. This paper proposes an improved context encoder to restore missing parts of Qin bamboo slip character images. One encoder can process images while another handles text, and their encoded representations can be combined to generate answers. In a generative adversarial network, using two encoders can also improve both the generator and the discriminator and stabilize training: one encoder maps the input data into the latent space, while the discriminator uses the other to better distinguish real from generated samples, raising generation quality and training stability.
In medical image analysis, the robustness and accuracy of classification models are paramount, especially under adverse conditions such as data scarcity, class imbalance, and potential adversarial attacks. This paper introduces a novel deep learning architecture employing curvature regularization (CURE) to enhance the robustness of medical image classification on the PATHMNIST dataset. Our approach integrates advanced convolutional techniques including depthwise and dilated convolutions with an attention mechanism, focusing on precise and detailed feature extraction essential for medical diagnostics. We incorporate curvature regularization to stabilize the learning process by controlling the magnitude of the Hessian's eigenvalues, making the model less sensitive to input variations. The effectiveness of our architecture is demonstrated through rigorous testing, including adversarial scenarios using the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD). Results show our model not only achieves superior accuracy compared to traditional architectures like ResNet-18 and ResNet-50 but also maintains higher resilience against adversarial attacks. This robustness is critical for practical deployment in medical image analysis.
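The FGSM evaluation referenced above perturbs each input one step along the sign of the loss gradient. Below is the standard formulation, with the perturbation budget eps as an assumed value; the PGD evaluation iterates the same step with projection.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=8 / 255):
    """One-step FGSM: x' = clip(x + eps * sign(grad_x loss))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```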
At present, visual inspection of millisecond bubble levels must reach micrometer-level accuracy or better, so quality assessment places exceptionally high demands on the pixel count of captured images. High-resolution cameras are costly and still limited in pixel count, making it difficult to meet detection requirements of 1 micron or better per pixel, and merely increasing the camera's pixel count is insufficient to resolve this issue. This paper proposes a real-time acquisition system based on a dual-camera array. The system integrates algorithms such as image correction, fusion, and median filtering to achieve a substantial increase in effective image pixel count. By greatly increasing the pixel count while only minimally increasing hardware cost, the approach raises the visual detection accuracy of millisecond bubble levels and effectively meets industrial production needs. This study independently builds the dual-camera real-time acquisition array, conducts comprehensive testing, and employs 25-megapixel cameras for image capture; through the algorithms, the effective pixel count is boosted to 100 megapixels. Moreover, the system performs consistently, significantly enhancing the precision of visual detection for millisecond bubble levels.
Projection blurring and defocusing are prevalent issues that can significantly degrade the quality and legibility of projected images and visual content. This paper introduces a novel method to address the problem: a deblurring network based on convolution operations and Triplet attention (DeTNet). This dual-pronged design enables the network to effectively extract salient features related to out-of-focus blur while also capturing the crucial interdependencies and interactions across multiple feature dimensions. By modeling both the low-level blur characteristics and the higher-order feature correlations, DeTNet reconstructs sharper, more focused projection outputs. Extensive experiments on the collected datasets thoroughly demonstrate the effectiveness of the proposed approach.
As information transmission has shifted from text to images in recent years, the increase in information dimensionality has driven growing interest in secure and efficient encryption algorithms. This paper proposes a chaotic system that combines a tent map with a time-delay chaotic neural network, and a single neuron is extracted to test the system's chaotic properties and stability. On this basis, randomness tests of the chaotic sequences generated by the system confirm that the new system maintains the chaotic state effectively. In addition, genetic recombination compensates for the limitation that DNA coding alone cannot provide sufficient confusion and diffusion during encryption. Applying the chaotic sequences to genetic recombination and DNA coding, an image encryption algorithm is proposed. The data indicate that the algorithm has excellent anti-attack capabilities, and it passes the NIST and TestU01 randomness evaluations, guaranteeing its sensitivity and randomness. The algorithm is therefore secure and can be used to encrypt color images.
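The tent map component of the hybrid system is a one-line recurrence, x_{k+1} = μ·min(x_k, 1 − x_k), which is chaotic for μ near 2. Below is a sketch of keystream generation from the tent map alone; the time-delay neural network and the DNA-coding/recombination stages are omitted, and the byte quantization is an assumption.

```python
import numpy as np

def tent_map_keystream(x0, mu, n):
    """Iterate the tent map from x0 in (0, 1) with parameter mu
    (chaotic near mu = 2) and quantize the orbit to bytes."""
    seq = np.empty(n)
    x = x0
    for k in range(n):
        x = mu * min(x, 1.0 - x)
        seq[k] = x
    return (seq * 255).astype(np.uint8)  # bytes for XOR-style diffusion
```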
In this research, we introduce an innovative saliency detection algorithm comprising three essential steps. First, leveraging fully convolutional networks with aggregation interaction modules, we generate an initial saliency map. Second, we extract hand-crafted and deep features to represent the image and use a manifold ranking method to construct saliency maps. Finally, by integrating the outcomes of the preceding stages, we generate the final saliency map. Experimental findings demonstrate that our method surpasses twelve state-of-the-art saliency detection techniques in terms of precision, recall, F-measure, and MAE.
Supervised image captioning methods have made significant progress, but high-quality human-annotated image-text paired datasets are costly to collect. Recently, pre-trained vision-language models such as CLIP have shown exceptional performance in cross-modal association, enabling novel solutions for image captioning such as zero-shot captioning through purely textual training. However, the modality gap between image and text embeddings presents an obstacle to cross-modal alignment in text-only captioning. Moreover, insufficient visual understanding and over-reliance on textual training data lead to hallucinations (e.g., object misidentification and inaccurate object counts), resulting in irrational captions. To tackle these issues, this paper presents RAPCap, a retrieval-augmented prompting method for text-only image captioning. Specifically, RAPCap enhances the language model's understanding of an image by incorporating similar captions obtained through retrieval augmentation, thereby alleviating hallucinations. During inference, RAPCap translates the image into the textual space to bridge the modality gap. Experimental results demonstrate that RAPCap achieves new state-of-the-art performance on Flickr30k and performs competitively on MSCOCO compared with previous zero-shot captioning methods.
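The retrieval-augmentation step can be pictured as a nearest-neighbor lookup in CLIP embedding space: the corpus captions closest to the image embedding become the prompt context for the language model. A minimal sketch assuming L2-normalized CLIP embeddings computed offline; the function and variable names are hypothetical, not RAPCap's actual interface.

```python
import torch

def retrieve_prompt_captions(image_emb, caption_embs, captions, k=4):
    """Return the k corpus captions whose embeddings have the highest
    cosine similarity to the image embedding (all inputs normalized).
    image_emb: (d,); caption_embs: (N, d); captions: list of N strings."""
    sims = caption_embs @ image_emb
    topk = torch.topk(sims, k).indices
    return [captions[i] for i in topk]
```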
Current image retrieval technologies rely primarily on features such as color, texture, and shape, but their search speed and accuracy still fall short of user demands. In this paper, the Scale Invariant Feature Transform (SIFT) algorithm is used for image retrieval while the Locality Preserving Projection (LPP) method reduces the dimensionality of the descriptors, decreasing the number of feature points and improving the real-time performance of feature point matching. Furthermore, an enhanced approximate nearest neighbor method is applied during matching, incorporating a secondary verification mechanism to confirm candidate point matches and improve matching accuracy. A set of 300 images across five categories serves as the target image database for the experimental research. Experiments conducted under varying lighting and scaling conditions validate the improved real-time performance and matching accuracy of the enhanced SIFT algorithm in image retrieval, highlighting its potential utility in practical applications.
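One common form of secondary verification is a Lowe ratio test followed by a mutual cross-check, sketched below with OpenCV; the paper's exact verification rule may differ from this stand-in.

```python
import cv2

def match_sift(des_query, des_target, ratio=0.75):
    """Ratio-test SIFT matching with a mutual cross-check: keep only
    matches that survive the ratio test in both directions."""
    bf = cv2.BFMatcher(cv2.NORM_L2)
    fwd = [m for m, n in bf.knnMatch(des_query, des_target, k=2)
           if m.distance < ratio * n.distance]
    bwd = {(m.trainIdx, m.queryIdx)
           for m, n in bf.knnMatch(des_target, des_query, k=2)
           if m.distance < ratio * n.distance}
    return [m for m in fwd if (m.queryIdx, m.trainIdx) in bwd]
```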
In the computer vision field, style transfer synthesizes the style of one image with the content features of another to generate a new migrated image, creating a unique graphic art image. In this paper, image style transfer based on the VGG19 network model and on a pre-trained Pix2Pix network model was investigated comparatively, with the COCO dataset selected for the experiments. The experiments used the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) to evaluate the quality of the style-transferred images. The PSNR of the migrated image from the VGG19 network model is 12.6776 with an SSIM of 0.2981, while the PSNR of the migrated image from the Pix2Pix network model is 12.9153 with an SSIM of 0.3182. The results indicate that the migrated image generated by the Pix2Pix network model is better than that of VGG19 with respect to image quality.
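For reference, both metrics are available off the shelf; a minimal sketch, assuming scikit-image 0.19 or later and two hypothetical image files, is:

```python
import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# content.jpg / stylized.jpg are hypothetical file names.
ref = cv2.imread("content.jpg")
out = cv2.imread("stylized.jpg")
out = cv2.resize(out, (ref.shape[1], ref.shape[0]))  # metrics need equal shapes

psnr = peak_signal_noise_ratio(ref, out, data_range=255)
ssim = structural_similarity(ref, out, channel_axis=-1, data_range=255)
print(f"PSNR = {psnr:.4f} dB, SSIM = {ssim:.4f}")
```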
Laser remote sensing technology, known for its high accuracy and excellent resolution, has found widespread application in fields such as geographic information systems, environmental monitoring, and urban planning. As the amount of remote sensing data increases, efficiently and accurately detecting multiple targets in laser remote sensing images becomes challenging. In the present work, a machine learning-based multi-target detection method for laser remote sensing images is developed, in which transfer learning, the selection of feature extraction networks, and the optimization of the Region Proposal Network (RPN) are explored.
An efficient lightweight forest fire risk prediction algorithm is proposed for extracting fire risk features from remotely sensed images for fast and accurate prediction of forest fire risk. First, a lightweight convolutional attention module is introduced to improve the lightweight bottleneck convolutional kernel in the main module of EfficientNet-B0; then the convolutional layers in the network are optimized with a global non-local convolution module to reduce parameters and computation. On a test set of 88,061 remote sensing images of forest scenes labeled as zero, low, medium, and high forest fire risk, the recognition accuracy of the proposed method was 78.38%, an increase of 4.36% over the original network; this was 4.99%, 7.82%, 3.57%, 5.97%, 4.96%, and 7.93% higher than the classic networks VGG16, ResNet50, DenseNet121, ConvNeXt, MobileNetV1, and EfficientNetV2, respectively. The parameter volume of the proposed model is 4.4M, 0.9M less than the selected backbone EfficientNet-B0, compared with the other networks, whose parameter volumes are 44.10M, 21.16M, 3.58M, 84.6M, and 17.05M. The results show that the proposed method is accurate and fast in forest fire prediction, while the prediction model remains lightweight with few network parameters.
Target tracking that relies solely on visible images can be unreliable under adverse lighting conditions. In contrast, infrared images, which capture thermal radiation, remain unaffected by such factors. This complementarity has spurred the development of numerous RGB-T tracking methods. However, existing approaches often neglect effective feature extraction across different levels and overlook the computational demands of self-attention mechanisms. To address these challenges, we propose SiamEFM, a Siamese network-based RGB-T tracking algorithm. Our method leverages a Recurrent Cross-Circular Attention (RCCA) mechanism to enhance pixel-level representations, integrates modal information through a feature fusion network, and employs layer attention to consolidate features across levels. Experimental validation on the GTOT dataset demonstrates the competitive performance of our tracker.
To address the low detection precision and relatively complex models of existing UAV aerial image target detection algorithms, a small target detection algorithm that improves YOLOv8s is proposed. First, a new C2f-Faster module is constructed in the feature extraction network using the Partial Convolution (PConv) of the FasterNet Block module; this approach optimizes the preservation of feature information related to small targets while minimizing network parameters and computational overhead. Second, a small target detection head is introduced in the Neck part to further improve small target detection capability. The improved algorithm, FD-YOLOv8s, is evaluated on the VisDrone2019 dataset and achieves a detection accuracy of 41.1%, improving detection accuracy by 2.6 percentage points and decreasing the parameter count by 1.8 points in comparison with the YOLOv8s algorithm. Better detection performance is also obtained compared with other mainstream target detection methods.
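PConv, as described in the FasterNet work, convolves only a fraction of the channels and passes the rest through untouched, which is where the parameter savings come from; a minimal PyTorch sketch (the channel split ratio is illustrative) is:

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: apply a spatial conv to the first dim/n_div
    channels and leave the remaining channels unchanged."""
    def __init__(self, dim, n_div=4, kernel_size=3):
        super().__init__()
        self.dim_conv = dim // n_div
        self.dim_untouched = dim - self.dim_conv
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_untouched], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)
```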
This study introduces a Diffuse Optical Tomography (DOT) reconstruction method that employs both L2 and L∞ fidelity terms together with total variation (TV) regularization, solved by the ADMM method. By incorporating L∞ and L2 fidelity to mitigate the effects of uniform and Gaussian noise, along with TV regularization to enhance edge definition and structural integrity, the method effectively addresses the challenges associated with the ill-posed nature of DOT reconstruction. Numerical results show that the approach can reconstruct satisfactory images under different noise levels.
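The abstract does not state the exact objective being minimized; a plausible formulation consistent with the description, with $A$ the (linearized) forward operator, $b$ the measurements, and $x$ the optical image to reconstruct, is

\[
\min_{x}\;\frac{\alpha}{2}\,\lVert Ax-b\rVert_2^2 \;+\; \beta\,\lVert Ax-b\rVert_\infty \;+\; \lambda\,\mathrm{TV}(x).
\]

ADMM handles the two non-smooth terms by variable splitting, for example

\[
\min_{x,\,z,\,d}\;\frac{\alpha}{2}\lVert z\rVert_2^2+\beta\lVert z\rVert_\infty+\lambda\lVert d\rVert_1
\quad\text{s.t.}\quad z=Ax-b,\;\; d=\nabla x,
\]

so that each iteration reduces to a linear least-squares update for $x$ and closed-form proximal steps for $z$ and $d$ (the proximal operator of the $\ell_\infty$ norm follows from projection onto an $\ell_1$ ball via Moreau decomposition).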
In this paper, we propose a real-time, high-precision virtual meeting system based on infrared structured light 3D reconstruction, enabling immersive interaction for users within a virtual environment. The focus of the research is the application of structured light technology in developing an efficient VR virtual meeting system. Starting from the background of VR technology in virtual meeting applications, current challenges, and the potential advantages of structured light technology, we establish the theoretical foundation for the study. The architecture and implementation of the system are detailed, with particular emphasis on the application of structured light technology in spatial scanning and participant tracking. Methods for assessing system performance, including user experience and efficiency, are introduced. Performance test results highlight the significant role of structured light technology in enhancing VR meeting experiences and system efficiency, supported by quantitative analysis through charts and images. The overall performance of the system is analyzed, discussing the role of the technology, comparison with existing solutions, potential applications, and limitations. The paper concludes by summarizing the main achievements of the VR virtual meeting system, emphasizing the importance of structured light technology in creating compelling VR meeting experiences, and proposing directions for future research.
Ensuring pavement quality and boosting the efficiency of road maintenance rely heavily on the ability to detect cracks automatically. To address the shortcomings of existing methods in attending to crack features and the tendency of deep feature maps to lose crack detail, this paper proposes a network model that integrates a side optimization strategy with attention mechanisms, using VGG16 as the backbone network. First, to enhance the network's responsiveness to crack features, a lightweight shuffle attention module is incorporated after the backbone's middle and high-level convolution layers. Second, to further enhance the ability to capture crack features, a corresponding attention module is embedded in the side output of each stage. Finally, a spatial separable pyramid module is introduced, coupled with a residual attention fusion module, to refine the deep feature maps and better restore intricate crack details. The side network assists in generating the final prediction by fusing features at multiple levels. The model uses a weighted cross-entropy loss function, and the trained network can accurately locate cracks in complex backgrounds. To verify its validity, the proposed method was compared with six methods on two publicly available datasets. The algorithm achieves good results, with an F-score of 87.19%.
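The paper's exact weighting is not given; a common class-balanced variant of weighted cross-entropy for sparse crack masks, sketched in PyTorch, up-weights the rare positive pixels by the negative-to-positive ratio:

```python
import torch
import torch.nn.functional as F

def class_balanced_bce(logits, target):
    """Weighted cross-entropy for sparse crack masks: positive (crack)
    pixels are up-weighted by the negative-to-positive pixel ratio.
    `target` is a float mask of 0s and 1s with the same shape as `logits`."""
    pos = target.sum()
    neg = target.numel() - pos
    pos_weight = neg / pos.clamp(min=1.0)   # avoid division by zero
    return F.binary_cross_entropy_with_logits(logits, target, pos_weight=pos_weight)
```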
Cattail weaving products are made of a green and renewable material and have been highly favored by the market in recent years. Cattail weaving is a traditional folk art in which a variety of household items are hand-woven from cattails. Leizhou City in Guangdong Province, China is famous for its unique cattail weaving technique. However, the Leizhou cattail weaving products on the market are relatively unoriginal and homogeneous, which hinders further development. In order to broaden the application scenarios of cattail weaving and inject new vitality, this study analyzes, summarizes, and refines the design patterns in the cattail weaving process with the help of digital technology. Using the Grasshopper plug-in in Rhinoceros software, digital modeling and parametric design of cattail weaving products are developed in depth, and a series of cattail weaving lamps and lanterns are designed. By studying digital protection and dissemination methods, new vitality is injected into Leizhou cattail weaving products, promoting their development and inheritance in the market.
To adapt to the demands of power dispatching, it is necessary to improve the training effect of power dispatchers. Therefore, a virtual training method for power dispatchers based on knowledge transfer and multi-task learning is proposed. After analyzing the specific composition of the power transmission and transformation environment, each sequence network is taken as the basic unit and a separate power flow characteristic analysis is carried out by means of multi-task learning; the three-phase power flow state information of the transmission network is determined through the change law of the phase sequence; and a 3D model of the power dispatcher training scene, which serves as the implementation carrier of the virtual training, is constructed by means of knowledge transfer. In the test results, under the proposed virtual training method, the accuracy of the trainees' execution of business skills in each major business scenario during simulation training exceeds 95%.
Augmented Reality (AR) has become an increasingly used technology to support and enhance the development of cultural tourism products. In this type of product, experience design is critical to the effectiveness of the narrative. The case study presented in this paper examines how mobile AR technology can be combined with experience design in tourism cultural and creative stamp products, taking Doumen Old Street culture as an example to illustrate specific design ideas for combining AR technology with cultural and creative products. Virtual cultural buildings are built through three-dimensional modeling, image processing technology is applied to the seal design of the cultural and creative products, and a supporting app is created with the Unity engine and Vuforia engine, enhancing visitors' interest in and willingness to buy AR cultural and creative seal products.
Agriculture, an integral part of human existence, plays a central role in the economic and strategic sphere of every country. Crop productivity is of concern to all, as pests and diseases can significantly reduce yields or even kill crops, so detecting such threats is of utmost importance. With advances in deep learning, the application of object detection algorithms to identify and monitor these problems has gained momentum. The present work uses the improved YOLOv8 algorithm as a foundation and introduces extensions that show improved performance on IP102, a widely recognized public dataset of insect pest images.
To address the difficulty of segmenting cellular lung CT images caused by complex backgrounds, varied shapes, and blurred tissue boundaries, an ARIFNet model with an encoder-decoder structure is proposed. First, depthwise separable convolution, with fewer parameters, is used in the decoder to reduce the complexity of the model and improve segmentation efficiency. Second, a reweighted skip connection module (RWC) acting on the skip connections is constructed to adjust pixel weights according to the spatial locations of features, suppressing the expression of irrelevant information in high-level features. Finally, a multi-scale information fusion module (MIF) is added to fuse multi-scale information and provide richer contextual information. Experiments showed that the ARIFNet model segments lung lesion regions more accurately, with the segmentation indexes IoU, mIoU, Dice, and mDice reaching 89.75%, 94.76%, 94.09%, and 96.99%, respectively.
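For reference, a standard depthwise separable convolution block in PyTorch looks as follows; the normalization and activation choices are illustrative, not necessarily ARIFNet's:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv (one filter per channel) followed by a 1x1 pointwise
    conv; the parameter count drops from k*k*Cin*Cout to k*k*Cin + Cin*Cout."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, k, padding=k // 2,
                                   groups=c_in, bias=False)
        self.pointwise = nn.Conv2d(c_in, c_out, 1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```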
With the development of Computer-Aided Diagnosis (CAD) systems and Deep Learning (DL) technologies, significant advances have been made in the automatic analysis of medical images, particularly chest X-rays (CXR). However, the scarcity of annotated medical image data poses challenges to the performance and generalization ability of machine learning algorithms. Recent studies have addressed this issue with Visual-Language (VL) models such as MedCLIP, which leverage image-text pairs to reduce data acquisition costs and enhance training efficiency. In this paper, we focus on CLIP-style VL models for medical image analysis and implement an automatic lung diagnostic system based on MedCLIP: the ChexPert-MedCLIP Integrated Radiography Analyzer (CMIRA). The system utilizes advanced DL algorithms and a robust understanding of text-image correlations to achieve rapid and accurate analysis of chest X-ray images, aiming to provide clinicians with more reliable diagnostic support tools and promote early detection and treatment of disease.
In order to more effectively extract the identification images of road settlement monitoring points of autonomous rail rapid transit (ART) from actual complex scenes, this paper proposes a binarization processing method based on an optimized Otsu algorithm. First, the color-coded target image is converted into a grayscale image using the weighted average method to reduce the computational load and improve efficiency; second, Gaussian filtering is performed on the grayscale image to effectively remove Gaussian noise and improve image quality; finally, combining the morphological closing operation with the improved Otsu algorithm, the filtered image is binarized to eliminate boundary effects and accurately extract the identification of monitoring points. The results show that the proposed method has higher stability and accuracy in processing the coded target images of ART road settlement monitoring points.
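Under the assumption of a standard OpenCV toolchain (the file name is hypothetical and the paper's specific Otsu optimization is not reproduced), the described pipeline is roughly:

```python
import cv2

img = cv2.imread("target.png")                       # hypothetical coded target image
# Weighted-average grayscale: 0.299 R + 0.587 G + 0.114 B.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Gaussian filtering to suppress Gaussian noise.
blur = cv2.GaussianBlur(gray, (5, 5), 0)
# Global Otsu threshold on the filtered image.
_, binary = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# Morphological closing to clean up boundary effects.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
cv2.imwrite("binary.png", closed)
```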
To address the issues of low accuracy and high computational demand in road defect detection, this study puts forward a lightweight detection network, YOLOv8-DATB, built upon an enhanced YOLOv8n. Initially, the C2f module in the backbone is substituted with C2f-DAT to effectively extract key regional features in road defect images while significantly reducing computational load. To bolster the model's capacity for handling features across various scales and enhance fusion efficacy, a modified, lightweight Bi-directional Feature Pyramid Network (BiFPN) is integrated into the network's neck. This inclusion aims not only to improve the model's precision on smaller objects but also to curb the increase in model parameters, achieving a balance between efficiency and performance. Compared to the original algorithm, YOLOv8-DATB improves accuracy and mean average precision (mAP) by 4.9% and 1.1%, respectively, while reducing parameters by 33.7% and computational load by 11.1%. Additionally, it shows a significant increase in detection accuracy compared with other mainstream models.
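For context, the weighted fusion step that BiFPN adds over a plain FPN can be sketched as fast normalized fusion; this is the generic scheme from the BiFPN literature, not the paper's modified lightweight variant:

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fast normalized fusion: learnable non-negative weights per input,
    normalized to sum to one before combining feature maps."""
    def __init__(self, n_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, inputs):  # list of feature maps with equal shapes
        w = torch.relu(self.w)
        w = w / (w.sum() + self.eps)
        return sum(wi * x for wi, x in zip(w, inputs))
```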
The Chinese chainmail pattern has a rich historical lineage, constituting a mathematically precise and aesthetically ordered geometric manifestation originating from ancient chainmail armor, and it stands as an exemplar among traditional Chinese patterns. However, contemporary research on the Chinese chainmail pattern confronts challenges associated with insufficient exploration of its cultural connotation, deficient parametric expression methods, and a notable absence of modern design transformation and innovation. In the era of booming digitization and the Internet, integrating computer technology into the inventive redesign of geometric traditional patterns, with the Chinese chainmail pattern a striking example, has the potential to breathe life into these ancient patterns within the framework of modern design. This not only revitalizes traditional patterns but also makes them more prolific and easier to communicate and apply. This article takes the developmental trajectory of the Chinese chainmail pattern as its starting point. By methodically enhancing its reconstruction and iterating digitally using parametric techniques, the study attempts to amalgamate the traditional Chinese chainmail pattern with contemporary design paradigms and applies the result to modern product design, offering novel perspectives and innovative approaches for modern design transformation.
Ships are usually affected by various weather conditions, and dealing with bad weather is a challenge that ocean-going ships must face. The radio meteorological facsimile receiver is an important piece of equipment with which a ship obtains weather data for the sea area where it is located; the weather map it receives gives the weather situation at a certain time in that area. This paper proposes a detection and identification method for general pressure systems on weather charts: the various elements contained in general pressure systems are detected and identified through image processing and machine learning methods. Five weather charts were randomly selected for testing, containing 51 groups of general pressure systems; the identification results were completely correct for 36 groups, partially correct for 11 groups, and completely incorrect for 4 groups. This method can help the officer intuitively judge the weather situation of the ship's sailing area and provide support for sailing decisions.
To provide more reliable data support for road maintenance, this paper proposes a DeepLabv3+ semantic segmentation algorithm that integrates dense connections and attention mechanisms. Because the boundaries of weak cracks are indistinct, convolutional kernels are difficult to match with crack images, resulting in the loss of crack information in some segmentation results; therefore, a dense connection mechanism is introduced into the baseline network. To address segmentation errors caused by interference such as shadows and zebra crossings in crack images, two attention mechanisms are further integrated. Experimental results show that segmentation performance improves significantly after introducing the dense connection mechanism, with the mean Intersection over Union (MIoU) reaching 80.2%, a 4.7% improvement over the baseline network, while segmentation speed is also superior. The integration of dense connections and attention mechanisms further improves segmentation performance, with an MIoU of 83.9%, 8.4% higher than traditional methods, and a pixel accuracy (PA) of 98.8%, 6.7% higher than traditional methods. The method therefore outperforms traditional methods in both the segmentation quality and the segmentation speed of road crack images.
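MIoU, the headline metric here, averages per-class IoU computed from a pixel-level confusion matrix; a small NumPy sketch:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean Intersection over Union from a pixel-level confusion matrix.
    `pred` and `gt` are integer label arrays of the same shape."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (gt.ravel(), pred.ravel()), 1)
    inter = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = inter / union
    return np.nanmean(iou)  # classes absent from both pred and gt are skipped
```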
Forests, as vital ecological resources of the Earth, frequently suffer devastating wildfires that pose enormous threats to the environment, ecosystems, and human life. To mitigate the losses caused by forest fires, it is imperative to enhance fire prediction and control measures, with video surveillance technology playing a pivotal role in fire watch. Focusing on the extraction of prominent features from wildfire videos and the application of deep learning techniques, the proposed system employs unmanned aerial vehicles (UAVs) to acquire navigational video data over forest areas, integrating multiple payloads for real-time forest fire detection. The project enhances methods for identifying forest fire smoke in videos, ensuring a stable and reliable forest fire monitoring system alongside real-time data acquisition.
With the rapid development of information technology and smart classrooms, the accuracy of learner behavior recognition in the classroom has become increasingly important. When behaviors are recognized by means of information technology, including deep learning, it is often difficult to accurately identify learners' actions due to limb occlusion, motion blur, and camera position. Meanwhile, frequent classroom management by teachers interrupts class and disrupts the overall fluency of teaching. To address this problem, this work uses an improved pose estimation method based on single-stage keypoint and pose detection to classify and judge learners' movements in the classroom under conditions such as limb occlusion and motion blur, thereby assisting the management of the classroom teaching process. Finally, the results of the proposed behavior recognition method are compared with labeled action image data, and the experimental data show that the method can, to a certain extent, meet the needs of learner behavior recognition in the classroom.
Fire detection in people's daily life can effectively prevent property losses and casualties. On this basis, an improved smoke and fire identification algorithm based on the YOLOv5s model is proposed. The feature extraction module of YOLOv5s was redesigned: the CA attention module was introduced into the C3 module to build a new feature extraction module, C3CA, which enhances feature extraction capability. The neck Conv module is replaced with the AKConv module, which reduces the parameters required at run time, speeds up inference, and improves the timeliness of the model. For the feature fusion part of the network, BAFPN, which combines repeated bidirectional cross-scale connections and a weighted feature fusion mechanism on the basis of BiFPN with AVCStem modules, can better focus on the features of small and medium-sized targets in different situations and complex backgrounds, effectively addressing the difficulty of determining edge extent and the fuzzy target shapes encountered when detecting objects such as flame and smoke. The experimental results show that the improved model increases the mAP index by 6.9%.
In the field of weather image recognition, recognizing weather categories should consider not only local features but also a global synthesis of multiple features to avoid misjudgment. To address this issue, a weather recognition method is proposed that organically fuses the DenseNet deep neural network with a multi-head attention mechanism. DenseNet is responsible for capturing local features in weather images, while the attention mechanism focuses on the global feature information of the image, so that the fusion of local and global features improves the accuracy of weather recognition. The proposed method can recognize six types of weather: cloudy, haze, rainy, snow, sunny, and thunder. The experimental results show that the model using DenseNet alone achieves 85.82% accuracy on the test set, while the accuracy of the model fused with the attention mechanism increases to 87.45%. This suggests that fusing deep neural networks with attention mechanisms is an effective way to recognize weather categories.
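A hedged sketch of such a fusion, where the layer sizes, pooling, and concatenation scheme are assumptions rather than the paper's architecture, could look like:

```python
import torch
import torch.nn as nn
from torchvision.models import densenet121

class WeatherNet(nn.Module):
    """Sketch: DenseNet features as local cues, multi-head self-attention
    over the 7x7 feature grid as global context, fused by concatenation.
    In practice pretrained backbone weights would be loaded."""
    def __init__(self, n_classes=6):
        super().__init__()
        self.backbone = densenet121(weights=None).features  # B x 1024 x 7 x 7
        self.attn = nn.MultiheadAttention(embed_dim=1024, num_heads=8,
                                          batch_first=True)
        self.head = nn.Linear(2048, n_classes)

    def forward(self, x):
        f = self.backbone(x)                       # local features
        tokens = f.flatten(2).transpose(1, 2)      # B x 49 x 1024
        g, _ = self.attn(tokens, tokens, tokens)   # global context
        local_feat = f.mean(dim=(2, 3))
        global_feat = g.mean(dim=1)
        return self.head(torch.cat([local_feat, global_feat], dim=1))
```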
Transmission line construction progress monitoring is critical for grid security control. In response to the challenge of controlling unplanned operations in transmission line construction, this study proposes a method using satellite remote sensing and UAV technology for construction progress detection and unplanned operation troubleshooting. The study utilized satellite remote sensing to obtain high-resolution images of transmission line construction sites and, combined with intelligent recognition algorithms, realized automatic identification and status monitoring of typical phases such as foundation construction, tower erection, and wire stringing. By intelligent comparison with the planned progress, the method enables accurate investigation of and early warning about unplanned operations, improving the lean management of construction safety. In addition, the study designed the core functions of a transmission line progress management system, including data acquisition, processing, analysis, display, and early warning, providing strong support for monitoring on-site operations of transmission and substation projects. The study not only overcomes the limitations of traditional control means in transmission line construction but also provides an important guarantee for the safe and stable supply of electricity.
Most single-stream Transformer trackers lack a template updating strategy, relying solely on the first frame's image, which leads to model drift as the target's appearance and surroundings change. During template updates, issues such as poor quality, excessive updates, or improper timing can degrade tracking performance. To mitigate these obstacles, we propose DTUCTrack, a robust approach using dynamic template updates based on mean-ensemble selection of high-confidence scores. The approach effectively adapts to changes in the target's visual appearance and maintains stable tracking. Additionally, we introduce a mechanism to control the number of high-quality template candidates, avoiding issues of insufficient quality or excessive candidates. Extensive trials indicate that our approach outperforms baselines and attains state-of-the-art effectiveness on the challenging GOT-10k and OTB100 datasets while maintaining real-time speeds.
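As a loose illustration of confidence-gated template updating with a capped candidate pool (the thresholds and gating rule are assumptions, not DTUCTrack's exact mechanism):

```python
from collections import deque

class DynamicTemplateUpdater:
    """Sketch: a frame becomes a template candidate only if its score beats
    the running mean of recent scores, and the candidate pool is capped."""
    def __init__(self, max_candidates=5, history=30):
        self.candidates = deque(maxlen=max_candidates)  # caps candidate count
        self.scores = deque(maxlen=history)             # recent confidences

    def step(self, frame_crop, confidence):
        self.scores.append(confidence)
        mean_score = sum(self.scores) / len(self.scores)
        # Gate on the mean of recent scores plus an absolute floor (assumed).
        if confidence >= mean_score and confidence > 0.5:
            self.candidates.append((confidence, frame_crop))

    def best_template(self, initial_template):
        if not self.candidates:
            return initial_template  # fall back to the first-frame template
        return max(self.candidates, key=lambda c: c[0])[1]
```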
This paper introduces an innovative classroom teaching platform that utilizes the YOLOv8 algorithm and real-time video surveillance technology to recognize and analyze the facial expressions of students and teachers. By analyzing the attention and emotional states of students during class, the system generates real-time feedback to help teachers adjust their teaching strategies, thereby improving classroom teaching efficiency. The practical application of the platform demonstrates its significant effectiveness in enhancing students' attention, classroom participation, and knowledge mastery. Additionally, the paper proposes further optimizations to the algorithm, hardware upgrades, and application expansion to enhance the platform's functionality and applicability.
Point cloud data often has a large volume and is commonly downsampled to reduce computational load and storage. Common sampling methods such as Random Sampling (RS) and Farthest Point Sampling (FPS) can rapidly sample point clouds, but the sampled points are not related to subsequent tasks. Current task-driven approaches generate sampled point clouds in batches without regard to contextual information; moreover, they ignore the distance between sampled points, which results in dense distributions. To address these issues, we propose a point cloud downsampling network based on Long Short-Term Memory (LSTM), named CA-Net. It uses a three-layer DGCNN with mixed pooling to obtain feature vectors, an LSTM network to capture contextual information, and two sampling loss functions to make the sampled points uniformly distributed. Experimental results on reconstruction and classification tasks demonstrate that the proposed downsampling network significantly outperforms existing methods.
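For comparison, the FPS baseline mentioned above can be written in a few lines of NumPy; it greedily picks the point farthest from those already selected:

```python
import numpy as np

def farthest_point_sampling(points, m):
    """Classic FPS: `points` is an (N, 3) array; returns m sampled points
    that are approximately uniformly spread over the cloud."""
    n = points.shape[0]
    selected = np.zeros(m, dtype=np.int64)
    dist = np.full(n, np.inf)          # distance to the selected set so far
    selected[0] = np.random.randint(n)  # random seed point
    for i in range(1, m):
        d = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        dist = np.minimum(dist, d)      # update nearest-selected distances
        selected[i] = np.argmax(dist)   # pick the farthest remaining point
    return points[selected]
```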
The steel used in thermal power plants undergoes creep aging and performance degradation after long-term service in high-temperature, high-pressure environments, so regular metallographic inspection is required to ensure the safe and stable operation of the equipment. To address the low efficiency and poor repeatability of metallographic structure evaluation, which is easily influenced by human factors, this paper establishes a sample dataset from metallographic inspection images and presents a deep learning-based metallographic structure evaluation method for thermal power steel built on the ConvNeXt-T convolutional neural network. The performance of the model on the validation set was evaluated using a confusion matrix: the accuracy of the model's spheroidization assessment for pearlite was 98.7%, with a precision of 97.3%, a sensitivity of 97.2%, a specificity of 99.1%, and an F1-score of 97.2%. This indicates that the method can accurately assess the metallographic structure of thermal power steel, overcome human factors, improve rating efficiency, and produce objective evaluations. It provides a new method for the intelligent assessment of the metallographic microstructure of thermal power steel, helping the power industry move toward digital and intelligent metallographic analysis.
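The five reported indices all follow from the confusion matrix; for a binary split (spheroidized vs. normal, as an assumed framing) they are:

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, sensitivity, specificity, and F1-score from
    confusion-matrix counts, the five indices reported above."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)       # recall / true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, precision, sensitivity, specificity, f1
```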
Presently, approaches for detecting and suppressing various forms of spoofing interference are developing rapidly. Spoofing interference detection based on spatial information can effectively detect spoofing attacks originating from stationary locations; nevertheless, it has notable limitations, since it requires an antenna array for signal reception, which can be expensive and operationally intricate. To tackle these problems, we propose a spoofing interference detection technique based on a rotating single antenna, which identifies counterfeit signals arriving from a direction that does not match the true direction of arrival. The distinguishing factor is the disparity between the direction of arrival of the spoofing signal and that of the authentic signal: spoofing interference is detected by comparing the estimated arrival azimuth of the signal with the theoretical value obtained from ephemeris information, checking the consistency of the paired azimuth discrepancies between estimated and theoretical values. Moreover, a generalized likelihood ratio test is employed to identify multiple arrival directions of spoofing signals. This paper substantiates and assesses the proposed approach through software simulation analysis and experimental verification; the findings demonstrate that the strategy achieves the desired detection outcome.
Detecting safety helmets and reflective clothing is of paramount importance in railroad safety operations. However, the intricate and dynamic nature of railroad construction environments often challenges existing detection methods, resulting in false positives, missed detections, and suboptimal efficiency. Hence, we introduce the ODGSconv-YOLOv8s algorithm, which enhances YOLOv8s for safety helmet and reflective clothing detection. First, we present a highly efficient network architecture, ODGSconv, which reinforces the Neck component to ensure both accuracy and generalization while significantly reducing the parameter count. Next, SimSPPF replaces the SPPF in the Backbone to integrate multi-scale features and reduce the missed detection rate. The experimental findings demonstrate that the algorithm achieves a mean Average Precision (mAP) of 92.0%, an accuracy of 91.9%, and a recall of 87.9%, and can accurately detect helmets and reflective clothing.
In actual production operations, monitoring the environmental temperature is important. To address the complex structure and electromagnetic interference of traditional electrical sensors, a temperature sensor based on a fiber Bragg grating (FBG) is designed to monitor ambient temperature. A response model between the FBG center-wavelength variation and temperature is established using a neural network fitting tool. The experimental results show that the model has a minimum root-mean-square error of 0.83426, and the overall correlation between the predicted and real data reaches 0.98512, which meets practical monitoring requirements.
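A minimal sketch of such a fit, using synthetic stand-in calibration data at an assumed sensitivity of roughly 10 pm/°C; real sensor measurements would replace the generated data:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

# Synthetic stand-in: FBG center-wavelength shift (nm) vs. temperature (C).
temp = np.linspace(20, 120, 200).reshape(-1, 1)
shift = 0.01 * (temp - 20) + np.random.normal(0, 0.002, temp.shape)

# Fit wavelength shift -> temperature with a small MLP (sizes illustrative).
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
model.fit(shift, temp.ravel())

pred = model.predict(shift)
rmse = np.sqrt(mean_squared_error(temp.ravel(), pred))
print(f"RMSE = {rmse:.5f}")
```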