This PDF file contains the front matter associated with SPIE Proceedings Volume 13416, including the Title Page, Copyright information, Table of Contents, and Committee Page.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users: please sign in to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
Intelligent Algorithm and Technological Innovation Application
Brake discs are critical components of high-speed train braking systems. To address the issue of missing brake disc bolts on high-speed trainsets, this study introduces an enhanced, lightweight fault detection approach based on the YOLOv5 network. The network replaces the backbone of the YOLOv5s model with the FasterNet architecture to serve as the feature extraction network. Furthermore, PConv (partial convolution) is employed to replace the C3 module in the neck layer, substantially reducing the model's parameter count to meet lightweight objectives. In response to the scarcity of fault samples, this paper integrates a PSA attention mechanism and the Focal-EIoU loss function to counteract the imbalance between positive and negative samples in the dataset, thereby increasing accuracy. The experimental findings indicate that the proposed model attains a precision of 96.48% on the high-speed train brake disc bolt missing fault dataset, with an mAP@0.5 of 92.06%. Relative to YOLOv5s, these are improvements of 2.77% and 1.4%, respectively. The model contains 1,012,832 parameters and achieves a detection speed of 16.19 FPS.
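The Focal-EIoU idea named above can be sketched in a few lines. This follows the published Focal-EIoU definition (1 - IoU plus centre-distance, width, and height penalties, weighted by IoU**gamma); it is an illustrative reconstruction, not the authors' implementation, and gamma = 0.5 is an assumed default.

```python
def eiou_loss(box_a, box_b, gamma=0.5):
    """Focal-EIoU loss between two (x1, y1, x2, y2) boxes (illustrative sketch)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection and union areas
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter)
    # Smallest enclosing box of the pair
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    # Centre-distance, width and height penalty terms
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    eiou = (1 - iou + d2 / (cw ** 2 + ch ** 2)
            + ((ax2 - ax1) - (bx2 - bx1)) ** 2 / cw ** 2
            + ((ay2 - ay1) - (by2 - by1)) ** 2 / ch ** 2)
    return iou ** gamma * eiou  # focal weighting by the IoU itself
```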
For pedestrian detection from the unmanned aerial vehicle (UAV) perspective, detection accuracy is low due to complex backgrounds, large scale variation, and the uneven distribution of detection targets. This study proposes an improved YOLOv8x algorithm based on the normalized Gaussian Wasserstein distance, which improves the accuracy, recall, and mAP of pedestrian detection by 5%, 4%, and 6%, respectively.
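The normalized Gaussian Wasserstein distance underlying the improvement admits a closed form; the sketch below models each (cx, cy, w, h) box as a 2-D Gaussian, as in the NWD literature. The normalizing constant C is dataset-dependent, and the value here is an assumption, not taken from the paper.

```python
import math

def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein distance between (cx, cy, w, h) boxes.

    Each box is modelled as N([cx, cy], diag(w/2, h/2)^2); the closed-form
    2-Wasserstein distance between the Gaussians is mapped to (0, 1] with
    exp(-W2 / C).  C = 12.8 is an assumed dataset-dependent constant.
    """
    cx1, cy1, w1, h1 = box_a
    cx2, cy2, w2, h2 = box_b
    # Closed-form squared W2 distance between the two Gaussians
    w2_sq = ((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
             + ((w1 - w2) / 2) ** 2 + ((h1 - h2) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)
```

Unlike IoU, this similarity stays informative for tiny boxes that barely overlap, which is why it helps UAV-scale targets.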
In this paper, a joint algorithm based on the variational mode decomposition (VMD) algorithm and the White Shark Optimizer (WSO) is proposed for non-contact vital sign detection. First, phase information is extracted from the radar echo signal. Then, the respiratory and heartbeat components are separated using a bandpass filter. The separated respiratory component is processed by second-order differencing. Meanwhile, the heart rate is estimated by decomposing the separated heartbeat component using WSO-VMD. The WSO algorithm makes the VMD parameters adaptive and prevents respiratory harmonics from interfering with the heartbeat estimate. Compared with the bandpass filtering method, the VMD method, and the WOA-VMD method, the mean relative error (MRE) is reduced by 3.61%, 2.99%, and 2.38%, respectively. Therefore, the proposed method can improve detection accuracy.
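The band-pass separation step can be illustrated with a toy phase signal: the sketch below splits simulated 0.3 Hz respiration and 1.2 Hz heartbeat components by FFT masking. The radar front end, second-order differencing, and WSO-VMD stages are paper-specific and not reproduced; all rates and bands below are assumptions for illustration.

```python
import numpy as np

fs = 20.0                                   # sampling rate in Hz (assumed)
t = np.arange(0, 30, 1 / fs)
# Simulated phase signal: strong respiration plus weaker heartbeat
phase = np.sin(2 * np.pi * 0.3 * t) + 0.2 * np.sin(2 * np.pi * 1.2 * t)

def band_pass(x, lo, hi, fs):
    """Ideal FFT-mask band-pass filter keeping frequencies in [lo, hi] Hz."""
    f = np.fft.rfftfreq(len(x), 1 / fs)
    spec = np.fft.rfft(x)
    spec[(f < lo) | (f > hi)] = 0
    return np.fft.irfft(spec, len(x))

resp = band_pass(phase, 0.1, 0.5, fs)       # respiration band
heart = band_pass(phase, 0.8, 2.0, fs)      # heartbeat band
# Dominant frequency of each separated component
f = np.fft.rfftfreq(len(t), 1 / fs)
resp_hz = f[np.argmax(np.abs(np.fft.rfft(resp)))]
heart_hz = f[np.argmax(np.abs(np.fft.rfft(heart)))]
```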
Batch processes are widely used in biopharmaceuticals, chemical engineering, and other repetitive industrial tasks. Generally, batch processes focus on product quality, so it is necessary to study quality prediction for them. In this paper, a quality prediction method based on an incremental support vector machine is proposed for multi-phase, non-linear batch processes. The basic idea is to select some of the process variables of the batch process, i.e., to determine the input variables affecting a given output, then build a support vector regression model and learn it incrementally to obtain a new model, after which quality prediction is performed with this new model. The penicillin fermentation process is taken as the simulation object for quality prediction. The simulation results show that the proposed incremental support vector regression algorithm can describe the fermentation process well.
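The incremental-learning ingredient can be illustrated with a sample-by-sample update under the epsilon-insensitive loss used by support vector regression. This linear, kernel-free sketch is a stand-in for the paper's exact incremental SVM bookkeeping, and the streamed signal is a made-up toy, not penicillin fermentation data.

```python
def sgd_update(w, b, x, y, lr=0.01, eps=0.1):
    """One epsilon-insensitive SGD step: an incremental SVR stand-in.

    Only the *incremental* ingredient is shown: the model is updated one
    sample at a time, and residuals inside the eps tube cost nothing.
    """
    pred = sum(wi * xi for wi, xi in zip(w, x)) + b
    err = pred - y
    if err > eps:
        g = 1.0       # over-prediction outside the tube
    elif err < -eps:
        g = -1.0      # under-prediction outside the tube
    else:
        g = 0.0       # inside the insensitive tube: no update
    w = [wi - lr * g * xi for wi, xi in zip(w, x)]
    return w, b - lr * g

# Stream samples from the toy relation y = 2 * x1, one at a time
w, b = [0.0], 0.0
for step in range(2000):
    x1 = (step % 10) / 10.0
    w, b = sgd_update(w, b, [x1], 2 * x1)
```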
Bird damage trip data may contain substantial noise and outliers, which requires a detection algorithm with strong data processing ability to accurately extract useful information from complex data. However, current detection algorithms fall short in this respect. Therefore, a Logistic chaos-based algorithm for detecting the bird-damage tripping probability of 10 kV distribution lines is designed. The fault current of the 10 kV distribution line is extracted, and a hierarchical bird-hazard risk assessment model based on geographical characteristics and tower structure characteristics is established. Using the Logistic chaotic time series, the abnormal data component is extracted to detect the bird-damage tripping probability of the 10 kV distribution line. The experimental results show that the predictions of the designed method fit the real situation, the average relative error is reduced by 10%, and the average detection time gradually falls from 0.71 seconds at 10 iterations to 0.48 seconds at 50 iterations.
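The Logistic chaotic time series at the core of the method is generated by the map x_{k+1} = r * x_k * (1 - x_k). The sketch below uses r = 3.9 (in the chaotic regime, roughly r > 3.57) and an arbitrary seed; how the series is matched against fault-current data to flag anomalies is specific to the paper and not shown.

```python
def logistic_series(x0=0.3, r=3.9, n=200):
    """Generate a Logistic chaotic time series x_{k+1} = r * x_k * (1 - x_k).

    For r = 3.9 and x0 in (0, 1) the orbit stays in (0, 1) and is chaotic,
    i.e. extremely sensitive to the initial condition.
    """
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs
```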
Aiming at the vehicle routing problem (VRP) in modern logistics systems, this paper proposes a multi-constraint vehicle routing optimization model with time windows that considers the diversity of cargo circulation and vehicle transportation efficiency, allowing vehicles to split operations at nodes and improving transportation performance. An improved honey badger algorithm (IHBA) is adopted, introducing Cubic chaotic mapping and a random perturbation strategy, together with elite tangent search and a differential variation strategy, to enhance global search and avoid premature convergence. Experiments on the CEC2017 benchmark validate IHBA's potential to solve complex vehicle routing problems and demonstrate the theoretical and practical value of VRP research for logistics and distribution.
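The Cubic chaotic mapping used for initialisation can be sketched as follows, using one common form of the cubic map, x_{k+1} = rho * x_k * (1 - x_k^2) on (0, 1) with rho = 2.59; the exact map and parameters in the paper may differ. Chaotic sequences cover the search space more evenly than independent uniform draws.

```python
import random

def cubic_chaotic_population(n, dim, lo, hi, rho=2.59, seed=1):
    """Initialise a search population with a cubic chaotic map (assumed form)."""
    rng = random.Random(seed)
    pop = []
    for _ in range(n):
        x = rng.uniform(0.01, 0.99)          # chaotic seed per individual
        row = []
        for _ in range(dim):
            x = rho * x * (1 - x * x)        # cubic map iteration, stays in (0, 1)
            row.append(lo + (hi - lo) * x)   # rescale (0, 1) -> [lo, hi]
        pop.append(row)
    return pop
```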
The weights of the various factors in air combat situation assessment are usually determined by experts, which introduces subjectivity and limitations. This manuscript uses heuristic optimization algorithms to optimize the weights. Firstly, the advantage functions of angle, speed, altitude, and distance are established, and close-range air combat state information is quantified as situational description parameters. Secondly, the physics-inspired nuclear reaction optimization (NRO) algorithm is used to optimize the situational function weights, achieving the optimal situational combination. In a set of air combat situational weight optimization experiments, the results show that NRO outperforms the selected comparison optimization algorithms.
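A weighted combination of the four advantage terms can be sketched as below. The individual advantage functions here are illustrative stand-ins, not the paper's formulas; the weight vector w is exactly the quantity the NRO step would be optimising.

```python
import math

def situation_score(angle_deg, speed_ratio, alt_diff_m, dist_m,
                    w=(0.3, 0.2, 0.2, 0.3)):
    """Combine four advantage terms into one situational score in [0, 1].

    All four advantage shapes below are hypothetical illustrations: angle
    advantage falls off linearly with off-boresight angle, speed saturates,
    altitude uses a sigmoid, and distance a Gaussian around a preferred range.
    """
    a_angle = 1 - angle_deg / 180.0                      # 1 when target dead ahead
    a_speed = min(speed_ratio, 2.0) / 2.0                # saturate at 2x opponent speed
    a_alt = 1 / (1 + math.exp(-alt_diff_m / 500))        # sigmoid on altitude advantage
    a_dist = math.exp(-((dist_m - 3000) / 2000) ** 2)    # Gaussian around 3 km
    return w[0] * a_angle + w[1] * a_speed + w[2] * a_alt + w[3] * a_dist
```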
With the intelligent development of ironmaking technology, pellet size detection has gradually become intelligent as well. Aiming at the problem of on-line pellet size detection, detection results are obtained through CCD image acquisition, Gabor filtering, image enhancement, and an improved adaptive watershed algorithm, and are compared with the traditional watershed algorithm. The results show that the improved watershed algorithm proposed in this paper is more effective and can accurately detect pellet size, which is of great significance for ensuring the stability of the blast furnace smelting process and improving smelting efficiency.
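The Gabor filtering stage can be illustrated by building a filter kernel directly from the Gabor formula; the parameters below are illustrative, and the improved watershed segmentation that follows in the paper is not shown.

```python
import numpy as np

def gabor_kernel(size=9, sigma=2.0, theta=0.0, lam=4.0, gamma=0.5):
    """Build a real Gabor kernel: a Gaussian envelope times an oriented cosine.

    theta is the filter orientation, lam the wavelength, gamma the aspect
    ratio; convolving an image with such kernels enhances oriented texture
    and edges before segmentation.  Parameter values here are illustrative.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
            * np.cos(2 * np.pi * xr / lam))
```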
With the rapid development of intelligent technology, traditional open-pit mine car scheduling systems can no longer meet the demands of efficiency, cost, and intelligence. Aiming at the low efficiency and high operating cost of mine car scheduling systems, this paper proposes a mine car scheduling system based on the fruit fly optimization algorithm and an improved immune particle swarm optimization algorithm. By integrating a concentration regulation mechanism, the number of sub-populations is adjusted according to the maximum particle concentration value, which is also used to adjust the search range of vaccination. The results show that the proposed algorithm has better convergence accuracy and global search ability and can be applied effectively to mine car scheduling, which is significant for improving transportation efficiency, ensuring production safety, reducing operating costs, optimizing resource allocation, and improving management.
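The concentration-regulation idea behind immune variants of PSO can be sketched as a selection rule that mixes fitness with antibody concentration, so individuals crowded by many similar ones are down-weighted to preserve diversity. The similarity radius and mixing weight below are illustrative assumptions, not values from the paper.

```python
def concentration_selection_probs(fitnesses, radius=0.5, alpha=0.7):
    """Immune-inspired selection mixing fitness with antibody concentration.

    Concentration of an individual = fraction of the population whose
    fitness lies within `radius` of it; high concentration is penalised.
    alpha balances fitness pressure against diversity pressure (assumed).
    """
    n = len(fitnesses)
    conc = [sum(1 for g in fitnesses if abs(f - g) < radius) / n
            for f in fitnesses]
    total_f = sum(fitnesses)
    raw = [alpha * f / total_f + (1 - alpha) * (1 - c)
           for f, c in zip(fitnesses, conc)]
    s = sum(raw)
    return [r / s for r in raw]

# A distinct high-fitness individual (low concentration) is favoured most
probs = concentration_selection_probs([1.0, 1.1, 1.05, 5.0])
```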
To reduce the query load on nodes and improve the processing efficiency of the network system, this paper proposes a network resource scheduling algorithm for cloud data centers. According to the different information sources, the confidence interval index of resource scheduling is obtained; the prior distribution of resources is derived from the splitting history of network resource data nodes; and the network resource characteristics of the cloud data center are extracted through link distance. The data on resource states, task requirements, and scheduling strategies are accurately encoded, and the source code states, conversion processes, and programming representations are analyzed to realize network resource data encoding for the cloud data center. With linear decreasing applied, the network resource scheduling structure of the cloud data center is established to realize the resource scheduling algorithm. The experimental results show that the load balance coefficient achieved by this algorithm remains stable above 0.9, ensuring an efficient and stable running state even under high load, with high occupancy rates across indices and a good scheduling effect.
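The abstract reports a load balance coefficient stabilising above 0.9 but does not give its formula; one common definition, 1 - (std / mean) of node loads (closer to 1 means more even), is sketched below purely as an assumption for illustration.

```python
def load_balance_coefficient(loads):
    """One common load-balance measure: 1 - (std / mean) of node loads.

    This definition is an assumption for illustration; the paper does not
    state which coefficient it uses.  1.0 means perfectly even load.
    """
    n = len(loads)
    mean = sum(loads) / n
    var = sum((x - mean) ** 2 for x in loads) / n
    return 1.0 - (var ** 0.5) / mean

balanced = load_balance_coefficient([0.52, 0.50, 0.49, 0.51])  # near-even loads
skewed = load_balance_coefficient([0.95, 0.10, 0.30, 0.70])    # uneven loads
```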
In the field of 3D reconstruction, the 3D measurement of high-dynamic-range objects has long attracted the attention of many scholars. This paper proposes an exposure algorithm for structured-light reconstruction of high-dynamic-range objects based on the local homography matrix. By improving the traditional calibration method and introducing the homography matrix, the method significantly enhances calibration accuracy, which in turn determines the reconstruction accuracy for high-dynamic-range objects. Then, using the functional relationship of the camera response curve to relate image pixel values, illumination values, and exposure time, the algorithm automatically calculates the exposure time series for image fusion. Combined with high-precision calibration results for 3D reconstruction, it can effectively restore the point cloud information lost in high-dynamic-range areas, improving the accuracy and adaptability of the reconstruction. Experimental results show that this method greatly reduces the re-projection error and makes the error distribution more concentrated. 3D reconstruction experiments on high-dynamic-range objects verify the effectiveness of the algorithm, which solves the problems traditional 3D reconstruction encounters with high-dynamic-range objects and greatly improves the level of automation.
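The homography machinery at the heart of the calibration step is the usual projective mapping of points in homogeneous coordinates; the local-homography idea amounts to fitting a separate such matrix per image neighbourhood instead of one global model. The 3x3 matrix below is illustrative, not from the paper.

```python
import numpy as np

# Apply a 3x3 homography H to pixel points in homogeneous coordinates.
H = np.array([[1.02, 0.01, 3.0],
              [0.00, 0.98, -2.0],
              [1e-4, 0.00, 1.0]])              # illustrative matrix only
pts = np.array([[0.0, 0.0], [100.0, 50.0]])
hom = np.hstack([pts, np.ones((2, 1))])        # lift to homogeneous coords
mapped = (H @ hom.T).T
mapped = mapped[:, :2] / mapped[:, 2:3]        # divide out the scale factor
```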
With the rapid development of deep learning, using object detection algorithms to detect defects in aerial insulator images has become the main approach to insulator inspection. In response to the low detection accuracy for small targets, the weak representation ability of feature maps, and the limited extraction of key information in traditional object detection algorithms, this paper introduces the ECA attention module into the YOLOv5l model. The module effectively enhances or suppresses feature information at the channel level of the feature map and improves the network's attention to defect areas. Shallow and deep insulator features are fused to avoid the loss of feature information and reduce the incidence of missed detections. The improved model achieves higher accuracy in insulator defect detection, especially for small target defects.
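The ECA module's forward pass is small enough to sketch: a per-channel global average pool, a 1-D convolution across the channel descriptor (with no dimensionality reduction), a sigmoid gate, and channel re-weighting. The shapes and the fixed kernel below are illustrative; in ECA the 1-D kernel is learned.

```python
import numpy as np

np.random.seed(1)
c, h, w, k = 8, 5, 5, 3
x = np.random.rand(c, h, w)                      # a toy (C, H, W) feature map
gap = x.mean(axis=(1, 2))                        # (C,) channel descriptor
kernel = np.array([0.2, 0.6, 0.2])               # stand-in for the learned 1-D conv
padded = np.pad(gap, k // 2, mode='edge')
conv = np.array([padded[i:i + k] @ kernel for i in range(c)])
gate = 1 / (1 + np.exp(-conv))                   # sigmoid attention weights
out = x * gate[:, None, None]                    # re-weight each channel
```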
With the rapid development and widespread application of cloud computing, cloud platforms face increasingly complex traffic management challenges, especially in randomly changing network environments. To address these challenges effectively, this paper proposes an intelligent traffic governance method for cloud platforms in random environments based on metaheuristic algorithms. Genetic algorithms and particle swarm optimization are introduced from the metaheuristic family and combined with the advantages of randomized and local search algorithms to achieve intelligent management of cloud platform traffic. The method adaptively explores and exploits the structure of cloud platform traffic data to quickly find a global or approximately optimal traffic management solution. The experimental results show that this method can significantly improve the efficiency and effectiveness of cloud platform traffic governance in random environments, ensuring the stability and reliability of cloud services.
Aiming at the low efficiency and difficulty of emergency evacuation in multi-storey buildings, this paper proposes an emergency evacuation path planning method based on the ant colony algorithm. The method improves the heuristic function, taking into account the environmental impact of fire and the behavioral characteristics of evacuees, and overcomes the blindness of the basic ant colony algorithm. An adaptive pheromone updating strategy based on obstacle complexity compensates for the shortcomings of the traditional ant colony algorithm, such as slow convergence and a tendency to fall into local optima, so that the improved algorithm converges faster in simple building structures and finds better paths in complex ones. The improved ant colony algorithm is applied to an emergency evacuation model of multi-storey buildings to realize spatially optimal evacuation path planning. The simulation results show that the improved algorithm can efficiently plan the optimal path and improve the safety and efficiency of emergency evacuation in multi-storey buildings.
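The ant colony mechanics the method builds on can be sketched on a toy two-route graph: transition probabilities combine pheromone and a heuristic, and pheromone evaporates and is then reinforced along the best tour. The fire-aware heuristic and obstacle-complexity adaptive update are paper-specific and only indicated by a comment.

```python
import random

random.seed(0)
# Toy graph: exit node 3 reachable via node 1 (cost 4) or node 2 (cost 5)
length = {(0, 1): 2.0, (0, 2): 4.0, (1, 3): 2.0, (2, 3): 1.0}
tau = {e: 1.0 for e in length}               # initial pheromone
alpha, beta, rho = 1.0, 2.0, 0.5             # pheromone/heuristic weights, evaporation

def choose(frm, options):
    """Roulette-wheel choice proportional to tau^alpha * eta^beta."""
    weights = [tau[(frm, j)] ** alpha * (1 / length[(frm, j)]) ** beta
               for j in options]
    # (an improved heuristic would also penalise nodes near fire here)
    total = sum(weights)
    r, acc = random.random() * total, 0.0
    for j, wgt in zip(options, weights):
        acc += wgt
        if r <= acc:
            return j
    return options[-1]

best = None
for _ in range(30):                           # one ant per iteration
    path = [0, choose(0, [1, 2])]
    path.append(3)                            # both branches end at the exit
    cost = length[(path[0], path[1])] + length[(path[1], path[2])]
    if best is None or cost < best[1]:
        best = (path, cost)
    for e in tau:                             # evaporate everywhere...
        tau[e] *= (1 - rho)
    for a, b in zip(best[0], best[0][1:]):    # ...then deposit on the best path
        tau[(a, b)] += 1.0 / best[1]
```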
Due to complex backgrounds, varying object scales, and many small objects in Unmanned Aerial Vehicle (UAV) imagery, existing algorithms often have low detection accuracy, especially for small objects. To tackle these issues, we propose an improved YOLOv5-based algorithm. A Backbone Feature Weighted Fusion (BFWF) module is introduced to extract and fuse multi-scale features from the backbone. These extracted features are fused with the original neck output through cross-neck connections, enhancing fine-grained features. The neck structure is adjusted to improve the diversity and weighting of shallow fine-grained features. An Adaptive Spatial Feature Fusion (ASFF) module dynamically fuses features at different levels in the Path Aggregation Network (PANet), improving small object detection accuracy. By replacing the original loss function with the Efficient Intersection over Union (EIoU) loss function, we achieve more precise learning of small object dimensions across multi-scale scenarios. Experiments on the VisDrone2021 dataset show that the improved algorithm increases mAP50 by 3.7%, mAP75 by 2.5%, and mAP50:95 by 2.4% compared to the original YOLOv5 algorithm. This significantly enhances detection performance, making it more suitable for UAV imagery.
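The ASFF step can be sketched as per-pixel softmax blending of already-resized feature levels; in the paper the weight logits come from learned 1x1 convolutions, whereas the fixed arrays below are an illustrative assumption.

```python
import numpy as np

np.random.seed(0)
h = w = 4
# Three feature maps already resized to a common level
f0, f1, f2 = (np.random.rand(h, w) for _ in range(3))
logits = np.stack([np.full((h, w), 1.0),                # learned in practice;
                   np.full((h, w), 0.5),                # fixed here for
                   np.full((h, w), 0.0)])               # illustration
weights = np.exp(logits) / np.exp(logits).sum(axis=0)   # per-pixel softmax
fused = (weights * np.stack([f0, f1, f2])).sum(axis=0)  # adaptive fusion
```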
Capsule Networks (CapsNets) were proposed by Geoffrey Hinton and his team, pioneers in the field of deep learning, in 2017 to address certain limitations of traditional Convolutional Neural Networks (CNNs). The core innovation of Capsule Networks lies in their representation of objects within images. They not only identify the presence of objects but also capture spatial relationships and pose information, thus offering a richer and more equivariant feature representation. However, existing literature in this area remains relatively scarce. This paper aims to conduct a brief preliminary exploration using Capsule Networks for a medical binary classification task on a pneumonia dataset.
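The squashing non-linearity that gives capsule vectors their probabilistic length interpretation (Sabour et al., 2017) is compact enough to state exactly:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """CapsNet squashing non-linearity.

    Shrinks a capsule vector s to length ||s||^2 / (1 + ||s||^2) while
    preserving its direction, so the vector's length can be read as the
    probability that the entity the capsule represents is present.
    """
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)
```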
With the acceleration of urbanisation and population densification, urban rail systems face path planning challenges, and traditional methods struggle to cope with complex traffic. This study focuses on the application of a multi-agent cooperative game algorithm to urban railway train path planning. Based on cooperative game theory, we design the algorithm framework and clarify the cooperation mechanism, game rules, and benefit distribution among agents, promoting the optimisation of both individuals and the whole. Experimental validation shows that, compared with the traditional algorithm, the new algorithm significantly reduces the number of iterations (by 79.39%), path nodes (50%), path length (8.26%), and planning time (5.19%), verifying its efficiency and practicality.
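The benefit-distribution layer of a cooperative game is classically handled with Shapley values; the sketch below computes them exactly for a made-up two-agent characteristic function (the study's own game and payoffs are not reproduced).

```python
from itertools import permutations

def shapley(players, value):
    """Exact Shapley values by averaging marginal contributions over orders.

    `value` maps each coalition (a frozenset) to its worth; a player's
    Shapley share is its marginal contribution averaged over all join orders.
    """
    acc = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            acc[p] += value[coalition | {p}] - value[coalition]
            coalition = coalition | {p}
    return {p: acc[p] / len(orders) for p in players}

# Hypothetical game: each agent alone saves 1 unit of delay, together 4
v = {frozenset(): 0, frozenset({'A'}): 1,
     frozenset({'B'}): 1, frozenset({'A', 'B'}): 4}
shares = shapley(['A', 'B'], v)
```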
Multi-access edge computing (MEC) can enhance the user experience by deploying caching infrastructure at the edge, but MEC cache and computing resources are limited. Therefore, selectively caching and transcoding videos, and determining which server should supply a video, are crucial for improving video service quality and reducing resource rental costs. In this paper, a MEC caching, transcoding, and transmission strategy based on an improved differential evolution algorithm is proposed, aiming to minimize resource rental costs and response delays for video content providers. To improve the convergence speed of the multi-objective differential evolution algorithm, an adaptive parameter algorithm based on deep learning is proposed to control its parameters. Extensive experiments show that, compared with existing strategies, the proposed strategy performs better in resource rental cost, response delay, and convergence speed.
The detection of road damage is crucial for preventing traffic accidents. Given the lack of lightweight designs among existing road damage detection algorithms, we propose an enhanced YOLOv8 algorithm based on GhostNet. This approach effectively reduces network complexity in terms of both computational requirements and the number of parameters. Moreover, incorporating the CBAM attention module leads to notable improvements in detection accuracy and robustness. Furthermore, the F-EIoU loss is employed to further improve the model's convergence speed and localization accuracy. Results on the RDD2022 dataset demonstrate that our method increases precision, recall, and mAP by 6.4%, 3.9%, and 5.4%, respectively, while reducing the parameter count by approximately 25%, confirming that our improvements not only significantly enhance detection accuracy but also notably decrease the number of parameters in road damage detection.
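The parameter saving behind the Ghost-module idea can be checked with back-of-envelope arithmetic: a standard convolution producing n channels is replaced by a primary convolution producing n/s channels plus cheap depthwise operations generating the remaining "ghost" maps, shrinking parameters roughly by the factor s. The numbers below are illustrative, not the paper's architecture.

```python
# Illustrative parameter count: standard conv vs. a Ghost-style split
cin, n, k = 64, 128, 3          # input channels, output channels, kernel size
s, dk = 2, 3                    # ghost ratio, depthwise kernel size
standard = cin * n * k * k                                  # plain k x k conv
ghost = cin * (n // s) * k * k + (n - n // s) * dk * dk     # primary + cheap ops
ratio = standard / ghost        # roughly s for typical settings
```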
This study introduces an enhanced Walk on Spheres (WOS) algorithm, integrated with Generative Adversarial Networks (GANs), to simulate convection-diffusion processes involving velocity fields. The traditional WOS algorithm handles isotropic diffusion well but struggles with convection-diffusion scenarios. By training GANs to generate spherical distribution data, the extended WOS algorithm adapts to varying boundary conditions and convection influences. Our approach segments the computational domain into multiple radial intervals, training a specific GAN model for each interval to capture particle dynamics accurately. This segmentation allows the algorithm to select the most suitable model in real-time, enhancing simulation accuracy and adaptability. Experimental results show significant improvements in computational efficiency and accuracy compared to traditional Monte Carlo methods, with the enhanced WOS algorithm reducing runtime by approximately 40% while maintaining high precision. The algorithm’s robustness and flexibility make it valuable for various applications, including ecological dynamics, disease modeling, and fluid dynamics. Future work will focus on optimizing the algorithm for highly irregular media and expanding its application to broader scientific and engineering problems.
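The classical isotropic Walk-on-Spheres baseline the study extends is easy to state: from the current point, jump to a uniform point on the largest sphere inside the domain, repeat until the boundary is within eps, then read off the boundary value. The sketch solves a harmonic test problem on the unit disk; the convection terms and GAN-learned jump distributions are the paper's contribution and are not reproduced.

```python
import math
import random

def wos_laplace_disk(px, py, g, eps=1e-3, walks=4000, seed=7):
    """Plain Walk-on-Spheres estimate of a harmonic function on the unit disk.

    Each walk jumps uniformly on the largest circle inside the disk until it
    is within eps of the boundary, then samples the boundary condition g.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walks):
        x, y = px, py
        while True:
            r = 1.0 - math.hypot(x, y)        # distance to the circle boundary
            if r < eps:
                break
            theta = rng.uniform(0.0, 2 * math.pi)
            x += r * math.cos(theta)          # uniform point on the sphere
            y += r * math.sin(theta)
        n = math.hypot(x, y)                  # project onto the boundary
        total += g(x / n, y / n)
    return total / walks

# u(x, y) = x is harmonic, so the estimate at (0.5, 0) should be near 0.5
u = wos_laplace_disk(0.5, 0.0, lambda bx, by: bx)
```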
The differential evolution algorithm has accumulated rich, successful experience in parameter settings. How to reasonably control strategies and parameters and effectively utilize feedback information from individuals in the population has become a topic of keen interest. In this paper, an adaptive differential evolution algorithm based on population feedback information (TtDE) is proposed. It uses multi-subpopulation selection methods to guide the direction of evolution. TtDE adopts a framework that combines multiple strategies and multiple parameter sets; one fixed combination is the DE/Best/2 strategy with adaptive parameters, which utilizes feedback information from individuals in the population effectively. In addition, during population iteration, the mutant subpopulation may contain better individual information than the trial subpopulation, and the preserved population diversity helps avoid early convergence or stagnation. The proposed TtDE was evaluated on 23 test functions of the CEC2005 benchmark suite, and the results show that it is more competitive than several classical DE variants.
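The DE/Best/2 combination named above can be sketched in a bare-bones DE loop: mutant v = best + F*(r1 - r2) + F*(r3 - r4), binomial crossover, greedy selection. TtDE's adaptive parameter control and subpopulation feedback are not reproduced; F and CR are fixed here.

```python
import random

def de_best_2(fitness, bounds, np_=20, f=0.5, cr=0.9, gens=60, seed=3):
    """Minimal differential evolution with the DE/Best/2 mutation (minimisation)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(np_)]
    fit = [fitness(x) for x in pop]
    for _ in range(gens):
        best = pop[min(range(np_), key=fit.__getitem__)]
        for i in range(np_):
            r1, r2, r3, r4 = rng.sample([j for j in range(np_) if j != i], 4)
            jr = rng.randrange(dim)          # index with guaranteed crossover
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == jr:
                    v = (best[d] + f * (pop[r1][d] - pop[r2][d])
                         + f * (pop[r3][d] - pop[r4][d]))
                    lo, hi = bounds[d]
                    trial.append(min(max(v, lo), hi))
                else:
                    trial.append(pop[i][d])
            tf = fitness(trial)
            if tf <= fit[i]:                 # greedy one-to-one selection
                pop[i], fit[i] = trial, tf
    return min(fit)

# Sanity check on the 3-D sphere function
best_val = de_best_2(lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 3)
```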
Focusing on multi-objective optimization models and aiming to solve the location problem of logistics distribution centers more satisfactorily, we propose an improved multi-objective salp swarm algorithm (IMSSA). Based on the original salp swarm algorithm (SSA), the following improvements are made. Firstly, to suit multi-objective problems, we introduce non-dominated sorting, crowding distance, and an external archive to extend the basic SSA into a multi-objective SSA. Then, a leader position update strategy based on evolutionary game theory is adopted. In addition, a simulated annealing mechanism is applied to improve the follower position update strategy. Finally, the proposed IMSSA is employed to optimize the location of distribution centers in a bidirectional two-tier logistics network, and comparative analysis of the Pareto solution sets verifies the feasibility and effectiveness of the proposed algorithm.
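The non-dominated sorting ingredient reduces to a Pareto-dominance check (for minimisation: p dominates q if p is no worse in every objective and strictly better in at least one); crowding distance and the external archive are further machinery not shown here.

```python
def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors (minimisation)."""
    def dominates(p, q):
        return (all(a <= b for a, b in zip(p, q))
                and any(a < b for a, b in zip(p, q)))
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Two-objective example, e.g. (distribution cost, delivery delay)
front = pareto_front([(1, 5), (2, 3), (4, 1), (3, 4), (4, 5)])
```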
This study addresses the scheduling and assignment of UAV nests for electrical power line inspections, a problem characterized by large optimization scales, complex constraints, and multiple objectives. A nest scheduling optimization model incorporating various constraints is proposed, and an adaptation function is constructed to maximize inspection efficiency and minimize nest idle time. The particle swarm optimization algorithm is applied to optimize the nest allocation. Simulation experiments demonstrate that the proposed method can effectively manage variations in UAV numbers and resource constraints while ensuring efficient resource utilization. The optimization strategy proves adaptable and effective across various application scenarios.
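A bare-bones continuous PSO conveys the search mechanism; the real model optimises a discrete nest-to-line assignment under constraints, which would need an encoding and repair step on top of this sketch. Coefficients below are common textbook defaults, not the paper's settings.

```python
import random

def pso(fitness, dim, lo, hi, n=15, iters=50, w=0.7, c1=1.5, c2=1.5, seed=5):
    """Bare-bones particle swarm optimiser (minimisation)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]
    pfit = [fitness(p) for p in pos]
    g = min(range(n), key=pfit.__getitem__)
    gbest, gfit = pbest[g][:], pfit[g]
    for _ in range(iters):
        for i in range(n):
            for d in range(dim):
                # Inertia plus cognitive and social pulls
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            fi = fitness(pos[i])
            if fi < pfit[i]:
                pbest[i], pfit[i] = pos[i][:], fi
                if fi < gfit:
                    gbest, gfit = pos[i][:], fi
    return gfit

# Sanity check on the 2-D sphere function
best = pso(lambda x: sum(v * v for v in x), dim=2, lo=-10, hi=10)
```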
The Influence Maximization (IM) problem aims to find a set of seed nodes that maximizes the spread of influence in a social network. The problem has important applications in viral marketing and has attracted extensive research in both academia and industry. To find nodes with wider influence that occupy core positions in the network, this paper proposes a new centrality ranking strategy, an influence maximization algorithm based on GC-Centrality. First, GC-Centrality is defined to locate the core links of the network; the proposed indicators are then used to select seed nodes while suppressing their neighbors. Finally, two experiments on real datasets indicate that the proposed algorithm outperforms the comparison algorithms.
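The select-then-suppress pattern described above can be illustrated with a simple greedy sketch. Since the GC-Centrality definition is not given here, plain node degree stands in for the real score; the function name and graph are illustrative.

```python
def select_seeds(adj, k):
    """Greedily pick k seeds by score (here: degree), suppressing the
    neighbors of each chosen seed so seeds do not cluster together.

    adj: dict mapping node -> list of neighbor nodes.
    """
    scores = {v: len(nbrs) for v, nbrs in adj.items()}
    chosen, blocked = [], set()
    for _ in range(k):
        cands = [v for v in scores if v not in blocked and v not in chosen]
        if not cands:
            break
        best = max(cands, key=lambda v: scores[v])
        chosen.append(best)
        blocked |= set(adj[best])  # suppress the new seed's neighborhood
    return chosen
```

On a star graph plus an isolated edge, the suppression step forces the second seed away from the hub's neighborhood and into the separate component, which is exactly the diversification effect the abstract describes.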
Cooperative air defense deployment is a critical research area in land-based air defense command and control. To address air defense issues in a sector-shaped area, this study discretizes the deployment space and optimizes firepower coverage to achieve uniform distribution of fire units in depth. Fire unit dispersion is introduced to constrain deployment results, establishing an optimization model for cooperative air defense in sector-shaped areas. A multi-population genetic algorithm is employed for optimization, enhancing evolutionary speed and preventing premature convergence observed in single-population evolution. The algorithm and optimization model are validated through specific deployment cases.
Accurate pose estimation of spacecraft is crucial for in-orbit servicing and space debris removal missions. Current mainstream methods are mostly based on keypoint detection, but accurately detecting spacecraft keypoints against complex space backgrounds remains a major challenge. To address this issue, we introduce a segmentation-mask attention interaction method that reduces the network's excessive focus on background information and enhances attention to the spacecraft itself. In addition, to reduce keypoint position deviation, we add a keypoint refinement module that regresses the offset between the initial keypoint coordinates and their true positions, thereby fine-tuning the keypoint coordinates and improving the convergence and accuracy of the network. With these improvements, our method accurately detects spacecraft keypoints, providing reliable technical support for space exploration missions.
Financial news serves as a primary conduit for conveying dynamics in financial markets and changes in economic policies. Fully capturing the information in financial news benefits investors, corporations, and governments in understanding and responding to market shifts. This paper addresses challenges encountered in the sentiment analysis of financial news, including slow model convergence and indirect transmission of long-distance dependent information, by proposing a novel deep learning model designed to effectively capture contextual information in financial news texts. The model incorporates a semantic attention mechanism to preliminarily extract semantic information from the text and learn the correlations between words, thereby enhancing model convergence speed. Considering that the model might lead to overly dispersed attention weights and hinder effective focus on crucial local information when processing long sequences, we introduce a sparse attention module capable of efficiently modeling local dependency relations. Experimental validation on a financial news dataset demonstrates that our model outperforms traditional methods in sentiment analysis tasks in terms of accuracy and generalization ability, confirming the effectiveness of the semantic attention mechanism and the sparse attention module.
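The local-dependency idea behind such a sparse attention module can be sketched as windowed attention, where each position attends only to a small neighborhood instead of the whole sequence. This is a generic illustration, not the paper's module; all names are illustrative.

```python
import numpy as np

def local_attention(q, k, v, window=2):
    """Attention where position t attends only to positions within `window`.

    q, k: (T, d) query/key matrices; v: (T, dv) values.
    Restricting the context keeps attention weights concentrated on
    nearby tokens and costs O(T * window) instead of O(T^2).
    """
    T, d = q.shape
    out = np.zeros_like(v)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        scores = q[t] @ k[lo:hi].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())  # stable softmax
        weights /= weights.sum()
        out[t] = weights @ v[lo:hi]
    return out
```

With identical keys the weights become uniform over each window, so the output reduces to a sliding average of the values, which makes the windowing behavior easy to check.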
In response to the inefficiency, subjectivity, and traffic disruption caused by manual visual inspection in current road-sign detection by road maintenance departments, this paper proposes an improved YOLOv8n road-sign detection algorithm. The algorithm introduces dynamic convolution, designing and constructing a C2f-Dynamic module that increases the parameter capacity of C2f to strengthen the ability of small-scale networks to learn from large-scale data. Furthermore, to strengthen the network's feature extraction capability and reduce interference from complex backgrounds, the LSKA attention mechanism is introduced. Lastly, to address the original model's limitations in detecting small targets, a P2 detection layer is incorporated into the neck of the network. On the CCTSDB-2021 dataset, the enhanced algorithm attains precision, recall, and mAP50 of 88.2%, 72.1%, and 80.1% respectively, which are 0.2%, 2.3%, and 4.1% higher than the original YOLOv8n, while operating at a frame rate of 322.6 FPS. In summary, the algorithm is significantly more precise than the original and satisfies real-time detection requirements, offering practical value for road maintenance departments.
In recent years, the safety and efficiency of transformer substation operations have gained significant attention. Monitoring personnel behavior within these facilities is crucial for maintaining safety standards and optimizing performance. This paper presents a novel approach to personnel behavior recognition in transformer substations using an improved AlphaPose algorithm. By enhancing the AlphaPose algorithm with additional preprocessing steps, optimized model parameters, and integration with machine learning classifiers, our method achieves superior accuracy and robustness in recognizing various personnel activities. Experimental results demonstrate the effectiveness of the proposed method in real-world transformer substation environments.
With the rapid development of computer vision and machine learning, behavior recognition technology has become a research hotspot and is widely used in video surveillance, human-computer interaction, virtual reality, and other fields. Traditional behavior recognition methods mostly rely on complex preprocessing and manual feature extraction, incurring high computational cost and poor robustness. To overcome these limitations, this paper proposes a new behavior recognition algorithm based on human feature points. First, deep learning is used to locate and track key human feature points in real time through the OpenPose framework, which can accurately identify 34 key points of the human body under various postures and lighting conditions. Second, based on these keypoint data, we build a lightweight convolutional neural network model to analyze the dynamics of human behavior and capture its spatio-temporal characteristics. Through training and testing on the public Human3.6M dataset, our model achieves 92% accuracy in behavior recognition and can process 30 video frames per second, showing excellent real-time performance. This research improves the efficiency and accuracy of behavior recognition and provides a valuable reference for future deployment in practical applications.
To address the loss of original image information during denoising in traditional image processing methods, this paper proposes applying machine learning algorithms to image denoising. Traditional approaches require continuously comparing the processed image with the original, which increases the time needed for denoising; deep learning techniques have played an important role in overcoming this. Through intelligent technology, especially supervised machine learning algorithms, images can be processed so that their content is observed more clearly. Specifically, supervised machine learning is used to extract noise during denoising: separation nodes for image noise are set at multiple points, and clustering theory is then used to filter out the noise, completing the denoising. The experimental results, using animal images as the test object, show that the contained noise can be detected, and a comparison of the original and new methods shows that the new method produces results consistent with the original image. More importantly, without introducing image distortion, the new method keeps the denoising time to 1.15 seconds, while the original method takes an average of 15.13 to 20.21 seconds. This indicates that the new method significantly improves denoising efficiency and has potential for practical applications.
To tackle the issue of dense IC components that complicate the detection of wetting defects in pin solder joints of CQFP package devices, we propose an improved YOLOv5 algorithm, named YOLO-HP, specifically designed for detecting post-wetting defects in pin solder joints. First, due to the high resolution of the images, an x-large detection layer is added on top of the existing three detection layers to better capture pin-root information. Second, a hybrid pooled attention module is designed to capture both global and local feature-map information using multiple pooling methods, thereby improving object detection performance. Finally, the Context Aggregation Block is introduced to aggregate contextual information through attention mechanisms, enhancing the model's perception of the object region. The improved algorithm achieves a mean average precision (mAP@0.5) of 96.9% on a custom pin wetting defect dataset, 6.5 percentage points higher than the original YOLOv5s baseline network. This improvement meets the needs of solder joint wetting defect detection and has practical value for future IC soldering defect detection.
Tunnel crack detection is a critical application area in computer vision. However, existing detection algorithms face challenges such as low recognition rates, slow detection speeds, and high costs in complex tunnel environments. To address these issues, we propose a fast tunnel crack detection model based on YOLOv8n, named DFL-YOLO. The model introduces a novel Feature Aggregation Pyramid Network (FAPN) that effectively handles variable textures and improves the detection success rate for small cracks. It also includes a Lightweight Detail-Enhanced Shared Convolution Detection Head (LDSC), which reduces the number of parameters and the detection time through shared convolution calculations, thereby enhancing detection efficiency. Additionally, a C2f-DEConv module built from detail-enhanced convolutions is incorporated into the C2f structure to strengthen the backbone's crack feature extraction capability and thus improve robustness. The improved DFL-YOLO model achieves a precision of 93.1%, a recall of 85%, and mAP@50 of 91.7% for tunnel crack detection, while its computational cost is 43.8% lower than that of the original YOLOv8 model. This makes DFL-YOLO more suitable for deployment on resource-constrained detection devices.
In response to the low detection accuracy, susceptibility to environmental interference, and frequent missed detections and false alarms of existing road damage detection algorithms, we propose YOLOv8n-DS, a road damage detection algorithm based on YOLOv8n. First, to enhance the ability of large kernels to capture sparse patterns, we improve the Dilation-wise Residual (DWR) module of DWRSeg by leveraging the DilatedReparamBlock from UniRepLKNet for feature extraction, introducing a flexible sampling and feature extraction module, C2F-DRB-DWR. To address the interference of complex backgrounds with damage detection, we adopt Large Separable Kernel Attention to enhance the original SPPF module's ability to extract global information. In the neck network, we utilize the large-receptive-field aggregation capability of Content-Aware ReAssembly of Features (CARAFE) to improve upsampling and enhance computational efficiency. Finally, we introduce a unified Dynamic Head that integrates scale, spatial, and task attention mechanisms to replace the traditional detection head, further improving the algorithm's generalization capability and overall performance. Road damage detection experiments with these methods on the RDD20 dataset demonstrate that the improved algorithm achieves a mean average precision (mAP) of 67.1%, with a model size of 7.5 MB and a detection speed of 172 FPS, showing superior comprehensive performance compared to other algorithms.
Accurate load forecasting is of great significance for the planning and operation of smart grids. In order to further improve the accuracy of short-term load forecasting, an AGPSO-NARX load forecasting model is proposed, and the input information of the neural network is selected. Firstly, the particle swarm optimization algorithm is optimized by introducing variable weights, variable learning factors, and selection, crossover, and mutation operations, and the initial thresholds and weights of the NARX neural network are optimized based on this algorithm. Secondly, through experiments, suitable influencing factors are selected as inputs to the network to further improve the prediction accuracy. Finally, two scenarios, winter and summer, are considered to test the predictive ability of the neural network model through examples. The results show that the overall error and single-point error of the model are significantly reduced, indicating a high prediction accuracy.
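The variable inertia weight and learning factors mentioned for AGPSO can be sketched as linear schedules over the iterations. The specific bounds below are common textbook defaults, not the paper's values; the function name is illustrative.

```python
def adaptive_params(t, t_max, w_max=0.9, w_min=0.4, c_init=2.5, c_final=0.5):
    """Linearly varying PSO parameters over iterations t = 0..t_max.

    Early on: large inertia and cognitive factor (exploration);
    later: small inertia, large social factor (exploitation).
    """
    frac = t / t_max
    w = w_max - (w_max - w_min) * frac
    c1 = c_init - (c_init - c_final) * frac   # cognitive factor decreases
    c2 = c_final + (c_init - c_final) * frac  # social factor increases
    return w, c1, c2
```

The abstract's selection, crossover, and mutation operations would be layered on top of this schedule to diversify the swarm before the NARX weights and thresholds are seeded from the best particle.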
Gemmini is an open-source, full-stack DNN accelerator generator. For a DNN algorithm running on Gemmini, traditional optimizations focus on changing Gemmini's hardware configuration (e.g., the size of internal storage and of the systolic array). This work proposes an optimization-test iteration flow combining Spike simulation and FPGA emulation, given the unacceptably slow Verilator simulation speed and limited FPGA resources. Following this flow, the work proceeds along two lines: hardware optimization and algorithm optimization. In hardware, Gemmini's internal storage size and computation scale are changed; in the algorithm, the convolutional layers of the ResNet-50 neural network are optimized. The experimental results show that for ResNet-50, Gemmini with the basic configuration provides a 2000x speedup compared with running on the CPU. On the Gemmini accelerator platform, the hardware configuration optimization provides a further 1.59x speedup for ResNet-50, and on top of the hardware optimization, the algorithm optimization provides a 1.012x speedup.
Recent advances in deep learning have significantly impacted weather forecasting, where accurate predictions are crucial for effective decision-making across various sectors. This paper presents a novel approach to weather forecasting by introducing an advanced transformer-based model, which harnesses the capabilities of the Focused Transformer (FOT) framework with significant adaptations for meteorological applications. Central to our method is the substitution of the exact k-nearest-neighbor (k-NN) lookup with an approximate k-NN method, enhancing the model’s efficiency and scalability. This modification enables the transformer to dynamically integrate expansive temporal and spatial data sequences more effectively, crucial for accurate weather predictions. We adopt the contrastive training technique to refine the context scaling, which is critical for processing the complex dynamics of weather patterns. Performance evaluations on several weather datasets demonstrate that our model achieves superior forecasting accuracy, indicated by improvements in root mean squared error and mean absolute error metrics, compared to existing state-of-the-art models. Our findings suggest that applying transformer models with approximate k-NN lookups offers a promising direction for developing more robust and efficient weather forecasting systems.
Network Intrusion Detection (NID) plays a very important role in network security. This paper presents an NID method that uses an information gain-based feature selection approach and applies XGBoost for classification, with optimized model parameters. By selecting features and removing irrelevant ones, computational complexity is reduced, training is accelerated, and detection rates are improved. The method was validated on the UNSW-NB15 dataset and achieved higher detection rates and F1 scores than four other algorithms in the field, exhibiting high precision, fast convergence, and strong generalization performance. By effectively addressing the multi-class imbalance in network intrusion detection data, the method offers a feasible approach to network intrusion detection.
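The information-gain criterion used for feature selection can be computed as follows for a discrete feature. This is the standard formulation; the paper's exact preprocessing and thresholds are not shown here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """IG(feature) = H(labels) - H(labels | feature) for a discrete feature.

    Features with IG near zero carry little class information and can be
    dropped before training the classifier.
    """
    n = len(labels)
    cond = 0.0
    for value in set(feature):
        sub = [y for x, y in zip(feature, labels) if x == value]
        cond += len(sub) / n * entropy(sub)
    return entropy(labels) - cond
```

A feature that perfectly separates the classes scores 1 bit on a balanced binary problem, while a feature independent of the labels scores 0, which gives a natural ranking for pruning the feature set.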
During object tracking, if only the first frame is used as the matching template, changes in the target's state often lead to poor tracking results or even tracking failure for classic Siamese trackers. To deal with this issue, UpdateNet uses the first frame as the template and regularly updates it by combining the previously accumulated template with the currently predicted template. However, combining templates tends to introduce background information that can pollute the template representation. To obtain an accurate template and sense target changes in a timely manner, this article introduces Squeeze-and-Excitation channel attention and a selective mechanism into UpdateNet. The channel attention mechanism reweights the channel-concatenated template information so that important channels are highlighted. The confidence score of the Siamese network's tracking prediction determines whether the corresponding frame participates in template accumulation, and a threshold is set to exclude severely contaminated predicted templates. The article also uses a finer-grained parameter adjustment method that lets UpdateNet converge faster and adapt better. We apply the improved UpdateNet to the DaSiamRPN tracker, and evaluations on the VOT2016 and VOT2018 datasets show that our methods effectively improve the performance of UpdateNet.
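The Squeeze-and-Excitation channel attention added to UpdateNet can be sketched on a single feature map. This is the generic SE computation with illustrative weight shapes, not the tracker's trained module.

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-Excitation over a (C, H, W) feature map.

    w1: (C//r, C) and w2: (C, C//r) are the two excitation FC layers
    (r is the reduction ratio); biases are omitted for brevity.
    """
    z = x.mean(axis=(1, 2))                # squeeze: global average pool -> (C,)
    s = np.maximum(w1 @ z, 0.0)            # excitation FC1 + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))    # excitation FC2 + sigmoid -> (C,)
    return x * s[:, None, None]            # rescale each channel
```

Applied to the channel-wise concatenation of accumulated and predicted templates, the learned per-channel weights can down-weight channels dominated by background contamination before the templates are fused.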
Large knowledge graphs often suffer from missing relationships or entities in their triples, and this missing information can cause inference to fail. To solve this problem, this paper proposes a knowledge graph inference method based on deep reinforcement learning. The method combines reinforcement learning with meta-learning and contrastive learning to complete the triple information of the knowledge graph, and optimizes the reward function to improve model performance. Comparative experiments on the NELL-995 and FB15K-237 datasets show that the proposed model outperforms most previous models.
To facilitate automation and intelligent scheduling of pivotal production line processes, a novel adaptive scheduling approach is introduced, leveraging fuzzy logic and reinforcement learning. It identifies key influencing factors (equipment status, raw material availability, and order demands) as inputs to the fuzzy logic framework. By translating input values into fuzzy set membership degrees, it addresses production uncertainties. Reinforcement learning constructs an adaptive neural network model for scheduling, executing actions guided by the strategy's generated instructions. Experimental outcomes demonstrate enhanced production efficiency, optimized resource utilization, cost reduction, and significant economic benefits for enterprises.
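Translating a crisp input into a fuzzy membership degree, as described above, is commonly done with a triangular membership function. This generic sketch is illustrative, not the paper's rule base; the function name and breakpoints are assumptions.

```python
def tri_membership(x, a, b, c):
    """Triangular fuzzy membership: degree 0 at a and c, degree 1 at b.

    E.g., with (a, b, c) describing 'raw material availability is medium',
    an input halfway up the left slope belongs to the set with degree 0.5.
    """
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
```

A fuzzy scheduling controller would evaluate such memberships for every input factor, fire the matching rules, and hand the defuzzified result (or the membership vector itself) to the learning component as state.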
With the development of natural language and neural network technology, intelligent satellite constellation management methods have great research value. Traditional management methods rely heavily on commanders' proficiency with the command system; problems such as comprehension bias from an insufficient level of understanding and long task-issuance times complicate subsequent requirement determination and collaborative planning, and such methods suffer from heavy computation and slow response when dealing with large-scale constellations. Using natural language processing technology, a natural-language-based architecture is proposed for large-scale remote-sensing constellation management. On this basis, a neural network-based assisted decision-making algorithm is proposed to quickly provide the commander with candidate satellites, optimize the decision-making process, shorten decision-making time, and assist the commander in monitoring the entire mission. Simulation experiments show that the algorithm is computationally efficient while the probability that a recommended satellite can actually perform the task reaches 98.52%, and the architecture proposed in this paper has good application prospects for the management of large-scale constellations.
In container cloud platforms, elasticity scaling is a key characteristic. Recently, proactive elasticity scaling strategies have gained prominence. Proactive scaling involves predicting future resource needs ahead of time, allowing for timely preparation such as resource preallocation or instance adjustments before load changes occur, thus mitigating elasticity lag. This approach not only facilitates prompt response to load variations but also significantly boosts resource utilization. However, implementing effective proactive elasticity scaling is no easy feat, with the core challenge being accurate forecasting of future loads. To address this challenge, we devised a forecasting model integrating dilated convolutions and Long Short-Term Memory (LSTM) networks, designed to capture both short-term volatility and long-term dependencies in load patterns. The model employs a dilated convolution residual network to construct a feature extraction component, simultaneously learning short-term volatility, sustained change, and periodicity in load, followed by LSTM to deepen the understanding of long-term temporal dependencies. Experimental results on the Alibaba Cluster showed that our model outperformed comparative models in terms of performance.
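The dilated-convolution feature extractor's key operation can be sketched as a causal 1-D dilated convolution, which enlarges the receptive field over the load series without adding parameters. This is a generic sketch of the operation, not the paper's residual network.

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation):
    """Causal 1-D dilated convolution; output length equals input length.

    y[t] = sum_j kernel[j] * x[t - j*dilation], with zero left-padding,
    so y[t] depends only on current and past samples (no look-ahead).
    """
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([sum(kernel[j] * xp[t + pad - j * dilation] for j in range(k))
                     for t in range(len(x))])
```

Stacking such layers with dilations 1, 2, 4, ... grows the receptive field exponentially, letting the convolutional front end capture short-term volatility and periodicity before the LSTM models the long-term dependencies.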
Current research on graph neural network-based text classification models faces challenges such as dependence on fixed corpora and overlooking hidden information between text sequences. To address these shortcomings, we construct a new graph neural network text classification model, the Message Passing Gated Model (MPGM). The model builds a global graph with parameter-shared nodes for each input text and leverages gated recurrent units (GRU) to learn local information within the texts while capturing semantic relationships between contexts. Experimental results on several datasets show that the proposed model is superior to other text classification methods based on graph neural networks.
This paper presents a novel phase encoding scheme for spike trains in Spiking Neural Networks (SNNs) inspired by Fourier series and the Discrete Fourier Transform (DFT). The proposed method leverages complex exponential spiking neurons to represent frequency components, allowing for the efficient reconstruction of original time signals. We explore the time shifting property of the Fourier transform to demonstrate how time delays in impulse signals can be encoded as phase shifts in the frequency domain. Detailed mathematical formulations and illustrative examples highlight the relationship between impulse delays and phase patterns in SNNs. The primary objective of this research is to develop a streamlined and computationally efficient SNN architecture, enhancing the training process. Future work will expand this phase encoding method to various sequence patterns, aiming to improve the performance and versatility of SNNs in neuromorphic computing for complex information processing tasks. The results indicate that this approach holds promise for advancing the field by providing a robust framework for precise signal reconstruction and efficient neural network design.
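The time-shifting property central to this encoding, namely that delaying a signal by n samples multiplies its k-th DFT bin by exp(-2*pi*j*k*n/N), can be verified directly. The function name below is illustrative.

```python
import numpy as np

def dft_phase_shift_of_delay(n_shift, N):
    """Per-bin phase factors induced by a circular delay of n_shift samples."""
    k = np.arange(N)
    return np.exp(-2j * np.pi * k * n_shift / N)

# A delayed impulse has the same magnitude spectrum as the original;
# only the phase pattern changes, and it changes by exactly these factors.
x = np.zeros(8); x[0] = 1.0          # unit impulse
xd = np.roll(x, 3)                   # impulse delayed by 3 samples
assert np.allclose(np.fft.fft(xd), np.fft.fft(x) * dft_phase_shift_of_delay(3, 8))
```

This is the mechanism the abstract exploits: spike timing (an impulse delay) can be read off as a linear phase ramp across the frequency components represented by the complex exponential neurons.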
As wireless communication systems continue to evolve, the study of Radio Frequency (RF) Power Amplifiers (PAs) has attracted growing interest. Addressing the modeling issues of PAs can provide more convenient research methods and more efficient guidance in fields such as power amplifier predistortion, power amplifier design and evaluation, and the identification of subtle characteristics of RF radiation sources. This paper investigates RF PA nonlinear modeling methods based on neural networks, using BP, Elman, and RBF neural network algorithms to model and simulate five common nonlinear models, namely Saleh, Rapp, memoryless polynomial, Volterra, and memory polynomial models. This study not only demonstrates the performance differences of various algorithms across different models but also provides important references for the selection and optimization of algorithms in practical applications.
Machine learning classification models are exceedingly large, lack universality, consume significant resources, and are difficult to train; yet despite their scale they remain insufficiently complex to achieve unrestricted classification. The academic community has attempted to solve the problem of unrestricted classification through lifelong learning and meta-learning, but these methods have limitations. Unrestricted image classification is inherently a task of infinite complexity, so systems designed for it ought to possess infinite complexity as well; current large models without knowledge bases therefore lack the complexity required. This paper addresses the issue of infinite complexity through a knowledge base approach: knowledge bases can be conveniently expanded and simply arranged to achieve infinite complexity. Based on the analogy principle, this article adopts two methods to build two image feature knowledge bases and demonstrates their application to multimodal unrestricted image classification. The study conducts classification verification on 270,000 images across 56 classes; the accuracy rates of both knowledge base construction methods exceed 99%.
Visibility prediction is crucial for traffic safety and aerospace, especially under conditions of low visibility caused by fog, smoke, rain, and snow, which significantly affect visual range and driving safety. Accurate visibility prediction helps prevent accidents. However, traditional numerical weather forecasts exhibit low accuracy at small scales and are highly sensitive to initial conditions. Existing deep learning methods primarily focus on temporal changes. To address these challenges, this paper proposes the MSC-CGCRN model for visibility prediction, a graph neural network that integrates multi-scale temporal convolution and channel attention mechanisms. By integrating satellite data and ground meteorological station data, this model utilizes multi-source data for predictions. Experimental results demonstrate that the MSC-CGCRN model outperforms other benchmark models, and ablation experiments confirm the effectiveness of its components.
In recent years, detecting object surface defects using deep learning has become an important tool in industry and a major research area. Although there is a wide variety of articles on object surface defect detection, this paper helps readers specialize in defect detection of optical components. This review focuses on the needs of optical component inspection in industry and on the acquisition of datasets for different kinds of special optical components; it surveys the use of deep learning algorithms in optical component surface defect detection, introducing convolutional neural networks, attention mechanisms, autoencoders, and generative adversarial networks along with their applications, and compares the performance of pixel-level models for automated optical inspection. This review gives the reader a quick, basic understanding of optical component defect detection and the deep learning algorithms used today.
The development of deep learning technology has led to a qualitative leap in image recognition. However, large-scale data annotation and high-performance computing requirements limit the application of image recognition methods in practical tasks, and research on few-shot image recognition has emerged as a result. This article mainly discusses metric learning methods and three network models based on convolutional neural networks, elaborating on their model structures and algorithmic ideas. The performance of few-shot image recognition under different network models is compared and analyzed through experiments on different datasets. The summary section provides prospects for future research.
The relationship between the structures of neural networks and their dynamics is not yet well understood, and studies on this topic are still greatly needed. Changes in the parameters of a network can cause its state to shift between different regimes; in this work, we studied a neural network of stochastic spiking neurons whose dynamics change as a control parameter varies. More specifically, as shown by numerical results, a bistable region appears and eventually disappears as the control parameter increases over an interval, and a critical-like transition emerges. However, this kind of transition is difficult to detect. Here we used a new approach based on topological data analysis (TDA) that uses superlevel persistence to visualize this transition through a "homological bifurcation plot", which shows the changes in the 0th Betti numbers of the kernel density estimate (KDE) of the network's activities.
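On a 1-D grid, the 0th Betti number of a superlevel set is simply the number of runs of samples above the threshold, and sweeping the threshold is what a homological bifurcation plot tracks. A toy illustration (the density values below are invented, not the network's actual KDE):

```python
def betti0_superlevel(values, threshold):
    """Number of connected components of the superlevel set
    {i : values[i] >= threshold} on a 1-D grid (0th Betti number)."""
    components, inside = 0, False
    for v in values:
        above = v >= threshold
        if above and not inside:
            components += 1
        inside = above
    return components

# A bimodal "density": two bumps separated by a valley.
density = [0.1, 0.5, 0.9, 0.5, 0.2, 0.6, 1.0, 0.6, 0.1]

# Lowering the threshold traces a merge event: two components
# (the bistable regime's two modes) join into one through the valley.
assert betti0_superlevel(density, 0.8) == 2    # only the two peaks
assert betti0_superlevel(density, 0.15) == 1   # peaks merged
```

Recording the threshold at which the component count drops is the superlevel-persistence information the plot visualizes.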
The majority of existing studies on dynamic hypergraphs focus on hypergraphs with a constant size but only dynamic hyperedges, yet numerous scenarios necessitate the understanding of a hypergraph's growth. This paper introduces the Variational Growing Hypergraph Learning (VGHL) method, which addresses the limitations of current studies that only consider hypergraphs with fixed sizes and dynamic hyperedges. The VGHL method is designed to simultaneously capture the evolving structure of an existing hypergraph and accommodate the integration of new nodes. The technique involves transforming hypergraph snapshots into line graphs and then adjusting the variational lower bound to facilitate the construction of a hypergraph sequence, which is crucial for downstream classification tasks. The paper demonstrates the efficacy of the VGHL method through experiments on various benchmark datasets, highlighting its potential for semi-supervised classification.
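The hypergraph-to-line-graph transformation the method relies on can be shown compactly: each hyperedge becomes a vertex, and two vertices are joined whenever the hyperedges intersect. A minimal sketch (the paper's variational machinery is not reproduced here):

```python
from itertools import combinations

def hypergraph_line_graph(hyperedges):
    """Line graph of a hypergraph: one vertex per hyperedge, with an
    edge between two hyperedges whenever they share at least one node."""
    edges = []
    for (i, a), (j, b) in combinations(enumerate(hyperedges), 2):
        if set(a) & set(b):
            edges.append((i, j))
    return edges

# Three hyperedges over five nodes; e0 and e2 share no node.
H = [{1, 2, 3}, {3, 4}, {4, 5}]
print(hypergraph_line_graph(H))  # [(0, 1), (1, 2)]
```

Applying this to each hypergraph snapshot yields the ordinary-graph sequence on which standard graph learning can proceed.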
In a complex data environment, many databases need efficient access to multi-dimensional datasets, so it is important to construct a multi-dimensional index that can effectively support retrieval over high-dimensional datasets. We propose a new multi-dimensional learned index called the LDRML-index, which consists of a dimensionality reduction module and a learned index module. The first module contains a local dimension reduction (LDR) component and a Z-address calculation component that processes the data to make it applicable to the later module. The second module includes a learned index component and a dynamic learned index framework, enabling the learned index to better adapt to the data distribution and improve query efficiency. The experimental results show that the LDRML-index can effectively reduce space consumption and improve query performance.
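The Z-address step mentioned above is, in the usual formulation, a Morton (Z-order) code: interleave the bits of each coordinate so that points close in the multi-dimensional space tend to receive nearby one-dimensional addresses. A minimal sketch (the paper's exact pipeline is not specified; the bit width is illustrative):

```python
def z_address(coords, bits=8):
    """Interleave the bits of each coordinate (Morton / Z-order code),
    mapping a multi-dimensional point to a single sortable integer."""
    d = len(coords)
    z = 0
    for bit in range(bits):
        for dim, c in enumerate(coords):
            z |= ((c >> bit) & 1) << (bit * d + dim)
    return z

# 2-D example: x contributes the even bit positions, y the odd ones.
assert z_address((0, 0)) == 0
assert z_address((1, 0)) == 1
assert z_address((0, 1)) == 2
assert z_address((3, 3)) == 15  # bits x0, y0, x1, y1 all set
```

Sorting points by this single integer is what lets a one-dimensional learned index serve multi-dimensional queries.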
Graph data is a natural form of unstructured data, with nodes and edges representing entities and the relationships between them respectively. This type of data can be directly applied to various real-world scenarios. Graph Neural Networks (GNNs) have a natural advantage in handling graph-structured data, effectively processing the hierarchical structure of graphs to extract rich information. However, current GNNs have limitations in dealing with local information. In this paper, we introduce a new attention-based network module that outperforms general GNNs. To better learn global-local information and overall hierarchical information, we propose a Dual-Branch Subgraph Pattern Attention Graph Neural Network (DB-SP-GAT). Experimental results demonstrate that DB-SP-GAT achieves superior link prediction performance across five benchmark datasets.
Deep learning methods provide strong support for side-channel analysis, and a large number of research results prove the advantages of this method in the side-channel domain. Using deep learning for side-channel analysis offers advantages that traditional methods lack: it can counter certain protective measures such as masking, and it does not require complex feature extraction processes. These advantages have made deep learning methods a primary tool for side-channel analysis. However, the method is not without drawbacks; one of the biggest difficulties is finding appropriate hyperparameters for the neural network so that the side-channel analysis performs at its best. In this paper, we propose an automatic hyperparameter tuning method for deep learning based on Hyperband in the field of side-channel analysis. This method can speed up the hyperparameter search through adaptive resource allocation. Experiments show that regardless of the type of leakage model and neural network, the proposed hyperparameter optimization scheme performs well. Compared with Bayesian optimization and random search, it achieves better tuning performance and saves more than half of the tuning time.
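Hyperband's core mechanism is successive halving: evaluate many configurations on a small budget, keep the best fraction, and re-evaluate the survivors with a larger budget. A toy sketch of one such bracket (not the paper's implementation; the "validation accuracy" objective below is invented):

```python
import math, random

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """One Hyperband bracket: score all configs on a small budget,
    keep the top 1/eta, and repeat with eta times the budget."""
    budget = min_budget
    while len(configs) > 1:
        scored = sorted(configs, key=lambda c: evaluate(c, budget), reverse=True)
        configs = scored[:max(1, len(configs) // eta)]
        budget *= eta
    return configs[0]

# Toy objective: accuracy peaks at learning rate 0.1, and a larger
# budget (more epochs) only reduces the evaluation noise.
random.seed(0)
def evaluate(lr, budget):
    noise = random.gauss(0, 0.05 / budget)
    return 1.0 - abs(math.log10(lr) + 1.0) + noise

candidates = [10 ** random.uniform(-4, 0) for _ in range(27)]
best = successive_halving(candidates, evaluate)
print(f"best lr = {best:.4f}")
```

The adaptive resource allocation is visible here: most candidates are discarded after cheap evaluations, and only survivors receive the expensive, low-noise runs.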
To address the issue of low detection accuracy in existing network intrusion detection technologies caused by insufficient learning of data feature information, this paper proposes a network intrusion detection model based on Convolutional Autoencoder (CAE), Residual Network (ResNet), and Bidirectional Long Short-Term Memory (BiLSTM) networks. This method first performs preliminary feature extraction using a CAE, then enhances feature processing capabilities with ResNet, and finally captures time series information with BiLSTM. Classification is carried out through a fully connected layer and an output layer. The validation results on the NSL-KDD dataset and UNSW-NB15 dataset show that the model achieved accuracy rates of 99.34% and 82.47%, with false positive rates of 0.376% and 6.088%, respectively. The experimental results indicate that the proposed model effectively addresses issues such as low detection accuracy caused by insufficient learning of data feature information.
Network intrusion detection has always been a pivotal area in the field of network security. This paper proposes a novel network intrusion detection model that integrates the K-means algorithm with multiple convolutional neural networks (Multi-CNNs). This model comprises two main modules: network attack learning and network attack recognition. The network attack learning module focuses on learning known traffic features. Initially, the K-means algorithm is utilized to cluster the traffic features. Subsequently, Multi-CNNs are constructed, each tailored to extract the spatial features of the data within its respective cluster. Through supervised learning, a corresponding relationship is established between the inputs and the attack patterns. The output of this module includes the K-means clustering centers and the trained CNN models, which serve as the foundation for network attack recognition. The network attack recognition module is responsible for detecting unknown traffic. First, the unknown traffic features are assigned to their respective clusters according to their distance from the K-means clustering centers. Then, Multi-CNNs are employed to identify the attack pattern for each clustered feature independently. Finally, the detection outcomes from the Multi-CNNs are fused to achieve intrusion detection. The experimental results demonstrate that the integration of K-means clustering and Multi-CNNs can significantly enhance the performance of network intrusion detection. Specifically, for the UNSW-NB15 and CICIDS2017 datasets, the proposed model achieves an improvement in F1-score of 11.11% and 0.27%, respectively, compared to the baseline model.
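The recognition-time routing can be sketched independently of the CNNs: unknown traffic is assigned to the nearest K-means center, then handed to that cluster's specialist classifier. Below, the per-cluster CNNs are stubbed with simple rules, and the centers and thresholds are invented purely for illustration:

```python
import math

def nearest_center(x, centers):
    """Assign a feature vector to the closest K-means center (Euclidean)."""
    dists = [math.dist(x, c) for c in centers]
    return dists.index(min(dists))

# Hypothetical: two clustering centers learned offline, and one
# specialist classifier per cluster (stand-ins for the trained CNNs).
centers = [(0.0, 0.0), (10.0, 10.0)]
classifiers = [
    lambda x: "normal" if x[0] < 1.0 else "attack-A",   # expert for cluster 0
    lambda x: "attack-B" if x[1] > 9.0 else "normal",   # expert for cluster 1
]

def detect(x):
    """Route unknown traffic to its cluster's classifier."""
    k = nearest_center(x, centers)
    return classifiers[k](x)

print(detect((0.2, 0.1)))    # routed to cluster 0 -> normal
print(detect((9.5, 10.2)))   # routed to cluster 1 -> attack-B
```

The design point is that each downstream model only ever sees traffic from one cluster, so it can specialize on that cluster's spatial feature distribution.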
Recommendation systems (RS) can significantly boost the profitability of e-commerce platforms. Traditional RS methods rely on users' historical behaviors and product attributes but struggle with data sparsity, cold start problems, and complex user-item interactions. Graph neural networks (GNNs) have become popular in RS due to their ability to capture complex relationships and features within graph-structured data, improving accuracy and robustness. This paper proposes a model that combines product review information with GNNs to enhance recommendations. We design a heterogeneous graph neural network with multi-head attention integrating review information (HG-MHAR). This model constructs a user-item heterogeneous graph with ratings as edges and includes review information to build user and item feature embeddings. The model has two main components: the review feature aggregation graph learning module and the multi-head attention graph learning module. These components capture multiple relational patterns in the graph in parallel, improving the stability and robustness of feature representations. Experiments on five Amazon datasets show that HG-MHAR outperforms benchmark models, improving nearly 4% over traditional methods. This demonstrates the potential of GNNs in recommendation systems.
In computer vision, animals' health and physiological condition can be analyzed by monitoring their behaviors. However, accurate animal behavior recognition models frequently involve significant computational complexity. Optimizing the learning rate can improve training without changing model complexity, but when using the faster SGD (Stochastic Gradient Descent) optimizer, there is a risk of becoming stuck in a local optimum. This paper therefore introduces a novel algorithm called Variable-Cycle Cosine Annealing Decay (VC-CAD) that addresses the issue of the model getting stuck in a local optimum. Additionally, this paper introduces a novel efficiency value that integrates training cost and detection accuracy, and formulates a recognition efficiency equation to evaluate the recognition efficiency of the model. The experimental results demonstrate that integrating VC-CAD with YOLOv5s and YOLOv5m increased mAP (mean Average Precision) by 1.7% and 1.6%, respectively. The performance of YOLOv5s and YOLOv5m improved by 3.4% and 0.8%, respectively, in a comparative analysis of models with varying learning rates, validating the performance enhancement achieved by the VC-CAD algorithm within the YOLOv5 framework.
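The abstract does not give VC-CAD's exact schedule, but the variable-cycle idea can be sketched as cosine annealing with warm restarts whose cycle length grows after each restart; each restart kicks the learning rate back up, helping SGD escape local optima. All parameters below are illustrative:

```python
import math

def cosine_annealing_lr(step, cycle_len, lr_max=0.01, lr_min=1e-4):
    """Cosine decay from lr_max to lr_min over one cycle of cycle_len steps."""
    t = (step % cycle_len) / cycle_len
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

def variable_cycle_schedule(total_steps, first_cycle=10, growth=2):
    """Restarted cosine annealing whose cycle length grows each restart
    (the variable-cycle idea: later cycles anneal more slowly)."""
    lrs, step, cycle = [], 0, first_cycle
    while step < total_steps:
        for s in range(min(cycle, total_steps - step)):
            lrs.append(cosine_annealing_lr(s, cycle))
        step += cycle
        cycle *= growth
    return lrs

lrs = variable_cycle_schedule(70)  # cycles of length 10, 20, 40
assert abs(lrs[0] - 0.01) < 1e-12        # each cycle starts at lr_max
assert lrs[9] < lrs[0]                   # decayed by the end of cycle 1
assert lrs[10] == lrs[0]                 # warm restart at cycle 2
```

Such a schedule plugs into any SGD-based trainer by setting the optimizer's learning rate to `lrs[step]` at each step.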
China is a country with a large population, and ensuring food security bears on China's national economy, people's livelihood, and social stability. Wheat is the world's most widely planted and most productive food crop, and timely estimation of wheat yield has a significant impact on crop production, food prices, and food security; wheat yield is an important indicator of agricultural productivity. Given the difficulty of manual wheat yield estimation, this paper applies a convolutional neural network to wheat yield estimation, providing a reference for agricultural productivity estimation and guidance for agricultural production management decisions. Anhui Shuyu Ecological Farm and Changfeng Lixin Family Farm were selected as the research objects, and the wheat distribution maps of the farms were obtained using the convolutional neural network. The estimated 2021 annual outputs of the two farms are 317,065 kg and 790,210 kg, against statistics of 333,750 kg and 858,920 kg provided by Anhui Shuyu Ecological Agriculture Co., Ltd. and Changfeng Lixin Family Farm; the errors are 4.9% and 7.9%, respectively, which verifies the effectiveness of the estimation method.
In recent years, there have been significant advancements in path planning and obstacle avoidance for robotic manipulators. This paper introduces a novel approach to path planning for robotic manipulators by employing the Deep Deterministic Policy Gradient (DDPG) algorithm. The manipulator model is developed using SolidWorks software and integrated into the Simulink environment, incorporating two spherical obstacles. The experimental results demonstrate the efficacy of the DDPG algorithm in navigating complex environments and avoiding obstacles. The proposed method is validated through extensive simulations, demonstrating enhanced path efficiency and collision avoidance compared to traditional approaches.
The physical fitness test scores of college students are an important means of understanding their physical health. Accurately and effectively modeling, predicting, and analyzing physical fitness test scores can help grasp the changing trends of college students' physical fitness and thus better arrange their physical exercise. In this article, deep neural networks are applied to predict and analyze the physical fitness and health test data of college students at a certain vocational college in 2023. The experimental results show that, compared with methods such as radial basis function neural networks, BP neural networks, and Softmax neural networks, the proposed prediction model achieves better prediction accuracy and related performance.
In this study, we aim to explore a method to predict mental health issues among vocational college students based on big data and deep learning. We collect various historical data, including student psychological survey data, social support statistics, and personal basic information. By utilizing data preprocessing and deep learning techniques, we employ a BP neural network to consider the heterogeneity of the data and explore potential correlations among these factors. Ultimately, accurate and in-depth predictions of psychological issues among higher vocational students are achieved, providing effective support for their mental health management.
Currently, automated tomato picking methods have improved production efficiency, but unripe and rotten tomatoes are still inevitably mixed in during picking, leading to a certain degree of resource waste. It is therefore necessary to effectively identify the ripeness of tomatoes before picking in order to select those with appropriate maturity. However, challenges arise from varying geographical conditions, diverse planting technologies, and the data privacy concerns of data owners. This research therefore devises a robust federated learning framework intended to address the data silo issue and identify tomato ripeness across various fields and regions. We assessed the capabilities of multiple pre-trained deep learning frameworks using a dataset specifically for tomato ripeness classification, emulating a federated learning setup with the number of clients varying from 3 to 9. Upon analyzing the outcomes, InceptionNet outperformed the others, attaining a success rate of 97.85% in determining tomato ripeness levels. This investigation illustrates that federated learning can improve the precision of tomato ripeness identification, providing significant information for the enhancement of agricultural methodologies.
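The server-side aggregation step in such a framework is commonly federated averaging (FedAvg); the abstract does not name its aggregation rule, so the following is a generic sketch with invented client data:

```python
def fed_avg(client_weights, client_sizes):
    """Federated averaging: the server combines client model weights,
    weighting each client by the size of its local dataset."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [sum(w[i] * n / total for w, n in zip(client_weights, client_sizes))
            for i in range(n_params)]

# Three hypothetical clients (e.g. farms with tomato datasets of
# different sizes), each holding a 2-parameter model trained locally.
weights = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
sizes = [100, 300, 100]

global_w = fed_avg(weights, sizes)
print(global_w)  # approximately [2.4, 2.0]
```

Only these weight vectors cross the network, never the raw images, which is how the framework sidesteps the data privacy concern.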
Identifying and pinpointing objects within images or videos is a core objective in the field of computer vision, known as object detection. This study delves into the most recent advancements in object detection, with an emphasis on four primary approaches: Two-Stage Detectors, One-Stage Detectors, Anchor-Free Detectors, and Transformer-based Detectors. Each approach possesses distinct strengths and weaknesses, and the selection is dictated by the specific needs of the application. As research progresses, the techniques for object detection are advancing in precision and efficiency, while also expanding their ability to manage a wide array of object categories and scenarios. These advancements are pivotal in numerous domains, such as autonomous vehicles, security monitoring, and medical diagnostics.
Most existing binocular stereo matching algorithms require a trade-off between accuracy and speed, unable to achieve both simultaneously. One reason lies in the complexity and variability of scenes that stereo matching tasks must handle, where disparities in heavily textured, weakly textured, and occluded areas are often difficult to infer correctly. Therefore, this paper proposes the Learnable Upsampling Bilateral Grid Refinement for Stereo Matching Network (LUGNet). Through learnable bilateral grid upsampling guided by the left image, LUGNet calculates offsets for cost volume upsampling, while simultaneously leveraging the network to automatically learn interpolation weights to accommodate features of different datasets. Ultimately, LUGNet achieves error rates comparable to high-precision networks with a parameter count of 2.6M and an inference time of 58ms.
With the excessive selection for body weight (BW) and breast muscle weight (BMW) in broiler chickens, the incidence of leg diseases has gradually increased, potentially leading to severe mortality, decreased productivity, and restricted growth. To address this issue, the YOLOv8 algorithm has been introduced, which demonstrates significant improvements in both speed and accuracy compared to other algorithms. However, due to the high degree of integration of the YOLOv8 algorithm, training and inference need to be conducted through command-line interfaces, posing significant challenges for developers. To address this challenge, this study designs and develops a graphical user interface (GUI) for YOLOv8, aiming to help developers complete YOLOv8 inference tasks more conveniently and thereby improve development efficiency.
Large Language Modeling and Information Processing Technology
Open Information Extraction from Semi-Structured Web Pages plays a pivotal role in constructing knowledge graphs and is a hot topic in the field of information extraction. Unlike traditional closed information extraction from semi-structured web pages, open information extraction can extract triples that do not adhere to predefined ontological relationships. Typically, open information extraction from semi-structured web pages involves first extracting relationships, followed by the corresponding entities, a pipeline approach that often leads to error propagation. To address this, our study introduces an end-to-end method for open information extraction from semi-structured web pages. This method models the extraction task as a cascading labeling task and employs a joint decoder to simultaneously extract both relationships and corresponding entities. The model was trained on websites from three domains: movies, universities, and NBA players, using an extended SWDE dataset. Experiments in zero-shot and few-shot extraction demonstrate that our method outperforms baseline models in open information extraction tasks across all three domains, as evidenced by higher F1 scores. Analysis of the results indicates that the proposed method not only maintains high performance but also exhibits robust generalizability.
To meet the needs of human resource information management, this paper analyzes data from the human resource platform of enterprise XX and builds a person-post matching model using big data algorithm and model technology. First, the unstructured data are standardized; multi-attribute cross-features of personnel are then computed for dimensionality reduction, historical data are analyzed to construct combined features, and latent relationships among features are explored with big data analysis technology. An evaluation index system for person-post matching is constructed, the indexes at each level are revised, and analyses such as correlation analysis and principal component analysis are carried out to determine the final evaluation indexes of this study. The result is a more scientific and efficient approach to talent evaluation and control, ensuring that human resource evaluation work is targeted and efficient.
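The correlation-analysis step used to screen candidate indexes reduces to computing Pearson coefficients between index series; a stdlib sketch with made-up index values:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length index series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for two candidate person-post matching indexes
skill_fit  = [0.9, 0.7, 0.4, 0.8, 0.5]
perf_score = [0.88, 0.65, 0.42, 0.79, 0.55]
print(round(pearson(skill_fit, perf_score), 3))
```

Indexes that correlate too strongly with one another are redundant candidates for merging or removal, which is how such a coefficient feeds index-system revision.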
Our research is based on the XGBoost algorithm and aims to build a model that provides accurate early warning for traffic accidents. The model combines machine learning algorithms and deep learning frameworks to deliver effective early-warning information by accurately predicting road traffic accidents. XGBoost, an efficient implementation of gradient-boosted decision trees, substantially improves the model's generalization ability and training speed through regularization and parallelized tree construction. Model parameters are tuned using the exhaustive GridSearchCV search method to ensure an optimal configuration. In the experiments, the early-warning model is constructed and optimized by classifying accident severity and fusing multiple types of data, using Yulin traffic accident data from 2015 to 2021. The model performs well in classification accuracy and stability, particularly in recognizing major accident categories, with markedly higher recall and F1 scores. This study shows that applying the XGBoost model to traffic accident early warning yields high predictive performance and practical value, providing important technical support for road safety.
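GridSearchCV's exhaustive search amounts to scoring every combination in the parameter grid and keeping the best. A dependency-free sketch of that loop (the scoring function is a toy stand-in for cross-validated accuracy, and the grid values are illustrative):

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Exhaustively score every parameter combination, GridSearchCV-style."""
    keys = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        s = score_fn(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Toy stand-in score: pretend moderate depth and learning rate work best
score = lambda p: -abs(p["max_depth"] - 6) - abs(p["learning_rate"] - 0.1)
grid = {"max_depth": [3, 6, 9], "learning_rate": [0.01, 0.1, 0.3]}
print(grid_search(grid, score))
```

In the real pipeline `score_fn` would be k-fold cross-validated recall or F1 of an XGBoost classifier, which is exactly what `sklearn.model_selection.GridSearchCV` automates.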
To improve the efficiency of data management, ensure data security, and optimize resource allocation, a multi-dimensional power supply chain data classification method based on trusted similarity is proposed. The credibility of the data type and the credibility of the data score are calculated and combined to obtain the trusted similarity of the data. The random forest algorithm is then combined with trusted similarity for classification: trusted similarity is added to the dataset as an additional feature, helping the algorithm better understand the relationships within the data. Beyond serving as a feature, trusted similarity is also used to adjust the weights of the other features. The linear function is optimized and further tuned through cross-validation to achieve the best classification performance. Experimental results show that the proposed method achieves a low information loss rate, high classification accuracy, and low time consumption in data classification, indicating that it can provide strong support for the optimization and management of the power supply chain.
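Using trusted similarity as an additional feature is a column append before the forest is trained; a minimal sketch (the weighted combination of type credibility and score credibility is a hypothetical placeholder for the paper's formula):

```python
def trusted_similarity(type_cred, score_cred, alpha=0.5):
    """Hypothetical blend of type credibility and score credibility."""
    return alpha * type_cred + (1 - alpha) * score_cred

def augment(rows, type_creds, score_creds):
    """Append the trusted-similarity feature as one extra column per sample."""
    return [row + [trusted_similarity(t, s)]
            for row, t, s in zip(rows, type_creds, score_creds)]

rows = [[1.0, 0.2], [0.4, 0.9]]
print([[round(v, 3) for v in r] for r in augment(rows, [0.8, 0.6], [0.4, 1.0])])
```

The augmented rows can be passed unchanged to any forest implementation; the new column participates in split selection like any other feature.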
Text classification has always been an important task in natural language processing (NLP). In recent years, by mapping text data to graph structures, researchers have been able to perform more complex analyses and improve classification accuracy. However, existing methods have two major limitations. First, they rely on corpus-level graph structures with fixed-weight edges, which limits the expressiveness of the edges. Second, these methods usually create edges only from word co-occurrence relationships, failing to capture deep semantic information between words and largely ignoring context. To overcome these limitations, we propose a text classification model based on a subgraph fusion graph convolutional network (TF-GCN). The model divides the training set into k subgraphs by class and constructs, for each subgraph, a word co-occurrence graph and a semantic relation graph that captures context information. By fusing these subgraphs and applying a learned weight adaptation method, a more fine-grained subgraph feature representation is obtained. Finally, the subgraphs are combined into one comprehensive graph for representation. Experiments on four benchmark datasets show that the proposed text classification method has advantages over existing techniques.
To meet the diverse needs of user-side flexible resources and ensure interconnection, a flexible resource scheduling method based on differential expansion and differential translation is proposed. The reliability of information sources is analyzed to obtain the confidence interval of resource scheduling; cluster head nodes are selected to ensure the optimal position of the other nodes in each cluster, and the flexible resource characteristics of the user side are extracted. A resource allocation mode is designed, and a user-side flexible resource allocation index is constructed according to that mode. The user-side flexible resource scheduling steps are then analyzed under differential expansion and differential translation respectively. Experimental results show that the throughput of this method remains above 0.9 Mbps, with a peak resource utilization rate of 99.65%, achieving load optimization and better scheduling.
The number of surface defects in steel has consistently been a pivotal criterion for evaluation in the context of steel production. Conventional detection techniques, such as the strobe method, are deficient in several respects, including a slow response time and a lack of precision. Nevertheless, object detection technology based on deep learning can effectively address these issues due to its robust real-time performance and high accuracy. The proposed method capitalizes on the rapid and efficient characteristics of the YOLO model, integrating a multi-scale feature extraction module to fuse feature maps of varying scales, thereby enhancing the detection capabilities for a spectrum of steel surface defects. Specifically, the convolutional neural network layer in the YOLO model is employed to extract multi-scale features in the image in a stepwise manner. These features are then integrated together through a feature fusion strategy, thereby facilitating the accurate identification and location of steel surface defects. Experimental analysis demonstrates that the proposed method is markedly superior to the traditional detection method in terms of detection speed and accuracy, and can effectively enhance the performance of steel surface defect identification.
Time series data usually exhibit complex dynamic characteristics, and a single prediction model struggles to capture all of the patterns they contain. Since different models capture different features, we propose a novel model called MSMA, which uses multiple submodels to capture the complex dynamics and employs an attention mechanism to fit the combination weights. Experimental results indicate that the proposed MSMA surpasses state-of-the-art models in time series prediction performance.
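The attention step — converting learned scores into submodel weights and mixing the predictions — reduces to a softmax-weighted sum; a stdlib sketch with invented scores:

```python
import math

def attention_combine(predictions, scores):
    """Softmax the attention scores, then mix the submodel predictions."""
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return sum(w * p for w, p in zip(weights, predictions)), weights

# Three hypothetical submodel forecasts for the next time step
pred, w = attention_combine([10.0, 12.0, 11.0], [2.0, 0.5, 1.0])
print(round(pred, 3), [round(x, 3) for x in w])
```

In the full model the scores would themselves be produced by a learned network conditioned on the input window, so the mixture adapts per sample rather than being fixed.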
Detecting helmet-wearing personnel in power operation scenarios is difficult: safety helmets are small targets with features such as reflection and shadow, so few features are available and detection is susceptible to environmental interference. These factors make it hard for detection models to accurately locate personnel and identify whether helmets are worn correctly. To tackle these challenges, this paper introduces an enhanced helmet detection algorithm built on YOLOv8n. First, a Global Attention Mechanism (GAM) is introduced into the YOLOv8n detection network, improving deep network performance, minimizing information loss, and enhancing global feature interaction; this helps capture crucial features of targets, especially small ones such as helmet straps. Second, a P2 small-object detection layer enables YOLOv8 to detect small targets more effectively, giving the model stronger feature extraction and adaptive generalization for small targets with various deformations. Through the dedicated P2 layer, YOLOv8 can detect and locate small targets more sensitively, improving small-object detection accuracy. Experimental results demonstrate that the improved YOLOv8n achieves an mAP@0.5 of 87.5%, a 6.8% improvement over the original YOLOv8n, effectively boosting the accuracy of helmet detection.
Most existing deep-learning-based medical image fusion methods achieve satisfactory results by using complex network architectures and stacking numerous modules. However, these methods often overlook the deployment scenarios of multimodal medical image fusion: the complex model structures and large parameter loads make deployment on mobile devices extremely challenging, and it is unreasonable to consume substantial computational resources at the low-level image processing stage if the method is to feed downstream computational tasks. We have designed a lightweight multi-branch feature fusion network for multimodal medical image fusion with a low parameter count and extremely fast forward inference. This efficiency stems from our multi-branch feature channel segmentation method, which applies divide-and-conquer feature extraction with different receptive fields. We also use a contrast-aware channel attention mechanism to fuse and reduce the dimensionality of the feature maps, preserving ample source image information while reducing computational load. Finally, fused image reconstruction is completed through a sliding-window attention mechanism combined with long-range feature dependencies.
Multi-focus image fusion synthesizes multiple images with different focal points into a single image with higher clarity and a wider depth of field. However, existing methods face challenges in complex scenes, such as information loss, insufficient extraction of image details, and inaccurate fusion. This paper therefore proposes a multi-focus image fusion method based on hybrid feature extraction and interactive fusion. Hybrid extraction blocks enhance the network's ability to capture image details, effectively extracting feature information. Interactive fusion blocks then strengthen the interaction between features from different sources, making feature fusion more natural and effective. Finally, a feature reconstruction decoder built from convolutional layers and inverted bottleneck structures avoids information loss and maintains the richness and integrity of the features. Extensive experiments indicate that, compared with 10 other state-of-the-art methods, the proposed approach exhibits superior performance in multi-focus image fusion tasks.
The ocean environment is a time-varying system, and the instability of single-frequency signals in the channel limits their application. A linear frequency modulation (LFM) signal, by contrast, has a certain bandwidth: even if the loss at one frequency point is too large, the other frequency points still return echo energy, overcoming the instability limitation of a single frequency. However, Doppler effects in the ocean can cause the correlation peak to drop and widen, whereas the hyperbolic frequency modulation (HFM) signal has Doppler invariance. This paper therefore uses the HFM signal for target echo simulation and construction with a multi-highlight structure, and uses up-sampling and down-sampling to achieve accurate time delays. Cases with no Doppler, with Doppler of different magnitudes, and with targets at different depths are simulated, and the resulting multi-highlight target echoes are presented. The method is useful in engineering applications.
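The HFM waveform can be synthesized directly from its logarithmic phase law; a stdlib sketch sweeping f0 to f1 over duration T (the sampling parameters are illustrative, not taken from the paper):

```python
import math

def hfm(f0, f1, T, fs):
    """Hyperbolic FM: instantaneous frequency f(t) = f0 / (1 - k*f0*t),
    with k = (f1 - f0)/(f0*f1*T) so that f(T) = f1; the phase is the
    integral of 2*pi*f(t), which is logarithmic in t."""
    k = (f1 - f0) / (f0 * f1 * T)
    n = int(T * fs)
    return [math.cos(-2 * math.pi / k * math.log(1 - k * f0 * i / fs))
            for i in range(n)]

sig = hfm(f0=1000.0, f1=2000.0, T=0.01, fs=48000)
print(len(sig), round(sig[0], 3))
```

Because a Doppler scale factor applied to t only shifts this logarithmic phase by a delay, the matched-filter peak shape is preserved — the Doppler invariance the abstract relies on.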
Intelligent and connected vehicles (ICVs) collect vast amounts of video footage, inadvertently capturing privacy-sensitive data. To address this, we introduce the IDD-ICV dataset, featuring diverse in-vehicle camera footage. To protect privacy, we propose a StyleGAN3-based framework that integrates multi-head attention (MHA) for detecting and desensitizing facial data in datasets. This approach ensures compliance with data security standards and enhances ICVs' privacy protection capabilities across the data life cycle. Our study contributes to the development of secure and trustworthy autonomous transportation systems by demonstrating the effectiveness of the framework in safeguarding privacy-sensitive information.
Visual object tracking is a critical and complex task in computer vision. However, most Transformer-based tracking models do not consider the characteristics of tracking images and are often affected by various noises in video sequences, leading to bounding-box jitter and inaccurate predictions. To mitigate these challenges, this work presents a robust Transformer tracking technique that combines non-local means denoising with a balanced window penalty function. Non-local means denoising is introduced as a preprocessing step in the feature extraction stage, removing noise while preserving image details as much as possible and thus providing cleaner, more reliable inputs for subsequent tracking. Additionally, the balanced window penalty function enhances the accuracy of target position estimation by adjusting the weight distribution between the inside and outside of the window. Experiments on the OTB100, TNL2K, UAV123, GOT-10k, and LaSOT_ext datasets validate that this method significantly improves tracking accuracy and success rates in various complex scenarios. Especially in challenging scenes involving fast motion, occlusion, and cluttered backgrounds, the method exhibits strong robustness.
YOLOv8 is one of the most commonly used object detection algorithms. However, its network model has a large number of parameters, resulting in slow performance on embedded devices. A key challenge for industrial applications is reducing the parameter size of the YOLOv8 model without significantly compromising its detection accuracy, so that it can run efficiently on embedded devices. To address this, a structured pruning strategy based on Torch-Pruning was designed for medium-sized YOLOv8 models such as YOLOv8m. In this study, the model was trained on the COCO dataset, with a computational workload of 39.6G and a parameter count of 25.9M. Thirteen pruning iterations with different pruning rates were conducted to systematically reduce the parameter count and identify the optimal pruned model. Comparison with the unpruned model showed promising results: the computational workload decreased from 39.6G to 33.7G, a reduction of 14.9%; the parameter count decreased from 25.9M to 22.0M, a reduction of 15%; the average precision improved from 0.6 before pruning to 0.7 after fine-tuning the pruned model; and the inference time per image decreased from 10.2 ms to 9.5 ms.
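At its core, structured channel pruning ranks whole channels by an importance score such as the L1 norm of their weights and drops the lowest-ranked fraction. A toy sketch of one such step (Torch-Pruning additionally tracks cross-layer dependencies so downstream layers shrink consistently, which this omits):

```python
def prune_channels(channels, rate):
    """Keep the channels with the largest L1 weight norms; drop `rate` of them."""
    norms = [(sum(abs(w) for w in ch), i) for i, ch in enumerate(channels)]
    keep = sorted(norms, reverse=True)[: max(1, round(len(channels) * (1 - rate)))]
    kept_ids = sorted(i for _, i in keep)
    return [channels[i] for i in kept_ids], kept_ids

# Four hypothetical 3-weight channels; prune 50%
chans = [[0.1, -0.1, 0.0], [1.0, 2.0, -1.0], [0.5, 0.5, 0.5], [0.01, 0.0, 0.02]]
print(prune_channels(chans, 0.5)[1])  # indices of the surviving channels
```

Iterating such a step at several rates, then fine-tuning after each, mirrors the thirteen-iteration search described above.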
Traffic forecasting is challenging due to the complexity and variability of traffic systems. Multigraph spatiotemporal graph neural networks improve accuracy by capturing spatiotemporal correlations but struggle with nodes having few neighbors. We propose a pretrainable local augmentation module (LAM) to address this, capturing long-term historical data and using adjacency relationships to enrich node features. We also introduce a clustered feature correlation graph to uncover hidden correlations. Experiments on the METR-LA dataset show significant error reductions.
This study proposes a multi-view projection synthesis model, named SynNet3D, to improve the quality of sparse-view low-dose cone-beam computed tomography (CBCT) images. Based on an attention 3D Res-UNet architecture, the SynNet3D model generates the projections of missing views in a single stage. Full-view projections were restored from 1/4 sparse-view projections using SynNet3D and subsequently reconstructed with the Feldkamp-Davis-Kress (FDK) technique. Quantitative evaluation used pixel-level metrics, namely root-mean-square error (RMSE) and peak signal-to-noise ratio (PSNR), together with two metrics that assess perceptual quality from the perspective of the human visual system, structural similarity (SSIM) and feature similarity (FSIM). On the test set, SynNet3D achieved average RMSE, PSNR, SSIM, and FSIM values of 0.0416, 43.4 dB, 0.975, and 0.993 in the image domain, respectively, outperforming the existing single-view projection synthesis model, SynCNN, across all metrics. In conclusion, the SynNet3D model can effectively enhance the quality of sparse-view CBCT images.
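For images normalized to [0, 1], RMSE and PSNR follow directly from their standard definitions; a stdlib sketch on a toy signal pair:

```python
import math

def rmse(a, b):
    """Root-mean-square error between two equal-length pixel sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def psnr(a, b, peak=1.0):
    """PSNR in dB for signals with dynamic range `peak`."""
    e = rmse(a, b)
    return float("inf") if e == 0 else 20 * math.log10(peak / e)

ref  = [0.0, 0.5, 1.0, 0.25]
test = [0.1, 0.5, 0.9, 0.25]
print(round(rmse(ref, test), 4), round(psnr(ref, test), 2))
```

SSIM and FSIM are more involved (local statistics and phase congruency respectively), which is why they are reported separately as perceptual metrics.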
Accurate and reliable ocean current forecasting is important for the study of ocean activity. To handle the nonstationary and nonlinear characteristics of ocean currents, a combined forecasting model based on improved variational mode decomposition and least squares support vector machine is proposed. Because the decomposition level and penalty factor of variational mode decomposition (VMD) must normally be set manually, a VMD variant based on the love evolution algorithm (LEA), named LVMD, is proposed. Because the selection of the penalty coefficient and kernel parameter in the least squares support vector machine (LSSVM) is highly random, an LEA-based LSSVM, named LEA-LSSVM, is proposed. First, LVMD decomposes the ocean current data into intrinsic mode functions (IMFs). Then, LEA-LSSVM forecasts each IMF. Finally, the IMF forecasts are superimposed to obtain the final forecast. Ocean current data from January to February 2022 from the National Ocean Science Data Center are used as experimental data, and seven comparison models are evaluated against the proposed model. The results show RMSE, MAE, MAPE, and R2 values of 0.6326, 0.4881, 0.0111, and 0.9990, respectively; every index of the proposed model is better than those of the comparison models, demonstrating higher prediction accuracy and providing a new method for ocean current forecasting.
Salinity prediction in salt marsh wetlands contributes to ecological conservation and management, guides agricultural and fishery activity, and promotes the sustainable development of wetland ecosystems. Because wetland salinity is nonlinear and susceptible to seasonal influence, a prediction model based on improved variational mode decomposition and least squares support vector machine is proposed. Since the penalty factor and the number of decomposition layers of variational mode decomposition (VMD) must normally be set manually, a VMD variant based on the red-tailed hawk algorithm (RTH), named RTH-VMD, is proposed. Likewise, to address the selection of the kernel parameters and penalty factor of the least squares support vector machine (LSSVM), an RTH-based LSSVM, named RTH-LSSVM, is proposed. First, RTH-VMD decomposes the salinity series into several intrinsic mode functions (IMFs). Then, RTH-LSSVM predicts each IMF, and an autoregressive integrated moving average (ARIMA) model predicts each IMF's prediction error. Finally, the predictions and error predictions are recombined to obtain the final result. Considering geographical location and data accuracy, 2019 salinity data from natural and restored wetland streams in Cape Cod, Massachusetts are selected for the forecasting experiment. Experimental results show that the proposed model significantly outperforms the other prediction models.
Through visual fixation behaviours, experienced drivers selectively focus on specific areas or objects within the scene, ensuring the safe performance of driving tasks. Modeling drivers' visual fixation behaviours is therefore crucial for the development of autonomous driving systems (ADS). Research indicates that drivers' visual fixation is determined by both top-down and bottom-up mechanisms, and this paper proposes a driver visual attention model that incorporates both. We consider expectancy, effort, and value as top-down factors and salience as a bottom-up factor. The DFF model is developed to integrate the top-down factors, while existing models represent the bottom-up factor. A fusion strategy then integrates all features to generate a driver visual attention map. The proposed model was trained on the DR(eye)VE dataset and compared with state-of-the-art (SOTA) models. The results demonstrate enhanced performance on the Pearson correlation coefficient (CC) and similarity (SIM) metrics, with improvements of 22.4% and 26.7%, respectively. Furthermore, the trained model was tested on the BDDA dataset to assess its cross-dataset generalisation capability.
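The similarity metric commonly used for attention maps (usually abbreviated SIM in saliency evaluation) is the sum of elementwise minima after each map is normalized to a distribution; a stdlib sketch with invented maps:

```python
def sim(p, q):
    """Similarity between two attention maps: sum of elementwise minima
    after normalizing each map to sum to one. 1.0 means identical maps."""
    sp, sq = sum(p), sum(q)
    return sum(min(a / sp, b / sq) for a, b in zip(p, q))

pred_map = [0.2, 0.3, 0.5]   # hypothetical predicted attention map (flattened)
gt_map   = [0.25, 0.25, 0.5]  # hypothetical ground-truth fixation map
print(round(sim(pred_map, gt_map), 3))
```

Unlike the Pearson CC, SIM measures histogram intersection, so it penalizes mass placed where the ground truth has none rather than mere linear decorrelation.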
As the global population ages, neurodegenerative diseases such as Alzheimer's disease pose a serious threat to the health and quality of life of the elderly [1]. Studies have shown that shrinkage of hippocampal volume is closely associated with the emergence of diseases such as Alzheimer's disease, mild cognitive impairment, and temporal lobe epilepsy, so accurate segmentation of the hippocampus has become a critical step in diagnosing and studying these diseases. Addressing the segmentation difficulties caused by the hippocampus's irregular shape, small volume, and fuzzy edges, this paper proposes a deep-learning-based hippocampus segmentation method. The method combines sequence learning with U-networks and proposes a multiple attention serial mechanism (MAST) module, which incorporates dependency information between image sequences into a 3D semantic segmentation network through sequence learning, fully exploiting the 3D contextual information of the images. In addition, to address the sample balancing problem, a multi-layer decoupling mechanism (MLDM) is incorporated at the skip-connection stage to improve the segmentation effect. Experiments on the Task04_Hippocampus dataset verify the performance and stability of the method; comparison with standard networks shows that introducing the sequence learning structure significantly improves segmentation. Overall, the proposed hippocampus segmentation method not only improves segmentation accuracy but also provides strong support for the clinical diagnosis of neurodegenerative diseases. Future work will further optimize the algorithm to improve segmentation efficiency and explore its potential application in the diagnosis of more diseases.
Relying solely on textual data for sentiment analysis often fails to capture users' emotional information fully, limiting accurate understanding of an audience's true feelings and overall evaluations. The different modalities in multimodal data contain richer emotional cues, aiding a more precise understanding of audience reactions across multiple sensory dimensions. To better exploit multimodal data, this paper proposes a sentiment analysis approach integrating feature interaction and gate mechanisms. First, textual and acoustic features interact through a neural network model to capture their dependencies. Then, acknowledging the varying importance of text and audio in multimodal fusion, cross-modal interaction and gate mechanisms dynamically adjust the weights during fusion. Finally, integrating the interaction features, cross-modal attention mechanisms, and gate mechanisms strengthens sentiment analysis capability. The model is tested on the CMU-MOSI dataset and outperforms most state-of-the-art methods, significantly improving sentiment analysis results.
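As a rough illustration of the gate mechanism described above, the sketch below blends a text and an audio feature vector with a scalar sigmoid gate; the feature values and weights are invented for demonstration and are not the paper's trained parameters:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(text_feat, audio_feat, w_text, w_audio, bias):
    # Scalar gate computed from both modalities: g lies in (0, 1)
    score = sum(wt * t for wt, t in zip(w_text, text_feat)) \
          + sum(wa * a for wa, a in zip(w_audio, audio_feat)) + bias
    g = sigmoid(score)
    # Convex combination: g weights the text modality, (1 - g) the audio
    return [g * t + (1.0 - g) * a for t, a in zip(text_feat, audio_feat)]

# Toy 2-dimensional features for each modality
fused = gated_fusion([0.2, 0.8], [0.5, 0.1], [1.0, 1.0], [1.0, 1.0], 0.0)
```

Each fused coordinate stays between the corresponding text and audio values, since the gate forms a convex combination.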
Automatic speaker verification (ASV) has become a widely used application of deep learning. However, many early positive findings were rooted in traditional methods like Gaussian mixture models (GMM) rather than deep learning. While the multi-taper spectrum estimator has proven effective in enhancing GMM-based ASV accuracy, the integration of traditional multi-tapers with modern deep learning models may not be seamless. To address this, we introduce the Channel Attention Concatenation Multi-taper Fbank (CCM-Fbank), which seamlessly integrates multi-taper spectral estimation with the popular ECAPA-TDNN model, resulting in improved accuracy and robustness. Additionally, we propose a deeper model named Double Block ECAPA-TDNN, which has just over half the number of parameters of ECAPA-TDNN (C=1024), and performs better with limited training samples.
Small target detection is a crucial and difficult task in computer vision, facing challenges such as low resolution, the small pixel share of objects, and the complexity of feature extraction. To alleviate these problems, this paper proposes a small target detection network called CM-YOLO. It uses the one-stage detector YOLOv8s as its basic framework and makes a series of improvements: designing a contextual feature enhancement module (CFEM) that incorporates an efficient multiscale attention mechanism to strengthen feature extraction for small targets, and proposing a multi-level feature fusion structure (MFS) that improves multi-scale feature fusion through bi-directional multiscale feature propagation and a weighted fusion mechanism. To demonstrate the effectiveness of CM-YOLO, this paper conducts extensive experiments on the VisDrone 2019 and PASCAL VOC datasets; the results show that CM-YOLO delivers a clear performance improvement in small target detection relative to most existing algorithms.
In practical production processes, wearing a safety helmet is a crucial guarantee of safe production, and helmet detection can effectively reduce the probability of safety accidents. However, existing algorithms often suffer from large parameter counts, high computational complexity, and poor real-time performance. To address these issues, we propose a lightweight helmet-wearing detection algorithm based on the YOLO v8 framework, named LCA-YOLOv8. We adopt a lightweight CCFF module to reduce the parameters of the neck layer and use the ACMix module to increase network depth and enhance the network's ability to capture both shallow and deep semantic information, thereby improving feature representation. Experimental results on the public SHWD dataset show that, compared with the YOLO v8 baseline, our algorithm reduces the parameter count by 30.4% and the floating-point operations by 13.6% while achieving a precision of 92.7%, effectively balancing high performance with real-time requirements to meet the needs of actual safe production.
In the field of game theory, rationality has traditionally been equated with the maximization of expected payoffs. However, in real-world scenarios, players often have different degrees of risk aversion, requiring a more suitable approach. This paper proposes the novel concept of a leafset, defined by analogy with the information set. Furthermore, we introduce Risk-Averse Equilibrium (RAE), an equilibrium concept that accounts for players' risk preferences by considering both expected utility and variance. We prove the existence of RAE and modify the utility function accordingly. Building on the Counterfactual Regret Minimization (CFR) algorithm, we introduce a variant called the CFRRA (Risk-Averse) algorithm, which extends traditional CFR to handle risk in extensive-form games (EFGs) via the risk utility function. We show that CFRRA converges to approximate RAEs through experiments on different EFGs such as Kuhn Poker and Leduc Poker, highlighting its potential for managing risk in multi-agent systems.
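The mean-variance trade-off behind a risk-averse utility can be illustrated with a toy calculation; the payoff lists and the risk-aversion coefficient below are invented for demonstration and are not taken from the paper:

```python
from statistics import mean, pvariance

def risk_averse_utility(payoffs, risk_aversion):
    # Mean-variance utility: expected payoff penalized by payoff variance
    return mean(payoffs) - risk_aversion * pvariance(payoffs)

safe  = [1.0, 1.0, 1.0, 1.0]    # certain payoff of 1
risky = [4.0, 0.0, 4.0, -4.0]   # same mean of 1, but high variance

u_safe  = risk_averse_utility(safe, 0.5)
u_risky = risk_averse_utility(risky, 0.5)
```

Both lotteries have the same expected payoff, yet the risk-averse utility prefers the certain one, which is exactly the behavior an expectation-maximizing player would not exhibit.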
Utility boilers are important equipment in the thermal power generation industry, and ensuring their safe and stable operation is of great significance to social and economic development. A utility boiler system has complex components and numerous management processes, and massive multimodal data are generated during its operation and inspection. However, traditional data analysis methods are not applicable to unstructured information, so knowledge graph technology is needed to fully mine and utilize its data value. This paper studies named entity recognition in the utility boiler domain, the core technology for establishing a domain knowledge graph. Since utility boilers form a highly technical professional domain, general datasets cannot meet the requirements of model training, so a dedicated dataset and entity labeling system for the utility boiler domain are established. At the same time, an improved CRF-BiLSTM-BERT model with a three-layer structure is proposed to efficiently implement named entity recognition in this domain. Experimental verification shows that the accuracy of entity recognition can exceed 90%. In summary, the named entity extraction model and dataset established in this paper, which are highly adapted to utility boilers, can meet the data requirements for the subsequent construction of a professional knowledge graph in the utility boiler domain.
To address the problem of insufficient near-shore synthetic aperture radar (SAR) samples in ship detection, this study designs a ship SAR image data augmentation model based on a generative adversarial network. Specifically, the study combines image fusion and data augmentation to design an Image Fusion Concurrent-Single-Image-GAN model (IF-SinConGAN). The model first fuses offshore ship images with nearshore scenes, employing a dual-threshold sea-land segmentation method to seamlessly integrate offshore ships into nearshore water regions. The fused images are then used as input for training the ConSinGAN model. Compared to the original model, IF-SinConGAN significantly improves both the diversity and quality of generated SAR images.
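A dual-threshold segmentation can be sketched in a few lines; the intensity grid and the two thresholds below are illustrative, not the paper's calibrated values, and a real SAR pipeline would add speckle filtering and region cleanup:

```python
def dual_threshold_segment(image, t_low, t_high):
    """Label each pixel: 0 = sea (dark), 1 = land (bright), -1 = ambiguous."""
    labels = []
    for row in image:
        out = []
        for v in row:
            if v <= t_low:
                out.append(0)    # confidently sea
            elif v >= t_high:
                out.append(1)    # confidently land
            else:
                out.append(-1)   # left for neighborhood-based refinement
        labels.append(out)
    return labels

# Toy intensity grid: a dark water column, a bright land column, one ambiguous pixel
img = [[10, 20, 200],
       [15, 120, 210],
       [12, 18, 220]]
mask = dual_threshold_segment(img, 50, 150)
```

The two thresholds split pixels into confident sea, confident land, and an ambiguous band that a refinement step can resolve using local context.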
Neural language representation models such as GPT, pre-trained on large-scale corpora, can effectively capture rich semantic patterns from plain text and be fine-tuned to consistently improve natural language generation performance. However, existing pre-trained language models used to generate lyrics rarely consider rhyme information, which is crucial in lyrics, and using a pre-trained model directly results in poor performance. To enhance the rhyming quality of generated lyrics, we integrate rhyme information into our model, thereby improving lyric generation performance.
With the explosive growth of streaming video data, describing and understanding these videos has become an interesting topic within the international academic community. However, existing methods ignore important information shared among image, audio, and text, resulting in insufficient understanding of the video. In this paper, we propose a novel video understanding algorithm that incorporates this neglected information. First, the method combines speech recognition with a Large Language Model (LLM) to obtain detailed textual descriptions of the video. Second, the images and textual descriptions are combined to obtain video keyframes. Finally, the textual descriptions and keyframes are concatenated to produce the final video understanding results. Extensive experiments demonstrate the superiority of the proposed method.
Currently, datasets of anime avatars are scarce, a problem compounded by copyright concerns. To address this issue, this paper proposes a method for generating anime avatars. First, a Wasserstein generative adversarial network (WGAN) is employed to construct the model, with RMSprop chosen as the optimizer. Next, publicly available anime avatar samples are randomly collected for model training, and the model hyperparameters are determined through grid search and practical experience. Finally, in the experimental simulation phase, after 300 epochs of training the errors of the generator and discriminator decrease and stabilize, and the generated image samples closely resemble real images. Experimental results demonstrate that this method can generate realistic anime avatars with excellent performance.
Against the background of the data age, the digital transformation and upgrading of higher education has accelerated, leading to the rapid accumulation of educational big data and making the limitations of traditional student performance management systems in data processing and mining analysis increasingly prominent. Based on the practical needs of colleges and universities, this paper leverages the efficiency, accuracy, and adaptivity of artificial intelligence technology and proposes a student performance prediction model based on a BP neural network to remedy the functional shortcomings of traditional performance management systems. Practice shows that, relying on the powerful nonlinear mapping ability of the BP neural network, the model can process the historical performance data of many students, subjects, and semesters simultaneously, and can also integrate attendance rate, completion rate, participation in extracurricular activities, and other data as auxiliary features. This improves the accuracy and scientific rigor of performance prediction, provides support for personalized teaching, optimized allocation of educational resources, and stronger higher education management, and helps promote the high-quality development of higher education.
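A minimal forward pass of such a BP-style network might look like the following sketch; the feature semantics (say, a past grade, attendance rate, and completion rate, all scaled to [0, 1]), the layer sizes, and the random weights are illustrative only, not the paper's trained model:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(features, w_hidden, w_out):
    """One hidden layer mapping student features to a predicted score in (0, 1)."""
    # Hidden activations: one sigmoid unit per weight row
    hidden = [sigmoid(sum(w * x for w, x in zip(row, features)))
              for row in w_hidden]
    # Output unit: sigmoid over the hidden layer
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

random.seed(0)  # deterministic toy weights
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
w_out = [random.uniform(-1, 1) for _ in range(4)]
pred = forward([0.85, 0.9, 0.7], w_hidden, w_out)
```

Training by backpropagation would then adjust `w_hidden` and `w_out` to reduce the error between `pred` and the observed score.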
Mineral medicines are pivotal in traditional Chinese medicine, and their identification technology is a focal point of research in this field. Over the years, various identification methods for mineral medicines have emerged, and integrating computer vision technology with deep learning theory makes intelligent identification of mineral medicines feasible. This study employs a convolutional neural network (CNN) algorithm to construct an image recognition model for halloysitum rubrum (Chinese medicine name: Chishizhi). Utilizing the TensorFlow deep learning framework, we preprocess and augment a limited dataset of mineral images, substantially expanding the data scale and addressing sample imbalances. Accuracy and loss are evaluated as performance metrics for image recognition. The results demonstrate a high recognition accuracy of 92% with a stable loss between 0.1 and 0.2. Furthermore, the optimized model exhibits faster convergence and shorter training time, markedly enhancing the efficiency of network training and achieving intelligent recognition of halloysitum rubrum images.
The application of neural network algorithms to multilingual translation is currently an important research field. Traditional sequential neural frameworks, such as RNNs and their LSTM and GRU variants, slow training and inference and lower the accuracy of output translations when processing long sequences, because their inherently sequential processing mechanism precludes parallelization. To address these shortcomings, this report establishes a Transformer model comprising an attention-based encoder and decoder. It combines self-attention with neural networks and systematically implements multilingual scientific translation on the basis of PyTorch to improve translation quality. Experimental results indicate that the BLEU score of the attention-based Transformer model on scientific translation improves to varying degrees over traditional sequential neural algorithms. This demonstrates that the attention-based Transformer significantly outperforms traditional models, and improvement suggestions are put forward on the basis of this validated model.
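The core of the attention mechanism the report relies on is scaled dot-product attention: scores are dot products of a query with each key, scaled by the square root of the dimension, softmax-normalized, and used to weight the values. A pure-Python sketch for a single query, with made-up vectors, is:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Output is the attention-weighted average of the value vectors
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# The query aligns with the first key, so the first value dominates the output
out = attention([1.0, 0.0],
                [[1.0, 0.0], [0.0, 1.0]],
                [[10.0, 0.0], [0.0, 10.0]])
```

In a full Transformer this runs in parallel over all positions and heads, which is what removes the sequential bottleneck of RNN-family models.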
When performing image and document information extraction tasks, OCR technology is the primary tool used to extract text and layout information from documents; deep learning models such as LSTM, Transformer, or BERT are then used to accomplish specific information extraction tasks. However, when faced with complex document layouts, existing open-source OCR tools fall short in recognizing and labeling complete sentences, often segmenting continuous sentences into multiple parts. This mis-segmentation disrupts the order of the text input to the model during information extraction, indirectly reducing extraction efficiency. To address these issues, this study proposes a BERT-based language model, BertNPP, which leverages BERT's strong text comprehension to efficiently aggregate text lines by mining and exploiting text features through a specially designed pre-training task. Finally, experiments on real datasets validate the effectiveness of the model.
Inspired by the atmospheric scattering model, we derive a novel transform model for image dehazing. Based on the proposed model, we design a lightweight, easy-to-train CNN for end-to-end image dehazing. We trained our model on the RESIDE dataset and tested it on the O-HAZY, I-HAZY, and NH-HAZE datasets. The results suggest that, compared with AOD-Net and Dehaze-Net, our method performs better at image dehazing. Through the experiments, we also analyze the mechanism of blue shifting and explain how our model helps solve the blue shifting problem. As a lightweight model, it can easily be combined with detection networks such as YOLO or F-RCNN to handle complex tasks in future research.
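The standard atmospheric scattering model this line of work starts from writes a hazy pixel as I = J·t + A·(1 − t), for scene radiance J, transmission t, and airlight A; dehazing inverts the model given estimates of t and A. A scalar sketch (values illustrative, and simpler than the paper's transformed model):

```python
def haze(J, t, A):
    # Atmospheric scattering model: observed = scene * transmission + airlight
    return J * t + A * (1.0 - t)

def dehaze(I, t, A, t_min=0.1):
    # Invert the model; clamp t to avoid amplifying noise where transmission is tiny
    t = max(t, t_min)
    return (I - A) / t + A

clean, trans, airlight = 0.6, 0.5, 0.9
hazy = haze(clean, trans, airlight)       # simulate a hazy observation
recovered = dehaze(hazy, trans, airlight) # invert it back
```

With the true t and A the inversion is exact; in practice both must be estimated, which is where learning-based methods such as the one above come in.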
Deep learning models hold promise for sentiment analysis (SA), but traditional CNN and LSTM methods are limited in capturing both local and temporal features, which affects accuracy. This paper introduces a ConvLSTM model with word-level attention to address these issues. By combining CNNs and LSTMs with an attention mechanism, the model improves sentiment classification by focusing on the most relevant features. Experiments on the IMDB, SD4A, and Sentiment140 datasets show that the proposed model outperforms baseline methods in precision.
Intelligent Perception and Decision Optimization Technology
Job Shop Scheduling (JSP) and Flow Shop Scheduling (FSP) are typical production scheduling problems. This paper delves into a JSP variant, Re-entrant Job Shop Scheduling (RJSP) with infinite buffers, which introduces re-entrance: certain jobs revisit machines at multiple steps. We devise a re-entrant job shop scheduling model with infinite buffers that aims to minimize the maximum completion time. Additionally, we propose an improved genetic algorithm for solving RJSP, employing diverse initial-solution generation methods and efficient crossover and mutation operators.
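One common crossover operator for permutation-encoded scheduling problems like this is order crossover (OX); the abstract does not state which operator the paper uses, so the sketch below is only indicative of the family of operators involved:

```python
def order_crossover(p1, p2, lo, hi):
    """Order crossover (OX): copy p1[lo:hi] verbatim, then fill the
    remaining slots with the other genes in the order they appear in p2,
    so the child is always a valid permutation (a feasible job order)."""
    child = [None] * len(p1)
    child[lo:hi] = p1[lo:hi]
    kept = set(p1[lo:hi])
    fill = iter(g for g in p2 if g not in kept)
    for i in range(len(child)):
        if child[i] is None:
            child[i] = next(fill)
    return child

# Two parent job orders over jobs 0..4; keep positions 1..2 from the first
child = order_crossover([0, 1, 2, 3, 4], [4, 3, 2, 1, 0], 1, 3)
```

Mutation for such encodings is typically a swap or insertion of two job positions, again preserving permutation validity.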
To improve the mining accuracy of electricity demand attribute datasets and ensure their applicability in power system management and scheduling, an association rule mining algorithm for electricity demand attribute datasets based on an improved bee colony algorithm is proposed. Strong association rules are established for the attribute datasets; the K-means clustering algorithm is used to discretize continuous-valued data, and the MapReduce model is used to mine the power demand attribute dataset in parallel. The bee colony algorithm is used to mine frequent itemsets, a gravity algorithm is innovatively applied to improve the artificial bee colony algorithm, and the mined frequent itemsets are optimized. Test results show that the speedup ratio and non-emptiness rate of data mining with the proposed algorithm reach 0.95 and 97.5%, respectively, and the directional mining accuracy reaches 98.5%, demonstrating good practical effectiveness.
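One-dimensional K-means discretization of a continuous attribute, as used in the preprocessing step above, can be sketched as follows; the load values and initial centers are invented for illustration:

```python
def kmeans_1d(values, centers, iters=20):
    """Toy 1-D K-means: cluster continuous values so each value can be
    replaced by its cluster index (a discrete attribute level)."""
    for _ in range(iters):
        # Assign each value to its nearest center
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Recompute centers (keep the old center if a cluster empties)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    labels = [min(range(len(centers)), key=lambda i: abs(v - centers[i]))
              for v in values]
    return labels, centers

# Toy demand readings falling into a "low" and a "high" band
loads = [0.9, 1.1, 1.0, 5.0, 5.2, 4.8]
labels, centers = kmeans_1d(loads, [0.0, 6.0])
```

After discretization, each reading is represented by its band index, which is what an itemset-mining step can consume.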
With the development of artificial intelligence technology, data-driven approaches to power grid simulation analysis have become a trend. Using machine learning to analyze grid operation modes and prior knowledge, in place of tedious manual work, makes it possible for machines to acquire a basic level of knowledge and experience in power system analysis. Based on the characteristics of power grid simulation analysis, this article proposes a task-engine-based simulation analysis method that mechanizes and simplifies tedious work, providing technical support for intelligent power grid simulation analysis.
To improve the incomplete utilization of gesture features in static gesture recognition, this paper proposes a dual-stream convolutional neural network model that uses RGB/grayscale and black-and-white images to detect and recognize static gestures, adequately learning the images' features and improving recognition accuracy. In this model, K-means clustering is used to remove the background, a Faster-RCNN model processes the image to locate the gesture, and the number of training samples is expanded to improve generalization. Finally, the two kinds of images are fed into the network for training and their features are integrated before classification, making full use of the color-texture features and the edge-shape features to improve recognition accuracy. Verified on Thomas Moselund's and Jochen Triesch's gesture databases, the recognition accuracy reached 97.91% and 97.20%, respectively.
Due to the constraints of reduced radiation doses, low-dose computed tomography (LDCT) images frequently suffer from increased noise levels. To address this challenge, we developed the LCTU-Net, a network that incorporates a Lipschitz continuous transformer to enhance the capability of feature extraction. This new approach replaces traditional Transformer components, improving the efficiency of loss reduction and achieving lower loss levels. The U-Net architecture integrated within LCTU-Net plays a crucial role in effectively reducing noise interference in the images. Experimental results have demonstrated that LCTU-Net significantly outperforms existing denoising technologies, particularly in its ability to preserve intricate image details while effectively reducing noise.
The digital transformation has elevated data centers to critical infrastructure status, essential for the operation of various industries. The stability of power facilities within these centers directly affects their overall service reliability. Traditional manual inspection methods cannot keep up with the growing demands for operation and maintenance due to increasing scale and complexity. This paper introduces an intelligent inspection system for data center power facilities, leveraging an enhanced YOLOv8 algorithm and a Local Attention Mechanism (LAFFE). The proposed system integrates a deeper CSPDarknetX backbone network with a pyramid SPP structure and an optimized detection head featuring keypoint regression and IoU loss. The LAFFE module adaptively enhances salient features and suppresses redundant information, improving fine-grained recognition and real-time performance. Extensive experiments on a large-scale dataset demonstrate the superior accuracy and efficiency of our approach, particularly in detecting small objects amidst complex backgrounds. The system significantly improves inspection efficiency, reduces labor costs, and shows exceptional value in emergency scenarios.
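The IoU term in the detection head's loss is the standard intersection-over-union of two axis-aligned boxes; a minimal sketch with (x1, y1, x2, y2) coordinates (sample boxes invented):

```python
def iou(box_a, box_b):
    """Intersection-over-union for (x1, y1, x2, y2) boxes."""
    # Intersection rectangle (empty if the boxes do not overlap)
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

score = iou((0, 0, 2, 2), (1, 1, 3, 3))  # partially overlapping boxes
```

An IoU-based loss (e.g. 1 − IoU) directly penalizes poor box overlap, which tends to localize small objects better than coordinate-wise regression alone.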
With the continuous development of science and technology, the level of automation and intelligence in agriculture is constantly improving. To improve agricultural production efficiency and reduce labor intensity and resource waste, more and more agricultural robots have emerged. This paper designs an intelligent disease and pest detection robot based on the ROS system. The robot uses Yolov5s as the algorithmic framework for crop disease and pest detection and fuses SLAM with the TEB algorithm for autonomous navigation, alleviating problems such as low detection precision and unstable path planning. Test results show that the robot can monitor farmland environmental parameters in real time, automatically identify pests and diseases, and support autonomous navigation and human-computer interaction. Providing relevant data and recommendations to assist farmers in pest control helps improve agricultural production efficiency and crop quality, reduce pesticide use, and promote sustainable agricultural development.
The traditional PI control strategy is not only complex and cumbersome in the process of parameter tuning, but also shows poor adaptability in the face of dynamic changes of power grid. Therefore, this study proposes a new control framework, which can adjust the control parameters more accurately and enhance the response ability and stability of the system to the change of power grid state. In this work, an inverter power control strategy based on Long Short-Term Memory Network (LSTM) is proposed. This strategy uses the long-term memory ability of LSTM to model and predict the dynamic relationship between the grid and the inverter. By inputting historical power data, the LSTM model is able to learn and identify patterns and trends in power demand, and then optimize the output response of the inverter to adapt to changes in demand and state of the grid. Experimental analysis shows that the LSTM model significantly improves the adaptability and response speed of the inverter to complex power grid conditions, thereby effectively enhancing the stability and efficiency of the system.
Titanium alloys are extensively utilized in the aerospace industry due to their exceptional mechanical properties and resistance to corrosion. The presence of surface defects in titanium alloys can significantly impact their performance. This paper investigates automatic classification and detection methods for surface defects of titanium alloys based on the YOLOv8 deep learning algorithm. An optical detection system was designed to acquire high-resolution images of flaws in titanium alloy samples, which were used to establish a defect-recognition dataset. It was found that the proposed method can effectively achieve real-time detection of surface defects in titanium alloys.
The purpose of multi-focus image fusion is to extract all the features of complementary images to obtain a fused image with complete image information. Fusion methods based on decision maps are widely used in multi-focus image fusion because they retain the information of the original images to the maximum extent. However, decision-map methods often misjudge pixels at the boundary between clear and blurred regions, which affects the accuracy of the fused image. To address this misjudgment in edge regions, we introduce differential convolution to increase the network's sensitivity to edge gradients, strengthening its judgment in edge regions and improving the accuracy of the decision map. Moreover, an efficient feature fusion method is essential for improving network performance. To ensure the quality of the feature fusion results, we introduce a mixture fusion expert to guide the image feature fusion process, whose gating mechanism guarantees the efficiency of feature fusion. Extensive experiments demonstrate that our proposed method achieves superior results compared to most existing state-of-the-art methods.
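For context, a classical, non-learned decision-map baseline (not the paper's network) picks each pixel from whichever source image is locally sharper; the Laplacian-energy focus measure and 3x3 window below are illustrative assumptions:

```python
import numpy as np

def focus_measure(img, k=3):
    """Local energy of the Laplacian as a per-pixel sharpness score."""
    lap = np.zeros_like(img, dtype=float)
    lap[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1] +
                       img[1:-1, :-2] + img[1:-1, 2:] - 4 * img[1:-1, 1:-1])
    # box-filter the squared Laplacian over a k x k window
    e = lap ** 2
    pad = k // 2
    ep = np.pad(e, pad, mode='edge')
    out = np.zeros_like(e)
    for dy in range(k):
        for dx in range(k):
            out += ep[dy:dy + e.shape[0], dx:dx + e.shape[1]]
    return out

def fuse_by_decision_map(img_a, img_b):
    """Binary decision map: take each pixel from the sharper source image."""
    dmap = focus_measure(img_a) >= focus_measure(img_b)
    return np.where(dmap, img_a, img_b), dmap
```

The misjudgments the abstract describes occur exactly where this hard comparison flips near focus boundaries, which is what the differential-convolution refinement targets.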
An experimental comparative analysis of classic edge detection operators was conducted. The 8-direction Sobel operator was then used to improve a genetic simulated annealing algorithm, which was applied to silk-screen inspection of printed circuit boards. Finally, test results were analyzed on white characters printed on circuit boards of different sizes. The results show that the improved genetic simulated annealing algorithm achieves high efficiency and accuracy in detecting white text on printed circuit boards.
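One common convention for the 8-direction Sobel operator takes the two base 3x3 kernels (vertical-edge and 45-degree diagonal) and rotates each through 90-degree steps, reporting the maximum absolute response; the exact kernel set is an assumption, since conventions vary:

```python
import numpy as np

K0 = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=float)   # 0-degree kernel (vertical edges)
K45 = np.array([[ 0,  1, 2],
                [-1,  0, 1],
                [-2, -1, 0]], dtype=float) # 45-degree diagonal kernel

# the remaining directions are 90-degree rotations of the two base kernels
KERNELS = [np.rot90(K0, r) for r in range(4)] + [np.rot90(K45, r) for r in range(4)]

def conv2_valid(img, k):
    """3x3 'valid' correlation without external dependencies."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for y in range(3):
        for x in range(3):
            out += k[y, x] * img[y:y + h - 2, x:x + w - 2]
    return out

def sobel8(img):
    """Edge strength as the maximum absolute response over 8 directions."""
    return np.max([np.abs(conv2_valid(img, k)) for k in KERNELS], axis=0)
```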
As intelligent transportation systems evolve, the automated identification and categorization of commercial vehicles is becoming more critical in areas such as traffic surveillance, logistics management, and security prevention. This paper presents a refined YOLOv7-based approach for the automatic identification of trucks and pickup trucks, designed to tackle the deficiencies of existing detection frameworks in spotting small and similarly-classed targets. The article first dissects the architecture of YOLOv7 and pinpoints the pivotal spots for enhancement. On this basis, the proposed YOLOv7-BCI model boosts efficacy by integrating the Convolutional Block Attention Module (CBAM), the fast spatial pyramid pooling (SPPF) layer, and involution blocks, while replacing CIoU with SIoU to optimize the regression loss function. Empirical results demonstrate that the improved framework is effective on a self-built image dataset of trucks and pickups: mAP@0.5 reaches 81.1%, an increase of 2.7% over the original model, which basically meets the needs of commercial vehicle classification and detection.
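For reference, the CIoU term that SIoU replaces here augments plain IoU with a center-distance penalty and an aspect-ratio consistency term; this is a minimal scalar sketch for axis-aligned boxes, not the paper's loss implementation:

```python
import math

def ciou(box_a, box_b):
    """CIoU between boxes given as (x1, y1, x2, y2):
    CIoU = IoU - rho^2/c^2 - alpha*v, penalizing center distance (rho)
    relative to the enclosing-box diagonal (c) and aspect-ratio mismatch (v)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection and union
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter)
    # squared center distance over squared enclosing-box diagonal
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2
    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (math.atan((ax2 - ax1) / (ay2 - ay1)) -
                              math.atan((bx2 - bx1) / (by2 - by1))) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return iou - rho2 / c2 - alpha * v
```

SIoU additionally folds the angle of the center-to-center vector into the penalty, which is the refinement the paper adopts.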
The real-time detection of water quality is of paramount importance for the maintenance of ecological balance and the promotion of regional development. In this study, data obtained from 10 sampling points in the Yellow River Basin of Ningxia, together with Landsat 8 images, were selected for inclusion in the dataset. The tuna swarm optimization (TSO) algorithm was employed to optimize the least squares support vector machine (LSSVM), resulting in the TSO-LSSVM model. This model was used to invert turbidity (TUB), electrical conductivity (EC), dissolved oxygen (DO), and total nitrogen (TN), and was compared with the plain LSSVM model. The results show that the TSO-LSSVM model has stronger global search ability: the R2 values for TUB, EC, DO, and TN are 0.84878, 0.6756, 0.62336, and 0.57783, respectively, and the inversion accuracy is improved by 2.922%, 3.24%, 1.763%, and 5.568%, respectively, compared with the LSSVM model. The TSO-LSSVM model constructed in this paper demonstrates efficacy in water quality inversion for the Yellow River Basin in Ningxia, offering a novel reference for the prediction of complex inland river water quality parameters.
In view of the irregular shape of wheat ears, their susceptibility to ambient light, and their dense, occluded state in the field, traditional wheat-ear recognition methods suffer from low accuracy, high error rates, and inefficiency. This paper proposes a GAM-YOLOv8n deep learning model for wheat ear detection. First, the Global Attention Module (GAM) is introduced to improve the capability of capturing wheat-ear features in farmland. Next, the Wise-IoU loss function replaces the original CIoU loss for a more precise measurement of the similarity between the real and predicted bounding boxes of the wheat ears. Experimental results show that, compared with the initial YOLOv8n network, the improved network increases precision (P) by 3.7%, detection precision (PR) by 1.1%, and mean average precision (mAP) by 2.7%.
Cloud computing has become integral to information technology, offering a flexible and scalable way to access and utilize computing resources. In cloud computing, task scheduling policies affect the resource usage efficiency of the underlying system, so allocating user-submitted tasks to appropriate computing resources is an essential issue in cloud task scheduling, and many meta-heuristic algorithms have been introduced for it. This paper addresses the problem of cloud task scheduling using the exponential distribution optimizer (EDO) for the first time. An enhanced EDO variant, called EEDO, is proposed to further strengthen the search for optimal solutions. Specifically, a chaotic oppositional learning strategy is proposed to improve population diversity, and a guided solution-weight adaptation technique is developed to improve solution accuracy. Experimental findings indicate that the proposed EEDO makes better use of system resources across different task loads; hence, EEDO can be considered a promising scheduling method.
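A common pattern behind chaotic oppositional initialization can be sketched as follows; the logistic map, sphere-function objective, and keep-the-better-half rule are illustrative assumptions, not the paper's exact strategy:

```python
import numpy as np

def chaotic_opposition_init(pop_size, dim, lb, ub, seed=0):
    """Initialize a population with a logistic chaotic map, add the
    opposition of each candidate (lb + ub - x), and keep the better half
    under a placeholder fitness (sphere function here)."""
    rng = np.random.default_rng(seed)
    # logistic map x_{k+1} = 4 x_k (1 - x_k) generates chaotic samples in [0, 1]
    x = rng.uniform(0.1, 0.9, size=(pop_size, dim))
    for _ in range(10):
        x = 4.0 * x * (1.0 - x)
    pop = lb + (ub - lb) * x
    opp = lb + ub - pop                    # opposition-based candidates
    both = np.vstack([pop, opp])
    fitness = np.sum(both ** 2, axis=1)    # placeholder objective to minimize
    best_idx = np.argsort(fitness)[:pop_size]
    return both[best_idx]
```

The idea is that evaluating each candidate and its mirror image doubles the chance of starting near a good region at negligible cost.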
The Single-Depot Multiple Traveling Salesman Problem (SDMTSP) is an extension of the Traveling Salesman Problem (TSP) widely used in fields such as logistics distribution and transportation planning. This article proposes a new method for solving the SDMTSP based on an optimized Sand Cat Swarm Optimization (SCSO) algorithm. Building on the traditional sand cat algorithm, interval-point operations, the pheromone mechanism from the Ant Colony Algorithm (ACA), and an improved t-adaptive mutation operation are introduced to enhance the algorithm's performance on multiple traveling salesman problems. Simulation results show that the improved sand cat algorithm performs well under the conditions N=100, NSalesmen=4, NminTour=5, α=0, β=0.1, and δ=0.9. Over 300 repeated iterations, the algorithm performed well on multiple standard test problems and effectively reduced the total travel distance.
Classification is a crucial learning task in machine learning, which fundamentally involves predicting the category of test examples using a classifier generated from a training example set. However, many real-world applications have training sets with imbalanced class distributions, which often hinder the classification performance of learning algorithms. To address this, this paper proposes a feature-weighted adaptive boundary oversampling method. Initially, the method acquires neighbor information of sample points based on weighted Euclidean distance (WED), identifies minority class samples on the boundary according to the distribution of minority class neighbors, calculates the synthetic factor corresponding to boundary samples, and updates the number of samples to be generated for the sample based on its value. Finally, it randomly selects minority class samples from the neighbors to generate new samples according to the synthetic factor. The proposed method is compared with seven sampling methods on a decision tree classifier and 12 imbalanced datasets from KEEL. The results show that the proposed method achieves the best values in F1, G-mean, and AUC (Area under Curve) on most datasets, and has the best Friedman ranking, proving that compared with other sampling methods, it has better performance in handling classification problems in imbalanced data. By setting certain constraints and allocation strategies for the synthetic factor, it can provide ideas for similar research.
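The weighted-distance neighbor search and boundary-driven synthesis described above can be sketched in a simplified, SMOTE-style form; the mixed-neighborhood boundary rule and one-sample-per-point synthetic factor are simplifying assumptions relative to the paper's adaptive scheme:

```python
import numpy as np

def weighted_knn(X, i, w, k):
    """Indices of the k nearest neighbors of X[i] under feature-weighted
    Euclidean distance d(a, b) = sqrt(sum_j w_j (a_j - b_j)^2)."""
    d = np.sqrt((((X - X[i]) ** 2) * w).sum(axis=1))
    d[i] = np.inf                               # exclude the point itself
    return np.argsort(d)[:k]

def oversample_boundary(X, y, w, k=5, minority=1, seed=0):
    """Generate one synthetic sample per boundary minority point: a point
    counts as 'boundary' when its neighborhood is mixed (some, but not all,
    neighbors belong to the majority class)."""
    rng = np.random.default_rng(seed)
    new_pts = []
    for i in np.where(y == minority)[0]:
        nn = weighted_knn(X, i, w, k)
        n_maj = int((y[nn] != minority).sum())
        if 0 < n_maj < k:                       # mixed neighborhood -> boundary
            mates = nn[y[nn] == minority]
            j = rng.choice(mates)
            gap = rng.uniform()
            new_pts.append(X[i] + gap * (X[j] - X[i]))  # interpolate
    return np.array(new_pts)
```

In the paper's method the number of samples generated per boundary point is governed by the computed synthetic factor rather than fixed at one.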
Insulator defects in power transmission systems pose significant risks to grid stability and safety. This paper presents an improved method for rapid diagnosis of insulator defects in infrared images, based on an enhanced YOLOv8 algorithm. Our approach incorporates Haar wavelet downsampling and a Polarized Self-Attention (PSA) mechanism to boost detection accuracy while maintaining real-time performance. Haar wavelet downsampling strengthens the model's ability to grasp multi-scale features, while the PSA mechanism sharpens its focus on pertinent thermal patterns. We evaluate our method on a comprehensive dataset of 10,000 infrared images collected from various power transmission environments. Experimental results demonstrate that our improved YOLOv8 model achieves state-of-the-art performance, with a mean average precision of 94.7%, outperforming the original YOLOv8 and other object detection algorithms. The proposed method offers a 3.2% increase in mAP over the original YOLOv8 while maintaining a processing speed of 42.8 frames per second. This research contributes to more efficient and precise power grid maintenance techniques, potentially reducing the need for manual inspections and enhancing overall system reliability.
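The appeal of Haar wavelet downsampling over plain strided pooling is that it halves spatial resolution without discarding information; one level of the 2-D Haar decomposition (orthonormal normalization assumed) can be sketched as:

```python
import numpy as np

def haar_downsample(x):
    """One level of 2-D Haar decomposition of an (H, W) feature map with even
    H and W: returns the approximation band and three detail bands, each
    (H/2, W/2). Concatenating the four bands preserves all information
    (the transform is orthonormal), unlike plain strided pooling."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0      # approximation (low-low)
    lh = (a - b + c - d) / 2.0      # horizontal detail
    hl = (a + b - c - d) / 2.0      # vertical detail
    hh = (a - b - c + d) / 2.0      # diagonal detail
    return ll, lh, hl, hh
```

In a detection backbone the four bands would typically be stacked along the channel axis and passed through a convolution, so downsampling becomes lossless up to that learned mixing.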
Wood defect detection is a necessary step for the efficient use of processed wood and is of great significance for improving wood quality and the economic benefits of wood processing. Traditional detection techniques have disadvantages such as destructiveness, slow detection speed, and low accuracy. In this study, a single-stage object detection algorithm combining 3D scanning technology and deep learning was used to identify and locate surface defects on wood. Non-destructive testing was carried out with 3D laser scanning: first, surface feature data were extracted with a 3D laser scanner and a preliminary 3D model of the wood was constructed; 3D reconstruction was then performed in Geomagic Control, and the wood surface was segmented with a triangular-mesh algorithm. Based on the YOLOv5 network, the wood point cloud data were trained to obtain the surface defect regions and achieve wood defect detection. The results indicate that combining 3D scanning technology with deep learning for wood surface defect detection is efficient and accurate, effectively improving the intelligence level and production efficiency of wood processing.
Climate change-induced fluctuations in temperature and humidity create favorable conditions for the reproduction and spread of forest pests. Additionally, this unstable climate environment weakens the natural resistance of trees, making forest areas more susceptible to pest infestations. Detecting these pests is a challenging task: the diversity in the shape, color, and size of pests, along with variations in lighting conditions, significantly impacts the accuracy of detection results. To tackle these issues, this paper enhances YOLOv8s by integrating a Dilation-wise Residual (DWR) module in the backbone to capture high-level multi-scale contextual information, and a Simple Inverted Residual (SIR) module to extract features from the lower layers of the network, thereby improving feature extraction efficiency for real-time semantic segmentation. These improvements not only enhance detection precision but also significantly boost the computational efficiency of the model, making it more suitable for practical applications. Tests on a forestry pest dataset show that our model achieves superior performance, reaching a detection accuracy of 0.815 AP and surpassing YOLOv8n by 3.4% AP.
The aim of this study is to develop a practical tennis swing recognition system that helps beginners learn correct swing technique, corrects faulty movements, and provides functions for counting and evaluating swings. To achieve this goal, we created a dataset called TSAD, specifically tailored to tennis swing actions and structured like the public UCF101 dataset, to support model training and evaluation. We trained an extended PP-TSMv2 model in which local temporal attention (LTA) replaces the original global attention mechanism. The model was trained and evaluated on both UCF101 and TSAD, and showed significantly improved performance over the original model. The results indicate that the tennis swing recognition system based on the improved PP-TSMv2 model has practical value, providing effective training and guidance for tennis players and forming the basis for further research and applications.
Identifying disease-specific biomarkers from gene microarray data is crucial for disease diagnosis. Genetic algorithms can effectively select important features through efficient search strategies. However, traditional genetic algorithms face challenges such as retaining too many features, slow convergence, and instability due to the high dimensionality and redundancy of the data, as well as multiple random factors in the algorithm. This paper proposes a hybrid algorithm combining Maximum Relevance Minimum Redundancy and a Population Clustering Binary Genetic Algorithm. The proposed method selects the most important biomarkers through an initialization strategy based on Opposition-Based Learning, clustering, and dynamic mutation. Experimental results demonstrate that the proposed method achieves higher classification accuracy with fewer features across six high-dimensional microarray datasets.
The rapid development of artificial intelligence and new-generation information technology is driving the intelligence and networking of automotive products. Driven by comprehensive technological development and industrial exploration, autonomous driving has become a research focus and hotspot in the field of automotive technology. On this basis, this paper analyzes the development and research status of key autonomous driving technologies, such as environmental perception, planning and decision-making, and control execution, and focuses on the application of deep learning in the planning and decision-making of autonomous vehicles, providing a reference for research in this area.
The celluloid style is usually characterized by clear lines, distinct color blocks, and sharp contrast between light and dark. Producing celluloid-style cartoons involves colorizing the line-enclosed segments of line art frame by frame. In past decades, with the popularization of computer technology, practitioners have commonly used paint-bucket tools to colorize line art according to RGB values predetermined by a color designer. Nevertheless, this remains laborious given the diversity of color segments, the need for segment matching, and the large number of frames. To address this, a number of automated methodologies have been devised. The inclusion-matching methodology proposed by a group at NTU is advanced and practical: to a large extent, it can effectively handle issues such as occlusion or wrinkles that arise between frames. The inclusion-matching pipeline is based on deep neural networks; working from coarse to fine, it first warps the line art to extract features and then performs inclusion matching using an attention mechanism. However, the pipeline ignores the global information of the line art. Inspired by the vision transformer, the present study introduces a new mechanism to enhance the inclusion-matching module. Experiments demonstrate the effectiveness of our techniques.
The traditional LSTM network cannot predict the track of non-cooperative targets, because data collected from such targets suffer partial loss and do not form equal-interval samples. The cubic spline method can interpolate the data onto equal intervals, but interpolated points contribute far less to track prediction than measured points. Therefore, a contribution matrix for track prediction is constructed with an attention mechanism, and it jointly determines the prediction together with the weight matrix learned by the LSTM network. The experimental results show that the average accuracy on the test set reaches 96.25% with a loss value of about 0.125; prediction accuracy improves by 14.8% over the traditional LSTM network, which verifies the practicability and effectiveness of this method for non-cooperative target track prediction.
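The equal-interval resampling step can be sketched with SciPy's cubic spline; the grid spacing, the nearness threshold, and the down-weight value 0.3 for purely interpolated points are illustrative assumptions (the paper learns the contribution matrix via attention rather than fixing weights):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def resample_track(t, x, dt):
    """Resample irregularly sampled track measurements x(t) onto an
    equal-interval grid with spacing dt, and mark which grid points are
    interpolated (no nearby measurement) so they can be down-weighted."""
    cs = CubicSpline(t, x)
    t_grid = np.arange(t[0], t[-1] + 1e-12, dt)
    x_grid = cs(t_grid)
    # contribution weight: 1.0 for grid points close to a real measurement,
    # a smaller assumed weight (0.3) for purely interpolated points
    near = np.min(np.abs(t_grid[:, None] - np.asarray(t)[None, :]), axis=1)
    weights = np.where(near < dt / 2, 1.0, 0.3)
    return t_grid, x_grid, weights
```

The weight vector plays the role of the contribution matrix: downstream training can scale each grid point's loss by it.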
Process mining uses event-flow information from log files to generate process models for better management and optimization of target processes. Approximate conformance checking quantifies the deviation of process models from process logs and is important for process compliance checking, model quality assessment, and business process optimization. Existing approximate conformance checking methods are mainly classified into rule-based, token-replay, and alignment-based methods. These methods usually rely on known process structures and traditional machine learning or statistics, and cannot cope with large-scale logs whose process structure is unknown; the correlation information between different traces is not effectively extracted, and the effect of trace-length variation on encoding sparsity is not well addressed. To address these problems, we adopt machine learning for approximate conformance checking and propose MACC, a Mamba-based method. MACC improves the approximate conformance checking performance of machine-learning regressors on large-scale system logs when the process structure is unknown. Specifically, we apply the feature embedding of a Mamba-based classifier to the trace embedding before it is fed to the regressor. Mamba serves as the classifier's base feature-extraction module, and a TCN is introduced to enhance fine-grained feature mining, effectively extracting correlation information among traces while maintaining recognition accuracy. Finally, a bucket-splitting strategy based on the normal distribution dynamically adjusts the length-threshold division to reduce the encoding sparsity caused by long traces under fixed-length encoding.
Comparative experiments against multiple machine learning methods and vanilla fitness-computation methods show that the proposed method achieves better results on all metrics on the BPIC2019 dataset. Multiple ablation experiments further examine the effectiveness of the model structure. Comprehensive experiments demonstrate the effectiveness of the proposed method.
Typical daily data curves are the most representative description of oil-field scenarios and the basis for system optimization and scheduling. This paper therefore proposes a typical-scenario selection method that applies ordered clustering to load-PV data and K-means clustering to load-wind-speed data. Because electric load and PV intensity are strongly time-ordered, the ordered clustering method is used to cluster load and irradiance samples; given the randomness of load and wind speed, the K-means method is adopted for those samples. The cluster scenarios of the different data are then combined to obtain typical scenarios of the oilfield micro-energy grid system covering load, wind speed, and PV data.
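The K-means step for the load/wind-speed curves can be sketched in plain NumPy; treating each day's curve as one sample vector is an assumption about the data layout:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means for grouping daily load/wind-speed curves: X is
    (n_days, n_points_per_day); returns the centroids (typical scenarios)
    and the cluster label of each day."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each day to its nearest centroid
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # recompute centroids; keep an empty cluster's old centroid
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels
```

The resulting centroids serve directly as the "typical scenario" curves that the combined scenario set is built from.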
After a destructive earthquake, rapid evaluation of damaged-building information is crucial for effective disaster relief. Traditional data collection techniques are often too slow to meet the urgent demands of earthquake response. To address this, this study introduces an improved algorithm derived from the You Only Look Once version 8 (YOLOv8) model, tailored to identifying damaged building components after an earthquake. The information-extraction section of the YOLOv8 backbone is improved: a Parallel Attention Mechanism (PAM) module is introduced to improve the model's handling of complex scenarios, and the SimSPPF structure is introduced to optimize the feature pyramid layer and increase speed. The results show the effectiveness of the improved YOLOv8 algorithm in identifying damaged components of earthquake-damaged buildings, with average accuracy improved by 3.2% over the original model. The method can serve as a valuable reference for developing automatic analysis methods for earthquake information.
To address the problem of multilevel image segmentation, a novel method is proposed based on the standard construction of the 2-D entropy and its properties. In our approach, the 2-D histogram is built by pairing each pixel's grayscale value with the maximum grayscale value among its four neighbors. Compared with the traditional method, experimental results on nearly all images in the MATLAB image toolbox show that noise can be suppressed effectively without losing necessary silhouette information about the objects of interest.
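The 2-D histogram construction described above (pixel value paired with the maximum of its four neighbors) can be sketched as follows; restricting to interior pixels is an assumption about border handling:

```python
import numpy as np

def histogram_2d(img, levels=256):
    """2-D grayscale histogram pairing each interior pixel's value with the
    maximum value among its four neighbors, as used to build the 2-D entropy."""
    up    = img[:-2, 1:-1]
    down  = img[2:, 1:-1]
    left  = img[1:-1, :-2]
    right = img[1:-1, 2:]
    nbr_max = np.maximum(np.maximum(up, down), np.maximum(left, right))
    center = img[1:-1, 1:-1]
    hist = np.zeros((levels, levels), dtype=np.int64)
    np.add.at(hist, (center.ravel(), nbr_max.ravel()), 1)
    return hist
```

Normalizing this histogram gives the joint probabilities from which the 2-D entropy thresholds are computed; noise pixels tend to fall off the diagonal, which is why this pairing suppresses them.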
In decision problems, single step decision making refers to making a decision at each time step based only on the current state, without considering the long-term state or future effects. This approach is suitable for those scenarios with immediate feedback and operational impact, but can be challenging when facing complex and long-term dependent environments. We will explore the advantages and disadvantages of single step decision making and how this strategy can be used to optimize the decision process in practice. This innovative algorithm integrates the memory capabilities of recurrent neural networks (RNNs) into deep reinforcement learning frameworks. Unlike traditional Deep Q Network (DQN) setups, where feedforward neural networks are typically used for the the RPP-LSTM employs an LSTM network as the Q-value network. This integration allows the Q network to retain memory of previous environmental states and actions, thereby addressing the myopic nature of decision-making prevalent in methods. By leveraging LSTM's ability to capture and utilize temporal dependencies, the RPP-LSTM algorithm enhances the UAV's path planning capability by considering a broader context of environmental changes and past decisions. This approach is particularly beneficial in dynamic environments where the immediate decision based solely on current state information may not be optimal. The LSTM-equipped Q-value network can effectively learn and adapt to varying environmental conditions, leading in tasks. Furthermore, the incorporates a stratified punishment and reward mechanism designed to optimize the rationality of UAV path planning. This function encourages the UAV to make decisions that not only achieve immediate goals but also contribute to long-term planning objectives, ensuring strategic adaptability in complex scenarios. Simulation results demonstrate the superiority of the RPP-LSTM algorithm over traditional approaches relying on feedforward neural networks (FNNs). 
It exhibits enhanced adaptability to complex environments and achieves superior performance in terms of both robustness and accuracy in real-time UAV path planning scenarios. This integration of LSTM with deep reinforcement learning represents a significant advancement towards more intelligent and effective autonomous UAV operations in dynamic and challenging environments.
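The core architectural change described above, an LSTM cell acting as the Q-value head so that action values depend on the whole sequence of past states, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the dimensions, weight initialization, and function names are illustrative assumptions.

```python
import numpy as np

def lstm_q_forward(states, Wx, Wh, b, Wq, bq):
    """Run an LSTM cell over a sequence of environment states and
    return Q-values for each action at the final time step."""
    hidden = Wh.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    for x in states:
        z = Wx @ x + Wh @ h + b           # all four gate pre-activations at once
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)       # h carries memory across time steps
    return Wq @ h + bq                    # one Q-value per action

rng = np.random.default_rng(0)
state_dim, hidden, n_actions = 4, 8, 3
Wx = rng.normal(scale=0.1, size=(4 * hidden, state_dim))
Wh = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b  = np.zeros(4 * hidden)
Wq = rng.normal(scale=0.1, size=(n_actions, hidden))
bq = np.zeros(n_actions)

seq = rng.normal(size=(5, state_dim))     # five past environment states
q = lstm_q_forward(seq, Wx, Wh, b, Wq, bq)
action = int(np.argmax(q))                # greedy action over the whole sequence
```

Because the hidden state is carried across time steps, perturbing an early state in the sequence changes the final Q-values, which is exactly the memory property a feedforward Q-network lacks.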
In recent years, wasps have frequently attacked bees raised by beekeepers, causing significant losses. To address this issue, this study compares the acoustic characteristics of wasps and bees and constructs a corresponding dataset. Using cosine distance to measure the similarity of acoustic features, the system determines whether a sound comes from a wasp or a bee. Additionally, the Time Difference of Arrival (TDOA) algorithm is used for tracking and localization; Generalized Cross-Correlation (GCC) mitigates noise, achieving relatively accurate localization under different signal-to-noise ratio (SNR) conditions.
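The TDOA step above rests on estimating the delay between two microphone signals, and GCC with PHAT weighting is the standard noise-robust way to do it. The following NumPy sketch shows one such delay estimate; the signal lengths and the PHAT variant are assumptions, since the abstract does not specify which GCC weighting the authors use.

```python
import numpy as np

def gcc_phat(sig, ref, fs=1.0):
    """Estimate the delay of `sig` relative to `ref` (in seconds)
    using PHAT-weighted generalized cross-correlation."""
    n = len(sig) + len(ref)
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.abs(R) + 1e-12                # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

rng = np.random.default_rng(1)
ref = rng.normal(size=200)                        # signal at microphone 1
sig = np.concatenate((np.zeros(5), ref))[:200]    # same signal, 5 samples later
delay = gcc_phat(sig, ref)                        # recovers the 5-sample lag
```

With delays estimated for several microphone pairs, each TDOA constrains the source to a hyperbola, and the intersection of these curves gives the location estimate.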
This paper presents a study of an adaptive fusion algorithm based on Independent Component Analysis (ICA). The algorithm consists of two phases: training and fusion. In the training phase, two-dimensional image information is converted into one-dimensional vectors, which are decomposed by ICA to obtain the separation matrix. In the fusion phase, the adaptive algorithm uses the separation coefficients obtained during training to compute new fusion coefficients; applying the resulting fusion coefficient matrix to reconstruct the fused image yields a clearer result.
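The ICA decomposition at the heart of the training phase can be sketched in NumPy with a symmetric FastICA iteration on flattened one-dimensional signals. This illustrates only the decomposition step (whitening, then estimating the separation matrix), not the authors' full adaptive fusion; the sources, mixing matrix, and iteration count are made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                         # source 1
s2 = np.sign(np.sin(3 * t))                # source 2 (square wave)
S = np.vstack((s1, s2))
A = np.array([[1.0, 0.5], [0.4, 1.0]])     # mixing matrix
X = A @ S                                  # observed mixtures (the 1-D vectors)

# Centering and whitening
X = X - X.mean(axis=1, keepdims=True)
cov = X @ X.T / X.shape[1]
d, E = np.linalg.eigh(cov)
Xw = E @ np.diag(d ** -0.5) @ E.T @ X

# Symmetric FastICA with the tanh nonlinearity
W = rng.normal(size=(2, 2))
for _ in range(200):
    G = np.tanh(W @ Xw)
    W_new = G @ Xw.T / Xw.shape[1] - np.diag((1 - G ** 2).mean(axis=1)) @ W
    U, _, Vt = np.linalg.svd(W_new)        # symmetric decorrelation
    W = U @ Vt

recovered = W @ Xw                         # estimated independent components
```

After convergence, each row of `recovered` is (up to sign and permutation) one of the original sources; in the fusion phase such separated components would be re-weighted and recombined.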
In UAV visual navigation and positioning systems, the topological relationships among road intersections provide an important basis for remote sensing image matching and localization. This study therefore proposes a road intersection matching method based on triangular structure. It constructs a semantic expression library of road intersection targets as the benchmark database for matching, and then formulates a matching strategy based on the triangle similarity principle and the salient features between road intersections to localize intersections accurately and efficiently. Simulation results show that the algorithm is stable and robust, and is insensitive to brightness, noise, and viewing angle. Under a 6 km × 6 km large-scene matching condition, it takes 7.8 s from intersection detection to intersection geolocation and correctly matches 8 road intersections, meeting UAV localization requirements.
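The triangle similarity principle mentioned above exploits the fact that the ratios of a triangle's side lengths are invariant to translation, rotation, and scale, so a triplet of detected intersections can be matched against the benchmark database by comparing normalized side lengths. A minimal sketch, with hypothetical coordinates and tolerance:

```python
import numpy as np

def triangle_signature(pts):
    """Scale-, rotation-, and translation-invariant descriptor of a
    triangle of intersection points: sorted side lengths normalized
    by the longest side."""
    p = np.asarray(pts, dtype=float)
    sides = sorted(np.linalg.norm(p[i] - p[j])
                   for i, j in ((0, 1), (1, 2), (0, 2)))
    return np.array(sides) / sides[-1]

def triangles_match(pts_a, pts_b, tol=0.02):
    return np.allclose(triangle_signature(pts_a),
                       triangle_signature(pts_b), atol=tol)

# Reference intersections vs. the same layout rotated, scaled, and shifted,
# as would happen between a database map and an aerial image
ref = [(0, 0), (4, 0), (1, 3)]
theta, scale, shift = 0.7, 2.5, np.array([10.0, -3.0])
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
obs = [scale * R @ np.array(p) + shift for p in ref]
```

Because the signature discards pose and scale, `ref` and `obs` match even though their raw coordinates differ entirely, while a triangle of different shape does not.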
Through the collection and analysis of drivers' electroencephalogram (EEG) data, this study extracts features of error-related potentials (ErrP) to capture drivers' decision intentions in real time. This enables rapid and accurate decision adjustments in emergency situations, establishing a closed-loop human-machine obstacle avoidance decision model. Three typical hazardous driving scenarios and ErrP experimental paradigms were designed, and Prescan, Simulink, and Psychtoolbox were integrated for joint simulation. EEG data from 15 subjects were collected and analyzed. Potential topography maps indicated that error events effectively elicited ErrP, containing an error-related negativity (ERN) component around 150-200 ms and an error-related positivity (Pe) component around 400-450 ms after the error event stimuli. Shrinkage Linear Discriminant Analysis (SKLDA), Linear Support Vector Machine (LSVM), Probabilistic Kernel Support Vector Machine (PSVM), and a fine-grained convolutional neural network (CNN) were used for ErrP classification. The average classification accuracy and area under the curve (AUC) of the fine-grained CNN were (82.75±6.25)% and (85.48±4.46)% respectively, the best performance among the four algorithms. These results verify the feasibility of using ErrP to infer drivers' intentions in emergency conditions.
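Shrinkage LDA, one of the baselines above, regularizes the pooled covariance toward a scaled identity, which is what makes LDA workable on small-sample, high-dimensional EEG features. The sketch below implements the two-class version on synthetic Gaussian data; the feature dimensions, shrinkage value, and class separation are invented for illustration, not taken from the study.

```python
import numpy as np

def fit_shrinkage_lda(X, y, shrinkage=0.1):
    """Two-class LDA with the covariance shrunk toward a scaled identity,
    as commonly used for small-sample EEG/ErrP features."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Xc = np.vstack((X0 - m0, X1 - m1))
    S = Xc.T @ Xc / len(Xc)                      # pooled covariance
    p = S.shape[0]
    S = (1 - shrinkage) * S + shrinkage * np.trace(S) / p * np.eye(p)
    w = np.linalg.solve(S, m1 - m0)              # discriminant direction
    b = -w @ (m0 + m1) / 2                       # threshold at the midpoint
    return w, b

def predict(X, w, b):
    return (X @ w + b > 0).astype(int)

rng = np.random.default_rng(3)
# synthetic "correct vs. ErrP" epochs: two shifted Gaussian classes
X = np.vstack((rng.normal(0, 1, (100, 10)), rng.normal(1, 1, (100, 10))))
y = np.repeat([0, 1], 100)
w, b = fit_shrinkage_lda(X, y)
acc = (predict(X, w, b) == y).mean()             # high on this separable toy data
```

The shrinkage term keeps the covariance invertible even when the number of epochs is close to the feature dimension, which is the usual regime for single-trial ErrP classification.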
In recent years, gas leakage and explosion accidents in urban gas pipe networks have occurred frequently, posing a great threat to public security. The regional warehouse for emergency rescue is an important facility in the gas emergency rescue system for managing the storage and distribution of emergency materials and ensuring timely, efficient support. In this paper, the K-means clustering algorithm is employed to determine the layout of the regional warehouses, informed by field investigation and expert opinions. The optimized allocation of emergency materials is analyzed based on the Delphi method and two rounds of investigation on material allocation. The results show that this method can determine the warehouse layout scientifically and allocate the type and quantity of emergency materials reasonably, thereby optimizing both the layout of the regional warehouses and the structure of the materials.
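The layout step can be sketched as plain K-means (Lloyd's algorithm) over demand points, with the resulting centroids read as candidate warehouse sites. The demand coordinates and cluster count below are fabricated for illustration; the paper's actual siting data comes from field investigation.

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: cluster demand points and return the centroids
    as candidate regional-warehouse sites."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # assign each demand point to its nearest current center
        labels = np.argmin(((points[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([points[labels == j].mean(axis=0)
                        if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

rng = np.random.default_rng(4)
# synthetic demand points around three districts of a gas network
demand = np.vstack([rng.normal(c, 0.5, (50, 2))
                    for c in ((0, 0), (5, 5), (0, 6))])
sites, labels = kmeans(demand, k=3)
```

Each returned site minimizes the within-cluster squared distance to its assigned demand points, which is a reasonable proxy for response distance when siting warehouses.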
Deep learning plays a vital role in road crack detection, enabling improved detection accuracy, reduced costs, and automated maintenance, thus enhancing road safety and traffic efficiency. However, much of this remarkable performance relies on complex and costly computational resources, which often cannot meet the simultaneous speed and accuracy requirements of mobile deployment terminals. To address the trade-off between high accuracy and real-time performance, this paper proposes an efficient improved YOLOv8 network that reduces network redundancy and significantly improves inference speed. LAMP pruning is applied to obtain the student model for knowledge distillation, and a teacher network integrating the BAM attention module, C2f-DynamicConv, and the CARAFE upsampling operator is designed to provide feature knowledge distillation for the pruned model. The BAM module enhances the network's sensitivity to critical information, C2f-DynamicConv expands the receptive field to strengthen feature extraction, and CARAFE, based on content-adaptive upsampling, aggregates contextual information to provide richer features for prediction. Experimental data shows that the model achieves a significant 69.9% improvement in FPS and a 3.98% increase in mAP@50 accuracy.
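The distillation setup above trains the pruned student against the teacher's softened outputs. Its standard soft-label term, the temperature-scaled KL divergence, can be sketched as follows; the logits and temperature are arbitrary, and the paper's feature-level distillation would add further terms beyond this response-level one.

```python
import numpy as np

def softmax(z, T=1.0):
    e = np.exp((z - z.max(axis=-1, keepdims=True)) / T)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between temperature-softened teacher and student
    distributions -- the soft-label term of knowledge distillation.
    The T*T factor keeps gradient magnitudes comparable across temperatures."""
    p = softmax(teacher_logits, T)        # soft teacher targets
    q = softmax(student_logits, T)
    kl = (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1).mean()
    return float(kl * T * T)

teacher = np.array([[4.0, 1.0, 0.2], [0.1, 3.0, 0.5]])
student = np.array([[2.5, 1.2, 0.3], [0.2, 2.0, 1.0]])
loss = distill_loss(student, teacher)     # > 0; zero only when they agree
```

In training, this term is weighted against the ordinary detection loss on ground-truth labels, so the pruned student recovers accuracy the teacher's richer features encode.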
Chaotic systems, known for their unique dynamics, are used in cryptography, neural networks, control engineering, and secure communications. However, some classical and improved hyperchaotic systems may have security issues due to insufficient complexity. To address this problem, this study proposes a novel 3D hyperchaotic map named SLSS-3D. The SLSS-3D system derives from the classical logistic, sine, and ICMIC maps through nonlinear coupling transformations. It includes two control parameters, α and β, offering a wider range of parameter selection and enhancing resistance to brute-force attacks. This study evaluates the hyperchaotic behavior of the SLSS-3D system through analyses of phase diagrams, bifurcation diagrams, Lyapunov exponent spectra, sample entropy, the 0-1 test, and permutation entropy. The findings reveal that the SLSS-3D system demonstrates superior stability and significant chaotic properties. These characteristics suggest that SLSS-3D has substantial potential for applications in cryptography, artificial intelligence, and secure communications.
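The abstract does not give the SLSS-3D equations, but the three classical seed maps it builds on are standard, and their hallmark sensitivity to initial conditions is easy to demonstrate. The sketch below shows the seed maps and the exponential divergence of two nearby logistic orbits; the parameter values are the usual textbook ones, not the paper's α and β.

```python
import numpy as np

# Classical 1-D seed maps from which coupled systems like SLSS-3D are built
logistic = lambda x, mu=4.0: mu * x * (1 - x)     # chaotic on [0, 1] at mu = 4
sine     = lambda x: np.sin(np.pi * x)
icmic    = lambda x, a=2.0: np.sin(a / x)         # iterative chaotic map with
                                                  # infinite collapse

def orbit(f, x0, n):
    """Iterate a 1-D map n times from x0 and return the trajectory."""
    xs = [x0]
    for _ in range(n):
        xs.append(f(xs[-1]))
    return np.array(xs)

# Sensitivity to initial conditions: two orbits 1e-10 apart diverge completely
a = orbit(logistic, 0.3, 60)
b = orbit(logistic, 0.3 + 1e-10, 60)
```

This divergence of indistinguishable initial states is exactly what the Lyapunov exponent spectra and the 0-1 test quantify for the full 3D system.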
To determine whether a warship has entered an underwater threat zone, this article establishes an algorithmic coordinate system, classifies the positional relationships between legs and underwater threat zones, and gives a judgment method for each case; on this basis, the judgment rules are established. Example verification preliminarily shows that the proposed judgment algorithm can quickly and accurately determine whether a warship has entered an underwater threat zone, serving directly as a reference for warship maneuvering. After further applied research and verification, it can be embedded in the warship's combat command decision support system to provide practical auxiliary decision support for commanders.
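One elementary building block of such a zone-entry judgment is the point-in-polygon test: given the ship's position and the zone boundary as vertices in the algorithm's coordinate system, decide inside versus outside. The ray-casting sketch below is a generic illustration; the zone coordinates are hypothetical, and the paper's rules additionally classify whole legs, not just single positions.

```python
def in_threat_zone(point, polygon):
    """Ray-casting test: does the position lie inside a threat zone
    given as a list of (x, y) polygon vertices?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                   # edge crosses the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:                        # crossing is to the right
                inside = not inside                # odd number of crossings = inside
    return inside

zone = [(0, 0), (10, 0), (10, 8), (0, 8)]          # hypothetical threat zone
in_threat_zone((5, 4), zone)                       # position inside the zone
in_threat_zone((12, 4), zone)                      # position outside the zone
```

A leg (a straight track segment) can then be classified by testing its endpoints and its intersections with the zone boundary, which is what the article's case-by-case rules formalize.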
Lesion area segmentation is an important means of locating, analyzing, and identifying diseases in medical images and of extracting key discriminative features for diagnosis. To address the problems of inconsistently sized skin cancer lesions, blurred lesion bodies, and complex boundary features, this paper proposes an Enhanced Dynamic Convolution Module (ECCM), composed of a Dynamic Convolution module and a ParNet Attention (PNA) module connected in series. The CondConv module replaces the static convolution module in the network, promoting the learning of discriminative features and effectively improving the quality of feature extraction. PNA, in turn, reduces the loss of detailed information in the lesion area through multi-level feature fusion. Experiments show that the proposed method not only improves the model's feature extraction ability but also effectively alleviates inaccurate edge segmentation.
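The conditional-convolution idea behind the module above is that routing weights computed from the input combine several expert kernels into one example-specific kernel, so the effective filter adapts to each image. A minimal NumPy sketch with a 1×1 convolution stands in for the full module; the shapes, sigmoid routing, and expert count are assumptions for illustration, not the ECCM's actual configuration.

```python
import numpy as np

def cond_conv_1x1(x, experts, routing_w):
    """Conditional 1x1 convolution: per-input routing weights mix K expert
    kernels into a single example-specific kernel, then convolve once."""
    # x: (C, H, W); experts: (K, C_out, C); routing_w: (K, C)
    pooled = x.mean(axis=(1, 2))                       # global average pooling
    r = 1.0 / (1.0 + np.exp(-(routing_w @ pooled)))    # sigmoid routing, shape (K,)
    kernel = np.tensordot(r, experts, axes=1)          # mixed kernel, (C_out, C)
    return np.einsum('oc,chw->ohw', kernel, x)         # 1x1 conv as channel matmul

rng = np.random.default_rng(5)
x = rng.normal(size=(8, 16, 16))           # one feature map with 8 channels
experts = rng.normal(size=(4, 12, 8))      # K = 4 expert kernels, 12 out channels
routing_w = rng.normal(size=(4, 8))
y = cond_conv_1x1(x, experts, routing_w)   # adapted output, shape (12, 16, 16)
```

Because the mixing happens in kernel space before the convolution, the cost per input stays close to a single convolution while the capacity grows with the number of experts.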