This PDF file contains the front matter associated with SPIE Proceedings Volume 13162, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Chess piece recognition poses a significant challenge in computer vision due to the complex visual patterns and occlusions involved in identifying each piece's type. In recent years, deep learning, particularly convolutional neural networks (CNNs), has emerged as a promising approach for image recognition, achieving state-of-the-art performance across various visual recognition tasks. In this paper, we propose a CNN-based approach for accurate chess piece recognition, capable of identifying the type of chess piece on each square of a chessboard. Our approach uses a deep neural network architecture that combines convolutional and fully connected layers to extract relevant features from chessboard images and make precise predictions. To evaluate our approach, we employ a large and diverse dataset of labeled chessboard images and compare its performance against state-of-the-art methods for chess piece recognition. Experimental results demonstrate that our approach surpasses existing methods, achieving 98.9% accuracy on the test dataset.
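As a rough illustration of a per-square CNN classifier of the kind described (a minimal sketch; the layer sizes, 64x64 input resolution, and 13-class label set are assumptions, not the authors' specification):

```python
import torch
import torch.nn as nn

# Hypothetical per-square classifier: each chessboard square is cropped
# to a small image and classified into one of 13 classes
# (6 white pieces, 6 black pieces, empty). Layer sizes are illustrative.
class SquareCNN(nn.Module):
    def __init__(self, num_classes: int = 13):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                            # 64x64 -> 32x32
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                            # 32x32 -> 16x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 128), nn.ReLU(),
            nn.Linear(128, num_classes),                # logits per piece type
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# A full board is 64 such crops; batching them classifies the whole position.
model = SquareCNN()
squares = torch.randn(64, 3, 64, 64)  # one crop per square
logits = model(squares)               # shape: (64, 13)
```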
With the development of Artificial Intelligence (AI)-based language models, it is becoming pertinent, and will become even more so in the future, to be able to distinguish between AI-generated and human-written text. Having humans present work generated by AI language models as their own has serious implications at all levels, the most basic being ethical. In this paper, we propose modified deep learning models using the Deep Recurrent Neural Network (DRNN) for classifying text as either AI-generated or human-written. Two modified architectures are proposed, DRNN-1 and DRNN-2. The second contribution of this work is the development of a dataset containing short answers to simple questions in Information Technology (IT), Cybersecurity, and Cryptography, given to junior and senior students in Computer Engineering & Science and IT, producing a total of 450 answers. The same questions were given to ChatGPT for another 450 answers, so the combined dataset consisted of 900 answers across the three domains. Though both proposed architectures produced good results, DRNN-2 achieved better results, with a test accuracy of 83.78% using the cybersecurity questions alone and 88.52% using the combined dataset. These results are competitive for this new and emerging field of research.
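A minimal sketch of a deep recurrent classifier in the spirit of the DRNN variants (the actual DRNN-1/DRNN-2 layer counts and sizes are not reproduced here; everything below is illustrative):

```python
import torch
import torch.nn as nn

# Hypothetical deep recurrent binary classifier for AI-vs-human text.
# Input is a padded batch of token-id sequences.
class DeepRNNClassifier(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=128, hidden=64, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, hidden, num_layers=layers,
                           batch_first=True, dropout=0.3)
        self.head = nn.Linear(hidden, 1)  # 1 logit: AI-written vs human

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        _, (h_n, _) = self.rnn(self.embed(token_ids))
        return self.head(h_n[-1]).squeeze(-1)  # last layer's final state

model = DeepRNNClassifier()
batch = torch.randint(1, 20000, (8, 120))   # 8 answers, 120 tokens each
probs = torch.sigmoid(model(batch))         # P(AI-generated) per answer
```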
As an important part of automobile safety systems, distracted driving behavior recognition has substantial research value. After analyzing the limitations and difficulties of existing distracted driving recognition methods, this paper proposes a two-stage dual-channel recognition network. In the first stage, the AlphaPose keypoint detection network, pre-trained on the SF3D dataset, is used to obtain the driver's keypoint information, and a key-region heatmap is generated from Gaussian heatmaps. This heatmap is combined with the original image to form the dual-channel input of the second stage. A fused feature is produced by a feature fusion module based on feature concatenation and fed into the second-stage ResNet-50 backbone recognition network. Finally, to enhance the recognition effect, this paper introduces spatial and channel attention mechanisms to strengthen the learning of features of interest. Comparison and ablation experiments are designed for the proposed method. Compared with the benchmark network model, the proposed method improves accuracy by 2.6 percentage points, which verifies the effectiveness of the algorithm.
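As a sketch of how a Gaussian key-region heatmap can be built from detected keypoints and stacked with the RGB frame (the coordinates, image size, and sigma below are illustrative assumptions, not the paper's settings):

```python
import numpy as np

# Illustrative Gaussian heatmap generation from 2D keypoints, as used to
# build the key-region channel of the dual-channel input.
def gaussian_heatmap(keypoints, height, width, sigma=8.0):
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width), dtype=np.float32)
    for (kx, ky) in keypoints:
        g = np.exp(-((xs - kx) ** 2 + (ys - ky) ** 2) / (2 * sigma ** 2))
        heatmap = np.maximum(heatmap, g)  # keep strongest response per pixel
    return heatmap

# Stack the heatmap with the RGB frame to form the dual-channel input.
frame = np.random.rand(224, 224, 3).astype(np.float32)
hm = gaussian_heatmap([(60, 100), (150, 90)], 224, 224)
fused_input = np.concatenate([frame, hm[..., None]], axis=-1)  # H x W x 4
```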
Skin cancer remains a pressing global health concern, with millions of cases diagnosed annually. Early detection is vital, as survival rates vary significantly depending on the stage of diagnosis. Recent advances in dermatological practice have embraced dermoscopy, but its effectiveness often relies on practitioner experience, leading to diagnostic inconsistencies. In the realm of skin cancer classification, traditional machine learning methods gave way to deep learning, with vision transformers gaining prominence. This paper introduces a novel approach that leverages attention-weighted transformers for skin tumor classification. Attention weights gauge the significance of image patches, enabling precise region attention. Within our proposed framework, we introduce an enhanced transformer structure that capitalizes on the power of self-attention mechanisms. This architecture acquires discriminative region attention across multiple scales, enabling the model to effectively capture intricate image details and patterns. Experimental validation compares our method against Inception ResNet with soft attention and ViT-Base on the HAM10000 dataset. Data preparation involves duplicate removal, class rebalancing, and pixel-level augmentation. Evaluation metrics encompass accuracy, precision, sensitivity, specificity, and the F1 score. Results show our approach outperforms existing methods, achieving an accuracy of 93.75%. This work represents a significant stride toward accurate skin tumor classification, marrying innovative architecture with meticulous dataset preparation. The proposed approach holds potential to advance diagnostic tools for skin cancer, benefiting medical practitioners and patients alike.
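One common realization of attention-weighted aggregation over patch embeddings looks like the following minimal sketch (the paper's multi-scale transformer is not specified here; dimensions are illustrative, and the 7-class head matches HAM10000's lesion categories):

```python
import torch
import torch.nn as nn

# Patch embeddings are pooled by learned attention weights, so patches
# that gauge as more significant contribute more to the class decision.
class AttentionPool(nn.Module):
    def __init__(self, dim=256, num_classes=7):
        super().__init__()
        self.score = nn.Linear(dim, 1)       # one relevance score per patch
        self.head = nn.Linear(dim, num_classes)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, dim), e.g. from a ViT encoder
        weights = torch.softmax(self.score(patches), dim=1)
        pooled = (weights * patches).sum(dim=1)  # weighted sum of patches
        return self.head(pooled)

pool = AttentionPool()
logits = pool(torch.randn(4, 196, 256))  # 4 images, 14x14 patches each
```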
Recently, federated learning has gained significant attention for its ability to train models without centralizing clients' data on a central server. This characteristic makes federated learning widely applicable in medical image analysis, a field where ensuring patients' privacy is imperative for medical institutions. However, to comply with privacy regulations in certain regions, medical institutions must mitigate the influence of their clients' data on the global model. Existing machine unlearning methods cannot be straightforwardly applied in this scenario, as they require access to clients' data; federated unlearning therefore becomes necessary. The basic strategies of federated unlearning are too time-consuming to be practical, prompting an urgent need for a more cost-effective approach, and while previous works have proposed various strategies, they often prove either too costly or too unstable for real-world use. In this paper, we adopt an importance-based selection approach built on FedEraser, which expedites the retraining process at the expense of storage space. We also attempt to enhance its storage efficiency by pruning less significant updates. We conducted experiments on two medical image analysis datasets, and the results clearly demonstrate the effectiveness of removing the target client's impact. The time and storage consumption of our strategy are also consistent with expectations, emphasizing its practicality.
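As an illustrative sketch of pruning stored client updates by importance (the paper's actual importance criterion is not given here; update norm is an assumed proxy):

```python
import numpy as np

# Low-importance stored updates are pruned to save storage; the retained
# updates can then be used to calibrate a faster unlearning retrain.
def prune_updates(stored_updates, keep_ratio=0.5):
    # stored_updates: list of per-round update vectors from one client
    norms = np.array([np.linalg.norm(u) for u in stored_updates])
    k = max(1, int(len(stored_updates) * keep_ratio))
    keep = np.argsort(norms)[-k:]           # indices of the largest updates
    return [stored_updates[i] for i in sorted(keep)]  # preserve round order

updates = [np.random.randn(1000) * s for s in (0.1, 1.0, 0.05, 0.8, 0.3)]
retained = prune_updates(updates)           # roughly half the storage cost
```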
We propose a point cloud registration method based on deep learning, which achieves registration by reducing the projection error between the point clouds in feature space. When two point clouds are well aligned, the projection mappings of their deep features are also highly similar. Our framework has two pipeline branches. The main branch registers the point clouds; its novelty lies in discarding the traditional computation of point-pair relations, mapping each point cloud into a deep feature map, and achieving registration by reducing the projection error between the two maps. The secondary branch acts as a "teacher" that trains the encoder, so the model can be trained in an unsupervised way without expensive data labeling. Compared with some recent unsupervised methods, ours does not rely solely on global descriptors but also emphasizes the extraction of robust local feature descriptors. Experiments show the state-of-the-art performance of our method, which jointly extracts high-level global and local representations in an unsupervised manner, requiring neither labeled data nor an arduous search for correspondences.
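A minimal sketch of the feature-projection idea, under heavy assumptions (the point-wise encoder, translation-only transform, and MSE loss are illustrative stand-ins, not the paper's architecture):

```python
import torch
import torch.nn as nn

# Registration is driven by minimizing the discrepancy between the two
# clouds' projected deep features rather than by point-pair matching.
encoder = nn.Sequential(                    # shared point-wise encoder
    nn.Conv1d(3, 64, 1), nn.ReLU(),
    nn.Conv1d(64, 128, 1), nn.ReLU(),
)

def project(points: torch.Tensor) -> torch.Tensor:
    # points: (batch, 3, N) -> global feature vector via max-pooling
    return encoder(points).max(dim=2).values

source = torch.randn(1, 3, 1024)
target = source + 0.5                       # shifted copy of the source
shift = torch.zeros(1, 3, 1, requires_grad=True)  # transform to estimate

optimizer = torch.optim.Adam([shift], lr=0.05)
for _ in range(100):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(project(source + shift), project(target))
    loss.backward()
    optimizer.step()                        # shift closes the feature gap
```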
With the rapid development of AIGC technology, generative models have been applied to many fields such as images, speech, and text. Text-to-image generation requires background knowledge from both NLP and CV, and has thus become one of the fastest-growing directions in the AIGC field. However, reducing the generation of malicious, violent, and pornographic images by such text-to-image algorithms remains a hot research topic. This paper combines NLP with deep-learning image content classification to implement an auxiliary content compliance system for text-to-image generation. Through this system, images can be generated correctly by the text-to-image system while the compliance of the image content is ensured.
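A minimal sketch of such a two-stage compliance check around a text-to-image model (the blocklist, classifier interface, and threshold are placeholder assumptions, not the paper's actual components):

```python
# Stage 1 screens the prompt on the NLP side; stage 2 scores the generated
# image with a trained content classifier before releasing it.
BLOCKED_TERMS = {"violence", "gore"}        # hypothetical prompt blocklist

def prompt_is_safe(prompt: str) -> bool:
    tokens = set(prompt.lower().split())
    return not (tokens & BLOCKED_TERMS)     # stage 1: NLP-side screening

def image_is_safe(image, classifier, threshold=0.5) -> bool:
    # stage 2: classifier(image) is assumed to return an "unsafe" probability
    return classifier(image) < threshold

def generate_compliant(prompt, generator, classifier):
    if not prompt_is_safe(prompt):
        return None                         # refuse clearly unsafe prompts
    image = generator(prompt)
    return image if image_is_safe(image, classifier) else None
```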
In this paper, we introduce a new multi-center instance segmentation model based on deep learning, as a generalization of the classical PolarMask model. In contrast to the original PolarMask model, which imposes a single star-convex shape to approximate the target region, we propose a multi-center model that represents the target region via multiple star-convex shapes. For this purpose, we extract a set of points, each serving as the center of a star-convex shape, and compute the corresponding shapes. The final segmentation mask is then naturally generated as the union of all detected star-convex shapes. Experimental results show that the multi-center PolarMask model achieves improved performance on the COCO dataset. In addition, the introduced model opens the possibility of real-time application.
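To make the union-of-star-convex-shapes idea concrete, here is an illustrative rasterization from polar ray lengths (the centers, ray counts, and values are invented for the example; PolarMask-style shapes are defined by a center and K ray lengths at evenly spaced angles):

```python
import numpy as np

# Rasterize one star-convex shape: a pixel is inside if its distance to
# the center is no more than the ray length at its angle.
def star_convex_mask(center, rays, height, width):
    ys, xs = np.mgrid[0:height, 0:width]
    dy, dx = ys - center[1], xs - center[0]
    dist = np.hypot(dx, dy)
    angle = np.arctan2(dy, dx) % (2 * np.pi)
    k = len(rays)
    ray_at_pixel = np.asarray(rays)[(angle / (2 * np.pi) * k).astype(int) % k]
    return dist <= ray_at_pixel

rays_a = np.full(36, 20.0)                   # a circle of radius 20
rays_b = np.full(36, 15.0)
mask = star_convex_mask((40, 40), rays_a, 100, 100) | \
       star_convex_mask((65, 50), rays_b, 100, 100)   # union of two shapes
```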
The technology of Chinese dialect speech recognition contributes to the preservation and inheritance of regional culture and provides more convenient, customized services, with broad application prospects. In recent years, end-to-end speech recognition methods have demonstrated strong performance in dialect recognition. However, training a model on only a single dialect dataset causes it to lose the broader commonalities in acoustics and linguistics, while directly training a single model on multiple dialects overlooks the differences between dialect texts, which degrades performance. To address this issue, this paper proposes a Chinese multi-dialect speech recognition method based on instruction tuning. By adding different instruction sets before different dialect texts, the model can learn the commonalities among dialects within the same language while preserving the differences between dialect texts. Additionally, this paper also attempts to enhance the model's text generation capability by using an additional language model to rescore the model outputs. We conducted tests on the Common Voice dataset using the Whisper model. The results show that, compared to direct mixed training, the instruction-tuning method and the rescoring method reduced the Word Error Rate (WER) by 13.44% and 21.18%, respectively.
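As an illustration of prepending dialect-specific instruction tokens to transcripts before fine-tuning (the tags and mapping below are invented for the sketch; the paper's actual instruction sets are not reproduced):

```python
# Hypothetical dialect tags prepended to target transcripts so the model
# learns which dialect's orthography to emit, while the shared acoustic
# model still learns cross-dialect commonalities.
DIALECT_TAGS = {
    "cantonese": "<|yue|>",
    "minnan": "<|nan|>",
}

def add_instruction(transcript: str, dialect: str) -> str:
    return f"{DIALECT_TAGS[dialect]} {transcript}"

print(add_instruction("example transcript", "cantonese"))
```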
The aim of online clustering is to discover structure in streaming data. Adding label constraints or pairwise constraints has been shown to improve clustering accuracy. In this study we present an analysis of how different hyperparameters – proportion of constraints, initial number of clusters, and batch window size – affect recent and popular online constrained clustering methods, using three different metrics. Our results show that the initial number of clusters and the window size affect clustering results, while the proportion of constraints does not. We also demonstrate that online clustering performs better than clustering the whole dataset at once. Our overall findings point to the need for new, more effective online constrained clustering methods.
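A minimal sketch of pairwise-constraint checking during cluster assignment, in the COP-KMeans style (the specific methods compared in the study are not reproduced here):

```python
# Before assigning a point to a cluster, verify that no must-link or
# cannot-link constraint involving already-assigned points is violated.
def violates(point_id, cluster, assignments, must_link, cannot_link):
    for a, b in must_link:                  # linked points must share a cluster
        other = b if a == point_id else a if b == point_id else None
        if other in assignments and assignments[other] != cluster:
            return True
    for a, b in cannot_link:                # linked points must differ
        other = b if a == point_id else a if b == point_id else None
        if other in assignments and assignments[other] == cluster:
            return True
    return False

assignments = {0: 0, 1: 1}
print(violates(2, 0, assignments, must_link=[(1, 2)], cannot_link=[]))  # True
```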
The aviation industry is a vital part of modern global travel, and airline reviews are therefore worth further study. In this study, we explore whether the sentiment representation of reviews can be used instead of the raw sentences to assist in evaluating customer satisfaction. Two sets of experiments were conducted, one with the raw reviews and one with their sentiment representation. Both employ deep learning and machine learning techniques to ensure a comprehensive study. Results show that traditional machine learning models achieved more competitive performance than deep learning models in both tasks, and a Gradient Boosting model using the sentiment representation performs best. The study finds that the sentiment representation can serve as a viable substitute for the raw text, further showing that a simplified approach can achieve efficiency without sacrificing accuracy in practical applications. This provides a solid reference for future studies that intend to develop fast and accurate classification models for airline reviews.
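An illustrative version of the pipeline, replacing raw text with a compact sentiment representation before a Gradient Boosting classifier (the lexicon, features, and toy data are placeholder assumptions, not the study's setup):

```python
from sklearn.ensemble import GradientBoostingClassifier

# Tiny sentiment representation: positive/negative term counts and their
# difference stand in for the raw review text.
POSITIVE = {"great", "friendly", "comfortable"}
NEGATIVE = {"delayed", "rude", "lost"}

def sentiment_features(review: str):
    tokens = review.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return [pos, neg, pos - neg]

reviews = ["great friendly crew", "delayed and rude staff",
           "comfortable seats", "lost my luggage"]
labels = [1, 0, 1, 0]                       # 1 = satisfied

clf = GradientBoostingClassifier().fit(
    [sentiment_features(r) for r in reviews], labels)
print(clf.predict([sentiment_features("friendly but delayed")]))
```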
In recent years, face restoration methods based on deep learning, with or without a GAN prior, have had two main problems: they retain too little identity information from the original input image, and they make insufficient use of facial structure information. To solve these problems, we propose an encoder-decoder face restoration network with style modulation, called EDSM. First, skip connections and a channel attention module are added to the base network, and a lightweight style modulation module is introduced to make full use of the global and local information extracted from the low-resolution (LR) face image. Meanwhile, an identity loss is introduced to preserve identity information, and a multi-scale discriminator is added to form the EDSM-plus network. Experiments show that the proposed EDSM and EDSM-plus achieve good face restoration performance on the Helen dataset.
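For reference, a squeeze-and-excitation-style block is one common realization of channel attention (a minimal sketch; EDSM's exact module and sizes are not specified here):

```python
import torch
import torch.nn as nn

# Per-channel global statistics are squeezed, passed through a small MLP,
# and used to reweight the feature channels.
class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # squeeze: per-channel mean
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights                       # reweight feature channels

features = torch.randn(2, 64, 32, 32)
out = ChannelAttention(64)(features)             # same shape, reweighted
```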
The number of traffic accidents increases every year, and most of these accidents are caused by driver distraction. In countries with less developed road infrastructure, such as Brazil, the number of accidents is considerably higher. Since distraction is one of the leading causes of accidents, mechanisms are needed to prevent drivers from becoming distracted. This paper presents the development of an intelligent image-based driver distraction detection system. Building on neural network (ANN) approaches trained on databases such as State Farm Distracted Driver Detection (SFD3) and AUC Distracted Driver V2 (AUCD2), this study applies the transfer learning technique to obtain better performance and accuracy with a smaller database. Because the model must have a reduced architecture to run in an embedded system, models based on convolutional neural networks (CNN) were chosen. Using transfer learning, it was possible to obtain a hit rate of 92.20% on AUCD2 and 64.47% on the dataset proposed in this study.
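A minimal transfer-learning sketch of the kind described (the choice of MobileNetV2 as a compact, embedded-friendly backbone and the 10-class head are assumptions; the paper's exact backbone is not stated):

```python
import torch
import torch.nn as nn
from torchvision import models

# Freeze an ImageNet-pretrained backbone and train only a new classifier
# head on the (smaller) driver distraction dataset.
backbone = models.mobilenet_v2(weights="IMAGENET1K_V1")
for param in backbone.parameters():
    param.requires_grad = False              # freeze pretrained features

# SFD3/AUCD2-style datasets commonly use ten driver-behavior classes.
backbone.classifier[1] = nn.Linear(backbone.last_channel, 10)  # new head

optimizer = torch.optim.Adam(backbone.classifier.parameters(), lr=1e-3)
logits = backbone(torch.randn(4, 3, 224, 224))   # fine-tune on driver images
```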