Proceedings Volume Fourth International Conference on Computer Vision and Information Technology (CVIT 2023), 1298401 (2024) https://doi.org/10.1117/12.3026225
This PDF file contains the front matter associated with SPIE Proceedings Volume 12984, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Proceedings Volume Fourth International Conference on Computer Vision and Information Technology (CVIT 2023), 1298402 (2024) https://doi.org/10.1117/12.3015658
Animal detection and recognition is a crucial task in computer vision. YOLOv5 has been widely used for animal identification in the past few years. However, it remains a challenging task due to the diverse array of animal types found in complex environments. In this paper, we introduce a new attention mechanism based on the CBAM attention mechanism to enhance the performance of the network model. Specifically, the attention mechanism enhances the interplay between globally pooled channel information, thereby bolstering the ability to detect and recognize animals with similar features against complex backgrounds. Experimental results on the Oxford-IIIT Pet validation dataset demonstrate the robustness of the proposed model and its ability to perform effectively in real-world scenarios.
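As a point of reference for the kind of module this abstract describes, the sketch below shows a standard CBAM-style block in PyTorch that combines globally average- and max-pooled channel descriptors through a shared MLP and follows it with spatial attention. It is a minimal sketch of the baseline CBAM design, not the paper's modified mechanism; all dimensions and names are illustrative.

```python
# Minimal CBAM-style channel + spatial attention block (baseline sketch, not the paper's variant).
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP applied to both the globally average- and max-pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # 7x7 convolution over channel-pooled maps, as in the original CBAM design.
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel attention: fuse the two pooled descriptors through the shared MLP.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention: pool across channels and weight each spatial location.
        pooled = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(pooled))
```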
Proceedings Volume Fourth International Conference on Computer Vision and Information Technology (CVIT 2023), 1298403 (2024) https://doi.org/10.1117/12.3017904
Marker recognition is vital in machine vision and applicable within many fields, such as vehicle automatic guidance, insect pest estimation, and UAV trajectory planning. The influence of illumination and complex backgrounds makes such recognition applications very challenging. This paper describes a color spatio-temporal decomposition algorithm applied to video images to recognize markers. In the proposed method, the vectorial Rudin-Osher-Fatemi model weakens the textural component of the image sequences to minimize background complexity for image segmentation. The impact of illumination is reduced by transforming the color space of the obtained image sequences into HSV and equalizing the histogram of the Value channel. Three different types of markers were tested under different light intensities and environments to verify the effectiveness of the algorithm. The proposed method improved the accuracy of edge detection in image segmentation and successfully minimized the interference of illumination. The algorithm also showed good robustness under various vegetation-density environments, with a recognition rate of about 95%.
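The preprocessing chain described above can be sketched roughly as follows: a total-variation smoothing pass (used here as a stand-in for the vectorial ROF decomposition) followed by HSV conversion and histogram equalization of the Value channel. OpenCV and a recent scikit-image are assumed; the paper's own decomposition is not reproduced.

```python
# Rough sketch of the illumination-normalization step: TV (ROF-style) smoothing,
# then HSV conversion and equalization of the V channel.
import cv2
import numpy as np
from skimage.restoration import denoise_tv_chambolle

def preprocess_frame(bgr: np.ndarray) -> np.ndarray:
    # Weaken the textural component with total-variation smoothing (ROF-style surrogate).
    smoothed = denoise_tv_chambolle(bgr, weight=0.1, channel_axis=-1)
    smoothed = (smoothed * 255).astype(np.uint8)
    # Reduce illumination effects by equalizing the Value channel in HSV space.
    hsv = cv2.cvtColor(smoothed, cv2.COLOR_BGR2HSV)
    hsv[:, :, 2] = cv2.equalizeHist(hsv[:, :, 2])
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```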
Carlos Ribeiro, Monica Figueiredo, Pedro Assunção, Lino Ferreira, João Gil, Xavier Bento
Proceedings Volume Fourth International Conference on Computer Vision and Information Technology (CVIT 2023), 1298404 (2024) https://doi.org/10.1117/12.3015817
In many modern industries, production lines are fast-paced environments with repetitive and intricate motions where humans and machines often co-exist. Manufacturers are always looking for ways to minimize breakdowns and failures to improve productivity and efficiency. This work is an outcome of the collaborative R&D project VIEXPAND AI, a real-time AI-boosted solution that complements and expands human supervision with 24/7 ‘smart eyes’ in a container glass industry application. The goal is to reduce production downtime, accidents, and waste of raw materials and energy, as well as improve industrial working conditions. To accomplish this, we propose an architecture where AI methods and techniques are implemented on the edge, to allow real-time supervision of multiple sites with centralized remote monitoring. FPGA System-on-Chip (SoC) devices are used to implement the video processing, multiplexing and encoding/decoding stages, as well as the AI engine used for object detection and classification. This heterogeneous technology allows us to distribute processing tasks over the different hardware modules available on-chip (the multiprocessor unit, hard cores and soft cores), thus enabling real-time operation. This paper evaluates the use of YOLOX models on a Xilinx Zynq® UltraScale+™ Multiprocessor System-on-Chip (MPSoC) Deep Learning Processing Unit (DPU). It presents a study of the performance of the models when trained with different input sizes and custom datasets obtained on the factory floor. The impact of different design choices on performance metrics is reported and discussed.
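As a small illustration of the input-size experiments mentioned above, the sketch below shows YOLOX-style letterbox resizing of a frame to a candidate network input resolution before inference. The DPU runner itself is hardware- and toolchain-specific and is deliberately left out; the file path and sizes are hypothetical.

```python
# YOLOX-convention letterbox preprocessing: resize with unchanged aspect ratio, pad with grey.
import cv2
import numpy as np

def letterbox(img: np.ndarray, size: tuple[int, int]) -> np.ndarray:
    """Resize to fit inside (target_h, target_w) and pad the remainder with value 114."""
    h, w = img.shape[:2]
    r = min(size[0] / h, size[1] / w)
    resized = cv2.resize(img, (int(w * r), int(h * r)), interpolation=cv2.INTER_LINEAR)
    canvas = np.full((size[0], size[1], 3), 114, dtype=np.uint8)
    canvas[: resized.shape[0], : resized.shape[1]] = resized
    return canvas

# e.g. compare candidate input sizes for the same factory-floor frame (hypothetical path):
#   blob_416 = letterbox(frame, (416, 416))
#   blob_640 = letterbox(frame, (640, 640))
```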
Proceedings Volume Fourth International Conference on Computer Vision and Information Technology (CVIT 2023), 1298405 (2024) https://doi.org/10.1117/12.3020928
Human-Object Interaction (HOI) recognition in videos aims to classify the interaction states of humans and objects within each video segment of an activity. Each video segment is an atomic activity, and these atomic activities can constitute a high-level activity according to a certain temporal relationship. The existence of a temporal relationship between segments indicates that there is a local constraint between a video segment and the segment that follows it. Existing HOI recognition models do not consider the output of the previous segment when predicting the interaction state of the current segment, so it is difficult for them to learn accurate relationships between segments. Therefore, we propose a method that uses explicit knowledge to guide the network to capture strict relations between segments. First, the transition relationships of interaction states between segments in the dataset are summarized and filtered as prior knowledge. Then we use graphs to express the extracted knowledge. We inject the prior knowledge into the transition matrices of conditional random fields to model this local constraint between interaction states. In terms of both micro and macro evaluation criteria, the proposed knowledge-guidance method achieves better results than the state of the art on the CAD-120 dataset.
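The knowledge-injection idea can be sketched as constraining a CRF's transition scores with the summarized transition graph before decoding. The snippet below shows a plain Viterbi decoder where transitions not permitted by the prior-knowledge graph are masked out; the state set, scores, and graph are illustrative, not the paper's.

```python
# Viterbi decoding with prior-knowledge-constrained transitions (illustrative sketch).
import numpy as np

def viterbi(emission: np.ndarray, transition: np.ndarray, allowed: np.ndarray) -> list[int]:
    """emission: (T, S) per-segment log-scores; transition: (S, S) learned log-scores;
    allowed: (S, S) boolean matrix of transitions permitted by prior knowledge."""
    trans = np.where(allowed, transition, -np.inf)  # inject the knowledge constraint
    T, S = emission.shape
    score = emission[0].copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans + emission[t][None, :]  # prev state i -> current state j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```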
Proceedings Volume Fourth International Conference on Computer Vision and Information Technology (CVIT 2023), 1298406 (2024) https://doi.org/10.1117/12.3017909
The tracking algorithm for swine plays a pivotal role in efficiently extracting movement trajectories and quantifying the motion patterns of pigs, thereby serving as an indicator of their physical well-being. Consequently, the task of swine tracking assumes paramount significance. Addressing the issues of low automation, significant error rates, and poor real-time performance in pig tracking, this study introduces a deep learning-based algorithm for swine tracking. It encompasses the development of a pig target detection model based on RetinaNet and introduces an innovative strategy for swine trajectory tracking that incorporates time-series information, facilitating real-time tracking of multiple pig targets. The results from algorithm testing demonstrated the effectiveness of the RetinaNet-based swine target detection algorithm, with an AP50 of 0.998, AP75 of 0.907, AP90 of 0.606, and an operational speed of 42.3 tasks per second. This underscores the algorithm's capacity to proficiently detect pig target categories and delineate precise target bounding boxes. In terms of swine target tracking, the multi-object trajectory tracking algorithm achieved an average Multi-Object Tracking Precision of 2.37 pixels, equivalent to approximately 1.83 cm, and an average Multi-Object Tracking Accuracy of 97.44%, substantiating its aptitude for effectively tracking multiple pig targets with a high level of tracking precision and consistency.
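To make the detection-then-tracking pipeline concrete, the sketch below associates per-frame detections with existing track IDs by centroid distance, a simplified stand-in for the paper's time-series trajectory strategy. Box format, the distance threshold, and the greedy matching are assumptions for illustration.

```python
# Simple frame-to-frame association of detections by centroid distance (illustrative only).
import numpy as np

def associate(prev_tracks: dict[int, np.ndarray], boxes: list[np.ndarray],
              max_dist: float = 80.0) -> dict[int, np.ndarray]:
    """Greedily match new boxes (x1, y1, x2, y2) to existing track IDs by centroid distance."""
    def centroid(b):
        return np.array([(b[0] + b[2]) / 2, (b[1] + b[3]) / 2])

    tracks, used = {}, set()
    next_id = max(prev_tracks, default=-1) + 1
    for tid, prev_box in prev_tracks.items():
        dists = [np.linalg.norm(centroid(prev_box) - centroid(b)) if i not in used else np.inf
                 for i, b in enumerate(boxes)]
        if dists and min(dists) < max_dist:
            i = int(np.argmin(dists))
            tracks[tid] = boxes[i]     # continue this track with the matched detection
            used.add(i)
    for i, b in enumerate(boxes):      # unmatched detections start new tracks
        if i not in used:
            tracks[next_id] = b
            next_id += 1
    return tracks
```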
Visual Based Image Analysis and Data Visualization
Proceedings Volume Fourth International Conference on Computer Vision and Information Technology (CVIT 2023), 1298407 (2024) https://doi.org/10.1117/12.3013289
A graph convolutional network (GCN) has demonstrated impressive success in hand pose and shape estimation, due to its high interpretability and powerful capability for dealing with non-Euclidean data. In traditional GCN-based hand pose and shape estimation methods, the Chebyshev spectral graph convolution is the most widely used, and it is directly introduced into a simple multilayer network. In terms of form, this graph convolution does not resemble a standard 2D convolution on an image. In terms of practical effect, this graph convolution treats a center node and its neighbors equally. Inspired by action recognition studies, we introduce an adaptive graph convolution to hand pose and shape estimation, which considers not only the difference between a center node and its neighbors but also the edge importance. Based on the adaptive graph convolution, we design a multilayer graph residual network with a double-skip-connection architecture. Extensive ablation studies are conducted to demonstrate the improvements due to the use of the adaptive graph convolution and the advantages of the graph residual network. Our method outperforms recent baselines on the public FreiHAND hand pose and shape estimation dataset.
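A rough PyTorch sketch of the two ingredients named above follows: a learnable edge-importance matrix modulating the fixed adjacency, and a separate weight matrix for the center node so it is not treated like its neighbors. Dimensions and the exact adaptive scheme are illustrative assumptions, not the paper's layer.

```python
# Adaptive graph convolution sketch: learnable edge importance + distinct center-node transform.
import torch
import torch.nn as nn

class AdaptiveGraphConv(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, adjacency: torch.Tensor):
        super().__init__()
        self.register_buffer("A", adjacency)                              # fixed normalized adjacency (N, N)
        self.edge_importance = nn.Parameter(torch.ones_like(adjacency))   # learnable per-edge weights
        self.w_neighbor = nn.Linear(in_dim, out_dim, bias=False)          # transform for neighbor features
        self.w_center = nn.Linear(in_dim, out_dim, bias=True)             # separate transform for the node itself

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, in_dim); aggregate neighbors with learned edge importance, then add the center term.
        agg = torch.matmul(self.A * self.edge_importance, self.w_neighbor(x))
        return torch.relu(agg + self.w_center(x))
```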
Proceedings Volume Fourth International Conference on Computer Vision and Information Technology (CVIT 2023), 1298408 (2024) https://doi.org/10.1117/12.3015832
Different objects in an image contribute different amounts of semantic information to understanding the scene. Understanding the relative importance of objects for scene semantics is critical for various computer vision applications, such as scene recognition and image captioning. In this paper, we refer to the contribution of an object to scene semantics as the degree of its gist and propose a method for Estimating the degree of the Gist of an Instance (EGoI). In the EGoI method, an object's gist degree is estimated by a semantic-feature comparison strategy and a semantic-distance comparison strategy. In the first strategy, the image is first represented as a scene graph, then the aggregation features and node features of different graph-node combinations are computed to estimate the gist of the instance excluded from the node combination. In the second strategy, captions of the complete and incomplete images are generated, and the semantic distance between these captions is used to estimate the gist of the instance deleted from the scene. Different strategies for estimating the gist degree of instances are tested in the experiments. The results show that the proposed method can effectively quantify the contribution of an instance to scene semantics. Among these strategies, the semantic-feature comparison method has better discriminative power for the gist degree of various instances in the scene.
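The semantic-distance strategy can be sketched as: caption the full scene and the scene with one instance removed, then score the instance by the distance between the two caption embeddings. In the sketch below, `caption()` and `embed()` are hypothetical placeholders for a captioning model and a sentence-embedding model; they are not from the paper.

```python
# Semantic-distance gist estimate: larger distance means removing the instance changed the scene more.
import numpy as np

def gist_degree(full_image, image_without_instance, caption, embed) -> float:
    """caption(image) -> str, embed(str) -> 1-D np.ndarray; both are hypothetical models."""
    v_full = embed(caption(full_image))
    v_partial = embed(caption(image_without_instance))
    cos = float(np.dot(v_full, v_partial) /
                (np.linalg.norm(v_full) * np.linalg.norm(v_partial) + 1e-8))
    return 1.0 - cos  # cosine distance used as the gist score
```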
Proceedings Volume Fourth International Conference on Computer Vision and Information Technology (CVIT 2023), 1298409 (2024) https://doi.org/10.1117/12.3020879
A brain-computer interface (BCI) is an emerging technology with which a user can establish direct communication with an electrical device without any physical exertion. EEG is a noninvasive, low-cost method to extract brain signals from a subject. The EEG signal contains different types of information, including motor-sensory information originating from the motor cortex region of the brain. Research has shown that the motor cortex generates signals similar to those generated during deliberate limb movements. Therefore, motor imagery (MI) signals, if extracted, can be utilized to operate an electrical device, establishing a BCI system. However, EEG data can contain many artifacts, which degrade signal quality and can also cause false-positive commands to the connected device. It is therefore crucial to remove artifacts from the EEG signal before classification. In this project, EEG data were collected from 12 subjects who were instructed to perform MI activity. The EEG signal was then processed and an efficient artifact removal technique applied. The artifact removal method uses the wavelet transform and an artifactual-probability mapping method to detect artifactual epochs and eliminate them from the signal. Useful features are then extracted from the signal and an artificial neural network (ANN) classifier is applied. The classification accuracy was enhanced by 15-16% on average after removal of artifacts from the EEG recordings for MI-BCI experiments. Afterwards, performance evaluation, such as computing the signal-to-noise ratio, was carried out to assess the improvement in the signal after noise removal.
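To illustrate the wavelet ingredient of such an artifact-removal pipeline, the sketch below applies standard wavelet thresholding to a single EEG channel with PyWavelets: decompose, shrink outlying detail coefficients, reconstruct. The paper's artifactual-probability mapping is more elaborate; the wavelet family, level, and universal threshold here are common defaults, not the paper's settings.

```python
# Generic wavelet-domain artifact suppression for one EEG channel (PyWavelets + NumPy).
import numpy as np
import pywt

def wavelet_denoise(signal: np.ndarray, wavelet: str = "db4", level: int = 5) -> np.ndarray:
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    cleaned = [coeffs[0]]                        # keep the approximation coefficients
    for detail in coeffs[1:]:
        # Universal threshold estimated from the median absolute deviation of the detail band.
        sigma = np.median(np.abs(detail)) / 0.6745
        thr = sigma * np.sqrt(2 * np.log(len(detail)))
        cleaned.append(pywt.threshold(detail, thr, mode="soft"))
    return pywt.waverec(cleaned, wavelet)[: len(signal)]
```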
Proceedings Volume Fourth International Conference on Computer Vision and Information Technology (CVIT 2023), 129840A (2024) https://doi.org/10.1117/12.3021373
With the development of emerging technologies such as 5G communication and autonomous driving, the road traffic environment is changing quietly. Car-following models based on traditional traffic parameters can no longer accurately describe vehicle-following behavior in an intelligent traffic environment. Based on a study of microscopic traffic flow in the V2V environment, and with reference to the traditional car-following model and the intelligent driver model, a microscopic traffic flow model for the V2V environment is constructed. Using control theory, the Laplace transform of the model is carried out to obtain the transfer function of the car-following model, and the stability of the traffic flow system is studied through analysis of this transfer function. The analysis results show that accounting for the speeds of the leading and following vehicles affects the stability of traffic flow: the critical sensitivity coefficient for traffic flow stability decreases and the stable region is enlarged, so the new model has higher stability and can more comprehensively describe the car-following behavior of microscopic traffic flow in the V2V environment.
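The stability analysis implied above reduces to a pole check on the linearized model's transfer function G(s) = num(s)/den(s): the system is stable when all poles lie in the open left half-plane. The sketch below shows that check with placeholder coefficients, not the paper's model.

```python
# Pole-location stability check for a linearized car-following transfer function (placeholder values).
import numpy as np

def is_stable(den_coeffs: list[float]) -> bool:
    """den_coeffs are the denominator polynomial coefficients, highest order first."""
    poles = np.roots(den_coeffs)
    return bool(np.all(poles.real < 0))

# e.g. an illustrative second-order denominator s^2 + 0.8 s + 0.25:
print(is_stable([1.0, 0.8, 0.25]))  # True: both poles have negative real parts
```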
Xian Xu, Yingchun Chen, Xiong Huang, Zhirong Han, Hang Chen
Proceedings Volume Fourth International Conference on Computer Vision and Information Technology (CVIT 2023), 129840B (2024) https://doi.org/10.1117/12.3020864
Ice detection for supercooled large droplets (SLD) is a key technology for ensuring aircraft flight safety. With reference to the advisory circular and in combination with the flight scenarios, this paper studies visual cue technology for SLD using a numerical simulation method based on the Common Research Model. Unstructured grids are generated for the entire model, and FENSAP-ICE software is used to obtain the droplet impingement characteristics. A sensitivity analysis of the icing and flight parameters is carried out. The droplet impingement characteristics after encountering normal droplet and SLD conditions are analyzed. The reference component and its specific visual location for the visual cue are determined.
Proceedings Volume Fourth International Conference on Computer Vision and Information Technology (CVIT 2023), 129840C (2024) https://doi.org/10.1117/12.3021812
Touch is an important way for a human being to sense the surrounding environment, but in robotic applications it is difficult to obtain static and dynamic tactile signals simultaneously. In this paper, we propose a new vision-based tactile sensor that estimates pressure and slippage distance at the same time. The sensor recognizes the degree of deformation of the elastomer through image processing, and the pressure is estimated according to the radius of the contact region. The sensor captures the surface of the contact object and tracks feature points to calculate optical flow; the slippage distance is then estimated by Kalman filtering and integration of the optical flow. The sensor is realized in a small package, so it can be useful in a wide range of scenarios. We also built a two-dimensional experimental platform to test the sensor, and the experimental results show that the average error of pressure estimation is 6.4% and the average error of slippage estimation is 14.4% at 5 mm/s, demonstrating good sensing performance in providing pressure and slippage via a single contact surface.
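A simplified sketch of the slippage-estimation step follows: track feature points on the contact surface with Lucas-Kanade optical flow in OpenCV and integrate the mean displacement over frames. The Kalman smoothing and the pixel-to-millimeter calibration used in the paper are omitted; tracker parameters are illustrative.

```python
# Accumulate slippage (in pixels) by integrating sparse optical flow over a frame sequence.
import cv2
import numpy as np

def accumulate_slippage(frames: list[np.ndarray]) -> float:
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    pts = cv2.goodFeaturesToTrack(prev, maxCorners=50, qualityLevel=0.01, minDistance=5)
    if pts is None:
        return 0.0
    total = 0.0
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev, gray, pts, None)
        good_new, good_old = nxt[status.flatten() == 1], pts[status.flatten() == 1]
        if len(good_new):
            total += float(np.mean(np.linalg.norm(good_new - good_old, axis=-1)))
            pts = good_new.reshape(-1, 1, 2)
        prev = gray
    return total  # convert to millimeters via the sensor's calibration factor
```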