With the advancement of industrial automation, crewless operation has become a primary requirement for realizing the "dark factory". This paper introduces a video-understanding algorithm for detecting crane hoisting status in industrial settings. An I3D-ResNet network enhanced with optical flow features is employed to classify the hoisting status; this pipeline precisely captures and understands the dynamic changes during the hoisting process, significantly improving event recognition accuracy. Experimental results show that, compared with the I3D-ResNet network alone, the optical-flow-enhanced network recognizes hoisting status more accurately, providing robust support for achieving crewless production in industrial environments in the near future.
The granularity of ore particles is crucial for the efficiency and sustainability of steel smelting operations. This study introduces an innovative segmentation technique for ore particles that uses a depth camera to capture point cloud data on conveyor belts, which is then projected into a two-dimensional image for analysis. By employing a novel dual-encoder neural network, Swin-FUNet, for semantic segmentation, followed by morphological operations and concave point segmentation, the method significantly enhances the accuracy of particle segmentation. Comparative experiments confirm the effectiveness of this approach, offering potential improvements in material utilization, smelting efficiency, and equipment longevity for the steel industry.
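The post-processing stage splits touching particles in the segmentation mask. The paper's concave-point method is not specified in detail here, so the sketch below uses a simpler stand-in with the same intent, assuming SciPy: particles are separated by thresholding the distance transform to find interior "cores" and growing each core back out (the function `split_touching` and the threshold are hypothetical):

```python
import numpy as np
from scipy import ndimage as ndi

def split_touching(mask, min_dist=12):
    """Split touching particles in a binary mask.

    Pixels farther than min_dist from the background form per-particle
    seed cores; every foreground pixel is then assigned the label of its
    nearest core. This is a crude proxy for concave-point segmentation.
    """
    dist = ndi.distance_transform_edt(mask)
    cores, n = ndi.label(dist > min_dist)
    if n == 0:
        return cores, 0
    # indices of the nearest core pixel for every location in the image
    idx = ndi.distance_transform_edt(cores == 0, return_indices=True)[1]
    labels = cores[tuple(idx)] * mask
    return labels, n
```

On a mask of two overlapping disks, this yields two distinct labels where plain connected-component labeling would report one.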
Banks often need to collect and store customers' personal information due to the nature of their business. While this business model provides convenience to customers, it also increases the risk of personal information leakage. The widespread use of monitoring devices has led to rapid development in intelligent supervision technology. This research focuses on the actual work scenario at bank counters, aiming to intelligently recognize tellers' use of mobile phones. Considering the requirements for both accuracy and real-time performance, we propose a model based on YOLOv8 for the mobile phone detection task. To address the accuracy issues of the original model, the paper enhances learning on hard samples and introduces an attention mechanism. To tackle the real-time performance issues of the original model, the paper optimizes convolution structures and adopts channel pruning to increase inference speed. Comparative experiments before and after the improvements show that our model achieves a good balance between accuracy and real-time performance.
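Channel pruning for speed, as mentioned above, is commonly done by ranking a convolution layer's output channels by a magnitude criterion and discarding the weakest. A minimal sketch of L1-norm channel selection on a raw kernel tensor (the function name `prune_channels` and the L1 criterion are illustrative assumptions; the paper's exact pruning criterion is not stated in the abstract):

```python
import numpy as np

def prune_channels(weight, keep_ratio=0.5):
    """Rank conv output channels by L1 norm and keep the top fraction.

    weight: array of shape (out_ch, in_ch, kH, kW).
    Returns the pruned kernel and the kept channel indices; downstream
    layers must be sliced with the same indices to stay consistent.
    """
    norms = np.abs(weight).reshape(weight.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(weight.shape[0] * keep_ratio))
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])
    return weight[keep], keep
```

After pruning, the network is typically fine-tuned briefly to recover any accuracy lost with the removed channels.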
Location identification is a research hotspot in computer vision. For scenes containing buildings, the location recognition method in this paper can accurately detect the building and identify the location. Specifically, a local feature extraction method is first selected to obtain stable local features under varying conditions. Second, image features are encoded effectively: a scene codebook is built, an image index is established, and image similarity is compared, enabling fast, large-scale image retrieval. Then, to filter out erroneous matches and choose the best matching result, a matching algorithm based on local spatial consistency is proposed. A shape-model voting method with low computational cost is proposed to locate the building in the scene image. Experimental results show that the method identifies the building's location more accurately, and the building images remain robust and distinguishable under transformations.
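The codebook-and-index step described above follows the standard bag-of-visual-words pattern: local descriptors are quantized to their nearest codeword and each image becomes a normalized histogram that can be compared cheaply. A minimal sketch under that assumption (the function names `encode` and `similarity` are illustrative; the paper's actual encoding may differ):

```python
import numpy as np

def encode(descriptors, codebook):
    """Quantize local descriptors to nearest codewords and return an
    L2-normalized bag-of-words histogram (one bin per codeword)."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / (np.linalg.norm(hist) + 1e-12)

def similarity(h1, h2):
    """Cosine similarity of two normalized histograms."""
    return float(h1 @ h2)
```

Large-scale retrieval then reduces to comparing a query histogram against an inverted index of database histograms, rather than matching raw descriptors.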
Ground vehicle detection on airborne platforms is becoming very important for intelligent visual surveillance applications. Object detection with cascade-structured classifiers has grown rapidly over the past decade and is very successful in real-time applications. However, most such detectors apply a sliding window to multi-scaled images, which is computationally expensive and therefore suitable only for simple features. In this paper, a biologically inspired object detection algorithm is proposed that combines image-patch-based feature learning with visual saliency detection. The patch-based local features are learnt by unsupervised learning to generate an object-category-specific visual dictionary. Visual saliency detection is then performed to extract candidate object regions from the whole image using the learnt local features. Instead of sliding a window, a candidate region is sent to the object classifier only when its features are salient within the whole image. Since the number of candidate regions decreases dramatically, much more complex features can be used to represent object images, increasing the descriptive capability of the learnt features. Experimental results on practical vehicle image datasets indicate that low computational expense and good detection performance can be achieved.
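The key efficiency idea above is a saliency gate: only regions whose features stand out from the whole image reach the expensive classifier. A toy sketch of such a gate, scoring each candidate by its feature's deviation from the image-wide mean (the function `salient_regions` and the deviation score are hypothetical simplifications of the paper's saliency measure):

```python
import numpy as np

def salient_regions(region_feats, boxes, top_k=5):
    """Keep only the top_k candidate regions by saliency.

    region_feats: (N, D) array, one feature vector per candidate box.
    Saliency is approximated as the distance of a region's feature from
    the mean feature over the image; non-salient boxes are discarded
    before they ever reach the object classifier.
    """
    mean = region_feats.mean(axis=0)
    scores = np.linalg.norm(region_feats - mean, axis=1)
    order = np.argsort(scores)[::-1][:top_k]
    return [boxes[i] for i in order], scores[order]
```

Compared with an exhaustive sliding window, the classifier is invoked only `top_k` times per image, which is what frees the budget for richer features.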
To tackle occlusions, a hierarchical part matching method based on a layered appearance model for object tracking is presented in this paper, which integrates global and partial region matching to search for the target object in a coarse-to-fine manner. To reduce the ambiguity of object localization, only the discriminative parts, selected by their cornerness measure, are used for similarity computation. The similarity between parts is computed in a layer-wise manner, from which the state of occlusion can be inferred correctly. When partially occluded, the object can still be localized accurately; when completely occluded, historical motion information is used to predict its position with a Kalman filter. The proposed tracking method is tested on practical video sequences, and the experimental results show that it consistently provides accurate object positions for stable tracking, even under severe occlusions.
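The full-occlusion fallback described above is a standard constant-velocity Kalman filter: during occlusion only the predict step runs, extrapolating the last estimated motion. A self-contained sketch, assuming a (x, y, vx, vy) state (class name and noise values are illustrative, not the paper's):

```python
import numpy as np

class ConstVelocityKF:
    """Constant-velocity Kalman filter over state (x, y, vx, vy).

    update() fuses a position measurement when the object is visible;
    predict() alone extrapolates the position during full occlusion.
    """
    def __init__(self, x, y, dt=1.0):
        self.s = np.array([x, y, 0.0, 0.0])
        self.P = np.eye(4) * 10.0                       # state covariance
        self.F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                           [0, 0, 1, 0], [0, 0, 0, 1]], float)
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)
        self.Q = np.eye(4) * 0.01                       # process noise
        self.R = np.eye(2) * 1.0                        # measurement noise

    def predict(self):
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]

    def update(self, z):
        y = np.asarray(z, float) - self.H @ self.s      # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)        # Kalman gain
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

After a few visible frames the velocity estimate settles, so repeated predict() calls carry the track smoothly through an occlusion until the object reappears and update() resumes.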
One of the major challenges of object tracking is handling appearance variations, possibly caused by changes in object posture, size, and occlusion. In this paper an adaptive tracking system is presented that efficiently integrates online semi-supervised classification and a particle filter. To identify object pixels against the background accurately, classifiers are trained online using real AdaBoost, which performs much better than its discrete version. In the system, two uncorrelated features, color and texture, are adopted to train two classifiers separately; the classifiers, fused by voting, generate a confidence score for each pixel in the candidate regions measuring whether it belongs to the object or the background; accumulated scores in each region are fed to the particle filter to estimate the object state; pixels with high scores mutually augment the training sets, and the classifiers are further updated by co-training. The system is applied to vehicle and pedestrian tracking in real-world scenarios, and the experimental results show its robustness to large appearance variations and severe occlusions.
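The fusion and co-training exchange described above can be sketched in a few lines: the two per-pixel score maps are averaged (voting), and pixels that one classifier is confident about become new training samples for the other. This is a hypothetical simplification (function name, averaging rule, and threshold are illustrative) of the system's actual update scheme:

```python
import numpy as np

def cotrain_round(scores_color, scores_texture, thresh=0.8):
    """One co-training exchange between two per-pixel classifiers.

    scores_*: (H, W) arrays of object-confidence scores in [0, 1].
    Returns the fused score map plus the pixel coordinates each
    classifier should add to the OTHER classifier's training set.
    """
    fused = 0.5 * (scores_color + scores_texture)       # simple voting
    # pixels the color classifier is sure of teach the texture classifier
    new_for_texture = np.argwhere(scores_color > thresh)
    # and vice versa
    new_for_color = np.argwhere(scores_texture > thresh)
    return fused, new_for_color, new_for_texture
```

The fused map is what would be accumulated per candidate region and handed to the particle filter, while the exchanged pixel sets drive the next online AdaBoost update.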