1 Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB (Germany)
2 Karlsruher Institut für Technologie (Germany)
3 Karlsruher Institut für Technologie, Institute of Industrial Information Technology (Germany)
This PDF file contains the front matter associated with SPIE Proceedings Volume 12623, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Thermographic imaging is applied to measure the shear flow at a wind-driven water surface, an essential parameter for understanding the exchange of momentum, heat, and mass between the atmosphere and the oceans. A thin line, less than 1 mm wide and oriented perpendicular to the wind direction, is heated with a penetration depth matched to the thickness of the shear layer at the water surface. With pulsed irradiation the shear can be estimated, while continuous irradiation is suitable for measuring the orbital velocities of the wind waves. Motion fields and shear are computed by a generalized optical flow approach.
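A minimal sketch of the motion-field step, with OpenCV's Farnebäck flow standing in for the paper's generalized optical flow formulation (file names and frame rate are assumptions):

```python
import cv2
import numpy as np

# Two consecutive thermographic frames (assumed 8-bit grayscale exports).
prev = cv2.imread("thermo_000.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("thermo_001.png", cv2.IMREAD_GRAYSCALE)

# Dense motion field (u, v) in pixels per frame.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)
u = flow[..., 0]                      # along-wind velocity component

# Shear proxy: gradient of the along-wind velocity across the heated line.
# The pixel scale cancels, so multiplying by the frame rate yields units of 1/s.
fps = 100.0                           # assumed frame rate
du_dy = np.gradient(u, axis=0) * fps
```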
A new approach is described to image near-surface water-side concentration fields at the air-water interface and wind waves at the Heidelberg Aeolotron wind-wave tank, in order to study the transport mechanisms of air-sea gas exchange. The concentration fields are made visible by fluorescence imaging with 1-propylamine and pyranine, excited by a 450 nm laser diode array. Light field imaging with seven cameras also retrieves the 3-D shape of the water surface. An additional laser line from 410 nm laser diodes is used to measure wind wave height directly and for precise camera alignment.
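As an illustration of the laser-line height measurement, a simple triangulation sketch under assumed geometry (oblique viewing angle, known pixel scale; all names are illustrative):

```python
import numpy as np

def wave_height_from_line(row_px, row_ref_px, px_per_mm, view_angle_deg):
    """Convert the imaged laser-line displacement into surface elevation.

    Assumes plain triangulation: a camera views the vertical laser light
    sheet at an oblique angle theta, so an elevation h shifts the line in
    the image by dp = h * px_per_mm * tan(theta). Names are illustrative.
    """
    dp = row_px - row_ref_px                      # displacement vs. flat water
    return dp / (px_per_mm * np.tan(np.radians(view_angle_deg)))   # mm
```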
Fog dramatically reduces the overall visibility of a scene, critically affecting features such as object illumination, contrast, and contours. The decrease in visibility compromises the performance of computer vision algorithms such as pattern recognition and segmentation, some of them highly relevant to decision-making with the rise of autonomous vehicles. Many dehazing methods have been proposed. However, to the best of our knowledge, all currently used metrics either compare the defogged image to its ground truth, usually the same scene on a non-foggy day, or estimate physical parameters from the scene. This hinders progress in the field: obtaining proper ground truth images is not always possible, and estimating physical parameters is costly and time-consuming because they depend strongly on the scene conditions. This work tackles this issue by proposing a real-time defogging network that takes only an RGB image of the fogged scene as input, performs the defogging, and uses a contour-based metric for Single Image Defogging evaluation even when ground truth is not available, which is the most common situation. The proposed metric requires only the original hazy image and the image after the defogging procedure. We trained our network using a novel two-stage pipeline on the DENSE dataset and compared our method and metric against currently used metrics and other defogging techniques on the NTIRE 2018 defogging challenge to prove their effectiveness.
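The paper's contour-based metric is not reproduced here; as a hedged stand-in, a reference-free score of the same flavor can compare edge maps before and after defogging:

```python
import cv2

def contour_gain(hazy_bgr, defogged_bgr, t1=50, t2=150):
    """Reference-free contour score (illustrative stand-in for the paper's
    metric): ratio of edge pixels visible after defogging to edge pixels
    visible before, computed on Canny edge maps of the two images."""
    e0 = cv2.Canny(cv2.cvtColor(hazy_bgr, cv2.COLOR_BGR2GRAY), t1, t2)
    e1 = cv2.Canny(cv2.cvtColor(defogged_bgr, cv2.COLOR_BGR2GRAY), t1, t2)
    return (e1 > 0).sum() / max((e0 > 0).sum(), 1)
```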
Rising vaccine production volumes and the complex visual characteristics of freeze-dried products have created a critical need for accurate, high-speed automated quality control. Current inspection procedures, which rely on human vision or line cameras, have undesirable error rates. We propose a novel use of polarimetric imaging for defect capture and compare its performance to RGB imaging for defect detection on vaccine vials with freeze-dried product. Vaccine vials with artificial defects (scratches and fibers) and without defects but with product appearance variations (streaks) are prepared. We capture a data set of RGB images and polarimetric images: Polarization Intensity (PI), Degree of Linear Polarization (DoLP), and Angle of Polarization (AoP). We find that the differences between product variation and defects in RGB images are not statistically significant at α = 0.01 (t(8) = 2.088 for scratch vs. streak, t(8) = 2.789 for fiber vs. streak). In contrast, the differences between product variation and defects in polarimetric imaging are statistically significant for all polarization characteristics at α = 0.01 (PI: t(8) = 39.753 for scratch vs. streak, t(8) = 13.039 for fiber vs. streak; DoLP: t(8) = 16.537 for scratch vs. streak, t(8) = 17.018 for fiber vs. streak; AoP: t(8) = 6.764 for scratch vs. streak, t(8) = 4.702 for fiber vs. streak). This indicates that polarimetric imaging may be a more effective technique than RGB imaging for defect detection.
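For reference, DoLP and AoP follow from the linear Stokes parameters measured at four polarizer orientations; this standard formulation is an assumption about the paper's processing chain:

```python
import numpy as np

def polarimetric_images(i0, i45, i90, i135):
    """DoLP and AoP from four polarizer orientations (0/45/90/135 deg)
    via the linear Stokes parameters."""
    i0, i45, i90, i135 = (np.asarray(x, dtype=float) for x in (i0, i45, i90, i135))
    s0 = 0.5 * (i0 + i45 + i90 + i135)      # total intensity (PI)
    s1 = i0 - i90
    s2 = i45 - i135
    dolp = np.sqrt(s1**2 + s2**2) / np.maximum(s0, 1e-6)
    aop = 0.5 * np.arctan2(s2, s1)          # radians
    return s0, dolp, aop
```

With five vials per group, the reported t(8) statistics are consistent with two-sample t-tests (e.g., scipy.stats.ttest_ind).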
Learning models from synthetic image data rendered from 3D models and applying them to real-world applications can reduce costs and improve performance when using deep learning for image processing in automated visual inspection tasks. However, sufficient generalization from synthetic to real-world data is challenging, because synthetic samples only approximate the inherent structure of real-world images and lack image properties present in real-world data, a phenomenon known as the domain gap. In this work, we propose to combine synthetic generation approaches with CycleGAN, a style transfer method based on Generative Adversarial Networks (GANs). CycleGAN learns the inherent structure of real-world samples and adapts the synthetic data accordingly. We investigate how synthetic data can be adapted for a use case of visual inspection of automotive cast iron parts and show that supervised deep object detectors trained on the adapted data successfully generalize to real-world data and outperform object detectors trained on synthetic data alone. This demonstrates that generative domain adaptation helps to leverage synthetic data in deep-learning-assisted systems for automated visual inspection.
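One direction of the CycleGAN objective, sketched in PyTorch (the full objective is symmetric across the two domains and may add an identity term; network architectures are assumed, not given by the abstract):

```python
import torch
import torch.nn.functional as F

def cyclegan_forward_loss(G, F_back, D_real, syn, lam=10.0):
    """G styles synthetic renders as real images, F_back maps them back,
    D_real judges realism of the styled images."""
    fake = G(syn)                                     # synthetic -> real style
    score = D_real(fake)
    adv = F.mse_loss(score, torch.ones_like(score))   # least-squares GAN term
    cyc = F.l1_loss(F_back(fake), syn)                # cycle-consistency term
    return adv + lam * cyc
```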
In the field of automatic defect detection, one of the major challenges for training accurate classifiers with supervised learning is the insufficient size and limited diversity of datasets. Obtaining an adequate amount of image data depicting defective surfaces in an industrial setting is costly and time-consuming. Furthermore, the collected dataset may suffer from selection bias, resulting in underrepresentation of certain defect classes. This research tackles surface defect detection on titanium metal spacer rings by introducing a novel approach that leverages a digital twin framework. The behavior of the digital representation is optimized using a reinforcement learning algorithm. The optimized digital twin is then used to generate synthetic data, which is employed to train a spacer defect detection classifier. The performance of this classifier is evaluated on real-world data. The results show that the model trained with synthetic data outperforms the one trained on a limited amount of real data. This work emphasizes the potential of digital twin-based synthetic data generation and reinforcement learning optimization for enhancing spacer surface defect detection and addressing the data scarcity challenge in the field. When the generated synthetic data and the real data are combined to train the inspection network, background accuracy reaches 93.07% and defect detection accuracy reaches 94.2%, surpassing the defect detection performance of the inspection network trained on real data alone.
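The abstract does not specify the reinforcement learning algorithm; as a loose illustration of the optimization loop only, here is a greedy black-box search over digital-twin parameters with classifier performance on real data as the reward. All three helpers are hypothetical stand-ins, stubbed so the sketch runs:

```python
import numpy as np

# Hypothetical stand-ins for unpublished components: the digital-twin
# renderer, classifier training, and evaluation on real spacer images.
def render_dataset(params):  return params                      # stub
def train_classifier(data):  return data                        # stub
def evaluate_on_real(model): return -sum((v - 1.0) ** 2 for v in model.values())

params = {"light": 1.3, "scratch_depth": 0.4, "noise": 0.2}     # twin parameters
best = -np.inf
rng = np.random.default_rng(0)
for step in range(200):
    trial = {k: v * float(rng.lognormal(0.0, 0.1)) for k, v in params.items()}
    reward = evaluate_on_real(train_classifier(render_dataset(trial)))
    if reward > best:                        # greedy improvement of the twin
        best, params = reward, trial
```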
This study presents a method to generate synthetic microscopic surface images by adapting the pre-trained latent diffusion model Stable Diffusion and the pre-trained text encoder OpenCLIP-ViT/H. A confocal laser scanning microscope was used to acquire the dataset for transfer learning. The measured samples include metallic surfaces processed with different abrasive machining methods such as grinding, polishing, or honing. The network is trained to generate microtopographies for these machining methods, for different materials (for example, aluminum, PVC, and steel) and roughness values (for example, milling with Ra = 0.4 to Ra = 12.5). The performance of the network is evaluated through visual inspection and the objective image quality measures Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), and Fréchet Inception Distance (FID). The results demonstrate that the proposed method can generate realistic microtopographies, albeit with some limitations. These limitations may stem from the fact that the original training data of the Stable Diffusion network consists mostly of images from the Internet, which often show people or landscapes. It was also found that the lack of post-processing of the synthetic images may lead to reduced perceived sharpness and less finely detailed structures. Nevertheless, the performance of the model demonstrates a promising and effective approach to surface metrology, contributing to fields such as materials science and surface engineering.
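PSNR and SSIM for a generated-versus-measured image pair can be computed with scikit-image; FID is a set-level metric computed with an Inception network (e.g., via the pytorch-fid package) and is omitted here. The data below are stand-ins:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Stand-ins for a measured microtopography and a generated counterpart.
ref = np.random.rand(256, 256)
gen = ref + 0.05 * np.random.randn(256, 256)

psnr = peak_signal_noise_ratio(ref, gen, data_range=1.0)   # dB
ssim = structural_similarity(ref, gen, data_range=1.0)     # in [-1, 1]
```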
The automation of inspection processes in aircraft engines comprises challenging computer vision tasks. In particular, the inspection of coating damages in confined spaces with hand-held endoscopes is based on image data acquired under dynamic operating conditions (illumination, position and orientation of the sensor, etc.). In this study, 2D RGB video data is processed to quantify damages in large coating areas. To this end, the video frames are pre-processed by feature tracking and stitching algorithms to generate high-resolution overview images. For the subsequent analysis of the whole coating area, and to overcome the challenges posed by the diverse image data, Convolutional Neural Networks (CNNs) are applied. A preliminary study found that the image analysis benefits from being executed on different scales: one CNN is applied to small image patches without down-scaling, while a second CNN is applied to larger, down-scaled image patches. This multi-scale approach raises the challenge of combining the predictions of both networks. Therefore, this study presents a novel method that increases segmentation accuracy by interpreting the network results to derive a final segmentation mask. This ensemble method consists of a CNN applied to the predictions for the given patches from the overview images. The evaluation of this method comprises different pre-processing techniques for the logit outputs of the preceding networks as well as additional information such as RGB image data. Furthermore, different network structures are evaluated, including custom structures specifically designed for this task. Finally, these approaches are compared against state-of-the-art network structures.
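A minimal sketch of the ensemble idea: a small CNN consumes the two preceding networks' per-pixel logits (upsampled to a common resolution), optionally stacked with the RGB patch, and outputs the fused segmentation logit. Channel counts and depth are illustrative assumptions:

```python
import torch
import torch.nn as nn

class LogitFusionCNN(nn.Module):
    """Fuses fine-scale and coarse-scale segmentation logits plus RGB."""
    def __init__(self, in_ch=2 + 3):                 # 2 logit maps + RGB
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),                     # fused damage logit
        )

    def forward(self, logit_fine, logit_coarse, rgb):
        x = torch.cat([logit_fine, logit_coarse, rgb], dim=1)
        return self.net(x)
```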
Deep learning techniques are commonly used to tackle various computer vision problems, including recognition, segmentation, and classification from RGB images. With a diverse range of sensors available, industry-specific datasets are acquired to address specific challenges. These collected datasets have varied modalities, meaning that the images possess distinct channel counts and pixel values with different interpretations. Applying deep learning methods to attain optimal outcomes on such multimodal data is a complicated procedure. To enhance the performance of classification tasks in this scenario, one feasible approach is to employ a data fusion technique. Data fusion aims to use all the available information from all sensors and integrate it to obtain an optimal outcome. This paper investigates early fusion, intermediate fusion, and late fusion in deep learning models for bulky waste image classification. A multimodal dataset is used for training and evaluation of the models, consisting of RGB, hyperspectral near-infrared (NIR), thermography, and terahertz images of bulky waste. The results show that multimodal sensor fusion can enhance classification accuracy compared to a single-sensor approach on the used dataset. Late fusion performed best on our test data, with an accuracy of 0.921, compared to intermediate and early fusion.
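A late-fusion sketch in PyTorch, averaging per-modality class probabilities; early fusion would instead concatenate all modality channels before a single network. The backbones (one classifier per modality, each mapping its tensor to class logits) are assumptions:

```python
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    """Late fusion: one classifier per modality, averaged probabilities."""
    def __init__(self, backbones):
        super().__init__()
        self.backbones = nn.ModuleList(backbones)   # one per modality

    def forward(self, inputs):                      # one tensor per modality
        probs = [b(x).softmax(dim=1) for b, x in zip(self.backbones, inputs)]
        return torch.stack(probs).mean(dim=0)       # fused class probabilities
```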
The majority of industrial production processes can be divided into a series of object manipulation and handling tasks that can be adapted for robots. Through significant advances in compliant grasping, sensing, and actuation technologies, robots are now capable of carrying out human-like flexible and dexterous object manipulation tasks. During operation, robots are required to position objects within the tolerances specified for every operation in an industrial process. The ability of a robot to meet these tolerances is the critical factor that determines where the robot can be integrated and how proficiently it can carry out high-precision tasks. Therefore, improving the positioning accuracy of robots can open new avenues for their integration into production industries. Given that tolerances in manufacturing processes are on the order of tens of micrometres or less, robots must guarantee high positioning accuracy when manipulating objects. The direct method of ensuring high accuracy is to introduce additional measurement systems that improve upon the inherent joint-angle-based robot position determination. In this paper, we present a High-Accuracy Robotic Pose Measurement (HARPM) system based on coordinate measurements from a multi-camera vision system. We also discuss the integration of measurements obtained by absolute distance interferometry and how the interferometric measurements can complement the vision system measurements. The performance of the HARPM system is evaluated using a laser interferometer to investigate robotic positions along a trajectory. The results show that the HARPM system can improve the positioning accuracy of robots from hundreds to a few tens of micrometres.
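A sketch of the vision-side measurement: triangulating a marker on the robot from two calibrated views. The projection matrices and normalized image points are illustrative, not the paper's calibration:

```python
import cv2
import numpy as np

# Two calibrated views; coordinates are normalized (intrinsics applied).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])                  # camera 1
P2 = np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])  # 0.2 m baseline
pt1 = np.array([[0.000], [0.000]])          # marker in view 1
pt2 = np.array([[-0.100], [0.000]])         # marker in view 2

X_h = cv2.triangulatePoints(P1, P2, pt1, pt2)   # homogeneous 4x1
X = (X_h[:3] / X_h[3]).ravel()                  # -> approx. (0, 0, 2) m
```

Here X evaluates to roughly (0, 0, 2) m, matching the constructed geometry; the real system fuses many such measurements, plus interferometric distances, into a pose estimate.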
Differential perspective is a simple and cost-effective monocular distance measurement technique that works by taking two images from two different (axially separated) locations. The two images are then analysed using image processing to obtain the change in size of different objects within the scene. Based on this information, the distances to the objects can be easily computed. We use this principle to realize a sensor for assisted driving in which the camera takes two images separated by 0.32 seconds. Distances to objects (e.g. number plates, traffic signs) of up to 200 meters can be measured with satisfactory accuracy. In the presentation we explain the basic principle and the employed image processing.
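Under the pinhole model the apparent size scales inversely with distance, so two axially separated shots determine the distance directly; a worked sketch with illustrative numbers (not the paper's):

```python
# Pinhole model: image size h = f*s/d, so for two shots a baseline delta
# apart along the optical axis, h1/h2 = d2/d1 with d1 = d2 + delta, giving
# d1 = delta * h2 / (h2 - h1).
def distance_from_size_change(h1, h2, delta_m):
    """h1: object size (px) in the first (farther) image,
    h2: size (px) after moving delta_m toward the object."""
    return delta_m * h2 / (h2 - h1)

# Example: at 25 m/s and 0.32 s between frames, delta = 8 m. A number
# plate growing from 40.0 px to 41.7 px is then ~196 m away.
print(distance_from_size_change(40.0, 41.7, 25.0 * 0.32))
```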
We present an extension of our automatic anomaly detection approach for the quality inspection of industrially manufactured parts. The sample under test is imaged from different perspectives simultaneously while in free fall, reducing inspection time and minimizing part handling. Despite using a diffusely reflecting hollow sphere to achieve the best possible conditions for all camera perspectives, small artifacts from reflections on highly reflective test specimens and drop shadows appear in the images. These artifacts lead to type I errors (false positives). To address this issue, the state-of-the-art anomaly detection method PatchCore¹ is first modified to handle multiple perspectives. Second, a weighting step is added to the image evaluation pipeline: the pose of the test sample is estimated and subsequently used to calculate a weight matrix per image. The weights correspond to the local viewing angle of the camera on the sample's surface, because the artifacts occur mainly at steep viewing angles. In addition, two datasets containing sample data with single and multiple perspectives are created to evaluate the proposed approach. The results show that the developed pipeline outperforms PatchCore and the original free-fall inspection setup algorithm, reaching 95.9% AUROC for Object one and 85.7% AUROC for Object two on the multi-perspective validation data. Moreover, combining the proposed approach with the free-fall inspection algorithm improves the results for Object two, achieving 98% AUROC. The conducted experiments allow us to conclude that this approach has the potential to further increase robustness toward various anomalies and artifacts.
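A sketch of the weighting step under our reading of the abstract: per-pixel anomaly scores are attenuated where the estimated pose implies steep viewing angles. The exponent and the per-pixel inputs are assumptions:

```python
import numpy as np

def weighted_anomaly_map(scores, normals, view_dirs, power=2.0):
    """Down-weight anomaly scores at steep viewing angles, where reflection
    and drop-shadow artifacts dominate. normals and view_dirs are per-pixel
    unit vectors derived from the estimated sample pose."""
    cos_theta = np.clip(np.sum(normals * view_dirs, axis=-1), 0.0, 1.0)
    weights = cos_theta ** power        # ~1 head-on, -> 0 at grazing angles
    return scores * weights
```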
Vegetation on traffic routes is not only an aesthetic problem. On railways and roads, it poses a safety risk by reducing the elasticity of track beds or damaging road surfaces. Complex weed management is therefore indispensable. It is currently carried out mainly through extensive use of herbicides or manual removal, which pollutes the environment and incurs high costs. These negative impacts can be mitigated by automated vegetation detection, which allows efficient, targeted treatment and preventive long-term monitoring. A reliable way to achieve this is to exploit the characteristic spectral fingerprint of vegetation: chlorophyll shows high reflectivity in the green and infrared spectral regions while strongly absorbing red and blue light. We present such a visual monitoring system, comprising multiple cameras and an active illumination, which is employed on railroads. The individual cameras address different spectral regions and are superimposed through position-synchronized triggering to obtain a multi-spectral image. Multi-pixel binning greatly extends the dynamic range of the cameras and, in combination with active illumination and high-speed dark-frame recording, allows operation during day and night without degradation from ambient light conditions. The system achieves about 5 mm effective resolution and can operate at speeds of up to 100 km/h. This is made possible by embedded real-time pre-processing and data reduction in the camera units. The resulting processing delay of less than 100 ms allows targeted actuation of weed-treatment methods (e.g., spray nozzles) during movement. In combination with GNSS sensors, geo-referenced documentation of the coverage rate is possible.
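The spectral fingerprint can be condensed into a vegetation index such as the classic NDVI; whether the system uses this exact index is an assumption on our part, and the bands below are stand-ins for the registered camera channels:

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Normalized Difference Vegetation Index: high where chlorophyll
    reflects NIR strongly and absorbs red light."""
    nir, red = nir.astype(float), red.astype(float)
    return (nir - red) / (nir + red + eps)

nir_band = np.random.rand(480, 640)                # stand-in channels
red_band = np.random.rand(480, 640)
vegetation_mask = ndvi(nir_band, red_band) > 0.3   # threshold is illustrative
```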
In this paper, we introduce an interactive multimodal vision-based robot teaching method. A multimodal 3D image (color (RGB), thermal (T), and point cloud (3D)) is used to capture the temperature, texture, and geometry information required to analyze human action. With our method, one only needs to move a finger across an object's surface; the heat trace left by the finger is recorded by the multimodal 3D sensor. By dynamically analyzing the multimodal point cloud, the accurate finger trace on the object is recognized, and a robot trajectory is computed from it.
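A minimal sketch of the trace extraction under assumed thresholds: points whose temperature rose against a pre-touch reference frame are kept as candidates for the finger path:

```python
import numpy as np

def extract_trace(points_xyz, temp_now, temp_ref, dt_kelvin=1.5):
    """Select point-cloud points warmed by the finger.

    points_xyz: (N, 3) positions; temp_now/temp_ref: (N,) temperatures of
    the current and pre-touch frames. Threshold is an illustrative value."""
    warm = (temp_now - temp_ref) > dt_kelvin   # residual heat from the finger
    return points_xyz[warm]                    # candidate trace points (M, 3)
```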
In this paper, we propose a new optical system to simultaneously measure the pose, position, and surface normals of a target at high speed. In the computer vision field, measuring the motion of a target is a fundamental task, and high-speed motion measurement systems have been researched for real-time applications. In most cases, the target is idealized as a rigid body and its motion is described by a six-DoF pose and position. However, general objects are not entirely rigid, and the non-rigid motion component is significant in some applications. We focus on dynamic projection mapping as an application example and propose a system that integrates rigid and non-rigid motion sensing. To measure the non-rigid motion, non-rigid registration between 3D point clouds is often used, but it incurs a high computational cost, especially in three-dimensional nearest-neighbor searches. Instead, we optically measure the surface normal vectors of the target from 2D images as the non-rigid motion component. To achieve high-speed, interference-free measurement, we introduce a three-band infrared optical system and its lighting setup, and we evaluate the coupling efficiency by demonstrating the simultaneous measurement of pose, position, and surface normals.
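Our reading of the three-band idea resembles spectrally multiplexed photometric stereo; a Lambertian sketch with assumed lighting directions (the paper's actual optics may differ):

```python
import numpy as np

# Three spectrally separated infrared sources illuminate the target from
# known unit directions; per-pixel intensities in the three bands then give
# the normal, up to albedo, via I = L @ n.
L = np.array([[0.0, 0.0, 1.0],        # assumed unit lighting directions,
              [0.8, 0.0, 0.6],        # one per infrared band
              [0.0, 0.8, 0.6]])
L_inv = np.linalg.inv(L)

def normals_from_bands(i1, i2, i3):
    I = np.stack([i1, i2, i3], axis=-1).astype(float)   # (H, W, 3)
    g = I @ L_inv.T                                     # albedo-scaled normals
    return g / np.maximum(np.linalg.norm(g, axis=-1, keepdims=True), 1e-6)
```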
Display and semiconductor manufacturing require inspection and repair process steps to increase the final product yield. To this end, images of displays and semiconductors taken with an optical camera must be classified into normal and defective. This is a simple binary classification problem, but the repair process requires a more fine-grained classification. To automate this with deep learning, enough training data must be collected for each class. However, certain defect classes occur too rarely for the deep learning model to obtain sufficient training data. This greatly delays the deployment of the classification algorithm in the field, which adversely affects product mass production. In this paper, images of sparse defect classes are synthesized with a deep learning method, contributing to improving the performance of the final classification model. In addition, experiments confirm that the artificially created images have the same shape and characteristics as real images of the same class.
To simultaneously reduce misses and over-detection in visual inspection for defect detection, the asymmetric detection techniques practiced by skilled workers proved helpful. A CNN-based asymmetric label smoothing method was developed to implement these techniques in a visual inspection.
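A hedged sketch of what asymmetric label smoothing can look like for binary inspection (the paper's exact scheme is not reproduced here): defect labels are smoothed less than OK labels, so the network stays conservative about missing defects while tolerating some over-detection on OK parts:

```python
import torch
import torch.nn.functional as F

def asymmetric_smooth_bce(logits, targets, eps_ok=0.2, eps_def=0.05):
    """targets: float tensor, 1.0 = defect, 0.0 = OK. Smoothing amounts
    are illustrative; the asymmetry (eps_ok > eps_def) is the point."""
    smoothed = torch.where(targets > 0.5,
                           targets * (1 - eps_def),        # 1.0 -> 0.95
                           targets + eps_ok * 0.5)         # 0.0 -> 0.10
    return F.binary_cross_entropy_with_logits(logits, smoothed)
```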