KEYWORDS: Visualization, Principal component analysis, Diagnostics, Visual process modeling, Visual analytics, Alzheimer's disease, Machine learning, Data processing, Data modeling
Visual attention and its modeling have received growing interest over the past decades. Visual attention models have been used for several years in various fields, such as the automotive industry, robotics, and diagnostic medicine. So far, research has focused mainly on generalizing the collected data, while the identification of unique features in the visual attention of individuals remains an open research topic. The aim of this paper is to propose a methodology able to cluster people into groups based on individualities in their visual attention patterns. Unlike former approaches focused on the classification problem, where the class of each subject must be known, we focus our work on the open research problem of unsupervised machine learning based solely on the measured data about subjects' visual attention. Our methodology is based on a clustering method that uses individual feature vectors created from measured visual attention data. The proposed feature vectors, which form a fingerprint of an individual's attention, are based on the directions of the individual's saccades. The methodology is designed to work with a limited set of measured eye-tracking data without any additional information.
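As a concrete illustration of such a saccade-direction fingerprint, the following minimal sketch builds a per-subject feature vector from consecutive fixation coordinates. The bin count and normalization are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

def saccade_direction_fingerprint(fixations, n_bins=8):
    """Build a per-subject feature vector from saccade directions.

    fixations: (N, 2) array of consecutive fixation coordinates (x, y).
    Returns a normalized histogram of saccade angles (the 'fingerprint').
    The 8-bin resolution is an illustrative choice, not the paper's.
    """
    vectors = np.diff(fixations, axis=0)               # saccades between fixations
    angles = np.arctan2(vectors[:, 1], vectors[:, 0])  # direction in [-pi, pi]
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)                   # normalize to unit mass

# Subjects can then be clustered on these fingerprints, e.g. with k-means.
```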
Breast cancer is one of the most widespread causes of women's death worldwide. Successful treatment can be achieved only by early and accurate tumor diagnosis. The main method of tissue diagnosis, performed on biopsy samples, is based on the observation of significant tissue structures. We propose a novel approach to classifying microscopy tissue images into four main cancer classes (normal, benign, in situ, and invasive). Our method is based on comparing a new tissue sample with examples previously annotated by specialists and compiled into a collection of labeled samples, and determining their similarity. The most probable class is determined statistically by comparing the new sample with several annotated samples. A common problem of medical datasets is the small number of training images. We therefore applied suitable dataset augmentation techniques, exploiting the fact that flipping or mirroring a sample does not change the information about the diagnosis. A further contribution is that we show the histopathologist why the algorithm classified the tissue into a particular cancer class by ordering the collection of correctly annotated samples by their similarity to the input sample. Histopathologists can then focus on searching for the key structures corresponding to the predicted classes.
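The similarity-based voting and the flip/mirror augmentation could look roughly as sketched below. The feature extractor, distance measure, and vote size k are stand-ins for the paper's unspecified choices; the returned ranking is what would be shown to the histopathologist.

```python
import numpy as np

def augment(sample):
    # Flips and mirroring do not change the diagnosis, so each annotated
    # sample can be expanded into four equivalent training images.
    return [sample, np.fliplr(sample), np.flipud(sample),
            np.fliplr(np.flipud(sample))]

def classify_by_similarity(features, query, collection, labels, k=5):
    # Rank annotated samples by similarity to the query; the same ranking
    # serves as the explanation presented to the histopathologist.
    q = features(query)
    dists = [np.linalg.norm(q - features(s)) for s in collection]
    order = np.argsort(dists)                  # most similar first
    votes = [labels[i] for i in order[:k]]
    return max(set(votes), key=votes.count), order
```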
In this work, we deal with the problem of brain tumor segmentation from magnetic resonance imaging (MRI), a task that is costly and time-consuming when carried out manually. For this specific and complex domain problem, convolutional networks have proved competent, performing significantly better than standard segmentation approaches. Within our research, we therefore propose an approach to tumor segmentation. We develop multiple architectures, training regimes, and evaluation metrics in order to facilitate reliable and automatic delineation of tumorous tissue. For this purpose, we propose a novel adaptation of the Tversky index loss formula to counter label imbalance.
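For reference, the standard Tversky loss that the adaptation builds on is sketched below; the paper's own modification differs in detail. Weighting the false-negative term more heavily (alpha > beta) counters the scarcity of tumor voxels relative to background.

```python
import torch

def tversky_loss(pred, target, alpha=0.7, beta=0.3, eps=1e-6):
    """Standard Tversky loss for binary segmentation (not the paper's
    exact adaptation). alpha > beta penalizes false negatives more,
    which counters label imbalance in tumor segmentation.
    pred:   predicted probabilities, shape (B, ...)
    target: binary ground-truth mask, same shape
    """
    pred, target = pred.flatten(1), target.flatten(1)
    tp = (pred * target).sum(dim=1)
    fp = (pred * (1 - target)).sum(dim=1)
    fn = ((1 - pred) * target).sum(dim=1)
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return (1 - tversky).mean()
```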
There are numerous cues which influence human visual attention. Some of these cues cannot be explored by conventional eye-tracking studies, which use pictorial data presented to observers on common displays. Depth cues are one of them, since depth perception occurs naturally only in a real three-dimensional environment. However, eye-tracking studies in the real environment are complicated to carry out and evaluate with a relevant number of participants while maintaining laboratory conditions. We propose an experimental study methodology for exploring depth perception tendencies during a free-viewing task on a widescreen display in a laboratory. This setup is beyond the current hardware capabilities of static eye-trackers mounted on displays; therefore, eye-tracking glasses were used in the study to measure the attention data. We carried out the proposed study on a sample of 25 participants and created a novel dataset suitable for further visual attention research. The depth perception tendencies on a widescreen display were evaluated from the acquired data, and the results were discussed in the context of previous similar studies. Our results revealed some differences in depth perception tendencies in comparison to previous studies with two-dimensional pictorial data and resembled some depth perception tendencies observed in the real environment.
The aim of this paper is to propose a novel method to explain, interpret, and support the decision-making process of a deep Convolutional Neural Network (CNN). This is achieved by analysing the neuron activations of a trained 3D-CNN on selected layers via a Gaussian Mixture Model (GMM) and a custom binary encoding of both training and test images based on the affiliation of their activations to the computed GMM components. Based on the similarity of the encoded image representations, the system retrieves the most activation-wise similar atlas (training) images for a given test image and thereby supports and clarifies its decision. Possible uses of this method include mainly Computer-Aided Diagnosis (CAD) systems working with medical imaging data such as magnetic resonance imaging (MRI) or computed tomography (CT) scans. Interpreting the network's decision in the form of similar domain examples (images) is natural to the workflow of the medical personnel operating such a system.
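A minimal sketch of this pipeline follows, assuming per-image activation vectors have already been extracted from the chosen layer. The top-k component assignment used for the binary code is an assumption; the paper's exact encoding may differ.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_activation_gmm(train_activations, n_components=16):
    """Fit a GMM over per-image activation vectors from one CNN layer.
    The component count is an illustrative assumption."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    gmm.fit(train_activations)
    return gmm

def binary_encode(gmm, activations, top_k=3):
    """Encode each image by which GMM components its activations affiliate
    with most strongly; similar codes identify activation-wise similar images."""
    resp = gmm.predict_proba(activations)          # (N, n_components)
    codes = np.zeros_like(resp, dtype=bool)
    top = np.argsort(resp, axis=1)[:, -top_k:]     # strongest components
    np.put_along_axis(codes, top, True, axis=1)
    return codes

def most_similar_atlas(codes_train, code_test):
    """Rank atlas (training) images by Hamming similarity to the test code."""
    return np.argsort((codes_train != code_test).sum(axis=1))
```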
Extensive research has been carried out in the field of visual attention modelling throughout the past years. However, egocentric visual attention in real environments has still not been thoroughly studied. We introduce a method for conducting automated user studies of egocentric visual attention in a laboratory. The goal of our method is to study the distance of objects from the observer (their depth) and its influence on egocentric visual attention. User studies based on the proposed method were conducted on a sample of 37 participants, and our own egocentric dataset was created. The whole experimental and evaluation process was designed and realized using advanced methods of computer vision. The results of our research are ground-truth values of egocentric visual attention and their relation to the depth of the scene, approximated as a depth-weighting saliency function. The depth-weighting function was applied to state-of-the-art models and evaluated. Our enhanced models provided better results than the current depth-weighting saliency models.
Many domain-specific challenges in feature matching and similarity learning in computer vision have relied on labelled data, using either heuristic or, more recently, deep learning approaches. While aiming for precise solutions, we need to process a larger number of features, which may result in higher computational complexity. This paper proposes a novel method of similarity learning through a two-part cost function, performed in an unsupervised manner as it could be done with heuristic approaches in the original feature space, while also reducing feature complexity. The features are encoded on a lower-dimensional manifold that preserves the original structure of the data. The approach takes advantage of siamese networks and autoencoders to obtain compressed features while maintaining the same distance properties as in the original feature space. This is done by introducing a new loss function with two terms, which aims for good reconstruction as well as learning the neighborhood of similar data points from the encoded and reconstructed feature space.
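One plausible form of such a two-term cost is sketched below: a reconstruction term plus a term asking distances between encoded pairs to mirror distances in the original feature space. The weighting and the choice of Euclidean distance are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def two_term_loss(encoder, decoder, x1, x2, lam=1.0):
    """Sketch of a two-part cost for a siamese autoencoder: reconstruct both
    inputs, and preserve their pairwise distance in the encoded space so the
    low-dimensional manifold keeps the original neighborhood structure."""
    z1, z2 = encoder(x1), encoder(x2)
    recon = F.mse_loss(decoder(z1), x1) + F.mse_loss(decoder(z2), x2)
    d_orig = F.pairwise_distance(x1.flatten(1), x2.flatten(1))
    d_code = F.pairwise_distance(z1.flatten(1), z2.flatten(1))
    similarity = F.mse_loss(d_code, d_orig)   # encoded distances track originals
    return recon + lam * similarity
```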
Computational models predicting stimulus-driven human visual attention usually incorporate simple visual features, such as intensity, color, and orientation. However, the saliency of shapes and their contour segments influences attention too. Therefore, we built 30 shape saliency models of our own, based on existing shape representation and matching techniques, and compared them with 5 existing saliency methods. Since available fixation datasets were usually recorded on natural scenes, where various factors of attention are present, we performed a novel eye-tracking experiment that focuses primarily on shape and contour saliency. Fixations from 47 participants who looked at silhouettes of abstract and real-world objects were used to evaluate the accuracy of the proposed saliency models and to investigate which shape properties are most attentive. The results showed that visual attention integrates local contour saliency, the saliency of global shape features, and shape dissimilarities. The fixation data also showed that intensity and orientation contrasts play an important role in shape perception. We found that humans tend to fixate first on irregular geometrical shapes and on objects whose similarity to a circle differs from that of the other objects.
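One simple global shape feature in the spirit of the circle-similarity finding is circularity, sketched below for a binary silhouette. This particular measure is an assumption for illustration; the paper's 30 models use existing shape representation and matching techniques.

```python
import numpy as np
import cv2

def circularity(binary_silhouette):
    """Similarity of a silhouette to a circle: 4*pi*area / perimeter**2,
    which equals 1.0 for a perfect circle and decreases for irregular shapes.
    binary_silhouette: single-channel uint8 mask (object > 0)."""
    contours, _ = cv2.findContours(binary_silhouette, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    c = max(contours, key=cv2.contourArea)       # largest object contour
    area = cv2.contourArea(c)
    perimeter = cv2.arcLength(c, True)
    return 4 * np.pi * area / (perimeter ** 2 + 1e-9)
```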
KEYWORDS: Visualization, Video, Glasses, Data modeling, RGB color model, Visual process modeling, 3D image processing, 3D modeling, Eye, Image segmentation
Most of the existing solutions for predicting visual attention focus solely on 2D images and disregard any depth information. This has always been a weak point, since depth is an inseparable part of biological vision. This paper presents a novel method of saliency map generation based on the results of our experiments with egocentric visual attention and an investigation of its correlation with perceived depth. We propose a model that predicts attention using a superpixel representation, under the assumption that contrasting objects are usually salient and have a sparser spatial distribution of superpixels than their background. To incorporate depth information into this model, we propose three different depth techniques. The evaluation is done on our new RGB-D dataset created with SMI eye-tracking glasses and a Kinect v2 device.
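The sketch below illustrates the superpixel-contrast idea with one simple depth weight folded in. It is an illustration only; the paper proposes three distinct depth techniques, and the contrast measure and weighting here are assumptions.

```python
import numpy as np
from skimage.segmentation import slic

def superpixel_saliency(rgb, depth, n_segments=300, lam=0.5):
    """Toy superpixel saliency: salient superpixels stand out in mean color
    from the rest of the image; a per-superpixel depth weight boosts nearer
    regions. Not the paper's exact model."""
    labels = slic(rgb, n_segments=n_segments, start_label=0)
    n = labels.max() + 1
    means = np.array([rgb[labels == i].mean(axis=0) for i in range(n)])
    sal = np.zeros(n)
    for i in range(n):
        contrast = np.linalg.norm(means[i] - means, axis=1).mean()
        d = depth[labels == i].mean()
        sal[i] = contrast * (1.0 + lam / (1.0 + d))  # nearer => higher weight
    sal = (sal - sal.min()) / (np.ptp(sal) + 1e-9)
    return sal[labels]                               # per-pixel saliency map
```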
Image oversegmentation creates small, compact, and irregularly shaped regions subject to further clustering. Consideration of texture characteristics can improve the resulting quality of the clustering process. Existing methods based on an orthogonal transform into frequency domain can extract texture features of arbitrarily shaped regions only from inscribed rectangles. We propose a method for extracting texture features of entire arbitrarily shaped image regions using orthogonal transforms. Furthermore, we introduce a mathematically correct method for unifying spectral dimensions that is necessary for accurate comparison and classification of spectra with different dimensions. The proposed method is particularly suitable for classifying areas with periodic and quasiperiodic textures. Our approach exploits the texture periodification property of certain orthogonal transforms that is based on insertion of zeros into the spectrum. We identified some of those orthogonal transforms which possess this important property and also provide mathematical proofs of our claims. Last, we show that inclusion of luminance and chrominance components into the feature vector increases the precision of the proposed method which then becomes suitable for natural scene images as well.
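The periodification property can be demonstrated for the DFT (one orthogonal transform with this property) in a few lines of NumPy: repeating a texture p times in the spatial domain is equivalent to inserting p-1 zeros between the spectral coefficients, which is what allows spectra of differently sized regions to be brought to a common dimension before comparison.

```python
import numpy as np

x = np.random.rand(8)          # 1-D stand-in for a texture row
p = 3
X = np.fft.fft(x)

X_up = np.zeros(p * len(x), dtype=complex)
X_up[::p] = p * X              # insert zeros into the spectrum
x_tiled = np.fft.ifft(X_up)    # ...and recover the periodically repeated texture

assert np.allclose(x_tiled.real, np.tile(x, p))
```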
In this paper, we propose an enhanced method of 3D object description and recognition based on local descriptors using an RGB image and depth information (D) acquired by a Kinect sensor. Our main contribution is an extension of the SIFT feature vector with 3D information derived from the depth map (SIFT-D). We also propose a novel local depth descriptor (DD) that includes a 3D description of the key point neighborhood. The 3D descriptor defined in this way can then enter the decision-making process. Two different approaches are proposed, tested, and evaluated in this paper. The first approach deals with an object recognition system using the original SIFT descriptor in combination with our novel 3D descriptor, where the proposed 3D descriptor is responsible for the pre-selection of objects. The second approach demonstrates object recognition using an extension of the SIFT feature vector with the local depth description. We present the results of two experiments evaluating the proposed depth descriptors. The results show an improvement in the accuracy of the recognition system that includes the 3D local description compared with the same system without it. Our experimental object recognition system works in near real time.
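A minimal sketch of the SIFT-extension idea follows, appending simple depth statistics of each keypoint neighborhood to the 128-D SIFT vector. The chosen statistics and neighborhood radius are assumptions; the paper's DD descriptor is more elaborate.

```python
import cv2
import numpy as np

def sift_d_descriptors(gray, depth, radius=8):
    """SIFT-D style sketch: extend each SIFT descriptor with depth statistics
    of the keypoint neighborhood (mean, std, median are illustrative choices)."""
    sift = cv2.SIFT_create()
    keypoints, desc = sift.detectAndCompute(gray, None)
    if desc is None:
        return [], np.empty((0, 131))
    extended = []
    for kp, d in zip(keypoints, desc):
        x, y = int(kp.pt[0]), int(kp.pt[1])
        patch = depth[max(y - radius, 0):y + radius,
                      max(x - radius, 0):x + radius]
        depth_part = np.array([patch.mean(), patch.std(), np.median(patch)])
        extended.append(np.concatenate([d, depth_part]))
    return keypoints, np.array(extended)
```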
The transportation of hazardous goods on public street systems can pose severe safety threats in case of accidents. One solution to this problem is the automatic detection and registration of vehicles marked with dangerous goods signs. We present a prototype system which can detect a trained set of signs in high-resolution images under real-world conditions. This paper compares two different detection methods: the bag of visual words (BoW) procedure and our approach based on pairs of visual words with Hough voting. The results of an extended series of experiments are provided. The experiments show that the size of the visual vocabulary is crucial and can significantly affect the recognition success rate; different codebook sizes have been evaluated for this detection task. The best result of the first method, BoW, was 67% successfully recognized hazardous signs, whereas the second method proposed in this paper, pairs of visual words with Hough voting, reached 94% correctly detected signs. The experiments are designed to verify the usability of the two proposed approaches in a real-world scenario (a minimal sketch of the BoW baseline follows).
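The sketch below shows the baseline BoW representation whose vocabulary size the experiments identify as crucial; the descriptor source and the codebook size of 200 are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(train_descriptors, k=200):
    """Cluster local descriptors (e.g. one array per training image) into
    k visual words; the experiments above show this codebook size strongly
    affects the recognition rate."""
    return KMeans(n_clusters=k, n_init=10).fit(np.vstack(train_descriptors))

def bow_histogram(vocabulary, descriptors):
    """Represent one image as a normalized histogram of visual words."""
    words = vocabulary.predict(descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```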
The detection of pose-invariant planar patterns has many practical applications in computer vision and surveillance systems. The recognition of company logos is used in market studies to examine the visibility and frequency of logos in advertisements. Danger signs on vehicles could be detected to trigger warning systems in tunnels, or brand detection on transport vehicles could be used to count company-specific traffic. We present the results of a study on planar pattern detection based on keypoint detection and matching of distortion-invariant 2D feature descriptors. Specifically, we look at the following keypoint detectors: i) Lowe's DoG approximation from the SURF algorithm, ii) the Harris corner detector, iii) the FAST corner detector, and iv) Lepetit's keypoint detector. Our study then compares the SURF feature descriptor with compact signatures based on Random Ferns: we use 3 sets of sample images to detect and match 3 logos of different structure in order to find out which combinations of keypoint detector and feature descriptor work well. A real-world test tries to detect vehicles with a distinctive logo in an outdoor environment under realistic lighting and weather conditions: a camera was mounted at a suitable location observing the entrance to a parking area so that incoming vehicles could be monitored. In this 2-hour recording, we can successfully detect a specific company logo without false positives.
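For orientation, a keypoint-matching pipeline of this kind looks roughly as follows. ORB is used here purely as a freely available stand-in, since SURF requires opencv-contrib and Random Ferns are not in stock OpenCV; the ratio-test threshold is likewise an assumption.

```python
import cv2

def match_logo(template_gray, scene_gray, ratio=0.75):
    """Detect a planar logo by matching local keypoint descriptors between
    a template and a scene image; enough good matches => logo present."""
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(template_gray, None)
    kp2, des2 = orb.detectAndCompute(scene_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < ratio * m[1].distance]
    return kp1, kp2, good
```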
The grain size of forged nickel alloy is an important feature for the mechanical properties of the material. For fully automatic grain size evaluation in images of micrographs, it is necessary to detect the boundaries of each grain. This grain boundary detection is directly affected by artifacts such as scratches and twins. Twins appear as parallel lines inside one grain, whereas a scratch can be identified as a sequence of collinear line segments that may be spread over the whole image. Both kinds of artifacts introduce artificial boundaries inside grains. To avoid wrong grain size evaluation, these artifacts must be removed prior to the size evaluation process. Various algorithms were tested for the generation of boundary images; the most stable results were achieved by grayscale reconstruction followed by watershed segmentation. A modified line Hough transform with a third dimension in the Hough accumulator space, describing the distance between the parallel lines, is used to detect twins directly. Scratch detection is done by applying the standard line Hough transform followed by rule-based segment detection along the found Hough lines. These operations achieve a detection rate of more than 90 percent for twins and more than 50 percent for scratches.
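The following sketch approximates the twin detector with a post-processing pass over standard Hough lines: pairs with near-equal angle and a small rho gap are reported as parallel-line candidates. The paper instead extends the accumulator itself with the parallel-line distance as a third dimension; the tolerances here are assumptions.

```python
import cv2
import numpy as np

def detect_twin_candidates(edge_image, angle_tol=np.deg2rad(2), max_gap=30):
    """Find pairs of nearly parallel, closely spaced Hough lines, which is
    how twins manifest inside a grain. edge_image: binary uint8 edge map."""
    lines = cv2.HoughLines(edge_image, 1, np.pi / 180, threshold=80)
    twins = []
    if lines is None:
        return twins
    params = lines[:, 0, :]                   # (N, 2): rho, theta per line
    for i in range(len(params)):
        for j in range(i + 1, len(params)):
            d_theta = abs(params[i, 1] - params[j, 1])
            d_rho = abs(params[i, 0] - params[j, 0])
            if d_theta < angle_tol and 0 < d_rho < max_gap:
                twins.append((tuple(params[i]), tuple(params[j])))
    return twins
```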
A novel solution for automatic hardwood inspection is presented. A sophisticated multi-sensor system is required for reliable results. Our system works on a data stream of more than 50 MByte/s at the input and up to 100 MByte/s inside the processing queue. The algorithm is divided into multiple steps. Along a fixed grid, the images are decomposed into small squares, and 55 texture and color features are computed for each square. A Maximum Likelihood classifier assigns each square to one of 12 defect classes with a recognition rate better than 97%. Depending on the defect type, a dedicated threshold operation is performed for segmentation; the threshold levels and the selection of the input channel (RGB plus filtered images) are the result of the preceding classification step. A fast algorithm computes bounding rectangles from blobs, and defect-type-dependent rules are used to combine rectangles. Two additional fast high-resolution 3D measurement systems add board shape and 3D defect information. All defect rectangles pass an additional plausibility check in the final data fusion process before they are delivered to the optimization computer. To guarantee a short response time, image acquisition and image processing are performed in parallel on parallel computing hardware.
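A minimal Gaussian stand-in for the Maximum Likelihood classification step is sketched below: one Gaussian is fit per defect class over the 55-D feature vectors, and each square is assigned to the class with the highest likelihood. The Gaussian class model is an assumption; the paper does not specify the density form.

```python
import numpy as np
from scipy.stats import multivariate_normal

def train_ml_classifier(features_by_class):
    """Fit one Gaussian per defect class, e.g. 12 classes over 55-D vectors.
    features_by_class: dict mapping class name -> (N, 55) feature array."""
    return {c: multivariate_normal(f.mean(axis=0), np.cov(f, rowvar=False))
            for c, f in features_by_class.items()}

def classify_square(models, feature_vector):
    """Assign the square to the class with the highest log-likelihood."""
    return max(models, key=lambda c: models[c].logpdf(feature_vector))
```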