Embedded processing architectures are often integrated into devices to enable novel functions in cost-effective medical systems. To integrate neural networks into medical equipment, these models require specialized optimizations that prepare them for a high-efficiency, power-constrained environment. In this paper, we investigate the feasibility of quantized networks with limited memory for the detection of Barrett’s neoplasia. An EfficientNet-Lite1+DeepLabv3 architecture is proposed, which is trained using a quantization-aware training scheme in order to obtain an 8-bit integer-based model. The performance of the quantized model is comparable with float32-precision models. We show that the quantized model, with only 5 MB of memory, reaches the same performance with 95% Area Under the Curve (AUC) as a full-precision U-Net architecture that is 10× larger. We have also optimized the segmentation head for efficiency and reduced the output to a resolution of 32×32 pixels. The results show that this resolution captures sufficient segmentation detail to reach a DICE score of 66.51%, which is comparable to the full floating-point model. The proposed lightweight approach also makes the model highly energy-efficient, since it can be executed in real time on a 2-Watt Coral Edge TPU. The low power consumption of the lightweight Barrett’s esophagus neoplasia detection and segmentation system enables direct integration into standard endoscopic equipment.
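As a minimal sketch of the 8-bit scheme such a quantized model relies on, the following illustrates per-tensor affine quantization and dequantization (the helper names and layout are assumptions for illustration, not the paper's implementation):

```python
import numpy as np

def quantize_int8(w):
    # Affine int8 quantization: map float weights onto [-128, 127].
    # Hypothetical helper illustrating the 8-bit scheme, not the paper's code.
    scale = (w.max() - w.min()) / 255.0
    zero_point = int(np.round(-128 - w.min() / scale))
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    # Recover approximate float values from the int8 representation.
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
q, s, z = quantize_int8(w)
w_hat = dequantize_int8(q, s, z)
max_err = float(np.abs(w - w_hat).max())  # bounded by roughly one scale step
```

Quantization-aware training inserts such quantize/dequantize pairs during training so the network learns weights that survive the rounding.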
Computer-Aided Diagnosis (CADx) systems for characterization of Narrow-Band Imaging (NBI) videos of suspected lesions in Barrett’s Esophagus (BE) can assist endoscopists during endoscopic surveillance. The real clinical value and application of such CADx systems lies in real-time analysis of endoscopic videos inside the endoscopy suite, placing demands on robust decision making and insightful classifications that match clinical opinion. In this paper, we propose a lightweight int8-based quantized neural network architecture, supplemented with an efficient stability function on the output, for real-time classification of NBI videos. The proposed int8 architecture has a low memory footprint (4.8 MB), enabling operation on a range of edge devices and even existing endoscopy equipment. Moreover, the stability function ensures robust inclusion of temporal information from the video to provide a continuously stable video classification. The algorithm is trained, validated and tested with a total of 3,799 images and 284 videos of in total 598 patients, collected from 7 international centers. Several stability functions are evaluated, some of them clinically inspired by down-weighting low-confidence predictions. For the detection of early BE neoplasia, the proposed algorithm achieves a performance of 92.8% accuracy, 95.7% sensitivity, and 91.4% specificity, while only 5.6% of the videos are left without a final video classification. This work shows a robust, lightweight and effective deep learning-based CADx system for accurate automated real-time endoscopic video analysis, suited for embedding in clinical endoscopy practice.
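The stability idea can be illustrated with a simple confidence-weighted sliding window over per-frame probabilities (the window size, decision margin, and label names are invented for illustration; the paper's actual stability functions differ):

```python
from collections import deque

def stable_video_label(frame_probs, window=10, margin=0.2):
    # Hypothetical stability function: average neoplasia probabilities over a
    # sliding window, down-weighting low-confidence frames, and only emit a
    # label when the windowed score is decisive; otherwise withhold (None).
    buf = deque(maxlen=window)
    labels = []
    for p in frame_probs:
        conf = abs(p - 0.5) * 2.0  # 0 = maximally uncertain, 1 = confident
        buf.append((p, conf))
        total = sum(c for _, c in buf)
        score = sum(q * c for q, c in buf) / total if total > 0 else 0.5
        if score >= 0.5 + margin / 2:
            labels.append("neoplastic")
        elif score <= 0.5 - margin / 2:
            labels.append("non-dysplastic")
        else:
            labels.append(None)  # no final classification for this frame
    return labels
```

Withholding a label on indecisive windows mirrors the reported behavior where a small fraction of videos receives no final classification.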
The majority of the encouraging experimental results published on AI-based endoscopic Computer-Aided Detection (CAD) systems have not yet been reproduced in clinical settings, mainly due to the highly curated datasets used throughout the experimental phase of the research. In a realistic clinical environment, these necessary high image-quality standards cannot be guaranteed, and the CAD system performance may degrade. While several studies have previously presented impressive outcomes with Frame Informativeness Assessment (FIA) algorithms, the current state of the art implies sequential use of FIA and CAD systems, affecting the time performance of both algorithms. Since these algorithms are often trained on similar datasets, we hypothesise that part of the learned feature representations can be leveraged for both systems, enabling a more efficient implementation. This paper explores this case for early Barrett's cancer detection by integrating the FIA algorithm within the CAD system. Sharing the weights between the two tasks reduces the number of parameters from 16 to 11 million and the number of floating-point operations from 502 to 452 million. Due to the lower complexity of the architecture, the proposed model achieves inference times up to 2 times faster than the state-of-the-art sequential implementation while retaining the classification performance.
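The effect of weight sharing can be illustrated with a toy parameter count for a shared backbone serving two task heads (the channel sizes and layer counts below are invented for illustration; the real architectures differ):

```python
def conv_params(c_in, c_out, k=3):
    # Parameters of a k x k convolution: weights plus one bias per filter.
    return c_in * c_out * k * k + c_out

def total(layers, k=3):
    return sum(conv_params(ci, co, k) for ci, co in layers)

backbone = [(3, 64), (64, 128), (128, 256)]   # shared feature extractor
fia_head = [(256, 64), (64, 2)]               # informative vs non-informative
cad_head = [(256, 64), (64, 2)]               # neoplastic vs non-dysplastic

# Sequential state of the art: two full networks, each with its own backbone.
sequential = 2 * total(backbone) + total(fia_head) + total(cad_head)
# Proposed integration: one backbone feeding both heads.
shared = total(backbone) + total(fia_head) + total(cad_head)
saving = sequential - shared  # exactly one backbone's worth of parameters
```

The saving is always the duplicated backbone, which is why sharing dominates the reduction from 16 to 11 million parameters reported above.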
Computer-aided detection (CAD) approaches have shown promising results for early esophageal cancer detection using Volumetric Laser Endoscopy (VLE) imagery. However, the relatively slow and computationally costly tissue segmentation employed in these approaches hampers their clinical applicability. In this paper, we propose to reframe the 2D tissue segmentation problem as a 1D tissue boundary detection problem. Instead of using an encoder-decoder architecture, we propose to follow the tissue boundary using a Recurrent Neural Network (RNN), exploiting the spatio-temporal relations within VLE frames. We demonstrate near state-of-the-art performance using 18 times fewer floating-point operations, enabling real-time execution in clinical practice.
Barrett's Esophagus (BE) is a precursor of esophageal adenocarcinoma, one of the most lethal forms of cancer. Volumetric Laser Endomicroscopy (VLE) is a relatively new technology used for early detection of abnormal cells in BE by imaging the inner tissue layers of the esophagus. Computer-Aided Detection (CAD) shows great promise for analyzing VLE frames due to the advances in deep learning. However, a full VLE scan produces 1,200 frames of 4,096 × 2,048 pixels, making automated pre-processing to extract the tissue of interest necessary. This paper explores object detection for locating the tissue of interest in VLE scans. We show that this can be achieved in real time with very low inference time, using a single-stage object detector such as YOLO. Our best-performing model achieves a mean average precision of 98.23% for bounding boxes correctly predicting the tissue of interest. Additionally, we have found that the tiny YOLO with Partial Residual Networks architecture further reduces the inference time by a factor of 10, while sacrificing less than 1% accuracy. The proposed method not only segments the tissue of interest in real time without any latency, but also achieves this efficiently using limited GPU resources, rendering it attractive for embedded applications. Our paper is the first to introduce object detection as a new approach for VLE-data tissue segmentation and paves the way for real-time VLE-based detection of early cancer in BE.
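The mean average precision reported above is built on box overlap. A minimal Intersection-over-Union routine, the standard overlap measure used when matching predicted tissue boxes to ground truth (this is the generic formulation, not the authors' code), looks like:

```python
def iou(box_a, box_b):
    # Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2).
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A predicted tissue box typically counts as correct when its IoU with the ground-truth box exceeds a threshold such as 0.5.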
The most common types of cancer are carcinomas: cancers that originate from the epithelial tissue lining the outer surfaces of organs. To detect carcinomas at an early stage, techniques with small sampling volumes are required. Single Fiber Reflectance spectroscopy (SFR) is a promising technique for detecting early-stage carcinomas, since it has a measurement volume in the order of hundreds of microns. SFR uses a single fiber to emit and collect broadband light. The model from Kanick et al., which relates SFR reflectance to tissue optical properties, provides accurate results only for tissue with a modified Henyey-Greenstein phase function. However, in many tissues other types of phase functions have been measured. We have developed a new model to relate SFR reflectance to the scattering and absorption properties of tissue, which provides accurate results for a large range of tissue phase functions. SFR measurements fall into the sub-diffuse regime. We, therefore, describe the SFR reflectance as a diffuse plus a semi-ballistic component. An accurate description of the diffuse SFR signal requires double integration of spatially resolved reflectance over the fiber surface. Using approaches from geometric probability, we have derived the first analytic solution for the diffuse contribution to SFR. For the semi-ballistic contribution, we introduce a new phase-function-dependent parameter, p_sb, to describe the semi-ballistic part of the SFR signal. We will use the model to derive optical properties from SFR measurements performed endoscopically in patients with Barrett’s esophagus. These patients are at increased risk of developing esophageal adenocarcinoma and, therefore, undergo regular endoscopic surveillance. When detected at an early stage, endoscopic treatment is possible, thereby avoiding extensive surgery.
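The decomposition described above can be written schematically as follows (the symbols are assumed for illustration: reduced scattering coefficient, absorption coefficient, and fiber diameter; the paper's exact functional forms are not reproduced here):

```latex
R_{\mathrm{SFR}} \;=\; R_{\mathrm{dif}}\!\left(\mu_s',\, \mu_a,\, d_f\right)
\;+\; R_{\mathrm{sb}}\!\left(p_{sb},\, \mu_s,\, \mu_a,\, d_f\right)
```

Here the diffuse term follows from the analytic double integration over the fiber face, while the semi-ballistic term carries the phase-function dependence through p_sb.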
Based on the concept of field cancerization, we investigated whether SFR could be used as a tool to identify which patients are developing esophageal adenocarcinoma.
Routine surveillance endoscopies are currently used to detect dysplasia in patients with Barrett's Esophagus (BE). However, most of these procedures are performed by non-expert endoscopists in community hospitals, leading to many missed dysplastic lesions, which can progress into advanced esophageal adenocarcinoma if left untreated [1]. In recent years, several successful algorithms have been proposed for the detection of cancer in BE using high-quality overview images. This work addresses the first steps towards clinical application on endoscopic surveillance videos. Several challenges are identified that occur when moving from image-based to video-based analysis. (1) It is shown that algorithms trained on high-quality overview images do not naively transfer to endoscopic videos, due to e.g. non-informative frames. (2) Video quality is shown to be an important factor in algorithm performance. Specifically, temporal localization performance is highly correlated with video quality. (3) When moving to real-time algorithms, the additional compute necessary to address the challenges in videos will become a burden on the computational budget. However, in addition to challenges, videos also bring new opportunities not available in the current image-based methods, such as the inclusion of temporal information. This work shows that a multi-frame approach increases performance compared to a naive single-image method when the above challenges are addressed.
Gastroenterologists are estimated to misdiagnose up to 25% of esophageal adenocarcinomas in Barrett's Esophagus patients. This prompts the need for more sensitive and objective tools to aid clinicians with lesion detection. Artificial Intelligence (AI) can make examinations more objective and will therefore help to mitigate observer dependency. Since these models are trained with good-quality endoscopic video frames to attain high efficacy, high-quality images are also needed at inference time. Therefore, we aim to develop a framework that is able to distinguish good image quality by a-priori informativeness classification, leading to high inference robustness. We show that we can maintain informativeness over the temporal domain using recurrent neural networks, yielding a higher performance on non-informativeness detection compared to classifying individual images. Furthermore, we find that by using Gradient-weighted Class Activation Mapping (Grad-CAM), we can better localize informativeness within a frame. We have developed a customized ResNet18 feature extractor with 3 classifiers: a Fully-Connected (FC), a Long Short-Term Memory (LSTM) and a Gated Recurrent Unit (GRU) classifier. Experimental results are based on 4,349 frames from 20 pullback videos of the esophagus. Our results demonstrate that the algorithm achieves performance comparable with the current state of the art. The FC and LSTM classifiers each reach an F1 score of 91%. We found that the Grad-CAMs based on the LSTM classifier represent the origin of non-informativeness best, as 85% of the images were found to highlight the correct area.
The benefit of our novel implementation for endoscopic informativeness classification is that it is trained end-to-end, incorporates the spatiotemporal domain in the decision making for robustness, and makes the model's decisions insightful through the use of Grad-CAMs.
Volumetric Laser Endomicroscopy (VLE) is a promising balloon-based imaging technique for detecting early neoplasia in Barrett's Esophagus. In particular, Computer-Aided Detection (CAD) techniques show great promise compared to medical doctors, who cannot reliably find disease patterns in the noisy VLE signal. However, an essential pre-processing step for the CAD system is tissue segmentation. At present, tissue is segmented manually, which is not scalable to the entire VLE scan consisting of 1,200 frames of 4,096 × 2,048 pixels. Furthermore, the current CAD methods cannot use the VLE scans to their full potential, as only a small segment of the esophagus is selected for further processing, while an automated segmentation system yields significantly more available data. This paper explores the possibility of automatically segmenting the relevant tissue in VLE scans using FusionNet and a domain-specific loss function. The contribution of this work is threefold. First, we propose a tissue segmentation algorithm for VLE scans. Second, we introduce a weighted ground truth that exploits the signal-to-noise ratio characteristics of the VLE data. Third, we compare our algorithm's segmentations against those of two additional VLE experts. The results show that our algorithm's annotations are indistinguishable from the expert annotations, and therefore the algorithm can be used as a pre-processing step for further classification of the tissue.
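A weighted ground truth exploiting the depth-dependent signal-to-noise ratio could take a form like the following sketch, where deeper image rows (lower SNR) contribute less to the loss (the exponential shape and its decay constant are assumptions; the paper's exact weighting may differ):

```python
import numpy as np

def snr_weight_map(height, width, decay=2.0):
    # Hypothetical per-pixel loss weights: full weight at the tissue surface
    # (row 0), exponentially decreasing with depth where VLE SNR drops.
    depth = np.linspace(0.0, 1.0, height)[:, None]  # normalized depth per row
    return np.exp(-decay * depth) * np.ones((1, width))
```

Multiplying a per-pixel segmentation loss by such a map keeps noisy deep regions from dominating training.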
Volumetric laser endomicroscopy (VLE) is an advanced imaging system offering a promising solution for the detection of early Barrett’s esophagus (BE) neoplasia. BE is a known precursor lesion for esophageal adenocarcinoma and is often missed during regular endoscopic surveillance of BE patients. VLE provides a circumferential scan of near-microscopic resolution of the esophageal wall up to 3-mm depth, yielding a large amount of data that is hard to interpret in real time. In a preliminary study on an automated analysis system for ex vivo VLE scans, novel quantitative image features were developed for two previously identified clinical VLE features predictive for BE neoplasia, showing promising results. This paper proposes a novel quantitative image feature for a missing third clinical VLE feature. The novel gland-based image feature called “gland statistics” (GS), is compared to several generic image analysis features and the most promising clinically-inspired feature “layer histogram” (LH). All features are evaluated on a clinical, validated data set consisting of 88 non-dysplastic BE and 34 neoplastic in vivo VLE images for eight different widely-used machine learning methods. The new clinically-inspired feature has on average superior classification accuracy (0.84 AUC) compared to the generic image analysis features (0.61 AUC), as well as comparable performance to the LH feature (0.86 AUC). Also, the LH feature achieves superior classification accuracy compared to the generic image analysis features in vivo, confirming previous ex vivo results. Combining the LH and the novel GS features provides even further improvement of the performance (0.88 AUC), showing great promise for the clinical utility of this algorithm to detect early BE neoplasia.
Optical coherence tomography (OCT) is an imaging technique optically analogous to ultrasound that can generate depth-resolved images with micrometer-scale resolution. Advances in fiber optics and miniaturized actuation technologies allow OCT imaging of the human body and further expand OCT utilization in applications including but not limited to cardiology and gastroenterology. This review article provides an overview of current OCT development and its clinical utility in the gastrointestinal tract, including disease detection/differentiation and endoscopic therapy guidance, as well as a discussion of its future applications.
Early neoplasia in Barrett’s esophagus (BE) is difficult to detect. Volumetric laser endomicroscopy (VLE) incorporates optical coherence tomography, providing a circumferential scan of the esophageal wall layers. The attenuation coefficient (μVLE) quantifies the decay of detected backscattered light versus depth, and could potentially improve BE neoplasia detection. The aim is to investigate the feasibility of μVLE for identification of early BE neoplasia. In vivo and ex vivo VLE scans with histological correlation from BE patients with and without neoplasia were used. Quantification by μVLE was performed manually on areas of interest (AoIs) to differentiate neoplasia from nondysplastic (ND)BE. From ex vivo VLE scans from 16 patients (13 with neoplasia), 68 AoIs were analyzed. Median μVLE values (mm⁻¹) were 3.7 [2.1 to 4.4 interquartile range (IQR)] for NDBE and 4.0 (2.5 to 4.9 IQR) for neoplasia, not statistically different (p=0.82). Fourteen in vivo scans were used: nine from neoplastic and five from NDBE patients. Median μVLE values were 1.8 (1.5 to 2.6 IQR) for NDBE and 2.1 (1.9 to 2.6 IQR) for neoplasia, with no statistically significant difference (p=0.37). In conclusion, there was no significant difference in μVLE values between VLE scans from early neoplasia and NDBE. Future studies with a larger sample size should explore other quantitative methods for detection of neoplasia during BE surveillance.
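A common way to estimate such an attenuation coefficient from a depth profile is a log-linear fit, assuming single-exponential decay of the detected intensity, I(z) ∝ exp(−2μz). This is a generic sketch on synthetic data, not the study's manual AoI procedure:

```python
import numpy as np

def fit_attenuation(depth_mm, intensity):
    # Estimate an attenuation coefficient (mm^-1) by fitting a line to
    # log(intensity) versus depth; the slope equals -2*mu under the
    # single-scattering, single-exponential assumption.
    slope, _ = np.polyfit(depth_mm, np.log(intensity), 1)
    return -slope / 2.0

# Synthetic depth profile with mu = 2.0 mm^-1
z = np.linspace(0.1, 2.0, 50)
i = np.exp(-2 * 2.0 * z)
mu = fit_attenuation(z, i)
```

In practice the fit is restricted to a depth range where the exponential model holds and the signal stays above the noise floor.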
Volumetric Laser Endomicroscopy (VLE) is a promising technique for the detection of early neoplasia in Barrett’s Esophagus (BE). VLE generates hundreds of high-resolution, grayscale, cross-sectional images of the esophagus. However, at present, classifying these images is a time-consuming and cumbersome effort performed by an expert using a clinical prediction model. This paper explores the feasibility of using computer vision techniques to accurately predict the presence of dysplastic tissue in VLE BE images. Our contribution is threefold. First, a benchmark is performed of widely applied machine learning techniques and feature extraction methods. Second, three new features based on the clinical detection model are proposed, offering superior classification accuracy and speed compared to earlier work. Third, we evaluate automated parameter tuning by applying simple grid search and feature selection methods. The results are evaluated on a clinically validated dataset of 30 dysplastic and 30 non-dysplastic VLE images. Optimal classification accuracy is obtained by applying a support vector machine with our modified Haralick features and optimal image cropping, obtaining an area under the receiver operating characteristic curve of 0.95, compared to 0.81 for the clinical prediction model. Optimal execution time is achieved using a proposed mean and median feature, which is extracted at least a factor of 2.5 faster than alternative features with comparable performance.
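The speed of such a mean/median feature follows from its simplicity; a sketch of one possible form is shown below (the crop fraction and the whole-ROI summary are assumptions for illustration, not the paper's exact feature definition):

```python
import numpy as np

def mean_median_feature(image, crop=0.8):
    # Hypothetical feature: crop the central region of a grayscale VLE frame
    # and summarize it with its mean and median intensity.
    h, w = image.shape
    dh, dw = int(h * (1 - crop) / 2), int(w * (1 - crop) / 2)
    roi = image[dh:h - dh, dw:w - dw]
    return np.array([roi.mean(), np.median(roi)])
```

Both statistics are single passes over the cropped pixels, which is why such a feature extracts far faster than texture features like Haralick's.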