Annotating ground truth is a difficult task for many computational pathology problems. Types of ground truth labels in the field include bounding boxes, text labels, binary class labels, and full tissue maps. A compounding issue arises when multiple pathologists label the same image and disagree with one another. In this work, we investigate multiply re-annotated tumor maps for squamous cell carcinoma and ask whether different annotation fusion methods have an impact on tumor segmentation. We find that tumor label maps with an average annotation similarity of 0.759 do not produce a significant quantitative difference in tumor segmentation.
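The abstract does not name its fusion methods, so the sketch below shows common candidates (majority vote, union, intersection) applied to binary tumor masks, plus a pairwise Dice overlap of the kind that could underlie a mean similarity figure such as 0.759. The function names, the choice of fusion strategies, and the similarity metric are illustrative assumptions, not the paper's method.

```python
import numpy as np

def fuse_annotations(masks, method="majority"):
    """Fuse binary tumor masks (H x W), one per pathologist."""
    stack = np.stack([m.astype(bool) for m in masks])  # (n_annotators, H, W)
    if method == "majority":
        return stack.sum(axis=0) > (stack.shape[0] / 2)
    if method == "union":
        return stack.any(axis=0)
    if method == "intersection":
        return stack.all(axis=0)
    raise ValueError(f"unknown fusion method: {method}")

def dice_similarity(a, b):
    """Dice overlap between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0
```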
Active Learning (AL) is an artificial intelligence (AI) training paradigm that improves training efficiency in cases where labeled training data is hard to obtain. In AL, unlabeled samples are selected for annotation using a bootstrap classifier to identify samples whose informational content is not represented in the current training set. Given a small number of samples, this optimizes training by focusing annotation on "informative" samples. For computational pathology, identifying the most informative samples is non-trivial, particularly for segmentation. In this work, we develop a feature-driven approach to identifying informative samples. We use a feature extraction pipeline operating on segmentation results to find "outlier" samples that are likely incorrectly segmented. This process allows us to automatically flag samples for re-annotation based on the architecture of the segmentation (in contrast with less robust confidence-based approaches). We apply this process to the problem of segmenting oral cavity cancer (OCC) H&E-stained whole-slide images (WSIs), where the architecture of OCC tumor growth is an aggressive pathological indicator. Improving segmentation requires costly annotation of WSIs; thus, we seek to employ an AL approach to improve annotation efficiency. Our results show that, while outlier features alone are not sufficient to flag samples for re-annotation, we can identify some WSIs which fail segmentation.
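A minimal sketch of the flagging step, assuming each WSI's segmentation output has already been reduced to a feature vector of architecture descriptors (e.g., tumor region count, size distribution, boundary irregularity); the descriptor set is hypothetical, and IsolationForest is a standard outlier detector used here as a stand-in since the paper does not name its detection method.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def flag_for_reannotation(features, contamination=0.1, seed=0):
    """Flag WSIs whose segmentation-derived features look anomalous.

    features: (n_wsis, n_features) array of architecture descriptors
    computed from each WSI's segmentation output. The contamination
    rate (expected outlier fraction) is an assumed parameter.
    """
    detector = IsolationForest(contamination=contamination, random_state=seed)
    return detector.fit_predict(features) == -1  # True = flagged outlier
```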
In our previous work, we demonstrated that a small bootstrap set of fully annotated regions of interest (ROIs) can be used to generate segmentation results at the WSI scale. In this work, pathologists were asked to edit the previously generated annotations on 150 WSIs, focusing only on the tumor class. Of these re-annotated WSIs, 21 were then sampled and used to train a new version of the classifier, and segmentation results were generated for the remaining images. This work demonstrates an improvement in segmentation of the tumor class.
Deep learning for digital pathology is a challenging problem. Small patient datasets limit the generalizability of trained deep learning models, while the large size of whole slide images (WSIs) represents a bottleneck for training. Additionally, annotations are difficult to obtain at scale due to image size and the volume of samples needed for accurate and generalizable training. We have investigated the use of Active Learning (AL) to alleviate this burden; AL is a training approach in which a small subset of samples is used to create a bootstrap classifier, which in turn selects new samples for annotation to maximize the performance gain from each additional training sample. In our previous work, we found AL to be more efficient than the more common Random Learning (RL) approach in terms of segmentation performance per training sample. In the current work, we extend our investigation of AL by using our region-of-interest (ROI) trained classifier to perform WSI-level segmentation of multiple classes. We compare the results of AL- to RL-based training and generate inference results for a dataset of 75 WSIs spanning 61 patients. After four rounds of training, AL yielded a validation loss 0.566 lower, as well as Dice coefficients an average of 0.022 higher for classes present in the images of the holdout testing set. This work demonstrates the generalizability of AL from patch-based to WSI-based segmentation and provides a path forward for the rapid development of complex digital pathology datasets for deep learning.
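To make the AL-versus-RL comparison concrete, here is a minimal sketch of one acquisition round. The `model.predict_proba` interface and the entropy-based uncertainty criterion are illustrative assumptions; the abstract does not specify its acquisition function.

```python
import numpy as np

def select_samples(model, pool, budget, strategy="al", rng=None):
    """One acquisition round: AL (entropy uncertainty) vs. RL (random).

    pool: array of unlabeled candidate samples.
    budget: number of samples to send for annotation this round.
    """
    rng = rng or np.random.default_rng(0)
    if strategy == "rl":
        # Random Learning: annotate a uniformly random subset.
        return rng.choice(len(pool), size=budget, replace=False)
    # Active Learning: annotate the samples the bootstrap classifier
    # is least certain about (highest predictive entropy).
    probs = model.predict_proba(pool)                     # (n, n_classes)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return np.argsort(entropy)[-budget:]                  # most uncertain
```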
Utilizing artificial intelligence (AI) generated tissue maps for outcome prediction would help reduce the exhaustive workload on pathologists, but how quantitatively analogous these maps are to pathologist-labeled maps must first be studied. We were also interested in understanding how the "satellite tumor" definition in tissue label maps affects the features extracted. Our work was motivated by these ideas. It aids in understanding the impact on extracted feature values when an automatic relabeling is applied to both hand-annotated and AI tumor maps. This is a first step towards investigating whether AI maps can be relied upon for recurrence risk prediction in early-stage oral cavity cancer patients.
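One plausible form such an automatic relabeling could take is a connected-component pass over the binary tumor map, with small disconnected components marked as satellites. The size threshold and the component-size criterion below are hypothetical stand-ins, not the paper's satellite tumor definition.

```python
import numpy as np
from scipy import ndimage

def relabel_satellites(tumor_mask, min_area=5000):
    """Relabel a binary tumor map into main tumor (1) vs. satellites (2).

    Connected components smaller than `min_area` pixels are treated as
    satellite tumors; `min_area` is an assumed parameter.
    """
    labeled, n_components = ndimage.label(tumor_mask)
    out = np.zeros_like(labeled, dtype=np.uint8)
    for i in range(1, n_components + 1):
        component = labeled == i
        out[component] = 1 if component.sum() >= min_area else 2
    return out
```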
Recently in the field of digital pathology, there have been promising advances with regard to deep learning for pathological images. These methods are often considered "black boxes," where tracing inputs to outputs and diagnosing errors is difficult. This matters because neural networks are fragile, and dataset variation, which in digital pathology is often attributed to biological variance, can cause low accuracy. In deep learning, this is typically addressed by adding data to the training set. However, training data is costly and time-consuming to create and may not capture all the variation seen in these images. Digitized histology carries a great deal of variation across many dimensions (color/stain variation, lighting intensity, presentation of a disease, etc.), and some of these "low-level" image variations may cause a deep network to break due to its fragility. In this work, we use a unique dataset of serially registered H&E tissue samples from oral cavity cancer (OCC) patients to explore the errors of a classifier trained to identify and segment different tissue types. Registered serial sections allow us to eliminate variability due to biological structure and focus on image variability, including staining and lighting, in order to identify sources of error that may cause deep learning to fail. We find that perceptually insignificant changes in an image (minor lighting and color shifts) can result in extremely poor classification performance, even when the training process tries to prevent overfitting. This suggests that great care must be taken to augment and normalize datasets to prevent such errors.
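As a minimal sketch of the augmentation the conclusion recommends, the pipeline below applies small brightness, contrast, saturation, and hue jitter of the kind the abstract describes as perceptually insignificant yet able to break a trained classifier. The parameter magnitudes are assumptions, not values from the paper.

```python
import torchvision.transforms as T

# Training-time augmentation: expose the network to minor lighting and
# color shifts so it does not overfit to one slide's stain appearance.
augment = T.Compose([
    T.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.02),
    T.ToTensor(),
])
```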