Poster + Paper
7 April 2023
Iterative K-means clustering for disease subtype discovery
Conference Poster
Abstract
The goal of personalized medicine is to tailor disease prevention, diagnosis and treatment to each individual, taking into account their genes, environment and lifestyle. This emerging field recognizes the limitations of the one-size-fits-all approach to treating the “average patient”, which ignores the numerous, sometimes subtle differences between patients. One such difference is that the same disease can manifest differently from patient to patient. To make diagnosis more robust across patients, researchers are exploring disease subtype discovery, that is, the discovery of different characterizations of the same disease. From a machine learning perspective, this problem translates into a clustering (unsupervised grouping of data) task in a supervised setting. In other words, the superclasses are known, but within each superclass there exist various unknown substructures in the data that improve the discrimination of the original classes. We apply the idea of subtype discovery to the diagnosis of lung cancer nodules by attempting to discover different types of malignant and benign nodules in imaging data. Early detection and diagnosis could improve patient survival rates for this type of cancer, which is the leading cause of cancer death in the U.S. Our method 1) uses iterative K-means clustering to find robust subtypes of lung nodules (homogeneous by design and differentiable) that help classify the nodules, and 2) leaves some data unclustered. This set of unclustered or “hard” data represents images that cannot confidently be assigned to any subtype and may require more resources (e.g., time or radiologists) to diagnose. Our approach is applied to the Lung Image Database Consortium (LIDC) data set.
We hypothesize that classification with our subtypes will outperform classification with the original classes and will produce quantitatively and qualitatively more meaningful representations of the diseases, compared not only to the original classes but also to subtypes produced by simply overclustering the data (i.e., producing more clusters than necessary to capture the original classes, or minimizing the clustering loss function without checking the content of the clusters). We improve performance by 11% over the original classification and provide a detailed evaluation of our newly discovered subtypes.
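The abstract does not spell out the algorithm, so the following Python sketch is only one plausible reading of "iterative K-means that keeps homogeneous clusters and leaves some data unclustered", not the authors' implementation. It assumes that a cluster is accepted as a subtype when a large enough fraction of its members share one superclass label, and that the remaining points are re-clustered until no further cluster qualifies. The function names, the `purity` and `min_size` thresholds, the plain Lloyd's K-means, and the synthetic two-dimensional data are all illustrative assumptions; none of them come from the paper or from the LIDC data set.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's K-means; returns a cluster index for every row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster went empty.
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign

def iterative_subtypes(X, y, k=4, purity=0.9, min_size=5, max_rounds=10, seed=0):
    """Assumed procedure: accept homogeneous clusters, re-cluster the rest.

    X: (n, d) features; y: (n,) non-negative integer superclass labels.
    Returns an array of subtype ids, with -1 for unclustered ("hard") samples.
    """
    rng = np.random.default_rng(seed)
    subtype = np.full(len(X), -1)
    remaining = np.arange(len(X))  # indices not yet assigned to a subtype
    next_id = 0
    for _ in range(max_rounds):
        if len(remaining) < max(k, min_size):
            break
        assign = kmeans(X[remaining], k, rng)
        accepted = np.zeros(len(remaining), dtype=bool)
        for j in range(k):
            members = assign == j
            if members.sum() < min_size:
                continue
            # Fraction of the cluster that belongs to its majority superclass.
            counts = np.bincount(y[remaining][members])
            if counts.max() / members.sum() >= purity:
                subtype[remaining[members]] = next_id
                next_id += 1
                accepted |= members
        if not accepted.any():
            break  # nothing homogeneous left; the rest stays "hard"
        remaining = remaining[~accepted]
    return subtype

# Synthetic demo: two blobs per superclass plus one region of mixed labels.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0], [6.0, 6.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in blob_centers])
y = np.repeat([0, 0, 1, 1], 30)
X = np.vstack([X, np.array([3.0, 3.0]) + 0.3 * rng.standard_normal((20, 2))])
y = np.concatenate([y, np.tile([0, 1], 10)])

subtype = iterative_subtypes(X, y)
print("subtypes found:", len(np.unique(subtype[subtype >= 0])))
print("unclustered samples:", int((subtype == -1).sum()))
```

Under these assumptions, every accepted subtype is homogeneous by construction, and the `-1` entries play the role of the "hard" set that would be routed to additional review. Checking cluster content before accepting a cluster is also what distinguishes this sketch from simple overclustering, which would minimize the clustering loss without looking at the labels.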
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Katherine Aubert, Catherine Huber, Jacob Furst, Daniela Stan Raicu, and Roselyne Tchoua "Iterative K-means clustering for disease subtype discovery", Proc. SPIE 12465, Medical Imaging 2023: Computer-Aided Diagnosis, 124652E (7 April 2023); https://doi.org/10.1117/12.2653973
KEYWORDS
Diseases and disorders

Machine learning

Education and training

Image classification

Medicine

Performance modeling

Pulmonary disorders
