The goal of personalized medicine is to tailor disease prevention, diagnosis, and treatment to each individual, taking into account their genes, environment, and lifestyle. This emerging field recognizes the limitations of the one-size-fits-all approach of treating the “average patient”, which ignores the numerous, sometimes subtle differences between patients. Among these differences is the fact that the same disease can manifest itself differently across patients. To enable more robust diagnosis across patients, researchers are exploring disease subtype discovery, i.e., the discovery of different characterizations of the same disease. From a machine learning perspective, this problem translates into a clustering (unsupervised grouping of data) task in a supervised setting: the superclasses are known, but there exist unknown substructures in the data within each superclass that improve the discrimination of the original classes. We apply the idea of subtype discovery to the diagnosis of lung cancer nodules by attempting to discover different types of malignant and benign nodules in imaging data. Early detection and diagnosis could improve patient survival rates for this type of cancer, which is the leading cause of cancer death in the U.S. Our method 1) finds robust subtypes of lung nodules (homogeneous by design and differentiable from one another) through iterative K-means clustering that help classify them, and 2) leaves some data unclustered. This set of unclustered or “hard” data represents images that cannot confidently be assigned to any subtype and may require more resources (e.g., time or radiologists) to diagnose. Our approach is applied to the Lung Image Database Consortium (LIDC) data set.
We hypothesize that our subtype classification will outperform the classification of the original classes and produce quantitatively and qualitatively more meaningful representations of the diseases, compared not only to the original classes but also to subtypes produced by simply overclustering the data (i.e., producing more clusters than necessary to capture the original classes, or minimizing the clustering loss function without checking the content of the clusters). We improve classification performance by 11% over the original classification and provide a detailed evaluation of our newly discovered subtypes.
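The core mechanism described above, clustering within known classes, keeping only clusters that are homogeneous with respect to the class labels, and leaving hard-to-assign points unclustered, can be sketched as follows. This is a minimal illustration only: the `purity` and `radius_pct` thresholds, the plain Lloyd's K-means, and the function names are assumptions for the sketch, not the paper's exact procedure or parameters.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)  # (n, k)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
    return dists.argmin(axis=1), centers

def discover_subtypes(X, y, k=4, purity=0.9, radius_pct=80):
    """Keep clusters dominated by one class; mark the rest as "hard" (-1).

    purity: minimum fraction of the majority class for a cluster to survive
            (hypothetical threshold for this sketch).
    radius_pct: points farther from the center than this percentile of
            within-cluster distances are left unclustered.
    """
    labels, centers = kmeans(X, k)
    subtype = np.full(len(X), -1)   # -1 = unclustered / "hard" data
    next_id = 0
    for j in range(k):
        idx = np.where(labels == j)[0]
        if len(idx) == 0:
            continue
        counts = np.bincount(y[idx])
        if counts.max() / len(idx) < purity:
            continue                # mixed cluster: leave its points unclustered
        dist = np.linalg.norm(X[idx] - centers[j], axis=1)
        keep = idx[dist <= np.percentile(dist, radius_pct)]
        subtype[keep] = next_id     # a homogeneous, compact subtype
        next_id += 1
    return subtype
```

In practice this step would be iterated (re-clustering the remaining hard data, or sweeping over k) and the surviving subtypes used as the new target labels for the classifier, replacing the original binary malignant/benign labels.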