The difficulty of obtaining a sufficient number of appropriately labelled samples is a major obstacle to learning class-discriminating features with Machine Learning (ML) algorithms for tumor diagnostics from Ultrasound (US) images. This is often mitigated by sample augmentation, whereby new samples are generated from existing ones by rotation and flipping operations, by Singular Value Decomposition (SVD), or by synthesizing images with Generative Adversarial Networks (GANs). The first approach does not generate genuinely new samples, SVD-generated images may not be easy to recognize as US tumor scans, and although GAN-generated images are visually convincing, their use for diagnostics may lead to overfitting and leave models vulnerable to adversarial attacks. We propose an innovative sample augmentation approach that utilizes our recently developed Tumor Margin Appending (TMA) scheme. The TMA scheme constructs the Convex Hull (CH) of the tumor region from a small set of radiologist-marked tumor boundary points and crops the image at different radial expansion ratios of the CH into the surrounding tissue. Various ML algorithms, using handcrafted features or Convolutional Neural Networks (CNNs), trained with TMA images at different ratios achieved acceptable diagnostic accuracies. In this paper, our sample augmentation scheme expands the ML training datasets by including TMA samples at several expansion ratios. Experiments on training CNN-based diagnostic schemes for breast tumors show improved classification performance, with additional benefits: robustness against the different cropping practices inadvertently adopted at different hospitals, and a regularizing effect that reduces model overfitting when testing on unseen datasets obtained with unknown tumor segmentation and cropping procedures.
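The TMA cropping and the augmentation step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes boundary points are given as (x, y) = (column, row) pixel coordinates, that an expansion ratio r scales each hull vertex's distance from the mean of the hull vertices by (1 + r), and that the crop is the axis-aligned bounding box of the expanded hull, clipped to the image. The names `convex_hull`, `tma_crop` and `tma_augment` are hypothetical.

```python
import numpy as np


def convex_hull(points):
    """Andrew's monotone chain: hull vertices of 2-D points, counter-clockwise."""
    pts = sorted(set(map(tuple, np.asarray(points, dtype=float))))
    if len(pts) <= 2:
        return np.array(pts)

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return np.array(lower[:-1] + upper[:-1])


def tma_crop(image, boundary_points, ratio):
    """Crop the bounding box of the convex hull radially expanded by (1 + ratio).

    ASSUMPTION: expansion is about the mean of the hull vertices; the paper's
    exact definition of the expansion ratio may differ.
    """
    hull = convex_hull(boundary_points)
    centre = hull.mean(axis=0)
    expanded = centre + (1.0 + ratio) * (hull - centre)
    h, w = image.shape[:2]
    x0 = max(int(np.floor(expanded[:, 0].min())), 0)
    x1 = min(int(np.ceil(expanded[:, 0].max())) + 1, w)
    y0 = max(int(np.floor(expanded[:, 1].min())), 0)
    y1 = min(int(np.ceil(expanded[:, 1].max())) + 1, h)
    return image[y0:y1, x0:x1]


def tma_augment(samples, ratios):
    """Expand a training set: one TMA crop per (sample, ratio), label unchanged."""
    return [(tma_crop(img, pts, r), label)
            for img, pts, label in samples
            for r in ratios]


# Toy usage: a blank 100 x 100 "scan" with five hypothetical boundary clicks.
image = np.zeros((100, 100))
boundary = [(40, 40), (60, 40), (60, 60), (40, 60), (50, 50)]
augmented = tma_augment([(image, boundary, "benign")], ratios=[0.0, 1.0, 4.0])
print([crop.shape for crop, _ in augmented])  # [(21, 21), (41, 41), (100, 100)]
```

Because crops at different ratios have different sizes, in practice they would presumably be resized to the CNN's fixed input size before training; that step is omitted here.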
In tumor diagnostics from ultrasound scan images, the region of interest (ROI) is often determined by experts, who mark the boundary of the suspect mass simply by clicking on a sufficient number of tumor boundary points. To determine whether a tumor is malignant or benign, clinical experts, after long training, interpret image information from both the marked tumor region and the surrounding area. In contrast, when designing automatic computer-aided diagnosis systems using either traditional machine learning or deep learning, the relevant image features are generally obtained by cropping the tumor as the ROI, without considering the tumor periphery, which might contain important discriminative information for better classification accuracy. In this work, we investigate how a cropping strategy in which the tumor area is augmented by a proportion of the region surrounding the ROI affects the classification accuracy for different types of tumors. The optimal proportion needs to be determined so that the cropped ROIs capture information about the posterior echo and shadow of the tumor, in addition to the internal texture and echo that have mainly been used as classification indicators. Recently proposed cropping techniques use the best-fitting ellipse of the tumor and examine the proportion by which the ellipse should be expanded to improve accuracy. Unfortunately, the fitted ellipse may not reflect the shape of the tumor. Here, we investigate a number of alternative approaches to cropping the ROIs based on convex hull shapes determined from the tumor boundary points selected by radiologists. We first compare several expansion ratios of the convex hull, ranging from 0.6 to 4.0, against the tumor cropped without a margin. Several classification methods, including handcrafted-feature and deep learning methods, are adopted for breast and liver tumors in ultrasound images.
We shall demonstrate the importance of optimal cropping for breast and liver ultrasound tumor classification. Furthermore, the optimal margin depends on both the cancer type and the classification method.
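To make the expansion ratios concrete: if a ratio r scales every hull vertex's distance from a fixed centre by (1 + r) (an assumed definition, not one stated above), the expanded hull's area is (1 + r)^2 times the tumor hull's area, so r = 0.6 appends a margin of about 1.56 times the tumor area and r = 4.0 appends 24 times. The sketch below, with hypothetical helper names, takes hull vertices already in counter-clockwise order, checks that area relation with the shoelace formula, and builds a pixel mask of the margin band between the tumor hull and its expansion. Masking the band is shown only as an illustration of "tumor plus surrounding region"; it is not necessarily how the paper crops.

```python
import numpy as np


def expand_hull(hull, ratio):
    """Scale CCW hull vertices about their mean by (1 + ratio) (assumed definition)."""
    centre = hull.mean(axis=0)
    return centre + (1.0 + ratio) * (hull - centre)


def polygon_area(poly):
    """Shoelace formula for a simple polygon given as a (k, 2) array."""
    x, y = poly[:, 0], poly[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def convex_mask(shape, poly):
    """True for pixel centres (x = column, y = row) inside a convex CCW polygon."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    inside = np.ones(shape, dtype=bool)
    for (xa, ya), (xb, yb) in zip(poly, np.roll(poly, -1, axis=0)):
        # Non-negative cross product: the pixel lies on the inner side of this edge.
        inside &= (xb - xa) * (rows - ya) - (yb - ya) * (cols - xa) >= 0
    return inside


def margin_mask(shape, hull, ratio):
    """Surrounding-tissue band: inside the expanded hull but outside the tumor hull."""
    return convex_mask(shape, expand_hull(hull, ratio)) & ~convex_mask(shape, hull)


# Toy tumor hull (a 20 x 20 square) inside a 100 x 100 image.
hull = np.array([[40, 40], [60, 40], [60, 60], [40, 60]], dtype=float)
for r in (0.6, 1.0, 4.0):
    print(r, polygon_area(expand_hull(hull, r)) / polygon_area(hull))
print(margin_mask((100, 100), hull, 1.0).sum())  # 1240 band pixels
```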
In most pattern recognition applications, the object of interest is represented by a very high-dimensional data vector. The high dimensionality of such modeling vectors poses serious challenges to the efficiency of retrieving, analyzing and classifying the pattern of interest. The Curse of Dimensionality is a general reference to these challenges, which are commonly addressed by Dimension Reduction (DR) techniques. The most commonly used DR schemes, such as Principal Component Analysis (PCA), are data-dependent. However, when the number of samples is low relative to the dimension, such adaptive models can be expected to overfit and to be biased towards the training sets. Therefore, data-independent DR schemes such as Random Projections (RP) are more desirable. In this paper, we investigate and test the performance of differently constructed overcomplete Hadamard-based m×n (m ≪ n) sub-matrices using Walsh-Paley (WP) matrices as a DR scheme for Gait-Based Gender Classification (GBGC). In particular, we shall demonstrate that these Hadamard-based RPs perform as well as, if not better than, PCA and Gaussian-based RPs. Moreover, we shall show that Walsh-Paley Structured Matrices (WPSM) perform better than Walsh-Paley Random Matrices (WPRM).
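The random-projection idea can be sketched with NumPy as below. This is an illustrative sketch, not the paper's construction: it assumes "Walsh-Paley" means the rows of the Sylvester Hadamard matrix reordered into Paley (dyadic) order by bit-reversing the row index, that a WPRM-style projection keeps m rows chosen uniformly at random and scales them by 1/sqrt(m), and that feature vectors are zero-padded to a power-of-two length. The structured row selection behind WPSM is not described in the abstract, so it is not reproduced. All function names are hypothetical; a Gaussian projection is included as the baseline the abstract mentions.

```python
import numpy as np


def sylvester_hadamard(n):
    """Natural-order (Sylvester) Hadamard matrix of size n, n a power of 2."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H


def walsh_paley(n):
    """Paley (dyadic) ordering: row k is natural-order row bit_reverse(k)."""
    bits = n.bit_length() - 1
    order = [int(format(k, "b").zfill(bits)[::-1], 2) if bits else 0 for k in range(n)]
    return sylvester_hadamard(n)[order]


def pad_to_power_of_two(x):
    """Zero-pad a 1-D feature vector to the next power-of-two length."""
    n = 1 << max(len(x) - 1, 0).bit_length()
    return np.pad(x, (0, n - len(x)))


def walsh_paley_rp(m, n, rng):
    """m x n projection from m distinct, randomly chosen Walsh-Paley rows.

    The 1/sqrt(m) factor makes E||Rx||^2 = ||x||^2 over the random row choice,
    because the n rows are orthogonal with squared norm n.
    """
    rows = rng.choice(n, size=m, replace=False)
    return walsh_paley(n)[rows] / np.sqrt(m)


def gaussian_rp(m, n, rng):
    """Baseline: i.i.d. N(0, 1/m) entries."""
    return rng.standard_normal((m, n)) / np.sqrt(m)


rng = np.random.default_rng(0)
x = pad_to_power_of_two(rng.standard_normal(1000))  # stand-in for a 1000-D gait feature vector
R = walsh_paley_rp(64, x.size, rng)
y = R @ x  # reduced 64-D representation
```

Since the projection matrix has entries of ±1/sqrt(m), it can be stored as signs and a single scale, which is one practical appeal of Hadamard-based projections over dense Gaussian ones.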