1. Introduction

Biomedical research has been revolutionized by the new types of information generated from various “omics” projects, beginning with the genome sequencing projects. The genome drafts completed so far have enabled us, for the first time, to discover and compare all possible genes in a number of organisms. To uncover proteome differences in a given organism, expression arrays and protein chips have been used to study the transcription and expression characteristics of all possible proteins in different tissues, at different developmental stages, and in various disease states.1 2 High-throughput pipelines in structural proteomics have automated protein structure determination by integrating target purification, crystallization, data acquisition, and final assignment.3 Location proteomics, one of the newest subfields of proteomics, has the goal of providing an exact description of the subcellular distribution of each protein in a given cell type.4 5 6 7 All of these methods provide valuable information for determining how a protein functions and how its functioning is regulated. Knowledge of a protein’s subcellular distribution can contribute to a complete understanding of its function in a number of different ways. The normal subcellular distribution of a protein provides a scope for its function. For instance, a protein localized in the mitochondrial membrane can be inferred to function in energy metabolism. If a protein has a subcellular localization pattern close to that of a known protein, there is a high chance that the two form a functional complex. The dynamic properties of protein subcellular distribution under different environmental conditions can also provide important information about protein function. If a protein relocates from the cytoplasm to the nucleus after cells are treated with a certain drug, this suggests that the protein might play an important role in signal transduction, possibly acting directly as a transcription factor. The current widespread application of biomedical optics was made possible by the invention of quantitative optical instruments. When the microscope was invented more than 300 years ago, the analog signal reflected from the specimen had to be recorded by hand drawing. The development of cameras permitted the creation of still microscope images, but visual inspection was still the only way to interpret results generated from a microscope at that time. After the invention of the digital camera and other optical detectors, the analog signal from a microscope could be recorded at high density in digital media. With the application of digital signal processing techniques, automated analysis of microscope images, which could only be imagined before, became possible. For example, pioneering work on numerical description of microscope image patterns was done for chromosome distributions.8 9 The goal of the work reviewed here has been to develop automated methods applicable to all major subcellular patterns.

2. Quantitative Fluorescence Microscopy and Location Proteomics

Compared to other approaches for determining protein subcellular location, such as electron microscopy and subcellular fractionation, fluorescence microscopy permits rapid collection of images with excellent resolution between cell compartments. These properties, along with highly specific methods for targeting fluorescent probes to particular proteins, make fluorescence microscopy the optimal choice for studying the subcellular distribution of a proteome.
The choice among fluorescence microscopy methods, however, depends on the application. The signal-to-noise ratio is the most important factor in quantitative fluorescence microscopy. The noise in fluorescence microscopy comes mostly from out-of-focus fluorescence and quantization errors in the camera.10 Although the second source can be reduced dramatically by using high-quality charge-coupled device (CCD) cameras, out-of-focus fluorescence is handled differently by different fluorescence microscope systems.10 11 Inexpensive wide-field microscope systems collect fluorescence emitted from the entire 3-D specimen in the field of view, requiring computational removal of out-of-focus fluorescence (deconvolution) after image collection. Deconvolution can be computationally costly and requires an accurate model of the point-spread function for a particular microscope. Confocal laser scanning microscopes collect fluorescence from individual small regions of the specimen, illuminated by a scanning laser beam. Out-of-focus fluorescence is removed by a pinhole in the light collection path. Compared to wide-field microscopes, confocal laser scanning microscopes have a much lower acquisition rate, but normally no deconvolution is needed. A variation of the confocal laser scanning microscope, the spinning disk confocal microscope, circumvents the speed limit by using a rotating pinhole array, which enables fast focusing and image collection. For thin specimens, wide-field microscopes perform best, while for thick specimens a confocal laser scanning microscope is recommended.10 Fully automated microscopes also have tremendous promise for acquiring the large numbers of images required for systematic analysis of subcellular patterns.12 To collect fluorescence microscope images of a target protein, two methods are typically used to add a fluorescent tag to a protein of interest. Immunofluorescence employs antibodies that specifically bind to a target protein. It is not suitable for live cell imaging, because cells need to be fixed and permeabilized before antibodies can enter. Fluorescent dyes can be bound directly to antibodies, or to secondary antibodies directed against the primary antibodies. The other method is gene tagging, of which there are many variant approaches.13 14 15 16 17 A particularly useful approach is CD-tagging, which introduces a DNA sequence encoding a fluorescent protein such as green fluorescent protein (GFP) into an intron of a target gene. Gene tagging can also be applied randomly throughout a genome without targeting a specific protein, under the assumption that the probabilities of inserting the DNA tag into all genes are roughly equal. For a given cell type, random gene tagging coupled with high-throughput fluorescence microscopy can generate images depicting the subcellular location patterns of all or most expressed proteins. We coined the term location proteomics to describe the combination of tagging, imaging, and automated image interpretation that enables a proteome-wide study of subcellular location.5 The necessity of an automated analysis system stems from the need for an objective approach that generates repeatable analysis results, for a high-throughput method that can analyze tens of thousands of images per day, and lastly, for an approach more accurate than visual examination.
In the following sections, we first describe numerical features that can be used to capture the subcellular patterns in digital fluorescence microscope images. Summaries of feature reduction and classification methods are discussed next. (These sections can be skipped by readers primarily interested in learning the types of automated analyses that can be carried out on microscope images.) We then evaluate the image features for the tasks of supervised classification and unsupervised clustering by using various image datasets collected in our group and from our colleagues. Lastly, we describe a few other uses of image features in practical biomedical research.

3. Automated Interpretation of Images

3.1. Image Features

Given a combination of a protein expression level, a tagging approach, and a microscope system that yields a sufficiently high signal-to-noise ratio, we can obtain a precise digital representation of the subcellular location pattern of that protein. The next step, automated interpretation of that pattern, requires extracting informative features from the images that represent subcellular location patterns better than the values of the individual pixels. We have therefore designed and implemented a number of feature extraction methods for single cell images.5 18 19 20 To be useful for analyzing cells grown on a slide, cover slip, or dish, we require that these features be invariant to translation and rotation of the cell in the plane of the microscope stage, and robust across different microscopy methods and cell types. One approach to developing features for this purpose is to computationally capture the aspects of image patterns that human experts describe. We have used a number of features of this type, especially those derived from morphological image processing. An alternative, however, is to use less intuitive features that seek a more detailed mathematical representation of the frequencies present in an image and its gray-level distribution. These features capture information that a human observer may neglect, and may allow an automated classifier to perform better than a human one. We have therefore also used features of this type, such as texture measures. The feature extraction methods we have used are described briefly for 2-D and/or 3-D single cell images.

3.1.1. 2-D features

Zernike moment features. A filter bank of Zernike polynomials can be used to describe the gray-level pixel distribution in each fluorescence microscope image.21 An image to be analyzed is first transformed to the unit circle by subtracting the coordinates of the center of fluorescence from those of each pixel, and dividing all coordinates by a user-specified cell radius r. A Zernike moment is calculated as the correlation between the transformed image f(x,y) (x^2+y^2 ≤ 1) and a specific Zernike polynomial. The magnitude of the Zernike moment is used as a feature describing the similarity of the gray-level pixel distribution of an image to that Zernike polynomial. We calculate 49 Zernike moment features by using the Zernike polynomials up to order 12.22 23 Since an image is first normalized to the unit circle and only the magnitude of Zernike moments is used, this group of features satisfies the requirements of rotation and translation invariance.
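As an illustration of how one such feature can be computed, the following is a minimal sketch in Python/NumPy of a single Zernike moment magnitude. The function name and the choice of radius are ours (the published feature code was written in Matlab, per Sec. 3.4.2); the sketch assumes n − |m| is even and |m| ≤ n.

    import numpy as np
    from math import factorial

    def zernike_magnitude(img, n, m, radius):
        """|A_nm| of a 2-D image normalized to the unit circle."""
        ys, xs = np.indices(img.shape).astype(float)
        total = img.sum()
        # center of fluorescence (intensity-weighted centroid)
        cx, cy = (xs * img).sum() / total, (ys * img).sum() / total
        x, y = (xs - cx) / radius, (ys - cy) / radius
        rho, theta = np.hypot(x, y), np.arctan2(y, x)
        inside = rho <= 1.0                  # keep pixels inside the unit circle
        # radial polynomial R_nm(rho)
        R = np.zeros_like(rho)
        for s in range((n - abs(m)) // 2 + 1):
            c = ((-1) ** s * factorial(n - s) /
                 (factorial(s) * factorial((n + abs(m)) // 2 - s)
                  * factorial((n - abs(m)) // 2 - s)))
            R += c * rho ** (n - 2 * s)
        V = R * np.exp(1j * m * theta)       # Zernike polynomial V_nm
        A = (n + 1) / np.pi * (img * np.conj(V))[inside].sum()
        return abs(A)                        # magnitude: rotation invariant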
Haralick texture features. Haralick texture features provide statistical summaries of the spatial frequency information in an image.24 First, a gray-level co-occurrence matrix is generated by calculating the probability that a pixel of each gray level is found adjacent to a pixel of each other gray level. Given a total number of gray levels Ng in an image, the co-occurrence matrix is Ng×Ng. For 2-D images, there are four possible co-occurrence matrices, measuring the pixel adjacency statistics in the horizontal, vertical, and two diagonal directions, respectively. To satisfy the requirements of rotation and translation invariance, the four matrices are averaged and used to calculate 13 statistics: angular second moment, contrast, correlation, sum of squares, inverse difference moment, sum average, sum variance, sum entropy, entropy, difference variance, difference entropy, information measure of correlation 1, and information measure of correlation 2.23 One restriction of Haralick features is that they are not invariant to the number of gray levels used or to the pixel size in an image. To address this, a series of experiments was conducted to find the optimal number of gray levels and the coarsest usable pixel size for HeLa cells under all microscopy conditions.18 The most discriminative Haralick features were obtained when images were resampled to 1.15 microns/pixel and quantized using 256 gray levels. Resampling to these settings for HeLa cells allows Haralick features to be calculated on a common frame of reference for varying microscope objectives and cameras.18 Whether this resolution is optimal for other cell types remains to be determined.
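A minimal sketch of this computation, assuming the mahotas library (one of several libraries implementing the 13 Haralick statistics; the function name is ours, and the resampling step is indicated only by a comment):

    import numpy as np
    import mahotas

    def haralick_features(img):
        """13 direction-averaged Haralick statistics from an 8-bit image."""
        # The image is assumed to be already resampled to 1.15 microns/pixel
        # and quantized to 256 gray levels, as described in the text.
        img8 = img.astype(np.uint8)
        # mahotas returns one row of 13 statistics per adjacency direction
        per_direction = mahotas.features.haralick(img8)   # shape (4, 13) in 2-D
        return per_direction.mean(axis=0)                 # average over directions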
Wavelet features. Wavelet transform features can also be used to capture frequency information in an image. To extract features from the wavelet transform of an image, a multiresolution scheme is often used.25 An image can be convolved with wavelets of different scales, and statistics of the pixel intensity in the resulting images (such as the mean, standard deviation, and average energy) are often used as features. Here we describe two sets of recently applied wavelet features, derived from the Gabor wavelet transform and the Daubechies four-wavelet transform. Since wavelet transforms are not invariant to cell translation and rotation, each image is pivoted at its center of fluorescence and rotated to align its primary axis with the y axis in the image plane before feature extraction. Alignment of the secondary axis can be achieved by an extra 180-deg rotation, if necessary, to make the third central moment of x positive. Daubechies four-wavelet features. The Daubechies wavelet family is one of the most frequently used wavelet transforms in image analysis.26 Each wavelet transform consists of a scale function and a wavelet function, which can be regarded as a low-pass and a high-pass filter, respectively.25 Given the Daubechies four-wavelet transform with its scale and wavelet functions, an image is sequentially convolved column- and row-wise by these two filters. The four convolved images carry different frequency information extracted from the original image. Three of them contain high-frequency information in the x, y, and diagonal directions of the original image, respectively. The last one contains low-frequency information and can be regarded as a smoothed version of the original image. Further decomposition of the smoothed image gives finer information on lower frequency bands. We used the Daubechies four-wavelet transform to decompose an image up to level 10, and the average energies of the three high-frequency images at each level were used as features. In total, 30 wavelet features can be obtained that represent the frequency information in the original image best captured by the Daubechies four-wavelet transform. Gabor wavelet features. The Gabor function has been an important filtering technique in computer vision since it was found to model the receptive field profiles of cortical simple cells.27 The information captured by the nonorthogonal Gabor wavelet is mostly derivative information of an image, such as edges.28 A Gabor filter bank can be generated using Gabor filters with different orientations and scales. The mean and standard deviation of the pixel intensity in a convolved image are often used as features, which represent the frequency information in the original image best captured by the Gabor wavelet transform. We have used 60 Gabor wavelet features from a filter bank composed of six different orientations and five different scales. Morphological features. Image morphology describes various characteristics of objects, edges, and the entire image, such as the average size of each object, the edge intensity homogeneity, and the convex hull of the entire image. Unlike some natural scene images, fluorescence microscope images can be well characterized by their mathematical morphology.18 19 Morphological information of an image represents group statistics and is intrinsically invariant to cell rotation and translation. The morphological features we have used include 14 features derived from finding objects (connected components after automated thresholding), five features from edges, and three features from the convex hull of the entire image.18 19 Since multichannel imaging has become routine in fluorescence microscopy, additional channels can be added to improve the recognition of the subcellular location pattern of a target protein. A commonly used reference in our experiments is the distribution of a DNA-binding probe that labels the cell nucleus.19 The DNA channel image introduces an extra pivot in images for studying protein subcellular location. We have therefore used six additional object features to describe the location of the protein channel relative to the DNA channel. Subcellular location feature nomenclature. We have created a systematic nomenclature for referring to the image features used to describe subcellular location patterns, which we term subcellular location feature (SLF) sets.18 19 Each set found to be useful for classification or comparison is assigned an SLF set number. Each feature in that set has the prefix SLF, followed by the set index and the index of the feature in that set. For instance, SLF1.7, which is the variance of object distances from the center of fluorescence, is the seventh feature in feature set SLF1. Table 1 gives a summary of all current 2-D features grouped by feature set. The features derived from a parallel DNA channel for a target protein are included in the feature sets SLF2, SLF4, SLF5, and SLF13.18 19

Table 1
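As a concrete illustration of the object-finding step underlying the morphological features summarized above and in Table 1, the following is a minimal sketch assuming SciPy. The statistics shown echo features such as SLF1.7, but the function and dictionary keys are ours, not the published feature definitions.

    import numpy as np
    from scipy import ndimage

    def object_features(img, threshold):
        """Simple object statistics from a thresholded fluorescence image."""
        mask = img > threshold                    # automated threshold assumed given
        labels, n_objects = ndimage.label(mask)   # connected components = objects
        idx = range(1, n_objects + 1)
        sizes = ndimage.sum(mask, labels, index=idx)
        # intensity-weighted center of fluorescence of the whole image
        cof = np.array(ndimage.center_of_mass(img))
        centers = np.array(ndimage.center_of_mass(img, labels, index=idx))
        dists = np.linalg.norm(centers - cof, axis=1)
        return {"n_objects": n_objects,
                "mean_object_size": sizes.mean(),
                "var_object_dist": dists.var()}   # cf. SLF1.7 in the text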
3.1.2. 3-D features

3-D morphological features. As an initial approach to describing the 3-D distribution of proteins in cells, we used a direct extension of some of the 2-D features to 3-D.20 Converting 2-D features that depend on area to 3-D counterparts using volume is straightforward. However, because of the asymmetry between the slide plane and the microscope axis, directly converting features that measure 2-D distances to 3-D distances would lose important information present in 3-D images. While the protein distribution in the plane of the slide can be considered rotationally invariant, the distribution of adherent cells along the microscope axis is not (since some proteins are distributed preferentially near the bottom or top of the cell). The distance computation in 3-D images was therefore separated into two components, one in the slide plane and one along the microscope axis. While 2-D edge features can be extended to 3-D directly, for computational convenience two new features were designed from the 2-D edges found in each 2-D slice of a 3-D image.5 Table 2 shows all current 3-D features.

Table 2
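A minimal sketch of the in-plane/axial separation of distances described above, assuming NumPy; the voxel spacing shown is that of the 3-D HeLa set described in Sec. 3.4.1, and the function name is ours.

    import numpy as np

    def split_distances(centers, cof, spacing=(0.2, 0.049, 0.049)):
        """Separate object-to-center distances into slide-plane and axial parts.

        centers: (N, 3) array of object centers in (z, y, x) voxel coordinates
        cof:     (3,) center of fluorescence in the same coordinates
        spacing: voxel size in microns along (z, y, x)
        """
        delta = (np.asarray(centers) - np.asarray(cof)) * np.asarray(spacing)
        in_plane = np.hypot(delta[:, 1], delta[:, 2])  # distance within slide plane
        axial = np.abs(delta[:, 0])                    # distance along microscope axis
        return in_plane, axial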
Haralick texture features. Although Haralick texture features were originally designed for 2-D images, the idea of extracting pixel adjacency statistics can easily be extended to voxel adjacency in 3-D images.5 Instead of the four directional adjacencies for 2-D pixels, there are 13 directional adjacencies for 3-D voxels. The same 13 statistics used as 2-D Haralick texture features can be computed from each of the 13 3-D co-occurrence matrices, and the average and range of the 13 statistics can be used as 3-D Haralick texture features.5 Feature set SLF11 combines these with the 3-D morphological and edge features.

3.1.3. Feature normalization

Since each feature has its own scale, any calculation involving more than one feature will be dominated by the features with larger ranges, unless steps are taken to avoid this. There are many possible ways of mapping diverse features into a more homogeneous space; we have chosen the simplest approach, in which each feature in the training data is normalized to have zero mean and unit variance before training a classifier. The test data are normalized accordingly, using the mean and variance of each feature from the training data. Note that since this is done merely to establish a scaling transform using factors that are fixed prior to training, it does not assume that each feature follows a Gaussian distribution (the distribution of a feature across all classes is not in fact Gaussian, but rather typically a mixture of Gaussians).

3.2. Feature Reduction

While the different kinds of SLF features are intended to capture different types of information from an image, they may still contain redundancy. In addition, some of the features might not contain any useful information for a given set of subcellular patterns. It has often been observed that reducing the size of a feature set by eliminating uninformative and redundant features can speed up the training and testing of a classifier and improve its classification accuracy. We have extensively studied two types of feature reduction methods, namely feature recombination and feature selection, in the context of subcellular pattern analysis.29 Feature recombination methods generate a linearly or nonlinearly transformed feature set from the original features, and feature selection methods generate a feature subset from the original features by explicit selection. Four methods of each type are described next.

3.2.1. Feature recombination
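Although the individual recombination methods are not detailed here, principal component analysis (PCA) is a representative linear recombination method (its kernelized variant, KPCA, is referred to in Sec. 3.3.2). A minimal sketch, assuming scikit-learn, of PCA applied after the training-set normalization of Sec. 3.1.3; the data and component count are illustrative stand-ins:

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(100, 84))  # stand-in images x features SLF matrix
    X_test = rng.normal(size=(20, 84))

    recombiner = make_pipeline(
        StandardScaler(),                 # zero mean, unit variance from training data
        PCA(n_components=20),             # linear recombination; count is illustrative
    )
    X_train_red = recombiner.fit_transform(X_train)  # statistics fit on training only
    X_test_red = recombiner.transform(X_test)        # same transform applied to test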
3.2.2. Feature selection
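As a rough stand-in for stepwise feature selection, the sketch below ranks features by a univariate F statistic, echoing the F-statistic ranking produced by stepwise discriminant analysis (SDA; see Sec. 3.4.2). True SDA adds and removes features stepwise, which is not reproduced here, and all data and names are illustrative (assuming scikit-learn):

    import numpy as np
    from sklearn.feature_selection import f_classif

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 84))        # stand-in SLF feature matrix
    y = rng.integers(0, 10, size=200)     # stand-in labels for ten patterns

    F, _ = f_classif(X, y)                # one-way ANOVA F value per feature
    ranked = np.argsort(F)[::-1]          # features in decreasing order of F
    X_selected = X[:, ranked[:20]]        # keep the 20 top-ranked features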
3.3. State-of-the-Art Classifiers

3.3.1. Neural networks

Neural networks model a feed-forward system in which all layers except the input layer serve as activators that take the outputs from the previous layer, combine them linearly, and emit their activation via a nonlinear mapping (sigmoid) function.33 Training a neural network amounts to fitting optimal parameters with respect to a cost function that measures the correspondence between the actual and desired network outputs. We can define a cost function, such as the classification error rate, and train a neural network using various algorithms, such as gradient descent back-propagation, conjugate gradient, and Newton’s method.30 Different training algorithms generate different locally optimal solutions. Many techniques have been invented to alleviate the overtraining of a neural network, such as momentum and learning rate adjustment.33

3.3.2. Support vector machines

Similar to neural networks, support vector machines (SVMs) are a set of classifiers that employ linear classifiers as building blocks. Instead of organizing linear classifiers in a network hierarchy, SVMs generalize linear classifiers using kernel functions and the maximum-margin criterion.37 The lightweight linear classifier is often a good choice in a simple problem setting, but the hypothesis of a linear decision boundary is challenged in more complex problems. In addition, choosing from a group of equally good linear classifiers is sometimes error prone. As described for KPCA, a nonlinear kernel function can be employed to transform the original feature space to a very high, sometimes unbounded, dimensional space. SVMs train linear classifiers in this very high dimensional space, so that a nonlinear decision boundary can be regarded as linear after the kernel mapping. To address the difficulty of choosing among equally performing linear classifiers, SVMs select the maximum-margin hyperplane as the decision boundary, which in theory minimizes the structural risk of a classifier, the upper bound on the expected test error.38 The high dimensional space poses no problem for representing the decision boundary, since only those training data points lying on the maximum-margin hyperplane, called support vectors, are needed. SVMs were originally formulated for two-class problems, and several methods exist to extend them to K-class problems.38 39 40 The max-win method employs a one-versus-others strategy, in which K binary SVMs are trained to separate each class from all other classes. Given a test data point, the class with the highest output score is selected as the prediction. The pair-wise method employs a one-versus-one strategy, in which K(K−1)/2 binary SVMs are created for all possible class pairs. Each classifier casts a vote for one class for a given test data point, and the class with the most votes is selected as the output. Alternatively, the K(K−1)/2 binary SVMs can be arranged in a rooted binary directed acyclic graph (DAG), where a data point is classified as not-i at each node where i is the losing class. The only class left when a leaf node is reached is selected as the prediction. Multiclass SVMs can employ different kernel functions to differentiate protein location patterns nonlinearly.
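A minimal sketch of a multiclass SVM with a Gaussian (RBF) kernel, assuming scikit-learn, whose SVC class implements the pair-wise, one-versus-one strategy described above; the data and parameter values are illustrative stand-ins:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 44))      # stand-in normalized SLF features
    y_train = rng.integers(0, 10, size=200)   # stand-in labels for ten patterns
    X_test = rng.normal(size=(50, 44))

    clf = SVC(kernel="rbf", C=1.0, gamma="scale",
              decision_function_shape="ovo")  # K(K-1)/2 pairwise binary SVMs
    clf.fit(X_train, y_train)
    predicted = clf.predict(X_test)           # class with the most pairwise votes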
3.3.3. AdaBoost

The training of a classifier may result in a decision boundary that performs well for a majority cluster of training data points but poorly for the others. AdaBoost addresses this problem by focusing classifier training on hard examples in an iterative scheme.41 A base classifier generator keeps producing simple classifiers, such as decision trees or one-hidden-layer neural networks. At each iteration, a simple classifier is trained with a different distribution over the entire training data, with more weight placed on the points incorrectly classified in the previous iteration. By balancing the performance between correctly and incorrectly classified data, we obtain a series of classifiers, each of which remedies some errors of its predecessor while possibly introducing some new errors. The final classifier is generated by linearly combining all trained simple classifiers, inversely weighted by their error rates. AdaBoost was originally formulated for two-class problems, and several extension methods have been proposed to apply it to K-class problems.42 43

3.3.4. Bagging

Instead of reweighting the entire training data iteratively, the bagging approach samples the training data randomly with bootstrap replacement.44 Each random sample contains on average 63.2% of the entire training data. A preselected classifier is trained repeatedly using different samples, and the final classifier is an unweighted average of all trained classifiers. The motivation for bagging is the observation that many classifiers, such as neural networks and decision trees, are significantly affected by slightly skewed training data. Bagging stabilizes the selected classifier by smoothing out all possible variances, making the expected prediction robust.

3.3.5. Mixtures of experts

Similar to AdaBoost’s idea of focusing a classifier on hard training examples, “mixtures of experts” goes one step further by training individual classifiers, also called local experts, on different data partitions and combining the results from multiple classifiers in a trainable way.45 46 In mixtures of experts, a gating network is employed to assign local experts to different data partitions, and the local experts, which can be various classifiers, take the input data and make predictions. The gating network then combines the outputs from the local experts to form the final prediction. Both the gating network and the local experts are trainable. Increasing the number of local experts increases the complexity of the classifier in modeling the entire training data.

3.3.6. Majority-voting classifier ensemble

A large number of classifiers are available in the machine learning community, each with its own theoretical justification. More often than not, the best performing classifier on one dataset will not be the best on another. Given limited training data, all classifiers also suffer from overfitting. One way to alleviate these problems is to form a classifier ensemble in which different classifiers combine their strengths and overcome their weaknesses, assuming the error sources of their predictions are not fully correlated.44 The most straightforward way of fusing classifiers is the simple majority-voting model. Compared to trainable voting models, it is the fastest and performs just as well.47 In summary, the classification methods vary in the complexity of the decision boundaries they can generate, the amount of training data needed, and their sensitivity to uninformative features. Differences in their performance can therefore be expected.
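A minimal sketch of a majority-voting ensemble, assuming scikit-learn; the three member classifiers are illustrative stand-ins, not the eight classifiers evaluated in the text:

    import numpy as np
    from sklearn.ensemble import VotingClassifier, AdaBoostClassifier, BaggingClassifier
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 44))        # stand-in normalized SLF features
    y = rng.integers(0, 10, size=200)     # stand-in labels for ten patterns

    ensemble = VotingClassifier(
        estimators=[("svm", SVC(kernel="rbf", gamma="scale")),
                    ("ada", AdaBoostClassifier()),
                    ("bag", BaggingClassifier())],
        voting="hard",                    # simple, untrained majority vote
    )
    ensemble.fit(X, y)
    print(ensemble.predict(X[:5]))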
3.4. Automated Interpretation of Fluorescence Microscope Images

3.4.1. Image datasets

The goal of designing good image features and classifiers is to achieve accurate and fast automated interpretation of images. The quality of the image features, the various feature reduction methods, and the classifiers must be evaluated using diverse image datasets. We therefore created several image sets in our lab and also obtained images from our colleagues. These sets contain both 2-D and 3-D fluorescence microscope images taken from different cell types with different microscopy methods. Table 3 summarizes the four image sets we used for the learning tasks described in this review.

Table 3
The 2-D CHO dataset was collected for five location patterns in Chinese hamster ovary (CHO) cells.23 The proteins were NOP4 in the nucleolus, giantin in the Golgi complex, tubulin in the cytoskeleton, and LAMP2 in lysosomes, each labeled with a specific antibody. Nuclear DNA was also labeled in parallel to each protein. The four protein classes as well as the DNA class contain different numbers of images, ranging from 33 to 97. An approximate correction for out-of-focus fluorescence was made by nearest-neighbor deconvolution using images taken 0.23 μm above and below the chosen plane of focus.48 Since most images were taken from a field with only one cell, manual cropping was done on these images only to remove any partial cells on the image boundary. The resulting images were then background subtracted using the most common nonzero pixel intensity and thresholded at a value three times higher than the background intensity. The DNA channel in this image set was not used for calculating features, but formed a fifth location class. Figure 2 shows typical images from different cells of each class of the 2-D CHO dataset after preprocessing. The other collection of 2-D images we have used is the 2-D HeLa dataset.19 It contains ten location patterns from nine sets of images taken from the human HeLa cell line by using the same wide-field, deconvolution approach used for the CHO set. More antibodies are available for the well-studied HeLa cell line, and better 2-D images can be obtained from the larger, flatter HeLa cells. This image set covers all major subcellular structures, using antibodies against giantin and gpp130 in the Golgi apparatus, actin and tubulin from the cytoskeleton, a protein of the endoplasmic reticulum membrane, LAMP2 in lysosomes, the transferrin receptor in endosomes, nucleolin in the nucleolus, and a protein of the mitochondrial outer membrane.19 The goal of including two similar proteins, giantin and gpp130, was to test the ability of our system to distinguish similar location patterns. A parallel DNA channel was used both as an additional class and for feature calculation. Between 78 and 98 images were obtained for each class. Following the same cropping and background subtraction steps, each image was further filtered using an automatically selected threshold49 calculated from the image. Figure 3 shows typical images from each class of the 2-D HeLa dataset after preprocessing. 2-D images represent a single slice through the subcellular distribution of a protein, which may ignore differences in location pattern at other positions in a cell. For unpolarized cells, 2-D images are usually sufficient to capture the subcellular distribution of a protein because of the flatness of the cells. For polarized cells, however, 3-D images are preferred to describe what may be different location patterns of a protein in the “top” (apical) and “bottom” (basolateral) domains of a cell. Even for unpolarized cells, additional information may be present in a complete 3-D image.
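Before turning to the 3-D sets, a minimal sketch of the background-subtraction and thresholding steps described above, assuming NumPy and an unsigned-integer image; the interpretation of the three-fold threshold and the function name are ours:

    import numpy as np

    def preprocess(img):
        """Background-subtract and threshold a fluorescence image."""
        nonzero = img[img > 0]
        background = np.bincount(nonzero).argmax()  # most common nonzero intensity
        subtracted = np.clip(img.astype(int) - background, 0, None)
        out = subtracted.copy()
        out[img < 3 * background] = 0   # threshold at three times the background
        return out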
We therefore collected a 3-D HeLa image set using probes for the same nine proteins used for the 2-D HeLa set.20 A three-laser confocal scanning microscope was used. Two parallel channels, detecting total DNA and total protein, were added for each protein, resulting in a total of 11 classes, each of which had from 50 to 58 images. Each 3-D image contained a stack of 14 to 24 2-D slices, and the size of each voxel was 0.049×0.049×0.2 μm (this represents oversampling relative to the Nyquist requirement by about a factor of 2 in each direction). The total protein channel was not only used as an additional class representing a predominantly “cytoplasmic” location pattern; it was also used for automated cell segmentation by a seeded watershed algorithm, using filtering of the DNA channel to create “seeds” for each nucleus20 (the cells on each slide are reasonably well separated from each other, and this seeding method was therefore observed to perform very well). Finally, background subtraction and automated thresholding were conducted on the segmented images. Figure 4 shows typical images from each class of the 3-D HeLa dataset after preprocessing. The last image set used in our analysis was collected as part of a project to demonstrate the feasibility and utility of using CD-tagging13 to tag large numbers of proteins in a cultured cell line. A set of mouse NIH 3T3 cell clones expressing different GFP-tagged proteins was generated using a retroviral vector, and the identity of each tagged gene was found using reverse transcription polymerase chain reaction amplification and BLAST searches.16 A number of 3-D images of live cells from each clone were collected using a spinning-disk confocal microscope.5 The 3-D 3T3 dataset we used contained images for 46 clones, with 16 to 33 images for each clone (the size of each voxel was 0.11×0.11×0.5 μm). Each image was further processed by manual cropping to isolate single cells, background subtraction, and automatic thresholding. Figure 5 shows typical images from some of the classes in the 3-D 3T3 dataset after preprocessing.

3.4.2. Supervised classification of fluorescence microscope images

Classifying 2-D images. The first task in building our automated image interpretation system was to classify 2-D fluorescence microscope images. The initial classifier we used was a neural network with one hidden layer of 20 hidden nodes. We evaluated this classifier using various feature sets and image sets. Table 4 shows the performance of this classifier for various feature sets on both the 2-D CHO and 2-D HeLa datasets. The neural network classifier was trained on a training dataset, and training was stopped when the error of the classifier on a separate stop set no longer decreased. We evaluated the performance of the classifier using eight-fold cross-validation on the 2-D CHO set with both the Zernike and Haralick feature sets.22 23 (n-fold cross-validation involves randomly dividing the available images into n groups, using the first n−1 of these as training data and the last group as test data, repeating this with each group as the test data, and averaging classifier performance over all n test groups.) The performance using these two feature sets was similar, and much higher than that of a random classifier (which would be expected to give 20% average performance on this five-class dataset). The same classifier was then evaluated using ten-fold cross-validation on the 2-D HeLa set with various 2-D feature sets.18 19 The morphological and DNA features in SLF2 gave an average accuracy of 76% on the ten location patterns. Adding both the Zernike and Haralick features to SLF2 to create feature set SLF4 yielded a 5% improvement in this performance (to 81%).
Removing the six DNA features to create set SLF3 resulted in a 2% decrease, suggesting that having information on the location of the nucleus provides only a modest increase in the overall ability to classify the major organelle patterns, although performance for specific classes improves more than this (data not shown).

Table 4
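The n-fold cross-validation protocol defined parenthetically above, combined with the per-fold normalization of Sec. 3.1.3, can be sketched as follows (assuming scikit-learn; the classifier and data are illustrative stand-ins):

    import numpy as np
    from sklearn.model_selection import StratifiedKFold
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 44))        # stand-in SLF feature matrix
    y = rng.integers(0, 10, size=300)     # stand-in labels

    accuracies = []
    for train_idx, test_idx in StratifiedKFold(n_splits=10).split(X, y):
        scaler = StandardScaler().fit(X[train_idx])   # statistics from training fold
        clf = SVC(kernel="rbf").fit(scaler.transform(X[train_idx]), y[train_idx])
        accuracies.append(clf.score(scaler.transform(X[test_idx]), y[test_idx]))
    print(np.mean(accuracies))            # average accuracy over the 10 folds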
Adding the six new features defined in SLF7 (SLF7.79 to SLF7.84), we observed a 5% decrease in accuracy compared to SLF3 alone.18 Since all of the information present in SLF3 should be present in SLF7, the results suggested that the larger number of features interfered with the ability of the classifier to learn appropriate decision boundaries (since it required learning more network weights). This can be overcome by eliminating uninformative or redundant features using any of a variety of feature reduction methods. Our preliminary results for feature selection using stepwise discriminant analysis (SDA) showed anywhere from a 2% improvement (SLF5 versus SLF4) to a 12% improvement (SLF8 versus SLF7). Comparing the performance of SLF13 (which includes DNA features) with that of SLF8 (which does not) confirms the prior conclusion that including the DNA features provides an improvement of approximately 2%. Since feature selection improved classification accuracy in the previous experiments, we conducted a comparison of eight different feature reduction methods (described in Sec. 3.2) on the feature set SLF7 using the 2-D HeLa image set.29 To facilitate feature subset evaluation, a faster classifier, a multiclass support vector machine with a Gaussian kernel, was used to evaluate each of the resulting feature subsets using ten-fold cross-validation.29 Table 5 shows the results of the eight feature reduction methods. First, an accuracy improvement of about 11% was achieved simply by changing from the neural network classifier to the support vector machine classifier with the same feature set SLF7. Although the four feature selection methods generally performed better than the four feature recombination methods, only the genetic algorithm and SDA gave statistically better results than SLF7 alone. Considering overall accuracy and the running time required, the best performance among the eight methods was achieved by SDA. In subsequent work, we therefore used SDA as our feature selection method. SDA returns a set of features that are considered to discriminate between the classes at some specified confidence level, ranked in decreasing order of the F statistic. To determine how many of these to use for a specific classification task, we routinely train classifiers with sets of features where the i’th set consists of the first i features returned by SDA, and then choose the set giving the best performance.

Table 5
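A minimal sketch of this incremental evaluation: given a feature ranking, train a classifier on the first i features for each i and keep the best-performing set (assuming scikit-learn; the F-statistic ranking stands in for SDA, and the data are illustrative):

    import numpy as np
    from sklearn.feature_selection import f_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 84))
    y = rng.integers(0, 10, size=300)

    F, _ = f_classif(X, y)
    ranked = np.argsort(F)[::-1]          # SDA-style ranking by F statistic
    scores = [cross_val_score(SVC(kernel="rbf"), X[:, ranked[:i]], y, cv=10).mean()
              for i in range(1, X.shape[1] + 1)]
    best_i = int(np.argmax(scores)) + 1   # number of ranked features to keep
    print(best_i, scores[best_i - 1])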
To further improve the classification accuracy on the 2-D HeLa image set, we evaluated eight different classifiers, as described in Sec. 3.3, using the feature subsets SLF13 and SLF8 (the best feature subsets with and without DNA features, respectively). All parameters of these eight classifiers were considered adjustable, and the optimal ones were selected by ten-fold cross-validation. Since each classifier has its own constraints and suffers from overfitting given limited data, instead of choosing the optimal single classifier for each feature subset, we constructed an optimal majority-voting classifier ensemble by considering all possible combinations of the eight evaluated classifiers. The average performance of this majority-voting classifier was 3% higher than the neural network classifier for both SLF8 and SLF13 (Table 4). The features used to obtain the results described so far are of a variety of types chosen to capture different aspects of the protein patterns. To determine whether the performance could be improved further, we explored adding a large set of new features that might duplicate those already used, and employing SDA to find the most discriminative features. We therefore added 60 Gabor texture features and 30 Daubechies four-wavelet features, as described in the wavelet features paragraph of Sec. 3.1.1, to feature set SLF7. SDA was performed on the combined set with and without DNA features, and the ranked features were evaluated incrementally by using the optimal majority-voting classifiers for SLF13 and SLF8, respectively. This resulted in two new feature sets: SLF16, which contains the best 47 features selected from the entire feature set including DNA features, and SLF15, which contains the best 44 features selected from the entire feature set excluding DNA features. The same strategy of constructing the optimal majority-voting classifier was applied to these two new feature subsets. As seen in Table 4, the result was a small improvement in classification accuracy (to 92%), and the same accuracy was obtained with and without the DNA features (indicating that some of the new features captured approximately the same information). The results in Table 4 summarize extensive work to optimize the classification of protein patterns in 2-D images, but the overall accuracy does not fully capture the ability of the systems to distinguish similar patterns. This can be displayed using a confusion matrix, which shows the percentage of images known to be in one class that are assigned by the system to each of the classes (since all of the images were acquired from coverslips for which the antibody used was known, the “ground truth” is known). Table 6 shows such a matrix for the best system we have developed to date, the optimal majority-voting classifier using SLF16. Superimposed on that matrix are results for human classification of the same images.18 These results were obtained after computer-aided training and testing. The subject was a biologist who was well aware of cellular structure and organelle shape, but without prior experience in analyzing fluorescence microscope images. The training program displayed a series of randomly chosen images from each class and informed the subject of each image's class. During the testing phase, the human subject was asked to classify randomly chosen unseen images from each class, and the responses were recorded. The training and testing were repeated until the performance of the human subject stopped improving.
The final average performance across the ten location patterns was 83%, much lower than that of the automated system. Except for small improvements on a couple of classes, such as mitochondria and endosomes, the human classifier performed worse than the automated system, especially for the two closely related classes giantin and gpp130. The experiment indicates that a human classifier is unable to differentiate these two “visually indistinguishable” patterns, while our methods were able to provide over 80% differentiation.

Table 6
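A minimal sketch of building a row-normalized confusion matrix such as Table 6 from cross-validated predictions (assuming scikit-learn; the classifier and data are illustrative stand-ins):

    import numpy as np
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import cross_val_predict
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 44))
    y = rng.integers(0, 10, size=300)

    predicted = cross_val_predict(SVC(kernel="rbf"), X, y, cv=10)
    cm = confusion_matrix(y, predicted)
    percent = 100 * cm / cm.sum(axis=1, keepdims=True)  # rows = true class, in percent
    print(np.round(percent, 1))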
Classifying 3-D images. Given the encouraging results for classifying 2-D fluorescence microscope images, we extended the evaluation to 3-D fluorescence microscope images. The 3-D HeLa dataset we used contains 11 subcellular location patterns: the ten patterns in the 2-D HeLa dataset, plus a total protein (or “cytoplasmic”) pattern. For this dataset we first evaluated the neural network classifier with one hidden layer of 20 hidden nodes, using a new feature set, SLF9, modeled on the morphological features of SLF2.20 As shown in Table 7, the average accuracy over the 11 classes was 91% after 50 cross-validation trials, which was close to the best 2-D result. SLF9 contains morphological features derived from both the protein image and the parallel DNA image. To determine the value of the DNA features, the 14 features that require a parallel DNA image were removed from SLF9, and the remaining 14 features were defined as SLF14. The same neural network was trained using SLF14 on the 3-D HeLa image set, and the average accuracy achieved was 84%, 7% lower than for SLF9. The greater benefit from DNA features for 3-D images than for 2-D images could have at least two explanations. The first is that at least some of the nonmorphological features in the larger 2-D feature sets capture information that duplicates information available by reference to a DNA image; since only morphological features were used for the 3-D analysis, that information was not available without the DNA features. The second is that the DNA reference provides more information in 3-D space than in a 2-D plane.

Table 7
As before, we applied stepwise discriminant analysis to SLF9 and selected the best nine features to form the subset SLF10, for which 94% overall accuracy was achieved by the neural network classifier on the same image set.20 To further improve the classification accuracy, we employed the same strategy used for 2-D images by creating optimal majority-voting classifiers for both SLF10 and SLF14. Performance improvements of about 6% and 2% over the previously configured neural network classifier were observed for SLF14 and SLF10, respectively. The confusion matrix of the optimal majority-voting classifier for SLF10 on the 3-D HeLa image set is shown in Table 8. Compared to the confusion matrix in Table 6, the recognition rates of most location patterns were significantly improved. The two closely related patterns, giantin and gpp130, could now be distinguished over 96% of the time, 14% higher than the best 2-D results. This suggests that 3-D fluorescence microscope images do capture more information about protein subcellular distribution than 2-D images, even for unpolarized cells.

Table 8
Implications and cost-performance analysis. As discussed before, the three properties of a desirable automated image interpretation system are objectivity, accuracy, and speed. The first two properties have been demonstrated extensively, and we now turn to the computational time required for classifying images with our system. The time spent on each analysis task can be divided into three parts: image preprocessing, feature calculation, and final analysis. The preprocessing steps for both 2-D and 3-D images include segmentation, background subtraction, and thresholding. To calculate the cost of each feature set, we consider both the setup cost (a group of related features may share a common setup cost) and the incremental cost for each feature. Table 9 shows the times for typical classification tasks using various feature sets. Preprocessing of 2-D images needs fewer resources than the actual feature calculation. In contrast, the preprocessing step occupies the largest portion of the feature costs for 3-D images. The cost of training and testing a classifier depends largely on the implementation of the specific classifier. We therefore used a support vector machine with a Gaussian kernel function as an example classifier for each feature set; it performed reasonably well and was ranked as one of the top classifiers for each feature set. Comparing all three cost components, feature calculation dominates the classification task for 2-D images, and image preprocessing dominates that for 3-D images. Figure 6 displays the best performance of each feature set as a function of its computational cost. Using the feature set SLF13, we can expect to process about 8000 2-D fluorescence microscope images per day (six images per minute over 24 h) with approximately 92% average accuracy over ten major subcellular location patterns. Of course, the calculation of many of the features we have used can potentially be sped up dramatically by generating optimized, compiled code rather than using Matlab scripts.

Table 9
The approaches described here can be used as a roadmap for building automated systems to recognize essentially any combination of subcellular patterns in any cell type. We have described over 170 2-D features and 42 3-D features that can be used in combination with various feature selection and classification strategies. Classifying sets of images. Cell biologists rarely draw conclusions about protein subcellular location by inspecting an image of only a single cell. Instead, a conclusion is usually drawn by examining multiple cells from one or more slides. We can improve the overall classification accuracy of automated systems in a similar manner by classifying sets of images drawn from the same class using plurality voting.19 In theory, we should observe a much higher recognition rate given a classifier that performs reasonably well on individual images. Two factors influence the accuracy of this approach: the number of images in each set and the number of features used for classification. Increasing the set size should enhance the accuracy, so that a smaller set of features would suffice for essentially perfect classification. Conversely, given a larger set of good features, a smaller set size would be sufficient for accurate recognition. We have evaluated this tradeoff for the 2-D and 3-D HeLa datasets (Figs. 7 and 8). For each feature set, random sets of a given size were drawn from the test image set for a given classifier (all images in a set were drawn from the same class), and each image was classified using the optimal majority-voting classifier for that feature set. The class receiving the most votes was assigned to that random set. This process was repeated for 1000 trials for each class. The results showed that the smallest image set size for an overall 99% accuracy was seven 2-D images for SLF13 and five 3-D images for SLF10, respectively (Fig. 7). The fewest features achieving an average 99% accuracy for a ten-image set were the first nine features from SLF16 on 2-D images and the first six features from SLF10 on 3-D images, respectively (Fig. 8). The higher recognition rate for SLF10 on 3-D HeLa images explains both the smaller set size and the smaller number of features required for essentially perfect classification. This approach of using an imperfect single cell classifier to achieve nearly perfect accuracy on small sets of images is anticipated to be especially useful for classifying patterns in single wells via high-throughput microscopy.
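A minimal sketch of plurality voting over a set of single-cell predictions (assuming NumPy; the predictions are illustrative stand-ins for the output of a trained single-image classifier):

    import numpy as np

    def classify_set(image_predictions, n_classes=10):
        """Assign a whole image set the class receiving the most votes."""
        votes = np.bincount(image_predictions, minlength=n_classes)
        return int(votes.argmax())

    # e.g., seven images from one well, five of which were classified as class 3
    print(classify_set(np.array([3, 3, 1, 3, 3, 7, 3])))   # -> 3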
3.4.3. Unsupervised clustering of fluorescence microscope images

We have reviewed above the work on supervised learning of subcellular location patterns in a number of image sets taken from different types of cells with different microscopy methods. The results not only demonstrate the feasibility of training such systems for new patterns and cell types, but also demonstrate that the numerical features used are sufficient to capture the essential characteristics of protein patterns without being overly sensitive to cell size, shape, and orientation. The value of these features for learning known patterns suggests that they can also be valuable for analyzing patterns of proteins whose location is unknown (or not completely known). In this section, we describe results for such unsupervised clustering of fluorescence microscope images according to their location similarity. By definition, no ground truth is available for evaluating results from unsupervised clustering, and the goodness of clustering results can only be evaluated empirically. One of the most popular clustering algorithms is hierarchical clustering, which organizes the clusters in a tree structure. Hierarchical clustering is often conducted agglomeratively, by starting with all instances as separate clusters and merging the closest two clusters at each iteration until only one cluster is left. The distance between each cluster pair can be calculated using different measures, such as the Euclidean distance and the Mahalanobis distance (which normalizes for variation within each feature and correlation between features). An average-link agglomerative hierarchical clustering algorithm was first applied with SLF8 to the ten-class 2-D HeLa image set.50 Each class was represented by the mean feature vector calculated from all images in that class. Mahalanobis distances were computed between classes using their feature covariance matrix. The resulting tree (subcellular location tree) is shown in Fig. 9. This tree first groups giantin and gpp130, and then the endosome and lysosome patterns, the two pattern pairs most difficult to distinguish in supervised learning. Just as protein family trees have been created that group all proteins by their sequence characteristics,51 we can also create a subcellular location tree (SLT) that groups all proteins expressed in a certain cell type by their subcellular location. The data required to create comprehensive SLTs can be obtained from projects such as the CD-tagging project started a few years ago,13 16 the goal of which is to tag all possible genes in mouse 3T3 cells and collect fluorescence microscope images of the tagged proteins. Preliminary results on clustering 3-D images of the first 46 proteins to be tagged have been described.5 The approach used parallels that for classification: feature selection followed by selection of a clustering method. To select the optimal features for clustering, SDA was conducted starting from feature set SLF11 (which contains 42 3-D image features). For this purpose, each clone was considered to be a separate class, even though some clones might show the same location pattern. The rationale was that any feature that could distinguish any two clones would be ranked highly by SDA. To decide how many of the features returned by SDA to use, a neural network classifier with one hidden layer and 20 hidden nodes was used to measure overall classification accuracy for increasing numbers of the selected features (Fig. 10). The 10 to 14 best features selected by SDA give an overall accuracy close to 70% on the 46 proteins (since some of the clones may have the same pattern, we do not expect to achieve the same high accuracy that we obtained earlier, when the classes were known to be distinct). We therefore applied the agglomerative hierarchical clustering algorithm to the 3-D 3T3 image set using the first ten features selected from SLF11. The features were normalized to have zero mean and unit variance (z scores), and Euclidean distances between clones were computed from their mean feature vectors. The resulting SLT is shown in Fig. 11.
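A minimal sketch of this average-link agglomerative clustering of z-scored mean feature vectors, assuming SciPy and matplotlib; the data are illustrative stand-ins for per-clone mean SLF vectors:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    rng = np.random.default_rng(0)
    mean_vectors = rng.normal(size=(46, 10))   # one mean vector per tagged clone

    z = (mean_vectors - mean_vectors.mean(axis=0)) / mean_vectors.std(axis=0)
    tree = linkage(z, method="average", metric="euclidean")  # average-link merging
    dendrogram(tree)                           # renders the subcellular location tree
    plt.show()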
Evaluation of trees such as this can be difficult, since if the exact location of each protein were known, clustering would not be necessary. However, we can examine images from various branches of the tree to determine whether the results are at least consistent with visual interpretation. For example, two clusters of nuclear proteins can be seen in the tree: Hmga1-1, Hmga1-2, Unknown-9, Ewsh, and Hmgn2-1 in one, and Unknown-11, SimilarToSiahbp1, and Unknown-7 in another. By inspecting two example images selected from these two clusters, as shown in Fig. 12, it is apparent that the former cluster represents proteins localized exclusively in the nucleus, while the latter cluster represents proteins localized both in the nucleus and in the cytoplasm near the nucleus. This type of empirical comparison can heighten confidence that the tree represents an objective grouping of the location patterns.

3.4.4. Other important applications

The automated system described so far provides a validated converter that transforms the information on a protein's subcellular distribution in a digital image into a set of numbers (features) informative enough to replace the image itself. Many off-the-shelf statistical analysis tools can be applied directly to this numerical image representation and help us draw statistically sound conclusions about protein patterns. Typical image selection. An example is obtaining the most typical image from a set of fluorescence microscope images. Typical image selection is often encountered when a very small number of images must be selected from a large image collection. Traditionally, visual inspection is used, which is both subjective and not repeatable across different inspectors. We have described methods that provide an objective and biologically meaningful way of ranking the images in a collection by their typicality.52 The images in a collection can be represented as a group of multidimensional data points in the feature space. The centroid of this group can be calculated by taking the mean feature vector of all data points. Distances, such as Euclidean and Mahalanobis distances, can be computed between each data point and the centroid. All images in the collection can then be ranked by their distance to the centroid in the feature space, and the most typical image is the one at the top of the list.52 To obtain the most reliable centroid, we found outlier rejection to be very helpful, providing better results than other approaches. Various experiments on finding the most typical images in contaminated image sets showed that the Mahalanobis distance function performed better than the Euclidean distance function. Figure 13 shows results from one of the experiments. The most typical Golgi images are characterized by compact structure, while the least typical ones are characterized by dispersed structure. The biological explanation for this observation is that a normal Golgi complex goes through fragmentation prior to cell division, and therefore a minority of cells shows a dispersed pattern. The results illustrate the value of automated typicality analysis.
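A minimal sketch of typicality ranking by Mahalanobis distance to the centroid, as described above (assuming NumPy; outlier rejection is omitted, and the data are illustrative stand-ins):

    import numpy as np

    rng = np.random.default_rng(0)
    features = rng.normal(size=(98, 44))    # one SLF feature vector per image

    centroid = features.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(features, rowvar=False))
    diffs = features - centroid
    d2 = np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs)  # squared Mahalanobis distance
    ranking = np.argsort(d2)                # most typical images first
    print(ranking[:5])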
Image set comparison. Each fluorescence microscope image representing a certain subcellular location pattern is determined by two factors: the protein that is labeled and the environment under which the image is taken. Either factor can be used to infer changes in the other. For instance, various protein subcellular location patterns can be compared to each other given a fixed environment for all classes. On the other hand, we can compare the properties of various environments (such as the presence of drugs) given a fixed protein as the reference. In both scenarios, two sets of images taken under different conditions have to be compared. We have described an objective method to compare two image sets,53 which can be used in many practical applications, such as drug screening and target verification. Given our informative features, the task of comparing two image sets can be transformed into a statistical analysis comparing the two feature matrices computed from the two sets. The Hotelling T^2 test,54 which is the multivariate version of the t test, can be used to compare two feature matrices. As an illustration of the approach, we performed all pairwise comparisons of the ten-class 2-D HeLa set using feature set SLF6.53 Each comparison yielded an F value, which could be compared to a critical F value for a given significance level. All pairwise F values were larger than the critical F value for 95% confidence, and therefore all class pairs were considered statistically different (which is consistent with the observation that classifiers can be trained to distinguish all of them). The two pairs that gave the smallest F values were giantin with gpp130, and LAMP2 (lysosomes) with the transferrin receptor (endosomes), which is again consistent with the classification and clustering results described earlier. To confirm that the statistical test was not overly sensitive, we conducted two experiments. The first compared equal-sized sets randomly drawn from the same class, repeated 1000 times. Approximately 5% of the trials were judged statistically different, which is what is expected at a 95% confidence level. The second compared two sets of giantin images obtained using different labeling approaches, a rabbit antiserum and a mouse monoclonal antibody. The resulting F value was 1.04, less than the critical F value of 2.22 for 95% confidence. These two experiments confirmed that our methods correctly identify two sets drawn from the same pattern as indistinguishable, while still distinguishing sets drawn from patterns known to be different. As a further step, we can perform univariate t tests to inspect the contribution of each feature to the discrimination of two image sets. Table 10 shows the features found by univariate t tests to be most different between the giantin and gpp130 image sets. The distinction between these two sets could be largely attributed to the morphological features that describe the overall cell shape and object properties. Our objective image set comparison method can be applied in drug screening, where the best candidate drug would be the one that causes the most significant location change of a target protein. Conversely, the optimal target could be selected as the one that displays the largest location change for a known drug.

Table 10
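A minimal sketch of the two-sample Hotelling T^2 test used for image set comparison, in its standard textbook form (assuming NumPy/SciPy; the data are illustrative stand-ins for two SLF feature matrices):

    import numpy as np
    from scipy import stats

    def hotelling_t2(X1, X2):
        """Return the F statistic and p value for two feature matrices."""
        n1, p = X1.shape
        n2 = X2.shape[0]
        d = X1.mean(axis=0) - X2.mean(axis=0)
        # pooled covariance of the two sets
        S = ((n1 - 1) * np.cov(X1, rowvar=False)
             + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
        t2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(S, d)
        f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2   # T^2 converted to F
        return f, stats.f.sf(f, p, n1 + n2 - p - 1)

    rng = np.random.default_rng(0)
    print(hotelling_t2(rng.normal(size=(80, 10)), rng.normal(size=(90, 10))))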
4. Summary

In this review, we describe an image understanding system that encompasses image processing, classification, clustering, and statistical analysis of fluorescence microscope images. This system is an example of applying advanced computer vision and pattern recognition techniques to digital images generated by quantitative microscopy. An objective, accurate, and high-throughput system of this kind is necessary for reliable and robust image interpretation in biomedical optics applications. Our methods, combined with high-throughput imaging hardware, can be used to determine the subcellular location of every protein expressed in a given cell type, yielding the complete location tree needed for functional proteomics. The work described here only scratches the surface of what is possible with automated microscopy.

Acknowledgments

The original research reviewed here was supported in part by research grant RPG-95-099-03-MGO from the American Cancer Society; by grant 99-295 from the Rockefeller Brothers Fund Charles E. Culpeper Biomedical Pilot Initiative; by NSF grants BIR-9217091, MCB-8920118, and BIR-9256343; by NIH grants R01 GM068845 and R33 CA83219; and by a research grant from the Commonwealth of Pennsylvania Tobacco Settlement Fund. 3-D imaging of HeLa cells was made possible by the generous assistance of Dr. Simon Watkins. Author Huang was supported by a Graduate Fellowship from the Merck Computational Biology and Chemistry Program at Carnegie Mellon University, established by the Merck Company Foundation.

REFERENCES
G. MacBeath, “Protein microarrays and proteomics,” Nat. Genet. 32, 526–532 (2002).
P. Cutler, “Protein arrays: The current state-of-the-art,” Proteomics 3, 3–18 (2003).
A. Sali, R. Glaeser, T. Earnest, and W. Baumeister, “From words to literature in structural proteomics,” Nature (London) 422, 216–225 (2003).
S. Ghaemmaghami, W. K. Huh, K. Bower, R. W. Howson, A. Belle, N. Dephoure, E. K. O’Shea, and J. S. Weissman, “Global analysis of protein expression in yeast,” Nature (London) 425, 737–741 (2003).
X. Chen, M. Velliste, S. Weinstein, J. W. Jarvik, and R. F. Murphy, “Location proteomics—Building subcellular location trees from high resolution 3D fluorescence microscope images of randomly-tagged proteins,” Proc. SPIE 4962, 298–306 (2003).
A. Kumar, S. Agarwal, J. A. Heyman, S. Matson, M. Heidtman, S. Piccirillo, L. Umansky, A. Drawid, R. Jansen, Y. Liu, K. H. Cheung, P. Miller, M. Gerstein, G. S. Roeder, and M. Snyder, “Subcellular localization of the yeast proteome,” Genes Dev. 16, 707–719 (2002).
J. C. Simpson, R. Wellenreuther, A. Poustka, R. Pepperkok, and S. Wiemann, “Systematic subcellular localization of novel proteins identified by large-scale cDNA sequencing,” EMBO Rep. 1, 287–292 (2000).
I. T. Young, P. W. Verbeek, and B. H. Mayall, “Characterization of chromatin distribution in cell nuclei,” Cytometry 7, 467–474 (1986).
K. C. Strasters, A. W. M. Smeulders, and H. T. M. van der Voort, “3-D texture characterized by accessibility measurements, based on the grey weighted distance transform,” Bioimaging 2, 1–21 (1994).
P. Andrews, I. Harper, and J. Swedlow, “To 5D and beyond: Quantitative fluorescence microscopy in the postgenomic era,” Traffic 3, 29–36 (2002).
D. J. Stephens and V. J. Allan, “Light microscopy techniques for live cell imaging,” Science 300, 82–86 (2003).
J. H. Price, A. Goodacre, K. Hahn, L. Hodgson, E. A. Hunter, S. Krajewski, R. F. Murphy, A. Rabinovich, J. C. Reed, and S. Heynen, “Advances in molecular labeling, high throughput imaging and machine intelligence portend powerful functional cellular biochemistry tools,” J. Cell Biochem. Suppl. 39, 194–210 (2003).
J. W. Jarvik, S. A. Adler, C. A. Telmer, V. Subramaniam, and A. J. Lopez, “CD-tagging: A new approach to gene and protein discovery and analysis,” BioTechniques 20, 896–904 (1996).
M. M. Rolls, P. A. Stein, S. S. Taylor, E. Ha, F. McKeon, and T. A. Rapoport, “A visual screen of a GFP-fusion library identifies a new type of nuclear envelope membrane protein,” J. Cell Biol. 146, 29–44 (1999).
A. Kumar, K. H. Cheung, P. Ross-Macdonald, P. S. R. Coelho, P. Miller, and M. Snyder, “TRIPLES: a database of gene function in Saccharomyces cerevisiae,” Nucleic Acids Res. 28, 81–84 (2000).
J. W. Jarvik, G. W. Fisher, C. Shi, L. Hennen, C. Hauser, S. Adler, and P. B. Berget, “In vivo functional proteomics: Mammalian genome annotation using CD-tagging,” BioTechniques 33, 852–867 (2002).
C. A. Telmer, P. B. Berget, B. Ballou, R. F. Murphy, and J. W. Jarvik, “Epitope tagging genomic DNA using a CD-tagging Tn10 minitransposon,” BioTechniques 32, 422–430 (2002).
R. F. Murphy, M. Velliste, and G. Porreca, “Robust numerical features for description and classification of subcellular location patterns in fluorescence microscope images,” J. VLSI Sig. Proc. 35, 311–321 (2003).
M. V. Boland and R. F. Murphy, “A neural network classifier capable of recognizing the patterns of all major subcellular structures in fluorescence microscope images of HeLa cells,” Bioinformatics 17, 1213–1223 (2001).
A. Khotanzad and Y. H. Hong, “Invariant image recognition by Zernike moments,” IEEE Trans. Pattern Anal. Mach. Intell. PAMI-12, 489–497 (1990).
M. V. Boland, M. K. Markey, and R. F. Murphy, “Automated recognition of patterns characteristic of subcellular structures in fluorescence microscopy images,” Cytometry 33, 366–375 (1998).
R. M. Haralick, “Statistical and structural approaches to texture,” Proc. IEEE 67, 786–804 (1979).
S. G. Mallat, “A theory for multiresolution signal decomposition: the wavelet representation,” IEEE Trans. Pattern Anal. Mach. Intell. PAMI-11, 674–693 (1989).
I. Daubechies, “Orthonormal bases of compactly supported wavelets,” Commun. Pure Appl. Math. 41, 909–996 (1988).
J. D. Daugman, “Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression,” IEEE Trans. Acoust., Speech, Signal Process. 36, 1169–1179 (1988).
B. S. Manjunath and W. Y. Ma, “Texture features for browsing and retrieval of image data,” IEEE Trans. Pattern Anal. Mach. Intell. 18, 837–842 (1996).
K. Huang, M. Velliste, and R. F. Murphy, “Feature reduction for improved recognition of subcellular location patterns in fluorescence microscope images,” Proc. SPIE 4962, 307–318 (2003).
B. Scholkopf, A. Smola, and K. R. Muller, “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Comput. 10, 1299–1319 (1998).
A. Hyvärinen, “Fast and robust fixed-point algorithms for independent component analysis,” IEEE Trans. Neural Netw. 10, 626–634 (1999).
J. Yang and V. Honavar, “Feature subset selection using a genetic algorithm,” IEEE Intell. Syst. 13, 44–49 (1998).
C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn. 20, 273–297 (1995).
J. Platt, N. Cristianini, and J. Shawe-Taylor, “Large margin DAGs for multiclass classification,” Adv. Neural Inf. Process. Syst. 12, 547–553 (2000).
Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” J. Comput. Syst. Sci. 55, 119–139 (1997).
R. E. Schapire and Y. Singer, “Improved boosting algorithms using confidence-rated predictions,” Mach. Learn. 37, 297–336 (1999).
R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, “Adaptive mixtures of local experts,” Neural Comput. 3, 79–87 (1991).
D. A. Agard, “Optical sectioning microscopy: Cellular architecture in three dimensions,” Annu. Rev. Biophys. Bioeng. 13, 191–219 (1984).
T. W. Ridler and S. Calvard, “Picture thresholding using an iterative selection method,” IEEE Trans. Syst. Man Cybern. SMC-8, 630–632 (1978).
A. Bateman, E. Birney, R. Durbin, S. R. Eddy, K. L. Howe, and E. L. Sonnhammer, “The Pfam protein families database,” Nucleic Acids Res. 28, 263–266 (2000).
M. K. Markey, M. V. Boland, and R. F. Murphy, “Towards objective selection of representative microscope images,” Biophys. J. 76, 2230–2237 (1999).
E. J. S. Roques and R. F. Murphy, “Objective evaluation of differences in protein subcellular distribution,” Traffic 3, 61–65 (2002).
K. Huang and R. F. Murphy, “Boosting accuracy of automated classification of fluorescence microscope images for location proteomics,” BMC Bioinformatics 5, 78 (2004).