Sequential Clustering: With Applications to Microarray

Chien-Cheng Tseng
Department of Biostatistics
Harvard University

Monday, February 10, 2003
3:30 PM
Moos 2-620
Minneapolis Campus

Abstract:

Microarrays are a powerful tool that lets biologists screen thousands of genes simultaneously. They monitor gene expression patterns under different conditions and serve as an exploratory tool to guide further biological experiments. Many clustering methods have been applied to array data to group genes with similar expression patterns. However, none of them addresses a basic feature of array data: many genes are sporadic and do not belong to any of the significant biological functions (clusters) being detected. Most current algorithms force every gene into a cluster, even though biologists are eager to find only the tightest and most stable clusters, of about 20-60 genes, for follow-up. Our novel sequential clustering method aims to address this problem. First, improved initial values obtained by truncating a hierarchical tree early are used in the K-means algorithm to help avoid local minima. Then the tightest and most stable clusters are identified sequentially by a resampling approach. We validate the method on simulated data containing 14 clusters with sporadic points and on an expression profile of the Drosophila life cycle. The results show that the method is better suited to the real needs of biologists.
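
The initialization step described in the abstract can be illustrated with a minimal sketch. It assumes centroid linkage for the hierarchical tree, squared Euclidean distance, and a tree cut at exactly k clusters; the abstract specifies none of these choices, and the function names hierarchical_seeds and kmeans are hypothetical. The resampling step is not sketched because the abstract does not give its details.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def hierarchical_seeds(points, k):
    """Agglomerate points bottom-up and stop once k clusters remain.

    Assumption: centroid linkage (merge the two clusters whose centroids
    are closest). Returns the k cluster centroids for use as K-means seeds.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [centroid(c) for c in clusters]


def kmeans(points, centers, iters=50):
    """Standard Lloyd iterations started from the supplied centers.

    Returns the centers and the point groups from the last assignment pass.
    """
    groups = []
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [centroid(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups
```

Calling kmeans(points, hierarchical_seeds(points, k)) starts K-means from the truncated-tree centroids rather than from random points, which is the idea the abstract gives for avoiding poor local minima. The pairwise search makes this sketch far too slow for thousands of genes; it is meant only to show the structure of the step.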