PE Anderson, JQ Smith, KD Edwards and AJ Millar
Guided Conjugate Bayesian Clustering for Uncovering Rhythmically Expressed Genes
Abstract: An increasing number of microarray experiments produce time series of expression levels for many genes. Some recent clustering algorithms respect the time ordering of the data and are, importantly, extremely fast. This paper develops such an algorithm for a microarray data set consisting of 22,810 genes of the plant Arabidopsis thaliana measured at 13 time points over two days. Circadian rhythms control the timing of various physiological and metabolic processes and are regulated by genes acting in feedback loops. The aim is to cluster and classify the expression profiles in order to identify genes potentially involved in, and regulated by, the circadian clock.
Results: A greedy search over time series of expression levels (where series are compared pairwise, the two most similar are put in the same cluster, and so forth) returns a result quickly but explores only a very limited number of the possible partitions of the profiles. We propose an improved, deterministic method based on a multi-step application of a conjugate Bayesian clustering algorithm. It allows the partition space to be searched more fully and intelligently. The values of the summary statistics are used not only to score clusters of genes, but also to guide the search of the vast partition space. By following this procedure, we are able to cluster genes that are known to be rhythmically expressed with genes of previously unknown function, thus suggesting potentially interesting targets for future experiments.
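To make the baseline concrete, the greedy pairwise search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the model or the guided multi-step method of the paper: each cluster is scored by the log marginal likelihood of a deliberately simple conjugate model (independent Normal observations at each time point with a known noise variance and a zero-mean Normal prior on the cluster mean). The constants `SIGMA2` and `TAU2`, the function names, and the stopping rule (stop when no merge raises the score) are all hypothetical choices made for the example. The score depends on the data only through per-cluster summary statistics (count, sums, sums of squares), which is what makes a search of this kind fast.

```python
import math
from itertools import combinations

SIGMA2 = 0.25  # assumed known noise variance (hypothetical value)
TAU2 = 4.0     # assumed prior variance of the cluster mean (hypothetical value)


def stats(profile):
    """Summary statistics of one profile: (count, per-time sums, per-time sums of squares)."""
    return (1, list(profile), [y * y for y in profile])


def merge_stats(a, b):
    """Summary statistics of the union of two clusters; no raw data is revisited."""
    return (a[0] + b[0],
            [x + y for x, y in zip(a[1], b[1])],
            [x + y for x, y in zip(a[2], b[2])])


def log_marginal(s):
    """Log marginal likelihood of a cluster under the simple Normal-Normal model.

    At each time point the n observations are jointly Gaussian with mean 0 and
    covariance SIGMA2 * I + TAU2 * (matrix of ones), whose determinant and
    inverse have closed forms in terms of the summary statistics.
    """
    n, sums, squares = s
    total = 0.0
    for sy, syy in zip(sums, squares):
        quad = (syy - TAU2 * sy * sy / (SIGMA2 + n * TAU2)) / SIGMA2
        logdet = (n - 1) * math.log(SIGMA2) + math.log(SIGMA2 + n * TAU2)
        total += -0.5 * (n * math.log(2 * math.pi) + logdet + quad)
    return total


def greedy_cluster(profiles):
    """Repeatedly merge the pair of clusters with the largest score gain."""
    clusters = {i: ([i], stats(p)) for i, p in enumerate(profiles)}
    while len(clusters) > 1:
        best_gain, best_pair = 0.0, None
        for a, b in combinations(sorted(clusters), 2):
            sa, sb = clusters[a][1], clusters[b][1]
            gain = log_marginal(merge_stats(sa, sb)) - log_marginal(sa) - log_marginal(sb)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no merge improves the score: stop
            break
        a, b = best_pair
        clusters[a] = (clusters[a][0] + clusters[b][0],
                       merge_stats(clusters[a][1], clusters[b][1]))
        del clusters[b]
    return sorted(sorted(members) for members, _ in clusters.values())


if __name__ == "__main__":
    # Toy data: two profiles peaking early and two peaking late (made-up values).
    toy = [
        [0.0, 1.0, 0.0, -1.0],
        [0.1, 0.9, 0.1, -1.1],
        [0.0, -1.0, 0.0, 1.0],
        [-0.1, -0.9, 0.1, 1.0],
    ]
    print(greedy_cluster(toy))
```

Because every step commits to the single best pairwise merge and never revisits it, the sketch visits only one path through the partition space, which is the limitation the proposed method is designed to address.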