
Paper No. 08-22


S Liverani, PE Anderson, KD Edwards, AJ Millar and JQ Smith

Efficient Utility-based Clustering over High Dimensional Partition Spaces

Abstract: Because even a moderately sized dataset admits a huge number of partitions, a comprehensive search for the highest-scoring (MAP) partition is usually impossible, even when Bayes factors have a closed form. However, when each cluster in a partition has a signature, and it is known that some signatures are of scientific interest whilst others are not, it is possible, within a Bayesian framework, to develop search algorithms that are guided by these cluster signatures. Such algorithms can be expected to find better partitions more quickly. In this paper we develop a framework within which these ideas can be formalised. We then illustrate the proposed guided search on a microarray time-course data set, where the clustering objective is to identify clusters of genes with different types of circadian expression profiles.
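To give a sense of scale for the opening claim: the number of ways to partition n items into non-empty clusters is the Bell number B_n. The short Python sketch below is not taken from the paper; it simply computes B_n with the Bell triangle, and the function name `bell_numbers` is our own illustrative choice.

```python
def bell_numbers(n_max):
    """Return [B_0, B_1, ..., B_n_max], where B_n counts the partitions of n items.

    Uses the Bell triangle: each row starts with the last entry of the previous
    row, and each further entry is the previous entry in the same row plus the
    entry above it in the previous row. The first entry of row n is B_n.
    """
    bells = [1]  # B_0 = 1: the empty set has exactly one partition
    row = [1]
    for _ in range(n_max):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


if __name__ == "__main__":
    counts = bell_numbers(100)
    print(counts[10])  # partitions of just 10 items: 115975
    # Python integers are unbounded, so large B_n can be computed exactly.
    print("B_100 has", len(str(counts[100])), "decimal digits")
```

Because B_n grows faster than exponentially in n, exhaustively scoring every partition is out of reach long before a data set reaches the thousands of genes typical of microarray studies, which is the motivation for guiding the search rather than enumerating it.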