Event Diary

CRiSM Seminar - Paul Kirk (BSU, Cambridge) (C1.06)

Location: C1.06, Zeeman Building

Title: Semi-supervised multiview clustering for high-dimensional data

Abstract: Although the challenges presented by high-dimensional data in the context of regression are well known and the subject of much current research, comparatively little work has been done on these challenges in the context of clustering. In this setting, the key challenge is that often only a small subset of the covariates provides a relevant stratification of the population. Identifying relevant strata can be particularly challenging when dealing with high-dimensional datasets, in which there may be many covariates that provide no information whatsoever about population structure, or – perhaps worse – in which there may be (potentially large) covariate subsets that define irrelevant stratifications. For example, when dealing with genetic data, there may be some genetic variants that allow us to group patients in terms of disease risk, but others that would provide completely irrelevant stratifications (e.g. variants that would group patients together on the basis of eye or hair colour). Bayesian profile regression is a semi-supervised model-based clustering approach that makes use of a response in order to guide the clustering toward relevant stratifications. Here we consider how this approach can be extended to the "multiview" setting, in which different groups of covariates ("views") define different stratifications. We also present a heuristic alternative and some preliminary results in the context of breast cancer subtyping, and consider how the approach could also be used to integrate different 'omics datasets (assuming that each dataset provides measurements on a common set of individuals).
