CRiSM Seminar - Deep learning in genomics, and a topic model for single cell analysis - Gerton Lunter
In recent decades, biology has become an increasingly data-rich science. In parallel, the field of Machine Learning (ML) has made remarkable progress in modeling large and complex data sets. This suggests applying ML techniques to problems in biology. In the first half of this talk I will show how ML methods can predict various intermediate phenotypes from sequence, including splicing and chromatin state. The double-stranded nature of DNA means that these models often exhibit reverse-complement symmetry, and we found that explicitly dealing with this structure improves the quality of the model.
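As a rough illustration of what reverse-complement symmetry means for a sequence model, the sketch below one-hot encodes a DNA string and makes an arbitrary scoring function strand-symmetric by averaging its output over both strands. This is only one simple way to impose the symmetry and is not necessarily the approach taken in the talk. The names `one_hot`, `reverse_complement`, `strand_symmetric` and `toy_score` are hypothetical, and the random linear filter merely stands in for a trained model.

```python
import numpy as np

BASES = "ACGT"  # channel order chosen so that reversing it swaps A<->T and C<->G


def one_hot(seq):
    """Encode a DNA string as a (length, 4) array with columns ordered A, C, G, T."""
    idx = np.array([BASES.index(b) for b in seq])
    return np.eye(4)[idx]


def reverse_complement(x):
    """Reverse the position axis and the channel axis of a one-hot sequence."""
    return x[::-1, ::-1]


def strand_symmetric(score):
    """Wrap a scoring function so that both strands receive the same score."""
    def wrapped(x):
        return 0.5 * (score(x) + score(reverse_complement(x)))
    return wrapped


# Hypothetical stand-in for a trained model: a fixed random linear filter.
rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 4))


def toy_score(x):
    return float(np.sum(weights * x))


symmetric_score = strand_symmetric(toy_score)

x = one_hot("AACGTGCT")
x_rc = reverse_complement(x)  # one-hot encoding of "AGCACGTT"
```

With this wrapper, `symmetric_score(x)` and `symmetric_score(x_rc)` agree, whereas `toy_score` generally gives different values for the two strands. Architectures that share weights between a filter and its reverse complement build in the same property during training rather than at prediction time.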
For training we use data from whole-genome epigenetic assays across a range of tissue types. To address tissue heterogeneity, the field is moving towards single-cell assays, and document topic models such as Latent Dirichlet Allocation (LDA) have been used to identify the specific cell types. In the second half of the talk I will discuss recent work in which we extend LDA to allow topic usage to be positively correlated across cell types, which we hope will improve model fit and increase the sensitivity of detecting rare cell types.