Aim: This course applies methods from probability, statistical theory, and stochastic processes to problems of interest to bioinformaticians and systems biologists, mainly in biosequence analysis. Students who complete the course are expected to have mastered the basic ideas needed to carry out further research in bioinformatics methods and algorithms, or to apply these ideas in industry.
Syllabus:
Random variables – distributions and expectation: - Discrete random variables, moment and probability generating functions - Continuous random variables - Chebyshev’s inequality
Multivariate distributions: - Marginal and conditional distributions - Covariance and correlation - Extreme value distributions
Statistical inference: - Classical estimation and hypothesis testing - Bootstrap methods
Single DNA sequence analysis: - Signal modelling - Pattern analysis
Multiple DNA/protein sequence analysis: - Detailed study of pairwise alignment algorithms and substitution matrices
BLAST: - A detailed study of the algorithm and its underlying theory
Markov chains and algorithms: - Convergence to a stationary distribution
Hidden Markov models: - Forward-backward algorithm and parameter estimation - Applications to protein family modelling, sequence alignment and gene finding
Gene expression, microarrays and multiple testing: - Differential expression – one gene and multiple genes
Illustrative Bibliography:
Statistical Methods in Bioinformatics: An Introduction, by W. J. Ewens and G. R. Grant (Springer-Verlag New York, 2001)
Biological Sequence Analysis, by R. Durbin, S. Eddy, A. Krogh and G. Mitchison (Cambridge University Press, 1998), ISBN 0 521 62971 3
Computational Genome Analysis: An Introduction, by Richard C. Deonier, Simon Tavaré and Michael S. Waterman (Springer-Verlag, 2005)