Module Information
Aim: This course applies methods from probability, statistical theory, and stochastic processes to problems of interest to bioinformaticians and systems biologists, mainly in the area of biosequence analysis. Students who complete the course are expected to have mastered the core ideas needed to carry out further research in bioinformatics methods and algorithms, or to apply these ideas in industry.
Syllabus:
- Random variables – distributions and expectation:
  - Discrete random variables, moment and probability generating functions
  - Continuous random variables
  - Chebyshev’s inequality
- Multivariate distributions:
  - Marginal and conditional distributions
  - Covariance and correlation
  - Extreme value distributions
- Statistical inference:
  - Classical estimation and hypothesis testing
  - Bootstrap methods
- Stochastic processes:
  - Poisson process
  - Finite Markov chains
  - Graphical representation
  - Random walks
- Single DNA sequence analysis:
  - Signal modelling
  - Pattern analysis
- Multiple DNA/protein sequence analysis:
  - Detailed study of pairwise alignment algorithms and substitution matrices
- BLAST:
  - Detailed study of the algorithm and underlying theory
- Markov chains and algorithms:
  - Convergence to a stationary distribution
- Hidden Markov models:
  - Forward-backward algorithm and parameter estimation
  - Applications to protein family modelling, sequence alignment and gene finding
- Gene Expression, Microarrays and Multiple Testing:
  - Differential expression – one gene and multiple genes
Illustrative Bibliography:
- Statistical Methods in Bioinformatics: An Introduction, by W. J. Ewens and G. R. Grant (Springer-Verlag, New York, 2001)
- Biological Sequence Analysis, by R. Durbin, S. Eddy, A. Krogh and G. Mitchison (Cambridge University Press, 1998), ISBN 0 521 62971 3
- Computational Genome Analysis: An Introduction, by Michael S. Waterman, Simon Tavaré and Richard C. Deonier (Springer-Verlag, 2005)
Lecture Notes