
Statistical Bioinformatics (BS915)

Module Information

Aim: This course presents the application of methods from probability, statistical theory, and stochastic processes to problems of interest to bioinformaticians and systems biologists, mainly in the area of biosequence analysis. Students who complete the course are expected to have mastered the basic ideas needed to carry out further research in bioinformatics methods and algorithms, or to apply these ideas in industry.


Syllabus:

  • Random variables – distributions and expectation:
    - Discrete random variables, moment and probability generating functions
    - Continuous random variables
    - Chebyshev’s inequality
  • Multivariate distributions:
    - Marginal and conditional distributions
    - Covariance and correlation
    - Extreme value distributions
  • Statistical inference:
    - Classical estimation and hypothesis testing
    - Bootstrap methods
  • Stochastic processes:
    - Poisson process
    - Finite Markov chains
    - Graphical representation
    - Random walks
  • Single DNA sequence analysis:
    - Signal modelling
    - Pattern analysis
  • Multiple DNA/protein sequence analysis:
    - Detailed study of pairwise alignment algorithms and substitution matrices
  • BLAST:
    - A detailed study of the algorithm and underlying theory
  • Markov chains and algorithms:
    - Convergence to a stationary distribution
  • Hidden Markov models:
    - Forward-Backward algorithm and parameter estimation
    - Applications to protein family modelling, sequence alignment and gene finding
  • Gene expression, microarrays and multiple testing:
    - Differential expression – one gene and multiple genes

 

Illustrative Bibliography:

  1. Statistical Methods in Bioinformatics: An Introduction, by W. J. Ewens and G. R. Grant, Springer-Verlag New York, 2001
  2. Biological Sequence Analysis, by R. Durbin, S. Eddy, A. Krogh and G. Mitchison, Cambridge University Press, 1998, ISBN 0 521 62971 3
  3. Computational Genome Analysis: An Introduction, by Richard C. Deonier, Simon Tavaré and Michael S. Waterman, Springer-Verlag, 2005


     


Lecturer: David Wild

This module is aimed at DTC students with a mathematical background. The alternative module is CH923.