Statistics for Data Analysis (CH923)

Module proposal information

Aim: To introduce students to statistical (and bioinformatic) tools for summarising and analysing data in the life sciences.

Syllabus:

Underlying Key Core Topics:

The importance of statistics and probability in quantitative scientific (life sciences) research: Introduction and motivation
Basic notions of Probability: Events and probabilities; Intersections, unions and independence; Conditional probabilities; Bayes' theorem; Combinatorics
Summarising data: Types of data; Exploratory data analysis - Graphical tools, Summary statistics; Common statistical distributions and their properties - Normal distribution (including the central limit theorem), Binomial distribution, Poisson distribution; Estimates and confidence intervals – point estimates for Normal, Binomial and Poisson distributions, confidence intervals for the mean and for the difference between two means (including the t-distribution)
Statistical computing: Statistics with spreadsheets such as Excel; Choice of statistical packages - GenStat, R and Minitab
Testing hypotheses: Concept and language; Construction of a simple likelihood ratio test; Student’s t-test for comparing means – one-sample, two-sample, paired sample, power; F-test for comparing variances; Chi-square test for association
Simple analyses of continuous data: From a two-sample t-test to one-way analysis of variance (ANOVA); Finding the best fitting line; Comparison of the approaches
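As an illustration of the step from a two-sample t-test to one-way ANOVA listed above, the sketch below computes both statistics for two groups. With a pooled variance estimate, the ANOVA F statistic equals the square of the t statistic. Python (standard library only) is an assumption made for illustration, since the module itself uses GenStat, R or Minitab; the function names and data values are hypothetical.

```python
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample Student's t statistic using a pooled variance estimate."""
    nx, ny = len(x), len(y)
    # Pooled estimate of the common variance (statistics.variance divides by n - 1).
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / (sp2 * (1 / nx + 1 / ny)) ** 0.5


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements from two treatment groups.
control = [5.1, 4.9, 5.6, 5.8, 6.0]
treated = [6.2, 6.8, 6.1, 7.0, 6.5]

t = pooled_t(control, treated)
f = one_way_anova_f(control, treated)
print(f"t = {t:.3f}, t^2 = {t * t:.3f}, F = {f:.3f}")  # t^2 and F agree
```

With three or more groups, `one_way_anova_f` applies unchanged, which is the sense in which one-way ANOVA generalises the two-sample t-test.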

More Specialised Core Topics:

The basics of experimental design: Main principles of good experimental design – Replication, Randomisation, Blocking and Representativeness; Separation of plot and treatment structure; Choice of treatments and treatment structure
Analysing designed experiments: Analysis of Variance (ANOVA) and testing assumptions
Relationships between variables: Calibration and regression; Finding the best fitting line; Comparison of regression lines; Multiple linear regression; Common non-linear regression models
Analyses for non-Normal data – counts and proportions: log-linear models for counts; log-linear models for contingency tables; logit or probit analysis for proportions
Multivariate analysis: Data structure - the basic data matrix; principal component analysis; discriminant analysis, canonical variates and multivariate analysis of variance; principal coordinates and cluster analysis; multidimensional scaling
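For the calibration and regression topics listed above, the following is a minimal sketch of fitting the best-fitting (ordinary least-squares) straight line and then using it in reverse to estimate an unknown x from a measured response, as with a simple calibration curve. Python and the helper names are assumptions for illustration, and the data are hypothetical. The sketch omits the uncertainty of the inverse prediction, which a full calibration analysis would report.

```python
from statistics import mean


def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def inverse_predict(y0, intercept, slope):
    """Estimate the x value that would give response y0 on the fitted line."""
    return (y0 - intercept) / slope


# Hypothetical calibration standards: concentration (x) vs instrument signal (y).
conc = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [0.05, 0.98, 2.03, 2.96, 4.01]

a, b = least_squares_line(conc, signal)
print(f"signal = {a:.3f} + {b:.3f} * conc")
print(f"unknown with signal 2.5 has conc ~ {inverse_predict(2.5, a, b):.3f}")
```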

Specialised Topics:

The nature of measurement: Random and systematic errors; repeatability and reproducibility; detection limits; blank correction; propagation of error
Sequence comparisons: Pairwise alignment – substitution matrices, global and local alignments; Multiple alignments; Database searching – BLAST and FASTA.
Sampling and quality control: Basic concepts of sampling; random and stratified sampling; Shewhart and cusum charts
More experimental design: More on factorial structure – 2^n and 3^n designs; response surfaces; Taguchi methods
Statistical approaches for microarray experiments: Experimental design; Data collection; Image analysis using standard software; 1-colour or 2-colour arrays? Normalisation and Variance-Mean relationships; Analysis approaches; Clustering methods; Other multivariate tools
Multivariate calibration and regression: more on multiple regression - principal component regression, partial least squares and other multivariate methods
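For the Shewhart and cusum charts listed under sampling and quality control, the following is a minimal sketch of a two-sided tabular cusum scheme. Upper and lower cumulative sums of deviations from a target value are accumulated, each floored at zero, and a signal is raised when either exceeds a decision limit h. The reference value k and the limit h are usually chosen in units of the process standard deviation. The values, data and function name below are hypothetical, and Python is again an assumption for illustration.

```python
def tabular_cusum(values, target, k, h):
    """Two-sided tabular cusum.

    Returns the index of the first observation at which either cumulative
    sum exceeds the decision limit h, or None if no signal occurs.
    """
    upper = lower = 0.0
    for i, x in enumerate(values):
        # Accumulate deviations above target + k (detects an upward shift).
        upper = max(0.0, upper + (x - target - k))
        # Accumulate deviations below target - k (detects a downward shift).
        lower = max(0.0, lower + (target - k - x))
        if upper > h or lower > h:
            return i
    return None


# Hypothetical process readings that drift upwards from the fourth reading on.
readings = [10.2, 9.8, 10.1, 11.0, 11.5, 11.8, 12.0]
print(tabular_cusum(readings, target=10.0, k=0.5, h=4.0))  # prints 6
```

Because small deviations keep accumulating, a cusum tends to detect a sustained small shift sooner than a Shewhart chart that judges each point on its own.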

Illustrative Bibliography:

R.R. Sokal, F.J. Rohlf. Biometry. W.H. Freeman, 1995
R. Mead, R.N. Curnow, A.M. Hasted. Statistical Methods in Agriculture and Experimental Biology. Chapman & Hall, 2002
J.N. Miller, J.C. Miller. Statistics and chemometrics for analytical chemistry. Pearson/Prentice Hall, 2005
R.G. Brereton. Applied Chemometrics for Scientists. Wiley, 2007
G.W. Snedecor, W.G. Cochran. Statistical Methods. Iowa State, 1989
G. Grimmett, D. Stirzaker. Probability and Random Processes. Oxford University Press, 2001.
R. Durbin, S.R. Eddy, A.S. Krogh, G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998
L. J. Bain, M. Engelhardt. Introduction to Probability and Mathematical Statistics. (Duxbury Classic Series), Wadsworth, 2000
W. Ewens, G. Grant. Statistical Methods in Bioinformatics: An Introduction. (Statistics for Biology and Health) Springer-Verlag, 20
D. E. Krane, M. L. Raymer. Fundamental Concepts of Bioinformatics. Benjamin Cummings, 2003

Module Timetable

An outline timetable for the module for 2010/11 is available.

Lecture Notes

Lecture notes are available to students on the current course. Please note that they are provided solely for personal study. They should not be distributed to anyone else or made public in any other way.

Statistical Computing

Online notes providing an introduction to R are available. R is a statistical programming language that you can use to apply statistical approaches to your data.

GenStat for Teaching is an alternative statistical computing package that can be downloaded from http://www.vsni.co.uk/software/genstat-teaching/

Minitab is another alternative, available via the Warwick Tree.

Assignments

Information and data files for assignments will also be provided online.

Taught by:
Andrew Mead (Life Sciences)

and:

John Fenlon (Statistics)

MOAC and Systems Biology DTC students with a suitable background may instead be allowed to take module BS915, which is aimed at Systems Biology DTC students with a mathematical/statistical background.