# Biometry: the application of statistical and mathematical methods to Biosciences

Introduction:

Statistics is the science of collecting and interpreting data subject to variability. Within modern plant biosciences there is variability associated with every aspect of our work. To make informed decisions we must understand the nature of this variability and account for it, so that we make the best use of the data we collect. Statistics tells us how to deal with variability, and how to collect and use data so that we can make good decisions.

A good understanding of a range of statistical techniques is essential for the modern biological scientist. Of particular importance for all areas of plant biosciences is an appreciation of the value and principles of sound experimental design and/or sampling strategies. Whilst there are a number of user-friendly statistical packages providing easy access to most of the commonly used statistical analysis tools, the modern biological scientist needs to be aware of the types of data for which these tools can be used, and the alternative approaches that can be used in particular circumstances. An understanding of how to interpret and present the results produced by the more common statistical techniques is also essential.

Objectives:

This module is designed to introduce statistical ideas for data collection and analysis that are suitable for the modern biological scientist. The aims of the module are to revise basic statistical ideas for data summary, provide students with an overview of the statistical tools necessary to design efficient experiments and surveys, and introduce the range of statistical techniques available for the appropriate analysis of the data generated. Real data examples, taken from the other modules of the MSc in “Plant Biosciences for Crop Production”, will be used to illustrate the use of the statistical methods and the interpretation of the output produced by the analyses. The broad applicability of the statistical and mathematical techniques introduced during the module will be illustrated through case studies covering the biological science application areas included within the MSc.

On completing this module, students will have the knowledge to design efficient experiments and surveys, to identify appropriate statistical techniques for the analysis of the data collected, to complete simpler analyses using a common statistical computing package, to interpret the output produced by these analyses, and to evaluate the results from statistical analyses presented in scientific papers.

Contents:

• The importance of statistics in biological research: introducing case studies for a range of biological application areas where the statistical or mathematical approach is an important component.
• Summarising data: types of data, summary statistics, exploratory data analysis, confidence intervals, graphical tools, common statistical distributions.
• Testing hypotheses: the language of hypothesis testing, why we want to test hypotheses, a few simple but important tests (t-test, F-test, chi-square test), including worked examples from crop production, crop protection and gene expression studies.
• Statistical computing: an introduction to GenStat for Windows or Minitab (or possibly both).
• Simple analyses for continuous data: simple linear regression and one-way analysis of variance.
• Relationships between variables: comparison of regression lines, multiple regression, common non-linear models, with example applications including seed production by weeds, plant growth in response to nutrient supply, and gene expression over time.
• Designing experiments: the principles behind good experimental design (replication, blocking, randomisation), the choice of treatment structure (including response surface designs), practical designs for real experiments in the field, glasshouses, controlled environments, laboratories and microarrays.
• The analysis of designed experiments: analysis of variance (ANOVA) as a basic tool, extracting information based on the treatment structure, testing assumptions, and interpretation of the output. Alternatives where ANOVA cannot be used because of the choice of design.
• Sample surveys: different approaches to selecting sampling units and the sampling frame, sampling for monitoring, summarising sample data, sampling to make decisions.
• Analysis approaches for data where the assumptions of ANOVA and standard regression fail, namely counts (e.g. the number of pests per plant, the numbers of occurrences of each nucleotide residue or amino acid in a protein or DNA sequence) and proportions (e.g. the proportion of seeds germinating, the proportion of insects killed by an insecticide): contingency tables, chi-square test, log-linear models for counts or contingency table data, models for proportions (probit and logit analysis).
• The analysis of multivariate data – increasingly important in the biosciences with applications from ecology (e.g. the effects of herbicides on the composition and diversity of weed populations, the functional associations between different aphid predators) to high-throughput genomics (e.g. dsRNA banding patterns for different occurrences of a virus complex, microarray, proteomic and metabolomic data): types of multivariate data, graphical approaches, principal components analysis, similarity and distance measures, hierarchical clustering, other related multivariate methods.
• Statistics and mathematics in systems biology: case studies drawn from the other modules of the MSc, illustrating how a range of statistical tools can be used across different biological application areas.
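
As an illustration of the kind of simple test listed under "Testing hypotheses", the following is a minimal sketch of a pooled two-sample t statistic. The module itself uses GenStat or Minitab. The choice of Python, the function name `pooled_t_statistic` and the yield values are assumptions made purely for illustration, and the sketch assumes two independent samples with equal variances.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a pooled two-sample t-test.

    Assumes independent samples drawn from populations with equal variance.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance, weighted by degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    # Standard error of the difference between the two sample means.
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Hypothetical plot yields (t/ha) for two treatments; not real data.
treated = [5.1, 4.9, 5.6, 5.3]
control = [4.2, 4.5, 4.0, 4.7]
t_value, df = pooled_t_statistic(treated, control)
print(f"t = {t_value:.3f} on {df} degrees of freedom")
```

The resulting statistic would then be compared with the t distribution on the stated degrees of freedom, which a statistical package does automatically when reporting a p-value.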