Critical Issues in Probabilistic and Statistical Method

These half-day workshops were led for the Doctoral Training Centre in Spring Term 2013 by Dennis Leech.

These workshops are intended for research students across the social sciences. They aim to develop research students’ critical awareness rather than to teach techniques that can be applied "off the shelf". It is assumed that students have previously taken a course in quantitative methods, including probability and the basic methods of statistical inference.

The sessions are independent: they can be taken together, or any of them separately.

Workshop 1. Use and Misuse of Significance Testing.

As part of their quantitative methods research training, all students of social sciences learn the theory and methodology of testing statistical hypotheses. The concept of statistical significance is an important tool for the analysis of sample-based evidence. But statistical significance is not the same as substantive significance, and hypothesis testing is misapplied surprisingly often. Many research findings are reported as ‘significant’ without it being clear in what sense. Some results are significant in the statistical sense while being of little or negligible value as social science. On the other hand, important effects can be wrongly disregarded because they lack statistical significance. I will argue that as social scientists our primary concern in statistical work is to discover effect sizes, not merely statistically significant effects. This is a major issue in research across the social sciences, from economics to psychology, and one on which there is a substantial literature.
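The gap between statistical and substantive significance can be seen in a small calculation. The following is a minimal sketch, not part of the workshop materials: it assumes a two-sided z-test for a difference in means with a known common standard deviation, and both scenarios (effect sizes and sample sizes) are invented for illustration.

```python
import math


def two_sample_z(mean_diff, sd, n_per_group):
    """Two-sided z-test for a difference in means.

    Assumes two equal-sized groups with a known common standard deviation.
    Returns the z statistic and the two-sided p-value.
    """
    se = sd * math.sqrt(2.0 / n_per_group)    # standard error of the difference
    z = mean_diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))    # P(|Z| > |z|) under the null
    return z, p


# Hypothetical scenario A: a trivial effect (0.02 sd) with 100,000 per group.
z_a, p_a = two_sample_z(mean_diff=0.02, sd=1.0, n_per_group=100_000)

# Hypothetical scenario B: a large effect (0.5 sd) with only 10 per group.
z_b, p_b = two_sample_z(mean_diff=0.5, sd=1.0, n_per_group=10)

print(f"A: effect = 0.02 sd, z = {z_a:.2f}, p = {p_a:.2g}")
print(f"B: effect = 0.50 sd, z = {z_b:.2f}, p = {p_b:.2g}")
```

In this sketch the negligible effect in scenario A is highly significant (p below 0.001), while the much larger effect in scenario B is not significant at the 5% level. The p-value reflects sample size as much as it reflects the size of the effect.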

Workshop 2. Some Common Statistical Paradoxes and Fallacies.

We consider Simpson’s paradox and the regression fallacy.

Simpson’s paradox (or Yule-Simpson effect): how every component of a statistical aggregate can move in the opposite direction from the aggregate. Simpson’s paradox arises quite frequently in applied work, but is often missed. Real-world examples have occurred in education results, voting (e.g. the vote on the Civil Rights bill in the US Congress in 1964), and the effect of smoking on low birth weight.
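A small numerical illustration may help. The figures below are invented for the purpose (they are not taken from any of the real-world cases listed): two hypothetical groups of exam candidates each improve their pass rate from one year to the next, yet the overall pass rate falls because the mix of candidates shifts towards the lower-scoring group.

```python
from fractions import Fraction

# Invented data: (passes, candidates) for two hypothetical groups in two years.
year1 = {"group_x": (80, 100), "group_y": (40, 100)}
year2 = {"group_x": (34, 40), "group_y": (72, 160)}


def rate(passes, candidates):
    """Pass rate as an exact fraction."""
    return Fraction(passes, candidates)


def aggregate_rate(groups):
    """Pass rate over all groups pooled together."""
    passes = sum(p for p, _ in groups.values())
    candidates = sum(n for _, n in groups.values())
    return Fraction(passes, candidates)


for name in year1:
    before = float(rate(*year1[name]))
    after = float(rate(*year2[name]))
    print(f"{name}: {before:.2f} -> {after:.2f}")

print(f"aggregate: {float(aggregate_rate(year1)):.2f} -> "
      f"{float(aggregate_rate(year2)):.2f}")
```

Each group's rate rises by five percentage points, while the pooled rate falls from 0.60 to 0.53. The reversal comes entirely from the change in group sizes between the two years.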

The Regression Fallacy (Galton’s fallacy). Regression towards the mean is a basic property of a bivariate distribution; the fallacy lies in reading it as evidence of a process of change. If the two variables are measured at different points in time, regression results can lead to highly misleading inferences about processes occurring over time. This fallacy is associated with the name of Francis Galton who, when he observed the phenomenon in human populations, thought he had detected evidence of a process of convergence towards the mean, which he called ‘regression to mediocrity’. Real-world examples include: Galton’s example of regression towards mediocrity in stature (mid-parent height and children’s height); the hypothesis of economic convergence of countries based on international cross-sectional regressions.
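The point can be checked with a simulation. This is a sketch under assumed conditions, not an analysis of Galton's data: each hypothetical unit has a fixed underlying level plus independent noise in each of two periods, so by construction nothing converges over time. The variable names and parameter values are illustrative choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible
n = 20_000

# Assumed model: a permanent component plus independent noise in each period.
level = [random.gauss(0, 1) for _ in range(n)]
x1 = [a + random.gauss(0, 1) for a in level]   # period 1 measurement
x2 = [a + random.gauss(0, 1) for a in level]   # period 2 measurement


def ols_slope(x, y):
    """Slope from a least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


slope_forward = ols_slope(x1, x2)    # later period regressed on earlier
slope_backward = ols_slope(x2, x1)   # earlier period regressed on later
var1, var2 = statistics.pvariance(x1), statistics.pvariance(x2)

print(f"slope, period 2 on period 1: {slope_forward:.2f}")
print(f"slope, period 1 on period 2: {slope_backward:.2f}")
print(f"variance in period 1: {var1:.2f}, in period 2: {var2:.2f}")
```

Both slopes come out well below one (about one half under these assumptions), so the same "convergence" appears whether the regression runs forwards or backwards in time, while the spread of the distribution is essentially unchanged. A slope below one is therefore not, by itself, evidence that units are converging.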

Workshop 3. Uses, limitations and abuses of the ‘Normal’ Distribution.

There is nothing normal about the Gaussian distribution. We consider how it came to be regarded as such, its useful properties as a foundation of widely used statistical methods, and the problems that have arisen from its misuse in the analysis of risk; for example, its misuse has been a major cause of the recent financial crash. The normal distribution is misnamed: it is not normal, especially when applied to financial data or asset prices. We stress the importance of the distinction between uncertainty and risk, and of alternative models such as the Pareto law.
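To give a rough sense of why the choice of distribution matters for risk, the sketch below compares how quickly tail probabilities shrink under a standard normal distribution and under a Pareto law. The Pareto parameters (minimum value 1, tail index 3) and the thresholds are assumptions chosen only for illustration; they are not estimates from financial data.

```python
import math


def normal_tail(k):
    """P(Z > k) for a standard normal variable Z."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def pareto_tail(x, x_min=1.0, alpha=3.0):
    """P(X > x) for a Pareto variable with minimum x_min and tail index alpha."""
    return 1.0 if x < x_min else (x_min / x) ** alpha


# How much rarer does an event become when the threshold doubles from 5 to 10?
normal_ratio = normal_tail(10) / normal_tail(5)
pareto_ratio = pareto_tail(10) / pareto_tail(5)

print(f"normal: P(>10) / P(>5) = {normal_ratio:.3g}")
print(f"Pareto: P(>10) / P(>5) = {pareto_ratio:.3g}")
```

Under the assumed Pareto law, doubling the threshold cuts the tail probability by a fixed factor of 1/8, whereas under the normal distribution it falls by many orders of magnitude. A model that assumes normality will therefore treat as practically impossible the extreme events that a heavy-tailed model regards as merely rare.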

Assumed quantitative background:

Basic (GCSE-level) mathematics; elementary statistical ideas, including probability, probability distributions, sampling, statistical significance; simple statistical methods, including statistical inference, hypothesis testing, confidence interval estimation, correlation and regression.