Week 4 answers

The first part of the questions was aimed at getting an appreciation for your statistical thinking. An anonymous analysis of the results will appear here later, but the answers are below. In some cases there may not be a right and wrong answer, and indeed the questions were at times phrased with intentional ambiguity to highlight how difficult it can be to define statistical answers (or how statistics may be mis-used).

NOTE: Due to an issue with quizbuilder, your user IDs were not recorded upon submission of the quiz. This means that we cannot link submissions to individual users, so this week will not be used for credit this year.

1) This question is about highlighting the nature of counting errors. These arise because in any experiment you are not sampling the entire population, but only some subset of it (e.g. an opinion poll that may ask a few thousand people the way they intend to vote). The error on some number of "counts" is equal to the square root of the number of counts. So if I have 1 million counts the error is only 1000 counts (or 0.1%), whereas if I have 9 counts the error is 3 counts (33%). In other words, the fewer measurements that are made, the larger the relative error. In practice this error defines a range that contains 68% of the total probability.

In the case of EBC, where no treatment is applied, 4 patients survive and 6 die. With counting errors these should in fact be quoted as (4 +/- 2) and (6 +/- 2.4). When EBC is used the numbers are (7 +/- 2.6) and (3 +/- 1.7). You can then ask if the two survival numbers are consistent with being the same. Many of you will be familiar with ways of combining the errors on the two measurements, but I won't go into them here. The key point is to note that (4 +/- 2) is actually consistent with (7 +/- 2.6) because their error bars overlap. In other words, given the small size of the sample, it cannot be determined whether EBC has a significant impact on Ebola survival rates.
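The square-root rule and the overlap check described above can be sketched in a few lines of Python. This is only an illustration: the function names `counting_error` and `error_bars_overlap` are made up for this example, and the overlap test is the simple error-bar comparison used in the text, not a formal significance test.

```python
import math

def counting_error(n):
    """1-sigma counting error on n counts, taken as sqrt(n)."""
    return math.sqrt(n)

def error_bars_overlap(n_a, n_b):
    """True if the ranges n +/- sqrt(n) for the two counts overlap."""
    lo_a, hi_a = n_a - counting_error(n_a), n_a + counting_error(n_a)
    lo_b, hi_b = n_b - counting_error(n_b), n_b + counting_error(n_b)
    return lo_a <= hi_b and lo_b <= hi_a

# Relative error shrinks as the number of counts grows.
for n in (9, 1_000_000):
    print(n, counting_error(n), f"{counting_error(n) / n:.1%}")

# Survivors without treatment (4) versus with EBC (7).
print(error_bars_overlap(4, 7))
```

With only 10 patients in each group the two ranges overlap, which is the point made in the answer above.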

2) This question gives a good example of where things can go wrong. In an ideal study one wants to vary ("control") only one parameter (the use of a given drug), while keeping everything else the same. If this is not done then it is difficult to separate the effects of the drug from the effects of other treatments given. This question in itself should raise concerns (e.g. do western hospitals offer better basic facilities for replacing fluids, controlling temperatures etc.), although this is not in itself enough to invalidate the study if there were genuine control groups.

3) This question is about making use of other known information, and is an introduction to the common field of Bayesian statistics. This is different from the "normal" statistics that is usually taught, which is mainly concerned with the relative frequency of events in data. In this example, despite the effectiveness of the test, the rareness of the condition in the general population means that any person who tests positive is more likely to be a false positive (i.e. to not have the condition) than a true positive (i.e. to have the condition). There are many examples of this at play in health today, and this also explains why certain tests are targeted at groups who are particularly at risk, rather than at everybody. The correct answer is 17%. If you would like to see how this is obtained in detail, please take a look at the Wikipedia page on Bayes' theorem.

A simpler understanding can be obtained by looking at the numbers. Suppose 1000000 people are tested. Of these 2000 actually have the condition, and 998000 do not. Of the 998000 that do not, 9980 will test positive. Of the 2000 with the condition, 1980 will test positive. Therefore, of the total number of positive tests (11960), only 1980 are correct, or about 17%. Equally, a small number of people (20) will test negative, even though they carry the condition.
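The same calculation can be written out with Bayes' theorem. The sketch below assumes a prevalence of 0.2%, a 99% chance that someone with the condition tests positive, and a 1% false positive rate. These rates are inferred from the counts quoted above rather than restated from the question, and the function name is only illustrative.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Probability of having the condition given a positive test (Bayes' theorem)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Assumed rates: 2000 in 1000000 have the condition, the test catches 99% of them,
# and 1% of people without the condition still test positive.
ppv = positive_predictive_value(prevalence=0.002,
                                sensitivity=0.99,
                                false_positive_rate=0.01)
print(f"{ppv:.1%}")
```

Raising the prevalence (for example by testing only an at-risk group) increases the result sharply, which is why targeted testing is often preferred.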

4) This question is concerned with confidence regions. The warming rate is 0.1 +/- 0.05 degrees, and is approximated as a bell curve (or Gaussian distribution), whose probability distribution is defined in the question. This means that in 68% of cases the warming observed will be between 0.05 and 0.15 degrees, in 95% of cases it will be between 0 and 0.2 degrees, and in 99.7% of cases it will be between -0.05 and 0.25 degrees. In the case of observing a cooling, we are interested in the fraction of this graph which lies below 0. Only 5% of the total probability lies outside the range 0 to 0.2. Half of this lies above 0.2 and the other half below 0. Hence the probability of observing a flat (or cooling) temperature profile over a decade-long period is ~2.5%. The actual probability of "effectively flat" depends on how one defines "flat". However, there is only a 0.15% chance that a cooling of more than 0.05 degrees would be observed.
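These tail probabilities can be checked directly from the Gaussian cumulative distribution. The sketch below uses only the Python standard library; `normal_cdf` is a helper name chosen for this example, and the mean and sigma are the 0.1 and 0.05 degrees quoted above.

```python
import math

def normal_cdf(x, mean, sigma):
    """Probability that a Gaussian variable with this mean and sigma is below x."""
    return 0.5 * (1 + math.erf((x - mean) / (sigma * math.sqrt(2))))

mean, sigma = 0.1, 0.05  # warming rate and its 1-sigma error, in degrees

p_flat_or_cooling = normal_cdf(0.0, mean, sigma)      # more than 2 sigma below the mean
p_strong_cooling = normal_cdf(-0.05, mean, sigma)     # more than 3 sigma below the mean

print(f"{p_flat_or_cooling:.2%}")
print(f"{p_strong_cooling:.2%}")
```

The exact values come out slightly below the rounded 2.5% and 0.15% used in the text, because the 95% and 99.7% figures are themselves rounded.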

5) However, if multiple measurements are made, then this probability goes up. For example, the probability of getting 3 heads in a row in a coin toss if you only toss the coin 3 times is small (12.5%), but the probability of getting 3 heads in a row at some point if you toss a coin 1000 times is very high (almost 100%). Hence in a climate record that contains thermometer measurements for ~20 ten-year periods, we would expect about one out of the 20 (5%) to lie outside the 2-sigma (error bar) range.
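Both effects can be illustrated numerically. The sketch below computes the chance of seeing a run of heads somewhere in a sequence of tosses, and the chance that at least one of 20 periods falls outside the 2-sigma range. It assumes a fair coin and that the 20 periods are independent, which is a simplification; the function name is only illustrative.

```python
def prob_run_of_heads(n_tosses, run_length, p_heads=0.5):
    """Probability of at least one run of `run_length` heads in `n_tosses` tosses."""
    # state[k] = probability of ending on exactly k heads with no full run seen yet
    state = [1.0] + [0.0] * (run_length - 1)
    done = 0.0  # probability that a full run has already occurred
    for _ in range(n_tosses):
        new_state = [0.0] * run_length
        for k, p in enumerate(state):
            new_state[0] += p * (1 - p_heads)   # a tail resets the streak
            if k + 1 == run_length:
                done += p * p_heads             # a head completes the run
            else:
                new_state[k + 1] += p * p_heads
        state = new_state
    return done

print(prob_run_of_heads(3, 3))      # three tosses only
print(prob_run_of_heads(1000, 3))   # somewhere in 1000 tosses

# Chance that at least one of 20 independent periods lies outside the 95% range.
p_any_outlier = 1 - 0.95 ** 20
print(f"{p_any_outlier:.0%}")
```

Under these assumptions, seeing at least one decade outside the 2-sigma range in a record of 20 decades is more likely than not.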

6) The correct answer is "None of the above". The correct interpretation here is that the baldness is an indication of another variable that is important, and has nothing to do with TV or football directly. In practice, more football fans tend to be male than female, and men experience baldness at a much higher rate than women.

7) This is a slightly ambiguous question. If the answer is that early diagnosis leads to longer survival times post-diagnosis, then this may simply be because the time between diagnosis and death was longer as a result of catching the cancer early. This would be true whether or not any treatment was conducted.

PART 2:

8) The major change for ocean chemistry is that CO2 makes the ocean more acidic by creating carbonic acid.

9) CO2 does act as a fertilizer, but this does not mean that crop yield will increase globally, and any increase in crop productivity in many parts of the world is due to intensive farming techniques. Overall we expect crop yield to decrease, although some areas will experience improved productivity.

10) A&C are correct. Heat related deaths are increasing, but while there are concerns about the movement of diseases, the actual direct toll to date is small.

11) Again A&C are correct. So far there is no evidence for a mass extinction event: although the rates of extinction are very high, the total biodiversity loss remains a small fraction of that seen in mass extinction events. Low stocks of fish are largely due to over-fishing rather than direct climate events.