

<?xml version="1.0"?>

<!DOCTYPE TEI.2 SYSTEM "base.dtd">




<title>Sources of Variation II</title></titleStmt>

<publicationStmt><distributor>BASE and Oxford Text Archive</distributor>


<availability><p>The British Academic Spoken English (BASE) corpus was developed at the

Universities of Warwick and Reading, under the directorship of Hilary Nesi

(Centre for English Language Teacher Education, Warwick) and Paul Thompson

(Department of Applied Linguistics, Reading), with funding from BALEAP,

EURALEX, the British Academy and the Arts and Humanities Research Board. The

original recordings are held at the Universities of Warwick and Reading, and

at the Oxford Text Archive and may be consulted by bona fide researchers

upon written application to any of the holding bodies.

The BASE corpus is freely available to researchers who agree to the

following conditions:</p>

<p>1. The recordings and transcriptions should not be modified in any way</p>


<p>2. The recordings and transcriptions should be used for research purposes

only; they should not be reproduced in teaching materials</p>

<p>3. The recordings and transcriptions should not be reproduced in full for

a wider audience/readership, although researchers are free to quote short

passages of text (up to 200 running words from any given speech event)</p>

<p>4. The corpus developers should be informed of all presentations or

publications arising from analysis of the corpus</p><p>

Researchers should acknowledge their use of the corpus using the following

form of words:

The recordings and transcriptions used in this study come from the British

Academic Spoken English (BASE) corpus, which was developed at the

Universities of Warwick and Reading under the directorship of Hilary Nesi

(Warwick) and Paul Thompson (Reading). Corpus development was assisted by

funding from the Universities of Warwick and Reading, BALEAP, EURALEX, the

British Academy and the Arts and Humanities Research Board. </p></availability>




<recording dur="00:25:07" n="3710">


<respStmt><name>BASE team</name>



<langUsage><language id="en">English</language>



<person id="nf0274" role="main speaker" n="n" sex="f"><p>nf0274, main speaker, non-student, female</p></person>

<person id="sm0275" role="participant" n="s" sex="m"><p>sm0275, participant, student, male</p></person>

<person id="sm0276" role="participant" n="s" sex="m"><p>sm0276, participant, student, male</p></person>

<person id="sm0277" role="participant" n="s" sex="m"><p>sm0277, participant, student, male</p></person>

<person id="sf0278" role="participant" n="s" sex="f"><p>sf0278, participant, student, female</p></person>

<personGrp id="ss" role="audience" size="l"><p>ss, audience, large group </p></personGrp>

<personGrp id="sl" role="all" size="l"><p>sl, all, large group</p></personGrp>

<personGrp role="speakers" size="7"><p>number of speakers: 7</p></personGrp>





<item n="speechevent">Lecture</item>

<item n="acaddept">Statistics</item>

<item n="acaddiv">ls</item>

<item n="partlevel">UG/PG</item>

<item n="module">Health and Disease in Populations</item>




<u who="nf0274"><kinesic desc="projector is on showing slide" iterated="n"/> # so in the last lecture we looked at hypothesis tests <pause dur="0.9"/> where our belief about the value of the underlying tendency whatever it was <pause dur="0.7"/> was used to calculate the probability of the data that we observed <pause dur="1.3"/> # we were essentially considering <pause dur="0.2"/> how different what we observed was <pause dur="0.3"/> to what we expected to happen <pause dur="0.8"/> what we believe is happening <pause dur="1.0"/> and today we're going to be looking at confidence intervals <pause dur="0.7"/> which allows us to get an appreciation of the size of that difference <pause dur="0.5"/> we calculate a range which includes the true value with the specified probability <pause dur="5.9"/><kinesic desc="changes slide" iterated="n"/> before we go on to that <pause dur="0.3"/> we'll consider a quick illustration of the problem that might face us <pause dur="0.7"/> the slide shows data # hypothetical of a number of neural tube defects in Western Australia <pause dur="0.4"/> from nineteen-seventy-five to two-thousand <pause dur="1.8"/> # <pause dur="0.3"/> and obviously what we're most interested in <pause dur="0.5"/> is how many cases we can expect <pause dur="0.3"/> on average in a year <pause dur="1.2"/> and from what we observe the line gives us an idea the line in the # middle of the plot <pause dur="0.7"/> gives us an idea of

what we might expect to observe <pause dur="1.3"/> the first thing to note about that is that there's a fair amount of variation of the points around that line <pause dur="1.0"/> which makes it a little bit difficult to predict the number of cases <pause dur="0.4"/> in any particular year <pause dur="2.2"/> the second thing to note about the plot is that there appears to be a drop <pause dur="0.5"/> in the number of cases at around about nineteen-ninety <pause dur="1.0"/> and that actually coincides with the introduction of folic acid <pause dur="0.7"/> # given to pregnant women <pause dur="1.4"/> so an obvious question is has the introduction of folic acid made any difference on the number of cases that we've observed of neural tube defects <pause dur="1.4"/> is that drop <pause dur="0.2"/> as a result of the introduction of folic acid or is it just random variation <pause dur="1.5"/> in other words what we want to do is to remove the year on year variation <pause dur="0.2"/> that we've observed <pause dur="1.0"/> and <pause dur="0.4"/> make an inference about what the underlying trend is <pause dur="1.2"/> and perhaps whether <pause dur="0.2"/> the number of cases in the years prior to nineteen-ninety <pause dur="0.5"/> are different to the number of cases in the

years after nineteen-ninety <pause dur="0.8"/> we want to get rid of the random variation and make some kind of inference about the underlying tendency <pause dur="0.5"/> in the data <pause dur="5.3"/><kinesic desc="changes slide" iterated="n"/> so just to give you a quick reminder of what a hypothesis test is <pause dur="1.1"/> we set up our hypothesis which is to quantify our belief about say an incidence rate ratio or something like that <pause dur="1.7"/> and we calculate the probability of what we've observed in our data <pause dur="0.4"/> given what we've believed what the hypothesis is that we've set up <pause dur="1.4"/> and the inference goes that if that probability <pause dur="0.2"/> is very small <pause dur="0.4"/> then either something very unlikely has happened and you should keep in mind that <pause dur="0.3"/> the fact that something is unlikely doesn't mean that it's impossible it could happen <pause dur="1.6"/> or that the hypothesis is wrong <pause dur="2.0"/> and so if we observe a very small P-value <pause dur="0.7"/> then we conclude that our data are incompatible with our hypothesis <pause dur="0.4"/> so we can reject our null hypothesis <pause dur="1.6"/> # a quick reminder that the probability of what we observe given what we believe is called the P-value <pause dur="4.7"/><kinesic desc="changes slide" iterated="n"/> so

this is a slide i lost it a little bit on last week <pause dur="0.3"/> # <pause dur="0.8"/> just do you remember the cut-off value of the P-value <pause dur="0.2"/> that we use <pause dur="0.6"/> is completely arbitrary <pause dur="0.7"/> most of the time we use point-nought-five but we could use a different value if we chose to <pause dur="1.6"/> and if we observe # <pause dur="0.2"/> a P-value of point-zero-five-one <pause dur="0.3"/> then that's still fairly unlikely <pause dur="0.7"/> and if we're investigating something contentious like AIDS then <pause dur="0.4"/> we'd still be fairly interested in what was going on in that instance although the result that we've observed is not statistically significant at the five per cent level if we have a P-value of point-zero-five-one <pause dur="0.6"/> it's still a pretty unlikely event <pause dur="0.4"/> and so we'd be more interested <pause dur="1.0"/> conversely if we were investigating the common cold <pause dur="0.2"/> then we probably wouldn't be too bothered <pause dur="2.0"/> # good thing about P-values is they're they're easy to use and interpret the we <pause dur="0.2"/> gives us a simple comparison of two numbers our significance level <pause dur="0.6"/> and # the observed P-value that we have <pause dur="1.5"/> it also has # an

interpretation that it's the probability of rejecting the null hypothesis <pause dur="0.7"/> # <pause dur="0.5"/> when it's actually true in other words that the data could be consistent with the hypothesis <pause dur="0.3"/> and be very unlikely <pause dur="1.6"/> # the P-value gives us a probability of that <pause dur="2.0"/> note also that the <trunc>sid</trunc> statistical significance depends on sample size <pause dur="0.9"/> # we'd never reject the null hypothesis <pause dur="0.2"/> in # a toss of a coin three times <pause dur="0.2"/> because the lowest possible P <pause dur="0.7"/> is still greater than point-nought-five the the lowest possible P in that case is # one-over-eight <pause dur="0.3"/> it's always going to be greater than point-nought-five <pause dur="1.0"/> and so we'd never reject it <pause dur="0.8"/> so significance depends on the sample size <pause dur="0.4"/> sorry i'm getting some hands waved in <gap reason="name" extent="1 word"/> </u><u who="sm0275" trans="overlap"> how sorry how do you calculate P </u><pause dur="0.4"/> <u who="nf0274" trans="pause"> well i i have to talk about that <pause dur="0.4"/> later <pause dur="0.4"/> sorry <pause dur="1.0"/> # <pause dur="1.8"/> so # a statistically significant result <pause dur="0.3"/> # <pause dur="0.2"/> may not be clinically important remember <pause dur="0.9"/> that depends on the <pause dur="0.3"/> context of the problem <pause dur="0.4"/> for

example if we're looking at # <pause dur="0.8"/> readmission data then we might be very very interested in a very small difference of readmission <pause dur="0.8"/> but conversely a very small difference in mortality rate may not be <pause dur="0.5"/> # clinically important <pause dur="0.5"/> so <pause dur="0.7"/> depends on # <pause dur="0.5"/> the context of the problem as to whether or not the difference we observe is clinically important <pause dur="6.7"/><kinesic desc="changes slide" iterated="n"/> <vocal desc="clears throat" iterated="n"/> <pause dur="3.8"/> so our problem is to use <pause dur="0.6"/> what we observe to draw conclusions about the underlying tendencies <pause dur="1.0"/> and confidence intervals give us a range <pause dur="0.2"/> # which may include the true value <pause dur="0.8"/> and we can also test a hypothesis about the true value <pause dur="0.3"/> that we're interested in <pause dur="4.7"/><kinesic desc="changes slide" iterated="n"/> so <pause dur="0.6"/> sorry i've # <pause dur="0.2"/> pressed the wrong button on the <pause dur="0.7"/> slide i just want to make sure i'm showing you the right one <pause dur="1.7"/> estimation right <pause dur="2.0"/> so consider an example suppose in our study hypothetically <pause dur="0.8"/> we have a hypothesis that the risk of # T-B in Warwickshire or Warwick <pause dur="0.7"/> is the same as the risk of T-B in the rest of the U-K <pause dur="1.2"/> and we've collected

some data and calculated an incident rate ratio of one-point-three <pause dur="1.1"/> and on the under the assumption <pause dur="0.3"/> that <trunc>th</trunc> <trunc>t</trunc> the # <pause dur="0.4"/> two risks are the same <pause dur="1.4"/> the probability of observing that incident rate ratio of one-point-three <pause dur="0.7"/> # occurs <pause dur="0.2"/> less than one per cent of the time <pause dur="1.2"/> so the data is inconsistent with the hypothesis that we're testing <pause dur="1.2"/> and we can # <pause dur="0.5"/> conclude that that we can reject that hypothesis but <pause dur="0.3"/> what it doesn't tell you is what the magnitude of the difference <pause dur="0.7"/> of the # <pause dur="0.8"/> risk of T-B for Warwick and the risk of the U-K is <pause dur="0.8"/> and <pause dur="0.2"/> quite often we want to say something about <pause dur="0.3"/> what size of difference <pause dur="0.3"/> is <pause dur="1.0"/> what the size of difference is <pause dur="1.3"/> we want a best guess at the true risk <pause dur="3.7"/><kinesic desc="changes slide" iterated="n"/> so this slide shows the P-values associated with the range of hypotheses where we observe an incident rate ratio of one-point-three <pause dur="1.9"/> so if we look at the <pause dur="0.2"/> # line <pause dur="0.4"/> corresponding to plus-thirty per cent risk i'll just get the pointer up <pause dur="1.5"/><event desc="finds pointer" iterated="y" dur="1"/> there it is <pause dur="0.9"/> so if we're looking at this line plus-thirty per cent

risk <pause dur="1.3"/> observing an incident rate ratio of one-point-three <pause dur="0.9"/> would correspond to a P-value of <trunc>point-nought-fi</trunc> # sorry point-five <pause dur="0.4"/> and so we wouldn't reject our null hypothesis that there's a difference between the two areas there <pause dur="2.2"/> correspondingly if our null hypothesis was that there was # <pause dur="1.3"/> a plus-forty per cent risk <pause dur="0.7"/> then observing # a P an incident rate ratio of one-point-three <pause dur="0.5"/> would have a P-value of point-two so we still wouldn't reject our null hypothesis <pause dur="1.1"/> and so on for all the other values in the table <pause dur="4.0"/> this gives us an idea <pause dur="0.5"/> of which hypotheses are inconsistent with our observed data <pause dur="1.0"/> so <trunc>i</trunc> <pause dur="0.2"/> informally the values outside the range of ten per cent excess risk to fifty per cent excess risk are inconsistent with the data we've observed <pause dur="1.9"/> and that range probably includes the true value <pause dur="6.1"/><kinesic desc="changes slide" iterated="n"/> so the ninety-five per cent confidence interval <pause dur="0.8"/> is a range which includes the true value with ninety-five per cent certainty <pause dur="1.3"/> # in this example the ninety-five per cent confidence

interval for the incident rate ratio was one-point-one to one-point-five <pause dur="1.2"/> and it centred on the observed value which is our best guess at the true value <pause dur="1.1"/> and obviously because it's centred on the observed value that always falls inside the # confidence interval <pause dur="5.6"/><kinesic desc="changes slide" iterated="n"/> so # <pause dur="0.9"/> there are slightly different ways of calculating confidence intervals and <gap reason="name" extent="1 word"/> students in particular might have seen <pause dur="0.2"/> different methods # than the ones that are given in your lecture notes <pause dur="0.9"/> but for the purposes of this course we're just using the error factor <pause dur="1.1"/> formula are given in # the lecture notes for various <pause dur="0.5"/> calculations for confidence intervals <pause dur="0.5"/> # pages one-twenty-five and one-twenty-six <pause dur="2.1"/> basically the confidence interval is centred on the observed value <pause dur="1.1"/> and then we calculate the error factor <pause dur="0.3"/> and correspondingly the upper and lower confidence <trunc>lint</trunc> <pause dur="0.2"/> limits as appropriate <pause dur="0.4"/> using the <pause dur="0.3"/> # given formula <pause dur="1.8"/> and the range between the lower and the upper confidence

limit <pause dur="0.5"/> is called the ninety-five per cent confidence <pause dur="0.2"/> interval <pause dur="3.6"/> <vocal desc="sniff" iterated="n"/> <pause dur="1.1"/><kinesic desc="changes slide" iterated="n"/> so an example <pause dur="1.3"/> say that we have interest in the incidence of diabetes <pause dur="1.0"/> and we've observed fifty cases in ten-thousand person years <pause dur="1.2"/> so we have # five cases per thousand person years <pause dur="1.4"/> we can calculate the observed exposure <pause dur="1.3"/> and the error factor which is based on the number of events observed fifty <pause dur="0.7"/> <vocal desc="sniff" iterated="n"/> <pause dur="0.2"/> given by the formula which <trunc>a</trunc> <trunc>a</trunc> <pause dur="0.4"/> appears on page one-twenty-five of your notes <pause dur="0.3"/> the error factor exponential twice times the square root of one over the number of cases one over fifty in this example <pause dur="0.9"/> being one-point-three-three <pause dur="4.8"/><kinesic desc="changes slide" iterated="n"/> so we've observed <pause dur="0.6"/> five per thousand person years five cases of diabetes per thousand person years <pause dur="0.8"/> and we've calculated our error factor one-point-three-three <pause dur="0.9"/> we can then use the formula to calculate the lower and the upper ninety-five per cent confidence limits <pause dur="1.1"/> and give our best estimate of the true <trunc>infiden</trunc> <pause dur="0.2"/>

incidence being the observed <trunc>inth</trunc> incidence <pause dur="0.4"/> <vocal desc="sniff" iterated="n"/> <pause dur="0.3"/> and the ninety <trunc>perfef</trunc> <pause dur="0.2"/> ninety-five per cent confidence interval <pause dur="0.5"/> being three-point-eight to six-point-seven cases per thousand person years <pause dur="0.8"/> we can be ninety-five per cent certain <pause dur="0.2"/> that that range three-point-eight to six-point-seven <pause dur="0.4"/> includes the true value of the incidence rate </u><pause dur="1.5"/> <u who="sm0276" trans="pause"> excuse me <pause dur="0.3"/> <vocal desc="clears throat" iterated="n"/><pause dur="0.2"/> at the risk of feeling dim where do you get the two from in that formula </u><pause dur="0.8"/> <u who="nf0274" trans="pause"> sorry i've just been asked where the two comes from in the formula <pause dur="0.2"/> in the error factor <pause dur="0.2"/> <vocal desc="sniff" iterated="n"/><pause dur="0.2"/> # <pause dur="0.5"/> i i don't want to discuss that <pause dur="0.2"/> <trunc>i</trunc> <pause dur="0.4"/> in the lecture i want to carry on 'cause i've got quite a lot to get through <pause dur="0.3"/> # we can perhaps talk about that in a <pause dur="0.4"/> in a session later </u><u who="sm0277" trans="overlap"> if we don't understand the <pause dur="0.3"/> point <pause dur="0.6"/> then <pause dur="0.2"/> is there any point <pause dur="0.6"/> giving the lecture <pause dur="0.9"/> i'm not being

rude but like <pause dur="0.7"/> that's two questions and you've not answered either of them </u><pause dur="1.1"/> <u who="nf0274" trans="pause"> the reason i i'm i'm again being asked another question the reason i'm not answering these is because # i'm also lecturing to <trunc>le</trunc> to <gap reason="name" extent="1 word"/> students <pause dur="0.8"/> and so <pause dur="0.2"/> they can't hear what you say <pause dur="0.2"/> they can only hear what i say <pause dur="0.9"/> and i'm not confident enough with this system <pause dur="0.2"/> to repeat your question <pause dur="0.3"/> then think of the answer <pause dur="0.3"/> try and keep the lecture to time <pause dur="0.3"/> and keep going <pause dur="0.7"/> so i'm sorry we will have to talk about those later <pause dur="0.4"/> okay <pause dur="2.5"/> right so diabetes example sorry i'll just have to look at where i am now <pause dur="1.4"/> <vocal desc="clears throat" iterated="n"/> <pause dur="3.2"/><kinesic desc="changes slide" iterated="n"/> so <pause dur="1.7"/> consider what happens as we get more data <pause dur="1.6"/> basically error factor which is based one over the number of cases <pause dur="0.4"/> gets smaller because if we get more data we observe more cases <pause dur="0.4"/> and so one divided by a number which is increasing <pause dur="0.4"/> gets smaller <pause dur="0.8"/> the error factor gets smaller <pause dur="0.5"/> we multiply the observed value by the error factor which is

getting smaller <pause dur="1.0"/> and the confidence interval gets narrower <pause dur="0.7"/><vocal desc="sniff" iterated="n"/> <pause dur="0.8"/> so for example <pause dur="0.2"/> again if we've observed two-hundred new cases of diabetes in a population of forty-thousand people <pause dur="0.2"/> over a year <pause dur="0.9"/> then the estimated rate is the same as before <pause dur="1.1"/> but the error factor is smaller because we've observed two-hundred cases not fifty cases <pause dur="1.2"/> so the error factor using that formula is one-point-one-five <pause dur="0.8"/> and that upper and lower ninety-five per cent confidence limits <pause dur="0.3"/> are as given <pause dur="0.9"/> and we have # <pause dur="1.3"/> a confidence interval <pause dur="0.5"/> of now four-point-three to five-point-eight rather than <pause dur="0.6"/> three-point-six three-point-eight to six-point-seven <pause dur="0.7"/> so we've got more data <pause dur="0.5"/> our error factor has got smaller <pause dur="0.2"/> and our confidence interval has got # <pause dur="0.2"/> more narrow <pause dur="5.6"/><kinesic desc="changes slide" iterated="n"/> so the confidence interval reflects our uncertainty about the true value of something so it's it could be an <trunc>infi</trunc> incidence a population prevalence or an average height or whatever <pause dur="1.6"/> but you should remember that it's not a value <pause dur="0.2"/> # not a range in

which ninety-five per cent of the observations lie <pause dur="2.2"/><kinesic desc="changes slide" iterated="n"/> and you can illustrate that quite easily if we split up the data from a few slides back <pause dur="0.8"/> so if we have fifty cases on two-thousand people over five years <pause dur="0.7"/> if we consider the number of cases in each of those five years that's what's given in the table so in the first year we observe <trunc>th</trunc> # thirteen cases <pause dur="0.5"/> second year ten cases and so on <pause dur="2.1"/> # the confidence interval for that data was three-point-eight to six-point-seven <pause dur="0.7"/> but you can see that the incidence rate for # years three four and five <pause dur="0.8"/> where the rate is point-zero-zero-three point-zero-zero-seven and point-zero-zero-three-five <pause dur="0.7"/> is outside that range so sixty per cent of our observed observations are falling outside that range <pause dur="1.3"/> just an illustration <pause dur="4.0"/><kinesic desc="changes slide" iterated="n"/> another example looking at the heights of fifty students <pause dur="1.0"/> can calculate the observed mean height <pause dur="0.4"/> and the confidence interval <pause dur="0.8"/> # using the appropriate formula <pause dur="1.6"/> but ninety-five per cent of our students # fall between <pause dur="0.4"/>

one-point-five-five metres and one-point-eight-five metres <pause dur="0.2"/> in height <pause dur="0.8"/> <vocal desc="sniff" iterated="n"/> <pause dur="0.4"/> # <pause dur="1.1"/> so we find the range in which ninety-five per cent of our students <pause dur="0.2"/> lie by inspection of our data in that case <pause dur="1.6"/> and you can see that the two ranges aren't the same <pause dur="0.2"/> one is called the reference range which is different from the confidence interval <pause dur="0.7"/> and it's important that you remember that <pause dur="5.5"/><kinesic desc="changes slide" iterated="n"/> another quick example if we're interested in a rate ratio <pause dur="1.1"/> so in the first population we observe D-one cases # in in P-one person years <pause dur="0.4"/> and in the second we observe D-two cases in P-two person years <pause dur="0.6"/> <vocal desc="sniff" iterated="n"/> <pause dur="0.4"/> # calculate the observed rate ratio easily <pause dur="0.8"/> and the error factor using the appropriate formula page one-twenty-five <pause dur="5.7"/><kinesic desc="changes slide" iterated="n"/> estimation versus hypothesis testing you should note that estimation is <pause dur="0.2"/> more informative than hypothesis testing <pause dur="1.2"/> # it can incorporate a hypothesis test <pause dur="1.6"/> quick drink <pause dur="6.5"/> <event desc="drinks" iterated="n"/> so it's

actually more useful to know <pause dur="0.5"/> something about the plausible size of a difference than knowing only that there is a difference <pause dur="0.7"/> our hypothesis test can tell us whether or not our data is consistent with there being a difference <pause dur="0.4"/> but it can't tell us how big that difference is <pause dur="2.3"/> so carry on with the rate ratio example <pause dur="0.4"/> if in population A we have twelve cases in two-thousand person years <pause dur="0.6"/> and population B we have sixteen cases in four-thousand person years <pause dur="1.0"/> then we can calculate the rates per thousand person years in the usual way <pause dur="0.3"/> and the ratio <pause dur="0.3"/> of A to B being one-point-five <pause dur="3.4"/><kinesic desc="changes slide" iterated="n"/> obviously in our hypothesis <pause dur="0.7"/> # <pause dur="1.3"/> if the rates are the same then the ratio will be one <pause dur="0.4"/> <vocal desc="sniff" iterated="n"/> <pause dur="0.2"/> and the observed ratio of the rates in our <pause dur="0.2"/> data example <pause dur="0.5"/> is one-point-five <pause dur="0.7"/> excuse me <pause dur="2.1"/> # we used the formula <pause dur="0.3"/> on page one-point-five <pause dur="0.2"/> # is it one-point-five page one-twenty-five to calculate the error factor <pause dur="0.7"/> note that includes the # <pause dur="0.6"/> observed events in both populations not just in the one <pause dur="1.5"/>

and the error factor for that example <pause dur="0.2"/> # is is easily shown to be two-point-one-five <pause dur="0.5"/> so we can use the usual way to calculate the ninety-five per cent <pause dur="0.3"/> confidence interval for the rate ratio <pause dur="1.0"/> # and that gives us the range of point-seven-nought to three-point-two-three <pause dur="3.1"/> now that includes the # <pause dur="1.6"/> value of one <pause dur="0.6"/> which is that the rates are the same <pause dur="1.6"/> so from that we can conclude that the observed data we've <pause dur="0.2"/> based that confidence interval on <pause dur="0.5"/> are consistent with the null hypothesis that the rates are the same <pause dur="0.6"/> because the confidence interval <pause dur="0.4"/> includes the # null value of one <pause dur="0.6"/> we conclude <pause dur="0.2"/> that the data is # consistent <pause dur="0.3"/> with that hypothesis <pause dur="0.9"/> so we can't reject that hypothesis at the five per cent level <pause dur="2.0"/> note that it doesn't prove <pause dur="0.2"/> that the null hypothesis is true <pause dur="1.2"/> # <pause dur="0.3"/> and if you think about that for a while you'll note that the range that we've got there point-seven to three-point-two <pause dur="0.7"/> also includes the value of three <pause dur="0.9"/> so if we tested the hypothesis that the # rate ratio was three <pause dur="0.6"/>

then we wouldn't reject that hypothesis either <pause dur="2.8"/> so that just shows you that it doesn't prove that the null hypothesis <pause dur="0.4"/> of # the ratios being the same <pause dur="0.3"/> is true <pause dur="0.7"/> it merely says that the data is not inconsistent with it <pause dur="5.8"/><kinesic desc="changes slide" iterated="n"/> another example using the rate ratio this one is where the data are inconsistent with the null hypothesis <pause dur="0.9"/> the confidence interval calculated i'll not <pause dur="0.3"/> trawl through the # <pause dur="0.4"/> algebra again <pause dur="0.5"/> the confidence interval there is one-point-four to two-point-eight-six and that does include the null value of one <pause dur="1.5"/> so in this case we can reject <pause dur="0.6"/> the # <pause dur="0.3"/> sorry that doesn't include the null value of one <pause dur="1.0"/> <gap reason="name" extent="1 word"/> students are looking a bit confused there <vocal desc="sniff" iterated="n"/> <pause dur="0.3"/> # it doesn't include the null value of one <pause dur="0.3"/> and so we can't # <pause dur="0.2"/> reject <pause dur="0.3"/> the # we can reject the null hypothesis <pause dur="5.1"/><kinesic desc="changes slide" iterated="n"/> # <pause dur="0.5"/> today you'll have done a bit of inference on # <pause dur="0.6"/> standardized mortality ratios <pause dur="0.4"/> <vocal desc="sniff" iterated="n"/> <pause dur="0.3"/> again that just runs through the formula this is on page

one-twenty-six of your lecture notes <pause dur="1.1"/> we observe O deaths we expect E deaths <pause dur="0.4"/> based on age-specific rates and standard population <pause dur="0.4"/> age-specific population sizes in the study population <pause dur="0.9"/> we can calculate our observed S-M-R <pause dur="0.7"/> and the error factor <pause dur="0.4"/> twice the square root of one over the observed <pause dur="0.3"/> exponentiated <pause dur="2.1"/> and # set up our confidence interval <pause dur="3.5"/><kinesic desc="changes slide" iterated="n"/> so suppose we put some data in we # we have # <pause dur="0.2"/> we expect fifty deaths in our study population <pause dur="0.7"/> and we observe sixty deaths <pause dur="0.7"/> then our observed S-M-R is one-twenty <pause dur="0.7"/> error factor one-point-two-nine and our ninety-five per cent confidence interval <pause dur="0.3"/> is ninety-three to one-fifty-five <pause dur="1.7"/> this includes a hundred <pause dur="0.4"/> # remember if the observed equals the expected <pause dur="0.3"/> then the S-M-R would be a hundred <pause dur="0.8"/> and so we wouldn't reject the null hypothesis <pause dur="1.8"/> note that it also includes values as high as fifty per cent excess deaths so it doesn't <trunc>in</trunc> it doesn't prove equality either <pause dur="0.8"/> again the same same argument <pause dur="1.1"/> as before <pause dur="4.5"/><kinesic desc="changes slide" iterated="n"/> so a summary quite an extensive summary for

this lecture <pause dur="1.0"/> <vocal desc="sniff" iterated="n"/> <pause dur="1.0"/> # all <trunc>obser</trunc> observations are subject to random variation we've seen several examples of data <pause dur="0.2"/> which have fluctuated usually around some underlying tendency <pause dur="1.1"/> and we're always interested in the underlying tendency <pause dur="1.4"/> we can use the data that we observe to test hypothesis <pause dur="0.2"/> hypotheses <pause dur="0.5"/> about underlying values which gives us an idea of whether data is consistent or not <pause dur="0.5"/> with # what we believe <pause dur="1.2"/> and we can also use our # <pause dur="0.2"/> observed data to estimate our underlying tendency <pause dur="1.5"/> that we're interested in <pause dur="1.9"/> in this course the best estimate of the true value <pause dur="0.3"/> # of underlying tendency is the observed value <pause dur="1.8"/> but we also want # an idea of how that varies <pause dur="0.3"/> # to take into account the the the nature of random variation <pause dur="0.4"/> just to give a single number for something would be rather naive <pause dur="1.2"/> and we express the uncertainty again in this course by calculating error factors and deriving confidence intervals <pause dur="0.8"/> and remember that the definition of

a ninety-five per cent confidence interval is the range which includes the true value of the statistic of <trunc>intre</trunc> <pause dur="0.3"/> interest <pause dur="0.4"/> with probability of ninety-five per cent or point-nine-five if you like it that way <pause dur="1.8"/> you can also look at it as the range of true values <pause dur="0.2"/> which is consistent with the observed data <pause dur="1.9"/> so if # different values are consistent with the observed data <pause dur="0.6"/> that would lead us to different <pause dur="0.2"/> different conclusions but you can only be uncertain <pause dur="0.2"/> what to conclude <pause dur="3.4"/><kinesic desc="changes slide" iterated="n"/> another summary slide <pause dur="1.2"/> we have two populations with incidence rates point-zero-zero-eight point-zero-zero-two <pause dur="0.8"/> our rate ratio is four error factor is two <pause dur="0.7"/> so our ninety-five per cent confidence interval is two to eight <pause dur="1.2"/> all the values in the ninety-five per cent confidence interval suggest that the rate in A is higher than the rate in B because it doesn't include the null value of one <pause dur="2.0"/> we can # <pause dur="1.0"/> safely conclude that A is higher than B <pause dur="3.2"/> so the rate ratio is significantly different from one <pause dur="0.3"/> at # <pause dur="0.7"/> the

five per cent level <pause dur="5.4"/><kinesic desc="changes slide" iterated="n"/> a further example <pause dur="0.2"/> rate ratios again the rate ratio <trunc>g</trunc> # is # <pause dur="0.4"/> two the error factor is four <pause dur="0.7"/> and the ninety-five per cent confidence interval is <pause dur="0.2"/> point-five to point to eight sorry <pause dur="2.2"/> the values in the ninety-five per cent confidence interval in this case are consistent with A being much higher than B <pause dur="0.5"/> A being lower than B or both the same <pause dur="0.7"/> in other words # <pause dur="1.5"/> there are values in that confidence interval which are greater than or less than one <pause dur="0.3"/> and one is also included <pause dur="1.9"/> we can't really be <pause dur="0.2"/> # too firm about our conclusions <pause dur="0.8"/> in that case the ninety-five per cent <trunc>confisa</trunc> confidence interval does include our null <pause dur="0.4"/> value <pause dur="0.3"/> # the value of one <pause dur="0.6"/> so we can say that the rate ratio is not significantly different from one <pause dur="0.3"/> in that case <pause dur="0.9"/> but it doesn't prove equality <pause dur="0.2"/> again again we have values of up to eight in that confidence interval <pause dur="0.7"/> so the data is also consistent

with <pause dur="0.2"/> with # hypotheses that extreme <pause dur="2.8"/><kinesic desc="changes slide" iterated="n"/> so things to remember <pause dur="0.5"/> # variation <pause dur="0.2"/> always exists people are different we should all know that without too much uncertainty <pause dur="1.7"/> and because of that variation our underlying <trunc>dat</trunc> # our observed data <pause dur="0.5"/> is different from our underlying tendency <pause dur="1.4"/> you need to have an appreciation of what sources of variation might be <pause dur="0.9"/> why why things differ <pause dur="0.5"/> and # be able to test hypotheses about true values and set up confidence intervals <pause dur="0.3"/> using the formula given in your lecture notes <pause dur="2.9"/> so that's it for today <pause dur="0.3"/> # <pause dur="0.2"/> i believe someone has an announcement to make i don't know where are you <pause dur="0.7"/> she's there do you need to make it to <gap reason="name" extent="1 word"/> students as well </u><pause dur="0.3"/> <u who="sf0278" trans="pause"> no </u><pause dur="0.2"/> <u who="nf0274" trans="pause"> no okay this is just for <gap reason="name" extent="1 word"/> students so <pause dur="0.9"/> there we are thanks very much
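The error-factor calculations the lecture applies to rates, rate ratios and standardized mortality ratios (pages 125-126 of the lecture notes) can be sketched as follows. This is an editorial illustration, not part of the transcript: the factor of two in the exponent, which a student asks about mid-lecture, approximates the 1.96 normal quantile that gives 95 per cent coverage.

```python
import math

def error_factor(*counts):
    """Error factor exp(2 * sqrt(sum of 1/D)) from the lecture notes.

    For a single rate or an SMR, pass one event count D; for a rate
    ratio, pass both counts so the sum is 1/D1 + 1/D2. The 2
    approximates the 1.96 normal quantile for a 95% interval.
    """
    return math.exp(2 * math.sqrt(sum(1.0 / d for d in counts)))

def ci_95(estimate, ef):
    """95% CI: divide and multiply the observed value by the error factor."""
    return estimate / ef, estimate * ef

# Diabetes example: 50 cases in 10,000 person-years
rate = 50 / 10_000 * 1000          # 5 cases per 1,000 person-years
ef = error_factor(50)              # about 1.33
lo, hi = ci_95(rate, ef)           # roughly 3.8 to 6.6 per 1,000 py

# Rate ratio example: 12 cases in 2,000 py vs 16 cases in 4,000 py
ratio = (12 / 2000) / (16 / 4000)  # 1.5
ef_ratio = error_factor(12, 16)    # about 2.15
lo_r, hi_r = ci_95(ratio, ef_ratio)  # roughly 0.70 to 3.22, includes 1

# SMR example: 60 observed deaths against 50 expected
smr = 60 / 50 * 100                # 120
lo_s, hi_s = ci_95(smr, error_factor(60))  # roughly 93 to 155, includes 100
```

With more data the sum of reciprocal counts shrinks, the error factor moves towards one, and the interval narrows, which is exactly the 200-case diabetes comparison made in the lecture.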