<?xml version="1.0"?>

<!DOCTYPE TEI.2 SYSTEM "base.dtd">

<TEI.2><teiHeader>

<fileDesc>

<titleStmt>

<title>Significance tests</title></titleStmt>

<publicationStmt><distributor>BASE and Oxford Text Archive</distributor>

<idno>pslct036</idno>

<availability><p>The British Academic Spoken English (BASE) corpus was developed at the

Universities of Warwick and Reading, under the directorship of Hilary Nesi

(Centre for English Language Teacher Education, Warwick) and Paul Thompson

(Department of Applied Linguistics, Reading), with funding from BALEAP,

EURALEX, the British Academy and the Arts and Humanities Research Board. The

original recordings are held at the Universities of Warwick and Reading, and

at the Oxford Text Archive and may be consulted by bona fide researchers

upon written application to any of the holding bodies.

The BASE corpus is freely available to researchers who agree to the

following conditions:</p>

<p>1. The recordings and transcriptions should not be modified in any

way</p>

<p>2. The recordings and transcriptions should be used for research purposes

only; they should not be reproduced in teaching materials</p>

<p>3. The recordings and transcriptions should not be reproduced in full for

a wider audience/readership, although researchers are free to quote short

passages of text (up to 200 running words from any given speech event)</p>

<p>4. The corpus developers should be informed of all presentations or

publications arising from analysis of the corpus</p><p>

Researchers should acknowledge their use of the corpus using the following

form of words:

The recordings and transcriptions used in this study come from the British

Academic Spoken English (BASE) corpus, which was developed at the

Universities of Warwick and Reading under the directorship of Hilary Nesi

(Warwick) and Paul Thompson (Reading). Corpus development was assisted by

funding from the Universities of Warwick and Reading, BALEAP, EURALEX, the

British Academy and the Arts and Humanities Research Board. </p></availability>

</publicationStmt>

<sourceDesc>

<recordingStmt>

<recording dur="00:41:44" n="6656">

<date>28/11/2002</date><equipment><p>video</p></equipment>

<respStmt><name>BASE team</name>

</respStmt></recording></recordingStmt></sourceDesc></fileDesc>

<profileDesc>

<langUsage><language id="en">English</language>

</langUsage>

<particDesc>

<person id="nm0940" role="main speaker" n="n" sex="m"><p>nm0940, main speaker, non-student, male</p></person>

<personGrp id="ss" role="audience" size="l"><p>ss, audience, large group </p></personGrp>

<personGrp id="sl" role="all" size="l"><p>sl, all, large group</p></personGrp>

<personGrp role="speakers" size="3"><p>number of speakers: 3</p></personGrp>

</particDesc>

<textClass>

<keywords>

<list>

<item n="speechevent">Lecture</item>

<item n="acaddept">Statistics</item>

<item n="acaddiv">ps</item>

<item n="partlevel">UG2</item>

<item n="module">Mathematical Statistics</item>

</list></keywords>

</textClass>

</profileDesc></teiHeader><text><body>

<u who="nm0940"> today's lecture follows a <pause dur="0.6"/> # very <pause dur="0.2"/> directly on from what i was saying <pause dur="0.5"/> # yesterday and i'm afraid only one overhead is working so i'll have to be over here all the time <pause dur="1.1"/> # <pause dur="0.5"/> # yesterday i was introducing the idea of significance <pause dur="0.5"/> and # <pause dur="0.8"/> i i talked about five key ideas which i'm going to run over very quickly to start with today <pause dur="0.4"/> and then today i'm going to add two more new ideas <trunc>t</trunc> <pause dur="0.3"/> on the end <pause dur="1.0"/> well the five key ideas that i was talking about yesterday <pause dur="0.4"/> was # the null hypothesis <pause dur="0.2"/> the <pause dur="0.4"/> # test statistic <pause dur="0.6"/> the # <pause dur="0.6"/> P-value the critical region <pause dur="0.4"/> and a significant level <pause dur="0.5"/> you know so those are five key ideas <pause dur="0.2"/> that we've got to <pause dur="0.5"/> get on board <pause dur="0.2"/> before # we can we can go any further really so let me <trunc>qui</trunc> quickly review <pause dur="0.4"/> # these so first of all <pause dur="1.4"/><kinesic desc="writes on transparency" iterated="y" dur="2"/> the null hypothesis i'll just mention verbally what they are null hypothesis H-sub-zero <pause dur="0.2"/> so # a rather at a rather general level the null hypothesis is just some statement <pause dur="0.2"/> about the distribution of the data we have observed <pause dur="2.3"/> # test

statistic <pause dur="1.8"/> that's the notation i used yesterday a test statistic <pause dur="0.2"/> is a function of the data <pause dur="0.3"/> which we look at in order to inform us <pause dur="0.3"/> about whether we think H-nought is reasonable or not <pause dur="0.3"/> in the light of the data so usually the test statistic <pause dur="0.2"/> is some kind of difference <pause dur="0.3"/> between what we've observed <pause dur="0.2"/> and what we would expect under the null hypothesis and the idea is that the the bigger the value of the test statistic <trunc>i</trunc> is <pause dur="0.2"/> the more doubt we have that H-nought is actually reasonable in the light of the data <pause dur="0.6"/> so if we're going to be accepting and rejecting H-nought which we're going to be <pause dur="0.3"/> doing with ever increasing frequency as we go along <pause dur="0.2"/> # <pause dur="0.2"/> the the logic is that we will want to reject H-nought think H-nought is unreasonable <pause dur="0.2"/> when T is large <pause dur="0.7"/> and the larger it is <pause dur="0.3"/> the more <pause dur="0.2"/> unreasonable we think H-nought is and that is encapsulated by the <pause dur="0.9"/><kinesic desc="writes on transparency" iterated="y" dur="2"/> P-value the third thing <pause dur="0.3"/> P-value is the <kinesic desc="writes on transparency" iterated="y" dur="3"/> probability <pause dur="1.7"/> that by chance <pause dur="0.8"/> if the null hypothesis is

true <pause dur="0.2"/> i would get a value of the test statistic <pause dur="0.3"/> greater than than or equal to the one i actually got <kinesic desc="writes on transparency" iterated="y" dur="4"/> so the notation <pause dur="1.3"/> that i talked about <pause dur="0.8"/> yesterday <pause dur="0.7"/> that's the probability <pause dur="0.2"/> so <pause dur="0.3"/> # we just pretend for a moment that the null hypothesis actually is true <pause dur="0.3"/> and we ask ourselves well what's the chance <pause dur="0.2"/> that we would have come up with a fairly good test statistic <pause dur="0.2"/> which <trunc>i</trunc> which is as or more extreme <pause dur="0.4"/> than the one that's # one we've actually got notice the convention capital letters <pause dur="0.2"/> and small letters that i keep on going on about <pause dur="0.4"/> # well you really see the importance of that <pause dur="0.4"/> convention here <pause dur="1.2"/> so that's the # <pause dur="0.3"/> # <pause dur="0.8"/> that's the P-value <pause dur="0.2"/> critical <trunc>va</trunc> critical region <kinesic desc="writes on transparency" iterated="y" dur="2"/> was the fourth <pause dur="1.3"/> idea i talked about yesterday so <pause dur="0.2"/> # if we had a significance test for that sum that sum procedure for deciding <pause dur="0.3"/> on the basis of X whether or not <pause dur="0.3"/> H-nought is reasonable or

unreasonable <pause dur="0.3"/> the critical region is just <pause dur="0.4"/> the set of all possible sets of data that would lead us to think <pause dur="0.2"/> that H-nought was unreasonable <pause dur="0.2"/> or in the jargon <pause dur="2.3"/><kinesic desc="writes on transparency" iterated="y" dur="3"/> it would # <pause dur="0.6"/> sets of data which would lead us to reject <pause dur="0.7"/> H-nought or <kinesic desc="writes on transparency" iterated="y" dur="5"/> if you like to decide <pause dur="0.5"/> if H-nought <pause dur="0.5"/> is <pause dur="1.8"/> false <pause dur="1.8"/> so # <pause dur="0.8"/> so the first thing i'm really going to do today formally is just to <pause dur="0.2"/> just to point out that the the sensible critical region should be those values of the data X such that the P-value's sufficiently small <pause dur="1.1"/> so that's the critical region that's that's a a general concept applied with any decision procedure we might <pause dur="0.5"/> # propose <pause dur="0.6"/> and # <kinesic desc="writes on transparency" iterated="y" dur="1"/> the significance level alpha <pause dur="0.8"/> everybody uses that notation <kinesic desc="writes on transparency" iterated="y" dur="9"/> the significance level is the probability <pause dur="0.6"/> that if <pause dur="2.5"/> if H-nought is true <pause dur="1.6"/> that we would by chance <pause dur="0.4"/> get <pause dur="0.3"/> a data set <pause dur="0.3"/> that was in the critical region <pause dur="1.1"/> so these were the five things that i <pause dur="0.3"/> i talked about <pause dur="0.3"/> # <pause dur="0.2"/> yesterday and these are five

key ideas <pause dur="0.3"/> # that understanding what these are <pause dur="0.3"/> is absolutely mandatory if we're going to get anywhere at all <pause dur="0.3"/> in talking about # significance <pause dur="1.3"/> so first things first thing the first thing i want to do today is to <pause dur="0.3"/> is to formally <pause dur="0.3"/> # try and explain what the connection is <pause dur="0.2"/> between the P-value <pause dur="0.3"/> and the significant level <pause dur="0.7"/> and the idea is <pause dur="0.4"/> i've already mentioned it very briefly idea is that <pause dur="0.3"/> that a sensible critical region <pause dur="0.3"/> should be the sets of data whose P-values are less than something <pause dur="0.9"/> and that <gap reason="inaudible" extent="1 sec"/> <pause dur="0.3"/> is none other than the alpha <pause dur="0.9"/> so i'm going to set that <unclear>round to</unclear> <pause dur="1.8"/><kinesic desc="writes on transparency" iterated="y" dur="3"/> as a theorem <pause dur="1.3"/> although you can really hardly justify it in <trunc>term</trunc> <pause dur="0.2"/> in a mathematical sense <pause dur="0.6"/> the theorem says <pause dur="6.1"/><kinesic desc="writes on transparency" iterated="y" dur="32"/> the significance test <pause dur="2.1"/> given by <pause dur="1.4"/> this in particular <pause dur="1.0"/> critical region and the critical region we're thinking about is the sets of data such that the P-value is less than the threshold <pause dur="0.7"/> so

it's the set of <pause dur="0.4"/> data X such that the P-value <pause dur="1.4"/> corresponding to <pause dur="1.1"/> X is less than or equal to <pause dur="0.9"/> a threshold alpha <pause dur="0.2"/> and the theorem says <pause dur="0.3"/> that if we take that as the critical region in other words if we <pause dur="0.4"/> decide H-nought is false when the P-value is less than alpha <pause dur="0.3"/> that is precisely a significance test <pause dur="0.2"/> with significance level <pause dur="0.2"/> alpha <pause dur="2.7"/><kinesic desc="writes on transparency" iterated="y" dur="9"/> so this thing has <pause dur="4.5"/> significance level <pause dur="1.9"/> alpha <pause dur="4.7"/> and the proof of it <pause dur="1.6"/> is is is <pause dur="0.2"/> <trunc>w</trunc> once you see the idea if you prove it's completely trivial and <trunc>i</trunc> and it's <trunc>easi</trunc> easier to rather rather than trying to write write it out formally <pause dur="0.4"/> it it's easiest just to <pause dur="0.9"/><kinesic desc="writes on transparency" iterated="y" dur="9"/> # look at a picture so let's think of <pause dur="0.3"/> let's think of the values of the test statistic <pause dur="2.2"/> so so # here's

the distribution <pause dur="0.3"/> of the test statistic <pause dur="0.2"/> we would expect to see <pause dur="0.9"/> if the null hypothesis is true <pause dur="1.1"/> so <pause dur="0.4"/> looks like a normal distribution of course it doesn't have to be so we've got some distribution for T under H-nought now <pause dur="0.2"/> what what are we what are we doing with the P-value what we do is we <pause dur="0.3"/> we <kinesic desc="writes on transparency" iterated="y" dur="15"/> locate on the scale <pause dur="0.4"/> of the test statistic we locate <pause dur="0.5"/> a particular value <pause dur="0.4"/> of the test statistic we've observed so notice a little-X here <pause dur="0.2"/> big-X is the general one <pause dur="0.8"/> so # <pause dur="0.8"/> so we locate the value of the <pause dur="0.2"/> test statistic we've actually observed <pause dur="0.3"/> and then <pause dur="1.3"/><kinesic desc="writes on transparency" iterated="y" dur="7"/> from the definition <pause dur="0.2"/> which you can <pause dur="0.3"/> just see <kinesic desc="indicates point on screen" iterated="n"/> still at the top there on the screen <pause dur="0.2"/> the P-value <pause dur="1.0"/> is this area isn't it it's the probability <pause dur="2.2"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> it's the probability <pause dur="0.2"/> of for getting by chance <pause dur="0.2"/> a value of T at capital-X bigger <pause dur="0.3"/> than T <pause dur="0.4"/> at little-X so what we have to show <pause dur="0.5"/> is that if we consider all possible sets of data <pause dur="0.4"/> such that <kinesic desc="indicates point on transparency" iterated="n"/>

this area is less than alpha <pause dur="1.1"/> then that has significance level <pause dur="0.2"/> alpha <pause dur="0.4"/> so what we have to show <pause dur="1.1"/><event desc="changes pen" iterated="n"/> and this black thing's getting a bit <pause dur="1.2"/> worn out with this i'm going to do it in blue now <kinesic desc="writes on transparency" iterated="y" dur="1"/> what what what we have to show <pause dur="0.7"/> is that the probability <pause dur="2.3"/><kinesic desc="writes on transparency" iterated="y" dur="25"/> of X belonging to the critical region <pause dur="0.9"/> calculated under the <pause dur="1.1"/> assumption of H-nought is equal to alpha is what we have to show <pause dur="1.0"/> so what is this so from the <pause dur="0.5"/> from the definition of C in the same theorem this is the probability <pause dur="1.0"/> that P-X <pause dur="0.8"/> is less than or equal to alpha <pause dur="1.8"/> by definition of what C is <pause dur="1.8"/> and that this is this is now where we use the picture when is <pause dur="1.1"/><kinesic desc="writes on transparency" iterated="y" dur="2"/> when is <kinesic desc="indicates point on transparency" iterated="n"/> this P-value this area i shaded <pause dur="0.2"/> when is this less than alpha <pause dur="0.8"/> well <trunc>o</trunc> <pause dur="0.2"/> obviously it's less than alpha <pause dur="0.2"/> if the T <pause dur="0.4"/> if if the # left-hand end of the shaded area <pause dur="0.2"/> is beyond the one-minus-alpha quantile <pause dur="0.5"/> remember about quantiles <pause dur="0.2"/> so this is exactly the same <kinesic desc="writes on transparency" iterated="y" dur="2"/> thing <unclear>as i say</unclear> <pause dur="0.3"/> that the <pause dur="0.5"/> this is the

probability <pause dur="0.4"/> that the value T i get <pause dur="0.4"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> is greater than or equal to <pause dur="1.0"/> the point <kinesic desc="indicates point on transparency" iterated="n"/> under this distribution <pause dur="0.2"/> which cuts off area alpha to the right and that's the <pause dur="1.1"/><kinesic desc="writes on transparency" iterated="y" dur="2"/> notation we used before that's the one-minus-alpha quantile <pause dur="0.4"/> of the distribution at T and that <kinesic desc="writes on transparency" iterated="y" dur="1"/> everything is calculated <pause dur="0.2"/> under the assumption <pause dur="0.2"/> of the <pause dur="0.5"/> null hypothesis okay so Q <pause dur="1.1"/><kinesic desc="writes on transparency" iterated="y" dur="7"/> Q this is the one-minus alpha <pause dur="2.3"/> quantile <pause dur="1.2"/> of T <pause dur="1.1"/> that's what Q is <pause dur="1.8"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> so what is the chance now <pause dur="0.3"/> that a random variable <pause dur="0.2"/> by <trunc>cha</trunc> by chance <pause dur="0.2"/> will give you a value greater than or equal to the one-minus alpha quantile <pause dur="0.3"/> well <pause dur="0.7"/> the the the suffix one-minus-alpha means you have a chance one-minus-alpha to the left <pause dur="0.4"/> so the chance to the right is therefore one minus one-minus-alpha <pause dur="0.2"/> which is alpha-<pause dur="1.7"/>T <kinesic desc="writes on transparency" iterated="y" dur="1"/> okay that's the proof <pause dur="0.3"/> it's kind of trivial really it's just a matter

of understanding <pause dur="0.2"/> what probabilities we're <pause dur="0.2"/> we're we're we're we're talking about <pause dur="1.3"/> so # so this shows that this significance test <pause dur="0.2"/> rejecting H-nought when the P-value is less than alpha <pause dur="0.5"/> is a significance test <pause dur="0.3"/> with significance level <pause dur="1.0"/> alpha now in a sense this is extremely subtle because we <trunc>n</trunc> we <unclear>are</unclear> two interpretations <pause dur="0.5"/> of of alpha <pause dur="0.3"/> i'm sorry i don't have another <shift feature="voice" new="laugh"/> overhead to <shift feature="voice" new="normal"/><pause dur="0.8"/> switch over there so i hope you're not losing too much of this on the screen <pause dur="0.6"/> okay can i just <kinesic desc="writes on transparency" iterated="y" dur="30"/> emphasize this by <pause dur="1.9"/> just noting up here <pause dur="1.9"/> that this theorem <pause dur="0.3"/> gives us <pause dur="0.2"/> two interpretations <pause dur="0.4"/> of this quantity alpha <pause dur="2.4"/> the first interpretation <pause dur="1.4"/> is the interpretation <pause dur="1.5"/> <gap reason="inaudible" extent="1 sec"/> just see it in the top there the first interpretation <pause dur="0.2"/> is alpha is a threshold <pause dur="0.3"/> for the P-value <pause dur="2.3"/> so this is a threshold <pause dur="2.2"/> for P-value <pause dur="0.8"/> and the P-value <pause dur="0.3"/> is a measure of how surprised we are <pause dur="0.6"/> to get a particular value of our test statistic so so # <kinesic desc="indicates point on transparency" iterated="n"/> this interpretation <pause dur="0.6"/> is is is saying something like alpha is a measure of how

surprised we've got to be <pause dur="0.3"/> to reject <pause dur="0.3"/> H-nought <pause dur="4.9"/><kinesic desc="writes on transparency" iterated="y" dur="20"/> so i'll write that in <pause dur="0.3"/> so how surprised <pause dur="3.1"/> we have <pause dur="2.1"/> got <pause dur="0.6"/> to be <pause dur="4.7"/> before we reject <pause dur="1.5"/> H-nought by reject i mean declare it to be false or think it to be <pause dur="0.5"/> <gap reason="inaudible" extent="1 sec"/><pause dur="0.3"/> okay so first interpretation then <pause dur="0.3"/> threshold of the P-value how surprised <pause dur="0.2"/> we have to be with the data set in front of us <pause dur="0.2"/> in order to think that H-nought is false <pause dur="0.6"/> so it's a kind of measure of surprise it's a measure of how we think <pause dur="0.3"/> about the data <pause dur="1.2"/> now the second interpretation <pause dur="0.3"/> is given by <kinesic desc="indicates point on screen" iterated="n"/> the result of the theorem that alpha is a significance level now what is that <pause dur="0.2"/> that is an error rate <pause dur="1.8"/><kinesic desc="writes on transparency" iterated="y" dur="2"/> so this is the probability <pause dur="0.9"/> that we're in the critical region given H-nought is true now what do we mean to say <pause dur="0.4"/> the test that it the data in the critical region what we mean is that we reject <pause dur="0.4"/> or decide H-nought

is false <pause dur="0.6"/> so when alpha <pause dur="1.5"/><kinesic desc="writes on transparency" iterated="y" dur="16"/> is a probability <pause dur="1.5"/> so <pause dur="1.2"/> that you'll decide H-nought is false <pause dur="1.3"/> and that but that probability is calculated <pause dur="0.2"/> on the assumption <pause dur="1.0"/> that H-nought <pause dur="1.9"/> is true <pause dur="0.8"/> so you know it's an error rate it it <trunc>t</trunc> tells you how often you make a mistake <pause dur="0.6"/> if you use this <gap reason="inaudible" extent="1 sec"/> for a significance test <pause dur="0.4"/> given by comparing P-values <pause dur="0.2"/> with this threshold <pause dur="0.8"/> and so this is a <pause dur="1.7"/><kinesic desc="writes on transparency" iterated="y" dur="2"/> this is an error rate <pause dur="0.7"/> it's a natural <pause dur="0.4"/> long run frequency probability <pause dur="1.5"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> and # in order to exercise this error rate <pause dur="0.4"/> # <pause dur="0.2"/> idea <pause dur="1.3"/> a little bit more <pause dur="1.0"/> carefully some people # use the jargon type one error rate <pause dur="7.9"/><kinesic desc="writes on transparency" iterated="y" dur="5"/> it's a type one error rate now <pause dur="0.2"/> # why it's called type one error rate will be clear hopefully in twenty minutes' time <pause dur="0.3"/> when i've talked about <shift feature="voice" new="laugh"/>a

type <shift feature="voice" new="normal"/>two error rate <pause dur="0.6"/> but it's a type one error rate that's because it's just <pause dur="0.2"/> it's about making an error in one direction only <pause dur="0.8"/> if there if H-nought is true what's the chance we think it's false <pause dur="0.2"/> of course you might make an error the other way round as well <pause dur="0.5"/> and that's what i'm going to talk about later but for the moment it's just one <pause dur="1.0"/> thing okay so two two subtly different interpretations <pause dur="0.3"/> at this point here alpha <pause dur="1.9"/> well let me talk about an example now <pause dur="4.3"/><kinesic desc="writes on transparency" iterated="y" dur="4"/> just to reinforce again this idea of what H-nought is what a test statistic is and what P-values are et cetera <pause dur="1.4"/> and again i'm sorry <kinesic desc="indicates overhead projector" iterated="n"/> this silly thing isn't working <pause dur="1.0"/> <gap reason="inaudible" extent="1 sec"/> well this is an example in genetics and it it's an example of historical <pause dur="0.6"/> # <pause dur="0.3"/> interest i think i've mentioned once or twice already <pause dur="0.6"/> that # early work in <trunc>bio</trunc> biology particularly in quantitative genetics <pause dur="0.5"/> # <pause dur="0.2"/> <trunc>ha</trunc> had a very informative role <pause dur="0.3"/> in the early development of statistics <pause dur="0.9"/> and # in example five which i <pause dur="0.3"/> gave

out to you the other day <pause dur="1.2"/> # <pause dur="0.3"/> # there there there is a rather famous experiment from Charles Darwin which i've put in one of the questions for you to look at <pause dur="0.7"/> and here's another famous experiment done <pause dur="0.4"/> by a geneticist in # the early nineteen-twenty <pause dur="0.2"/> and and and this this work is done by somebody <pause dur="0.8"/> called # Frets who was a geneticist <pause dur="0.4"/> and published his paper in nineteen-twenty-one <pause dur="0.2"/> and # Frets was interested in this # question <pause dur="0.2"/> of the inheritance <pause dur="0.2"/> of human characteristics <pause dur="0.5"/> we all we're all very familiar with the idea <pause dur="0.3"/> that # human characteristics like <pause dur="0.2"/> facial features for example <pause dur="0.2"/> do tend to be inherited <pause dur="0.4"/> how often have we seen <pause dur="0.6"/> mothers and daughters looking really very similar and sisters looking similar and brothers looking similar of course we <pause dur="0.2"/> see that all the time don't we <pause dur="0.2"/> so so there is a strong <pause dur="0.2"/> inheritance in facial characteristics and and Frets was one of the first <pause dur="0.5"/> biologists who really tried to get to grips with this <pause dur="0.3"/> quantitatively he asked the

question well how can we measure how much <pause dur="0.5"/> of facial features are inherited <pause dur="0.2"/> and how much are just <pause dur="0.3"/> random occurrences that nobody can explain <pause dur="0.7"/> and # Frets did a a rather famous experiment he was interested in <pause dur="0.2"/> people's faces and and he <pause dur="0.2"/> he took a number of families and tried to compare faces of different brothers <pause dur="0.2"/> in the same family <pause dur="1.0"/> and # <pause dur="0.9"/> and # <pause dur="0.3"/> he <pause dur="2.7"/><kinesic desc="writes on transparency" iterated="y" dur="7"/> he <trunc>w</trunc> he was particularly interested in <pause dur="1.4"/> head length now length is a kind of funny <pause dur="0.6"/> word to use but the head length is actually the distance between you <trunc>sh</trunc> you shut your mouth you see you start i stop talking <pause dur="0.3"/> and it's the <trunc>dista</trunc> it's the distance between here <pause dur="0.2"/><kinesic desc="demonstrates head length" iterated="n"/> and the top of my head okay and you measure that in millimetres <pause dur="0.2"/> and that's your head length <pause dur="0.7"/> i suppose it's a length if the person's lying down flat with a tape measure <pause dur="0.4"/> it's a kind of natural length isn't it so so what i'm <unclear>trying to</unclear> express here was he measured he measured people's <pause dur="0.2"/> head length he measured this <kinesic desc="demonstrates head length" iterated="n"/> # <pause dur="0.2"/>

i call it height really <pause dur="0.2"/> this measurement anyway <pause dur="0.4"/> and # what he did was he he he found a number of <pause dur="0.3"/> he he he found a number of families a sample of families where where he had <pause dur="0.3"/> # <pause dur="0.2"/> # two or more brothers in the same family <pause dur="0.5"/> and he measured the head heights or head lengths he called them <pause dur="0.4"/> # for the <pause dur="0.2"/> first son and for the second son <pause dur="0.8"/> and he tried to see how similar they were <pause dur="0.9"/> and # basically the idea is to is to show that <pause dur="0.3"/> that that brothers of the same family have faces which are much more similar to each other <pause dur="0.2"/> than just taking the sample generally from the population <pause dur="0.5"/> we know now that that is of course true <pause dur="0.4"/> but a hundred years ago it wasn't obviously true <pause dur="0.3"/> and that's what he tried to do <pause dur="0.4"/> find out so so this is this is just a little extract from what what he did so what so what what he # <pause dur="1.1"/> what what he <pause dur="0.2"/> what he measured which i'll now call X <pause dur="0.7"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> what he measured was the <pause dur="0.6"/> the value of L this this # this dimension of of of the head <pause dur="0.3"/> for the first <pause dur="1.2"/><kinesic desc="writes on transparency" iterated="y" dur="2"/> son in a family <pause dur="1.1"/> and he also measured it for the second son and took the

difference <pause dur="1.7"/><kinesic desc="writes on transparency" iterated="y" dur="5"/> so the particular measure i'm going to talk about from his work <pause dur="1.5"/> is # the difference between the value for the first son <pause dur="0.5"/> minus the value <pause dur="1.3"/> for the second son <pause dur="0.5"/> and he did this for twenty-five families <pause dur="0.7"/> and so he got twenty-five values of X and these are my data <pause dur="0.3"/> and i will tell you in the next few minutes # you know <pause dur="0.4"/> # # talking about now what what question was he interested in well what he's interested in <pause dur="0.4"/> is # how similar these things are in particular one thing he might want to know <pause dur="0.3"/> is # <pause dur="1.0"/> obviously there are differences here 'cause there's an order effect <pause dur="0.2"/> this is the first son this is the second son <pause dur="0.2"/> so maybe <pause dur="0.4"/> # <pause dur="0.7"/> the <trunc>m</trunc> the mother by definition is getting older before she gets her second son you see so maybe the son <unclear>changed</unclear> over time so one thing he looked at was time <unclear>trends</unclear> <pause dur="0.6"/> and so the the question <pause dur="1.1"/><kinesic desc="writes on transparency" iterated="y" dur="2"/> which i want to <pause dur="0.4"/> talk about now is the <trunc>s</trunc> surface question really of his work <pause dur="0.2"/> the

question is <pause dur="0.3"/> # <pause dur="0.7"/><kinesic desc="writes on transparency" iterated="y" dur="23"/> <unclear>okay</unclear> <pause dur="2.4"/> does L <pause dur="1.9"/> tend to get bigger <pause dur="2.3"/> or smaller <pause dur="1.9"/> is there any evidence <pause dur="0.4"/> that the son the first son <pause dur="0.2"/> has a bigger head than the second son or vice versa <pause dur="2.5"/> so if # <pause dur="0.5"/> <trunc>i</trunc> if # if the value of L gets bigger then of course X is negative <pause dur="0.5"/> and if the value of L gets smaller then <pause dur="0.4"/> X is # positive so really he's interested in <pause dur="0.3"/> <trunc>w</trunc> the sign of X essentially <pause dur="1.7"/> so # these are his data <pause dur="1.2"/> so i'm <pause dur="2.1"/> somewhat frustrated i <shift feature="voice" new="laugh"/>don't know <shift feature="voice" new="normal"/>if have a projector 'cause i <trunc>al</trunc> really should have put the data on the other projector so <pause dur="0.3"/> it's all got to go here now i i'm going to arrange them there were twenty-five <pause dur="1.6"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> twenty-five families he measured from these these were his # <pause dur="0.4"/> <trunc>h</trunc> his data <pause dur="1.2"/><kinesic desc="writes on transparency" iterated="y" dur="7"/> and i'm going to arrange these data <pause dur="0.2"/> in the kind of way that statisticians usually arrange data <pause dur="1.3"/> in what's called

a stem and leaf plot <pause dur="1.3"/> and you'll see <pause dur="0.2"/> in a moment why it's a kind of that's a sensible way of writing our data so here are three or four values <pause dur="0.4"/> # minus-nine minus-nine minus-seven minus-six and i write them in order <pause dur="0.2"/> from the <trunc>m</trunc> <pause dur="0.2"/> biggest minus-one to big plus-one you see <pause dur="0.7"/> and then the then then <kinesic desc="writes on transparency" iterated="y" dur="11"/> there were some more families which <pause dur="1.0"/> you you get two minus-four # two minus-fives two minus-fours a minus-three <pause dur="0.6"/> and two minus-ones <pause dur="1.4"/> and then then then there was a nought there was a family where <pause dur="0.3"/> these are millimetre measurements by the way <pause dur="0.3"/> so <kinesic desc="indicates point on transparency" iterated="n"/> this is a family where to the nearest millimetre their <unclear>here might</unclear> be two exactly the same <pause dur="0.7"/><kinesic desc="writes on transparency" iterated="y" dur="20"/> and there was a one and there was a two and there was a four <pause dur="0.9"/> and then there were two <pause dur="0.5"/> families with plus-five <pause dur="0.8"/> seven <pause dur="0.2"/> eight <pause dur="0.8"/> and nine there was a family with ten a family with twelve <pause dur="1.0"/> family with twelve and a family with thirteen <pause dur="1.4"/> and

there was a family with sixteen <unclear>difference</unclear> <pause dur="1.0"/> that's a stem and leaf plot <pause dur="0.6"/> it's just writing down the data but it's kind of ordered in a nice little way <pause dur="0.5"/> ranking them from the smallest to the largest <pause dur="0.2"/> and you notice i've grouped them <pause dur="0.3"/> in <pause dur="0.2"/> class intervals of width five <pause dur="1.6"/> see these are the minus-ten to the minus-five minus-five to nought et cetera <pause dur="0.8"/> and # the the the the advantage of being able to do that is you can immediately spot what the spot what the <pause dur="0.3"/> histogram looks like <pause dur="1.0"/> so if i just draw a <pause dur="1.1"/><kinesic desc="writes on transparency" iterated="y" dur="20"/> tiny little histogram down here you can immediately spot there are four <pause dur="2.0"/> four # observations in the first group <pause dur="0.5"/> there are seven in the second <pause dur="1.2"/> four in the third five <pause dur="0.6"/> in the fourth and then another four <pause dur="0.5"/> and another one see so there's the <pause dur="0.6"/> there's the histogram <pause dur="0.2"/> of the <pause dur="1.2"/> of # there it's a typical sort of histogram you get in biological experiments <pause dur="1.5"/> and # <pause dur="0.4"/> and the

question is we we now want to analyse these data in such a way that we <pause dur="0.2"/> we try and shed light on the question <pause dur="0.6"/> whether <gap reason="inaudible" extent="1 sec"/> genetic longer <pause dur="0.3"/> or <pause dur="0.2"/> or shorter now i want to talk about two <pause dur="0.9"/> two approaches <pause dur="1.3"/> # <pause dur="0.2"/><kinesic desc="writes on transparency" iterated="y" dur="2"/> the first approach <pause dur="1.2"/> is the T-test <pause dur="2.7"/> and it's called the T-test <pause dur="0.6"/> because it's based on the T distribution and it's it's exactly the same really as what i was talking about # last week i <gap reason="inaudible" extent="1 sec"/> when i was talking about <pause dur="0.2"/> confidence intervals with a T <pause dur="0.6"/> distribution so so the first approach is to is to plot a # # a normal distribution over all this and to discuss it all in terms of inference for a normal sum <pause dur="1.5"/> so # <pause dur="0.2"/> what would be <pause dur="0.5"/> a sensible way of <pause dur="0.7"/> thinking about these data then from a <pause dur="0.4"/> normal perspective well <pause dur="0.2"/> # there's # there's my histogram see it's a <pause dur="2.6"/><kinesic desc="writes on transparency" iterated="y" dur="2"/> sort of normal shape <pause dur="1.3"/> not very good really it's a got i mean i # we've only got twenty-five observations so that's as close to as a normal

distribution as <pause dur="0.3"/> in fact as you you could ever get <pause dur="0.9"/> so normality <pause dur="0.7"/> is # <pause dur="0.3"/> probably a reasonable assumption <pause dur="0.2"/> and all biologists assume normality without worrying about it so <pause dur="0.6"/> we will as well <pause dur="0.5"/> so this is a normal has a normal distribution <pause dur="0.4"/> so in the usual notation <pause dur="1.0"/><kinesic desc="indicates point on transparency" iterated="n"/> this thing has mu <pause dur="0.3"/> # mean mu <pause dur="0.6"/> and variance sigma-squared <pause dur="0.3"/> so there's the model <pause dur="1.1"/> so that's an <gap reason="inaudible" extent="1 sec"/> <unclear>to the ingredients i was</unclear> talking about reminding about earlier so <trunc>ne</trunc> so the next question is <pause dur="0.4"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> what is the null hypothesis <pause dur="0.6"/> well the question <pause dur="1.3"/> here we are just <gap reason="inaudible" extent="1 sec"/> at the top there the question is whether <gap reason="inaudible" extent="1 sec"/> there any evidence that L gets longer or shorter so the natural null hypothesis now <pause dur="0.3"/> is that # L on average stays the same <pause dur="1.0"/> so so <pause dur="0.2"/> # so that means the mean of X is zero <pause dur="0.7"/> which is mu isn't it so the null hypothesis <pause dur="0.4"/> is that mu <pause dur="0.3"/><kinesic desc="writes on transparency" iterated="y" dur="2"/> is zero <pause dur="1.6"/> so that's the null hypothesis of a T-test <pause dur="1.2"/> and # <pause dur="1.0"/> next ingredient test statistic so what's a test statistic <pause dur="0.3"/> statistic are we going to take well we we need to <pause dur="0.2"/> # <pause dur="0.2"/> we need to analyse the data now <pause dur="0.3"/> so we need to <pause dur="0.7"/><kinesic desc="writes on transparency" iterated="y" dur="2"/> #

work out the <pause dur="0.9"/> sample mean <pause dur="0.5"/> in order to <pause dur="0.2"/> <gap reason="inaudible" extent="1 sec"/> the T distribution and # <pause dur="1.0"/> and according to my <pause dur="1.1"/> calculations the sample mean is one-point-nine-six <pause dur="2.2"/><kinesic desc="writes on transparency" iterated="y" dur="2"/> that's zero over there by the way isn't it so <pause dur="0.3"/> so clearly <pause dur="0.3"/> the distribution tends to be pushed over a bit to the right so the mean is <pause dur="0.2"/> positive one-point-nine-six <pause dur="0.9"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> and we also need the standard deviation <pause dur="0.5"/> sample standard deviation for the T <pause dur="0.5"/> statistic so <pause dur="1.7"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> according to my calculator <pause dur="0.7"/> this was seven-point-four-zero and of course <kinesic desc="writes on transparency" iterated="y" dur="2"/> as i've <trunc>alrea</trunc> already said N is twenty-five <pause dur="0.7"/> so we're all set now to form the <pause dur="0.9"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> T statistic <pause dur="1.7"/> for the the the the the T random variable which tells us about mean to a normal distribution so <pause dur="0.2"/> remember what that is so so that's just <pause dur="1.0"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> we have temporarily to go to capital letters now 'cause i'm talking about the

random <pause dur="0.5"/> distribution of these things so <pause dur="0.2"/> remember it's it's the the the T statistic standard normal random variable on the top so it's # <pause dur="0.2"/> it's the the the sample mean X-bar minus its mean <pause dur="0.6"/> but <pause dur="0.3"/> we're constructing a test statistic now so we do that under the assumption of the null hypothesis <pause dur="0.2"/> and the mean is zero <pause dur="0.3"/> so we don't have to divide by <pause dur="0.5"/> we don't have to subtract anything <pause dur="0.4"/> 'cause it's zero <pause dur="1.2"/> and then we <pause dur="0.2"/> # <trunc>the</trunc> then we scale that by dividing by the standard deviation <pause dur="0.7"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> so we divide by the standard deviation <pause dur="0.8"/> but the standard deviation of X-bar of course is X over square root of X <kinesic desc="writes on transparency" iterated="y" dur="2"/> so that's just the same thing as <pause dur="0.5"/> sticking an N <pause dur="0.6"/> up there <pause dur="1.1"/> and # <pause dur="0.9"/> and # <pause dur="0.3"/> <trunc>le</trunc> let's get the numbers in so that one-point-nine-six <pause dur="0.9"/><kinesic desc="writes on transparency" iterated="y" dur="9"/> times five <pause dur="0.5"/> square root of twenty-five <pause dur="0.8"/> divide by seven-point-four-zero <pause dur="0.2"/> and according to my calculator <pause dur="0.5"/> that gives us one-point-three-two <pause dur="0.9"/>

okay so we've got a <pause dur="0.2"/> a value of the T <pause dur="0.8"/> random variable now which is one-point-<pause dur="0.6"/>three-two so <pause dur="0.5"/> so this is an intermediate set now we're getting a test statistic so what are we going to take as the test statistic <pause dur="0.6"/> well # <kinesic desc="indicates point on transparency" iterated="n"/> this this quantity <pause dur="0.2"/> intuitively measures the difference <pause dur="0.3"/> between X-bar and what we'd expect namely zero <pause dur="1.0"/> and since the question <pause dur="0.5"/> we're asking ourselves <pause dur="0.5"/> here we are still at the top there the question we're asking ourselves is whether there any evidence that it gets either bigger or smaller <pause dur="0.5"/> we put mod bars around this thing because it's deviations in either direction <pause dur="0.3"/> which are equally <pause dur="0.3"/> interesting <pause dur="0.2"/> # <trunc>intere</trunc> equally interesting so the test statistic then is just the absolute value <pause dur="0.3"/> of this # T statistic <pause dur="1.3"/> so we now go to the <pause dur="0.2"/> calculating the <pause dur="0.5"/> P-value now the third ingredient in this # <pause dur="0.6"/> whole procedure so it's a <pause dur="0.4"/><kinesic desc="writes on transparency" iterated="y" dur="4"/> <trunc>l</trunc> let me do that little diagram over here so here's a distribution <pause dur="1.2"/> of T <pause dur="1.7"/> okay so

this is T <pause dur="0.4"/> on how many <trunc>der</trunc> degrees of freedom <pause dur="0.2"/> # twenty-five observations so it's twenty-four degrees of freedom you remember <pause dur="1.7"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> it's the N-minus-one because of the chi-squared <pause dur="0.2"/> argument behind <pause dur="0.3"/> all this so so there's a distribution <pause dur="0.8"/><kinesic desc="writes on transparency" iterated="y" dur="3"/> of # T or twenty-four degrees of freedom centred on zero <pause dur="0.7"/> that's the zero and then we we look at the value we've got <pause dur="0.3"/> which is up here somewhere <kinesic desc="writes on transparency" iterated="y" dur="2"/> one-point-three-<pause dur="0.5"/>two <pause dur="2.0"/> okay we need some <gap reason="inaudible" extent="1 sec"/> for this this is five really isn't it <pause dur="0.4"/> okay so so so what what <trunc>w</trunc> what we the P-value then is the probability <pause dur="0.6"/> of our test statistic greater than or equal to the one we've got <pause dur="1.0"/> and so i'll put in red now <pause dur="0.6"/><kinesic desc="writes on transparency" iterated="y" dur="5"/> that's going to be <pause dur="1.2"/> that area there <pause dur="0.9"/> probability of getting a T-value bigger than one-point-three-two <pause dur="0.2"/>

but we're going to put mod bars around the test statistic <pause dur="0.3"/> because deviations in the negative direction <pause dur="0.2"/> are also equally interesting <pause dur="0.6"/> so <pause dur="0.5"/> so i i put this area down here as well <pause dur="0.3"/><kinesic desc="writes on transparency" iterated="y" dur="4"/> which is minus-one-point-three-<pause dur="0.5"/>two <pause dur="1.3"/> writers of <pause dur="0.6"/> elementary textbooks like talking about two-tails and one-tails and so on <pause dur="0.2"/> this is a two-tailed test <pause dur="0.4"/> because we're interested in deviations in <pause dur="0.4"/> both directions so so we we <pause dur="0.3"/> we work out <pause dur="0.6"/> we work out the # P-value <pause dur="3.2"/><kinesic desc="writes on transparency" iterated="y" dur="2"/> which is the sum of these two red things <pause dur="0.4"/> and this is where you have to go to your statistical tables <pause dur="0.5"/> or go to S-plus <pause dur="0.5"/> if you have S-plus switched on in your desk which i do <pause dur="0.5"/> and so i <trunc>ju</trunc> i just get S-plus to work out for me the tail area <pause dur="0.5"/> the quantity the # <pause dur="1.1"/> cumulative <trunc>distri</trunc> distribution from <pause dur="0.3"/> T and # <pause dur="0.4"/> and i make that <kinesic desc="writes on transparency" iterated="y" dur="4"/>nought-point-one-eight-six <pause dur="0.7"/> so this is about <pause dur="1.6"/> nineteen per cent <pause dur="0.8"/>

okay so the P-value <pause dur="0.8"/> is # <pause dur="0.5"/> nineteen per cent <pause dur="0.6"/> so what does that tell us <pause dur="0.5"/> general discussion over now what we're what we're trying to spot is if the P-value is small <pause dur="0.6"/> and this is this probability very small and that's a measure of how much of a fluke it is to get this <pause dur="0.4"/> value of T well it's not really is it <pause dur="0.5"/> it's sum of your chance nineteen per cent <pause dur="0.5"/> is # <pause dur="0.6"/> not really very unlikely at all <pause dur="0.2"/> so # so by any usual <pause dur="0.9"/> canons of significance testing whatever significance level you've used five per cent or one per cent or whatever <pause dur="0.2"/> you you wouldn't <pause dur="0.5"/> declare this to be significant so the conclusion is <pause dur="0.2"/> although the data show <pause dur="0.2"/> # the <trunc>g</trunc> data give a bit of a hint of a positive difference <pause dur="0.9"/> which means that face lengths tend to get smaller as you go from first to second son <pause dur="0.3"/> it's not significant <pause dur="0.2"/> the difference is not big enough <pause dur="0.3"/> to be clearly evident <pause dur="0.4"/> from such a small sample size <pause dur="1.2"/> so that's the <pause dur="0.5"/> that's the first <pause dur="0.7"/> approach to these <pause dur="0.3"/> data with the T <pause dur="0.2"/> T-test now let let me just mention another one 'cause it's <pause dur="0.2"/> i

wanted to get over the idea that the choice of a test statistic <pause dur="0.3"/> is somewhat arbitrary <pause dur="1.3"/> <kinesic desc="writes on transparency" iterated="y" dur="4"/> and here's another <pause dur="2.2"/> here's another <pause dur="0.6"/> way of looking at the data which is called the sign test <pause dur="4.7"/> so it's another way of looking at the data and the sign test <pause dur="0.7"/> sign test is much simpler than the T-test <pause dur="1.0"/> the sign test just <pause dur="0.5"/> just so it goes back to the original question <pause dur="0.2"/> here we are still up there does L tend to get smaller or longer <pause dur="1.2"/> the sign test simply looks at the sign of the difference <pause dur="0.7"/> so if this is positive <pause dur="0.8"/> the <pause dur="0.2"/> L is getting shorter isn't it and if this thing is <pause dur="0.2"/> negative <pause dur="0.2"/> then L L is getting bigger <pause dur="0.7"/> so a natural null hypothesis now <pause dur="1.5"/> just simply looking at whether <pause dur="0.7"/> you get bigger or less for a value of L <pause dur="0.4"/> the <pause dur="1.7"/> natural probability <pause dur="0.5"/> # to look at now <pause dur="1.3"/> is let's look at the probability that X <pause dur="0.8"/> is positive <pause dur="0.3"/> okay so that means that the <pause dur="0.4"/> L is actually getting less <pause dur="1.3"/> and <trunc>al</trunc> also we could look at the probability that X is negative so that

means L gets bigger <pause dur="0.8"/> and if there's nothing going on here if on average there's no trend over time <pause dur="0.3"/> you'd expect to see as often L getting <pause dur="0.4"/> bigger as it getting smaller you see so the <pause dur="0.6"/> another way of formulating the null hypothesis <pause dur="0.2"/> is that the probability that X is positive <pause dur="0.2"/> should be the same <kinesic desc="writes on transparency" iterated="y" dur="2"/> as the probability <pause dur="0.6"/> that X is negative <pause dur="5.1"/> and # <pause dur="1.2"/> if we have a normal distribution with mean zero then truly that satisfies that requirement <pause dur="0.2"/> but notice that <kinesic desc="indicates point on transparency" iterated="n"/> this is a much more <pause dur="0.3"/> this is a much weaker null hypothesis than <kinesic desc="indicates point on transparency" iterated="n"/> this one here the <trunc>f</trunc> the T-test puts all this baggage on it like assuming normality and things and working out standard deviation and things <pause dur="0.4"/><kinesic desc="indicates point on transparency" iterated="n"/> this is a much more crude this is a much more primitive <pause dur="0.2"/> way of looking at the data just counting up how many positives and negatives <pause dur="0.3"/> and

testing <pause dur="0.3"/> this hypothesis <pause dur="1.2"/> so let's see <pause dur="2.2"/><kinesic desc="writes on transparency" iterated="y" dur="2"/> let's see how many positives and negatives we've got so <pause dur="0.2"/> so can i <pause dur="1.0"/><kinesic desc="writes on transparency" iterated="y" dur="12"/> F for frequency there so let me put F-<pause dur="0.3"/>sub-plus <pause dur="1.9"/> is the number of positive <pause dur="0.6"/> Xs <pause dur="4.3"/> and let let # <kinesic desc="writes on transparency" iterated="y" dur="6"/> F-<pause dur="0.7"/>minus <pause dur="1.6"/> be the number of negative <pause dur="0.8"/> Xs <pause dur="1.7"/> okay <pause dur="1.9"/> and # <pause dur="0.4"/> # the probability of positive and negative should be the same according to the <trunc>nu</trunc> <trunc>hyer</trunc> null hypothesis <pause dur="0.2"/> so on average <pause dur="0.6"/> F-plus should be the same <pause dur="0.3"/> as F-minus it <trunc>w</trunc> <trunc>w</trunc> never be exactly the same of course but on average <pause dur="0.4"/> # the the difference between these things would be zero and one will be the same <gap reason="inaudible" extent="1 sec"/> <pause dur="0.6"/> as the other so so that's the null hypothesis this is what we'll look at now <pause dur="0.3"/> so we <trunc>d</trunc> we don't we don't have Xs any more we just replace them by the frequencies <pause dur="0.8"/> so <kinesic desc="writes on transparency" iterated="y" dur="6"/> so what is what's going to be a suitable test statistic now <pause dur="1.9"/> a test statistic

is a function of the data but now we're only looking at how many positives and negatives there are <pause dur="0.8"/> so it's really a function of F-plus and F-minus i'll be looking at now <pause dur="0.6"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> what is a suitable <pause dur="0.5"/> <trunc>te</trunc> test statistic what is it about these things that <pause dur="0.2"/> would lead us to doubt the null hypothesis is true <pause dur="0.2"/> well it's the difference isn't it intuitively so if we just take <kinesic desc="writes on transparency" iterated="y" dur="2"/> F-plus <pause dur="0.5"/> minus F-difference <pause dur="0.4"/> absolute bars around that <pause dur="1.3"/> then that's a good test statistic <pause dur="0.4"/> the more different <pause dur="0.2"/><kinesic desc="indicates point on transparency" iterated="n"/> these two frequencies are <pause dur="0.2"/> the more evidence we have <pause dur="0.6"/> that # <pause dur="0.5"/> X is more likely to be positive or negative <pause dur="1.3"/> <gap reason="inaudible" extent="1 sec"/> okay so this is the sign test now a completely different way of looking at data <pause dur="0.2"/><kinesic desc="indicates point on transparency" iterated="n"/> this is now the <pause dur="0.5"/> test statistic so how do we work out the P-value well let's get the data <pause dur="2.6"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> let's look let's let's look at the <pause dur="0.4"/> and i'm really very fed up we haven't got the <pause dur="2.0"/><kinesic desc="changes transparency" iterated="y" dur="2"/>

anyway there's the data again <pause dur="0.3"/> how many positive and how many negative <gap reason="inaudible" extent="1 sec"/> we've got four and a seven <pause dur="0.3"/> in the first two cells haven't we so we've got seven <pause dur="0.4"/> we've we've got eleven negatives <pause dur="0.5"/> how many positives have we got <pause dur="0.2"/> all the rest are positive except for zero <pause dur="0.7"/> we'll leave the zero out 'cause that doesn't tell us anything about which way we're going <pause dur="0.7"/> so we've got eleven <pause dur="1.9"/><kinesic desc="changes transparency" iterated="y" dur="1"/> we've got eleven negatives <pause dur="2.6"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> and the number of positives is the rest of them <pause dur="0.5"/> which is fourteen 'cause the fifteen to twenty-five <gap reason="inaudible" extent="1 sec"/> <pause dur="0.2"/> but we'll leave the zero out so the number of negatives <pause dur="0.9"/><kinesic desc="writes on transparency" iterated="y" dur="3"/> is # <pause dur="0.6"/> thirteen <pause dur="5.4"/> i'm so sorry number of positives is thirteen see that so we've got <pause dur="0.5"/> # <trunc>th</trunc> three there <pause dur="0.5"/> five there four there and one there and <gap reason="inaudible" extent="1 sec"/> <pause dur="0.2"/> so we've got a total <pause dur="1.8"/> okay so the P-value then what is the P-value then <pause dur="0.7"/> this

is the probability <pause dur="0.7"/> that by chance <pause dur="2.8"/><kinesic desc="writes on transparency" iterated="y" dur="10"/> this is the probability that by chance <pause dur="1.1"/> F-plus <pause dur="0.7"/> minus F-minus <pause dur="0.8"/> is going to be <pause dur="0.6"/> greater than or equal to what we have observed and what we have observed <pause dur="0.5"/> is two isn't it there's only a difference only two between these two <pause dur="1.0"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> these two things <pause dur="1.5"/> so this probability <pause dur="0.2"/> <gap reason="inaudible" extent="1 sec"/> going to be pretty big now <pause dur="0.3"/> if you just think about it <pause dur="2.8"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> so so <pause dur="0.5"/> how how are we going to get this probability well easy way to think about it <pause dur="0.2"/> is that we've got # fourteen positives and eleven negatives so so think of tossing a coin twenty-four times <pause dur="0.9"/> and we get thirteen heads <pause dur="0.2"/> and eleven tails that's essentially the same <pause dur="0.7"/> problem isn't it <pause dur="0.3"/> recast in terms of coin tossing <pause dur="0.5"/> so so <kinesic desc="indicates point on transparency" iterated="n"/> this is the probability of the number of heads minus the number of tails in twenty-four tosses <pause dur="0.6"/> is # <pause dur="0.3"/> greater than or equal <pause dur="0.2"/> to two <pause dur="0.5"/> well two is a pretty small number so the easiest thing is to work out the

opposite of that <pause dur="1.4"/><kinesic desc="writes on transparency" iterated="y" dur="2"/> so that's one minus the chance <pause dur="0.4"/> that F-plus minus F-minus is strictly less than two <pause dur="0.7"/> now F-plus minus F-minus <pause dur="0.7"/> is always going to be an even number isn't it <pause dur="1.7"/> if you think about it 'cause if you put the number of heads up by one the number of tails goes down by one <pause dur="0.4"/> so this is so so the only exception the only <pause dur="0.5"/> possible sample result which doesn't satisfy this inequality here <pause dur="0.4"/> is if the two <gap reason="inaudible" extent="1 sec"/> are exactly equal <pause dur="1.1"/><kinesic desc="writes on transparency" iterated="y" dur="6"/> so it's one minus the probability <pause dur="0.4"/> that F-plus <pause dur="0.6"/> equals F-minus and therefore equals twelve <pause dur="3.1"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> and we can <pause dur="0.5"/> we know what that is don't we <pause dur="0.2"/> what's the so so that's the chance that if you toss a coin twenty-four times <pause dur="0.2"/> you will get exactly twelve heads <pause dur="0.8"/> and twelve tails in exact balance and we know what that is don't we that's <kinesic desc="writes on transparency" iterated="y" dur="10"/> twenty-four <pause dur="1.4"/> C twelve <pause dur="1.7"/> times one over two <pause dur="1.4"/> to the power of twenty-four <pause dur="1.7"/>

okay so it it follows down <pause dur="0.4"/> to a binomial <pause dur="0.3"/> # probability as it must do 'cause we're <shift feature="voice" new="laugh"/>talking about binomial distributions <shift feature="voice" new="normal"/><pause dur="0.4"/> basically and if you work out that probability <pause dur="1.1"/> this is about point-one-six <pause dur="1.0"/> this probability if you work it out on your calculator so one minus that <pause dur="0.3"/><kinesic desc="writes on transparency" iterated="y" dur="2"/> is nought-point-eight-<pause dur="0.2"/>three-<pause dur="0.6"/>nine <pause dur="3.7"/> so that's the P-value <pause dur="0.7"/> so that's the P-value calculated <pause dur="0.4"/> by this other way <pause dur="0.8"/> of # <pause dur="0.8"/> looking at it so the main point to note <pause dur="1.0"/> is we've we've got exactly the same data <pause dur="1.0"/> both tests are doing something which is very sensible <pause dur="0.6"/> first test looking at the mean of the distribution <pause dur="0.5"/> in the classical way <pause dur="0.3"/> the second test is looking at how many positives and negatives there are both are <pause dur="0.4"/> perfectly plausible <pause dur="0.3"/> and yet they give different answers <pause dur="0.7"/> and they give different answers <pause dur="0.3"/> because they're using different test statistics <pause dur="0.4"/><kinesic desc="indicates point on transparency" iterated="n"/> this one's <trunc>look</trunc> looking at the T thing <pause dur="0.2"/> with the baggage of standard deviation <pause dur="0.2"/><kinesic desc="indicates point on transparency" iterated="n"/> this test statistic is just

looking at the difference between <pause dur="0.2"/> two <pause dur="0.2"/> integers <pause dur="1.2"/> so # <pause dur="0.7"/> so the so the moral of this <pause dur="0.4"/> example is <trunc>i</trunc> is is is the point <pause dur="0.3"/> that significance levels are not everything <pause dur="0.9"/> significance levels <pause dur="1.1"/> isn't really the full story we want we want to somehow get the idea here <pause dur="0.2"/> that the second test <pause dur="0.2"/> is actually worse than the first test we want to <pause dur="0.3"/> somehow take into get get in <pause dur="0.6"/> get get to grips with the fact that the first test <pause dur="0.2"/> looks at the data in much more detail <pause dur="0.2"/> than the second test we're making <pause dur="0.2"/> lots of assumptions in the first test like normality and things like that <pause dur="0.5"/> so we would expect to get a premium <pause dur="0.3"/> for doing the first method <pause dur="0.3"/> rather than the second method so in <trunc>se</trunc> there must be a sense <pause dur="0.2"/> in which the P-value for the first way method <pause dur="0.2"/> is better <pause dur="0.2"/> than the P-value <pause dur="0.2"/> for the second method <pause dur="0.6"/> and the only way we can get to grips with that <pause dur="0.9"/> is # by looking at the error rate the other way round <pause dur="0.9"/> we're not just interested <pause dur="0.3"/> in how often we reject the null hypothesis when it's true <pause dur="0.2"/> we've also got

to worry about <pause dur="0.2"/> the sensitivity of the test we've got to worry about the probability <pause dur="0.3"/> that if <trunc>ne</trunc> if H-nought actually isn't true <pause dur="0.8"/> we've got to worry about the the probability that we'd actually detect that <pause dur="0.8"/> and we like a test <pause dur="0.4"/> which has a a a good sensitivity which <trunc>i</trunc> # going to be more likely to detect <pause dur="0.8"/> the # lack of truth of H-nought rather than just simply detect <pause dur="0.2"/> its truth <pause dur="0.7"/> and that's how the rest of the theory of <pause dur="0.2"/> significance tests goes which is what i'm going to talk about <pause dur="0.4"/> # <pause dur="0.5"/> just <trunc>be</trunc> beginning now and then # that's the main topic for next week really and what i'm going to be talking about from now on <pause dur="0.2"/> is much later work actually historically <pause dur="0.4"/> than P-value P-values goes back about a hundred years <pause dur="0.7"/> the idea of looking at these types of error <pause dur="0.2"/> goes back to the late nineteen-thirties so it's a a more recent <pause dur="0.7"/> # <pause dur="0.7"/> innovation <pause dur="0.9"/> so # finally today then <pause dur="0.2"/> let me define <pause dur="2.7"/> let me define # <pause dur="0.2"/> two more things i said i was going to have two more concepts to define <pause dur="0.4"/> here they are <pause dur="1.1"/> so the first

thing to define <pause dur="0.7"/> is # <pause dur="0.6"/> <trunc>i</trunc> is <trunc>i</trunc> is what's called alternative hypothesis <pause dur="1.2"/> and # <pause dur="0.4"/> you see why now i have H-nought 'cause i'm now going to have H-one <pause dur="2.2"/><kinesic desc="writes on transparency" iterated="y" dur="16"/> H-one <pause dur="2.4"/> is the <pause dur="11.2"/> H-one <pause dur="0.2"/> stands for the alternative <pause dur="0.3"/> hypothesis and H-one <pause dur="0.5"/> H-one is simply the complement of H-nought <pause dur="7.1"/><kinesic desc="writes on transparency" iterated="y" dur="6"/> in in the sets that <trunc>i</trunc> in the set <gap reason="inaudible" extent="1 sec"/><pause dur="0.2"/> sets so <pause dur="0.2"/> if H-nought is true <pause dur="1.1"/> H-one is its opposite so H-one must be false <pause dur="1.0"/> and similarly is H-nought is false <pause dur="0.6"/> then the opposite of it is true <pause dur="0.9"/> so H H-one is just the opposite of H-nought <pause dur="1.0"/> so in set theory sense it's the complement <pause dur="3.8"/> so that's the first definition the alternative hypothesis so we've got two hypotheses going on now <pause dur="0.8"/> either H-nought or H-one and we're trying to decide <pause dur="0.8"/> which is which <pause dur="0.4"/> and the # <pause dur="1.2"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> the second quantity is the error rate the other way around so everybody uses a simple beta now instead of alpha <pause dur="1.1"/> so so this <kinesic desc="writes on transparency" iterated="y" dur="8"/>

is now the probability <pause dur="1.6"/> that we're going to <pause dur="4.4"/> accept the null hypothesis in other words believe the null hypothesis is true <pause dur="0.5"/> but now calculate it <kinesic desc="writes on transparency" iterated="y" dur="1"/> on the <pause dur="0.9"/> assumption <pause dur="0.2"/> that H-one is the <pause dur="0.2"/> the the truth <pause dur="1.1"/> so this is the error rate the other way round this is the chance that <pause dur="0.4"/> H H-one actually is true in other words H-one <trunc>r</trunc> H-nought is really false <pause dur="0.4"/> but we conclude that it's true so we made a mistake of course <pause dur="0.5"/> so this <pause dur="0.7"/><kinesic desc="writes on transparency" iterated="y" dur="1"/> i defined alpha to be the type one error rate a moment ago this is now the type two <pause dur="1.0"/><kinesic desc="writes on transparency" iterated="y" dur="9"/> error rate <pause dur="10.1"/> so it's the probability of getting things wrong <pause dur="0.6"/> # the other way round and the theory of significance tests <pause dur="0.3"/> now goes down the route <pause dur="0.4"/> of looking at values of alpha <pause dur="0.2"/> and beta <pause dur="0.6"/> and now not only do we want to control alpha which i've talked about so far <pause dur="0.2"/> we also want to choose significance tests <pause dur="0.2"/> I-E choose test statistics <pause dur="0.3"/> for which beta <pause dur="0.2"/> is also small <pause dur="1.0"/> and that's what i'm going to talk about <pause dur="0.6"/> # next Monday <pause dur="0.6"/> and Thursday

</u></body>

</text></TEI.2>
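
The proof sketched in the lecture turns on the fact that, under H-nought, the test statistic exceeds its one-minus-alpha quantile with probability exactly alpha. A minimal numerical check of that step, assuming Python with SciPy (the lecture itself uses S-Plus and printed tables, so this is an editorial sketch, not part of the lecture materials):

```python
from scipy import stats

alpha, df = 0.05, 24
q = stats.t.ppf(1 - alpha, df)   # q: the (1 - alpha) quantile of T under H-nought
print(stats.t.sf(q, df))         # P(T >= q) comes back as alpha, the step used in the proof
```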
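The theorem itself, that rejecting H-nought whenever the P-value is at most alpha gives a significance test with significance level alpha (a type one error rate of alpha), can also be illustrated by simulation. A sketch under the same Python/SciPy assumption, generating data with the null hypothesis true:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, reps = 0.05, 25, 20000
rejections = 0
for _ in range(reps):
    x = rng.normal(loc=0.0, scale=1.0, size=n)    # data drawn with H-nought true (mean zero)
    t = x.mean() / (x.std(ddof=1) / np.sqrt(n))   # T statistic for H-nought: mu = 0
    p = 2 * stats.t.sf(abs(t), df=n - 1)          # two-sided P-value on n-1 degrees of freedom
    if p <= alpha:                                # the critical region: P-value at most alpha
        rejections += 1
print(rejections / reps)                          # long-run rejection rate settles near alpha
```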
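The T-test worked through for Frets's head-length differences can be reproduced from the summary figures quoted in the lecture (sample mean 1.96, sample standard deviation 7.40, twenty-five observations). A sketch under the same Python/SciPy assumption; the printed P-value can differ slightly from the quoted 0.186 because the quoted summaries are rounded:

```python
from scipy import stats

xbar, s, n = 1.96, 7.40, 25               # summary statistics quoted in the lecture
t = xbar / (s / n ** 0.5)                 # observed T statistic under H-nought: mu = 0
p = 2 * stats.t.sf(abs(t), df=n - 1)      # two-tailed P-value on 24 degrees of freedom
print(round(t, 2), round(p, 3))           # t is about 1.32; the P-value is far above 0.05
```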
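The sign-test P-value follows the binomial argument given in the lecture: with thirteen positive and eleven negative differences (the single zero dropped), the observed value of |F-plus minus F-minus| is 2, and it fails to be at least 2 only when the twenty-four signs split exactly twelve and twelve. Again a sketch assuming Python with SciPy:

```python
from scipy.stats import binom

f_plus, f_minus = 13, 11          # signs of the 24 non-zero differences, as counted in the lecture
n = f_plus + f_minus
# Under H-nought the signs behave like 24 fair coin tosses; |F+ - F-| >= 2 fails
# only when the split is exactly 12 positives and 12 negatives.
p_value = 1 - binom.pmf(12, n, 0.5)
print(round(p_value, 3))          # about 0.839, matching the lecture
```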