<?xml version="1.0"?>

<!DOCTYPE TEI.2 SYSTEM "base.dtd">




<title>Survival Analysis</title></titleStmt>

<publicationStmt><distributor>BASE and Oxford Text Archive</distributor>


<availability><p>The British Academic Spoken English (BASE) corpus was developed at the

Universities of Warwick and Reading, under the directorship of Hilary Nesi

(Centre for English Language Teacher Education, Warwick) and Paul Thompson

(Department of Applied Linguistics, Reading), with funding from BALEAP,

EURALEX, the British Academy and the Arts and Humanities Research Board. The

original recordings are held at the Universities of Warwick and Reading, and

at the Oxford Text Archive and may be consulted by bona fide researchers

upon written application to any of the holding bodies.

The BASE corpus is freely available to researchers who agree to the

following conditions:</p>

<p>1. The recordings and transcriptions should not be modified in any

way</p>
<p>2. The recordings and transcriptions should be used for research purposes

only; they should not be reproduced in teaching materials</p>

<p>3. The recordings and transcriptions should not be reproduced in full for

a wider audience/readership, although researchers are free to quote short

passages of text (up to 200 running words from any given speech event)</p>

<p>4. The corpus developers should be informed of all presentations or

publications arising from analysis of the corpus</p><p>

Researchers should acknowledge their use of the corpus using the following

form of words:

The recordings and transcriptions used in this study come from the British

Academic Spoken English (BASE) corpus, which was developed at the

Universities of Warwick and Reading under the directorship of Hilary Nesi

(Warwick) and Paul Thompson (Reading). Corpus development was assisted by

funding from the Universities of Warwick and Reading, BALEAP, EURALEX, the

British Academy and the Arts and Humanities Research Board. </p></availability>




<recording dur="00:40:58" n="4433">


<respStmt><name>BASE team</name>



<langUsage><language id="en">English</language>



<person id="nf0951" role="main speaker" n="n" sex="f"><p>nf0951, main speaker, non-student, female</p></person>

<person id="sf0952" role="participant" n="s" sex="f"><p>sf0952, participant, student, female</p></person>

<person id="sf0953" role="participant" n="s" sex="f"><p>sf0953, participant, student, female</p></person>

<person id="sf0954" role="participant" n="s" sex="f"><p>sf0954, participant, student, female</p></person>

<personGrp id="ss" role="audience" size="m"><p>ss, audience, medium group </p></personGrp>

<personGrp id="sl" role="all" size="m"><p>sl, all, medium group</p></personGrp>

<personGrp role="speakers" size="6"><p>number of speakers: 6</p></personGrp>





<item n="speechevent">Lecture</item>

<item n="acaddept">Statistics</item>

<item n="acaddiv">ps</item>

<item n="partlevel">UG3/PG</item>

<item n="module">Medical Statistics</item>
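<!-- Editorial sketch, not part of the recording. The Newton-Raphson reminder that opens this lecture solves the first-order expansion 0 ≈ f(x0) + (x* - x0) f'(x0) for x*, which gives the update x1 = x0 - f(x0) / f'(x0). The speaker asks the class to check the sign, and the minus sign is the one the expansion produces. The function name newton_raphson, the tolerance, the iteration cap and the test function x^2 - 2 are illustrative assumptions, not taken from the lecture.

```python
def newton_raphson(f, f_prime, x0, tol=1e-10, max_iter=50):
    """Find x with f(x) = 0 by repeatedly following the tangent to the axis."""
    x = x0
    for _ in range(max_iter):
        slope = f_prime(x)
        if slope == 0:
            raise ZeroDivisionError("tangent is horizontal; choose another start")
        # Tangent at x crosses zero at x - f(x) / f'(x): note the minus sign.
        x_next = x - f(x) / slope
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("did not converge within max_iter steps")


# Illustrative use: the positive root of x^2 - 2, starting from the guess 1.
root = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```

As the speaker suggests, it is usually easier to keep the update in this simple form and substitute the particular function and its derivative afterwards. -->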




<u who="nf0951"> to start with <pause dur="0.2"/> # <pause dur="1.0"/> i gather that Newton-Raphson having happened at A-level was rather a long time ago <pause dur="0.6"/> and you had a bit of fun with it on Friday <pause dur="1.9"/> so i thought i'd just give you a <pause dur="0.2"/> quick reminder <pause dur="1.2"/> <kinesic desc="writes on board" iterated="y" dur="4"/> the idea is we've got some sort of function <pause dur="0.2"/> that crosses zero <pause dur="1.3"/> what we want to know is the point <pause dur="1.4"/> X-star <pause dur="0.4"/> such that <pause dur="1.2"/> F-of-X-star equals zero <pause dur="0.4"/> you shouldn't need to write this down by the way <pause dur="1.0"/> # <pause dur="1.3"/> and there are all sorts of ways we could find that what Newton-Raphson depends on is saying <pause dur="0.7"/> well <pause dur="0.4"/> <kinesic desc="writes on board" iterated="y" dur="1"/> we'll take a guess <pause dur="0.3"/> at where we're starting so we'll call that guess <pause dur="1.2"/> <kinesic desc="writes on board" iterated="y" dur="1"/> X-zero <pause dur="1.0"/> and we'll evaluate <pause dur="1.9"/> <kinesic desc="writes on board" iterated="y" dur="2"/> F-of-X-zero <pause dur="1.9"/> and then we have to decide what to do <pause dur="0.8"/> having seen what size it is <pause dur="0.4"/> and Newton-Raphson is based on the principle that what we do is <kinesic desc="writes on board" iterated="y" dur="2"/> look at the tangent at that point <pause dur="1.7"/> and we follow the tangent down <pause dur="0.9"/>

<kinesic desc="writes on board" iterated="y" dur="1"/> and that will bring us closer to <kinesic desc="indicates point on board" iterated="n"/> this root <pause dur="0.9"/> so we follow the tangent down to <kinesic desc="writes on board" iterated="y" dur="1"/> X-one <pause dur="2.8"/> well <pause dur="0.3"/> <vocal desc="clears throat" iterated="n"/><pause dur="1.1"/> one way you can think about <pause dur="2.0"/> <kinesic desc="writes on board" iterated="y" dur="5"/> that obviously the tangent <pause dur="0.7"/> is <pause dur="0.9"/> F-of-X-nought which is the gradient <pause dur="0.5"/> and what is the gradient well the gradient is <pause dur="1.2"/><kinesic desc="writes on board" iterated="y" dur="2"/> F-of-X-nought <pause dur="1.2"/> minus zero <pause dur="1.8"/> <kinesic desc="writes on board" iterated="y" dur="5"/> divided by <pause dur="2.0"/> if we have mm <pause dur="0.3"/> i don't know if we really need an origin let's put an origin somewhere <pause dur="0.4"/> but it's divided by <pause dur="0.9"/> <kinesic desc="writes on board" iterated="y" dur="1"/> X-nought minus X-one <pause dur="1.6"/> so one way of thinking of Newton-Raphson is precisely this that we're taking a triangle <pause dur="0.7"/><kinesic desc="writes on board" iterated="y" dur="4"/> we then do the same thing we come here <pause dur="1.2"/> follow the gradient up and we're <pause dur="0.2"/> very near a <pause dur="0.2"/>

nearly at the spot <pause dur="2.3"/> so conceptually that's what Newton-Raphson's doing if you land up forgetting it <pause dur="0.8"/><kinesic desc="indicates point on board" iterated="n"/> that's one way of remembering it if <pause dur="0.4"/> if you like the geometrical way of remembering it <pause dur="0.6"/> an alternative is the thing we were <trunc>lo</trunc> talking about a lot in the asymptotics <pause dur="0.3"/> which was series expansions <pause dur="0.8"/> so <pause dur="0.3"/><kinesic desc="writes on board" iterated="y" dur="1"/> what we're interested in <pause dur="1.0"/> is <kinesic desc="indicates point on board" iterated="n"/> this point where F-of-X equals zero <pause dur="0.6"/> well <kinesic desc="writes on board" iterated="y" dur="17"/> let's do an expansion that's approximately equal to <pause dur="1.1"/> F-of-<pause dur="0.4"/>X-nought <pause dur="0.5"/> plus <pause dur="2.2"/> X-star <pause dur="0.2"/> minus <pause dur="2.2"/> # which way around is it X-nought minus <pause dur="0.3"/> X-star <pause dur="2.7"/><kinesic desc="writes on board" iterated="y" dur="4"/> and then the first derivative <pause dur="7.9"/> and <pause dur="2.1"/> in either case what we do is we basically just solve <pause dur="0.8"/> those equations so if we look at this equation <pause dur="1.1"/> what we're saying is that <pause dur="1.4"/> <kinesic desc="writes on board" iterated="y" dur="5"/> X-star <pause dur="0.2"/> minus <pause dur="1.0"/> X-nought <pause dur="2.3"/> so i've changed the sign <pause dur="1.0"/> equals <pause dur="1.5"/><kinesic desc="writes on board" iterated="y" dur="8"/> F-of-<pause dur="0.4"/>X-nought over <pause dur="2.0"/> F-dashed-of-X-nought <pause dur="1.8"/> and then <pause dur="6.3"/> <vocal desc="clears throat" iterated="n"/> <pause dur="0.8"/> we're going to <pause dur="2.7"/> actually i have a feeling <pause dur="0.3"/> you might have to

check me on the signs on this one <pause dur="1.4"/> <kinesic desc="writes on board" iterated="y" dur="3"/> off <pause dur="0.2"/> the top of my head <pause dur="2.6"/> we've got <pause dur="1.6"/> a solution so typically we take <kinesic desc="indicates point on board" iterated="n"/> this as our <pause dur="1.0"/> X-one and then we'd iterate <pause dur="1.7"/> okay that that's the basic principle you can <pause dur="0.5"/> check whether i've memorized which way round <kinesic desc="indicates point on board" iterated="n"/> that goes <pause dur="1.0"/> # <pause dur="2.1"/> by looking at <pause dur="0.3"/><kinesic desc="indicates point on board" iterated="n"/> solving for that one <pause dur="1.1"/> X-one <pause dur="1.9"/> yep <pause dur="1.8"/> the other point is that it's actually much easier to remember Newton-Raphson <kinesic desc="indicates board" iterated="n"/> in a simple form like this write it down <pause dur="0.4"/> and then fill in what the function is in that second exercise <pause dur="1.1"/> rather than trying to write it in too full of generality <pause dur="2.2"/> and <pause dur="0.4"/> in terms of tutorials the next tutorial's going to be <pause dur="0.2"/> a week on Wednesday <pause dur="1.0"/> so you'll get another chance to <pause dur="0.2"/> both look at those exercises <pause dur="0.2"/> or ask about the <pause dur="0.5"/> projects <pause dur="3.0"/> okay so that was my way of being an aside because what <pause dur="0.8"/> we're starting to talk about now until <pause dur="0.4"/> pretty well the end of term <unclear>module</unclear> the tutorials <pause dur="0.7"/> and <pause dur="1.0"/> some lectures on <pause dur="0.4"/> ethics <pause dur="0.5"/> is <pause dur="0.4"/> survival

analysis <pause dur="2.3"/> so i'm just going to take you back to the first lecture <pause dur="1.3"/> where <pause dur="6.6"/><kinesic desc="puts on transparency" iterated="n"/> we had a whole lot of lifetimes of people <pause dur="2.8"/> and the reason for the odd shape was <kinesic desc="indicates point on transparency" iterated="n"/> these were all people who were dead <pause dur="0.5"/> and <kinesic desc="indicates point on transparency" iterated="n"/> these were a mixture of people who are alive <pause dur="0.2"/> and dead <pause dur="0.8"/><kinesic desc="changes transparency" iterated="y" dur="10"/> and i showed you <pause dur="1.5"/> what a survival plot looked like <pause dur="4.6"/> and <pause dur="3.6"/> that's a fairly standard survival plot <pause dur="0.2"/> where we start with <pause dur="0.8"/> everyone being alive <pause dur="1.2"/> and then <pause dur="0.5"/> <vocal desc="clears throat" iterated="n"/> <pause dur="1.3"/> we drop the estimates in a step function <pause dur="1.7"/> so that if we look at <kinesic desc="indicates point on transparency" iterated="n"/> this group <pause dur="0.4"/> those who are wheelchair-bound <pause dur="0.9"/> we see that <pause dur="1.3"/> seventy per cent of them survive <pause dur="0.7"/> to age ten <pause dur="2.3"/> for this cohort <pause dur="0.3"/> i know the group who's studying cerebral palsy <pause dur="0.2"/> # <pause dur="0.4"/> somebody's got a friend <pause dur="0.6"/> who wasn't expected to live past primary <pause dur="0.8"/> school age but you can see from this that even quite severely handicapped people have got a good chance of living beyond primary school age <pause dur="5.6"/> <event desc="takes off transparency" iterated="n"/> right <pause dur="0.5"/> so <pause dur="0.5"/> # <pause dur="0.3"/> right <pause dur="1.7"/><event desc="puts away projector" iterated="y" dur="5"/> i'm actually going to <pause dur="3.3"/> put this down now <pause dur="0.6"/> you might

like to think about what the crucial elements of survival analysis are that make it a different topic i'm just going to move the screen <pause dur="1.5"/><event desc="puts away screen" iterated="n"/> which <pause dur="0.6"/> justifies <pause dur="2.3"/> having a separate section about it <pause dur="16.9"/> okay so <pause dur="1.0"/> cerebral palsy's rather a large dataset it's difficult to draw some examples so what i'm going to do is give another couple of examples <pause dur="0.6"/> of the data and then i'm going to go into <pause dur="0.7"/> a discussion of the crucial definitions and the actual definitions themselves <pause dur="0.8"/> <kinesic desc="writes on board" iterated="y" dur="9"/> so the topic's called survival analysis <pause dur="8.8"/> and <pause dur="0.6"/> most people tend to think of lifetimes are <unclear>for <trunc>d</trunc></unclear> of human lifetimes <pause dur="1.4"/> with a <pause dur="0.4"/> with the word survival you do tend to think of death <pause dur="0.7"/> it's also called failure time analysis because it's used in economics <pause dur="0.2"/> where <pause dur="0.6"/> you stress test components see how long they last <pause dur="0.4"/> it's used in economics how long are people unemployed or <pause dur="0.2"/> how long are <pause dur="0.6"/> do companies survive and is it different for <pause dur="0.3"/> greenfield versus brownfield sites <pause dur="1.4"/> it

can even be used for lengths <pause dur="1.8"/> how long a piece of wool can you get <pause dur="0.2"/> before it breaks <pause dur="0.5"/> <vocal desc="clears throat" iterated="n"/><pause dur="1.0"/> if you're spinning it with one or two strands that's going to be different from spinning with multiple strands <pause dur="0.9"/> # <pause dur="1.0"/> and that's actually <pause dur="0.4"/> Cox and <pause dur="0.2"/> Oakes which is one of the books that's mentioned <pause dur="0.5"/> David Cox actually worked in the <pause dur="0.6"/> Wool Research Institute i think it was called <pause dur="0.5"/> precisely on <pause dur="0.3"/> lengths of yarn so that that is actually <pause dur="0.6"/> quite a major application <pause dur="1.2"/> but as i say typically we're going to be thinking in medical examples <pause dur="0.5"/> of people doing something like entering a screening programme entering a trial <pause dur="1.1"/> being born <pause dur="0.4"/> and then we follow them up till an event <pause dur="1.2"/> and <pause dur="0.9"/> <kinesic desc="writes on board" iterated="y" dur="14"/> the reality of what will actually happen <pause dur="0.3"/> is that <pause dur="0.9"/> we've got <pause dur="0.3"/> calendar time so <trunc>im</trunc> <pause dur="0.2"/> got say <pause dur="0.7"/> nineteen-ninety here <pause dur="1.7"/> two-thousand here <pause dur="1.9"/> but not everybody turns up in nineteen-ninety at the beginning so we get people <kinesic desc="writes on board" iterated="y" dur="19"/> coming in <pause dur="0.7"/> maybe dying <pause dur="0.6"/> coming in living <pause dur="1.7"/> coming in dying at some

stage <pause dur="1.0"/> carrying on living <pause dur="1.5"/> # that person's just emigrated to Australia <pause dur="6.3"/> and <pause dur="2.6"/><kinesic desc="writes on board" iterated="y" dur="4"/> we've stopped the study <pause dur="1.4"/> in two-thousand so <pause dur="1.1"/> in calendar time that's the kind of pattern we've got <pause dur="0.9"/> but in terms of what we want to analyse <pause dur="4.6"/><kinesic desc="writes on board" iterated="y" dur="3"/> we'll much more typically just use <pause dur="0.3"/> time from zero up to <pause dur="0.9"/> we'll <kinesic desc="writes on board" iterated="y" dur="1"/> try to draw this reasonably to scale <pause dur="0.5"/> zero up to ten so <kinesic desc="indicates point on board" iterated="n"/> these first two points <pause dur="0.7"/> <kinesic desc="writes on board" iterated="y" dur="27"/> whoops <pause dur="0.7"/> <vocal desc="clears throat" iterated="n"/> <pause dur="2.3"/> are <pause dur="1.5"/> pretty much at the same point <pause dur="0.6"/> but then we start <pause dur="1.9"/> having to think <kinesic desc="indicates point on board" iterated="n"/> about these points coming <pause dur="5.0"/> back to being censored <kinesic desc="indicates point on board" iterated="n"/> that point's <pause dur="3.4"/> if you want to draw yourselves a more accurate picture <pause dur="0.9"/> <shift feature="voice" new="laugh"/># <shift feature="voice" new="normal"/><pause dur="5.3"/> you can so the idea is that <kinesic desc="writes on board" iterated="y" dur="14"/> we can either measure on a scale of calendar time <pause dur="3.3"/> or <pause dur="1.1"/> in some sense an exposed time <pause dur="3.5"/> so if we're thinking of something like

hormone replacement therapy <pause dur="0.3"/> and whether it carries a greater risk of heart attack <pause dur="0.4"/> or stroke <pause dur="0.8"/> we're interested in the length of time <pause dur="0.2"/> women are <pause dur="0.2"/> on hormone replacement therapy <pause dur="1.3"/> <vocal desc="clears throat" iterated="n"/> we may secondarily be interested in the date 'cause it'll change the prescriptions of the components but primarily we're just interested in the length of time <pause dur="2.2"/> another way you may actually get data <pause dur="0.5"/> # <pause dur="1.9"/> is that <pause dur="0.2"/> you don't actually get it in that form <pause dur="0.2"/> <kinesic desc="writes on board" iterated="y" dur="9"/> much more likely <pause dur="0.2"/> in this example from kidney <pause dur="0.5"/> register data <pause dur="4.8"/> it's quite a lot <pause dur="0.4"/> quite likely that instead of actually getting a graph like that somebody will have done the <pause dur="1.3"/> the summary <pause dur="3.1"/> so this is <pause dur="1.1"/> counter-registry type data <pause dur="0.5"/> and the kind of thing you would <kinesic desc="writes on board" iterated="y" dur="15"/> get is <pause dur="1.6"/> year since diagnosis <pause dur="0.9"/> zero to one <pause dur="0.2"/> one to two <pause dur="1.8"/> two to three <pause dur="0.7"/> three to four <pause dur="4.3"/> so

you'd get subdivisions by year <pause dur="1.4"/> and those of you who are going to do actuarial science will find you <pause dur="0.3"/> tend most typically to work <pause dur="0.6"/> in subdivisions of year as opposed to exact times <kinesic desc="indicates board" iterated="n"/> that i've used there <pause dur="1.6"/> <kinesic desc="writes on board" iterated="y" dur="30"/> then you're going to have <pause dur="0.3"/> the number <pause dur="1.3"/> at the start <pause dur="7.1"/> oh a hundred-and-twenty-six <pause dur="1.0"/> the number of deaths in that year <pause dur="3.9"/> was forty-seven <pause dur="2.0"/> and the number <pause dur="0.4"/> what are called <pause dur="0.8"/> lost <pause dur="0.4"/> to follow-up <pause dur="4.1"/> lost to follow-up's meant to be quite a general term <pause dur="0.2"/> because remember in the cerebral palsy case <pause dur="0.9"/> you're going to land up with people who are still alive <pause dur="1.10"/> so you you're losing them in <trunc>tha</trunc> in that sense <pause dur="2.2"/> <kinesic desc="writes on board" iterated="y" dur="47"/> okay so we had nineteen <pause dur="3.6"/> sixty <pause dur="0.2"/> five <pause dur="1.2"/> seventeen <pause dur="4.4"/> thirty-eight <pause dur="1.4"/> two <pause dur="0.4"/> fifteen <pause dur="1.8"/> # <pause dur="2.2"/> twenty-one <pause dur="0.5"/> two <pause dur="1.5"/> nine <pause dur="3.3"/> ten <pause dur="0.7"/> zero <pause dur="0.4"/> six <pause dur="1.7"/> four <pause dur="1.0"/> zero <pause dur="0.3"/> four <pause dur="0.8"/> probably <trunc>sh</trunc> <pause dur="0.2"/> gone up to age <pause dur="1.4"/> to year <pause dur="0.3"/> six <pause dur="12.6"/> and the kinds of questions you typically <pause dur="0.5"/> find people interested

in this <pause dur="0.2"/> are <pause dur="5.4"/> <kinesic desc="writes on board" iterated="y" dur="6"/> well one thing that's very widely used in cancer <pause dur="0.5"/> # is one and five year survival rates so <pause dur="2.4"/> <kinesic desc="writes on board" iterated="y" dur="29"/> what is <pause dur="0.3"/> the <pause dur="0.4"/> let's just say five year <pause dur="1.2"/> survival rate <pause dur="11.1"/> <kinesic desc="writes on board" iterated="y" dur="14"/> you might think about medians <pause dur="0.6"/> what is <pause dur="1.7"/> the median survival <pause dur="10.8"/> and what is the life expectancy <pause dur="12.5"/> <kinesic desc="writes on board" iterated="y" dur="17"/> and take that to be the formal <pause dur="1.5"/> mean <pause dur="5.3"/> okay just so you can focus on the kinds of problems <pause dur="0.7"/> have a quick look at this data and <pause dur="0.5"/> discuss with the person <pause dur="0.3"/> next to you <pause dur="0.6"/> perhaps that first question how are you going to estimate the five year survival rate <pause dur="1.5"/> and can you think of anything <pause dur="0.6"/> that looks obvious but that's going to be wrong <pause dur="2.5"/> and i'll give you about half a minute on that <pause dur="1.4"/> then i might ask somebody to answer the question </u><event desc="discussing task" iterated="y" n="ss" dur="unknown"/><gap reason="break in recording" extent="uncertain"/> <u who="nf0951" trans="pause">
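<!-- Editorial sketch, not part of the recording. The registry table read out above can be laid out as rows of (number at start, deaths, lost to follow-up) for years 0 to 1 through 5 to 6 since diagnosis. The speaker has not named an estimator at this point; the actuarial (life-table) adjustment below, which treats each person lost during a year as at risk for half of that year, is an assumption chosen for illustration. The names rows and actuarial_survival are hypothetical.

```python
# (number at start, deaths, lost to follow-up) per year since diagnosis
rows = [
    (126, 47, 19),
    (60, 5, 17),
    (38, 2, 15),
    (21, 2, 9),
    (10, 0, 6),
    (4, 0, 4),
]


def actuarial_survival(table, years):
    """Multiply yearly survival proportions over the first `years` intervals."""
    surviving = 1.0
    for n_start, deaths, lost in table[:years]:
        # Assumption: people lost in the year count as at risk for half of it.
        at_risk = n_start - lost / 2.0
        surviving *= 1.0 - deaths / at_risk
    return surviving


five_year = actuarial_survival(rows, 5)

# The tempting but misleading figure: survivors still in view at year 4,
# divided by everyone who started, ignoring those lost to follow-up.
naive = 10 / 126
```

Under these assumptions the adjusted five-year figure comes out far above the naive 10/126, because the naive ratio counts everyone lost to follow-up as if they had died. -->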

anybody willing to <trunc>o</trunc> <pause dur="0.2"/> volunteer an obviously wrong answer <pause dur="1.7"/> simple answer but that <pause dur="0.2"/> is likely to be wrong <pause dur="0.6"/> for the five year survival rate <pause dur="5.6"/> is ten out of one-twenty-six likely to be a good <trunc>i</trunc> estimate <pause dur="2.1"/> <kinesic desc="shake heads" iterated="n" n="ss"/> okay you agree it isn't a good estimate <pause dur="0.6"/> why <pause dur="0.3"/> broadly speaking </u><pause dur="3.0"/> <u who="sf0952" trans="pause"> <gap reason="inaudible" extent="1 sec"/> don't know why </u><pause dur="0.7"/> <u who="nf0951" trans="pause"> 'cause of all the people you've lost to follow-up <pause dur="1.1"/> so that's basically why <pause dur="0.7"/> all of <kinesic desc="indicates point on board" iterated="n"/> these questions although they're quite <pause dur="0.4"/> reasonable questions <pause dur="0.5"/> that's why they're going to need <pause dur="2.1"/> some sensible <pause dur="0.6"/> # <pause dur="0.9"/> methods that allows for all those people who get lost <pause dur="1.1"/> so <pause dur="0.2"/> in order to define <trunc>s</trunc> survival times <pause dur="1.3"/> there are three critical elements so <pause dur="6.2"/> <kinesic desc="writes on board" iterated="y" dur="22"/> for the definitions <pause dur="2.5"/> of <pause dur="0.6"/> i'm going to carry on calling them survival times <pause dur="0.8"/> every now and again

i'll make reference to these other things that aren't <pause dur="0.6"/> necessarily survival <pause dur="11.6"/> very first thing we need to know for a survival time <pause dur="1.3"/> is <pause dur="1.4"/> <kinesic desc="writes on board" iterated="y" dur="16"/> the start point <pause dur="0.7"/> start point <pause dur="3.9"/> for <pause dur="2.4"/> each individual <pause dur="13.3"/> and the first examples we can think of <pause dur="0.9"/> might make you wonder why we need to discuss this <pause dur="0.6"/> 'cause the kinds of examples you might like to think of are <pause dur="0.3"/> date of birth for cerebral palsy <pause dur="1.1"/> or date of entry to a randomized control trial <pause dur="2.2"/> fairly obvious that that's the date you should use randomized trials <pause dur="0.4"/> accrue people over time <pause dur="0.2"/> just as people come in to <pause dur="0.8"/> cancer registries over time or anything else <pause dur="2.0"/> where it starts <pause dur="0.2"/> <vocal desc="cough" iterated="n"/> to be slightly more complicated is if you think of epilepsy which i've mentioned before <pause dur="1.8"/> and <pause dur="1.3"/> by the time somebody who has epilepsy is randomized into a trial <pause dur="0.4"/> they've got to have shown symptoms of the disease <pause dur="0.9"/> so you might well think <pause dur="0.5"/> shouldn't we <pause dur="0.4"/> start from when they first showed symptoms of the disease <pause dur="0.7"/> rather

than just from <pause dur="0.5"/> entry to the trial <pause dur="1.6"/> the advantage of starting at entry to the trial is it's going to be unbiased because of the randomization mechanism <pause dur="0.7"/> you should have equal lengths of time <pause dur="0.3"/> before in both arms <pause dur="1.4"/> because recall of first events is going to be quite poor <pause dur="0.5"/> and so in fact with epilepsy what you do is you do start from <pause dur="0.3"/> date of randomization <pause dur="0.8"/> you also take into account <pause dur="0.4"/> when the first symptoms were and how bad things have been <pause dur="0.4"/> but as a covariate <pause dur="0.2"/> not as the start point <pause dur="1.1"/> # <pause dur="0.6"/> so if you want to put a <trunc>n</trunc> you know an aside <pause dur="0.6"/><kinesic desc="writes on board" iterated="y" dur="1"/> <trunc>s</trunc> some sort of remarks on that it's not compulsory but <pause dur="0.4"/> things like randomized control trials are quite easy <pause dur="1.4"/> something where it becomes much more critical to define the start time <pause dur="0.4"/> is something like screening for disease <pause dur="0.6"/> there's still quite a big debate about the value of screening <pause dur="0.2"/> for breast cancer <pause dur="1.6"/> # it's been in the media a couple of <trunc>t</trunc> in the last couple of years a fair bit because <pause dur="0.3"/> Scandinavians have said <pause dur="1.1"/> not only is this a waste of money

it actually kills more people than it benefits <pause dur="0.8"/> and the head of the U-K breast screening has said <pause dur="0.5"/> no no no we are wonderful <pause dur="1.4"/> well what's the problem <pause dur="0.4"/> the thing about screening for a disease is the whole point is you want to pick the disease up early <pause dur="0.5"/> before there are symptoms <pause dur="0.9"/> so you can intervene <pause dur="1.5"/> now what that means is if you think about it <pause dur="0.2"/> even if you did nothing <pause dur="0.4"/> the time from first saying somebody's got breast cancer to death <pause dur="1.0"/> is going to be longer in a screening programme than if you wait for symptoms <pause dur="0.5"/> if you want to think of it as a line again <pause dur="0.6"/> <kinesic desc="writes on board" iterated="y" dur="1"/> you think of somebody ambling along <pause dur="0.8"/> and <pause dur="0.6"/> <kinesic desc="writes on board" iterated="y" dur="1"/><kinesic desc="indicates point on board" iterated="n"/> at this point <pause dur="0.3"/> they have symptoms and they go along to see their <pause dur="0.3"/> G-P and <kinesic desc="writes on board" iterated="y" dur="1"/> at <kinesic desc="indicates point on board" iterated="n"/> this point they die <pause dur="1.0"/> and what a screening programme tries to do is to say <pause dur="0.7"/> <kinesic desc="writes on board" iterated="y" dur="2"/>

let's see if we can <pause dur="0.7"/> leap in here <pause dur="0.7"/><kinesic desc="indicates point on board" iterated="n"/> with some kind of tests and pick them up <pause dur="1.0"/> so if you measure from <pause dur="0.5"/> the time of screening <pause dur="2.3"/> <kinesic desc="writes on board" iterated="y" dur="1"/> it's always going to be longer than from symptoms <pause dur="1.7"/> so the mere increase in length of time doesn't tell you anything at all about the benefit of the screening programme <pause dur="1.0"/> fact the only real way you can tell about the benefits of screening programmes is <pause dur="1.7"/> <trunc>i</trunc> well ideally randomized control trials but otherwise <pause dur="0.4"/> you've got to have two populations one screened one not <pause dur="0.6"/> that's what all the debate is about <pause dur="0.2"/> what does the evidence from those kinds of trials show do they show a benefit to screening or not <pause dur="1.9"/> and the other occasion where <pause dur="0.5"/> you would get <pause dur="0.6"/> # <pause dur="0.7"/> slightly <pause dur="1.1"/> have to think carefully about your defined point would be exposure to disease so the <pause dur="0.3"/> case control cohort studies we've talked about <pause dur="1.1"/> if you're thinking of something like asbestosis or <pause dur="0.2"/> even <trunc>s</trunc> <pause dur="0.2"/> exposure to cigarette smoking <pause dur="1.8"/> you want to <pause dur="0.2"/> do it from the

start of the exposure <pause dur="0.5"/> it may well be confounded with age <pause dur="0.3"/> <trunc>i</trunc> you may want to know whether <pause dur="0.4"/> starting to smoke at age ten has a different effect from starting to smoke at age twenty <pause dur="1.5"/> but you need to think about the exposure so if you like briefly <kinesic desc="writes on board" iterated="y" dur="20"/> the the kinds of issues here would be <pause dur="0.6"/> comparing a randomized control trial <pause dur="0.2"/> versus screening <pause dur="1.7"/> and <pause dur="0.6"/> exposures <pause dur="1.0"/> so in <trunc>ec</trunc> exposure <pause dur="0.2"/> to risk factors <pause dur="5.5"/> and it's those latter two that <pause dur="0.3"/> that <pause dur="0.4"/> warn you why this is <pause dur="0.9"/> such an important point <pause dur="1.6"/><kinesic desc="writes on board" iterated="y" dur="1"/> and then the <trunc>thir</trunc> second thing to think about is the <pause dur="0.7"/> <kinesic desc="writes on board" iterated="y" dur="4"/> time scale <pause dur="5.4"/> <kinesic desc="writes on board" iterated="y" dur="10"/> or <pause dur="1.6"/> we might change that to

saying the measurement scale <pause dur="9.5"/> and again that's <pause dur="1.0"/> essentially because if we're thinking about the generality of survival analysis <pause dur="1.4"/> typically when we're thinking in medical terms we are just thinking of <pause dur="0.2"/> days months years that kind of thing <pause dur="0.8"/> we could be thinking in engineering <pause dur="1.2"/> about the load on a spring <pause dur="1.2"/> # that's what you do in stress testing <pause dur="0.4"/> load on a spring load on a bridge load on an aircraft wing to see when the rivets pop out <pause dur="0.9"/> # <pause dur="1.1"/> that sort of thing what kind of impact <pause dur="0.9"/> # concrete can sustain <pause dur="0.8"/> if you're dropping loads on it <pause dur="1.4"/> and as i said things like yarn you might have thickness you might have length before things break down <pause dur="2.0"/> # <pause dur="0.4"/> and so you just <pause dur="0.5"/> need to agree on that <pause dur="0.4"/><kinesic desc="indicates point on board" iterated="n"/> this is also incidentally one point where one of the statistical <pause dur="0.8"/> # groups of models come in <pause dur="0.5"/> is whether you're going to transform the time scale <pause dur="0.7"/> so should you be modelling on actual time scale or on log of time scale <pause dur="4.2"/> so we've got a beginning we've got a time scale <pause dur="1.3"/> clearly the thing we need

is an end <pause dur="0.5"/> so we need <pause dur="1.8"/> <kinesic desc="writes on board" iterated="y" dur="13"/> a well defined <pause dur="4.2"/> unique <pause dur="2.2"/> event <pause dur="5.1"/> death being the most common one that we'll be dealing with <pause dur="3.8"/> but in fact one of my <pause dur="0.2"/> colleagues when i was doing a PhD <pause dur="0.6"/> the # <pause dur="0.6"/> <vocal desc="clears throat" iterated="n"/> <pause dur="0.3"/> failure point they were looking at was the birth of a baby they were measuring length of labour <pause dur="0.5"/> so rather ironically # <pause dur="1.0"/> <vocal desc="clears throat" iterated="n"/> <pause dur="0.3"/> the <pause dur="0.2"/><kinesic desc="makes quotation mark gesture" iterated="n"/> failures at that stage was a successful <pause dur="1.3"/> live birth <pause dur="2.3"/> # <pause dur="6.7"/> where does this get complicated <pause dur="1.0"/> just as i point out a <trunc>s</trunc> a few issues <kinesic desc="indicates point on board" iterated="n"/> here where you might need to think carefully <pause dur="0.6"/><kinesic desc="writes on board" iterated="y" dur="2"/> well as i say if it's death it's not <pause dur="1.1"/> too tricky <pause dur="0.4"/> but quite often we're going to be looking at things like a recurrence of cancer <pause dur="0.5"/> or you could look at that <pause dur="0.9"/> and <pause dur="2.2"/> there you could have multiple events so you'd want to say first recurrence <pause dur="0.8"/> if you're going over to something like epilepsy or asthma where you have repeated <pause dur="0.2"/> # attacks <pause dur="0.6"/> quite often you won't

be using survival analysis you'll be using <pause dur="0.4"/> methods for modelling <pause dur="0.6"/> stochastic processes <pause dur="0.2"/> which some of you will have studied <pause dur="2.0"/> <vocal desc="cough" iterated="n"/> <pause dur="0.8"/> and you may or may not want death from a particular cause you may only want deaths from lung cancers <pause dur="0.6"/> so any deaths from <pause dur="1.1"/> heart attacks <pause dur="0.2"/> might not be of interest <pause dur="4.3"/> well that's fine <pause dur="1.9"/> the most the one that's going to <pause dur="0.2"/> mean that we've got complications in life is that <pause dur="1.4"/> defined end point <pause dur="0.5"/> <kinesic desc="writes on board" iterated="y" dur="10"/> death <pause dur="0.2"/> or <pause dur="1.1"/> a recurrent <pause dur="2.2"/> a recurrence of the tumour or <pause dur="0.6"/> as i said in the case of labour statistics birth can be your end point <pause dur="2.0"/> all studies of premature children and and delaying <pause dur="1.1"/> # the birth of the child birth will be an end point <pause dur="0.7"/> # study that <pause dur="0.2"/> biological sciences was hoping to do <pause dur="0.6"/> but of course the

whole point is lost to follow-up <pause dur="0.5"/> and what do we do about loss to follow-up <pause dur="2.0"/> well that brings us into the major <pause dur="0.5"/> definition that we <pause dur="0.7"/> have in <pause dur="2.2"/> survival analysis of censoring <pause dur="2.1"/> and <pause dur="4.5"/> so <pause dur="0.9"/> i'll <trunc>ca</trunc> <pause dur="0.5"/> <kinesic desc="writes on board" iterated="y" dur="1"/> call this four <pause dur="0.2"/> it's the one first one two three that are the essential things <pause dur="1.3"/> to have survival analysis four is required to make sense of <pause dur="0.2"/> some of the rest of this <kinesic desc="writes on board" iterated="y" dur="3"/> which is <pause dur="1.5"/> censoring <pause dur="2.2"/> okay most of you might have thought of censoring in terms of <pause dur="0.9"/> governments telling you what films you can't can or can't watch or extracting <pause dur="1.5"/> parts of newspapers some of some of most of you won't have but some of us have been in countries where the newspapers appear with blank sections <pause dur="0.7"/> 'cause it's been written out <pause dur="0.9"/> # <pause dur="0.7"/> and that's <pause dur="0.4"/> the same <pause dur="0.3"/> <vocal desc="cough" iterated="n"/>

same meaning <pause dur="0.4"/> the reason the word's choosed in this <pause dur="0.3"/> chosen in this context <pause dur="0.5"/> censoring is just saying we have no more information <pause dur="2.3"/> so censoring <pause dur="0.8"/> <kinesic desc="writes on board" iterated="y" dur="3"/> of <pause dur="0.5"/> times <pause dur="2.0"/> and the <trunc>w</trunc> mechanism in which this <pause dur="0.2"/> is viewed <pause dur="0.4"/> is to say that <pause dur="2.2"/> <kinesic desc="writes on board" iterated="y" dur="13"/> we have <pause dur="0.8"/> for each individual <pause dur="7.9"/> # where am i going <pause dur="0.4"/> oops up here i think <pause dur="1.5"/> <kinesic desc="writes on board" iterated="y" dur="1"/> for each individual <pause dur="2.0"/><kinesic desc="writes on board" iterated="y" dur="12"/> a time <pause dur="1.7"/> C-I <pause dur="2.9"/> <trunc>m</trunc> <pause dur="0.2"/> beyond which we don't observe them <pause dur="14.9"/> do not <pause dur="4.0"/><kinesic desc="writes on board" iterated="y" dur="6"/> observe them <pause dur="2.2"/> okay <pause dur="0.6"/> so this means in fact that # <pause dur="1.4"/> time <pause dur="2.8"/> that we're actually going to observe is made up of <pause dur="0.6"/> two parts so <pause dur="0.6"/> if we <kinesic desc="writes on board" iterated="y" dur="34"/> let <pause dur="2.9"/> the <pause dur="0.3"/>

or an individual's <pause dur="6.0"/> actual lifetime <pause dur="0.5"/> what would we we would see if we were able to follow them up <pause dur="0.3"/> indefinitely <pause dur="4.1"/> be <pause dur="1.7"/> X-I <pause dur="5.0"/><kinesic desc="writes on board" iterated="y" dur="18"/> then <pause dur="2.0"/> we <pause dur="4.8"/> observe <pause dur="0.7"/> the survival time <pause dur="8.9"/> so we're going to observe the survival time <pause dur="2.3"/> <kinesic desc="writes on board" iterated="y" dur="1"/> which we're going to call T-I <pause dur="2.7"/> and T-I is a function of <kinesic desc="writes on board" iterated="y" dur="4"/> two things <pause dur="0.8"/> X-I <pause dur="1.7"/> and C-I <pause dur="3.0"/> so can you write down what that function must be <pause dur="2.9"/> the observed survival time is what function of the <pause dur="0.5"/> actual survival time and censoring <pause dur="11.3"/> <event desc="drinks" iterated="n"/> simple function <pause dur="11.9"/> if you think of that top left board <pause dur="2.7"/> where we've got crosses and then we've got the lines that go into circles or keep going on right <pause dur="1.3"/> and if we were to censor at two-thousand <pause dur="3.2"/> what do we do <pause dur="0.5"/> with any line that goes through that two-thousand mark <pause dur="4.5"/> we take the first line <pause dur="1.5"/> are we going to <trunc>s</trunc> observe the censoring

time or the death time <pause dur="6.6"/> right we're always going to observe the <pause dur="1.2"/> # <pause dur="0.7"/> i'm going to regret this aren't i <pause dur="4.6"/><kinesic desc="writes on board" iterated="y" dur="5"/> this person had a notional <pause dur="1.3"/> censoring time <pause dur="3.5"/> we've got notional censoring times for these people <pause dur="0.2"/> and we'll <trunc>al</trunc> always observe the minimum <pause dur="1.3"/> of <pause dur="1.7"/> the death time <pause dur="1.1"/> and the censoring time because that individual <pause dur="0.4"/> we'd stopped watching at two-thousand so we wouldn't have seen them <pause dur="0.4"/> so the <pause dur="4.3"/> the function we want here <pause dur="0.3"/> is <pause dur="1.6"/> <kinesic desc="writes on board" iterated="y" dur="2"/> min <pause dur="5.3"/> ah <pause dur="5.4"/> but we don't only observe the minimum <pause dur="0.4"/> 'cause that wouldn't be much use to us <pause dur="0.9"/><kinesic desc="writes on board" iterated="y" dur="14"/> we also need <pause dur="0.2"/> and <pause dur="1.4"/> an indicator function <pause dur="10.4"/> and this'll sometimes be <pause dur="0.5"/> given as a death and sometimes be given as censoring <pause dur="0.6"/> we'll call it <pause dur="1.1"/><kinesic desc="writes on board" iterated="y" dur="12"/> delta-I <pause dur="1.1"/> which is going to equal one <pause dur="1.2"/> if <pause dur="2.8"/> X-I <pause dur="0.2"/> is less than or equal to <pause dur="0.4"/> C-I <pause dur="1.3"/> in that case you can think of it as indicating that the death has occurred <pause dur="0.8"/><kinesic desc="writes on board" iterated="y" dur="4"/> and it's going to equal zero <pause dur="0.3"/> if <pause dur="1.9"/> X-I is greater than C-I <pause dur="0.7"/>

in other words we haven't actually observed the event <pause dur="15.5"/> in all the analysis that we do <pause dur="0.3"/> we're going to be assuming <pause dur="0.4"/> that censoring is <pause dur="1.6"/> non-informative that we're not going to learn anything from the censoring <pause dur="2.0"/> the ways in which censoring <pause dur="0.7"/> turn up they actually are given the names type one and type two <pause dur="0.6"/> as i quite often find it difficult to remember which one is which i'm not going to <pause dur="0.3"/> ask you to do that <pause dur="1.1"/> type one censoring <pause dur="0.6"/> is the kind of thing you most often get in medical statistics <pause dur="0.9"/> <vocal desc="clears throat" iterated="n"/> you have a study <pause dur="0.7"/> it has to finish at some point <pause dur="1.5"/> so if it finishes at two-thousand or if it finishes <pause dur="0.6"/> <vocal desc="clears throat" iterated="n"/> <pause dur="0.7"/> at a series of dates so it finishes in two-thousand <pause dur="0.8"/> in <pause dur="0.2"/> the Walsgrave Hospital but we carry out data collection in one or two other hospitals at a later date <pause dur="0.8"/> but we're still finishing at fixed times <pause dur="0.7"/> then that's called type one censoring <pause dur="0.4"/> the reason you don't observe people <pause dur="1.0"/> isn't because <pause dur="0.4"/> <vocal desc="clears throat" iterated="n"/> <pause dur="0.3"/> you've decided i'm going to ignore that person it's for a fixed time <pause dur="0.2"/> you've

stopped the study <pause dur="1.1"/> type two censoring <pause dur="0.3"/> is <pause dur="0.3"/> much less common in medical statistics but it's very common in engineering <pause dur="0.6"/> which is to say <pause dur="0.7"/> i'm going to observe this cohort of individuals <pause dur="0.4"/> until a certain number or certain percentage of them have died <pause dur="0.6"/> or <kinesic desc="makes quotation mark gesture" iterated="n"/> failed <pause dur="0.5"/> so i'm putting <pause dur="0.3"/> twenty <pause dur="0.3"/> items on test <pause dur="0.7"/> at different loadings <pause dur="0.8"/> and <pause dur="0.3"/> once we've <pause dur="0.2"/> put the loads up to the point at which <pause dur="1.1"/> ten of them have failed <pause dur="0.4"/> i'm going to stop the study <pause dur="0.7"/> so type two censoring <pause dur="1.0"/> is dependent on the number of failures so it actually does depend on the whole <pause dur="0.4"/> time process up to that point <pause dur="1.3"/> the way in which it determines when the tenth failure will occur <pause dur="0.6"/> but what it doesn't do is depend on anything in the future <pause dur="2.1"/> and then you can get <pause dur="0.3"/> other kinds of <pause dur="0.2"/> censoring <trunc>mecha</trunc> mechanisms <pause dur="2.3"/> but <pause dur="1.4"/> what's a <trunc>s</trunc> <pause dur="0.2"/> crucial is that you want your <pause dur="2.4"/> so <pause dur="0.3"/> <trunc>i</trunc> in this course <pause dur="0.3"/> well there is research in other things <pause dur="0.4"/> <kinesic desc="writes on board" iterated="y" dur="15"/> but <pause dur="0.9"/> in this

course and in most <pause dur="0.6"/> of the work that you'll look at <pause dur="2.9"/> we <pause dur="0.5"/> assume <pause dur="4.5"/> that <pause dur="0.9"/> # <pause dur="9.0"/> right <pause dur="0.9"/> assume that <kinesic desc="writes on board" iterated="y" dur="18"/> censoring <pause dur="4.0"/> is <pause dur="2.2"/> independent <pause dur="3.4"/> in a fairly general sense of <pause dur="3.1"/> survival <pause dur="11.6"/> what we want <kinesic desc="writes on board" iterated="y" dur="19"/> more formally is that the probability that <pause dur="1.1"/> T is greater than <pause dur="0.6"/> some value T <pause dur="1.4"/> given that <pause dur="0.9"/> this was censored <pause dur="1.6"/> at <pause dur="0.6"/> time <pause dur="1.2"/> C <pause dur="2.2"/> well that shouldn't depend on C <pause dur="2.4"/> as in that that <pause dur="0.3"/> particular point <pause dur="0.5"/> the times have been <pause dur="0.4"/> censored <pause dur="1.2"/> so we just <kinesic desc="writes on board" iterated="y" dur="8"/> want that to be equal to the probability that T is greater than T <pause dur="0.7"/> given that we already know that the time <pause dur="0.8"/> is greater than C <pause dur="2.4"/> not the fact of censoring just <pause dur="0.2"/> the sheer time so <pause dur="1.5"/><kinesic desc="indicates point on board" iterated="n"/> this would hold true <pause dur="0.8"/> true for all times before that actual censoring time <pause dur="4.4"/> so having got the <pause dur="0.2"/> definition of the survival time <pause dur="2.7"/> the thing that we <pause dur="0.2"/> the main <pause dur="0.5"/> variable that we use within survival is <pause dur="1.2"/>

<kinesic desc="writes on board" iterated="y" dur="22"/> the survival function <pause dur="0.4"/> is the focus <pause dur="3.6"/> sorry survival function <pause dur="2.4"/> almost invariably called <pause dur="0.8"/> S for survival S-T-of-T <pause dur="1.1"/> is the probability of <pause dur="0.5"/> the random variable T <pause dur="0.8"/> being greater than time T <pause dur="0.2"/> probability of surviving beyond time T <pause dur="0.9"/> so what <pause dur="0.4"/> how does that relate to the functions you're used to dealing with with random variables <pause dur="0.8"/> tell the person <pause dur="0.2"/> next to you <pause dur="0.2"/> how you'd write that in terms of a familiar function <pause dur="1.4"/> and what the function is <pause dur="24.3"/> any volunteers <pause dur="2.3"/> apart from the usual suspects <pause dur="2.0"/> does it look like anything you recollect meeting before <pause dur="4.5"/> yes <pause dur="2.6"/> puzzled looks <pause dur="0.2"/> <vocal desc="laugh" iterated="n"/> <pause dur="3.0"/> someone be kind to me <pause dur="0.2"/> where have you seen a function like this before <pause dur="1.9"/> but what was the function <pause dur="6.0"/> you've all seen it <vocal desc="laugh" iterated="n"/><pause dur="10.7"/> some volunteer <pause dur="2.8"/> no <pause dur="0.3"/> no idea <pause dur="4.2"/> probabilities <pause dur="0.2"/> what's one of the

standard things we know about probabilities <pause dur="3.3"/> and so how can you convert that probability statement into another probability statement </u><pause dur="6.3"/> <u who="sf0953" trans="pause"> so if like S-T-of-T is one minus the probability of <pause dur="0.9"/> T is less than <gap reason="inaudible" extent="1 sec"/></u><pause dur="4.5"/> <u who="nf0951" trans="pause"> <kinesic desc="writes on board" iterated="y" dur="6"/> thank you which is usually known as </u><pause dur="0.4"/> <u who="ss" trans="pause"> <gap reason="inaudible, multiple speakers" extent="1 sec"/> </u><u who="sf0954" trans="pause"> density </u><pause dur="0.7"/> <u who="nf0951" trans="pause"> cumulative density <pause dur="0.5"/><kinesic desc="writes on board" iterated="y" dur="3"/> cumulative density function or <pause dur="1.2"/> distribution function <pause dur="1.1"/> so most of the things you'll have done before in likelihood is <pause dur="0.8"/> basically been worked on the density function <pause dur="0.8"/> survival works on <pause dur="0.5"/> one minus the distribution function <pause dur="3.9"/> in other words <pause dur="1.3"/> <kinesic desc="writes on board" iterated="y" dur="11"/> those <pause dur="1.6"/>

plots i was showing you <pause dur="1.5"/> where i talked about the probability of surviving beyond some time <pause dur="0.6"/> those were plots of <pause dur="1.1"/> an empirical <pause dur="0.2"/> survival <pause dur="0.5"/> function <pause dur="0.4"/> which was actually one minus your standard cumulative density function <pause dur="2.9"/> <kinesic desc="writes on board" iterated="y" dur="1"/> right <pause dur="0.2"/> # <pause dur="6.4"/> the next logical thing for me to do is to start talking about how we do a life table analysis of <kinesic desc="indicates point on board" iterated="n"/> that data <pause dur="1.5"/> and given that it's lunchtime and you've got a <trunc>l</trunc> other things to do i'm actually planning <pause dur="0.3"/> i said i'd try to finish these lectures slightly early most times <pause dur="0.4"/> so i think it's actually more sensible for me to stop at this point <pause dur="0.3"/> answer any questions <pause dur="0.6"/> and see you again on Wednesday morning at <pause dur="0.3"/> five past nine
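<!-- Illustrative sketch, not part of the spoken text. It restates in Python the three definitions given in the lecture: the observed time t_i = min(x_i, c_i), the indicator delta_i (1 if x_i is less than or equal to c_i, otherwise 0), and the survival function as one minus the distribution function. The names Observation, observe and empirical_survival, the lifetimes and the day-2000 cut-off are all hypothetical, chosen only for illustration. The empirical survival function shown here assumes fully observed lifetimes; it is not the life table analysis announced for the next lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    time: float  # t_i = min(x_i, c_i), the observed survival time
    event: int   # delta_i: 1 if the event was observed, 0 if censored


def observe(lifetime: float, censor_time: float) -> Observation:
    """Turn an actual lifetime x_i and a censoring time c_i into (t_i, delta_i)."""
    event = 1 if lifetime <= censor_time else 0
    return Observation(time=min(lifetime, censor_time), event=event)


def empirical_survival(lifetimes: list[float], t: float) -> float:
    """Empirical S(t) = 1 - F(t): the fraction of lifetimes strictly greater than t.

    Assumes every lifetime is fully observed (no censoring).
    """
    return sum(1 for x in lifetimes if x > t) / len(lifetimes)


# Hypothetical lifetimes in days; follow-up stops at day 2000 for everyone,
# which corresponds to the fixed-time (type one) censoring described above.
lifetimes = [350.0, 1200.0, 2000.0, 2600.0]
observed = [observe(x, 2000.0) for x in lifetimes]
```

Only the last individual has x_i greater than c_i, so only that record carries delta_i = 0 and an observed time equal to the censoring time. -->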