

<?xml version="1.0"?>

<!DOCTYPE TEI.2 SYSTEM "base.dtd">




<title>Prosodic information and the recognition of words</title></titleStmt>

<publicationStmt><distributor>BASE and Oxford Text Archive</distributor>


<availability><p>The British Academic Spoken English (BASE) corpus was developed at the

Universities of Warwick and Reading, under the directorship of Hilary Nesi

(Centre for English Language Teacher Education, Warwick) and Paul Thompson

(Department of Applied Linguistics, Reading), with funding from BALEAP,

EURALEX, the British Academy and the Arts and Humanities Research Board. The

original recordings are held at the Universities of Warwick and Reading, and

at the Oxford Text Archive and may be consulted by bona fide researchers

upon written application to any of the holding bodies.

The BASE corpus is freely available to researchers who agree to the

following conditions:</p>

<p>1. The recordings and transcriptions should not be modified in any

way</p>

<p>2. The recordings and transcriptions should be used for research purposes

only; they should not be reproduced in teaching materials</p>

<p>3. The recordings and transcriptions should not be reproduced in full for

a wider audience/readership, although researchers are free to quote short

passages of text (up to 200 running words from any given speech event)</p>

<p>4. The corpus developers should be informed of all presentations or

publications arising from analysis of the corpus</p><p>

Researchers should acknowledge their use of the corpus using the following

form of words:

The recordings and transcriptions used in this study come from the British

Academic Spoken English (BASE) corpus, which was developed at the

Universities of Warwick and Reading under the directorship of Hilary Nesi

(Warwick) and Paul Thompson (Reading). Corpus development was assisted by

funding from the Universities of Warwick and Reading, BALEAP, EURALEX, the

British Academy and the Arts and Humanities Research Board. </p></availability>




<recording dur="00:46:15" n="6868">


<respStmt><name>BASE team</name>



<langUsage><language id="en">English</language>

<language id="pl">Polish</language>

<language id="fr">French</language>



<person id="nm0858" role="main speaker" n="n" sex="m"><p>nm0858, main speaker, non-student, male</p></person>

<person id="om0859" role="observer" n="o" sex="m"><p>om0859, observer, observer, male</p></person>

<person id="su0860" role="participant" n="s" sex="u"><p>su0860, participant, student, unknown sex</p></person>

<person id="sm0861" role="participant" n="s" sex="m"><p>sm0861, participant, student, male</p></person>

<person id="sf0862" role="participant" n="s" sex="f"><p>sf0862, participant, student, female</p></person>

<personGrp id="ss" role="audience" size="s"><p>ss, audience, small group </p></personGrp>

<personGrp id="sl" role="all" size="s"><p>sl, all, small group</p></personGrp>

<personGrp role="speakers" size="7"><p>number of speakers: 7</p></personGrp>





<item n="speechevent">Lecture</item>

<item n="acaddept">Linguistic Science</item>

<item n="acaddiv">ps</item>

<item n="partlevel">UG2</item>

<item n="module">Phonetics</item>




<u who="nm0858"> okay this # <pause dur="0.8"/> should be the last <pause dur="0.6"/> lecture of this series according to the handbook <pause dur="0.4"/> # there should be four this term this is number four <pause dur="0.6"/> # at the end # i'll just check with you because <pause dur="0.3"/> if there's anything that you'd like me to go over again or if you've got any <pause dur="0.4"/> questions or points to make <pause dur="0.3"/> # we could have another get-together <pause dur="0.2"/> at the same time next week <pause dur="0.2"/> that slot is free <pause dur="0.5"/> but as far as i'm concerned the content of the <pause dur="0.3"/> lectures is complete <pause dur="0.4"/> assuming i get through what i want to get through today <pause dur="0.7"/> okay <pause dur="2.0"/> # <pause dur="1.0"/> <gap reason="name" extent="1 word"/> here from CALS is recording what i'm saying <pause dur="0.3"/> # it's not that he's going to learn anything interesting </u> <u who="om0859" trans="overlap"> <gap reason="inaudible" extent="1 sec"/></u> <u who="nm0858" trans="overlap"> it's just # <pause dur="0.5"/> # he's going to be <pause dur="0.5"/> finding what a syntactic disaster zone <pause dur="0.4"/> # unscripted speech is in a lecture <shift feature="voice" new="laugh"/>i think <shift feature="voice" new="normal"/> <pause dur="1.0"/> # <pause dur="0.9"/> let me just <pause dur="0.2"/> oh first of all # <pause dur="0.3"/> <trunc>ap</trunc> <pause dur="0.2"/> apologies various well i've i've had it pointed out to me that this # <pause dur="0.6"/> the tables in this room of are in a pretty <pause dur="0.2"/> filthy state <pause dur="0.5"/> and # they're not being cleaned properly so

i'll pass that on to <pause dur="0.5"/> the relevant # <pause dur="0.9"/> department in the faculty <pause dur="1.1"/> okay <pause dur="0.5"/> where we got to <pause dur="0.3"/> last week <pause dur="0.6"/> # was saying that <pause dur="0.3"/> # <pause dur="0.4"/> although the evidence is very conflicting very strongly conflicting <pause dur="0.5"/> there are apparently characteristic <pause dur="0.2"/> rhythmical differences <pause dur="0.4"/> between different languages <pause dur="2.9"/> and i said that although <pause dur="0.3"/> we're still very much in the dark about this <pause dur="0.5"/> there clearly would not be this rhythmical <pause dur="0.3"/> regularity <pause dur="0.3"/> unless it was fulfilling some function <pause dur="0.8"/> okay <pause dur="0.3"/> if it's true that every <pause dur="0.4"/> speaker of every language <pause dur="0.2"/> speaks with some kind of rhythm <pause dur="0.5"/> there must be some kind of function <pause dur="0.2"/> to that rhythm <pause dur="0.4"/> # because <pause dur="0.2"/> # on the whole we don't do things just for the fun of it <pause dur="0.4"/> # when we're speaking <pause dur="0.8"/> and # <pause dur="0.3"/> one of the major <pause dur="0.7"/> functions of rhythm <pause dur="0.3"/> that we suspect <pause dur="0.6"/><event desc="student enters room" iterated="n" n="sm0861"/> is that it helps us to divide speech up into units <pause dur="4.4"/> <unclear>hello</unclear> it's phonetics </u><pause dur="0.2"/> <u who="sm0861" trans="pause"> oh sorry <pause dur="0.2"/> wrong room <event desc="student leaves room" iterated="n" n="sm0861"/></u> <u who="nm0858" trans="latching"> okay <pause dur="1.9"/> <shift feature="voice" new="laugh"/><gap reason="inaudible" extent="1 sec"/><shift feature="voice" new="normal"/><pause dur="2.3"/> okay so <pause dur="2.0"/> we're looking at <pause dur="0.3"/> the possible functions of rhythm and timing <pause dur="0.4"/> in dividing speech

up into units that we need for perceiving speech <pause dur="4.3"/> now of course one of the questions is what sort of units are we dividing up into but first of all just consider <pause dur="0.5"/> if we had no help <pause dur="0.5"/> in dividing speech up <pause dur="0.5"/> into units that is if we had no help <pause dur="0.4"/> from any kind of phonetic information <pause dur="1.2"/> what we would have to imagine is <pause dur="0.4"/> bearing in mind that we don't usually pause between words <pause dur="0.4"/> we'd have to imagine that speech is coming at us as a continuous <pause dur="0.2"/> stream <pause dur="0.4"/> of phonemes and syllables <pause dur="0.9"/> and <pause dur="0.4"/> we would have the very difficult decision <pause dur="0.6"/> # as to where one <pause dur="0.2"/> word ends and another one begins or where a phrase ends another one begins <pause dur="0.4"/> # or where we reach a a sentence boundary and so on <pause dur="1.9"/> and i was going to give you an example of this because somebody once wrote an article <pause dur="0.7"/> in phonetic transcription <pause dur="0.4"/> leaving no spaces at all <pause dur="1.2"/> # this was a a slightly nutty paper <pause dur="0.4"/> # that appeared in a rather <pause dur="0.2"/> bizarre journal in which all the papers had to be written in in phonemic transcription <pause dur="0.6"/> and this guy was saying <pause dur="0.2"/> that i

mean he was actually a very eminent phonetician John Trim <pause dur="0.4"/> he was saying <pause dur="0.6"/> # <pause dur="0.3"/> if we're going to be realistic with phonemic transcription <pause dur="0.2"/> we shouldn't put spaces in our transcription unless there is phonetically a space there <pause dur="0.4"/> and if you can't hear one you shouldn't write one <pause dur="0.8"/> and so he then wrote out this paper as though he was saying it out loud <pause dur="0.4"/> # only putting a space where he would pause for breath <pause dur="0.4"/> and of course <pause dur="0.2"/> # <pause dur="0.2"/> even if you can read phonemic transcription <pause dur="0.3"/> the actual paper is almost completely illegible <pause dur="0.4"/> because it's just a continuous string of phonemes <pause dur="0.7"/> # <pause dur="0.5"/> and # i will <pause dur="0.4"/> # <pause dur="0.3"/> let you have a copy of that paper if you're interested <pause dur="0.4"/> # unfortunately the volume on my shelves which has got it in is <pause dur="0.3"/> out on loan to # one of my research students # <pause dur="0.5"/> # <pause dur="0.2"/> who's not <pause dur="0.2"/> who's not brought it back but i will get this to you it's an <pause dur="0.2"/> interesting experience to read it <pause dur="0.3"/> in fact it's a very interesting experience to read the journal itself the <pause dur="0.4"/> it used to be called Le Maître Phonétique <pause dur="0.5"/>

and # # it was the main publication of the International Phonetic Association <pause dur="0.4"/> and until nineteen-seventy-four <pause dur="0.5"/> you would not be <trunc>h</trunc> you would not have a paper accepted for that journal <pause dur="0.3"/> unless it was written in phonemic transcription <pause dur="1.3"/> anyway one of the big debates in <pause dur="0.5"/> # <pause dur="0.8"/> the early # well around the middle period of this journal Le Maître Phonétique <pause dur="0.4"/> was a big dispute between two of the great giants of <pause dur="0.3"/> # phonetics in the early part of the twentieth century <pause dur="0.5"/> # <pause dur="0.3"/> that was Paul Passy <pause dur="0.2"/> who was the great French phonetician <pause dur="0.3"/> and Daniel Jones the <pause dur="0.2"/> his British equivalent <kinesic desc="turns on overhead projector showing transparency" iterated="n"/><pause dur="1.3"/> # <pause dur="0.2"/> here's a quote from Passy <pause dur="0.5"/> <reading><distinct lang="fr">il est bien entendu n'est-ce pas que l'espace blanc laissé entre des mots n'a pas de valeur phonétique</distinct></reading> <pause dur="0.6"/> basically saying <pause dur="0.2"/> we leave spaces between the words when we write transcription <pause dur="0.2"/> but it doesn't mean anything phonetically <pause dur="0.6"/> okay it's just there <pause dur="0.3"/> to help us <pause dur="0.3"/> understand <pause dur="0.4"/> # <pause dur="0.2"/> more easily what is written <pause dur="2.1"/> and <pause dur="0.4"/> # <pause dur="0.9"/> this <pause dur="0.2"/> # stirred Jones to write a reply <pause dur="0.3"/> by the way # i'll be

giving you a handout which gives you these <kinesic desc="changes transparency" iterated="y" dur="6"/> quotes so you don't need to write these down verbatim <pause dur="0.5"/> # <pause dur="0.4"/> just take in the general gist <pause dur="0.7"/> # <pause dur="1.1"/> Jones's reply was i would say that a word is a phonetic entity <pause dur="0.3"/> that the blank spaces between written words do have phonetic significance <pause dur="0.5"/> Passy himself has given instance of <trunc>th</trunc> <trunc>in</trunc> instances of this <pause dur="0.5"/> # <reading><distinct lang="fr">dans un parler tant soit peu lent on distinguera</distinct> <pause dur="0.4"/> <distinct lang="fr">trois petites roues</distinct></reading> <pause dur="0.2"/> that's # three little wheels <pause dur="0.4"/> and <distinct lang="fr">trois petits trous</distinct> <pause dur="0.3"/> # three little holes <pause dur="0.5"/> okay so # <pause dur="0.7"/> # this has been disputed by French phoneticians <pause dur="0.3"/> # quite a lot <pause dur="0.3"/> because some French people say you can't hear the difference between those two <pause dur="0.6"/> # three little wheels and three little holes <pause dur="0.4"/> # <pause dur="0.2"/> but the word boundary is in a different place in these two cases <pause dur="0.3"/> and # <pause dur="0.2"/> Passy on another occasion <pause dur="0.3"/> had said that you can actually hear the difference between them <pause dur="0.3"/> i'm not enough of a French speaker to know if it's true that you can hear the difference <pause dur="0.3"/> but if any of you have French speaking

friends you might like to try it out on them <pause dur="0.3"/> and see if there is a perceptible difference <pause dur="1.8"/> okay <pause dur="0.2"/> <event desc="takes off transparency" iterated="n"/> so there's # there's a a basic dispute # certainly as far as English and French are concerned <pause dur="0.3"/> about whether we can actually hear <pause dur="0.3"/> where one word ends and one <pause dur="0.2"/> begins <pause dur="0.6"/> and it is when you think about it one of the most <pause dur="0.3"/> vital functions <pause dur="0.3"/> in <pause dur="0.3"/> speech perception <pause dur="0.8"/> if speech comes at us as one continuous stream <pause dur="0.9"/> and yet <pause dur="0.5"/> mentally <pause dur="0.5"/> we're able to divide that up <pause dur="0.4"/> into in a very sophisticated way <pause dur="0.4"/> into a whole <pause dur="0.4"/> series of units <pause dur="0.5"/> going down to very small units like words <pause dur="0.3"/> and going right up to big units like sentences <pause dur="0.5"/> if we're able to do that <pause dur="0.8"/> the <pause dur="0.4"/> question is are we getting any help in that from the phonetic information <pause dur="0.6"/> is <trunc>th</trunc> is there stuff there in the speech signal <pause dur="0.4"/> that tells us where these boundaries are <pause dur="0.5"/> or are we just figuring it out on some kind of statistical basis <pause dur="11.5"/> i've been doing some research <pause dur="0.2"/> on <pause dur="0.4"/> one particular problem that arises out of this <pause dur="0.4"/> and i'd like to use that as a kind of

peg to hang this <pause dur="0.2"/> issue on <pause dur="0.4"/> to tell you a little bit about what we've been doing <pause dur="0.4"/> # and where we've been getting with this <pause dur="0.8"/> it's the problem <pause dur="0.2"/> it's sometimes called the problem of embedded words <pause dur="4.3"/><kinesic desc="puts on transparency" iterated="n"/> when we hear a word of several syllables like responsibility <pause dur="1.1"/> okay <pause dur="1.0"/> that word will almost inevitably contain some other English words which are smaller <pause dur="1.0"/> so in the case of responsibility <pause dur="0.2"/> we find within that the word response <pause dur="1.4"/> the word sponsor <pause dur="0.8"/> that bit there <pause dur="0.7"/> <kinesic desc="indicates point on transparency" iterated="n"/> the word <pause dur="0.3"/> ability <pause dur="0.9"/> that last bit there <pause dur="0.6"/><kinesic desc="indicates point on transparency" iterated="n"/> the word bill <pause dur="0.4"/> which is that one there <pause dur="0.2"/> <kinesic desc="indicates point on transparency" iterated="n"/> and you'd find a few others as well if you looked hard enough <pause dur="0.5"/> almost any <pause dur="0.2"/> word <pause dur="0.2"/> in English <pause dur="0.3"/> of more than two syllables <pause dur="0.2"/> will actually contain <pause dur="0.3"/> within it <pause dur="0.3"/> packaged up inside it <pause dur="0.3"/> a smaller English word <pause dur="3.2"/> now consider what the brain is faced with <pause dur="0.3"/> if somebody produces a sentence containing the word responsibility <pause dur="2.1"/> if the brain wrongly <pause dur="0.2"/> segments responsibility into response <pause dur="1.4"/> and <pause dur="0.2"/> ability <pause dur="1.3"/> the <pause dur="0.3"/> parsing <pause dur="0.3"/> # the decoding of that sentence <pause dur="0.2"/> is going to go catastrophically wrong <pause dur="5.5"/> see the <pause dur="0.3"/>

point i'm making <pause dur="1.4"/> somehow we know <pause dur="0.4"/> that when we hear responsibility <pause dur="0.6"/> # <pause dur="0.5"/> it is that word <pause dur="0.2"/> not a combination of response and <pause dur="0.4"/> ability <pause dur="1.0"/> nor is it <pause dur="0.3"/> in some sense <pause dur="0.2"/> re<pause dur="0.2"/> and sponsor <pause dur="0.3"/> and <pause dur="0.2"/> bility <pause dur="0.4"/> <trunc>whi</trunc> # <pause dur="0.2"/> that of course contains two English <pause dur="0.3"/> non-words <pause dur="0.3"/> so that's a <trunc>com</trunc> comparatively easy task <pause dur="2.0"/> now the fact is <pause dur="0.4"/> that we don't go <pause dur="0.2"/> <trunc>cata</trunc> catastrophically wrong <pause dur="0.3"/> we don't make mistakes all the time <pause dur="0.4"/> about <pause dur="0.2"/> # <pause dur="0.2"/> polysyllabic words <pause dur="3.4"/> and the research that i've been <pause dur="0.3"/> involved in <pause dur="0.4"/> has been looking at factors responsible for <pause dur="0.4"/> our being able to cope successfully <pause dur="0.5"/> with <pause dur="0.2"/> this problem of embedded words <pause dur="0.3"/> the fact that we're not constantly going off in the wrong direction <pause dur="0.2"/> being fooled by <pause dur="0.3"/> the sounds into hearing something <pause dur="0.3"/> that isn't there <pause dur="3.1"/> let's just <pause dur="0.3"/> try to think about <pause dur="0.7"/> what <pause dur="0.3"/><kinesic desc="changes transparency" iterated="y" dur="6"/> factors might be <pause dur="0.8"/> helping us <pause dur="0.3"/> not to go wrong <pause dur="1.4"/> i've got <pause dur="0.3"/> three possible <pause dur="0.3"/> hypotheses here <pause dur="2.1"/> and again these are in the handout that i'll be giving you <pause dur="0.4"/> and <pause dur="0.4"/> # <pause dur="0.5"/> you don't need to write these down <pause dur="0.3"/> you you'll get this text <pause dur="0.4"/> later on <pause dur="5.0"/> okay the

question is then how are we <pause dur="0.3"/> successful how comes how come that we are successful <pause dur="0.4"/> <trunc>mo</trunc> most of the time <pause dur="0.3"/> in deciding where word boundaries come <pause dur="1.9"/> one possibility <pause dur="0.6"/> is that there is some phonetic information <pause dur="0.4"/> just at the point where the boundary comes <pause dur="0.4"/> which helps us to say ah <pause dur="0.3"/> that's a boundary <pause dur="0.3"/> so i know <pause dur="0.2"/> that one word ends and another one begins at that point <pause dur="0.3"/> now we'll look at that in a bit more detail in a moment that's <pause dur="0.2"/> certainly a possible and a plausible <pause dur="0.6"/> # explanation <pause dur="2.8"/> okay <pause dur="0.2"/> another possibility <pause dur="1.1"/> is that <pause dur="0.3"/> although we don't <pause dur="0.2"/> find <pause dur="0.6"/> all that much information <pause dur="0.6"/> in the actual segments <pause dur="0.2"/> at the boundary the actual phonemes <pause dur="0.5"/> at the beginning and end of a word <pause dur="0.4"/> there is something about the overall shape of a word <pause dur="0.4"/> that enables us to say <pause dur="0.5"/> # <pause dur="0.4"/> i can hear <pause dur="0.4"/> a word starting <pause dur="0.3"/> and now i can hear the word ending there's something about the overall shape of it <pause dur="0.7"/> for example <pause dur="0.2"/> # this is a completely wrong statement but imagine <pause dur="0.3"/> that there was something like <pause dur="0.3"/> that a word always started very quietly <pause dur="0.2"/>

and built up to a a crescendo <pause dur="0.2"/> and then faded away into silence <pause dur="0.4"/> the overall shape of the loudness pattern there <pause dur="0.3"/> would help you to know <pause dur="0.3"/> the where the beginning and end of the word <pause dur="0.2"/> came <pause dur="0.2"/> that actually doesn't happen of course <pause dur="0.3"/> # that that's just an imaginary <pause dur="0.2"/> example <pause dur="1.6"/> and the third possibility <pause dur="0.2"/> which i suspect a lot of linguists would rather prefer <pause dur="0.2"/> is that there is absolutely nothing worth listening to <pause dur="0.2"/> in the speech signal <pause dur="0.5"/> when it comes to deciding on boundaries <pause dur="0.3"/> we simply do it <pause dur="0.2"/> on the basis of linguistic knowledge <pause dur="0.8"/> that's the sort of top-down <pause dur="0.5"/> theory <pause dur="0.6"/> and we'll look at that one a bit more as well so <pause dur="0.2"/> what i've actually said <trunc>th</trunc> is # <pause dur="0.2"/> instead of using phonetic or phonological information <pause dur="0.4"/> we match segment strings against our lexicon <pause dur="0.3"/> and choose the match that gives the most plausible sequence of words <pause dur="4.7"/> let's just go back to the <pause dur="0.3"/> # example <kinesic desc="changes transparency" iterated="y" dur="5"/> i had on the previous slide the word responsibility <pause dur="0.6"/> # <pause dur="0.9"/> if you get that <trunc>str</trunc> that <trunc>s</trunc> <pause dur="0.2"/> sequence of phonemes that make up <pause dur="0.3"/>

responsibility <pause dur="0.3"/> and i put to you the problem <pause dur="0.2"/> why <pause dur="0.2"/> in a sentence like it's your responsibility to get there on time <pause dur="0.4"/> why don't we <pause dur="0.4"/> interpret that as it's your response <pause dur="0.5"/> ability <pause dur="0.3"/> to get there in time <pause dur="0.3"/> why don't we interpret it that way <pause dur="0.5"/> it's because we know <pause dur="0.3"/> the structure of that sentence <pause dur="0.2"/> we know its lexical content <pause dur="0.2"/> we know the sort of situation in which that's uttered <pause dur="0.5"/> and we simply wouldn't make a daft <pause dur="0.3"/> interpretation like <pause dur="0.3"/> response <pause dur="0.3"/> and <pause dur="0.2"/> ability <pause dur="0.2"/> as two separate words <pause dur="0.2"/> because it wouldn't fit <pause dur="0.2"/> with the syntax <pause dur="0.2"/> and it wouldn't fit with the semantics of what we were saying <pause dur="0.8"/>
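<!--
Illustrative sketch, not part of the recorded speech. The embedded-word problem and the top-down account outlined above (match segment strings against the lexicon, then choose the most plausible sequence of words) can be illustrated with a small Python example. Everything in it is an assumption made for this sketch: the five-word LEXICON, its SAMPA-style transcriptions, and the fewest-words rule in most_plausible, which merely stands in for the syntactic and semantic knowledge the speaker appeals to. It is not the speaker's own model.

```python
from functools import lru_cache

# Hypothetical toy lexicon: word mapped to a SAMPA-style phoneme tuple
# (illustrative British English forms chosen for this sketch).
LEXICON = {
    "responsibility": ("r", "I", "s", "p", "Q", "n", "s", "@", "b", "I", "l", "@", "t", "i"),
    "response": ("r", "I", "s", "p", "Q", "n", "s"),
    "sponsor": ("s", "p", "Q", "n", "s", "@"),
    "ability": ("@", "b", "I", "l", "@", "t", "i"),
    "bill": ("b", "I", "l"),
}


def embedded_words(phones):
    """Return (start_index, word) for every lexicon entry found inside phones."""
    hits = []
    for word, form in LEXICON.items():
        for start in range(len(phones) - len(form) + 1):
            if phones[start:start + len(form)] == form:
                hits.append((start, word))
    return sorted(hits)


def segmentations(phones):
    """Return every way of covering phones exactly with a sequence of lexicon words."""
    @lru_cache(maxsize=None)
    def parse(start):
        if start == len(phones):
            return [[]]  # one way to parse nothing: the empty sequence
        results = []
        for word, form in LEXICON.items():
            end = start + len(form)
            if phones[start:end] == form:
                results.extend([word] + rest for rest in parse(end))
        return results
    return parse(0)


def most_plausible(phones):
    """Pick one parse; fewest words stands in for syntax and semantics (an assumption)."""
    return min(segmentations(phones), key=len)


phones = LEXICON["responsibility"]
print(embedded_words(phones))
# [(0, 'response'), (0, 'responsibility'), (2, 'sponsor'), (7, 'ability'), (8, 'bill')]
print(segmentations(phones))  # [['responsibility'], ['response', 'ability']]
print(most_plausible(phones))  # ['responsibility']
```

With this toy lexicon, response, sponsor, ability and bill are all found inside responsibility, and two complete parses exist (responsibility, and response followed by ability); the fewest-words stand-in keeps the single word, which is the reading listeners actually arrive at.
-->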

end of problem <pause dur="0.3"/> you don't need <pause dur="0.2"/> phonetic <pause dur="0.2"/> or phonological information <pause dur="0.7"/> okay that would <pause dur="0.2"/> # almost certainly be <pause dur="0.5"/> # so the well anyway the computational linguists' answer to the problem <pause dur="0.5"/> there is <pause dur="0.2"/> enough contextual linguistic information to solve the problem <pause dur="0.3"/> without relying on what's there in the sounds <pause dur="0.8"/> now of course <pause dur="0.3"/> # <pause dur="0.4"/> that's not <pause dur="0.5"/> my <pause dur="0.2"/> approach to <shift feature="voice" new="laugh"/> the subject so i'm <shift feature="voice" new="normal"/> not going to buy that <pause dur="0.3"/> explanation <pause dur="0.4"/> to me <pause dur="0.4"/> # <pause dur="0.7"/> # there must be something in point one <pause dur="0.3"/> that word boundaries are marked by allophonic information in the segments adjacent to the boundary <pause dur="0.7"/> and or <pause dur="0.6"/> prosodic factors can <trunc>ter</trunc> can characterize the overall form of a word <pause dur="0.7"/> now remember that last week i was talking about differences between different languages <pause dur="0.4"/> what i suspect is <pause dur="0.6"/> that in some languages <pause dur="0.4"/> we find a preponderance of the <pause dur="0.4"/> # # <pause dur="0.9"/> # the function <pause dur="0.4"/> in word boundary divisions <pause dur="0.4"/> <trunc>b</trunc> <pause dur="0.6"/> here based on this first <pause dur="0.4"/> possibility <pause dur="0.4"/> that word boundaries <pause dur="0.2"/> are marked <pause dur="0.2"/> segmentally <pause dur="0.2"/> at

the edges <pause dur="0.3"/> and in other languages you find that the main <pause dur="0.2"/> contributing <pause dur="0.3"/> factor to our being able to divide into words <pause dur="0.3"/> is the second one the prosodic information <pause dur="0.7"/> let's let's look at this second one <pause dur="0.2"/> to begin with <pause dur="0.7"/> prosodic factors <trunc>character</trunc> <vocal desc="cough" iterated="n"/> characterizing the overall <trunc>fir</trunc> form of the word <pause dur="0.6"/> <vocal desc="cough" iterated="n"/> <pause dur="0.2"/> if it's true that in French <pause dur="1.0"/> every word <pause dur="0.4"/> ends with a <kinesic desc="writes on board" dur="4" iterated="y"/> <pause dur="1.8"/> stressed <pause dur="0.2"/> syllable <pause dur="2.7"/> which is usually <pause dur="0.2"/> the claim made in introductory phonetics books <pause dur="0.6"/> then dividing French up <pause dur="0.4"/> dividing continuous French up <pause dur="0.2"/> into words <pause dur="0.3"/> is simply not a problem <pause dur="0.6"/> you just listen for a stressed syllable <pause dur="0.5"/> and you say <pause dur="0.2"/> ha <pause dur="0.2"/> stressed syllable <pause dur="0.3"/> end of word <pause dur="0.7"/> word boundary <pause dur="0.5"/> listen for the next one and then you might have <pause dur="0.6"/> a few syllables <pause dur="0.3"/><kinesic desc="writes on board" iterated="n"/> and a strong one like that strongly stressed one <pause dur="0.4"/> so you automatically <pause dur="0.3"/> then <pause dur="0.2"/> place a word boundary <pause dur="0.2"/> it's about as simple <pause dur="0.2"/> a procedure <pause dur="0.3"/> as simple an algorithm <pause dur="0.3"/> as you could find <pause dur="0.3"/> in <pause dur="0.3"/> decoding speech <pause dur="0.2"/> just listen for a stressed syllable <pause dur="0.4"/> and place a word boundary

immediately after it <pause dur="2.5"/> # <pause dur="0.2"/> and there are other languages as i've said before which have other stressed patterns <pause dur="0.2"/> so for example in Polish <pause dur="0.4"/> # <pause dur="0.2"/> most words <pause dur="0.2"/> have <pause dur="0.2"/><kinesic desc="writes on board" iterated="y" dur="4"/> a strong <pause dur="0.3"/> syllable <pause dur="0.3"/> and then a weak one and then a word boundary <pause dur="0.4"/> in other words the stress in Polish <pause dur="0.2"/> normally comes on the penultimate syllable <pause dur="0.7"/> so if you're a Polish listener listening to Polish <pause dur="0.4"/> you <pause dur="0.2"/> <trunc>l</trunc> <pause dur="0.3"/> # let the stream of speech come in through your ears <pause dur="0.3"/> and you simply <pause dur="0.3"/> # have <pause dur="0.2"/> # some bit of your <pause dur="0.4"/> processing <pause dur="0.5"/> capability in your brain <pause dur="0.2"/> listening out for stressed syllables <pause dur="0.3"/> you let one more syllable go by <pause dur="0.3"/> and then <pause dur="0.2"/> you place the word boundary <pause dur="0.9"/> it's <pause dur="0.5"/> just as with French <trunc>w</trunc> one has to be a bit sceptical about this <pause dur="0.2"/> there are <pause dur="0.3"/> actually if you listen to spoken French there are plenty of cases where French speakers put the stress <pause dur="0.4"/> # earlier than the final syllable <pause dur="0.4"/> there are exceptions to the rule in Polish <pause dur="0.3"/> if you take a a word <pause dur="0.5"/> the <trunc>wor</trunc> Polish word for university for example is <distinct lang="pl">uniwersytet</distinct> <pause dur="0.4"/> # which is <pause dur="0.3"/> pro-penultimate it's the it's

it's not on <pause dur="0.3"/> # it's not they don't say <distinct lang="pl" type="sampa">[unIvEr"sItEt]</distinct> <pause dur="0.2"/> they say <distinct lang="pl" type="sampa"> [unI"vErsItEt]</distinct> which is <pause dur="0.3"/> # which leaves two unstressed syllables at the end <pause dur="0.2"/> but most Polish words are are <trunc>str</trunc> are are structured like that <pause dur="0.7"/> so in those cases there are prosodic factors characterizing the overall form of the word <pause dur="0.3"/> under those circumstances <pause dur="0.2"/> you've got lots of help <pause dur="0.2"/> for dividing up speech into words <pause dur="5.5"/> now as we know English is a much more difficult customer <pause dur="0.2"/> in that respect <pause dur="0.3"/> because <pause dur="1.2"/> we know that in polysyllabic English words <pause dur="0.3"/> we find some <pause dur="0.2"/> where the stress is on the first syllable <pause dur="0.5"/> some where the stress is on the last syllable <pause dur="0.3"/> and some <pause dur="0.2"/> # in other places in the middle <pause dur="1.0"/> and therefore <pause dur="0.2"/> we can't rely at least not in such an easy way <pause dur="0.3"/> on that overall prosodic shape of the word <pause dur="1.0"/> <vocal desc="cough" iterated="n"/> <pause dur="1.7"/> on the other hand what we do have in English in a fairly powerful way <pause dur="0.5"/> is the <pause dur="0.4"/> # ability to distinguish words or pairs of words <pause dur="0.3"/> # on the basis of phonetic information <pause dur="0.8"/> # <pause dur="0.3"/> the example that everybody's heard

of and that always comes up in <pause dur="0.4"/> # <pause dur="0.4"/> early lectures on phonetics is distinctions <kinesic desc="writes on board" iterated="y" dur="9"/> like <pause dur="0.7"/> grey <pause dur="0.3"/> tape <pause dur="2.3"/> and <pause dur="0.2"/> great <pause dur="1.0"/> ape <pause dur="2.3"/> i'm sure you've all come across examples like that <pause dur="0.3"/> people have written <pause dur="0.4"/> # <pause dur="0.2"/> huge articles all based on this particular problem <pause dur="0.3"/> # in fact a lot of it was inspired by that row or dispute that <trunc>d</trunc> argument between Passy <pause dur="0.3"/> and Jones <pause dur="0.3"/> # all those years ago <pause dur="0.5"/> because once the <pause dur="0.4"/> # <pause dur="0.3"/> issue had become a theoretical problem <pause dur="0.4"/> it impelled people to start doing experiments <pause dur="1.7"/> if you don't know what it is that distinguishes grey tape and great ape <pause dur="0.4"/> # you certainly ought to be able to explain it <pause dur="0.3"/> it's not <pause dur="0.2"/> all that <pause dur="0.2"/> difficult to understand <pause dur="0.4"/> # but you may have forgotten the sort of basic phonetics that enables you <pause dur="0.4"/> to figure this out <pause dur="0.6"/> # let's just <pause dur="0.2"/> go through this particular example <pause dur="0.5"/> # <pause dur="0.8"/> <kinesic desc="writes on board" iterated="y" dur="16"/> the first point to make is of course that <pause dur="0.3"/> both <pause dur="0.7"/> of these <pause dur="0.6"/> phrases <pause dur="0.4"/> contain <pause dur="1.3"/> exactly the same segments <pause dur="1.6"/> if you actually go through phoneme by phoneme there is no difference <pause dur="1.9"/> and

yet <pause dur="0.3"/> if i say <pause dur="0.2"/> either grey tape <pause dur="0.2"/> or great ape <pause dur="1.8"/> <trunc>th</trunc> the <trunc>v</trunc> <trunc>a</trunc> ninety-nine per cent of people will <pause dur="0.2"/> successfully recognize <pause dur="0.3"/> which of the <pause dur="0.2"/> two <pause dur="0.2"/> i intended you to hear <pause dur="1.7"/> now i can <trunc>v</trunc> i can make the difference <pause dur="1.0"/> even clearer <pause dur="0.3"/> if i sort of fake it <pause dur="0.3"/> if i put a glottal stop in here before the <pause dur="0.3"/><kinesic desc="indicates point on screen" iterated="n"/> before the <pause dur="0.3"/> vowel begins in the second word and say great ape <pause dur="0.5"/> great ape like that <pause dur="0.3"/> then there is no ambiguity at all you simply couldn't interpret <pause dur="0.6"/> great ape <pause dur="0.3"/> as <pause dur="0.3"/> # a combination of grey <pause dur="0.2"/> and tape <pause dur="1.6"/> but <pause dur="0.2"/> if we take it a little more bit more naturally and say <pause dur="0.4"/> # grey tape and great ape <pause dur="0.4"/> without a glottal stop <pause dur="0.2"/> you can still hear the difference <pause dur="0.2"/> what are the phonetic factors <pause dur="0.8"/> well <pause dur="0.3"/> one of them <pause dur="0.4"/> # <pause dur="1.8"/> anybody want to tell me before i tell you <pause dur="2.9"/> this is delving back into phonetics from long ago <pause dur="1.6"/> go on <pause dur="0.2"/> you're nearly <pause dur="0.7"/> <vocal desc="laugh" n="su0860" iterated="n"/> <pause dur="0.4"/> <vocal desc="laugh" iterated="n"/> </u><pause dur="1.7"/> <u who="sf0862" trans="pause"> is it stress </u><pause dur="0.8"/> <u who="nm0858" trans="pause"> mm <pause dur="0.4"/> </u><u who="sf0862" trans="latching"> is it stress </u><pause dur="0.6"/> <u who="nm0858" trans="pause"> no the stress is identical <pause dur="0.8"/> grey tape great ape it's so it's # the second syllable is
stressed in both cases <pause dur="1.0"/> no it's <trunc>a</trunc> this is allophonic information this is # <pause dur="0.2"/> we've got the same phonemes <pause dur="0.3"/> but they have different allophones <pause dur="0.3"/> this one here is initial <pause dur="0.2"/> in the syllable <pause dur="0.2"/> in tape <pause dur="0.6"/> and so it's aspirated <pause dur="0.8"/> if you listen <pause dur="0.3"/> grey tape <pause dur="0.2"/> grey tape <pause dur="0.4"/> but if i <pause dur="0.2"/> take this one at the end of the word great it's unaspirated <pause dur="0.6"/> # and so it's pronounced great ape <pause dur="0.3"/> great ape <pause dur="0.4"/> great ape <pause dur="0.3"/> and there's no <distinct type="sampa">[t_h]</distinct> <pause dur="0.4"/> <distinct type="sampa">[t_h]</distinct> <pause dur="0.4"/> <distinct type="sampa">[t_h]</distinct> <pause dur="0.3"/> sound <pause dur="0.2"/> at the end <pause dur="0.2"/> of this one here <pause dur="0.3"/> so here <kinesic desc="writes on board" iterated="y" dur="1"/> we the the T is aspirated <pause dur="0.3"/> here <pause dur="0.3"/> the T is unaspirated <pause dur="1.7"/> there's another difference as well <pause dur="0.5"/> this <kinesic desc="indicates point on board" iterated="n"/><pause dur="0.3"/> word here great <pause dur="0.7"/> has a final fortis consonant <pause dur="0.6"/> what do final fortis consonants do to preceding vowels <pause dur="5.8"/> there's a lot of rust on that old phonetics <shift feature="voice" new="laugh"/>isn't there <shift feature="voice" new="normal"/> <pause dur="1.6"/> a <trunc>f</trunc> a final fortis consonant shortens <pause dur="0.2"/> the preceding vowel <pause dur="0.7"/> if you measure <pause dur="0.2"/> the <pause dur="0.4"/> <distinct type="sampa">[eI]</distinct> sound in great <pause dur="0.6"/> it is very much shorter <pause dur="0.3"/> than the <distinct type="sampa">[eI]</distinct> sound in grey <pause dur="0.4"/> listen to this <pause dur="0.4"/> grey tape <pause dur="0.5"/> grey tape <pause dur="0.6"/> and now this other one <pause dur="0.4"/> great ape <pause dur="0.4"/> great ape <pause dur="0.4"/> the <distinct type="sampa">[eI]</distinct> <pause dur="0.3"/> is <pause dur="0.3"/> shortened <pause dur="0.3"/>
by <pause dur="0.2"/> possibly fifty or sixty per cent <pause dur="0.3"/> it's a <trunc>ver</trunc> very striking <pause dur="0.2"/> shortening effect <pause dur="0.4"/> that is almost unique to English <pause dur="1.1"/> most languages in the world have a slight shortening effect from fortis consonants <pause dur="0.4"/> English has taken this <pause dur="0.2"/> very slight almost imperceptible difference <pause dur="0.3"/> and for some reason that we can only guess at <pause dur="0.2"/> has magnified this enormously <pause dur="1.4"/> so what we're seeing here <pause dur="0.3"/> is a case <pause dur="0.2"/> which i've labelled as one among these hypotheses <pause dur="0.3"/> that word boundaries are marked by allophonic information <pause dur="0.5"/> we are able to pick up <pause dur="0.5"/> from <pause dur="0.3"/> the allophonic detail <pause dur="0.6"/> in the phonemes <pause dur="0.3"/> where <pause dur="0.2"/> the word boundary must be <pause dur="0.5"/> given this information <pause dur="0.2"/> that in this case of grey tape <pause dur="0.3"/> you've got an aspirated initial <distinct type="sampa">[t_h]</distinct> <pause dur="0.6"/> you've got to put the word boundary <pause dur="0.2"/> before the <distinct type="sampa">[t_h]</distinct> <pause dur="0.7"/> given the information that you got a short <distinct type="sampa">[eI]</distinct> <pause dur="0.2"/> sound <pause dur="0.3"/> and an unaspirated T <pause dur="0.3"/> you are forced to put <pause dur="0.2"/> the word boundary <pause dur="0.4"/> after the <pause dur="0.2"/> T there <pause dur="0.7"/> # <pause dur="0.2"/> i mean when i say forced there is no law or <pause dur="0.5"/> # <pause dur="0.3"/> <trunc>th</trunc> or penalty
involved here <pause dur="0.3"/> but that's the way we work <pause dur="2.4"/> once this effect had been observed <pause dur="0.3"/> # it arose well # various <pause dur="0.2"/> various follow up studies were done <pause dur="1.6"/> the best known of these i'll give you the reference # to this was work done in the nineteen-sixties by O'Connor and Tooley <pause dur="0.6"/> where they got unsuspecting readers <pause dur="0.4"/> to read <pause dur="0.4"/> # <pause dur="0.5"/> rather weird sentences containing pairs like this <pause dur="0.3"/> so <pause dur="0.3"/> # <pause dur="0.2"/> things like # <pause dur="0.5"/> # i saw the grey tape out of the window and i saw the great ape <pause dur="0.2"/> out of the window <pause dur="0.5"/> people would read these they then went <pause dur="0.3"/> through the recordings with a pair of scissors <pause dur="0.3"/> and cut out just the pairs of words <pause dur="0.3"/> and played them to listeners <pause dur="0.3"/> and said can you say <pause dur="0.2"/> whether you're hearing <kinesic desc="indicates point on screen" iterated="n"/> this one grey tape <pause dur="0.4"/> or this one great ape <pause dur="0.7"/> and <pause dur="0.2"/> # what they found was # <pause dur="0.5"/> # fairly surprising <pause dur="0.5"/> when there were plosives involved particularly voiceless plosives <pause dur="0.3"/> people were very very successful <pause dur="0.4"/> in <pause dur="0.2"/> successfully <trunc>plac</trunc> very successful <pause dur="0.3"/> in placing <pause dur="0.2"/> the word boundary <pause dur="0.3"/> in the in the right place <pause dur="1.7"/> # <pause dur="0.3"/> there were <pause dur="0.5"/> many other examples that they
constructed with different types of consonants <pause dur="0.4"/> that were much less successful <pause dur="0.7"/> # <pause dur="0.4"/> and # they were # eventually forced to conclude <pause dur="0.2"/> that this <pause dur="0.6"/> business of allophonic marking of word boundaries <pause dur="0.3"/> only works in a limited number of cases <pause dur="0.7"/> but in the meantime they had a lot of fun inventing <pause dur="0.4"/> these pairs of words <pause dur="0.2"/> they're sometimes called <kinesic desc="writes on board" iterated="y" dur="3"/> juncture pairs <pause dur="0.6"/> because this <pause dur="0.8"/> word juncture is used to refer to the joining between two words <kinesic desc="writes on board" iterated="y" dur="2"/> so juncture pairs <pause dur="0.5"/> became a kind of <pause dur="0.3"/> # phoneticians' hobby <pause dur="0.4"/> # <pause dur="0.2"/> when i first started going to phonetics conferences in this country <pause dur="0.4"/> # you would often get people sitting around <pause dur="0.3"/> # over a beer in the bar after the <pause dur="0.2"/> papers were over for the day <pause dur="0.3"/> inventing things like this and seeing if they <pause dur="0.3"/> could get people to hear the difference <pause dur="0.3"/> and you get things like <pause dur="0.3"/> <kinesic desc="writes on board" iterated="y" dur="10"/> # <pause dur="1.6"/> to choose <pause dur="0.5"/> ink <pause dur="1.2"/> as opposed to <pause dur="0.3"/> to chew <pause dur="1.5"/> zinc <pause dur="1.1"/> and # what were some of the other crazy ones <pause dur="0.5"/> # <pause dur="0.4"/> yes <kinesic desc="writes on board" iterated="y" dur="7"/> more <pause dur="1.5"/> ice <pause dur="0.8"/> and more <pause dur="1.7"/> rice <pause dur="0.7"/> okay <pause dur="0.4"/> # <pause dur="0.2"/> lots of things like this <pause dur="0.3"/>
constantly <pause dur="0.2"/> thinking up <pause dur="0.4"/> pairs like that <pause dur="0.3"/> and then trying them out on listeners to see if they can hear the difference <pause dur="0.9"/> the answer is if people are deliberately trying <pause dur="0.2"/> to make it clear which of these they're intending <pause dur="0.4"/> then it can be made unambiguous <pause dur="0.3"/> but in normal speech <pause dur="0.2"/> you just can't hear the difference <pause dur="0.3"/> unless it's something which has got whacking great <pause dur="0.4"/> allophonic variations <pause dur="0.3"/> that help you <pause dur="0.3"/> like the aspiration <kinesic desc="indicates point on board" iterated="n"/> here <pause dur="0.3"/> and the prefortis <pause dur="0.3"/> shortening <pause dur="0.5"/> in that particular case <pause dur="0.6"/> so <pause dur="0.2"/> looking back at these possibilities <pause dur="0.3"/> what i'd say is that <pause dur="0.3"/> okay <pause dur="0.3"/> there are certain circumstances in which we get allophonic information at word boundaries <pause dur="0.2"/> which helps us to discriminate <pause dur="0.9"/> but not all the time <pause dur="0.3"/> and not in all languages <pause dur="2.1"/>
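<!-- Editorial note (not part of the transcript): this is a minimal Python sketch of the decision rule described above for the juncture pair grey tape / great ape. An aspirated [t_h] places the word boundary before the T. An unaspirated T after a shortened [eI] places the boundary after the T. For illustration only, the sketch assumes the listener has already reduced the signal to two yes/no cues. The function name and its parameters are hypothetical, and real aspiration and vowel duration are gradient measurements. As the lecture points out, cues like these decide the boundary only in a limited number of cases, so the sketch returns None when the cues disagree.

```python
from typing import Optional


def place_boundary(t_aspirated: bool, ei_shortened: bool) -> Optional[str]:
    """Choose a segmentation of the phoneme string /greIteIp/ from two cues.

    Hypothetical rule set (illustration only), where ei_shortened refers
    to the first [eI], the one before the T:
      * aspirated T, full-length [eI]: T is word-initial, "grey tape"
      * unaspirated T, shortened [eI]: T is word-final, "great ape"
      * any other combination: no decision (None)
    """
    if t_aspirated and not ei_shortened:
        return "grey tape"  # boundary before the T
    if ei_shortened and not t_aspirated:
        return "great ape"  # boundary after the T
    return None  # conflicting cues: fall back on other information
```

For example, place_boundary(True, False) gives "grey tape" and place_boundary(False, True) gives "great ape". Because the two cues are kept separate, it is easy to see that the rule says nothing whenever they disagree. -->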

secondly there are prosodic factors <pause dur="0.2"/> # <pause dur="0.2"/> in <trunc>th</trunc> terms of overall shapes of words <pause dur="0.3"/> which help <pause dur="0.4"/> in some languages <pause dur="0.3"/> but that help is not very great in English <pause dur="0.8"/> and in fact until <pause dur="0.3"/> fairly recently it was believed <pause dur="0.3"/> that there was no help at all <pause dur="0.6"/> in English <pause dur="0.6"/> for <pause dur="0.3"/> # word identification <pause dur="0.3"/> based on overall prosodic shape <pause dur="2.3"/> but i mentioned briefly <pause dur="0.3"/> # a couple of weeks ago <pause dur="0.3"/> that research by Anne Cutler <pause dur="0.6"/> # <pause dur="1.1"/> <trunc>o</trunc> <trunc>o</trunc> by <trunc>look</trunc> who has looked at a very very large number of English words <pause dur="0.3"/> research by Anne Cutler and her colleagues has shown <pause dur="0.2"/> that statistically <pause dur="1.5"/> it is more likely than not <pause dur="0.5"/> that an English word of <pause dur="0.2"/> several syllables <pause dur="0.3"/> will begin <pause dur="0.2"/> with a stressed syllable <pause dur="0.8"/> statistically <pause dur="0.4"/> initial stressed syllables <pause dur="0.2"/> are the most likely in English <pause dur="3.4"/> and if you think about it you can come up with <pause dur="0.2"/> hundreds of words just off the top of your head <pause dur="0.3"/> which don't have <pause dur="0.2"/> the initial syllable stressed <pause dur="5.5"/> but even so <pause dur="0.3"/> the <pause dur="0.2"/> # <pause dur="0.6"/> figure is something like sixty-five per cent to seventy per cent in a given <pause dur="0.3"/> text <pause dur="0.3"/> the <pause dur="0.9"/>
the # <pause dur="0.9"/> the number of <pause dur="0.2"/> initial stressed words of <trunc>mor</trunc> that is of polysyllabic words <pause dur="0.4"/> the number of initial stressed words in <pause dur="0.3"/> English <pause dur="0.3"/> on average is around sixty-five per cent <pause dur="0.4"/> to even sometimes as much as seventy per cent <pause dur="0.6"/> # <pause dur="0.3"/> initial stressed <pause dur="1.4"/> and so Cutler's theory is that although it's only a weak tendency compared with <pause dur="0.3"/> languages like French and Polish and so on <pause dur="1.0"/> English speakers do <pause dur="0.3"/> to some extent rely on this as a guideline <pause dur="0.4"/> if you hear a stressed syllable <pause dur="0.5"/> your brain says <pause dur="0.2"/> this is probably the beginning of a word <pause dur="1.5"/> and of course <pause dur="0.3"/> # following on from that <pause dur="0.2"/> the <trunc>sy</trunc> the syllable before that therefore was the last syllable of the preceding word <pause dur="1.8"/> and <pause dur="0.2"/> we are at times proved wrong on that <pause dur="1.1"/> but if we stick to that simple rule <pause dur="0.3"/> we are correct <pause dur="0.2"/> more often than not <pause dur="1.0"/> # i have my doubts <pause dur="0.6"/> still about this <pause dur="0.4"/> # but # this is something which she and her <pause dur="0.3"/> coworkers <pause dur="0.3"/> # have held to for a very long time <pause dur="10.3"/><event desc="takes off transparency" iterated="n"/> let me just <event desc="turns off overhead projector" iterated="n"/> # <pause dur="0.6"/> for a moment look at that last possibility the one that we don't make use of # <pause dur="0.2"/> that we don't need to make use of <pause dur="0.2"/> phonetic information at all <pause dur="1.6"/> and i i said that this appeals to
computational linguists because if you're designing a computer to recognize words probably <pause dur="0.3"/> # you would find it a tedious <pause dur="0.3"/> # <pause dur="0.5"/> # <pause dur="0.3"/> superimposition <pause dur="0.3"/> to have prosodic information to worry about <pause dur="2.3"/> you have to assume that all the words that you know <pause dur="0.3"/> are coded in some kind of dictionary in your head <pause dur="0.8"/> that is we all have a mental lexicon <pause dur="0.5"/> opinions vary about the size of it <pause dur="0.5"/> # partly depends how <pause dur="0.4"/> # how highly educated you are <pause dur="0.3"/> and how good you are at remembering words <pause dur="0.3"/> but it can easily be somewhere around say eighty-thousand words <pause dur="0.3"/> that's quite a big dictionary <pause dur="1.0"/> we have to assume <pause dur="0.4"/> that that dictionary is coded <pause dur="0.2"/> in some kind of phonological <pause dur="0.2"/> form <pause dur="1.1"/> # that is it's <pause dur="0.3"/> <trunc>a</trunc> although we know as literate people we know the spelling of the words that we've got stored in our heads <pause dur="0.5"/> what's more important is that we know <pause dur="0.3"/> the sounds <pause dur="0.2"/> that make up <pause dur="0.3"/> the words that we have in our head <pause dur="0.4"/> you've got the word <pause dur="0.2"/> cat <pause dur="0.3"/> <trunc>e</trunc> everybody <pause dur="0.3"/> in this room has got the word cat <pause dur="0.3"/> in their mental vocabulary <pause dur="0.3"/> and that is stored <pause dur="0.3"/> in a
number of ways including the fact that it contains a <distinct type="sampa">[k]</distinct> <pause dur="0.4"/> and an <distinct type="sampa">[ae]</distinct> <pause dur="0.3"/> and a <distinct type="sampa">[t]</distinct> <pause dur="4.7"/> a fairly typical <pause dur="0.2"/> computational operation in computational linguistics <pause dur="0.4"/> is to have <pause dur="2.7"/> <kinesic desc="writes on board" iterated="y" dur="3"/> two strings of <pause dur="1.3"/> well let's say we had to have one a string of phonemes <pause dur="0.3"/> which are the input <pause dur="0.3"/> it might be the sounds which are coming in through your ears <pause dur="0.9"/> and then in your mental lexicon <pause dur="0.6"/><kinesic desc="writes on board" iterated="y" dur="5"/> you've got <pause dur="0.5"/> lots of items which are made up <pause dur="0.2"/> of <pause dur="0.8"/> phonemes where each of these little dots represents a phoneme <pause dur="0.8"/> and your job <pause dur="0.3"/> is to map <pause dur="0.6"/> the one <pause dur="0.3"/> onto the other what you've got to say is <pause dur="0.3"/> well <pause dur="0.2"/> for example <pause dur="0.3"/> # <pause dur="0.4"/> i <pause dur="0.2"/> can identify a particular phoneme here <pause dur="0.5"/> let me look through the words that i know <pause dur="0.2"/> and see if i can find any <pause dur="0.3"/> which <pause dur="0.3"/> # begin with that phoneme and you find one <pause dur="0.3"/> # supposing that's <distinct type="sampa">[k]</distinct> <pause dur="0.6"/> <kinesic desc="writes on board" iterated="y" dur="1"/> # <pause dur="0.3"/> then you might look for a word beginning with a <distinct type="sampa">[k]</distinct> <pause dur="0.2"/> like that <pause dur="0.3"/> and then you look for another match <pause dur="0.4"/><kinesic desc="writes on board" iterated="y" dur="1"/> following that <pause dur="0.2"/> and another match following that <pause dur="0.3"/> and you see <pause dur="0.2"/> if <pause dur="0.3"/> any of these patterns 
of sounds <pause dur="0.2"/> match up with
something in your mental <pause dur="0.3"/> lexicon <pause dur="0.8"/> and if it does <pause dur="0.4"/> you mark that down and say i think that's a whole word <pause dur="0.3"/> let's now move on <pause dur="0.3"/> and try the next one so you think well if that's a word <pause dur="0.3"/> then this should be the beginning <pause dur="0.2"/> this next dot along <pause dur="0.2"/> should be the beginning of the next word <pause dur="0.3"/> let's see if that matches any words in my mental vocabulary <pause dur="3.0"/> if you were using a a <trunc>b</trunc> a big computer and you had unlimited computer time <pause dur="0.2"/> # <pause dur="0.2"/> doing that <trunc>k</trunc> that kind of manipulation is fairly straightforward <pause dur="0.4"/> and all you need is some fairly clever mechanism which will keep cycling back every time you fail <pause dur="0.9"/> now if we go back to the example of responsibility <pause dur="0.5"/> # <pause dur="0.4"/><event desc="puts on transparency" iterated="n"/> if <pause dur="1.1"/> you had wrongly identified <pause dur="1.0"/><kinesic desc="turns on overhead projector showing transparency" iterated="n"/> # <pause dur="1.1"/> responsibility <pause dur="0.6"/> # as re <pause dur="0.5"/> and <pause dur="0.2"/> sponsor <pause dur="0.4"/> and bility <pause dur="0.6"/> sponsor would match up <pause dur="0.3"/> to one of the words in your mental lexicon <pause dur="1.0"/> but re <pause dur="0.2"/> wouldn't <pause dur="0.5"/> and <pause dur="0.3"/> bility wouldn't <pause dur="0.2"/> because re is not an English word <pause dur="0.2"/> and bility is not an English word at least not as far as i know <pause dur="0.6"/> and therefore <pause dur="0.2"/> that hypothesis <pause dur="0.2"/>
would have to be trashed <pause dur="0.4"/> you would have to <trunc>s</trunc> <pause dur="0.3"/> you would simply have to say <pause dur="0.3"/> that was a non-starter i will go back to the beginning <pause dur="0.4"/> # <pause dur="0.3"/> to the last place where i was fairly sure <pause dur="0.3"/> and <pause dur="0.2"/> start over <pause dur="0.2"/> and see if i can make a different interpretation and you might this time say <pause dur="0.3"/> maybe it's response and <pause dur="0.3"/> ability <pause dur="0.2"/> let's see if that works <pause dur="0.4"/> but then <pause dur="0.2"/> later on the syntactic information that you had <pause dur="0.3"/> would rule that out as a reasonable hypothesis <pause dur="0.2"/> so again you would trash that and say <pause dur="0.2"/> okay perhaps it's the whole word <pause dur="0.3"/> responsibility <pause dur="0.4"/> and you would match that up <pause dur="0.2"/> yep that matches up with the word in the mental lexicon <pause dur="0.3"/> and it fits the syntax <pause dur="0.2"/> and it fits the meaning <pause dur="0.3"/> that's it <pause dur="0.2"/> i'll <pause dur="0.3"/> # i'll go for that hypothesis <pause dur="1.4"/> so <pause dur="0.6"/> # <pause dur="0.5"/> a a computational linguist would like this idea of <pause dur="0.2"/> shuffling <pause dur="0.3"/> the possibilities <pause dur="0.2"/> matching patterns <pause dur="0.2"/> from the input that is the sounds that you hear <pause dur="0.4"/> to <pause dur="0.2"/> stored patterns in your brain <pause dur="0.3"/> which are the words <pause dur="0.3"/> # that you actually have stored <pause dur="0.7"/> now our brains are stupendously fast <pause dur="0.3"/> at finding
words <pause dur="0.3"/> but even so <pause dur="0.3"/> the idea <pause dur="0.2"/> that we would leave it <pause dur="0.5"/> to our brains just to work on a a a a phoneme <pause dur="0.2"/> pattern matching <pause dur="0.7"/> ignoring all this wonderfully rich information about prosody <pause dur="0.5"/> and the allophonic information <pause dur="0.3"/> is just crazy <pause dur="0.4"/> the brain <pause dur="0.2"/> would not simply ignore <pause dur="0.2"/> such a valuable source of information <pause dur="0.8"/> so it seems to me that i i would want to reject the idea <pause dur="0.3"/> that we don't use phonetic and phonological information <pause dur="0.3"/> in deciding on word boundaries <pause dur="4.8"/><event desc="takes off transparency" iterated="n"/> okay well <pause dur="0.9"/> what i want to do to finish up with is just describe an experiment <pause dur="0.6"/> # that i was working on last year <pause dur="0.4"/> # on this particular question of embedded words <pause dur="2.5"/> now embedded words are difficult things to work with <pause dur="0.5"/> because <pause dur="0.6"/> the only way we can really <pause dur="0.2"/> test people's ability to hear them <pause dur="0.5"/> is to cut them out of their context <pause dur="0.4"/> and when you cut a word out of context <pause dur="0.4"/> it suddenly stops <pause dur="0.4"/> sounding <pause dur="0.5"/> # <pause dur="0.3"/> recognizable and familiar <pause dur="0.9"/> # <pause dur="0.3"/> i've done lots of this and i've got a couple of examples on tape <pause dur="0.2"/> there's one that i use a lot <pause dur="0.3"/> this is an example where you
know <pause dur="0.7"/> # what i've got here is is <pause dur="0.2"/> a large number of <pause dur="0.2"/> extracted versions of one particular word <pause dur="0.3"/> which is hundred <pause dur="0.3"/> just pulled out of one of our <pause dur="0.3"/> big # computer corpora <pause dur="0.2"/> automatically <pause dur="0.4"/> by one of our research computers <pause dur="0.4"/> and since you know what the word is <pause dur="0.2"/> you can recognize the word every single time </u> <pause dur="19.6"/><event desc="starts audio" n="nm0858" iterated="n"/>
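<!-- Editorial note (not part of the transcript): the pattern-matching procedure described earlier in this utterance is essentially backtracking search. A phoneme string is matched against words in a mental lexicon, and a hypothesis is abandoned when the leftover material matches nothing. The Python sketch below is one minimal way to express it, assuming a hypothetical four-word lexicon with simplified SAMPA transcriptions. It uses purely lexical matching, so, as in the lecture, it cannot itself choose between response + ability and responsibility; that choice is left to syntax and meaning.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: each word is stored as a tuple of SAMPA symbols.
# The transcriptions are simplified assumptions, for illustration only.
LEXICON: Dict[str, Tuple[str, ...]] = {
    "response": ("r", "I", "s", "p", "Q", "n", "s"),
    "ability": ("@", "b", "I", "l", "@", "t", "i"),
    "responsibility": ("r", "I", "s", "p", "Q", "n", "s",
                       "@", "b", "I", "l", "@", "t", "i"),
    "sponsor": ("s", "p", "Q", "n", "s", "@"),
}


def segment(phonemes: Tuple[str, ...], start: int = 0) -> List[List[str]]:
    """Return every way of parsing phonemes[start:] into lexicon words.

    At each position, try every word whose phonemes match the input there.
    If the rest of the input cannot be parsed, that hypothesis yields no
    parses and is dropped, and the next candidate word is tried instead
    (backtracking).
    """
    if start == len(phonemes):
        return [[]]  # one complete parse: no input left to account for
    parses: List[List[str]] = []
    for word, form in LEXICON.items():
        end = start + len(form)
        if phonemes[start:end] == form:  # this word matches here
            for rest in segment(phonemes, end):  # try to parse the remainder
                parses.append([word] + rest)
    return parses
```

Calling segment(LEXICON["responsibility"]) returns [["response", "ability"], ["responsibility"]]. A parse beginning with re never arises, because no entry in the toy lexicon matches the opening r I on its own. Both remaining parses survive the purely lexical check. Enumerating every parse can grow exponentially with input length, so memoizing segment on its start position would be the obvious refinement. -->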

<kinesic desc="audio plays" iterated="y" dur="13"/> <event desc="stops audio" n="nm0858" iterated="n"/> <u who="nm0858" trans="pause"> it goes on for hours <pause dur="0.2"/> but this is just the you know we just set the computer loose on going through <pause dur="0.4"/> hours of speech looking for the word hundred <pause dur="0.3"/> cutting the word out and playing it out onto the tape <pause dur="0.6"/> but what we did for the <pause dur="0.4"/> perception experiment on embedded words <pause dur="0.4"/> was to take out words <pause dur="0.3"/> which were <pause dur="0.2"/> # quite identifiable to us as experimenters <pause dur="0.3"/> but when <pause dur="0.3"/> cut out without any context <pause dur="0.3"/> and without any information <pause dur="0.2"/> and presented to naive listeners <pause dur="0.2"/> were <trunc>al</trunc> <pause dur="0.2"/> very often unrecognizable <pause dur="0.5"/> now i haven't got the text for this here <pause dur="0.3"/> but what you'll hear is that you can recognize some words <pause dur="0.5"/> # these these are actually extracted from the same <pause dur="0.3"/> corpus of recordings <pause dur="0.3"/> as
those <trunc>wor</trunc> those <pause dur="0.2"/> words hundred that you just heard <pause dur="0.3"/> it's from a corpus called MARSEC corpus <pause dur="0.4"/> that # we've been working with for many years <pause dur="0.4"/> in in my group </u><pause dur="2.0"/><event desc="starts audio" n="nm0858" iterated="n"/><kinesic desc="audio plays" iterated="y" dur="50"/> <u who="nm0858" trans="pause"> this is the kind of thing that people had to listen to for our experiment to <pause dur="0.3"/> try and identify the words <pause dur="4.5"/> each one's said twice <pause dur="14.3"/> that's fairly easy <pause dur="1.3"/> that's city <pause dur="6.0"/> that's just <event desc="stops audio" iterated="n" n="nm0858"/> <pause dur="15.7"/> okay <pause dur="1.5"/> let me <pause dur="0.2"/> just explain what this # <pause dur="0.9"/><event desc="puts on transparency" iterated="n"/> experiment was trying to <kinesic desc="turns on overhead projector showing transparency" iterated="n"/> do <pause dur="1.1"/> # <pause dur="0.7"/> we started off by using this MARSEC <pause dur="0.2"/> database <pause dur="1.1"/> and we went through three stages first of all we had to select <pause dur="0.3"/> data <pause dur="0.8"/> and what were doing was looking for pairs of words <pause dur="0.8"/> in <pause dur="0.5"/> # the <pause dur="0.2"/> data that we had recorded <pause dur="0.5"/> where we could match <pause dur="0.2"/> from the same speaker <pause dur="0.7"/> a full word <pause dur="1.0"/> like it might be response <pause dur="1.3"/> and <pause dur="0.3"/> something which seemed to be the same <pause dur="0.4"/> which existed as an embedded word <pause dur="2.4"/> so that we had pairs of words <pause dur="0.3"/> # <pause dur="0.2"/> although we separated them
out in the tapes <pause dur="0.5"/> so that sometimes people were listening <pause dur="0.2"/> # i mean we heard the word just on that tape <pause dur="0.4"/> in some cases people heard the word just <pause dur="0.5"/> out of a sentence that <trunc>sa</trunc> said things like <pause dur="0.2"/> i was just going down the road <pause dur="0.3"/> or <pause dur="0.3"/> he was a just man <pause dur="0.7"/> but in some other cases <pause dur="0.2"/> from the same speaker <pause dur="0.4"/> we had the word just <pause dur="0.4"/> from <pause dur="0.4"/> # <pause dur="0.3"/> a a word like adjustment <pause dur="0.5"/> okay <pause dur="0.3"/> that's an embedded word <pause dur="0.4"/> the word just <pause dur="0.3"/> sits inside the word adjustment <pause dur="0.3"/> and we cut it out <pause dur="0.5"/> and the idea was to find <pause dur="0.3"/> by testing listeners' perception <pause dur="0.4"/> whether <pause dur="0.3"/> our <pause dur="0.2"/> listeners were more successful at hearing <pause dur="0.3"/> the embedded words <pause dur="0.3"/> or <pause dur="0.3"/> the <pause dur="0.4"/> # what we call the real words the words which genuinely had <pause dur="0.3"/> a word boundary at either <pause dur="0.2"/> side <pause dur="3.3"/> # and so we went through and we <pause dur="1.0"/> # this is work # done jointly with the <pause dur="0.6"/> # <pause dur="0.2"/> with Anne Cutler's group in the Max Planck Institute for Psycholinguistics in <pause dur="0.3"/> in Nijmegen in Holland <pause dur="0.7"/> # and we spent <pause dur="0.2"/> very very large amount of time <pause dur="0.4"/> going through extracting these pairs of examples <pause dur="0.3"/> and then recording them in random
order <pause dur="0.5"/> for for listening tests <pause dur="4.1"/><event desc="takes off transparency" iterated="n"/> now the first thing when we'd done all this <pause dur="0.3"/> was we the experimenters listened to tapes to see if we could hear the difference <pause dur="0.5"/> and we could of course we'd been working on this for years <pause dur="0.2"/> so it's not surprising <pause dur="0.3"/> that we could tell the difference between real words and embedded words <pause dur="0.7"/> # <pause dur="0.2"/> we actually did <trunc>t</trunc> a <pause dur="0.2"/> test on this <pause dur="0.4"/> # <kinesic desc="puts on transparency" iterated="n"/> as # experts <pause dur="0.4"/> # these are the <pause dur="0.4"/> statistical results and the main thing is that the <pause dur="0.4"/> # <pause dur="0.3"/> # probability value is point-zero-zero-seven-six <pause dur="0.3"/> which means that the <pause dur="0.3"/> difference between real embedded <trunc>wor</trunc> and embedded words <pause dur="0.3"/> in terms of us recognizing which was which <pause dur="0.3"/> was highly significant <pause dur="0.5"/> so <pause dur="0.3"/> we were able as the experts running the experiment <pause dur="0.3"/> we were able to distinguish between real and embedded words <pause dur="1.8"/><event desc="takes off transparency" iterated="n"/> there's nothing very surprising about that <pause dur="0.6"/> then we played <pause dur="0.4"/> these words to naive listeners <pause dur="0.4"/> who had had no previous experience of working <pause dur="0.3"/> with this kind of problem <pause dur="0.6"/> and <vocal desc="cough" iterated="n"/> we worked out scores <pause dur="0.5"/> for <pause dur="0.2"/> how many words <pause dur="0.4"/> they got <pause dur="0.5"/> correct <pause dur="6.1"/><kinesic desc="puts on transparency" iterated="n"/> # <pause dur="0.6"/> the
i won't i won't it would take too long to explain what these # success <pause dur="0.2"/> scores # <pause dur="0.4"/> were actually calculated on <pause dur="0.3"/> but we get <pause dur="0.2"/> a much higher success rate <pause dur="0.2"/> here six-point-one-five <pause dur="0.3"/> on real words <pause dur="0.2"/> compared with four-point-three-three on the embedded words <pause dur="0.3"/> and that difference there is very highly significant with a probability value <pause dur="0.3"/> of point-zero-zero-zero-four <pause dur="0.9"/> so <pause dur="0.3"/> there was no doubt at all that our <pause dur="0.3"/> listeners <pause dur="0.5"/> did better <pause dur="0.5"/> on <pause dur="0.2"/> real words <pause dur="0.2"/> rather than embedded words <pause dur="5.7"/> remember that these words were presented completely out of context <pause dur="0.3"/> and therefore our listeners had nothing to go on <pause dur="0.3"/> except what they could hear from the tape <pause dur="0.8"/> and the only conclusion you can make from that <pause dur="0.3"/> is <pause dur="0.2"/> that there is something there phonetically <pause dur="0.4"/> that enables you <pause dur="0.3"/> to <pause dur="0.2"/> tell <pause dur="0.6"/> what is a word and what is part of a word <pause dur="0.6"/> to enable you to distinguish between <pause dur="0.4"/> bits of words <pause dur="0.3"/> and <pause dur="0.2"/> whole words <pause dur="8.9"/> so we went back to the tapes and we spent a lot of time listening to them and i spent <pause dur="0.2"/> # <pause dur="0.4"/> quite a lot of time <pause dur="0.3"/> # over in Nijmegen <pause dur="0.2"/> working
through every single word <pause dur="0.4"/> doing a very detailed phonetic examination of each word <pause dur="0.5"/> and the thing that was coming out more and more clearly was <pause dur="0.2"/> that the embedded words <pause dur="0.2"/> were shorter <pause dur="0.3"/> than the corresponding real words <pause dur="2.4"/> if we look at that in <kinesic desc="changes transparency" iterated="y" dur="1"/> graphical form <pause dur="0.7"/> # <pause dur="0.3"/> what we find <pause dur="0.3"/> here <trunc>the</trunc> these these are box plots <pause dur="0.3"/> that's the scale of duration on the left hand side going from <pause dur="0.3"/> a hundred to five-hundred milliseconds <pause dur="0.4"/> # <pause dur="0.2"/> # this box covers most of the data <pause dur="0.4"/> and in the case of embedded words <pause dur="0.5"/> the <pause dur="0.3"/> # duration was <pause dur="0.4"/> <trunc>f</trunc> rather shorter <pause dur="0.4"/> than <pause dur="0.3"/> the duration of the real words it's it's not a big difference but it's enough to be <pause dur="0.3"/> statistically significant <pause dur="0.4"/> embedded words tend to be a bit shorter <pause dur="0.6"/> than <pause dur="0.2"/> the real word <pause dur="0.6"/> probably # <pause dur="0.2"/> the difference is <pause dur="0.3"/> # enough to be over the threshold of our <pause dur="0.4"/> # <pause dur="0.2"/> ability to perceive differences <pause dur="0.4"/> in durations of <pause dur="0.2"/> words and syllables <pause dur="1.4"/> there was just one final question to answer <pause dur="0.3"/> is it that just the entire body of embedded words is shorter <pause dur="0.4"/> than <pause dur="0.2"/> the whole <pause dur="0.3"/> collection of real words <pause dur="0.2"/> or
is this a genuine relationship that each individual pair of words <pause dur="0.4"/> will exhibit <pause dur="0.3"/> a greater duration for the real word <pause dur="0.3"/> and a shorter duration <pause dur="0.4"/> for <pause dur="0.3"/> the embedded word <pause dur="0.7"/><kinesic desc="changes transparency" iterated="y" dur="1"/> # so # this is # <pause dur="0.8"/> if this this is a rather messy graph but <pause dur="0.3"/> it just shows the relationship <pause dur="0.3"/> between the durations of embedded words <pause dur="0.3"/> and the durations of <pause dur="0.2"/> real words <pause dur="0.2"/> and you can see that centre line there <pause dur="0.4"/> # represents a trend <pause dur="0.5"/> # <pause dur="0.3"/> which is that the <pause dur="0.6"/> # <pause dur="0.3"/> the longer <pause dur="0.2"/> a real word is <pause dur="0.3"/> the longer <pause dur="0.2"/> an embedded word is that is they are <pause dur="0.4"/> closely related <pause dur="0.5"/> however <pause dur="0.3"/> for any given value of a real word like three-hundred here <pause dur="0.4"/> the corresponding duration of an embedded word is shorter <pause dur="0.9"/> so in # the case of all virtually all the words in our data <pause dur="0.3"/> and <pause dur="0.5"/> i mean i had to admit if you look at some of these dots they're way off that <pause dur="0.2"/> centre line <pause dur="0.3"/> there's a lot of variation <pause dur="0.4"/> but the overall trend is that <pause dur="0.3"/> for any given pair of words the embedded word <pause dur="0.2"/> will be shorter <pause dur="0.3"/> than the <pause dur="0.2"/> real word <pause dur="0.4"/> and that must be giving us the information that we need <pause dur="0.3"/> to
identify whether we're hearing a part of a word <pause dur="0.4"/> or <pause dur="0.3"/> the word as a whole <pause dur="1.9"/> that work is still <event desc="takes off transparency" iterated="n"/> going on i'm still writing it up <pause dur="0.4"/> # <pause dur="0.2"/> but recently i had to give a talk on this at a conference <pause dur="0.4"/> and # <pause dur="0.4"/> as conference organizers do they asked me to write it up <pause dur="0.4"/> to go # in a collection of papers <pause dur="0.4"/> # and since it's a very <trunc>sh</trunc> # a short paper reporting on work in progress <pause dur="0.4"/> # what i'd like to do is give you each a copy <pause dur="0.3"/> so that you can go over this at <pause dur="0.2"/> # at at more <pause dur="0.2"/> leisure <pause dur="0.6"/> so there i was # <pause dur="0.2"/> quarter of an hour before the lecture began <pause dur="0.3"/> ready to go on the photocopier <pause dur="0.3"/> when i looked at it and realized that it was an early draft which didn't have the diagrams and the statistics in <pause dur="0.5"/> # when i went back i realized it's on my computer at home not on my computer at work <pause dur="0.5"/> so i'm afraid you don't get it this morning <pause dur="0.4"/> but i will put copies in <gap reason="name" extent="1 word"/>'s office <pause dur="0.4"/> and those will be available tomorrow onwards <pause dur="0.4"/> so if you'd like a copy of the <pause dur="0.4"/> most recent paper i've written based on this research <pause dur="0.3"/> # <pause dur="0.4"/> # # and the
bibliography that goes with it <pause dur="0.3"/> # there will be enough for one each <pause dur="0.6"/> # on the other hand # if you're not interested just leave it there and i'll give it to somebody else <pause dur="1.2"/> that gets us to the end of that <pause dur="0.2"/> and also to the end of <pause dur="0.4"/> the study of <pause dur="0.4"/> the relationship between temporal factors and speech perception <pause dur="0.5"/> and <pause dur="0.3"/> i hope that the <pause dur="0.4"/> the general impression that you've got on this <pause dur="0.4"/> is that we are not simple <pause dur="0.4"/> phoneme crunchers when it comes to <pause dur="0.3"/> perceiving speech <pause dur="0.3"/> we are not simply taking in a stream of phonemes <pause dur="0.3"/> looking them up in a mental dictionary <pause dur="0.3"/> and <pause dur="0.5"/> churning out a kind of transcript <pause dur="0.5"/> what we're doing is <pause dur="0.2"/> at the same time monitoring <pause dur="0.3"/> a very rich <pause dur="0.6"/> # <pause dur="0.5"/> stream of prosodic information <pause dur="0.4"/> and in some cases
also of allophonic variation <pause dur="0.5"/> but it's the prosodic side i really want to emphasize <pause dur="0.2"/> there is so much going on in the prosody of spoken language <pause dur="0.6"/> it's giving us so much information about <pause dur="0.2"/> how to divide the speech up into units <pause dur="0.3"/> and how to interpret it <pause dur="0.4"/> and it just has to be <pause dur="0.4"/> # something of great importance <pause dur="0.4"/> # it's something which we only understand in a very dim <pause dur="0.4"/> and partial way at the moment but <pause dur="0.3"/> a lot more research will be <pause dur="0.3"/> # going on <pause dur="0.3"/> in future years <pause dur="0.2"/> and we should discover more and more about it <pause dur="0.3"/> and ultimately we can teach computers that recognize speech <pause dur="0.3"/> how to make intelligent use of that information <pause dur="1.3"/> is that okay are there any <pause dur="0.3"/> questions <pause dur="4.0"/> okay <pause dur="1.8"/> right then