BASE Files

The BASE text files (transcribed and tagged) are available from this site. However, the video and audio recordings are part of the BASE Plus collection. For enquiries about access to the BASE Plus collection, please e-mail appling at warwick dot ac dot uk

TXT FILES

Holdings were distributed across four broad disciplinary groups, each represented by 40 lectures and 10 seminars. These groups are:

Arts and Humanities transcripts
Life and Medical Sciences transcripts
Physical Sciences transcripts
Social Sciences transcripts

XML Files

BASE recordings were transcribed and tagged using a system devised in accordance with the TEI Guidelines. The marked-up transcripts of the BASE corpus are also available as XML files, in zipped folders. To download the data, click on one of the following links; each lets you open or save a zipped folder containing the XML files of all lectures and seminars for one of the disciplinary groups in the corpus. The BASE DTD is included in each folder and must always be kept in the same folder as any XML file you view.

File names are made up of five letters and three digits: the first two letters indicate the disciplinary group, the next three indicate whether the file is a transcript of a lecture (lct) or a seminar (sem), and the three digits are a unique identifier.

ah [Arts and Humanities] XML files
ls [Life and Medical Sciences] XML files
ps [Physical Sciences] XML files
ss [Social Sciences] XML files
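
The file-naming convention described above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the BASE distribution: the function name `parse_base_filename`, the optional `.xml` extension, and the sample name `ahlct001` are assumptions made for the example, and real identifiers in the corpus may differ.

```python
import re
from typing import NamedTuple

# Group codes and event types as listed on this page.
GROUPS = {
    "ah": "Arts and Humanities",
    "ls": "Life and Medical Sciences",
    "ps": "Physical Sciences",
    "ss": "Social Sciences",
}
EVENT_TYPES = {"lct": "lecture", "sem": "seminar"}

# Two letters (group), three letters (lct or sem), three digits (identifier).
# Assumption: files may carry an ".xml" extension, which is accepted but optional.
NAME_PATTERN = re.compile(r"^(ah|ls|ps|ss)(lct|sem)(\d{3})(?:\.xml)?$")


class BaseFileName(NamedTuple):
    group: str
    event_type: str
    identifier: str


def parse_base_filename(name: str) -> BaseFileName:
    """Split a BASE-style file name into group, event type and identifier."""
    match = NAME_PATTERN.match(name.lower())
    if match is None:
        raise ValueError(f"not a BASE-style file name: {name!r}")
    code, kind, digits = match.groups()
    return BaseFileName(GROUPS[code], EVENT_TYPES[kind], digits)


# Hypothetical file name, used only to illustrate the pattern.
print(parse_base_filename("ahlct001.xml"))
```

A name that does not fit the pattern raises `ValueError`, which makes it easy to spot stray files (for example, the DTD itself) when looping over an unzipped folder.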

Freely available documentation


BASE corpus analysis interface:

BASE-in-Sketch-Engine can be used as a query tool for analysis of the original BASE lecture transcripts.

The lecture portion of the BASE corpus can be accessed through the corpus analysis interface Sketch Engine. All 160 lectures are included, 40 for each general disciplinary domain. The interface allows the user to view concordance lines, form complex queries, collect word frequency data (including word lists) and more. The service requires a subscription; for details, visit the Sketch Engine website. The service can be obtained initially on a 30-day trial subscription with full access to all resources.


TERMS OF USE OF THE ORIGINAL BASE CORPUS AND HOW TO CITE IT:

The British Academic Spoken English (BASE) corpus is available to non-commercial researchers who agree to the following conditions:

  1. Corpus holdings should not be reproduced in full for a wider audience/readership (i.e. for publication or for teaching purposes), although researchers are free to quote short passages of text of up to 100 running words, up to a total of 200 running words from any given text.
  2. No part of the corpus holdings should be reproduced in teaching materials intended for publication (in print or via the internet).
  3. The corpus developers should be informed of all presentations and publications arising from analysis of the corpus.

Researchers must acknowledge their use of the BASE corpus project using the following form of words:

The transcriptions used in this study come from the British Academic Spoken English (BASE) corpus project. The corpus was developed at the Universities of Warwick and Reading under the directorship of Hilary Nesi and Paul Thompson. Corpus development was assisted by funding from BALEAP, EURALEX, the British Academy and the Arts and Humanities Research Council.

Companies that wish to use BASE for research and/or commercial purposes should contact the BASE Plus team: