
BAWE (British Academic Written English) and BAWE Plus Collections

Overview of BAWE

The British Academic Written English (BAWE) corpus was created through a project entitled 'An investigation of genres of assessed writing in British Higher Education', which ran from 2004 to 2007. This project was funded by the Economic and Social Research Council (Project number RES-000-23-0800) and was a collaboration between the Universities of Warwick, Reading and Oxford Brookes.

The BAWE corpus contains 2761 pieces of proficient assessed student writing, ranging in length from about 500 words to about 5000 words. Holdings are fairly evenly distributed across four broad disciplinary areas (Arts and Humanities, Social Sciences, Life Sciences and Physical Sciences) and across four levels of study (three undergraduate years and taught masters level). Thirty-five disciplines are represented.

The assignments have been annotated using a system devised in accordance with the TEI guidelines. The header for each file includes factual information such as gender and year of birth, and also contains some research findings from the initial team, such as genre family. A DTD file, tei_bawe.dtd, must be kept in the same folder as the corpus files. The holdings are described in an Excel spreadsheet, 'BAWE.xls'. The transcription and mark-up conventions are described in the BAWE manual, which is in PDF format.

The corpus is available free of charge to non-commercial researchers who agree to the conditions of use and who register with the Oxford Text Archive. The BAWE corpus can be accessed through the Oxford Text Archive as resource number 2539. It includes text files, a spreadsheet with contextual information, and a corpus manual.

One of the original Principal Investigators, Professor Hilary Nesi of Coventry University, manages a useful website about BAWE and has a database of research articles based on the corpus - please contact her to add your research to the list.

For more information about the BAWE corpus holdings at Warwick, please email

Overview of BAWE Plus

BAWE Plus is a collection of resources for research into academic written English in the UK in the twenty-first century. In addition to the BAWE corpus, it includes the following main components:

Supplementary BAWE data

BAWE PDF files
Applied Linguistics at Warwick holds PDF files of the assignments which make up the BAWE corpus. These may be useful to researchers who wish to examine assignments in their original layout.
The BAWE Pilot Corpus
A pilot for the British Academic Written English (BAWE) corpus was created in 2001, with support from the University of Warwick Teaching Development Fund. The pilot corpus contains about one million words of text, in the form of 500 student assignments ranging from 1,000 to 5,000 words. The collection is held by Applied Linguistics at Warwick and is not part of the BAWE corpus submitted to the OTA.
Details of the pilot corpus are reported in: Nesi, H., Sharpling, G. & Ganobcsik-Williams, L. (2004) "Student papers across the curriculum: Designing and developing a corpus of British student writing". Computers and Composition Volume 21, Issue 4, pp 401-503.
Tutor interviews
As part of the 2004–2007 BAWE project, tutors from contributing departments were interviewed in order to find out more about departmental practice. Notes and transcripts from these interviews are held in the department of Applied Linguistics.


The WELT Pilot Corpus

This is a collection of written answers at grade B and above from the former Warwick English Language Test (no longer in use).


Other Academic English Resources

Applied Linguistics at Warwick also holds a collection (a corpus and associated resources) in British Academic Spoken English. See BASE Plus.


We welcome proposals from potential doctoral students and other researchers interested in working with these resources.

To contact us, please email