BAWE - Frequently Asked Questions (FAQs)

What are the main aims of the project?

Our main aims are to collect, encode and analyze a large sample of proficient student writing from a wide variety of academic disciplines at three different British universities, namely Warwick, Reading and Oxford Brookes. This will give a snapshot of the kinds of writing task being carried out at British universities in the early years of the 21st century, which will be a valuable resource for researchers and educators in several disciplines. See also our Project Overview page.

How can I get hold of these texts?

The corpus will eventually be deposited with the Oxford Text Archive and the ESRC's UK Data Archive. However, the full corpus is not expected to be ready for distribution until 2007 (the Year of the Boar). In the meantime, bona fide researchers, including postgraduate researchers, should contact the Principal Investigator at for details of availability. It may be possible to make subsets of the corpus available to non-commercial researchers before 2007.

Is the BAWE Corpus intended to help reduce plagiarism?

No. The texts collected will be submitted to the JISC plagiarism-detection service to ensure that they are original works, but the project is not directly concerned with detecting plagiarism. Rather, the resulting corpus will allow us, and other scholars, to investigate the linguistic and organizational characteristics of successful student writing. Knowing more about the types of written work being assessed in British universities should eventually feed into better teaching practices, and thus indirectly reduce the motivation for plagiarism, but that is not our main purpose.

Will the BAWE corpus be used in grading future assignments?

Not directly. It is not our intention to develop automatic or semi-automatic grading systems. Automated assessment of free-form natural-language assignments is a long way off. Ultimately the knowledge derived from analyzing the BAWE corpus may have a part to play in developing such systems, but it is by no means a prime concern of this project.

How large will the final corpus be?

In our pilot study we collected approximately 500 texts, totalling about 1.5 million words, or an average of roughly 3,000 words per text. The final corpus will contain more than 3,000 texts, ranging in length from 500 to 10,000 words. Assuming a similar average length, the full corpus is expected to comprise almost 9 million words.