BAWE in Sketch Engine
The BAWE corpus can be accessed through the corpus analysis interface Sketch Engine (http://www.sketchengine.co.uk/open/). Sketch Engine lets users view concordance lines, build complex queries and collect word frequency data (including word lists), among other functions.
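Sketch Engine produces concordances and word lists itself. The sketch below only illustrates, on a toy tokenised sentence, what a keyword-in-context (KWIC) concordance line and a frequency word list contain. The function names `kwic` and `word_list` and the sample sentence are hypothetical and are not part of Sketch Engine.

```python
from collections import Counter

def kwic(tokens, node, width=3):
    """Return keyword-in-context lines: up to `width` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

def word_list(tokens):
    """Frequency list of alphabetic word forms, case-folded, most frequent first."""
    return Counter(t.lower() for t in tokens if t.isalpha()).most_common()

# Hypothetical sample text, already split into tokens
tokens = "The essay argues that the data support the claim .".split()
for line in kwic(tokens, "the"):
    print(line)
print(word_list(tokens)[0])  # → ('the', 3)
```

Each concordance line centres one occurrence of the search term (shown in brackets) with its surrounding context, which is the view Sketch Engine presents for every hit.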
The version of the corpus in Sketch Engine was prepared by Paul Thompson and Alois Heuboeck at the University of Reading (http://www.reading.ac.uk/internal/appling/bawe/sketch_engine_bawe.htm). The files were tagged by Paul Rayson at Lancaster University, using Wmatrix, for part of speech (POS; CLAWS tagset, see http://ucrel.lancs.ac.uk/claws7tags.html) and for semantic category (see http://ucrel.lancs.ac.uk/usas/). Because some of the BAWE markup has been modified for this version, see the Sketch Engine website (http://www.sketchengine.co.uk/) for the query options it supports.
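Each word therefore carries both a CLAWS POS tag and a USAS semantic tag. The sketch below assumes a hypothetical tab-separated layout (word, CLAWS7 tag, USAS tag, one token per line); the real column layout in the Sketch Engine version may differ, and the semantic tags in the sample are illustrative only. It counts common nouns, whose CLAWS7 tags begin with `NN` (for example NN1 singular, NN2 plural). The names `SAMPLE`, `parse` and `pos_counts` are hypothetical.

```python
# Hypothetical sample: word <TAB> CLAWS7 POS tag <TAB> USAS semantic tag.
# Tags are illustrative; the actual layout in Sketch Engine may differ.
SAMPLE = """The\tAT\tZ5
essay\tNN1\tQ1.2
cites\tVVZ\tQ2.2
two\tMC\tN1
studies\tNN2\tX2.4
.\t.\tPUNC"""

def parse(text):
    """Split each non-empty line into a (word, pos, sem) triple."""
    return [tuple(line.split("\t")) for line in text.splitlines() if line.strip()]

def pos_counts(tokens, prefix):
    """Count tokens whose CLAWS tag starts with `prefix` ('NN' = common nouns)."""
    return sum(1 for _, pos, _ in tokens if pos.startswith(prefix))

tokens = parse(SAMPLE)
print(pos_counts(tokens, "NN"))  # → 2
```

Matching on a tag prefix rather than a full tag groups related CLAWS categories (here NN1 and NN2) in one query.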
BAWE contains 6,506,995 running words. Sketch Engine, however, reports a total of 8,336,262 tokens, because its token counts include punctuation.
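The gap between the two figures is 8,336,262 - 6,506,995 = 1,829,267 tokens, which the count above attributes to punctuation. The sketch below illustrates on a toy token sequence why a count that includes punctuation tokens exceeds a running-word count. Treating a token as punctuation when it consists only of punctuation characters is an assumption made for illustration, not the rule used by BAWE or Sketch Engine; the helper names are hypothetical.

```python
import string

def is_punct(token):
    """Assumption: a token made up only of punctuation characters is punctuation."""
    return all(ch in string.punctuation for ch in token)

def counts(tokens):
    """Return (all tokens, running words) for a tokenised text."""
    words = [t for t in tokens if not is_punct(t)]
    return len(tokens), len(words)

tokens = ["However", ",", "the", "results", "(", "Table", "2", ")", "differ", "."]
total, words = counts(tokens)
print(total, words, total - words)  # → 10 6 4

# The BAWE figures quoted above
print(8_336_262 - 6_506_995)  # → 1829267
```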
Other Search Interfaces
A prototype interface that allows filtered searching of the BAWE corpus files is available.