Proposal Form - Noortje Corpus Classification

Proposal form

Last updated: 18th July 2018

Updates

Date            Update
18th July 2018  No response from Noortje
19th July 2018  Noortje responded; proposal edited

Project name

Lexical Tool (Corpus classification)

Date started

4 July 2018

Criteria

Create a tool with a web-based interface. The tool should identify correlations, measured as co-occurrence, between categories belonging to two category types within a textual corpus. The category types and the categories are defined by the user, who provides indicator terms and phrases for each category ("the lexicon") according to a template. Indicator terms and phrases are derived from the corpus, or from a random sample of it.

A template for listing categories and queries is made available via the web interface, together with instructions for use (e.g. on the use of quotation marks, phrases and root words). Output should be human-readable and should support both iterative improvement of the lexicon and network-based visualisation. Code for the tool should be exportable to different contexts.
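The template's matching rules are not fixed yet. As a sketch of how a query term could be turned into a search pattern, the function below assumes three hypothetical conventions: a term in double quotes is an exact phrase, a trailing `*` marks a root word, and anything else matches a whole word only. Matching is case-insensitive. `compile_query` is an illustrative name, not existing project code.

```python
import re


def compile_query(term: str) -> re.Pattern:
    """Turn one lexicon query term into a case-insensitive regex.

    Hypothetical template conventions (assumed, not fixed by the proposal):
      "climate change"  -> exact phrase, bounded by word boundaries
      migra*            -> root word: any word starting with "migra"
      refugee           -> whole word only
    """
    term = term.strip()
    if len(term) >= 2 and term.startswith('"') and term.endswith('"'):
        # Quoted phrase: escape each word and allow any run of whitespace between them.
        words = term[1:-1].split()
        body = r"\s+".join(re.escape(w) for w in words)
        return re.compile(rf"\b{body}\b", re.IGNORECASE)
    if term.endswith("*"):
        # Root word: the stem must start a word; any word characters may follow it.
        return re.compile(rf"\b{re.escape(term[:-1])}\w*", re.IGNORECASE)
    # Plain term: match the whole word only.
    return re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
```

With these rules, `migra*` matches both "migration" and "migrants", while `migrant` does not match "migrants". Whatever conventions are finally chosen, they would need to be spelled out in the template instructions.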

Terms

Category type

Proposal

Write Python functions to process the input and generate the output
Write a Flask front end to run the analysis via a web interface
Source code available via gitlab.cim.warwick.ac.uk
Hosted temporarily on an internal server
ITS hosting for the Flask app, if possible
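A minimal sketch of the processing step, using only the Python standard library. It assumes the two CSV formats listed under Input. Each text (row) is tagged with every category whose query term appears as a whole word in the named search column, and pairs of categories from different category types found in the same text are counted. `categorise` and `process` are hypothetical names; phrase and root-word syntax is not handled here. A Flask route could call `process` on the two uploaded files, but that wiring is omitted.

```python
import csv
import io
import re
from collections import Counter
from itertools import combinations


def categorise(row: dict, lexicon: list) -> set:
    """Return the (category type, category) pairs whose query terms occur in one row."""
    found = set()
    for entry in lexicon:
        text = row.get(entry["search-column"]) or ""
        # Plain case-insensitive whole-word match of the query term.
        pattern = rf"\b{re.escape(entry['query-term'])}\b"
        if re.search(pattern, text, re.IGNORECASE):
            found.add((entry["Category-type"], entry["category"]))
    return found


def process(data_csv: str, lexicon_csv: str) -> str:
    """Tag every text with categories and return a co-occurrence table as CSV text."""
    lexicon = list(csv.DictReader(io.StringIO(lexicon_csv)))
    counts = Counter()
    for row in csv.DictReader(io.StringIO(data_csv)):
        # Sorting fixes the order within each pair, so (A, B) and (B, A) are one key.
        for a, b in combinations(sorted(categorise(row, lexicon)), 2):
            if a[0] != b[0]:  # keep only pairs spanning two different category types
                counts[(a, b)] += 1
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Category-type", "category", "Category-type", "category",
                     "co-occurrence-frequency"])
    for (a, b), n in sorted(counts.items()):
        writer.writerow([*a, *b, n])
    return out.getvalue()
```

Here the unit of co-occurrence is one row of the input data; counting per sentence or paragraph instead would require splitting the texts first.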

Timeline
Functions to process and summarise the data written within one month
Web interface written within two months

Type

Software

Languages

Python - Flask, standard library
HTML, CSS, JavaScript

Input

CSV file - input data. Headers: a, b

CSV file - criteria description (the lexicon). Headers: Category-type, category, query-term, search-column
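Because the lexicon is written by hand, it may be worth checking it against the input data before running the analysis. The sketch below is an assumption about what such a check could cover: the four lexicon headers are present, no query term is empty, every `search-column` value names a header of the input data, and exactly two category types are defined (following the Criteria, which correlate two types). `validate_inputs` is a hypothetical name.

```python
import csv
import io

# Required lexicon headers, as listed in the Input section.
LEXICON_HEADERS = ["Category-type", "category", "query-term", "search-column"]


def validate_inputs(data_csv: str, lexicon_csv: str) -> list:
    """Return human-readable problems; an empty list means the files look usable."""
    problems = []
    data_headers = next(csv.reader(io.StringIO(data_csv)), [])
    lexicon = csv.DictReader(io.StringIO(lexicon_csv))
    missing = [h for h in LEXICON_HEADERS if h not in (lexicon.fieldnames or [])]
    if missing:
        return [f"lexicon is missing headers: {', '.join(missing)}"]
    seen_types = set()
    # Row numbers count the header as row 1.
    for row_no, row in enumerate(lexicon, start=2):
        if not (row["query-term"] or "").strip():
            problems.append(f"lexicon row {row_no}: empty query-term")
        if row["search-column"] not in data_headers:
            problems.append(
                f"lexicon row {row_no}: unknown search-column {row['search-column']!r}")
        seen_types.add(row["Category-type"])
    # Assumption: the analysis correlates exactly two category types.
    if len(seen_types) != 2:
        problems.append(f"expected two category types, found {len(seen_types)}")
    return problems
```

Returning a list of messages rather than stopping at the first error would let the web interface show every problem at once, which suits iterative editing of the lexicon.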

Output

CSV file - co-occurrence table. Headers: Category-type, category, Category-type, category, co-occurrence-frequency

CSV file - word frequency table. Headers: Category-type, category, query-term, frequency

GDF file - category co-occurrence network. Nodes: categories, grouped by category type. Edges: co-occurrences between categories
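GDF is a plain-text graph format (originating in GUESS) that Gephi can import. It consists of a `nodedef>` header followed by node rows, then an `edgedef>` header followed by edge rows. The sketch below writes one node per category and one weighted edge per co-occurring pair. It assumes co-occurrence counts keyed by `((type, category), (type, category))` pairs, generated node ids (`n0`, `n1`, ...), and a `category_type` node attribute chosen for illustration. It does no escaping, so category names must not contain commas or quotes. The column declarations should be checked against Gephi's GDF documentation before relying on them.

```python
def to_gdf(cooccurrence: dict) -> str:
    """Return GDF text for a category co-occurrence network.

    cooccurrence maps ((type_a, category_a), (type_b, category_b)) -> count.
    No escaping is done: category names must not contain commas or quotes.
    """
    # Every category that appears in at least one pair becomes a node.
    nodes = sorted({node for pair in cooccurrence for node in pair})
    ids = {node: f"n{i}" for i, node in enumerate(nodes)}
    lines = ["nodedef>name VARCHAR,label VARCHAR,category_type VARCHAR"]
    for ctype, category in nodes:
        lines.append(f"{ids[(ctype, category)]},{category},{ctype}")
    # One weighted edge per co-occurring pair; the weight is the co-occurrence count.
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR,weight DOUBLE")
    for (a, b), count in sorted(cooccurrence.items()):
        lines.append(f"{ids[a]},{ids[b]},{count}")
    return "\n".join(lines) + "\n"
```

Categories that never co-occur with a category of the other type do not appear as nodes in this sketch; adding them would mean passing the full category list in separately.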

Notes

Raw word frequency, to show the absolute number of matched words per category (for visualisation). Possibly also text occurrence, i.e. the number of texts in which each category occurs.
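One way to produce both the word frequency table and per-category raw counts is sketched below. It assumes the texts are already loaded as strings and the lexicon as `(category_type, category, query_term)` tuples, and it uses plain whole-word, case-insensitive matching. The third counter is one reading of "text occurrence": the number of texts in which each category matches at least once. `frequencies` is a hypothetical name.

```python
import re
from collections import Counter


def frequencies(texts, lexicon):
    """Count query-term hits in a list of texts.

    lexicon: iterable of (category_type, category, query_term) tuples.
    Returns three Counters:
      per_term:     (type, category, term) -> number of matches
      per_category: (type, category)       -> total matches of its terms
      texts_with:   (type, category)       -> number of texts with at least one match
    """
    patterns = [
        (ctype, category, term, re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE))
        for ctype, category, term in lexicon
    ]
    per_term, per_category, texts_with = Counter(), Counter(), Counter()
    for text in texts:
        matched = set()
        for ctype, category, term, pattern in patterns:
            hits = len(pattern.findall(text))
            if hits:
                per_term[(ctype, category, term)] += hits
                per_category[(ctype, category)] += hits
                matched.add((ctype, category))
        # Count each category at most once per text.
        for key in matched:
            texts_with[key] += 1
    return per_term, per_category, texts_with
```

Rows of the word frequency table would come from `per_term` (terms with no matches are absent and would need adding as zero rows). Per-category totals are sums of term hits, so overlapping terms in one category, e.g. `climate` and `climate change`, count the same word more than once.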