CS918 Natural Language Processing

CS918 15 CATS (7.5 ECTS) Term 1

Availability

MSc Computer Science, MSc Data Analytics

NOTE - THIS MODULE IS NOT AVAILABLE TO STUDENTS IN ANY YEAR OF AN UNDERGRADUATE INTEGRATED MASTERS DEGREE

Prerequisites

None, but it would be helpful to take this module in conjunction with CS910 and/or CS909.

Academic Aims

The module aims to equip students with a fundamental understanding of automated methods for processing textual linguistic data (natural language processing) from different sources (newswire, the web, social media, academic publications), and of the challenges associated with these sources. It also provides students with the skills to analyse textual data and familiarises them with state-of-the-art tools and applications.

Learning Outcomes

By the end of the module the student should be able to:

Demonstrate knowledge of the fundamental principles of natural language processing.
Demonstrate understanding of the methods and algorithms used to process different types of textual data, as well as the challenges involved.
Demonstrate understanding of the state of the art in the core areas of natural language processing, as well as related applications.
Show a working knowledge of state-of-the-art tools available for analysing linguistic data.
Demonstrate the computational skills to build NLP pipelines using existing NLP libraries, retrain models and extend existing NLP tools.
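As an illustration of the kind of processing pipeline referred to above, here is a minimal Python sketch that chains sentence segmentation, word tokenization and stemming using only the standard library. The regular expressions, the suffix list and the function names (segment, tokenize, crude_stem, pipeline) are hypothetical choices made for this example; in practice an existing NLP library and an established stemmer (for example, the Porter stemmer) would replace these toy components.

```python
import re

# Hypothetical toy components for illustration only; real pipelines would use
# trained components from an existing NLP library instead of these regexes.
SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+")  # whitespace after ., ! or ?
TOKEN = re.compile(r"\w+|[^\w\s]")            # word runs or single punctuation marks
SUFFIXES = ("ing", "ed", "es", "s")           # naive suffix list (assumption)


def segment(text):
    """Split text into sentences at whitespace that follows ., ! or ?."""
    return [s for s in SENT_BOUNDARY.split(text.strip()) if s]


def tokenize(sentence):
    """Return word tokens and standalone punctuation tokens."""
    return TOKEN.findall(sentence)


def crude_stem(token):
    """Lowercase, then strip the first matching suffix if at least 3 chars remain."""
    lower = token.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower


def pipeline(text):
    """Chain the stages: text -> sentences -> tokens -> stems."""
    return [[crude_stem(t) for t in tokenize(s)] for s in segment(text)]


print(pipeline("The models parsed sentences. Tagging works!"))
# [['the', 'model', 'pars', 'sentenc', '.'], ['tagg', 'work', '!']]
```

The output shows why naive suffix stripping is only a starting point: "tagging" becomes "tagg" rather than "tag", which a rule-based stemmer with more conditions would handle.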

Content

Regular expressions, word tokenization, stemming, sentence segmentation
N-grams and language models
Part-of-speech tagging
Hidden Markov models and maximum entropy models
Semantics: lexical semantics, distributional semantics, word sense disambiguation and vector space models
Spelling correction
Text classification
Sentiment analysis
Information extraction: named entity recognition, relation extraction
Information retrieval
Syntactic parsing
Semantic parsing
Question answering and summarisation
Text processing in social media
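The n-gram language models topic above can be illustrated with a bigram model estimated by maximum likelihood, P(w_i | w_{i-1}) = count(w_{i-1} w_i) / count(w_{i-1}). The Python sketch below is a minimal, unsmoothed version; the <s> and </s> sentence-boundary markers, lowercasing, whitespace tokenization and the three-sentence toy corpus are assumptions made for this example.

```python
from collections import Counter

# Minimal sketch of an unsmoothed bigram language model (MLE estimates).
# Boundary markers, lowercasing and whitespace tokenization are assumptions.


def pad(sentence):
    """Lowercase, split on whitespace and add sentence-boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]


def train_bigram(sentences):
    """Count bigrams and their histories (every token that precedes another)."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = pad(sentence)
        histories.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return histories, bigrams


def bigram_prob(histories, bigrams, prev, word):
    """P(word | prev) = count(prev word) / count(prev); 0.0 for unseen history."""
    if histories[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / histories[prev]


def sentence_prob(histories, bigrams, sentence):
    """Probability of a whole sentence as a product of bigram probabilities."""
    tokens = pad(sentence)
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= bigram_prob(histories, bigrams, prev, word)
    return p


corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
hist, bi = train_bigram(corpus)
print(bigram_prob(hist, bi, "<s>", "i"))   # 2 of 3 sentences start with "i"
print(sentence_prob(hist, bi, "I am Sam"))  # 2/3 * 2/3 * 1/2 * 1/2 = 1/9
```

Because there is no smoothing, any bigram not seen in training receives probability zero, so a single unseen word pair makes a whole sentence impossible; smoothing methods such as add-one (Laplace) smoothing exist to address exactly this.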

Books

Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, USA.
Daniel Jurafsky and James H. Martin. 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics. 2nd edition. Prentice Hall.
Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python. O’Reilly Media, Inc.
Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA.
Marie-Francine Moens and Juanzi Li. 2014. “Mining User Generated Content and Its Applications.” In Mining User Generated Content, 3–17. Social Media and Social Computing. Chapman and Hall/CRC.

Assessment

Two-hour examination (70%), coursework (30%)

Teaching

20 one-hour lectures, 10 one-hour seminars and 5 two-hour workshops