CS918 Natural Language Processing
CS918 15 CATS (7.5 ECTS) Term 1
Availability
MSc Computer Science, MSc Data Analytics
NOTE - THIS MODULE IS NOT AVAILABLE TO STUDENTS IN ANY YEAR OF AN UNDERGRADUATE INTEGRATED MASTERS DEGREE
Prerequisites
None, but it would be helpful to take this module in conjunction with CS910 and/or CS909.
Academic Aims
The module aims to give students a fundamental understanding of automated methods for processing linguistic data in textual form (natural language processing) drawn from different sources (newswire, the web, social media, academic publications), and of the challenges these sources present. The module will also give students the skills to analyse textual data and familiarise them with state-of-the-art tools and applications.
Learning Outcomes
By the end of the module the student should be able to:
- Demonstrate knowledge of the fundamental principles of natural language processing.
- Demonstrate understanding of methods and algorithms used to process different types of textual data as well as the challenges involved.
- Demonstrate understanding of the state of the art in the core areas of natural language processing as well as related applications.
- Show a working knowledge of state-of-the-art tools available for analysing linguistic data.
- Demonstrate the computational skills to create NLP pipelines using existing NLP libraries, retrain models, and extend existing NLP tools.
Content
- Regular expressions, word tokenization, stemming, sentence segmentation
- N-grams and language models
- Part-of-speech tagging
- Hidden Markov models and maximum entropy models
- Semantics: lexical semantics, distributional semantics, word sense disambiguation and vector space models
- Spelling correction
- Text classification
- Sentiment analysis
- Information extraction: named entity recognition, relation extraction
- Information retrieval
- Syntactic parsing
- Semantic parsing
- Question answering and summarisation
- Text processing in social media
Books
- Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, USA.
- Daniel Jurafsky and James H. Martin. 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics. 2nd edition. Prentice Hall.
- Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python. O'Reilly Media, Inc.
- Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA.
- Marie-Francine Moens and Juanzi Li. 2014. Mining User Generated Content and Its Applications. In Mining User Generated Content, pages 3–17. Social Media and Social Computing. Chapman and Hall/CRC.
Assessment
Two-hour examination (70%), coursework (30%)
Teaching
20 one-hour lectures, 10 one-hour seminars and 5 two-hour workshops