CS918 15 CATS (7.5 ECTS) Term 1
MSc Computer Science, MSc Data Analytics
NOTE: THIS MODULE IS NOT AVAILABLE TO STUDENTS IN ANY YEAR OF AN UNDERGRADUATE INTEGRATED MASTER'S DEGREE.
Prerequisites: none, but it would be helpful to take this module in conjunction with CS910 and/or CS909.
The module aims to give students a fundamental understanding of automated methods for processing linguistic data in textual form (natural language processing) from a range of sources (newswire, the web, social media, academic publications), and of the challenges these methods face. It also develops students' skills in analysing textual data and familiarises them with state-of-the-art tools and applications.
By the end of the module the student should be able to:
- Demonstrate knowledge of the fundamental principles of natural language processing.
- Demonstrate understanding of the methods and algorithms used to process different types of textual data, and of the challenges involved.
- Demonstrate understanding of the state of the art in the core areas of natural language processing and in related applications.
- Show a working knowledge of state-of-the-art tools for analysing linguistic data.
- Demonstrate the computational skills needed to build NLP pipelines using existing NLP libraries, retrain models, and extend existing NLP tools.
Syllabus:
- Regular expressions, word tokenisation, stemming, sentence segmentation
- N-grams and language models
- Hidden Markov models and maximum entropy models
- Semantics: lexical semantics, distributional semantics, word sense disambiguation and vector space models
- Information extraction: named entity recognition, relation extraction
- Question answering and summarisation
- Text processing in social media
Reading list:
- Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, USA.
- Daniel Jurafsky and James H. Martin. 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics. 2nd edition. Prentice Hall.
- Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python. O'Reilly Media, Inc.
- Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA.
- Marie-Francine Moens and Juanzi Li. 2014. Mining User Generated Content and Its Applications. In Mining User Generated Content, pages 3–17. Social Media and Social Computing. Chapman and Hall/CRC.
Assessment: two-hour examination (70%), coursework (30%).
Teaching: 20 one-hour lectures, 10 one-hour seminars and 5 two-hour workshops.