CS918 Natural Language Processing

CS918 15 CATS (7.5 ECTS) Term 1

Availability

MSc Computer Science, MSc Data Analytics

NOTE - THIS MODULE IS NOT AVAILABLE TO STUDENTS IN ANY YEAR OF AN UNDERGRADUATE INTEGRATED MASTERS DEGREE

Prerequisites

None, but it would be helpful to take this module in conjunction with CS910 and/or CS909.

Academic Aims

The aim of the module is to equip students with a fundamental understanding of automated methods for processing linguistic data in textual form (natural language processing) from different sources (newswire, web, social media, academic publications), and of the associated challenges. The module will also provide students with the skills to analyse textual data and familiarise them with state-of-the-art tools and applications.

Learning Outcomes

By the end of the module the student should be able to:

  • Demonstrate knowledge of the fundamental principles of natural language processing.
  • Demonstrate understanding of methods and algorithms used to process different types of textual data as well as the challenges involved.
  • Demonstrate understanding of the state of the art in the core areas of Natural Language Processing as well as related applications.
  • Show a working knowledge of state-of-the-art tools available for analysing linguistic data.
  • Demonstrate computational skills to create NLP pipelines using existing NLP libraries, retrain models, and extend existing NLP tools.


Content

  • Regular expressions, word tokenization, stemming, sentence segmentation

  • N-grams and language models

  • Part-of-speech tagging

  • Hidden Markov models and maximum entropy models

  • Semantics: lexical semantics, distributional semantics, word sense disambiguation and vector space models

  • Spelling correction

  • Text classification

  • Sentiment analysis

  • Information extraction: named entity recognition, relation extraction

  • Information retrieval

  • Syntactic parsing

  • Semantic parsing

  • Question answering and summarisation

  • Text processing in social media

Books

  • Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, USA.
  • Daniel Jurafsky and James H. Martin. 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd edition. Prentice-Hall.
  • Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python. O’Reilly Media, Inc.
  • Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA.
  • Marie-Francine Moens and Juanzi Li. 2014. “Mining User Generated Content and Its Applications.” In Mining User Generated Content, 3–17. Social Media and Social Computing. Chapman and Hall/CRC.

Assessment

Two-hour examination (70%), coursework (30%)

Teaching

20 one-hour lectures, 10 one-hour seminars, and 5 two-hour workshops


Jalote P, Fault Tolerance in Distributed Systems, Prentice Hall, 1994.
Lynch N, Distributed Algorithms, Morgan Kaufmann, 1996.
Gouda M, Elements of Network Protocol Design, John Wiley, 1998.
  • Background: development and scope of social informatics; practical goals.
  • Understanding individual behaviour: perception, memory and action.
  • Modelling human interaction with digital systems.
  • Design methodologies and notations.
  • Techniques and technologies: dialogue styles, information visualisation.
  • Designer-user relations: iteration, prototyping.
  • Evaluation: formative and summative; performance and learnability.
  • Mobile computing and devices: novel interfaces; ubiquitous computing.
  • Organisational factors: understanding the workplace; resistance; dependability.
  • Innovation processes at scale: social shaping of IT, actor-network theory, co-production.