The ability to process, engineer and manage the flow of data from source systems into algorithms is an essential component of data science and artificial intelligence systems. In many cases, the efficiency and efficacy of these pipelines matter as much to the success of these systems as the algorithms themselves. Using the industry-standard Python language, this module aims to give students the skills and competencies needed to write efficient and reliable code and to employ best practices in data management, algorithm development and computational statistics.
The module also introduces students to advanced statistical and data engineering techniques made possible by innovations in computing and modern processing power. These include:
- dimension reduction
- feature engineering
- natural language processing
- high performance computing
- analysis of algorithms and computational complexity.
Upon successful completion, participants will be able to:
- Write original, non-trivial Python applications and algorithms.
- Demonstrate a sound understanding of modern data engineering approaches and apply them to a variety of datasets.
- Automate dimension reduction and clustering techniques on a variety of complex datasets and critically evaluate the results.
- Evaluate and optimise algorithms for better computational performance.
Programming with Python
- Programming fundamentals
- Introduction to Python
- Data structures
Pandas and data management
- Data cleaning
- Joining and merging datasets
- Feature engineering
Computational complexity and analysis of algorithms
- Big O notation
- Best practices for programming
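As an illustration of the Big O notation topic above, the following minimal sketch counts the steps taken by a linear scan, which is O(n), and a binary search on sorted data, which is O(log n). The function names are hypothetical and chosen only for this example; they are not taken from the module materials.

```python
def linear_search_steps(items, target):
    """Count the comparisons made by a left-to-right scan: O(n)."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(items, target):
    """Count the halving steps on an already sorted list: O(log n)."""
    lo, hi, steps = 0, len(items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            break
        if items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return steps


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        data = list(range(n))   # sorted input, required by binary search
        target = n - 1          # last element: worst case for the linear scan
        print(n, linear_search_steps(data, target), binary_search_steps(data, target))
```

Growing the input a thousandfold multiplies the linear step count by a thousand, while the binary search step count roughly doubles.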
Natural language processing
- Working with text data
- Sentiment analysis
- Topic models
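To give a flavour of working with text data and sentiment analysis, here is a minimal sketch of a lexicon-based sentiment score using only the standard library. The word lists `POSITIVE` and `NEGATIVE` and both function names are hypothetical, made up for this illustration; practical sentiment analysis would typically rely on a much larger lexicon or a trained model.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return (positive word count) minus (negative word count)."""
    counts = Counter(tokenize(text))
    positive = sum(counts[word] for word in POSITIVE)
    negative = sum(counts[word] for word in NEGATIVE)
    return positive - negative
```

A score above zero suggests mostly positive wording and a score below zero mostly negative wording. This simple count ignores negation ("not good") and context.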
Clustering and dimension reduction
- Dimension reduction
Assessment
- Python programming test (20%)
- Data engineering presentation (10%)
- 4,000-word post-module assignment (70%)
Duration: 1 week, including 15 hours of lectures and 15 hours of practical classes