Big Data Technology & Visualisation
Introduction
The management of an organisation's data lifecycle is an essential activity in modern business. In recent years, the advent of cloud computing and the emergence of big data have fundamentally challenged and changed these processes. This module will explore these changes and the challenges and opportunities they bring, and will give students practical exposure to the tools involved.
The full data management lifecycle will be covered in this module, from data acquisition, storage, cleaning and engineering, and analysis through to data visualisation. These techniques will be implemented using current tools available in modern cloud environments. These include a combination of relational and NoSQL data stores, populated by data extracted from source APIs and open data sources. These data stores will be connected to dashboards and visualisations that communicate the value and insights of the data via web-based applications. Participants will engage in a final capstone project that applies these methods to a real-world setting.
Objectives
Upon successful completion participants will be able to:
- Demonstrate a comprehensive understanding of the key differences between Big Data technologies and analysis methods and their traditional counterparts
- Evaluate real-world scenarios and determine appropriate database solutions (traditional and NoSQL)
- Demonstrate a comprehensive understanding of cloud data architectures and the operational risks associated with them, and develop appropriate mitigation strategies
- Demonstrate a comprehensive understanding of the core concepts of visual communication and data visualisation
- Practically implement data pipelines and processing in a cloud setting
Syllabus
- Introduction to AWS
- AWS Glue
- AWS Step Functions and AWS Lambda
- Working with APIs
- Web crawlers
- Open data
- RDBMS and NoSQL databases
- Building a data store
- Querying and processing data from a database
- Hadoop and MapReduce
- Apache Spark
- Lambda architecture
- Analysis software
- Operationalisation
- Visualisation software
- Interactive data visualisation
- Dashboards
Assessment
- Big Data Architecture Presentation (30%)
- 4,000-word Post-Module Assignment (70%)
Duration
2 weeks, including 21 hours of lectures, 9 hours of seminars and 15 hours of supervised practical classes