CS347-15 Fault Tolerant Systems
The module concentrates on the principles and technologies that can be applied in the design, development and measurement of fault tolerance under varied assumptions. You will have the opportunity to analyse, design and write software based on state-of-the-art approaches in dependable systems.
The aim of the module is to provide you with knowledge of advanced issues and concepts in the design, implementation and evaluation of fault-tolerant systems.
This is an indicative module outline only, intended to give a sense of the topics that may be covered. Actual sessions may differ.
General: Fault, error, failure, fault transformation process. Implications of coverage on dependability, specifications, methods to achieve dependability.
Middleware: Protocols for synchronous distributed systems, including leader election, consensus, clock synchronisation, Byzantine agreement and FDIR (fault detection, isolation and recovery).
Protocols and abstractions for asynchronous distributed systems, including logical and vector clocks, broadcast (best-effort, unordered reliable, ordered reliable), failure detectors, global predicate detection in fault-free and faulty systems.
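As an illustration of one of the asynchronous abstractions listed above, the following is a minimal vector clock sketch. It assumes a fixed, known set of n processes numbered 0..n-1 and uses the usual textbook update rules (increment on every local, send and receive event; element-wise maximum on receipt). The names `VectorClock`, `happened_before` and `concurrent` are hypothetical and are not taken from the module materials.

```python
from typing import List


class VectorClock:
    """Vector clock for one process in a fixed group of n processes (ids 0..n-1)."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.clock: List[int] = [0] * n

    def tick(self) -> List[int]:
        """Local event: increment this process's own entry and return a copy."""
        self.clock[self.pid] += 1
        return list(self.clock)

    def send(self) -> List[int]:
        """A send is a local event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_ts: List[int]) -> List[int]:
        """Merge the message timestamp (element-wise max), then count the receive."""
        self.clock = [max(a, b) for a, b in zip(self.clock, msg_ts)]
        return self.tick()


def happened_before(a: List[int], b: List[int]) -> bool:
    """a -> b iff every entry of a is <= the matching entry of b and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a: List[int], b: List[int]) -> bool:
    """Two events are concurrent if neither happened before the other."""
    return not happened_before(a, b) and not happened_before(b, a)
```

For example, if process 0 sends a message stamped `[1, 0]` to process 1, whose clock is `[0, 1]`, the receive is stamped `[1, 2]`; a later local event at process 0, stamped `[2, 0]`, is concurrent with that receive. Unlike Lamport's scalar logical clocks, vector timestamps allow concurrency to be detected rather than only ordering events consistently with causality.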
By the end of the module, students should be able to:
- General: Understand dependability attributes, threats and means. Understand the differences between fault, error and failure. Discuss the process by which a fault eventually causes a system failure. Understand the link between a fault model and the corresponding dependability mechanisms. Understand terms such as fail-safe, fail-operational and fail-stop, and concepts such as fault trees, FMEA and FMECA.
- HW/System: Calculate the reliability of a system. Use tools for reliability modelling. Design dependable HW.
- Middleware: Understand critical functions such as clock synchronisation, consensus and FDIR protocols. Understand Byzantine failures and their impact on system complexity. Understand the fundamentals of asynchronous message-passing distributed systems.
- SW: Understand the various methods for SW fault tolerance, such as N-version programming (NVP), recovery blocks and run-time checks, as well as the problem of predicate detection.
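To give a flavour of the "calculate reliability of a system" outcome, the sketch below implements the standard combinational formulas, assuming statistically independent component failures, a constant failure rate for R(t) = exp(-λt), and (for the k-out-of-n case) identical components with a perfect voter. The function names are hypothetical; `math.prod` and `math.comb` require Python 3.8 or later.

```python
import math
from typing import Iterable


def reliability_exp(failure_rate: float, t: float) -> float:
    """R(t) = exp(-lambda * t), assuming a constant failure rate lambda."""
    return math.exp(-failure_rate * t)


def series(rs: Iterable[float]) -> float:
    """Series system: every component must work, so R = prod(R_i)."""
    return math.prod(rs)


def parallel(rs: Iterable[float]) -> float:
    """Parallel system: at least one component must work, so R = 1 - prod(1 - R_i)."""
    return 1.0 - math.prod(1.0 - r for r in rs)


def k_of_n(k: int, n: int, r: float) -> float:
    """k-out-of-n system of identical components: at least k of the n must work.

    Sums the binomial probabilities of exactly i working components for i >= k.
    """
    return sum(math.comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))
```

With R = 0.9 per component, two components in series give 0.81, two in parallel give 0.99, and triple modular redundancy (`k_of_n(2, 3, r)`, which equals 3R² − 2R³) gives 0.972, before accounting for the reliability of the voter itself.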
Indicative reading list
Please see the Talis Aspire link for the most up-to-date list.
Students are required to base their project on a scientific research paper, and will position their project in the group report by incorporating a literature review.
Subject specific skills
Application and systems programming.
Software development processes.
Systems analysis and design.
Technical - Expertise in the analysis, design and operation of dependable computer systems. An understanding of the hardware and software mechanisms that facilitate the development of dependable computer systems, including the ability to implement these mechanisms.
Communication - Lecture listening. Technical report writing. Technical document comprehension and analysis. Documenting software solutions. Research paper reading. Presentation skills.
Critical Thinking - Systems analysis and technical problem solving. Quantitative performance analysis. Research project / paper critique.
Multitasking - Management of competing deadlines and priorities. Management of parallel project activities.
Teamwork - Working as part of a technical team in contributing to the development and documentation of a solution.
Creativity - Developing an original solution to a research-based problem.
Leadership - Combining teamwork, critical thinking and technical understanding in the development of a software solution.
|Lectures|20 sessions of 1 hour (13%)|
|Private study|130 hours (87%)|
|Total|150 hours|
Private study description
Reading, programming, systems design, team meetings and project management.
N. Lynch, Distributed Algorithms (1st Edition), Morgan Kaufmann, April 1996.
Dependability Concepts: Fault, error, failure, fault transformation process. Implications of coverage on dependability, specifications, methods to achieve dependability.
Software: Understand the various methods for SW fault tolerance, such as NVP, recovery blocks and run-time checks, as well as the problem of predicate detection.
Middleware: Protocols for synchronous distributed systems, including leader election, consensus, clock synchronisation, Byzantine agreement and FDIR.
Hardware: Design and analysis of dependable hardware.
Synchronous and asynchronous systems: Protocols and abstractions for asynchronous systems, including logical and vector clocks, broadcast (best-effort, unordered reliable, ordered reliable), failure detectors, global predicate detection in fault-free and faulty systems.
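The two classic software fault-tolerance schemes named in the syllabus can be sketched in a few lines. This is a simplified, single-process illustration under stated assumptions, not the module's reference implementation: the recovery block uses a deep copy of its inputs as the checkpoint and treats an exception as a failed attempt, and the N-version voter requires hashable results and a strict majority. The names `recovery_block` and `nvp_vote` are hypothetical.

```python
import copy
from collections import Counter
from typing import Any, Callable, Optional, Sequence


def recovery_block(inputs: Any,
                   alternates: Sequence[Callable[[Any], Any]],
                   acceptance_test: Callable[[Any, Any], bool]) -> Any:
    """Try the primary and then each alternate in order. Every attempt runs
    on a fresh copy of the checkpointed inputs, so a failed attempt cannot
    corrupt the state seen by the next one."""
    checkpoint = copy.deepcopy(inputs)          # establish the recovery point
    for alternate in alternates:
        working = copy.deepcopy(checkpoint)     # roll back before each attempt
        try:
            result = alternate(working)
        except Exception:
            continue                            # a crash counts as a failed attempt
        if acceptance_test(checkpoint, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def nvp_vote(versions: Sequence[Callable[..., Any]], *args: Any) -> Optional[Any]:
    """Run independently developed versions on the same input and return the
    value produced by a strict majority of them, or None if there is none.
    Results must be hashable so that they can be counted."""
    results = []
    for version in versions:
        try:
            results.append(version(*args))
        except Exception:
            pass                                # a crashed version casts no vote
    if not results:
        return None
    value, votes = Counter(results).most_common(1)[0]
    return value if votes > len(versions) // 2 else None
```

The design difference is visible in the code: a recovery block runs alternates sequentially and relies on the quality of its acceptance test, whereas NVP runs all versions and relies on their failures being independent enough that a majority agrees on the correct value.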
No further costs have been identified for this module.
You do not need to pass all assessment components to pass the module.
Students can register for this module without taking any assessment.
Assessment group D1
Having determined a mark for the group submission, credit will be split between group members according to the information you provide on a contribution form.
2-hour examination in the final chronological term.
~Platforms - AEP
Assessment group R
|CS347 resit exam|100%|
CS347 resit exam
~Platforms - AEP
Feedback on assessment
Written feedback on coursework
Verbal feedback in lectures
This module is Optional for:
- Year 3 of UCSA-G4G1 Undergraduate Discrete Mathematics
- Year 3 of UCSA-G4G3 Undergraduate Discrete Mathematics
This module is Option list A for:
- Year 3 of UCSA-G400 BSc Computing Systems
- Year 4 of UCSA-G401 BSc Computing Systems (Intercalated Year)
- Year 4 of UCSA-G504 MEng Computer Science (with intercalated year)
- Year 3 of UCSA-G402 MEng Computing Systems
- Year 4 of UCSA-G403 MEng Computing Systems (Intercalated Year)
- Year 3 of UCSA-G500 Undergraduate Computer Science
- Year 4 of UCSA-G502 Undergraduate Computer Science (with Intercalated Year)
- Year 3 of UCSA-G503 Undergraduate Computer Science MEng
This module is Option list B for:
- Year 3 of UCSA-GN51 Undergraduate Computer and Business Studies
- Year 4 of UCSA-GN5A Undergraduate Computer and Business Studies (with Intercalated Year)
- Year 3 of USTA-G302 Undergraduate Data Science
- Year 3 of USTA-G304 Undergraduate Data Science (MSci)
- Year 4 of USTA-G303 Undergraduate Data Science (with Intercalated Year)