
Statistics Seminar 10: Big Data & Large-scale Data Analysis

Files Included in this Resource:

Video recording of the speaker's presentation

Further Details:

Further talks and resources in the Statistics Seminar Series can be found in the Resource Bank, or visit the Statistics Seminar Series webpages.

The seminar presentation files accompanying this talk:

Andrew Mead's Introduction (PowerPoint Presentation)

Simon Spencer's talk (PDF Document)


Resource Description:

Modern technologies have made it possible and easy for us to collect vast quantities of data in a wide range of application areas. These include a range of 'omics (genomics, proteomics, metabolomics, …) data from high-throughput technologies, environmental data from remote sensing methods and satellites, meteorological data, shopping purchase data collected by supermarkets, and even data associated with the use of these new technologies (Twitter messages, etc.). The quantities of data generated often mean that conventional summary and analysis approaches are no longer practical, and a range of new techniques are being developed (including data mining).
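As an illustration only (this example is not taken from the seminar), one reason conventional summaries break down is that they assume the whole data set fits in memory. A one-pass (streaming) algorithm avoids this. The minimal Python sketch below uses Welford's algorithm to compute a running mean and sample variance while holding only three numbers in memory. It assumes the data arrive as an iterable of numbers. The class and function names (`RunningStats`, `summarise`) are hypothetical.

```python
import math
from typing import Iterable


class RunningStats:
    """One-pass (streaming) mean and variance using Welford's algorithm."""

    def __init__(self) -> None:
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean of the values seen so far
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one new value into the running summaries."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the updated mean, which keeps the update numerically stable.
        self._m2 += delta * (x - self.mean)

    def variance(self) -> float:
        """Sample variance (n - 1 denominator); NaN until two values are seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else math.nan


def summarise(stream: Iterable[float]) -> RunningStats:
    """Consume a (possibly very long) stream once and return its summaries."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats


if __name__ == "__main__":
    # A generator stands in for data read in chunks from disk or a network
    # feed (a hypothetical source); the full data set is never stored.
    values = (float(i % 10) for i in range(100_000))
    s = summarise(values)
    print(s.n, round(s.mean, 3), round(s.variance(), 3))
```

The same one-pass pattern extends to counts, minima and maxima, and histograms with fixed bins. Quantities such as exact medians cannot be computed this way and need approximate streaming methods instead.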


Further Details:

Discussions at the end of the seminar included the different 'computing clusters' that are used across the university. If IT Services can't deal with a request, they refer people in the first instance to the Centre for Scientific Computing, and the High Performance Computing facilities that they provide:

Open-source tools that may be of interest for tackling non-traditional data challenges:

Gephi, for graph/network analysis:

Slides of CIM session:

Mondrian, an interactive statistical data-visualisation system that can handle large data sets:

These tools have been applied to Twitter data (an example that came up in the session) in this article:

Source: The seminar introduction was given by Andrew Mead (Teaching Fellow in Life Sciences), followed by the main talk by Simon Spencer (Statistics)

Date: 24th October 2013

We would like to hear any comments about this resource, including top tips on how you have used the material to support your research development. Please post your comments below: