IM904 Labs Week 4

Aims

By the end of this session you should be able to:

Load a network file into the Gephi program
Examine the network and comment on trends
Look at the changes in networks over time

The TCAT data set for today is the Brexit data set. You can find the data set on the Picard server. Data set and server details (including login passwords) are on this page.

Preparation

Download and install Gephi.
Read through the Gephi tutorial here.
- Note: The version of Gephi used in the tutorial (0.7) is older than the version you are probably running. If you follow the tutorial, keep an eye on the following:
  - When importing the data (slide 3), make sure that you choose the Graph Type 'Mixed' instead of the default 'Undirected'. This will make sure all of your edges have directionality associated with them.
  - The 'table result view' feature (slide 11) may not be available.
  - The 'diamond icon' for setting the size of the node (slide 16) no longer looks like a diamond, but is in the same location.
  - You should no longer have to click 'refresh' to update the modularity categories (slide 21).
Watch the video below.

Session

The session lasts for two hours and has the following structure:

Demonstration - Using Gephi to visualise a Twitter mention network.
Task 1 - Replicate the demonstration and use Gephi.
Group discussion (10 minutes).
Break (10 minutes).
Task 2.
Group discussion (10 minutes).

Part 1

The data for this task is taken from the Brexit dataset. We have chosen data collected on the 17th of January. The TCAT summary of the data is shown below:

This date was chosen because Theresa May gave an important speech on that day - you can read about the speech here.

James has uploaded some of the network analysis output from TCAT below. The files are provided here so that TCAT is not overloaded.

Mentions
Co-hashtag analysis
Both files are also available as a zip folder here. Download this if you are using the Safari browser.
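
If it helps to see what these two networks contain before opening them in Gephi, the following is a minimal Python sketch of how such edge lists could be built from raw tweets. It assumes a mention network is a directed, weighted graph (author to mentioned user) and a co-hashtag network is an undirected, weighted graph (two hashtags used in the same tweet). The sample tweets, the regular expressions and the function names are made up for illustration and are not TCAT's own code.

```python
import re
from collections import Counter
from itertools import combinations

# Simplified patterns (an assumption; real tweet parsing has more edge cases).
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")

def mention_edges(tweets):
    """Count directed edges (author -> mentioned user) from (author, text) pairs."""
    edges = Counter()
    for author, text in tweets:
        for mentioned in MENTION.findall(text):
            edges[(author.lower(), mentioned.lower())] += 1
    return edges

def cohashtag_edges(tweets):
    """Count undirected edges between hashtags that appear in the same tweet."""
    edges = Counter()
    for _author, text in tweets:
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        for a, b in combinations(tags, 2):
            edges[(a, b)] += 1
    return edges

# Made-up sample data, not taken from the Brexit data set.
tweets = [
    ("alice", "@bob have you seen this? #brexit #speech"),
    ("bob", "@alice yes, and @carol too #brexit"),
    ("carol", "@bob thanks #Brexit #speech"),
]

mentions = mention_edges(tweets)
cohashtags = cohashtag_edges(tweets)

for (source, target), weight in sorted(mentions.items()):
    print(source, "->", target, weight)
for (a, b), weight in sorted(cohashtags.items()):
    print(a, "--", b, weight)
```

The direction of each mention edge is what makes the choice of graph type matter when the file is imported into Gephi.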

A video showing James going through the demonstration of the mention network is below.

Part 2

This section will give you an opportunity to explore an analysis building on Michael's lecture this week on community detection using maximal modularity. The files for this section can be downloaded here.

First, reduce the complexity of the network with the following filters from the Filters tab:

Topology → Giant Component
Topology → Degree Range
(for an example of this, see the video Filtering Networks by Jen Golbeck, also below)
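
To make the effect of these two filters concrete, here is a rough Python sketch of the ideas behind them. It treats the network as undirected, takes the giant component to be the largest connected component, and applies the degree range in a single pass over the remaining nodes. The function names and the toy edge list are assumptions for illustration; this is not how Gephi implements its filters.

```python
from collections import defaultdict, deque

def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def giant_component(adj):
    """Return the node set of the largest connected component (breadth-first search)."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

def degree_range(adj, keep, lo, hi):
    """Keep nodes whose degree, counted among the kept nodes, lies in [lo, hi]."""
    return {n for n in keep if lo <= len(adj[n] & keep) <= hi}

# Toy network: a triangle with one pendant node, plus a separate pair.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("x", "y")]
adj = build_adjacency(edges)

giant = giant_component(adj)
core = degree_range(adj, giant, 2, 10)
print(sorted(giant))
print(sorted(core))
```

In this toy network the separate pair is dropped by the first step and the pendant node by the second, which is the kind of simplification the filters give you on the much larger Brexit network.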

You will compute the modularity of the network using 'Modularity' on the Statistics panel.
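
As a reminder of what the modularity score measures, the sketch below computes the modularity Q of a given partition of an undirected, unweighted network, using the standard definition: for each community, the fraction of edges inside it minus the fraction expected from its total degree. It only scores a partition that you supply; it does not search for the best partition, which is the harder problem Gephi's Modularity statistic addresses. The toy network and names are assumptions for illustration.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each edge listed once.
    community: dict mapping each node to a community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # number of edges inside each community
    degree_sum = defaultdict(int)  # total degree of each community
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )

# Toy network: two triangles joined by a single bridging edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(round(q_two, 3), round(q_one, 3))
```

Splitting the toy network into its two triangles scores higher than putting every node in one community, which is the intuition behind choosing the partition with maximal modularity.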

Once the modularity is computed, go to Nodes → Partition in the Appearance panel and partition by 'Modularity Class'. You can also view each partition separately using the following filter from the Filters tab:

Attributes → Partition → Modularity Class

Layout algorithms worth exploring here include:

Yifan Hu
Fruchterman-Reingold

Going Further

Gephi is a very powerful tool for network analysis. The Gephi website contains a web page dedicated to learning materials - the page is here. In particular, the learning materials for Visualisation and Layouts are useful.

As mentioned by Noortje in the lecture, there are several statistics you can use to summarise a network. For example,

Gephi allows you to calculate these statistics. Documentation for these is available here.
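
For a sense of what such summary statistics involve, the following Python sketch computes three common ones for a small undirected network: the number of nodes and edges, the average degree, and the density. These particular statistics are chosen here for illustration and are not necessarily the ones covered in the lecture; the toy edge list and function name are also assumptions.

```python
def summarise(edges):
    """Basic summary statistics for an undirected graph without repeated edges."""
    nodes = {node for edge in edges for node in edge}
    n, m = len(nodes), len(edges)
    return {
        "nodes": n,
        "edges": m,
        # Each edge contributes to the degree of two nodes.
        "average_degree": 2 * m / n,
        # Fraction of all possible node pairs that are actually connected.
        "density": 2 * m / (n * (n - 1)),
    }

# Toy network: two triangles joined by a single bridging edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

stats = summarise(edges)
for name, value in stats.items():
    print(name, value)
```

Comparing values like these before and after filtering is a quick way to check how much of the network a filter has removed.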

Workshop additional material

The following links are for content mentioned during the workshop.

An additional excellent introduction to Gephi is here.
The degreeness is outlined in this blog post.
A more advanced introduction to networks can be found here.
Gephi can be extended to include geo-location data. These features are added using plugins. Two of the geo-location plugins are here and here.