# Code snippets

## Simple Hierarchical clustering in Python 2.7 using SciPy

I've found that there's not a lot of useful information on how to do hierarchical clustering in SciPy, even though it's rather easy. First, you need to organise your data as an array with each row being an observation and each column being a dimension. Here's an example with nine observations, each with three dimensions.

```
data = [[0.1, 0.1, 0.1],
        [0.1, 0.1, 0.1],
        [0.1, 0.1, 0.1],
        [0.2, 0.2, 0.2],
        [0.2, 0.2, 0.2],
        [0.2, 0.2, 0.2],
        [0.3, 0.3, 0.3],
        [0.3, 0.3, 0.3],
        [0.3, 0.3, 0.3]]
```

We need to create a distance matrix, i.e. calculate the distance between each pair of observations. I'm using the default Euclidean distance metric (the SciPy documentation for spatial.distance.pdist gives more information on the different distance metrics you can use).

```
from scipy import spatial

distance = spatial.distance.pdist(data)
```
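Note that pdist returns the distances in "condensed" form: a flat vector of the n(n-1)/2 pairwise distances rather than a full square matrix. If you want the square symmetric matrix instead, spatial.distance.squareform converts between the two. A small sketch with three observations:

```python
from scipy.spatial.distance import pdist, squareform

data = [[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [0.3, 0.3, 0.3]]
condensed = pdist(data)          # 3 values: d(0,1), d(0,2), d(1,2)
square = squareform(condensed)   # 3x3 symmetric matrix, zero diagonal
```

Either form can be passed to the linkage step below; the condensed form is what the clustering routines expect.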

Next, we need to calculate the linkage; the SciPy documentation has information on the other built-in linkage methods. I'm using the fastcluster package to speed things up (it's a drop-in replacement for SciPy's cluster module).

```
import fastcluster

# 'single' is the default linkage method; 'complete', 'average',
# 'ward' etc. are also available
linkage = fastcluster.linkage(distance, method='single')
```
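If you don't have fastcluster installed, SciPy's own scipy.cluster.hierarchy.linkage takes the same arguments, so the following sketch (using the nine-observation array from above) should work with either:

```python
from scipy.cluster import hierarchy
from scipy.spatial.distance import pdist

data = [[0.1, 0.1, 0.1]] * 3 + [[0.2, 0.2, 0.2]] * 3 + [[0.3, 0.3, 0.3]] * 3
distance = pdist(data)

# each of the 8 rows of the result records one merge:
# [cluster_a, cluster_b, merge_distance, size_of_new_cluster]
linkage = hierarchy.linkage(distance, method='single')
```

With nine observations there are eight merges, so the result has eight rows; the identical observations merge first, at distance zero.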

linkage is an array in which each row describes one merge: starting with each observation as its own cluster, it records which pair of clusters to merge next, ending with everything in one cluster. There's a dendrogram function in scipy.cluster.hierarchy which will plot this for you, but if you want the members when there are n clusters (let's say that we want 3 in this case) then you have to do the following.

```
# We now iterate over the linkage object, merging clusters together
# until there are clusternum clusters left.
clusternum = 3
clustdict = {i: [i] for i in xrange(len(linkage) + 1)}
for i in xrange(len(linkage) - clusternum + 1):
    clust1, clust2 = int(linkage[i][0]), int(linkage[i][1])
    # the cluster created by merge i gets the next unused index
    clustdict[max(clustdict) + 1] = clustdict[clust1] + clustdict[clust2]
    del clustdict[clust1], clustdict[clust2]
print clustdict
```
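As a cross-check on the manual loop, SciPy has a built-in that does the same job: scipy.cluster.hierarchy.fcluster with criterion='maxclust' cuts the tree into (at most) the requested number of flat clusters and returns one label per observation:

```python
from scipy.cluster import hierarchy
from scipy.spatial.distance import pdist

data = [[0.1, 0.1, 0.1]] * 3 + [[0.2, 0.2, 0.2]] * 3 + [[0.3, 0.3, 0.3]] * 3
linkage = hierarchy.linkage(pdist(data), method='single')

# one integer label per observation; observations sharing a label
# are in the same flat cluster
labels = hierarchy.fcluster(linkage, t=3, criterion='maxclust')
```

On the example data this recovers the three groups of identical observations, which is a handy sanity check on the dictionary approach above.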