Syncing your Mac or Linux laptop with CSC using rsync
rsync is a utility for keeping directories in sync. Here I'll show you how to use ssh, rsync and cron to set up your Mac or Linux laptop to back up automatically to your /home/space/ storage. By using hard links, rsync can keep a series of snapshots of a directory while storing each unchanged file only once, instead of saving a fresh copy of every file in every snapshot. This works in much the same way as Apple's Time Machine, and it works on both Mac and Linux machines.
Setting Up SSH Keys
We are going to create a script that backs up our working directory to the CSC storage. So that the script can log in to godzilla without us typing a password (essential once it runs unattended from cron), we need to set up an SSH key.
First, generate a public/private key pair by running this on your local machine:
ssh-keygen -t rsa -b 4096
Just press Enter at the prompts and do not set a passphrase, so that the key can be used unattended. This writes your private key to ~/.ssh/id_rsa (keep this file private) and your public key to ~/.ssh/id_rsa.pub.
Next, paste the contents of ~/.ssh/id_rsa.pub into ~/.ssh/authorized_keys on godzilla, on a line of its own.
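If you would rather not copy and paste by hand, the helper below is one way to append the key from your local machine. It is a sketch: `install_key` is a hypothetical name, and it assumes the key was written to the default ~/.ssh/id_rsa.pub. You will be asked for your password one last time when it runs.

```shell
#!/bin/bash
# Hypothetical helper: append the local public key to ~/.ssh/authorized_keys
# on the remote host. Assumes the default key location ~/.ssh/id_rsa.pub.
install_key() {
  local remote="$1"   # e.g. phrXXX@godzilla.csc.warwick.ac.uk
  # Create ~/.ssh if needed and keep permissions tight: sshd may ignore
  # keys stored in files or directories that other users can write to.
  ssh "$remote" \
    'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys' \
    < "$HOME/.ssh/id_rsa.pub"
}
```

Usage: `install_key phrXXX@godzilla.csc.warwick.ac.uk`. Where the standard `ssh-copy-id` tool is installed, `ssh-copy-id phrXXX@godzilla.csc.warwick.ac.uk` does the same job.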
Once you have done this, log out of godzilla and log back in from your local machine. This time you should not be asked for your password.
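Because cron cannot answer a password prompt, it can be worth checking the key non-interactively as well. The sketch below relies on the standard ssh option `BatchMode=yes`, which makes ssh fail instead of prompting, so the command only succeeds if key-based login works. `check_login` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the remote host accepts our key.
# BatchMode=yes makes ssh fail rather than ask for a password.
check_login() {
  local remote="$1"   # e.g. phrXXX@godzilla.csc.warwick.ac.uk
  if ssh -o BatchMode=yes "$remote" true; then
    echo "Key-based login to $remote works"
  else
    echo "Key-based login to $remote failed" >&2
    return 1
  fi
}
```

Usage: `check_login phrXXX@godzilla.csc.warwick.ac.uk`.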
While we are logged in, we will create the directory in which our backups will be stored. This can be anywhere you like, but for the purposes of this guide we will use /home/space/phrXXX/backups. So on godzilla we run:
mkdir -p /home/space/phrXXX/backups
Our Rsync Script
Paste this script into a file called rsync_to_csc.sh on your local machine:
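#!/bin/bash
# Stop if any command fails, so "current" is only moved after a successful rsync.
set -e
USERNAME=phrXXX                            # your CSC username
HOST=godzilla.csc.warwick.ac.uk
DATE=$(date "+%Y-%m-%dT%H:%M:%S")          # timestamp used to name each snapshot
BACKUPPATH=/home/space/$USERNAME/backups   # on godzilla, make sure it exists
SOURCEPATH=$HOME/Documents                 # on your local machine
echo "Running rsync backup..."
# Hard-link unchanged files against the previous snapshot instead of copying them.
rsync -azP --link-dest="$BACKUPPATH/current" "$SOURCEPATH" "$USERNAME@$HOST:$BACKUPPATH/backup-$DATE"
# Point the "current" symlink at the snapshot we just made.
ssh "$USERNAME@$HOST" "rm -f $BACKUPPATH/current && ln -s backup-$DATE $BACKUPPATH/current"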
You will need to replace phrXXX in USERNAME with your own username. SOURCEPATH is the directory on your local machine that you want to back up to the CSC storage; in my case it is my Documents folder on Mac OS X. BACKUPPATH is the directory on godzilla where you want to store your backups, and it must already exist.
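Each run creates a new directory called backup-<timestamp>. The --link-dest option tells rsync to compare the files against the snapshot that current points to, and to hard-link any unchanged file instead of storing a second copy. The final line of the script then updates the current symlink, so that it always points to the most recent backup.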
Make sure this script is executable by doing:
chmod +x rsync_to_csc.sh
We can now run this script to back up our Documents directory to our CSC storage:
./rsync_to_csc.sh
This first backup may take a while, depending on how much data you have. Of course, you need to make sure you have enough free space on your CSC storage.
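One way to check this before the first run is to compare the size of the source directory with your quota on godzilla. This is a sketch: `check_space` is a hypothetical helper name, it defaults to ~/Documents as the source, and it uses `du -sh` locally and `quota -Qs` (human-readable quota report) remotely.

```shell
#!/bin/bash
# Hypothetical helper: show the local data size next to the CSC quota.
check_space() {
  local remote="$1"                      # e.g. phrXXX@godzilla.csc.warwick.ac.uk
  local source="${2:-$HOME/Documents}"   # directory you plan to back up
  echo "Local size of $source:"
  du -sh "$source"
  echo "Quota on $remote:"
  ssh "$remote" quota -Qs
}
```

Usage: `check_space phrXXX@godzilla.csc.warwick.ac.uk`. Later snapshots only add the data that changed, so the first backup is normally the largest.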
You can run this script whenever you want to make a backup, but what if you want your backups to occur automatically, every hour for example?
Automatic Backups - cron
cron is a Unix utility that runs programs or scripts at regular intervals. To make our rsync script run every hour we need to add it to our crontab. On the local machine run:
crontab -e
This opens the crontab in a text editor. The default is usually vi; if you'd prefer another editor, first run, for example, export EDITOR=emacs.
More detailed instructions on cron scheduling are available here. In short, the first five fields specify the minute (0-59), hour (0-23), day of the month (1-31), month (1-12) and day of the week (0-6, Sunday = 0), in that order. To run our script on the hour, every hour, add:
MAILTO=""
0 * * * * /Path/To/rsync_to_csc.sh
Save and exit. It should print "crontab: installing new crontab". The MAILTO="" line stops cron from sending you mail every time it runs the script. If you want to receive mail from cron, remove this line.
You should now be set up for automatic backups. All you need to do now is make sure you don't run out of room in your CSC storage; you can check your usage by running quota -Qs on godzilla. It's better to run code outside the directory you are backing up and then move only the results you want to keep into it, so that you don't end up backing up a lot of useless data. You can also exclude directories by adding the --exclude option to the rsync command.