AlphaFold
AlphaFold is an AI system developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence.
It can be run locally on the Avon HPC cluster, available via the Scientific Computing RTP, and there are also options for running AlphaFold (with limitations) in the cloud or on your laptop (see the bottom of this page).
You will first need an SCRTP Linux Desktop account, and then request access to Avon, where AlphaFold is installed.
We are happy to help you with running AlphaFold: if you have any questions about the instructions below, just come to one of our clinics or contact us, and we can get you started with the command line and high-performance computing, or address any sticking points.
Instructions for use:
AlphaFold computations are run by writing a submission script and sending it to the SLURM job scheduler, like so:
### Create a folder for data and results:
mkdir ~/alphafold_data
cd ~/alphafold_data
### Upload your input FASTA file(s) using scp, or curl them from EBI, e.g. for ubiquitin:
curl -F "entity=1" https://www.ebi.ac.uk/pdbe/entry/pdb/3h7p/fasta > ubiquitin.fasta
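If your sequence is already on your own computer, a typical scp upload run from your local terminal might look like the sketch below (the username and hostname are placeholders; use your own SCRTP login details for Avon):
### Run this on your own machine, not on Avon (replace username and hostname with your SCRTP/Avon details):
scp my_protein.fasta username@avon.scrtp.warwick.ac.uk:~/alphafold_data/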
### Create a SLURM submission script for the AlphaFold job
### (the unquoted heredoc below expands $PWD to the current directory, ~/alphafold_data, as the file is written):
cat > alphafold_ubiq.sb << EOF
#!/bin/sh
#SBATCH --job-name=ubiq
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --ntasks=1
#SBATCH --gres=gpu:quadro_rtx_6000:1
#SBATCH --time=48:00:00
#SBATCH --mem=40gb

### Clear modules and load AlphaFold:
module purge
module load GCC/10.2.0 CUDA/11.1.1 OpenMPI/4.0.5
module load AlphaFold/2.1.1

### Environment variable required by AlphaFold, pointing to the shared database location:
export ALPHAFOLD_DATA_DIR=/home/shared/alphafolddb

### Execution command for the job:
srun alphafold --fasta_paths=$PWD/ubiquitin.fasta --output_dir=$PWD/ubi --max_template_date=2022-03-03
EOF
### Submit the job to SLURM:
sbatch alphafold_ubiq.sb
### Check job status in the queue:
squeue -u $(whoami)
### Check the job STDOUT and STDERR outputs:
less slurm-*.out
For longer sequences, the SBATCH header needs to request more memory.
For example, for HCV1, which is ~3000 aa, the header below requests 48 CPUs at 3700 MB each, roughly 177 GB in total:
#!/bin/sh
#SBATCH --job-name=HCV1
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --cpus-per-task=48
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=3700
#SBATCH --gres=gpu:quadro_rtx_6000:1
#SBATCH --time=48:00:00
A note on running AlphaFold Multimer:
The default model preset is monomer, but since it is AlphaFold 2 that is installed on Avon, AlphaFold-Multimer is also available: simply add the --model_preset=multimer flag to the command line, and supply a FASTA file containing multiple sequences (one record per chain) as input, rather than a single sequence.
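As a minimal sketch (the FASTA content and file names below are placeholders, and the rest of the submission script, i.e. the SBATCH header, modules and ALPHAFOLD_DATA_DIR, is assumed to stay the same as in the monomer example above), the only changes are the multi-sequence input and the extra flag:
### Example multi-sequence FASTA, one record per chain (placeholder sequences):
cat > dimer.fasta << EOF
>chain_A
MSKGEELFTGVVPILVELD...
>chain_B
MSKGEELFTGVVPILVELD...
EOF
### Execution command in the submission script, now with the multimer preset:
srun alphafold --fasta_paths=$PWD/dimer.fasta --output_dir=$PWD/dimer --model_preset=multimer --max_template_date=2022-03-03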
A note on results and visualisation:
If the job runs successfully, it will create a results directory (specified by the --output_dir parameter) containing several result files.
The AlphaFold README at https://github.com/deepmind/alphafold explains the contents of each output file.
The output includes *.pkl files, which can be inspected and visualised with Python scripts, and *.pdb (Protein Data Bank) files, which can be loaded into any software that reads that format to display the predicted structure.
You can view the *.pdb files in a standalone tool such as PyMOL (https://pymol.org/2/),
or in an interactive online viewer, e.g. https://www.rcsb.org/3d-view or https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html
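For example, assuming the monomer job above completed and wrote its output into the ubi directory, the results can be listed and the top-ranked model opened as sketched below (AlphaFold writes into a subdirectory named after the input FASTA file, and the file names follow the conventions described in the README linked above; check your own output directory):
### Results land in a subdirectory named after the input FASTA file:
ls ubi/ubiquitin/
### ranking_debug.json summarises the confidence values used to order the ranked_*.pdb models:
less ubi/ubiquitin/ranking_debug.json
### After copying ranked_0.pdb to your own machine (e.g. with scp), open it in PyMOL:
pymol ranked_0.pdb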
A note on other AlphaFold options, if HPC is not required:
For quick analyses of monomers of limited size, the more lightweight options below may be easier to use while still being sufficiently accurate:
1. AlphaFold Colab - an easy-to-use, notebook-based environment for fast and convenient protein structure predictions.
https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb
2. gget alphafold - Python and command-line implementation of a simplified version of AlphaFold2
https://github.com/pachterlab/gget
pip install gget # only install gget once
gget setup alphafold # setup only needs to be run once
gget alphafold MSKGEELFTGVVPILVELD... # replace with your query sequence