Peter's Python Programming Pages
These pages are no longer being updated and, unless otherwise stated, were written for Python 2. If you have any queries, please contact Peter via his alumni email address or Google Mail.
Python is a freely available programming language that runs on Windows, Linux and MacOS. There is a Beginners' Guide, and I have found searching on Google for specific questions very handy.
Biopython
Biopython is an add-on module which provides support for a lot of bioinformatics work:
- Dealing with numerous forms of biological data (including Fasta and Genbank sequence files, and PDB structure files).
- Handy functions like translating nucleotide sequences into amino acids.
- Calling (standalone) NCBI BLAST to search for sequence matches.
- Calling (standalone) ClustalW to do alignments.
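To give a feel for what translating a nucleotide sequence involves, here is a minimal pure-Python sketch using the standard genetic code. It is not Biopython's implementation (Biopython provides this ready-made, for example through the translate method of its Seq objects), and the names BASES, AMINO_ACIDS, CODON_TABLE and translate are made up for this illustration. It is written for Python 3.

```python
# Standard genetic code (NCBI table 1), listed in the conventional
# T, C, A, G order for each of the three codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build a codon -> amino acid lookup, e.g. "ATG" -> "M".
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleotides):
    """Translate a nucleotide sequence in the first reading frame.

    '*' marks a stop codon and 'X' any codon not in the table (for
    example one containing N). Trailing bases that do not make up a
    full codon are ignored. Translation does not stop at stop codons.
    """
    seq = nucleotides.upper().replace("U", "T")  # accept RNA as well as DNA
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


protein = translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
print(protein)  # MAIVMGR*KGAR*
```

In practice the library version is the better choice, since it also handles alternative genetic codes and ambiguous bases properly.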
I started using Biopython in 2004 for my PhD, and then began contributing to the project. By the end of my PhD I was one of the core developers and was lead author on the Biopython application note publication. Why not have a look at the Biopython Tutorial and Cookbook, or some of my examples?
On a related note, Thomas Mailund's Newick Tree module is very handy for dealing with phylogenetic trees. This file format is used by PHYLIP, TREE-PUZZLE, PROTML, and several other programs including Clustal. Biopython 1.30 did not include any code for dealing with tree files - but check out the new Nexus module in Biopython 1.40b onwards from Frank Kauff and Cymon Cox, which is a nice alternative.
The Molecular Modelling Toolkit (MMTK)
The Molecular Modelling Toolkit (MMTK) is a Python library providing a range of tools for molecular simulations (the numerically intensive parts are actually written in C for speed).
MMTK also has visualisation capabilities, including the ability to use Visual Python or VMD for output. I wrote an example, now included in MMTK, which loads a PDB file of insulin and displays it using a space-filling model.
I did some work with MMTK for my second MOAC MSc mini-project, and wrote a good chunk of the Windows installation instructions for MMTK.
RPy (R from Python)
R is a language and environment for statistical computing and graphics, available for free from The R Project. R's rich libraries for statistics and graph creation can be called from within a Python program using RPy (R from Python), which is used in several of my examples below.
See also my R programming pages.
Examples
Here are some Python examples I have written and chosen to share:
- Sudoku Solver
- Using FASTA nucleotide files to calculate GC percentages
- Downloading (Bacterial) Genomes with an FTP script
- Parsing (reading) GenBank files
- Converting GenBank files into FASTA files
- Running RPS-BLAST and parsing the output
- Using Python (and R) to calculate Linear Regressions
- Using Python (and R) to calculate Rank Correlations
- Using Python (and R) to draw heatmaps from microarray data
- Protein Superposition (3D alignment) using SVD in Biopython
- Protein visualisation using MMTK
- Ramachandran Plots for proteins
- Protein contact map from PDB file
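As a taste of the simplest of these, the sketch below calculates GC percentages for the records in a FASTA file using only the Python 3 standard library. It is not the code from the linked example page (which uses Python 2, and Biopython would normally do the parsing). The function names read_fasta and gc_percent and the two short records are invented for illustration.

```python
import io


def read_fasta(handle):
    """Yield (title, sequence) pairs from a FASTA-format file handle."""
    title, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts; hand back the previous one, if any.
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif line and title is not None:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def gc_percent(sequence):
    """Return the percentage of G and C bases in a nucleotide sequence.

    Ambiguous bases such as N or S count towards the length but not
    towards the GC total, which is a simplifying assumption.
    """
    if not sequence:
        return 0.0
    seq = sequence.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# Made-up data standing in for a real file opened with open("name.fasta").
example = io.StringIO(">seq1 made-up record\nATGC\nGGCC\n>seq2 made-up record\nATAT\n")

results = {title.split()[0]: gc_percent(seq) for title, seq in read_fasta(example)}
for name, value in results.items():
    print(f"{name}\t{value:.1f}%")
```

For a real file, pass an open file handle to read_fasta in place of the StringIO object.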