

Tools for Metagenomic Analysis

This is a collection of bioinformatics tools developed for metagenomic analysis.

These tools are designed to be used alongside BLAST and the metagenomic analysis software MEGAN.

The included scripts perform the following tasks:

  • Remove low-complexity reads and sort the remaining reads by length.
  • Find and remove exact duplicate sequences in FASTA files, such as artifactual duplicates produced by emulsion PCR.
  • Search and replace incorrectly annotated subject headers in a BLAST output file (e.g. Human instead of Homo sapiens).
  • Parse a BLAST output file and return the headers of queries that align to subjects whose headers contain a search term.
  • Retrieve from a FASTA file the sequences whose headers are listed in a separate file.
  • Compare two FASTA files and recover the sequences present in the original file but absent from the query file.
  • Compile multiple BLAST output files so that MEGAN can process them as a single input file.
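Removing low-complexity reads and sorting by length can be sketched in Java as follows. This is not the bundled script's code. The class name ComplexityFilter, the use of Shannon entropy of base composition as the complexity score, the 1.5-bit cut-off and the longest-first ordering are all assumptions for illustration. The original script may use a different measure, such as DUST. Records require Java 16 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: drop low-complexity reads, then sort the rest by length.
 * Complexity is scored as the Shannon entropy (in bits) of each read's base
 * composition, a simple stand-in for dedicated filters such as DUST.
 */
public class ComplexityFilter {

    // Hypothetical cut-off; entropy ranges from 0 (one base) to 2.0 (even A/C/G/T).
    static final double MIN_ENTROPY = 1.5;

    public record Read(String header, String sequence) {}

    /** Parse FASTA lines into reads, joining multi-line sequences. */
    public static List<Read> parse(List<String> lines) {
        List<Read> reads = new ArrayList<>();
        String header = null;
        StringBuilder seq = new StringBuilder();
        for (String line : lines) {
            if (line.startsWith(">")) {
                if (header != null) reads.add(new Read(header, seq.toString()));
                header = line.substring(1).trim();
                seq.setLength(0);
            } else if (header != null) {
                seq.append(line.trim());
            }
        }
        if (header != null) reads.add(new Read(header, seq.toString()));
        return reads;
    }

    /** Shannon entropy (bits) of the character composition of a sequence. */
    public static double entropy(String seq) {
        if (seq.isEmpty()) return 0.0;
        String s = seq.toUpperCase();
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : s.toCharArray()) counts.merge(c, 1, Integer::sum);
        double h = 0.0;
        for (int n : counts.values()) {
            double p = (double) n / s.length();
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    /** Keep reads whose entropy reaches the cut-off, longest first (stable for ties). */
    public static List<Read> filterAndSort(List<Read> reads, double minEntropy) {
        List<Read> kept = new ArrayList<>();
        for (Read r : reads) {
            if (entropy(r.sequence()) >= minEntropy) kept.add(r);
        }
        kept.sort(Comparator.comparingInt((Read r) -> r.sequence().length()).reversed());
        return kept;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java ComplexityFilter.java reads.fasta > filtered.fasta
        List<Read> reads = parse(Files.readAllLines(Path.of(args[0])));
        for (Read r : filterAndSort(reads, MIN_ENTROPY)) {
            System.out.println(">" + r.header());
            System.out.println(r.sequence());
        }
    }
}
```

A homopolymer run such as AAAAAAAA scores 0 bits and is discarded. A read with all four bases in equal proportion scores the maximum of 2 bits. Mononucleotide entropy does not catch every repeat; for example, ACGTACGT... also scores 2 bits.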
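Removing exact duplicate sequences reduces to keeping the first record seen for each distinct sequence. Below is a minimal sketch that assumes exact, case-sensitive matching on the whole sequence. FastaDedup and its methods are hypothetical names, not the bundled script. Records require Java 16 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: drop FASTA records whose sequence exactly repeats an earlier one. */
public class FastaDedup {

    /** One FASTA record: header (without '>') and the concatenated sequence. */
    public record Record(String header, String sequence) {}

    /** Parse FASTA lines into records, joining multi-line sequences. */
    public static List<Record> parse(List<String> lines) {
        List<Record> records = new ArrayList<>();
        String header = null;
        StringBuilder seq = new StringBuilder();
        for (String line : lines) {
            if (line.startsWith(">")) {
                if (header != null) records.add(new Record(header, seq.toString()));
                header = line.substring(1).trim();
                seq.setLength(0);
            } else if (header != null) {
                seq.append(line.trim());
            }
        }
        if (header != null) records.add(new Record(header, seq.toString()));
        return records;
    }

    /** Keep the first record for each distinct sequence, preserving input order. */
    public static List<Record> dedup(List<Record> records) {
        Set<String> seen = new HashSet<>();
        List<Record> unique = new ArrayList<>();
        for (Record r : records) {
            // Set.add returns false if the sequence was already present.
            if (seen.add(r.sequence())) unique.add(r);
        }
        return unique;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java FastaDedup.java reads.fasta > unique.fasta
        for (Record r : dedup(parse(Files.readAllLines(Path.of(args[0]))))) {
            System.out.println(">" + r.header());
            System.out.println(r.sequence());
        }
    }
}
```

A HashSet gives average constant-time lookups, so a single pass over the reads suffices. Reads that are only nearly identical are kept, because this sketch removes exact matches only.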
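Finding queries that hit subjects matching a search term can be sketched as a scan over BLAST's pairwise text output. The sketch assumes that each query block starts with a "Query=" line and that each subject definition line starts with ">". BlastTermFilter is a hypothetical name. The match is case-insensitive by assumption, and only the first line of a wrapped subject description is examined, so a term split across lines would be missed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: report each query whose hits include a subject
 * definition line ('>' line) containing the search term.
 */
public class BlastTermFilter {

    public static Set<String> queriesHitting(List<String> blastLines, String term) {
        Set<String> matches = new LinkedHashSet<>(); // first-seen order, no duplicates
        String needle = term.toLowerCase();
        String query = null;
        for (String line : blastLines) {
            if (line.startsWith("Query=")) {
                // Start of a new query block; alignment rows ("Query  1 ...") do not match.
                query = line.substring("Query=".length()).trim();
            } else if (line.startsWith(">") && query != null
                    && line.toLowerCase().contains(needle)) {
                matches.add(query);
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java BlastTermFilter.java sample.blastn "Homo sapiens"
        for (String q : queriesHitting(Files.readAllLines(Path.of(args[0])), args[1])) {
            System.out.println(q);
        }
    }
}
```

The printed query headers can then be passed to a header-based sequence retrieval step.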
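Retrieving sequences for a list of headers and contrasting two FASTA files are both set operations on sequence identifiers. The sketch below assumes that records are matched on the first whitespace-delimited word of the header (the sequence ID). The class FastaSelect and its select/subtract modes are hypothetical, and the output keeps only that ID, dropping any description.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of two header-based operations:
 *   select   - keep records whose ID appears in a list of headers
 *   subtract - keep records of the original whose ID is absent from a query FASTA
 */
public class FastaSelect {

    /** Parse FASTA lines into an insertion-ordered map of ID -> sequence. */
    public static Map<String, String> parse(List<String> lines) {
        Map<String, String> records = new LinkedHashMap<>();
        String id = null;
        StringBuilder seq = new StringBuilder();
        for (String line : lines) {
            if (line.startsWith(">")) {
                if (id != null) records.put(id, seq.toString());
                id = idOf(line);
                seq.setLength(0);
            } else if (id != null) {
                seq.append(line.trim());
            }
        }
        if (id != null) records.put(id, seq.toString());
        return records;
    }

    /** First whitespace-delimited token of a header, with any leading '>' removed. */
    static String idOf(String header) {
        String h = header.startsWith(">") ? header.substring(1) : header;
        return h.trim().split("\\s+")[0];
    }

    /** Records whose ID is in the header list, in the FASTA file's order. */
    public static Map<String, String> select(Map<String, String> fasta, List<String> headers) {
        Set<String> wanted = new HashSet<>();
        for (String h : headers) if (!h.isBlank()) wanted.add(idOf(h));
        Map<String, String> out = new LinkedHashMap<>();
        fasta.forEach((id, seq) -> { if (wanted.contains(id)) out.put(id, seq); });
        return out;
    }

    /** Records of the original whose ID does not occur in the query file. */
    public static Map<String, String> subtract(Map<String, String> original, Map<String, String> query) {
        Map<String, String> out = new LinkedHashMap<>(original);
        out.keySet().removeAll(query.keySet());
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage:
        //   java FastaSelect.java select reads.fasta headers.txt
        //   java FastaSelect.java subtract original.fasta query.fasta
        Map<String, String> first = parse(Files.readAllLines(Path.of(args[1])));
        List<String> second = Files.readAllLines(Path.of(args[2]));
        Map<String, String> result;
        if (args[0].equals("select")) {
            result = select(first, second);
        } else if (args[0].equals("subtract")) {
            result = subtract(first, parse(second));
        } else {
            throw new IllegalArgumentException("mode must be select or subtract");
        }
        result.forEach((id, seq) -> System.out.println(">" + id + "\n" + seq));
    }
}
```

Keying on the ID rather than the full header line tolerates header lists that omit descriptions. It also means that two records sharing an ID collapse into one.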


Download the MetagenomicScript.tgz file to your desktop and extract its contents (for example, with tar -xzf MetagenomicScript.tgz).


The scripts in the PerlScripts folder require Perl to be installed on your computer. The first line of each script (the shebang line) gives the location of the Perl interpreter and may need to be changed to suit your system.

The script contained in the JavaScripts folder requires Java to be installed on your computer.


These scripts are intended to be run from the command line. Details of inputs, outputs and parameters are given in the comments at the start of each script.


Palmer, Clapham, Rose, Freitas, Owen, Beresford-Jones, Moore, Kitchen and Allaby (2010) Recent cotton evolution tracked through archaeogenomics. submitted.