bamCleave

Some bam files contain reads aligned to multiple genomes. Such files cannot be viewed in programs such as IGV, and cannot be processed by applications that assume a single genome and standard chromosome names. bamCleave separates out the chromosomes for a specific genome and renames them to more standard names.

bam files can also contain data for different cells. bamCleave can divide such data into bam files for individual cells.

Syntax

bamCleave -b <bamFileName> (-n <txt>) (-o <outputFile>) (-c N) (-t XY) (-p <prefix>)/(-m <mappingFile>)

-b Specifies the source bam file.

-n <txt> specifies that the cell identifier is the part of the read name that precedes the character string <txt>.

-o specifies the root for the generated files. If it is not specified, the source bam filename without the .bam suffix is used as the root.

-c N is for use with single cell data where the cell is identified by a barcode within the bam file. bamCleave reads through the file to find the N cell identifiers with the most reads and then creates bam files (with an index) for all N cells.

-t XY specifies the tag that is used to identify the single cell barcode. If the -t option is not used, the default tag is XC.

The -p and/or the -m options can be used to specify the chromosomes that are separated out into a separate bam file. The -p prefix option specifies that all chromosomes with names of the format <prefix>XYZ will be separated out, and that the new chromosome names will have the prefix removed. The -m option specifies a tab delimited file that lists each chromosome to be separated out together with its name in the new bam file, e.g.

MOUSE_17 17
MOUSE_18 18
MOUSE_19 19
MOUSE_1 1
MOUSE_1_GL456210_random 1_GL456210_random
MOUSE_1_GL456211_random 1_GL456211_random
MOUSE_1_GL456212_random 1_GL456212_random
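
For a bam file with many chromosomes, a mapping file of this shape could be generated from the header rather than typed by hand. The following is a minimal sketch, not part of bamCleave: make_mapping is a hypothetical helper name, it assumes samtools is installed to print the header, and it assumes the new name is simply the old name with the prefix removed (the same renaming that -p performs).

```shell
# make_mapping PREFIX (hypothetical helper, not part of bamCleave)
# Reads SAM header lines on stdin and prints "oldName<TAB>newName" for every
# @SQ sequence whose name starts with PREFIX; newName has PREFIX removed.
make_mapping() {
  awk -v p="$1" -F'\t' '
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) {
          name = substr($i, 4)            # drop the leading "SN:"
          if (index(name, p) == 1)        # name begins with the prefix
            print name "\t" substr(name, length(p) + 1)
          next
        }
      }
    }'
}

# Usage (assumes samtools is on the PATH; file names are illustrative):
#   samtools view -H source.bam | make_mapping MOUSE_ > mouse_map.txt
#   bamCleave -b source.bam -o output -m mouse_map.txt
```

Editing the generated file before passing it to -m makes it possible to drop unwanted chromosomes or choose different new names.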

Examples

Take the reads from source.bam and split them into separate files whose filenames start with 'output', where the
individual cell barcodes are stored in the 'CB' tag. Output data for the top 100 cells.

bamCleave -b source.bam -o output -t CB -c 100 

Take the reads from source.bam and split them into separate files whose filenames start with 'output', where the
individual cell barcodes are set by the read names up to the character ":". Output data for the top 300 cells.

bamCleave -b source.bam -o output -n ":" -c 300
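
Before choosing a value for -c, it can help to preview which cell identifiers have the most reads. The sketch below is independent of bamCleave and does not claim to reproduce its internal counting: top_cells is a hypothetical helper, samtools is assumed to be installed, and the cell identifier is assumed to be the read name up to the first occurrence of the delimiter.

```shell
# top_cells DELIM N (hypothetical helper, not part of bamCleave)
# Reads SAM records on stdin, takes the read name (column 1) up to the first
# occurrence of DELIM as the cell identifier, and prints the N identifiers
# with the most reads as "count<TAB>identifier", largest count first.
top_cells() {
  awk -v d="$1" -F'\t' '
    $0 !~ /^@/ {                          # skip header lines if present
      i = index($1, d)
      id = (i > 0) ? substr($1, 1, i - 1) : $1
      n[id]++
    }
    END { for (id in n) print n[id] "\t" id }
  ' | sort -k1,1nr -k2,2 | head -n "$2"
}

# Usage (assumes samtools is on the PATH):
#   samtools view source.bam | top_cells ":" 300
```

If the counts fall away sharply after a certain rank, that rank is a reasonable starting point for N.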

Generated files

All files have a root file name <rootFileName>, which is either specified by the -o option or is the source bam filename without the .bam suffix.

<rootFileName>_<prefix>.bam or <rootFileName>_sel.bam Reads that have been separated out using the -p or the -m options
<rootFileName>_res.bam The remaining reads
<rootFileName>_<prefix>/sel_XYZ.bam Reads associated with the cell with identifier XYZ
<rootFileName>_chimeras.txt Info about paired end reads which are split between genomes
<rootFileName>_split.log Run log. Includes read counts.

Download

bamCleave is a command line executable that should be placed in an appropriate directory.

64 bit Windows

64 bit Linux 

Apple mac 

For Linux and Mac the command "chmod a+x bamCleave" must be used to allow the downloaded file to be run as a program. The program can be run by specifying the full path name to the program, or as ./bamCleave if it is in the current directory. Alternatively it can be put into a directory such as /usr/local/bin, where bamCleave will automatically be found when run from the command line in any other directory. A final alternative is to place it in a new directory for such programs and add that directory to the PATH environment variable in the .profile file.
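
The last alternative could look like the following sketch. install_tool is a hypothetical helper and $HOME/bin is an assumed directory name; any directory of your choice works the same way.

```shell
# install_tool FILE DIR (hypothetical helper)
# Copies FILE into DIR, creating DIR if needed, and marks the copy executable.
install_tool() {
  mkdir -p "$2" &&
    cp "$1" "$2/" &&
    chmod a+x "$2/$(basename "$1")"
}

# Usage ($HOME/bin is an assumed location):
#   install_tool ./bamCleave "$HOME/bin"
#   echo 'export PATH="$HOME/bin:$PATH"' >> "$HOME/.profile"
```

The PATH change takes effect in new login shells, after which bamCleave can be run from any directory.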