LiBiNorm

LiBiNorm is designed for gene expression analysis of RNA-seq data. It is command-line and output compatible with htseq-count, and it adds RNA-seq bias compensation tailored to the library preparation method, as described in the paper "Modeling enzyme processivity reveals that RNA-Seq libraries are biased in characteristic and correctable ways".

The RNA-seq data should be aligned to a reference genome using an aligner that can process intron-spanning reads, such as HISAT2. The alignments should then be converted to bam format, e.g. with samtools, because, unlike htseq-count, LiBiNorm only supports reads in bam format and not sam format. Also unlike htseq-count, LiBiNorm can process unsorted bam files of any size, which removes the need to sort large bam files first.
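As a minimal sketch of the conversion step, the following Python wrapper builds and runs a samtools command that writes bam output from a sam file. The file names are placeholders and the helper function names are hypothetical; only the `samtools view -b -o` invocation is standard samtools usage. No sorting step is included, since unsorted bam files are accepted.

```python
import shutil
import subprocess


def sam_to_bam_command(sam_path, bam_path):
    """Return the samtools argument list that converts sam to bam."""
    # -b selects bam output, -o names the output file.
    return ["samtools", "view", "-b", "-o", bam_path, sam_path]


def convert_sam_to_bam(sam_path, bam_path):
    """Run the conversion; raises if samtools is missing or fails."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools was not found on PATH")
    subprocess.run(sam_to_bam_command(sam_path, bam_path), check=True)


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    print(" ".join(sam_to_bam_command("aligned.sam", "aligned.bam")))
```

Separating the command construction from its execution keeps the sketch easy to inspect without samtools installed.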

LiBiNorm works with feature definition files in either gtf or gff3 format. The chromosome identifiers in the gtf/gff file sometimes do not match those used in the alignment data. In this situation LiBiNorm matches the chromosomes by their lengths, which are recorded in both the bam files and the gtf/gff files.
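The idea of matching chromosomes by length can be illustrated with a short Python sketch. This is not LiBiNorm's actual implementation: the function name, the dictionary inputs, and the rule of skipping ambiguous lengths (two chromosomes with the same length) are all assumptions made for illustration.

```python
from collections import defaultdict


def match_by_length(bam_lengths, feature_lengths):
    """Map feature-file chromosome names to bam names with equal lengths.

    Both arguments are dicts of {chromosome name: length in bases}.
    Names that already agree are kept; otherwise a name is mapped only
    when exactly one bam chromosome has the same length (assumed rule).
    """
    names_by_length = defaultdict(list)
    for name, length in bam_lengths.items():
        names_by_length[length].append(name)

    mapping = {}
    for name, length in feature_lengths.items():
        if name in bam_lengths:
            mapping[name] = name
            continue
        candidates = names_by_length.get(length, [])
        if len(candidates) == 1:
            mapping[name] = candidates[0]
    return mapping


if __name__ == "__main__":
    # Illustrative lengths: "chr"-prefixed names in the bam header,
    # bare names in the feature file.
    bam = {"chr1": 248956422, "chr2": 242193529, "chrM": 16569}
    features = {"1": 248956422, "2": 242193529, "MT": 16569}
    print(match_by_length(bam, features))
```

Chromosomes left out of the returned mapping would need to be resolved some other way, for example by renaming them manually.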

LiBiNorm count produces a file that contains the normalised expression for each gene, calculated as TPM values, with the RNA length used having been adjusted based on the bias that was identified for RNA of that length. LiBiNorm count is also able to produce a count file that reproduces the output of htseq-count and does not include bias normalisation.
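To make the TPM calculation concrete, here is a minimal Python sketch that computes TPM from read counts and per-gene lengths. It assumes the bias-adjusted lengths are already available as input; how LiBiNorm derives them from its bias model is not shown, and the function name, gene names, and numbers are hypothetical.

```python
def tpm(counts, adjusted_lengths):
    """Compute transcripts per million from counts and adjusted lengths.

    counts: {gene: read count}
    adjusted_lengths: {gene: bias-adjusted RNA length}, all positive
    """
    rates = {}
    for gene, count in counts.items():
        length = adjusted_lengths[gene]
        if length <= 0:
            raise ValueError(f"non-positive length for {gene}")
        # Reads per base of (adjusted) RNA length.
        rates[gene] = count / length

    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # Scale so that the values sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


if __name__ == "__main__":
    # Hypothetical counts and adjusted lengths, for illustration only.
    example = tpm({"geneA": 100, "geneB": 200},
                  {"geneA": 1000.0, "geneB": 4000.0})
    print(example)
```

In this example geneB has twice the reads of geneA but four times the length, so it ends up with half the TPM of geneA. Adjusting the length used for each gene therefore changes its normalised expression directly.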

LiBiNorm count is also able to produce a landscape file that contains the results of processing the contents of the bam and feature files, which can also be used by LiBiNorm model to produce normalised expression counts. This mode allows additional normalisation options to be explored more efficiently.

 

LiBiNorm output