Differences between LiBiNorm and htseq-count

Differences in the assignments of reads to genes

LiBiNorm counts reads slightly differently from htseq-count in three ways:

  • Reads that map to contigs with no genes in the reference annotation are counted as noMatch by LiBiNorm. htseq-count ignores such reads.
  • LiBiNorm ignores transcripts of biotype "retained intron" in gtf files because, in practice, few reads map to the retained introns, and including these transcripts gives misleading estimates of gene/transcript length.
  • When a BAM file contains multiple alternative mappings for a given read or read pair, LiBiNorm counts this as a single non-unique read or read pair. htseq-count counts the read once for each mapping.
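
The third difference can be illustrated with a minimal Python sketch. It is not taken from either tool's source: the `alignments` list, the `non_unique` label and both function names are hypothetical, and each record is reduced to a (read name, gene) pair. It only shows how the totals diverge when a multi-mapped read is tallied once per alignment record rather than once per read.

```python
from collections import Counter

# Hypothetical alignment records as (read_name, gene) pairs. A read name
# that occurs more than once has multiple alternative mappings.
alignments = [
    ("read1", "geneA"),
    ("read2", "geneA"),
    ("read2", "geneB"),  # alternative mapping of read2
    ("read2", "geneC"),  # another alternative mapping of read2
    ("read3", "geneB"),
]


def count_per_alignment(records):
    """Tally every alignment record of a multi-mapped read separately."""
    mappings = Counter(name for name, _ in records)
    counts = Counter()
    for name, gene in records:
        if mappings[name] > 1:
            counts["non_unique"] += 1  # one increment per record
        else:
            counts[gene] += 1
    return counts


def count_per_read(records):
    """Tally a multi-mapped read once, however many records it has."""
    mappings = Counter(name for name, _ in records)
    first_gene = {}
    for name, gene in records:
        first_gene.setdefault(name, gene)
    counts = Counter()
    for name, n_mappings in mappings.items():
        if n_mappings > 1:
            counts["non_unique"] += 1  # one increment per read
        else:
            counts[first_gene[name]] += 1
    return counts


per_alignment = count_per_alignment(alignments)
per_read = count_per_read(alignments)
```

With this toy input, the uniquely mapped reads are counted identically by both functions; only the non-unique total differs, because read2 contributes three records but is a single read.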

"LiBiNorm count" can be run in a fully htseq-count compatible mode using the -z option.

Unsupported htseq-count options

-f <format>, --format=<format>

Unavailable, as LiBiNorm only supports SAM files.

--additional-attr=<id attributes>

LiBiNorm does not support the specification of additional attributes.

--nonunique=<nonunique mode>

LiBiNorm only operates in the default mode (none).

--secondary-alignments=<mode>

LiBiNorm only operates in the default mode (score), in which the score is used to determine whether the read should be included.

--supplementary-alignments=<mode>

LiBiNorm only operates in the default mode (score), in which the score is used to determine whether the read should be included.

--max-reads-in-buffer=<number>

This is not required in LiBiNorm: when the buffer size exceeds 200000, the buffer is written to one or more temporary files, which are then reprocessed to find all of the remaining pairs. The buffer size can be changed using the READ_CACHE_SIZE #define in Options.h.
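
The spill-and-reprocess idea can be sketched in Python as follows. This is an illustration under stated assumptions, not LiBiNorm's implementation (which is C++): the function `pair_reads`, its `cache_size` parameter (standing in for READ_CACHE_SIZE) and the tab-separated spill format are all hypothetical, and reads are reduced to (name, payload) tuples.

```python
import tempfile


def pair_reads(reads, cache_size=200000):
    """Pair mates by read name, spilling the waiting buffer to temp files.

    `reads` is an iterable of (name, payload) tuples. `cache_size` is a
    hypothetical stand-in for READ_CACHE_SIZE.
    Returns (pairs, unpaired), where unpaired maps name -> payload.
    """
    pairs = []
    pending = {}      # reads still waiting for their mate
    spill_files = []  # temporary files holding spilled reads

    for name, payload in reads:
        if name in pending:
            pairs.append((name, pending.pop(name), payload))
            continue
        pending[name] = payload
        if len(pending) > cache_size:
            # Buffer too large: write it out and start again empty.
            spill = tempfile.TemporaryFile(mode="w+")
            for n, p in pending.items():
                spill.write(f"{n}\t{p}\n")
            spill_files.append(spill)
            pending.clear()

    # Reprocess the spilled reads (oldest first), then whatever is left
    # in the buffer, to find the remaining pairs. For brevity this pass
    # uses an in-memory dict; a real tool would need to bound it too.
    waiting = {}
    for spill in spill_files:
        spill.seek(0)
        for line in spill:
            n, p = line.rstrip("\n").split("\t")
            if n in waiting:
                pairs.append((n, waiting.pop(n), p))
            else:
                waiting[n] = p
        spill.close()
    for n, p in pending.items():
        if n in waiting:
            pairs.append((n, waiting.pop(n), p))
        else:
            waiting[n] = p
    return pairs, waiting


# Usage: mates are far apart, so a tiny buffer forces several spills.
demo_reads = [(f"r{i}", "mate1") for i in range(10)]
demo_reads += [(f"r{i}", "mate2") for i in range(10)]
demo_pairs, demo_unpaired = pair_reads(demo_reads, cache_size=4)
```

The point of the design is that no user-supplied limit is needed: mates that are far apart in the input are still paired, at the cost of a second pass over the temporary files.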

Differences in htseq-count options

-o <bamout>, --bamout=<bamout>

The -o option creates BAM files rather than SAM files, and can only be used with paired-end data when that data is ordered by read name.