Jenkins et al. (2006)

Here you can find additional files associated with the paper "How many transcripts does it take to reconstruct the splice graph?" (Jenkins et al. 2006). Each file is described below. They are available together in this zip file.

ratios.xls contains likelihood ratio tests for the pairwise versus in-out models applied to all 89 genes with more than 5000 putative transcripts, formatted as follows:

                                                 Region 1   ...   Total all regions
Ensembl ID   Start of region (fragment number)
             End of region (fragment number)
             Gamma
             Degrees of freedom
             p-value (%)

readwg.m reads a directed, weighted alternative splicing graph (ASG) into Matlab. The ASG file should be formatted like the example ABCB5.wg and placed in the same directory as readwg.m.
ABCB5.wg is an example weighted ASG file, in which ‘#’ acts as a comment character. The size of the graph should be the number of exon fragments + 2 (for the appended source and sink). Below each node’s outdegree is a list of its destination nodes, each with the number of ESTs supporting that edge. Each node also has a Y/N string indicating whether it is attached to the next node (i.e. whether this node and the following one are part of the same exon).
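To make the layout concrete, here is a minimal Python sketch of a reader for this format. The token order it expects (size of graph first, then for each node in turn its outdegree, one destination/EST-count pair per outgoing edge, then the Y/N string) is an assumption drawn from the description above and has not been checked against ABCB5.wg; parse_wg and Node are hypothetical names, not part of the distributed files.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # (destination node, supporting EST count) for each outgoing edge
    edges: list = field(default_factory=list)
    # True if the Y/N string is 'Y': this node and the next are part of the same exon
    attached_to_next: bool = False


def parse_wg(text):
    """Parse a weighted ASG file under an ASSUMED token layout (hypothetical).

    Assumed layout: size of graph, then for each node in order: outdegree,
    that many (destination, EST count) pairs, then a Y/N string.
    """
    tokens = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # '#' starts a comment
        tokens.extend(line.split())
    it = iter(tokens)
    size = int(next(it))  # number of exon fragments + 2 (source and sink)
    nodes = []
    for _ in range(size):
        node = Node()
        outdegree = int(next(it))
        for _ in range(outdegree):
            dest, ests = int(next(it)), int(next(it))
            node.edges.append((dest, ests))
        flag = next(it).upper()
        if flag not in ("Y", "N"):
            raise ValueError(f"expected Y or N, got {flag!r}")
        node.attached_to_next = flag == "Y"
        nodes.append(node)
    return nodes
```

If the real file orders these items differently (for example, with the Y/N string before the edge list), only the body of the per-node loop needs to change.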
pairwise.m simulates in Matlab the sampling of transcripts under the pairwise model described in the paper. Run it after reading in an ASG with readwg.m. It creates an n x (number of exon fragments) matrix called transcripts, where n is specified on input. Each row is a random sample from the gene under the pairwise model, and transcripts(i,j) indicates whether exon fragment j is contained in sample i.
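The sampling step can be sketched in Python as a random walk from source to sink, assuming the pairwise model can be read as a first-order Markov chain in which the next node is chosen with probability proportional to the EST count on each outgoing edge. That proportional rule, the node numbering (source = 1, sink = size) and the name sample_pairwise are assumptions for illustration; pairwise.m may estimate or parameterise the transitions differently.

```python
import random


def sample_pairwise(edges, size, n, seed=None):
    """Sample n transcripts as source-to-sink walks (illustrative sketch).

    edges maps node -> list of (destination, EST count); nodes are 1..size
    with source = 1 and sink = size. Returns an n x (size - 2) 0/1 matrix:
    entry [i][j] is 1 if exon fragment j+1 (node j+2) is in sample i.
    """
    rng = random.Random(seed)
    transcripts = []
    for _ in range(n):
        row = [0] * (size - 2)
        node = 1
        while node != size:
            out = edges.get(node, [])
            if not out:
                raise ValueError(f"node {node} has no outgoing edges")
            dests = [d for d, _ in out]
            weights = [w for _, w in out]
            # Assumed transition rule: probability proportional to EST count
            node = rng.choices(dests, weights=weights, k=1)[0]
            if node != size:
                row[node - 2] = 1
        transcripts.append(row)
    return transcripts
```

For the toy graph {1: [(2, 5), (3, 2)], 2: [(3, 4)], 3: [(4, 6)]} with size 4, every sample contains fragment 2, and fragment 1 appears in roughly 5/7 of samples.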
inout.m simulates in Matlab the sampling of transcripts under the in-out model as described in the paper. It operates in the same way as pairwise.m.
maxmin.m creates the variables maximum_transcripts and minimum_transcripts, the maximum and minimum numbers of informative transcripts obtainable from the current ASG, as computed by the path-covering algorithms described in the paper. The current ASG must first be read into Matlab using readwg.m.
ratio.m performs the likelihood ratio test described in the paper on the current ASG, comparing the pairwise and in-out models. It creates the variables regionstart, regionend, Gam, z and pvalue, which correspond to the rows of the table template above (z holds the degrees of freedom). The current ASG must first be read into Matlab using readwg.m. Determining the degrees of freedom under the pairwise model is not entirely straightforward. For a test on exons k,...,l we used the following as a sensible choice for the degrees of freedom under the pairwise model:

df = \sum_{i=k}^{l} (a_i - 1), where a_i is the number of acceptor sites downstream of exon i.

That is, describing all possible outgoing destinations from a given exon i requires a number of parameters equal to the number of downstream acceptor sites minus 1, and the degrees of freedom is the sum of these counts over i = k,...,l.
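The degrees-of-freedom formula, and the conversion of a test statistic into a p-value, can be sketched as follows. This assumes Gamma is referred to a chi-square distribution with the listed degrees of freedom (the usual asymptotic treatment of a likelihood ratio test). The input list downstream_acceptors and the function names are hypothetical. The chi-square tail uses the standard closed forms for integer degrees of freedom, evaluated in log space, so only the standard library is needed.

```python
import math


def pairwise_df(downstream_acceptors, k, l):
    """df = sum over exons i = k..l of (a_i - 1), a_i = downstream acceptor sites.

    downstream_acceptors is a hypothetical list indexed by exon number.
    """
    return sum(downstream_acceptors[i] - 1 for i in range(k, l + 1))


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    log_h = math.log(h)
    if df % 2 == 0:
        # Even df: sum_{j=0}^{df/2 - 1} exp(-h) h^j / j!
        total = sum(math.exp(-h + j * log_h - math.lgamma(j + 1))
                    for j in range(df // 2))
    else:
        # Odd df: erfc(sqrt(h)) + sum_{j=1}^{(df-1)/2} exp(-h) h^(j-1/2) / Gamma(j+1/2)
        total = math.erfc(math.sqrt(h)) + sum(
            math.exp(-h + (j - 0.5) * log_h - math.lgamma(j + 0.5))
            for j in range(1, (df - 1) // 2 + 1))
    return min(1.0, total)  # guard against rounding just above 1
```

With these helpers, 100 * chi2_sf(gamma, pairwise_df(a, k, l)) would give a p-value in percent, matching the units of the p-value (%) row in ratios.xls.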
putative_transcripts.m returns the number of putative transcripts (i.e. distinct paths through the ASG) associated with the current ASG, which has been read into Matlab using readwg.m.
count_downstream_paths.m is a helper function used by putative_transcripts.m; keep it in the same directory as putative_transcripts.m.
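Because the ASG is a directed acyclic graph, the number of distinct source-to-sink paths can be counted by dynamic programming: the count for a node is the sum of the counts of its destinations, and the sink contributes 1. The name count_downstream_paths.m suggests a computation along these lines, but the sketch below is an independent illustration, not a translation of the Matlab code. It assumes nodes are numbered 1..size with every edge pointing to a higher-numbered node (source = 1, sink = size); count_putative_transcripts is a hypothetical name.

```python
def count_putative_transcripts(edges, size):
    """Count distinct source-to-sink paths in an ASG.

    edges maps each node to a list of (destination, EST count) pairs;
    nodes are assumed to be numbered 1..size in topological order.
    """
    paths = [0] * (size + 1)  # paths[v] = number of paths from v to the sink
    paths[size] = 1           # the sink itself ends exactly one path
    for node in range(size - 1, 0, -1):
        total = 0
        for dest, _ests in edges.get(node, ()):
            if dest <= node:
                raise ValueError(f"edge {node}->{dest} is not downstream")
            total += paths[dest]
        paths[node] = total
    return paths[1]
```

Python integers do not overflow, so the count stays exact even for genes with very many putative transcripts. For a graph with two independent binary choices, such as {1: [(2, 1), (3, 1)], 2: [(4, 1), (5, 1)], 3: [(4, 1), (5, 1)], 4: [(6, 1)], 5: [(6, 1)]} with size 6, the count is 4.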
full_asg.py is a Python script that returns a p-value for an ASG, based on alpha = 1. Usage:
python full_asg.py m example.wg repeats
where m is the number of transcripts drawn for each repeat and repeats is the number of such repeats on which the p-value is based. The script operates on the ASG contained in example.wg and should be run in the same directory as wwdigraph.py and digraph.py, as well as your example.wg.