cisGenome software
The cisGenome software, written at Stanford University (Ji, Jiang et al. 2008), analyses ChIP-chip and ChIP-seq data to locate transcription factor binding sites, and then analyses the data further to find nearest genes, over-represented motifs, etc.
- The home page
- Tutorial on using cisGenome
- A users manual
- Information on using the core executables from the command line
- Information about the browser component
- The download page, including downloading genomes in cisGenome format
I have created my own pages with notes on specific aspects of cisGenome
The software consists of three components:
- A core set of command line executables
- A GUI for managing files and launching the executables
- A display server for displaying the results.
Installing the software: Windows.
As part of my work with this software I have made significant changes and enhancements.
Installers for the current software version: Version 12
Changes since Version 9
- New Installer for installing on Windows 7 through 10.
- Bug fix: Files not always closed
- Improved drawing of fonts in graphs and better use of space for axes
- Add ability to just put in one chromosome location
- Add support for drawing RNA motifs
Changes since Version 7
- Bug fixes
- Improved initialisation of sessions
Changes since Version 6
- Correct incompatibility between Windows and Firefox.
- Peakfinder update to allow processing of bigger files
Changes since Version 5:
- Display of region data in line format, together with the option of smoothing the data. This is particularly useful to see the average proportion of the genome that is unique given any specific read length.
Changes since Version 3:
- Display of cod files can be sorted by clicking in the header region
- Add support of TAIR genomes in browser
- Ability to copy complement of bases from the browser
- Further bug fixes relating to directories with spaces
Changes since Version 2:
- Bugfixes relating to displaying popups with information about regions
- Improved display of regions with no 'value' data
- Correction of further incidences where data and genome were offset by 1 nucleotide
Changes since Version 1:
- Bugfixes relating to directories with spaces.
- Correct display of regions which extend beyond the range of nucleotides being displayed.
- Optional offset to reads on the reverse strand when converting input data to barfiles.
Information on using the additional features I have introduced can be found here
This installer places the software in C:\program files\cisGenome by default, although alternative locations can be selected, including installing over an existing cisGenome installation.
If an alternative location to an existing installation is selected, the contents of the browser\wwwroot\sessions directory should be copied to the equivalent directory in the new location.
Apple Mac versions.
The Apple Mac version of the display server is available here. The zip file also includes bam2bar, which converts bam files to bar files so that they can be viewed on the display server.
The files should be unzipped into a suitable directory, and run.command then edited to set EXECUTABLE_PATH to the directory where the files sit. run.command should then be run from the command line. This may require changing its permissions using 'chmod a+x run.command'.
The display server runs a local web server on port 11111. To use the display server, open a web browser and go to the URL 127.0.0.1:11111.
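If the page does not load, it can help to check whether anything is listening on that port at all. The following is a minimal sketch in Python, not part of cisGenome; the function names are my own, and only the address and port come from the text above.

```python
import socket

HOST = "127.0.0.1"  # address quoted in the text above
PORT = 11111        # port quoted in the text above


def display_server_url(host: str = HOST, port: int = PORT) -> str:
    """Return the address to open in a web browser."""
    return f"http://{host}:{port}/"


def is_listening(host: str = HOST, port: int = PORT, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused or timed out: nothing usable is listening there.
        return False


if __name__ == "__main__":
    if is_listening():
        print("Display server reachable at", display_server_url())
    else:
        print("Nothing is listening on port", PORT)
```

This only tests that a TCP connection can be opened; it does not confirm that the program answering is the cisGenome display server.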
Web browsers
The original cisGenome browser used Java on the display server for some of the buttons. Most web browsers have since removed the ability to do this because of security concerns, so I have updated the code so that this 'active content' works in a different way: the resulting dialog boxes are generated by the underlying display server program rather than by the web page. This has been implemented for both Windows and Apple Macs.
CisGenome works best with Microsoft Internet Explorer and, if it is installed, will use it in preference to the default browser.
In order for display settings within the embedded UCSC genome pages to be persisted, cookies need to be enabled for these pages. With a UCSC genome browser subpage open in Internet Explorer, go to Tools/Webpage/Privacy policy to see whether cookies are being blocked and, if so, change the policy for the webpage to always allow cookies.
My changes to the software
Updates to existing Functionality
- The table sorter software has been redesigned to accept a much larger range of input file formats.
- The table sorter can handle much larger datasets (the original would not work with the supplied examples).
- The original software allows the boundaries of the binding region to be ‘refined’ to include just the region between the peaks of the read distributions in each direction (typically 30-80 nt), rather than the overall width of the feature (600-700 nt). Initial tests showed that this would often miss what appeared to be significant binding sites at the edge of this region, so an option was added to extend the region by ‘x’ nucleotides in either direction.
- A further improvement to the peak finder is the ability to apply different averaging/smoothing to the control data than to the signal data. Averaging over a greater distance makes the results less vulnerable to local noise in the control data.
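The two peak finder changes can be illustrated with a short Python sketch. This is not the cisGenome code (which is written in C++); the function names, the clamping of coordinates at zero, and the half-widths used in the example are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def refine_region(forward_peak: int, reverse_peak: int, extend: int = 0) -> Tuple[int, int]:
    """Refined binding region: the span between the forward-strand and
    reverse-strand read peaks, widened by `extend` nt on each side.
    Clamping the start at 0 is an assumption of this sketch."""
    start = min(forward_peak, reverse_peak) - extend
    end = max(forward_peak, reverse_peak) + extend
    return max(0, start), end


def moving_average(values: Sequence[float], half_width: int) -> List[float]:
    """Centred moving average over positions i-half_width .. i+half_width.
    The window is simply truncated at the ends of the data."""
    if half_width < 0:
        raise ValueError("half_width must not be negative")
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Peaks 50 nt apart, extended by a hypothetical x = 20 nt on each side.
print(refine_region(1000, 1050, extend=20))  # (980, 1070)

# Hypothetical read counts per bin: light smoothing for the signal,
# heavier smoothing for the control so one noisy bin matters less.
signal = [0, 2, 9, 14, 8, 1, 0]
control = [1, 0, 6, 1, 0, 1, 0]
print(moving_average(signal, half_width=1))
print(moving_average(control, half_width=3))
```

With the wider window, the isolated spike in the control is spread over its neighbours rather than suppressing a genuine signal peak at a single position.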
Software Structure: GUI
- Large amounts of the code have been tidied up to:
  - Remove code that is unnecessary because the function is inherent in an MFC environment.
  - Restructure the classes associated with a multiple document type environment to make them more consistent with a classic MFC application of this type.
Both of these changes should make the software easier to maintain, because it now conforms more closely to standard MFC software design.
Software Structure: Browser
- The original browser code was built using the Microsoft Foundation Classes (MFC) in Visual C++ 7. Changes have been made to reduce the dependency on Microsoft-specific libraries, so that the code can now be compiled and run on non-Microsoft platforms. It has been run successfully on the System Biology server. The changes include:
  - Changing the code to work with the Visual C++ 6 version of MFC, which is freely available in the Microsoft Platform SDK.
  - Changing the code that used MFC for drawing gifs to use the open source, multi-platform pngWriter and FreeType code.
  - Writing a small class to handle the reading and writing of config data in '.ini' format, replacing the MS system calls that were used previously.
- The only remaining code that uses MFC is the code for displaying the icon in the toolbar. This is redundant in a non-Microsoft environment, where information on what code is running is usually obtained from command line tools rather than shown graphically.
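To give an idea of how little is needed to replace the system calls for '.ini' data, here is a minimal sketch of such a class. It is written in Python for brevity rather than the C++ of the browser itself, and the class name, method names, and the section and key names in the example are hypothetical. (Python ships its own configparser module; the class is written by hand here only to mirror the idea.)

```python
from typing import Dict, Optional


class IniFile:
    """Minimal reader/writer for '[section]' and 'key=value' text."""

    def __init__(self) -> None:
        self.sections: Dict[str, Dict[str, str]] = {}

    def loads(self, text: str) -> None:
        """Parse ini text, skipping blank lines and ';' or '#' comments."""
        section: Optional[str] = None
        for raw in text.splitlines():
            line = raw.strip()
            if not line or line[0] in ";#":
                continue
            if line.startswith("[") and line.endswith("]"):
                section = line[1:-1].strip()
                self.sections.setdefault(section, {})
            elif "=" in line and section is not None:
                key, value = line.split("=", 1)
                self.sections[section][key.strip()] = value.strip()

    def get(self, section: str, key: str, default: str = "") -> str:
        return self.sections.get(section, {}).get(key, default)

    def set(self, section: str, key: str, value: object) -> None:
        self.sections.setdefault(section, {})[key] = str(value)

    def dumps(self) -> str:
        """Serialise back to ini text, one blank line after each section."""
        lines = []
        for section, items in self.sections.items():
            lines.append(f"[{section}]")
            lines.extend(f"{key}={value}" for key, value in items.items())
            lines.append("")
        return "\n".join(lines)


ini = IniFile()
ini.loads("[display]\nwidth = 800\n; a comment\nheight=600\n")  # hypothetical keys
ini.set("display", "width", 1024)
print(ini.dumps())
```

Comments are dropped on a read/write round trip in this sketch, which the original system calls may or may not have preserved.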
Usability (GUI only)
- The GUI only asks if you want to save a project file if you have made changes.
- The dialog box for selecting a directory (used when configuring a number of operations) opens in the current directory and allows the creation of new directories.
- Some dialog boxes have been widened so file names are not truncated
- Most of the functions have been made available using right click menus from the project explorer.
- A framework has been introduced that allows the previously used values for dialog boxes to be persisted so that they can be used next time the dialog box is run. This has only been added to a couple of dialog boxes.
- The code has been changed so that it can operate with data in directories whose names contain spaces, rather than being subject to the Unix-style limitation of not allowing spaces.
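The usual cause of trouble with spaces is a command line assembled as one string and then split on whitespace. As a general illustration (not the approach taken inside the cisGenome GUI, which is a C++ program), the Python sketch below passes each argument as a separate list element so that a path containing spaces arrives intact. The helper name and the example path are hypothetical.

```python
import subprocess
import sys
from typing import List


def run_tool(executable: str, args: List[str]) -> str:
    """Run a program with every argument passed as its own list element,
    so no manual quoting of spaces is needed."""
    result = subprocess.run([executable, *args],
                            capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical data file in a directory whose name contains spaces.
path_with_spaces = "my data/sample 1.bar"

# Stand-in for a core executable: a tiny script that reports what it received.
echo_script = "import sys; print(len(sys.argv) - 1); print(sys.argv[1])"
print(run_tool(sys.executable, ["-c", echo_script, path_with_spaces]))
```

The child process reports a single argument, showing that the path was not split at the spaces.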
Logging and error messages
- A log file is created in each directory with a copy of every command used on files in the directory, together with the date and time, and any warnings and errors.
- If an error occurs when a core program is run, any output from the program is displayed in a dialog box, or if there is no output, a dialog box indicating that an error occurred is displayed.
- The last command to be run is copied into the clipboard so that it is available to be used as the command line parameters when debugging the program, or so that the command can be run ‘stand-alone’ rather than from the GUI.
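The logging and error reporting behaviour described above can be sketched as follows. This is an illustrative Python outline under my own assumptions, not the GUI's actual code: the log file name, the line layout, and the function names are hypothetical, and the clipboard step is left out because it is platform specific.

```python
import datetime
import os
import shlex
import subprocess
from typing import List

LOG_NAME = "commands.log"  # hypothetical name; the real file name may differ


def run_and_log(command: List[str], directory: str) -> subprocess.CompletedProcess:
    """Run a command in `directory` and append it, with a timestamp and any
    error output, to a log file in that same directory."""
    result = subprocess.run(command, cwd=directory,
                            capture_output=True, text=True)
    stamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(os.path.join(directory, LOG_NAME), "a", encoding="utf-8") as log:
        # shlex.join uses POSIX quoting; it is used here only for readability.
        log.write(f"{stamp}\t{shlex.join(command)}\n")
        for line in result.stderr.splitlines():
            log.write(f"{stamp}\tstderr\t{line}\n")
        if result.returncode != 0:
            log.write(f"{stamp}\texit status {result.returncode}\n")
    return result


def error_message(result: subprocess.CompletedProcess) -> str:
    """Text for the error dialog: the program's own output if there is any,
    otherwise a generic notice that an error occurred."""
    output = (result.stdout + result.stderr).strip()
    return output or f"An error occurred (exit status {result.returncode})."
```

A caller would show `error_message(result)` only when `result.returncode` is non-zero; the log line holding the full command is also what one would paste into a debugger or a terminal to rerun the step outside the GUI.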
New functionality
- The regionStats program has been introduced. It is a general purpose framework for analysing statistics associated with regions and outputting the results as cod files for use by cisGenome, and as csv files for analysis in Excel etc.
- A new series of classes (cisGenomeTools) are being introduced to expose the cisGenome functions in a more object oriented way, in order to make it easier to develop applications.