Using state space modeling to infer gene regulatory networks requires gene expression profile data from highly replicated, high resolution time series.
To this end, a time series experiment was conducted in which four biological replicates of both B. cinerea infected and mock-infected A. thaliana leaves were harvested every two hours over 48 hours, giving a total of 24 time points. aRNA from these 192 samples was labeled and co-hybridized in pairs onto 288 CATMA arrays. The complex loop design allows replicates to be compared within the same time point as well as between different time points.
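The sample and array counts above can be checked with a few lines of arithmetic. This is only a sketch: the assumption that each array carries two samples (one per dye) and that every sample is used equally often is an inference from the two-colour loop design, not a stated detail of the experiment.

```python
# Design arithmetic for the time series experiment described above.
REPLICATES = 4          # biological replicates per treatment and time point
TREATMENTS = 2          # B. cinerea infected and mock-infected
TIME_POINTS = 24        # one harvest every two hours over 48 hours
ARRAYS = 288            # CATMA arrays used
CHANNELS_PER_ARRAY = 2  # assumption: two-colour arrays, one sample per dye

samples = REPLICATES * TREATMENTS * TIME_POINTS
hybridizations = ARRAYS * CHANNELS_PER_ARRAY

# Assumption: the loop design uses every sample the same number of times.
uses_per_sample, remainder = divmod(hybridizations, samples)

print(samples, hybridizations, uses_per_sample, remainder)
```

Under these assumptions the 288 arrays provide room for each of the 192 samples to be hybridized three times, which is what makes comparisons both within and between time points possible.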
The data gathered from this time series experiment requires several steps of in silico manipulation, which are outlined in the flow diagram below.
Spot quantitation was done in ImaGene to obtain raw intensity and background corrected values for individual genes on each of the 288 arrays.
Quality checking was done to identify bad data, which will be excluded or repeated. The data was then transformed to remove unwanted technical variation such as dye and sample bias (see Fig. 4). Absolute expression values of the transformed data then undergo ‘model fitting’ to reveal to what degree the different parameters within the experimental design contribute to the overall variation. All these manipulations are done in a modified version of the R package MAANOVA (Wu et al., 2003).
Figure 4. (A) Typical RI plot for an array showing data scatter before lowess transformation. The red line shows the regression curve to which the data is adjusted. (B) RI plot for the same array after lowess transformation.
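The idea behind the lowess transformation in Fig. 4 can be illustrated with a small Python sketch. It is not the MAANOVA implementation: the function names are hypothetical, the RI convention used here (I = log2(red × green), R = log2(red / green)) is one common choice and is assumed rather than taken from the package, and the smoother is a plain locally weighted linear fit without robustness iterations.

```python
import math

def ri_values(red, green):
    """Return (intensity, ratio) per spot, assuming I = log2(R*G) and R = log2(R/G)."""
    intensity = [math.log2(r * g) for r, g in zip(red, green)]
    ratio = [math.log2(r / g) for r, g in zip(red, green)]
    return intensity, ratio

def lowess_fit(x, y, frac=0.5):
    """Locally weighted linear fit of y on x, evaluated at every x."""
    n = len(x)
    k = max(2, math.ceil(frac * n))  # number of neighbours in each local window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        # Tricube weights: close spots count most, spots beyond h count zero.
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def normalize_ratios(red, green, frac=0.5):
    """Subtract the intensity-dependent trend from the log ratios."""
    intensity, ratio = ri_values(red, green)
    trend = lowess_fit(intensity, ratio, frac)
    return [r - t for r, t in zip(ratio, trend)]
```

The fitted trend plays the role of the red line in Fig. 4A; subtracting it centers the log ratios around zero across the intensity range, as in Fig. 4B.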
A second R package, Timecourse (Tai and Speed, 2006), will be used to identify genes that are significantly differentially regulated over time in B. cinerea infected compared to mock-infected leaf tissue.
Hierarchical clustering will be conducted in SplineCluster (Heard et al., 2005) to group genes that are differentially expressed over time into clusters with similar temporal expression profiles (Fig. 5).
Figure 5. An example of 16 clusters, each containing a group of genes with similar temporal expression profiles. Cluster analysis was performed in SplineCluster.
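To make the idea of grouping genes by temporal profile concrete, the sketch below clusters a handful of profiles in Python. It deliberately swaps in a much simpler technique than SplineCluster, namely average-linkage agglomerative clustering on a correlation distance, so it only illustrates the principle and not the Bayesian spline model. Function names are hypothetical.

```python
import math

def correlation_distance(a, b):
    """1 - Pearson correlation between two expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 1.0  # flat profile: treat as uncorrelated
    return 1.0 - cov / math.sqrt(va * vb)

def cluster_profiles(profiles, n_clusters):
    """Average-linkage agglomerative clustering.

    profiles maps a gene identifier to its list of expression values,
    one value per time point. Returns a list of lists of gene identifiers.
    """
    clusters = [[gene] for gene in profiles]

    def linkage(c1, c2):
        d = [correlation_distance(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(d) / len(d)

    while len(clusters) > n_clusters:
        # Find and merge the two closest clusters.
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Correlation distance groups genes by the shape of their profile rather than by absolute expression level, which matches the aim of finding genes with similar temporal patterns.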
State space modeling (SSM) can only model 50-100 genes at once. Therefore, prior information gained from the literature and from gene ontologies (using FatiGO or GOStat) will be used as selection criteria for choosing genes from the clusters identified in SplineCluster for use in SSM.
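One way such a selection step could look is sketched below in Python. The data structures, the function name and the ranking rule (genes with more matching ontology terms first) are all assumptions made for illustration; the actual criteria would come from the literature and from the FatiGO or GOStat results.

```python
MAX_SSM_GENES = 100  # upper end of the 50-100 gene range stated above

def select_genes_for_ssm(clusters, annotations, terms_of_interest,
                         max_genes=MAX_SSM_GENES):
    """Pick genes whose ontology annotations overlap the terms of interest.

    clusters:          dict mapping cluster id -> list of gene identifiers
    annotations:       dict mapping gene identifier -> set of ontology terms
    terms_of_interest: set of ontology terms chosen from prior knowledge
    Returns at most max_genes (gene, cluster id) pairs.
    """
    scored = []
    for cluster_id, genes in clusters.items():
        for gene in genes:
            hits = annotations.get(gene, set()) & terms_of_interest
            if hits:
                # Assumption: more matching terms means higher priority.
                scored.append((-len(hits), gene, cluster_id))
    scored.sort()  # most hits first, ties broken by gene identifier
    return [(gene, cluster_id) for _, gene, cluster_id in scored[:max_genes]]
```

Keeping the cluster id alongside each selected gene makes it easy to check that the final gene set still represents the different temporal patterns found during clustering.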
Key regulatory genes or ‘hubs’ (Fig. 6) identified in the resulting preliminary transcriptional gene regulatory networks will be subjected to experimental validation. The function of each key regulatory gene will be investigated using knockout or overexpression lines, testing both the susceptibility of these mutants to B. cinerea and the influence of the regulatory gene on its predicted targets at the transcript or protein level. These experimental results will subsequently be used as prior information to retrain the existing transcriptional gene regulatory networks in an iterative fashion using SSM.
Figure 6. An example of a complex gene regulatory network, with key regulatory genes or ‘hubs’ indicated by black squares.
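As a rough illustration of how hubs might be picked out of an inferred network, the Python sketch below ranks regulators by the number of distinct targets they are predicted to control. The text does not define a hub criterion, so the out-degree rule, the threshold and the function name are assumptions.

```python
def find_hubs(edges, min_targets=3):
    """Return regulators with at least min_targets distinct predicted targets.

    edges: iterable of (regulator, target) pairs from an inferred network.
    Returns (regulator, number of targets) pairs, largest first.
    """
    targets = {}
    for regulator, target in edges:
        targets.setdefault(regulator, set()).add(target)  # ignore duplicate edges
    hubs = [(reg, len(t)) for reg, t in targets.items() if len(t) >= min_targets]
    # Sort by number of targets (descending), then by name for a stable order.
    return sorted(hubs, key=lambda h: (-h[1], h[0]))
```

Regulators near the top of this list would be natural first candidates for the knockout and overexpression experiments described above.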
- Wu, H., et al., MAANOVA: A Software Package for the Analysis of Spotted cDNA Microarray Experiments, in The Analysis of Gene Expression Data, G. Parmigiani, E.S. Garrett, R.A. Irizarry, and S.L. Zeger, Editors. 2003, Springer London. p. 313-341.
- Tai, Y.C. and T.P. Speed, A multivariate empirical Bayes statistic for replicated microarray time course data. Annals of Statistics, 2006. 34(5): p. 2387-2412.
- Heard, N.A., et al., Bayesian coclustering of Anopheles gene expression time series: Study of immune defense response to multiple experimental challenges. Proceedings of the National Academy of Sciences, 2005. 102(47): p. 16939-16944.