Prof Odd Aalen, Dept of Biostatistics, University of Oslo (3-4pm Rm L4, Science Concourse Main Level) Statistics, causality and dynamic path modelling
Prof Geert Molenberghs, Centre for Statistics, Hasselt University, Belgium (4.30-5.30 Rm LIB1, Library) Model assessment and sensitivity analysis when data are incomplete
Prof Wilfrid Kendall, University of Warwick Short-length routes in low-cost networks via Poisson line patterns (joint work with David Aldous)
How efficiently can one move about in a network linking a configuration of n cities? Here the notion of "efficient" has to balance (a) total network length against (b) short network distances between cities. My talk will explain how to use Poisson line processes to produce networks which are nearly of shortest total length and which make the average inter-city distance almost Euclidean.
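For readers who want to experiment, a minimal sketch of simulating a Poisson line process hitting a disc (the intensity is an assumed value; this is background to the construction, not the speaker's network algorithm):

    # Minimal sketch (assumed intensity): a Poisson line process hitting a
    # disc of radius R, via the standard (p, theta) parameterisation of a
    # line, x*cos(theta) + y*sin(theta) = p.
    set.seed(1)
    R <- 1; lambda <- 10                       # disc radius, line intensity
    n <- rpois(1, lambda * 2 * R * pi)         # mean measure of [-R,R] x [0,pi)
    p <- runif(n, -R, R); theta <- runif(n, 0, pi)
    plot(NA, xlim = c(-R, R), ylim = c(-R, R), asp = 1, xlab = "", ylab = "")
    for (i in seq_len(n)) {
      x0 <- p[i] * cos(theta[i]); y0 <- p[i] * sin(theta[i])  # foot of perpendicular
      dx <- -sin(theta[i]); dy <- cos(theta[i])               # line direction
      segments(x0 - 2*R*dx, y0 - 2*R*dy, x0 + 2*R*dx, y0 + 2*R*dy)
    }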
Dr Alex Schmidt, Instituto de Matematica - UFRJ, Brazil Modelling multiple series of runoff: The case of Rio Grande Basin (joint work with Romy R Ravines and Helio S Migon)
This paper proposes a joint model for the rainfall and multiple
series of runoff at a basin, two of the most important hydrological
processes. The proposed model takes into account the different spatial units
in which these variables are measured, and as a natural benefit its
parameters have physical interpretations. Also, we propose to model runoff
and rainfall in their original scales, making no use of any transformation
to reach normality of the data. More specifically, our proposal follows
Bayesian dynamic nonlinear models through the use of transfer function
models. The resultant posterior distribution has no analytical solution and
stochastic simulation methods are needed to obtain samples from the target
distribution. In particular, as the parameters of the dynamic model are
highly correlated, we make use of the Conjugate Updating Backward Sampling
recently proposed by Ravines, Migon and Schmidt (2007), in order to
efficiently explore the space of the parameters. We analyze a sample from a
basin located in the Northeast of Brazil, the Rio Grande Basin. The data
consist of monthly recorded series from January 1984 to September 2004,
at three runoff stations and nine rainfall monitoring stations, irregularly
located in an area of drainage of 37,500 sq km. Model assessment,
spatial interpolation and temporal predictions are part of our analysis.
Results show that our approach is a promising tool for runoff-rainfall analysis.
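As a rough illustration of the transfer-function idea, a stylised sketch (assumed parameter values, not the authors' model): rainfall feeds a latent runoff level with geometric memory, and runoff is observed on its original positive scale through a gamma law.

    # Stylised first-order transfer-function simulation (illustrative only;
    # rho, gamma0 and phi are assumed values).
    set.seed(1)
    T_len <- 240                                   # 20 years of monthly data
    rho <- 0.6; gamma0 <- 0.8; phi <- 20           # memory, gain, precision
    x <- rgamma(T_len, shape = 2, rate = 0.02)     # synthetic monthly rainfall
    E <- numeric(T_len); E[1] <- gamma0 * x[1]
    for (t in 2:T_len) E[t] <- rho * E[t - 1] + gamma0 * x[t]
    y <- rgamma(T_len, shape = phi, rate = phi / E)  # runoff with mean E_t

In the analysis described in the abstract such quantities are of course inferred, within a Bayesian dynamic model, rather than fixed.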
Prof Valentine Genon-Catalot, Paris 5 Explicit filtering of discretized diffusions
Consider a pair signal-observation ((x_n, y_n), n >= 0), where the unobserved signal (x_n) is a Markov chain and the observed component is such that, given the whole sequence (x_n), the random variables (y_n) are independent and the conditional distribution of y_n depends only on the corresponding state variable x_n. Concrete problems raised by these observations are the prediction, filtering and smoothing of (x_n). This requires the computation of the conditional distributions of x_l given y_n, ..., y_1, y_0 for all l, n. We introduce sufficient conditions that allow us to obtain explicit formulae for these conditional distributions, and we extend the notion of finite-dimensional filters using mixtures of distributions. The method is applied to the case where the signal x_n = X_{nΔ} is a discrete sampling of a one-dimensional diffusion process; concrete models are proved to satisfy our conditions. Moreover, for these models, exact likelihood inference based on the observation (y_0, ..., y_n) is feasible.
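For orientation, the computation in question is the standard prediction-correction recursion (generic notation, not specific to this talk): writing Q(x', dx) for the transition kernel of (x_n) and g(y | x) for the conditional density of y_n given x_n = x,

    \nu_{n|n-1}(dx) = \int Q(x', dx)\, \nu_{n-1|n-1}(dx'), \qquad
    \nu_{n|n}(dx) = \frac{g(y_n \mid x)\, \nu_{n|n-1}(dx)}
                         {\int g(y_n \mid x')\, \nu_{n|n-1}(dx')},

where \nu_{n|n} denotes the conditional law of x_n given y_0, ..., y_n. Roughly, the sufficient conditions are those under which these measures remain finite mixtures within a fixed parametric family, so the recursion reduces to updating finitely many mixture parameters.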
Dr Daniel Farewell, Cardiff University Simple models for informatively censored longitudinal data
Models for longitudinal measurements truncated by possibly informative dropout have tended to be either mathematically complex or computationally demanding. I will review an alternative recently proposed in our RSS discussion paper (Diggle et al. 2007), using simple ideas from event-history analysis (where censoring is commonplace) to yield moment-based estimators for balanced, continuous longitudinal data. I shall then discuss some work in progress: extending these ideas to more general longitudinal data, while maintaining simplicity of understanding and implementation.
Prof Gareth Roberts, University of Warwick (Joint Statistics/Econometrics Seminar) This presentation will review recent work in the area of Bayesian inference
for discretely (and partially) observed diffusions. I will concentrate on
Bayesian approaches which necessitate the use of Markov chain Monte Carlo
techniques, and the talk will consider the problem of MCMC algorithm
design. The approach will be presented in a continuous-time framework.
Dr Sumeet Singh, Signal Processing Laboratory, Cambridge Filters for spatial point processes
We consider the inference of a hidden spatial point process (PP) X on a CSMS (complete separable metric space) X, from a noisy observation y modelled as the realisation of another spatial PP Y on a CSMS Y. We consider a general model for the observed process Y which includes thinning and displacement, and characterise the posterior distribution of X for a Poisson and a Gauss-Poisson prior. These results are then applied in a filtering context when the hidden process evolves in discrete time in a Markovian fashion. The dynamics of X considered are general enough for many target tracking applications, an important study area in engineering. Accompanying numerical implementations based on Sequential Monte Carlo will be presented.
Professor Malcolm Faddy, Queensland University of Technology, Australia Analysing hospital length of stay data: models that fit, models that don’t and does it matter?
Hospital length of stay data typically show
a distribution with a mode near zero and a long right tail, and can be hard to
model adequately. Traditional models include the gamma and log-normal
distributions, both with a quadratic variance-mean relationship. Phase-type distributions, which describe the length of time to absorption of a Markov chain with a single absorbing state, also have a quadratic variance-mean relationship.
Covariates of interest include an estimate of the length of stay for an
uncomplicated admission, with excess length of stay modelled relative to this
quantity either multiplicatively or additively. A number of different models
can therefore be constructed, and the results of fitting these models will be
discussed in terms of goodness of fit, significance of covariate effects and
estimation of quantities of interest to health economists.
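As a toy illustration of the phase-type idea (all rates and the branching probability are assumed values), one can simulate lengths of stay as times to absorption of a two-phase Coxian chain:

    # Toy phase-type simulation (assumed rates): length of stay as time to
    # absorption of a two-phase Coxian continuous-time Markov chain.
    set.seed(1)
    rcoxian2 <- function(n, lambda1, lambda2, p12) {
      t1 <- rexp(n, lambda1)               # time spent in phase 1
      go2 <- runif(n) < p12                # move to phase 2, else absorbed
      t1 + ifelse(go2, rexp(n, lambda2), 0)
    }
    los <- rcoxian2(10000, lambda1 = 0.5, lambda2 = 0.1, p12 = 0.3)
    hist(los, breaks = 50)                 # mode near zero, long right tail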
Dr Elena Kulinskaya, Statistical Advisory Service, Imperial College Meta analysis on the right scale
This talk is about an approach to meta analysis
and to statistical evidence developed jointly with Stephan Morgenthaler and
Robert Staudte, and now written up in our book 'Meta Analysis: A Guide to Calibrating and Combining Statistical Evidence', to be published by Wiley
very soon. The traditional ways of measuring evidence, in particular with
p-values, are neither intuitive nor useful when it comes to making
comparisons between experimental results, or when combining them. We
measure evidence for an alternative hypothesis, not evidence against a null.
To do this, we have in a sense adopted standardized scores for
the calibration scale. Evidence for us is simply a transformation of a
test statistic S to another one (called evidence T=T(S)) whose
distribution is close to normal with variance 1, and whose mean grows from 0
with the parameter as it moves away from the null. Variance stabilization is
used to arrive at this scale. For meta analysis, the results from
different studies are transformed to a common calibration scale, where it
is simpler to combine and interpret them.
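A minimal sketch of the calibration idea for one familiar case, the binomial proportion, where the arcsine transformation stabilises the variance (the study counts are hypothetical, and the combination rule shown is the natural weighting on this scale, not the book's full treatment):

    # Sketch of evidence on a variance-stabilised scale for binomial
    # proportions (hypothetical counts; not the book's full machinery).
    evidence_prop <- function(x, n, p0) {
      2 * sqrt(n) * (asin(sqrt(x / n)) - asin(sqrt(p0)))  # approx N(mean, 1)
    }
    x <- c(30, 45, 18); n <- c(50, 80, 40)        # three hypothetical studies
    T_i <- evidence_prop(x, n, p0 = 0.5)          # per-study evidence for p > 0.5
    T_comb <- sum(sqrt(n) * T_i) / sqrt(sum(n))   # combined, still ~ N(mean, 1)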
Dr Richard Samworth, Statistical Laboratory, Cambridge Computing the maximum likelihood estimator of a multidimensional log-concave density
We show that if
$X_1,...,X_n$ are a random sample from a log-concave density $f$ in
$\mathbb{R}^d$, then with probability one there exists a unique maximum
likelihood estimator $\hat{f}_n$ of $f$. The use of this estimator is
attractive because, unlike kernel density estimation, the estimator is fully
automatic, with no smoothing parameters to choose. The existence proof is
non-constructive, however, and in practice we require an iterative algorithm
that converges to the estimator. By reformulating the problem as one of
non-differentiable convex optimisation, we are able to exhibit such an
algorithm. We will also show how the method can be combined with the EM
algorithm to fit finite mixtures of log-concave densities. The talk will be
illustrated with pictures from the R package LogConcDEAD.
This is joint work with Madeleine Cule (Cambridge), Bobby Gramacy (Cambridge) and Michael Stewart (University of Sydney).
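A short usage sketch of the package mentioned at the end of the abstract (assuming the package's mlelcd() interface; details may vary between versions):

    # Usage sketch (assuming the package's mlelcd() interface).
    library(LogConcDEAD)
    set.seed(1)
    x <- matrix(rnorm(200), ncol = 2)     # n = 100 points in R^2
    fit <- mlelcd(x)                      # the log-concave MLE
    plot(fit)                             # contours of the fitted density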
Dr Robert Gramacy, Statistical Laboratory, Cambridge Importance Tempering Simulated tempering (ST) is an
established Markov Chain Monte Carlo (MCMC) methodology for sampling from a
multimodal density pi(theta). The technique involves introducing an
auxiliary variable k taking values in a finite subset of [0,1] and indexing a
set of tempered distributions, say pi_k(theta) = pi(theta)^k. Small
values of k encourage better mixing, but samples from pi are only
obtained when the joint chain for (theta,k) reaches k=1. However, the
entire chain can be used to estimate expectations under pi of functions
of interest, provided that importance sampling (IS) weights
are calculated. Unfortunately this method, which we call
importance tempering (IT), has tended not to work well in practice. This is
partly because the most immediately obvious implementation is naive and
can lead to high variance estimators. We derive a new optimal method
for combining multiple IS estimators and prove that this
optimal combination has a highly desirable property related to the notion
of effective sample size. The methodology is applied in two
modelling scenarios requiring reversible-jump MCMC, where the naive approach
to IT fails: model averaging in treed models, and model selection
for mark-recapture data.
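A toy illustration of the naive IT weighting that the talk takes as its starting point, using a target where the tempered laws are known exactly (pi = N(0,1), so pi^k is proportional to N(0, 1/k); in real use theta and k would come from the ST chain, and the ladder here is an assumed one):

    # Toy illustration of the naive IT weights; pi = N(0,1), so the
    # tempered law pi^k / c_k is exactly N(0, 1/k).
    set.seed(1)
    ks <- c(0.2, 0.5, 1)                           # temperature ladder (assumed)
    k <- sample(ks, 3000, replace = TRUE)
    theta <- rnorm(3000, 0, 1 / sqrt(k))           # draws from pi^k / c_k
    log_pi <- dnorm(theta, log = TRUE)             # log pi at the draws
    log_ck <- (1 - k) / 2 * log(2 * pi) - log(k) / 2   # c_k = integral of pi^k
    lw <- (1 - k) * log_pi + log_ck                # log weight: pi / (pi^k / c_k)
    w <- exp(lw - max(lw)); w <- w / sum(w)        # self-normalised IS weights
    sum(w * theta^2)                               # estimates E_pi[theta^2] = 1
    1 / sum(w^2)                                   # effective sample size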
Professor Simon Wood, University of Bath Generalized Additive Smooth Modelling Generalized Additive Models are
GLMs in which the linear predictor is made up, partly, of a sum of smooth
functions of predictor variables. I will talk about the penalized regression
spline approach to GAM, as implemented, for example, in R package mgcv. In
particular I will focus on two interesting aspects: low rank representation
of smooth functions of several covariates and stable computation of both
model coefficients and smoothing parameters. More than half the slides will
have pictures.
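A minimal mgcv example in the spirit of the talk, using the package's own simulated data:

    # Penalized regression spline smooths, including a low-rank smooth of
    # two covariates, with automatic smoothing parameter selection.
    library(mgcv)
    set.seed(1)
    dat <- gamSim(1, n = 400)                      # package's simulated data
    b <- gam(y ~ s(x0) + s(x1, x2, k = 40), data = dat, method = "REML")
    summary(b)
    plot(b, pages = 1)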
Terry Speed, University of California, Berkeley Alternative Splicing in Tumors: Detection and Interpretation In this talk I will discuss using the Affymetrix GeneChip Human Exon
and Human Gene 1.0 ST arrays for the detection of genes spliced differently
in some tumors in comparison with others. I plan to begin by introducing
the arrays and the expression data they produce. Next I will outline the
way in which we use such data in our attempts to identify
exon-tumor combinations exhibiting splicing patterns different from the
majority. This will be illustrated by examples from publicly available tissue
and mixture data. Then I will briefly discuss some of the additional
issues which arise when we seek to enumerate such alternative splicing
patterns on a genome-wide scale. Finally, I will exhibit some of the results
we have found applying these methods to glioblastoma tissue samples
collected as part of The Cancer Genome Atlas (TCGA) project. (This is
joint work with Elizabeth Purdom, Mark Robinson, Ken Simpson, and members of
the Berkeley Cancer Genome Center.)
Alexey Koloydenko & Juri Lember (Joint Talk), University of Nottingham Adjusted Viterbi Training for Hidden Markov Models The Expectation Maximisation (EM) procedure is a principal
tool for parameter estimation in hidden Markov models (HMMs). However, in
applications EM is sometimes replaced by Viterbi training, or extraction (VT). VT is computationally less intensive and more stable, and has more of an intuitive appeal, but VT estimation is biased and fails the natural fixed point property: hypothetically, given an infinitely large sample and initialised at the true parameters, VT will generally move away from the initial values. We propose adjusted Viterbi training (VA), a new
method to restore the fixed point property and thus alleviate the
overall imprecision of the VT estimators, while preserving the
computational advantages of the baseline VT algorithm. Simulations show
that VA indeed improves estimation precision appreciably in both the
special case of mixture models and more general HMMs.
We will discuss
the main idea of the adjusted Viterbi training. This will also touch on
tools developed specifically to analyze asymptotic behaviour of maximum a
posteriori (MAP) hidden paths, also known as Viterbi alignments. Our VA
correction is analytic and relies on infinite Viterbi alignments and
associated limiting probability distributions. While explicit in the special
case of mixture models, the existence of these limiting measures is not obvious for more general HMMs. We will conclude by presenting a result showing that, under certain mild conditions, general (discrete time) HMMs do possess the
limiting distributions required for the construction of VA.
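A minimal sketch of baseline VT in its simplest special case, a two-component Gaussian mixture with known unit variances (all settings are assumed): a hard "most likely component" step alternates with re-estimation, and the resulting estimates illustrate the bias that VA is designed to correct.

    # Baseline VT for a two-component Gaussian mixture (assumed settings).
    set.seed(1)
    x <- c(rnorm(300, -2), rnorm(300, 2))
    mu <- c(-1, 1); p <- c(0.5, 0.5)               # crude initial values
    for (iter in 1:50) {
      z <- p[2] * dnorm(x, mu[2]) > p[1] * dnorm(x, mu[1])  # hard assignment
      mu <- c(mean(x[!z]), mean(x[z]))             # EM-style update on the
      p <- c(mean(!z), mean(z))                    # hard-assigned sample
    }
    mu    # settles near, but generally not exactly at, (-2, 2)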
Dr Cliona Golden, UCD, Dublin On the validity of ICA for fMRI data Functional
Magnetic Resonance Imaging (fMRI) is a brain-imaging technique which, over
time, records changes in blood oxygenation level that can be associated with
underlying neural activity. However, fMRI images are very noisy and
extracting useful information from them calls for a variety of methods of
analysis.
I will discuss the validity of the use of two popular
Independent Component Analysis (ICA) algorithms, InfoMax and FastICA, which
are commonly used for fMRI data analysis.
Tests of the two algorithms
on simulated, as well as real, fMRI data, suggest that their successes are
related to their ability to detect "sparsity" rather than the independence
which ICA is designed to seek.
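A short sketch with the fastICA package (one of the two algorithms discussed), recovering two synthetic sources, one heavy-tailed ("sparse-ish") and one light-tailed, from linear mixtures:

    # Unmix two synthetic sources from two linear mixtures with fastICA.
    library(fastICA)
    set.seed(1)
    s1 <- rexp(1000) * sample(c(-1, 1), 1000, replace = TRUE)   # heavy-tailed
    s2 <- runif(1000, -1, 1)                                    # light-tailed
    X <- cbind(s1, s2) %*% matrix(c(0.6, 0.4, 0.4, 0.6), 2, 2)  # mixtures
    ica <- fastICA(X, n.comp = 2)
    cor(ica$S, cbind(s1, s2))        # recovered components vs true sources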
Prof Antony Pettitt, Lancaster University Statistical inference for
assessing infection control measures for the transmission of pathogens in
hospitals Patients can acquire infections from pathogen sources within hospitals, and certain pathogens appear to be found mainly in hospitals. Methicillin-resistant Staphylococcus aureus (MRSA) is an example of a hospital-acquired pathogen that continues to be of particular concern to patients and hospital management. Patients infected with MRSA can develop severe infections which lead to increased patient morbidity and costs for the hospital. Pathogen transmission to a patient can occur via health-care workers who do not regularly perform hand hygiene. Infection control measures that can be considered include isolation for colonised patients and improved hand hygiene for health-care workers. The talk develops statistical methods and models in order to assess the effectiveness of the two control measures: (i) isolation and (ii) improved hand hygiene. For isolation, data from a prospective study carried out in a London hospital are considered, and statistical models based on detailed patient data are used to determine the effectiveness of isolation. The approach is Bayesian. For hand hygiene it is not possible, for ethical and practical reasons, to carry out a prospective study to investigate various levels of hand hygiene. Instead, hand hygiene effects are investigated by simulation using parameter values estimated from data on health-care worker hand hygiene and weekly colonisation incidence collected from a hospital ward in Brisbane. The approach uses profile likelihoods. Both approaches involve transmission parameters for which there is little information available, and contrasting compromises have to be made. Conclusions about the effectiveness of the two infection control measures will be discussed. The talk involves collaborative work with Marie Forrester, Emma McBryde, Chris Drovandi, Ben Cooper, Gavin Gibson and Sean McElwain.
Professor Jay Kadane, Carnegie Mellon University Driving While Black: Statisticians Measure Discriminatory Law Enforcement (joint work with John Lamberth) The US Constitution guarantees "equal protection under the
law" regardless of race, but sometimes law enforcement practices have
failed to adhere to this standard. In the 1990s, a suit was brought alleging that the New Jersey State Police were stopping Blacks at disproportionately high rates in the southern end of the New Jersey Turnpike. In this talk I review the evidence in that case, the decision, and its immediate aftermath; discuss criticisms of that decision; examine new evidence that rebuts those criticisms; and comment on the extent to which the Constitutional standard is now being met.
Alastair Young, Imperial College London Objective Bayes and Conditional Inference In Bayesian parametric inference, in the absence of
subjective prior information about the parameter of interest, it is natural
to consider use of an objective prior which leads to posterior probability
quantiles which have, at least to some higher order approximation in terms of
the sample size, the correct frequentist interpretation. Such priors are
termed probability matching priors. In many circumstances, however,
the appropriate frequentist inference is a conditional one. The key
contexts involve inference in multi-parameter exponential families,
where conditioning eliminates the nuisance parameter, and models which
admit ancillary statistics, where conditioning on the ancillary is indicated
by the conditionality principle of inference. In this talk, we
consider conditions on the prior under which posterior quantiles have, to
high order, the correct conditional frequentist interpretation. The
key motivation for the work is that the conceptually simple objective
Bayes route may provide accurate approximation to more complicated
frequentist procedures. We focus on the exponential family context, where it
turns out that the condition for higher order conditional frequentist
accuracy reduces to a condition on the model, not the prior: when the
condition is satisfied, as it is in many key situations, any first order
probability matching prior (in the unconditional sense) automatically yields
higher order conditional probability matching. We provide
numerical illustrations, discuss the relationship between the objective
Bayes inference and the parametric bootstrap, as well as giving a brief
account appropriate to the ancillary statistic context, where
conditional frequentist probability matching is more difficult. [This
is joint work with Tom DiCiccio, Cornell].
Cees Diks, University of Amsterdam Linear and Nonlinear Causal Relations in Exchange Rates and Oil Spot and
Futures Prices Various tests have been proposed
recently in the literature for detecting causal relationships between time
series. I will briefly review the traditional linear methods and some more
recent contributions on testing for nonlinear Granger causality. The
relative benefits and limitations of these methods are then compared in two
different case studies with real data. In the first case study causal
relations between six main currency exchange rates are considered. After
correcting for linear causal dependence using VAR models, there is still evidence of nonlinear causal relations between these currencies.
ARCH and GARCH effects are insufficient to fully account for the nonlinear
causality found. The second case study focuses on nonlinear causal linkages
between daily spot and futures prices at different maturities of West Texas
Intermediate crude oil. The results indicate that after correcting for
possible cointegration, linear dependence and multivariate GARCH effects,
some causal relations are still statistically significant. In both case
studies the conclusion is that non-standard models need to be developed to
fully capture the higher-order nonlinear dependence in the data.
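A minimal example of the linear starting point (synthetic data; grangertest() from the lmtest package compares the nested lagged regressions with an F test):

    # Minimal linear Granger-causality check on synthetic data.
    library(lmtest)
    set.seed(1)
    n <- 500
    x <- rnorm(n); y <- numeric(n)
    for (t in 2:n) y[t] <- 0.3 * y[t - 1] + 0.5 * x[t - 1] + rnorm(1)
    grangertest(y ~ x, order = 1)    # does lagged x help predict y?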