Bayesian Analysis of Skewed Longitudinal Data

Principal Investigator: Mark F.J. Steel

Funding research council: EPSRC (UK)

Funding period: 10/01/2005-09/01/2008

EPSRC grant GR/T17908/01

Summary:

In many fields of application of statistics, the usual distributional assumptions (such as Normality) are often in conflict with the data. Previously, we worked on the development of more flexible classes of distributions, which is now a rapidly growing area for modelling cross-sectional data. In this project, we have investigated the use of flexible distributions for modelling longitudinal data. The statistical paradigm used in this research is Bayesian, which allows us to deal formally with model uncertainty. The latter is a very relevant concept given the large collection of possible distributional shapes, but we also consider uncertainty regarding the modelling of the dynamics and the choice of covariates in regression modelling. We have also investigated in detail the issue of model-based clustering in such dynamic models. Fast inference methods based on Markov chain Monte Carlo algorithms were developed and are now available to applied researchers in the field. In addition, we have conducted substantive applications to the growth of countries and regions.

Research Output:

Model-based clustering of non-Gaussian panel data based on skew-t distributions, M.A. Juarez and M.F.J. Steel, Journal of Business and Economic Statistics, 28, (2010), 52-66.
Matlab code: MCMC sampler for non-Gaussian cluster model, data sets and code for the bridge sampler
Abstract: We propose a model-based method to cluster units within a panel. The underlying model is autoregressive and non-Gaussian, allowing for both skewness and fat tails, and the units are clustered according to their dynamic behaviour, equilibrium level and the effect of covariates. Inference is addressed from a Bayesian perspective and model comparison is conducted using the formal tool of Bayes factors. Particular attention is paid to prior elicitation and posterior propriety. We suggest priors that require little subjective input and possess hierarchical structures that enhance the robustness of the inference. We apply our methodology to GDP growth of European regions and to employment growth of Spanish manufacturing firms.
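Comparing clustering models (for example, different numbers of clusters) by Bayes factors comes down to ratios of marginal likelihoods p(y | M). The sketch below assumes the log marginal likelihoods have already been estimated (for instance by bridge sampling) and only shows how they turn into Bayes factors and posterior model probabilities; the function names and the numbers in the usage line are hypothetical, not output from the paper or its Matlab code.

```python
import math

def log_bayes_factor(log_marglik_j, log_marglik_k):
    """log B_jk = log p(y | M_j) - log p(y | M_k)."""
    return log_marglik_j - log_marglik_k

def posterior_model_probs(log_marglik, log_prior=None):
    """Posterior model probabilities from log marginal likelihoods.

    Posterior odds = Bayes factor x prior odds; normalised with
    log-sum-exp so that very negative log marginal likelihoods do not underflow.
    """
    m = len(log_marglik)
    if log_prior is None:
        log_prior = [-math.log(m)] * m  # assumption: equal prior model probabilities
    log_post = [lm + lp for lm, lp in zip(log_marglik, log_prior)]
    top = max(log_post)
    log_norm = top + math.log(sum(math.exp(v - top) for v in log_post))
    return [math.exp(v - log_norm) for v in log_post]

# Hypothetical log marginal likelihoods for models with 1, 2 and 3 clusters.
probs = posterior_model_probs([-1510.2, -1502.7, -1504.1])
```

Working on the log scale matters here: marginal likelihoods of panel models are typically far too small to exponentiate directly.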

Non-Gaussian dynamic Bayesian modelling for panel data, M.A. Juarez and M.F.J. Steel, Journal of Applied Econometrics, 25, (2010), 1128-1154.
Abstract: A first-order autoregressive non-Gaussian model for analysing panel data is proposed. The main feature is that the model is able to accommodate fat tails and also skewness, thus allowing for outliers and asymmetries. The modelling approach is designed to gain sufficient flexibility without sacrificing interpretability and computational ease. The model incorporates individual effects and covariates, and we pay specific attention to the elicitation of the prior. As the prior structure chosen is not proper, we derive conditions for the existence of the posterior. By considering a model with individual dynamic parameters, we are also able to formally test whether the dynamic behaviour is common to all units in the panel. The methodology is illustrated with two applications involving earnings data and one on growth of countries.
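To make the structure concrete, here is a minimal simulation sketch of a first-order autoregressive panel y_it = alpha_i + rho * y_i,t-1 + sigma * eps_it with skewed, fat-tailed errors. We assume the skewing is done with the inverse-scale-factor construction of Fernández and Steel (1998) applied to a Student-t; the exact parameterisation in the paper may differ, covariates are omitted, and all function and parameter names are hypothetical.

```python
import math
import random

def skew_t_draw(nu, gamma, rng):
    """One draw from a skewed Student-t (assumed Fernandez-Steel skewing).

    gamma > 1 skews right, gamma < 1 skews left, gamma == 1 is the symmetric t.
    """
    # Symmetric t with nu degrees of freedom: N(0, 1) / sqrt(chi2_nu / nu).
    chi2 = rng.gammavariate(nu / 2.0, 2.0)
    z = abs(rng.gauss(0.0, 1.0) / math.sqrt(chi2 / nu))
    # Positive half chosen with probability gamma^2 / (1 + gamma^2) and
    # stretched by gamma; the negative half is shrunk by 1 / gamma.
    if rng.random() < gamma ** 2 / (1.0 + gamma ** 2):
        return gamma * z
    return -z / gamma

def simulate_panel(n_units, n_periods, rho, alphas, nu, gamma, sigma, seed=0):
    """Simulate an AR(1) panel with unit-specific intercepts alphas[i].

    Assumes |rho| < 1, so unit i has equilibrium level alphas[i] / (1 - rho).
    """
    rng = random.Random(seed)
    panel = []
    for i in range(n_units):
        y = alphas[i] / (1.0 - rho)  # start each unit at its equilibrium level
        path = []
        for _ in range(n_periods):
            y = alphas[i] + rho * y + sigma * skew_t_draw(nu, gamma, rng)
            path.append(y)
        panel.append(path)
    return panel

panel = simulate_panel(3, 50, 0.5, [1.0, 2.0, 0.0], nu=5.0, gamma=1.5, sigma=0.1)
```

Small nu produces the occasional large shock (fat tails), while gamma controls how much more mass lies above zero than below, which is the kind of outlier and asymmetry behaviour the abstract refers to.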

Directional log-spline distributions, J.T. Ferreira, M.A. Juarez and M.F.J. Steel, Bayesian Analysis, 3, (2008), 297-316.
Abstract: We introduce a new class of distributions to model directional data, based on hyperspherical log-splines. The class is very flexible and can be used to model data that exhibit features that cannot be accommodated by typical parametric distributions, such as asymmetries and multimodality. The distributions are defined on hyperspheres of any dimension and thus include the most common circular and spherical cases. Due to the flexibility of hyperspherical log-splines, the distributions can closely approximate observed behaviour and are as smooth as desired. We propose a Bayesian setup for conducting inference with directional log-spline distributions, where we pay particular attention to the prior specification and the matching of the priors of the log-spline model and an alternative model constructed through a mixture of von Mises distributions. We compare both models in the context of three data sets: simulated data on the circle, circular data on the movement of turtles, and a spherical application on the arrival direction of cosmic rays.
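The alternative model mentioned above, a mixture of von Mises distributions, has a closed-form density on the circle: f(theta) = sum_k w_k exp(kappa_k cos(theta - mu_k)) / (2 pi I_0(kappa_k)), where I_0 is the modified Bessel function of order zero. The sketch below evaluates it on the circle only (not on higher-dimensional hyperspheres) and computes I_0 from its power series; it is an illustration of the competing model, not the log-spline construction itself, and the names and parameter values are hypothetical.

```python
import math

def bessel_i0(x, tol=1e-15):
    """Modified Bessel function I_0 via its power series sum_m ((x/2)^m / m!)^2.

    Adequate for moderate concentrations; large x would call for an asymptotic form.
    """
    q = (x / 2.0) ** 2
    term, total, m = 1.0, 1.0, 0
    while term > tol * total:
        m += 1
        term *= q / (m * m)
        total += term
    return total

def von_mises_mixture_pdf(theta, weights, mus, kappas):
    """Density at angle theta (radians) of a finite von Mises mixture on the circle."""
    dens = 0.0
    for w, mu, kappa in zip(weights, mus, kappas):
        dens += w * math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))
    return dens

# A bimodal example of the kind a single von Mises cannot capture.
weights, mus, kappas = [0.6, 0.4], [0.5, 3.5], [4.0, 8.0]
density_at_zero = von_mises_mixture_pdf(0.0, weights, mus, kappas)
```

Multimodality is obtained here by adding components, whereas a log-spline density gets its flexibility from the spline in the exponent; that difference is why matching the priors of the two models needs care.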