Repeated measures proportional odds logistic regression

Method

Repeated measures proportional odds logistic regression models correlated repeat ordinal scores using the method suggested by Parsons et al. (2006). Code for model fitting runs in the free statistical software R. Before sourcing the repolr functions, download R and the latest version of the GEE solver package gee. Two example data sets are available (Example 1: Achilles tendon data and Example 2: Hip resurfacing data) and code for analysis and a description of implementation is given below.

Code

The source functions for R implementation of repolr must be loaded prior to model fitting.

Example 1: Achilles tendon data

Example data is from a clinical trial of treatment post-surgery after rupture of the Achilles tendon. Reads data from user defined location, loads gee library, fits model and summarises.

# example code

achilles.dat <- read.table(“c:/data location/achilles_data.txt”,

col.names=c("Patient","Treat","Time","Activity"),header=T)

attach(achilles.dat)

library(gee)

source(“c:/function location/repolr_functions.R”)

fitted.mod <- repolr(formula = Activity ~ factor(Treat) * Time ,

subjects = "Patient" , data = achilles.dat ,

categories = 3 ,  times = c(1,2,3) , corstr = "ar1",

tol = 0.001, scalevalue = 1, alpha = 0.5,

po.test=TRUE, fixed=FALSE)

summary(fitted.mod[["gee"]])

fitted.mod[["corr"]]

Implementation

The user is required to specify:

(i) a data set name (data)

(ii) a model formula (formula)

(iii) a cluster identification variable (subjects)

(iv) a time variable (time)

(v) and the number of categories used for the response variable (categories)

The data set may contain records with missing data for either the response variable or the explanatory variables. The response variable must have at least three ordered categories (K>=3) and, as K cut-point parameters are estimated, an additional intercept term can not be explicitly included in the model formula. A subject variable, which takes integer values from 1 to N with no missing values allowed, indicates the data clusters (patients or experimental units) and a time variable indicates the within cluster ordering; times must be ordered integers starting from one and spaced to indicate the relative distance between successive times. For instance, four observations at equally spaced times would be entered as 1, 2, 3 and 4, whereas if the first two observations were separated by half the time interval of the other observations then coding would be 1, 2, 4 and 6. The data must be pre-sorted by time clusters within each subject, and complete, i.e. where data is missing for a particular time it must be indicated as such. The available options for the correlation model (corstr) are AR1, uniform, fixed and independence, with default setting AR1.

Additionally there are a number of other algorithm related options. A fixed scale parameter (scalevalue) is assumed to be 1, but otherwise it can be set to any positive value . The algorithm is generally robust to the initial value for alpha (default setting = 0.5), however a starting value for alpha can be set. The maximum number of iterations allowed (maxiter; default setting = 10) can be specified and convergence (tol) is monitored using the average relative change in the regression parameters at each iteration of the algorithm; the default setting (0.001) is such that the algorithm continues until this drops to below 0.1%. The algorithm will fail if the working correlation matrix is not positive definite and, in order to speed convergence for the AR1 and uniform correlation models, is constrained to be between 0.05 and 0.95. The proportional odds assumption can be tested, using a score test, by setting the option po.test to TRUE (default).

The functions require the user to have pre-installed the latest version of the R GEE solver gee and the fitted model (fitted.mod[["gee"]]) inherits the class attributes of this package; correlation model ouput is available from fitted.mod[["corr"]].

Example 2: Hip resurfacing data

Example data is from a clinical trial of treatment post-surgery after hip resurfacing.

# example code

pain.dat <- read.table(“c:/data location/pain_data.txt”,

col.names=c("Patient","Sex","Time","HHSpain"),header=T)

attach(pain.dat)

library(gee)

source(“c:/function location/repolr_functions.R”)

fitted.mod <- repolr(formula = HHSpain ~ factor(Sex) * factor(Time) ,

subjects = "Patient" , data = pain.dat ,

categories = 4 ,  times = c(1,2,5) , corstr = "ar1",

tol = 0.001, scalevalue = 1, alpha = 0.5,

po.test=TRUE, fixed=FALSE)

summary(fitted.mod[["gee"]])

fitted.mod[["corr"]]

# plot profile likelihood

proflik <- vector(length=81)

alphas <- seq(0.1,0.9,0.01)

for (j in 1:81){

proflik[j] <- srepolr(mod.gee=fitted.mod,alpha=alphas[j],data=HHSpain.dat)

Nproflik <- proflik/srepolr(mod.gee=fitted.mod,alpha=fitted.mod[["corr"]]$alpha,data=HHSpain.dat)

plot(x=alphas,y=Nproflik,type="l",lwd=2)

References

Parsons, NR, Edmondson, RN and Gilmour, SG, (2006) 'A generalized estimating equation method for fitting autocorrelated ordinal score data with an application in horticultural research.' Applied Statistics 55, 507 - 524.