BSGLMM - How to use the software

System Requirements

CUDA
UNIX compatible system

Tested architecture:

Cuda version: CUDA 5.0

OS: GNU/Linux

kernel-version: 3.0.101-0.5

platform: 64-bit

Download

Download source code package and accompanying publication here.

NOTE: In its current version the software only supports UNIX compatible systems and requires an NVIDIA graphics card, i.e. a CUDA capable GPU.

Download example data here (zip-archive).

Installation

Unpack the archive in a directory of your choice. You will need the latest version of GNU GCC to compile the source code.

In order to compile the code, run the Makefile from inside the extracted directory with

$ make

The executable thereby created is named BinCar.

Source files included in the download

main.cu
mcmc.cpp
covar.cpp
covarGPU.cu
read_data.cpp
cholesky.cpp
randgen.cpp
nifti1_read_write.cpp
accompanying <header_files>

Potential Issues during installation

Different version of CUDA.

solution: Update the first lines in the Makefile to your CUDA version.

Shared libraries not found.

solution: Include the following line in your .bashrc file (for CUDA 5.0 in this case):

export LD_LIBRARY_PATH=/usr/local/cuda-5.0/lib64:$LD_LIBRARY_PATH

Running the Code

USAGE

$ ./BinCar  NTypes  NCov  GPU  Design  Mask  WM  [MaxIter  BurnIn]

To see a short description of all input arguments, run the program without any command line arguments:

$ ./BinCar

Input arguments

NTYPES: number of different types or groups or classes in the data set

NCOV: total number of covariates (Note: count must include types/groups as dummy covariates)

GPU: run code on GPU (1) or CPU (0)
Note: We strongly recommend running the code on a GPU. (The CPU version has not been tested; use with caution.)

DESIGN: Text file, tab or space separated data file that contains all input data.

MASK: Filename of mask image (must be located in './images' directory); specify '1' to use default (mask.nii.gz).

WM: Filename of white matter mask image (must be located in './images' directory); specify '0' to use none and '1' to use default (avg152T1_white.nii.gz).

MAXITER (optional): Total number of iterations. Defaults to 1,000,000; use fewer for testing to save time.

BURNIN (optional): Number of burn-in iterations. Defaults to 500,000; try using half the number of MAXITER.

Format of input file DESIGN (e.g. data.dat)

The first row of data.dat specifies the names of the columns (variables). All other rows contain the individual data for each subject.

The first column is the subject ID, followed by one column for each type/group/class in the data. (E.g. a data set consisting of patients and healthy controls would have two columns of zeros and ones, indicating to which group each subject belongs.)

The next columns specify covariates, one for each covariate. Important note: these covariates need to be mean-centered.

The last column contains the file name of the image for each subject. (Note that the image file name is given without a path; all images need to be stored in a subdirectory named "images".)

Here are the first lines of an example data.dat file containing two classes and five additional covariates. (The full file is included in the example data bundle below.)

[subject ID]  type1  type2  [additional classes]  covar1  covar2  covar3   covar4  covar5  [additional covariates]  image_file_name
1001                        ...                   -0.38   -18.54  18.24    2.31    -13.08  ...                      binLesionData_1001.nii.gz
1002                        ...                   -0.38   4.46    -161.76  -1.69   4.92    ...                      binLesionData_1002.nii.gz
1003                        ...                   0.62    13.46   54.24    -0.69   9.92    ...                      binLesionData_1003.nii.gz
...
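The mean-centering requirement for the covariates can be handled with a short script before running BinCar. The following is a minimal sketch in Python and is not part of BSGLMM; the function names, the header name "ID", the covariate names (age, edss), the demo values and the space-separated, two-decimal output are all illustrative assumptions.

```python
from statistics import fmean


def mean_center(values):
    """Subtract the sample mean so that the values average to zero."""
    mean = fmean(values)
    return [v - mean for v in values]


def design_lines(subject_ids, types, covariates, images):
    """Build the rows of a DESIGN file as space-separated strings.

    types and covariates map a (hypothetical) column name to one value
    per subject; covariates are mean-centered before they are written.
    """
    centered = {name: mean_center(vals) for name, vals in covariates.items()}
    header = ["ID", *types, *centered, "image_file_name"]
    lines = [" ".join(header)]
    for i, sid in enumerate(subject_ids):
        row = [str(sid)]
        row += [str(types[name][i]) for name in types]
        row += [f"{centered[name][i]:.2f}" for name in centered]
        row.append(images[i])  # file name only; the images live in ./images
        lines.append(" ".join(row))
    return lines


# Made-up demo values for three subjects.
demo = design_lines(
    subject_ids=[1001, 1002, 1003],
    types={"type1": [1, 1, 0], "type2": [0, 0, 1]},
    covariates={"age": [40.0, 50.0, 60.0], "edss": [2.0, 4.0, 6.0]},
    images=[f"binLesionData_{s}.nii.gz" for s in (1001, 1002, 1003)],
)
print("\n".join(demo))
```

Writing the returned lines to a file such as data.dat gives a DESIGN file in the layout described above, with the group indicator columns first and the image file name last.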

Image files

mask.nii.gz: whole-brain mask
WM_mask.nii.gz (optional): white matter mask. This must be an 8-bit (unsigned char, uint8) image file.
<image_files.nii.gz>: binary lesion masks for each subject.

NOTE: The program expects a subdirectory named "./images" in the main directory. ALL image files (including the mask and WM files) must be in the *.nii.gz format and stored in this subdirectory.
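Before starting a long run it can be worth checking these requirements automatically. The sketch below is one possible pre-flight check in Python, not part of BSGLMM: it lists image names from the last column of the DESIGN file that are missing from the images directory or lack the .nii.gz extension, and it reads the datatype field of a gzipped NIfTI-1 header to confirm that a white matter mask is stored as uint8. The function names are hypothetical, and the check assumes single-file NIfTI-1 images.

```python
import gzip
import struct
from pathlib import Path

NIFTI_TYPE_UINT8 = 2  # datatype code for unsigned char in the NIfTI-1 standard


def missing_images(design_path, image_dir="images"):
    """List image names from the last column of the DESIGN file that are
    missing from image_dir or do not end in .nii.gz."""
    problems = []
    with open(design_path) as f:
        next(f, None)  # skip the header row with the column names
        for line in f:
            fields = line.split()  # tab- or space-separated
            if not fields:
                continue
            name = fields[-1]
            if not name.endswith(".nii.gz") or not (Path(image_dir) / name).is_file():
                problems.append(name)
    return problems


def nifti1_datatype(path):
    """Return (datatype code, bits per voxel) from a gzipped NIfTI-1 header."""
    with gzip.open(path, "rb") as f:
        header = f.read(348)  # a NIfTI-1 header is 348 bytes long
    if len(header) < 348:
        raise ValueError("file too short to hold a NIfTI-1 header")
    # sizeof_hdr (bytes 0-3) must equal 348; use it to detect the byte order.
    for order in ("<", ">"):
        if struct.unpack(order + "i", header[0:4])[0] == 348:
            break
    else:
        raise ValueError("not a NIfTI-1 header")
    # datatype and bitpix are 16-bit integers at byte offsets 70 and 72.
    datatype, bitpix = struct.unpack(order + "hh", header[70:74])
    return datatype, bitpix


def is_uint8(path):
    """True if the image is stored as 8-bit unsigned integers."""
    datatype, bitpix = nifti1_datatype(path)
    return datatype == NIFTI_TYPE_UINT8 and bitpix == 8
```

For example, missing_images("data.dat") should return an empty list, and is_uint8("images/avg152T1_white.nii.gz") should be true, before the program is started with a white matter mask.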

Other input files

seed.dat (optional): contains three starting seeds for the random number generator

Execution from command line - example

This is an example of how to run the code after it has been compiled:

$ <path_to_exe_dir>/BinCar 2 7 1 data.dat 1 0 10000 5000

For this example, input arguments used are:

NTYPES=2; NCOV=7 (i.e. 5 covariates + 2 types); GPU=1; DESIGN=data.dat; MASK=default; WM=none; MAXITER=10,000; BURNIN=5,000.
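The one argument that is easy to get wrong is NCOV, because it counts the type/group dummy columns in addition to the actual covariates. The following Python sketch assembles the argument list in the order given under USAGE; the helper name bincar_args and its defaults (default mask, no WM mask, burn-in set to half of MAXITER) are assumptions made for illustration.

```python
def bincar_args(n_types, n_covariates, design, gpu=1, mask="1", wm="0",
                max_iter=None, burn_in=None):
    """Return the argument list for BinCar in the order given under USAGE."""
    if max_iter is None and burn_in is not None:
        raise ValueError("BURNIN can only be given together with MAXITER")
    n_cov = n_covariates + n_types  # NCOV counts the type dummies as well
    args = ["./BinCar", str(n_types), str(n_cov), str(gpu), design, mask, wm]
    if max_iter is not None:
        # Default the burn-in to half of MAXITER, as suggested for BURNIN.
        burn_in = max_iter // 2 if burn_in is None else burn_in
        args += [str(max_iter), str(burn_in)]
    return args


print(" ".join(bincar_args(2, 5, "data.dat", max_iter=10000)))
# prints: ./BinCar 2 7 1 data.dat 1 0 10000 5000
```

The returned list can be passed directly to subprocess.run(args, check=True) from the directory that contains the executable and the images subdirectory.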

Here is an example screenshot of the terminal output after the first 500 iterations. The first few lines report which GPUs are available and used, followed by diagnostic output for every 100 iterations:

[Screenshot: terminal output after the first 500 iterations]

Output files

Terminal output during runtime

By default only the current number of completed iterations of the MCMC algorithm is displayed. Additional information (e.g. current parameter values, GPU times to compute parameter updates, etc.) can be displayed by un-commenting the respective lines in the source files (mostly found in mcmc.cpp).

Output files

File name: description; comments

prb_<NAME_OF_TYPE>.nii: posterior mean probabilities for each type/group/class of data; NAME_OF_TYPE is specified in input file DATA.DAT
spatCoef_<NAME_OF_COVAR>.nii: posterior mean for each covariate; incl. dummy variables for each type
spatCoef_<NAME_OF_COVAR>.Var.nii: corresponding variances for each covariate; NAME_OF_COVAR is specified in input file DATA.DAT
standCoef_<NAME_OF_COVAR>.nii: standardised coefficients for each covariate; posterior mean divided by posterior standard deviation and averaged over all iterations
total_empir_cnt.nii: number of lesions per voxel across the given data set
total_lesion_prb.nii: total lesion count divided by total number of subjects
empir_prb_<NAME_OF_TYPE>.nii: empirical probability for each type
bWM.nii: coefficients * WM mask; only if a WM mask is used
Qhat.dat: predictions based on uniform prior; equal priors: 1/n for n types
Qhat2.dat: predictions based on empirical prior; proportional priors for each type: number of subjects per type divided by total number of subjects
DIC.dat: diagnostics; deviance of the expectation (DE), expectation of deviance (ED), effective degrees of freedom (PD), deviance information criterion (DIC)
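The four diagnostics in DIC.dat are normally linked by the standard definitions of the deviance information criterion: PD = ED - DE and DIC = ED + PD. This page does not state the formulas used by the program, so the Python sketch below should be read as an assumption about how the values relate, useful mainly as a consistency check; the function name and the example numbers are made up.

```python
def dic_summary(de, ed):
    """Return (PD, DIC) under the standard DIC definitions.

    de: deviance evaluated at the posterior expectation (DE)
    ed: posterior expectation of the deviance (ED)
    """
    pd = ed - de   # effective degrees of freedom
    dic = ed + pd  # equivalently de + 2 * pd
    return pd, dic


# Made-up numbers, only to show the arithmetic.
pd, dic = dic_summary(de=100.0, ed=110.0)
print(pd, dic)
```

When comparing models fitted to the same data, a lower DIC is conventionally taken to indicate the better trade-off between fit and complexity.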

File structure of Qhat.dat and Qhat2.dat:

prob type1  prob type2  [additional classes]  predicted type  true type
0.9999      0.0001      ...
0.9999      0.0001      ...
0.0000      1.0000      ...

The 1st column contains a running count of all subjects (starting from zero). The following columns (2nd, 3rd, 4th, etc., one per type) give the predicted probabilities for each type. Most interesting is the second-to-last column, which gives the predicted type for each subject. For reference and comparison, the last column displays the true type as given in the input data.

Results

Posterior mean probabilities for each type/group/class of data are given in the files prb_<NAME_OF_TYPE>.nii.

The maps of the spatial and standardised spatial coefficients can be found in spatCoef_<NAME_OF_COVAR>.nii and standCoef_<NAME_OF_COVAR>.nii, respectively.

Qhat.dat and Qhat2.dat provide raw data for prediction and classification accuracy using an importance sampling approach (see Sec. 4.2 in the paper). For example, the total prediction accuracy for a given data set is simply the number of correctly classified subjects (predicted type == true type) divided by the total number of subjects.
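The accuracy calculation just described can be scripted in a few lines. The following Python sketch assumes that Qhat.dat is whitespace-separated with the column order described above (running count, one probability per type, predicted type, true type) and that any non-numeric line is a header to be skipped; the function names and the sample rows, including the way the types are coded, are made up for illustration.

```python
def read_qhat(lines):
    """Parse rows of: count, one probability per type, predicted type, true type."""
    rows = []
    for line in lines:
        fields = line.split()
        try:
            values = [float(x) for x in fields]
        except ValueError:
            continue  # skip a header or any other non-numeric line
        if len(values) < 4:
            continue  # too short to hold count, a probability and both types
        rows.append({
            "probs": values[1:-2],
            "predicted": int(values[-2]),
            "true": int(values[-1]),
        })
    return rows


def accuracy(rows):
    """Fraction of subjects whose predicted type equals the true type."""
    correct = sum(r["predicted"] == r["true"] for r in rows)
    return correct / len(rows)


# Made-up sample rows: two correct predictions and one wrong one.
sample = ["0 0.9999 0.0001 1 1", "1 0.9999 0.0001 1 1", "2 0.0000 1.0000 2 1"]
print(accuracy(read_qhat(sample)))
```

For a real run, the rows can be read with open("Qhat.dat") and passed to read_qhat in the same way; the same functions apply to Qhat2.dat.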

Example Output

Below are example slices of some of the generated output files, all at the same slice position (voxel 34.6 40.1 50.9):

[Figure: a) empir_prb_type1.nii, b) prb_type1.nii, c) spatCoef_type1.nii, d) spatCoef_type2.nii]

Convergence Diagnostics

MCMC algorithms need to be monitored for convergence. Since saving the chains for all parameters (i.e. all voxels) is infeasible, we recommend monitoring a small group of voxels (say 10) in qualitatively different regions (e.g. regions with high and with low lesion load).

For details please see Sec. 4.4 in the paper.

Example Data

Download example data here (zip archive, demo-data) and extract the archive in a directory of your choice.

This dataset contains binary lesion data from 50 subjects (binLesionData_<subject_ID>.nii.gz) with multiple sclerosis. The subjects are categorised into two disease subtypes (25 relapsing-remitting MS, 25 secondary progressive MS).

The DESIGN file is called data_demo.dat and includes the following five demographic and clinical covariates: sex, age, disease duration, EDSS score, PASAT score.

Additionally, a whole-brain mask (mask.nii.gz) and a white matter mask (avg152T1_white.nii.gz) are provided.

Example Classification Results

As both groups contain the same number of subjects, there is no difference between uniform and empirical priors, and thus Qhat.dat and Qhat2.dat give the same results. For this data set, prediction accuracy for classification into one of the two types is very high (98%), with only one subject (subject ID 1013) misclassified.
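Both statements can be verified with simple arithmetic. The Python sketch below computes the uniform prior (1/n for n types) and the empirical prior (subjects per type divided by all subjects) for the 25/25 split of the example data, and the accuracy for 49 correct classifications out of 50; the function names are hypothetical.

```python
from fractions import Fraction


def uniform_prior(counts):
    """Equal prior 1/n for each of the n types."""
    n = len(counts)
    return [Fraction(1, n)] * n


def empirical_prior(counts):
    """Prior proportional to the number of subjects per type."""
    total = sum(counts)
    return [Fraction(c, total) for c in counts]


counts = [25, 25]  # subjects per type in the example data
print(uniform_prior(counts) == empirical_prior(counts))  # prints True
print(float(Fraction(49, 50)))  # prints 0.98
```

With unequal group sizes the two priors differ, and Qhat.dat and Qhat2.dat can then give different predictions.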