# Missing Values from the Index

One of the main problems we encounter when dealing with a large dataset (covering more than 20 years, over 200 countries and several different variables) is the presence of a large number of missing values.

To obtain a numeric value of the index for a country in a given year, we must have non-missing values for every variable used to calculate it. In other words, if even one of the sixteen variables in Table 1 above is unavailable for a country in a given year, the value of the globalisation index for that country and year cannot be calculated. This causes a large amount of information to be “lost”.
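The all-or-nothing rule above can be illustrated with a short sketch. It assumes, purely for illustration, that a missing value is stored as `None` or NaN, and it uses a simple mean as a placeholder aggregation; that mean is not the index's actual formula. The function name `index_value` is hypothetical.

```python
import math
from typing import Optional, Sequence


def index_value(components: Sequence[Optional[float]]) -> Optional[float]:
    """Return a composite value only if every component is observed.

    The aggregation used here (a simple mean) is a placeholder for
    illustration; it is not the globalisation index's actual formula.
    """
    # A single missing component (None or NaN) makes the whole index missing.
    if any(c is None or math.isnan(c) for c in components):
        return None
    return sum(components) / len(components)


# Sixteen hypothetical component values for one country-year.
complete = [0.5] * 16
one_missing = [0.5] * 15 + [None]

print(index_value(complete))     # 0.5
print(index_value(one_missing))  # None
```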

We deal with this problem by linear interpolation, which works as follows. Consider the following artificial example, in which the observation for the “trade” variable is assumed to be missing in 1999 and 2000. Linear interpolation then provides the following values for trade in 1999 and 2000:

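| Year | Trade (observed) | Trade (after interpolation) |
|------|------------------|-----------------------------|
| 1998 | 0.6              | 0.6                         |
| 1999 | missing          | 0.7                         |
| 2000 | missing          | 0.8                         |
| 2001 | 0.9              | 0.9                         |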

In general, each missing value is set equal to the last observation before the gap (here 0.6) plus a fraction of the difference between the next available observation and that initial observation (here 0.9 - 0.6 = 0.3).
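Written as a single formula, and assuming the gap lies between two observed years $Y_0$ and $Y_1$ with values $x_{Y_0}$ and $x_{Y_1}$, the interpolated value in an intermediate year $Y$ would be:

$$
x_Y = x_{Y_0} + \frac{Y - Y_0}{Y_1 - Y_0}\,\bigl(x_{Y_1} - x_{Y_0}\bigr), \qquad Y_0 < Y < Y_1 .
$$

In the trade example, $Y_0 = 1998$, $Y_1 = 2001$, $x_{Y_0} = 0.6$ and $x_{Y_1} = 0.9$.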

The fraction is calculated as follows. In any year Y, the fraction is f = (Y - 1998)/(2001 - 1998). So if Y = 1999, f = 1/3, and if Y = 2000, f = 2/3. This gives a value for trade of 0.6 + (1/3)(0.3) = 0.7 in 1999 and 0.6 + (2/3)(0.3) = 0.8 in 2000, as shown in the table.
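The same calculation can be sketched in Python for a whole series. This is a minimal illustration, not the procedure actually used to build the index: it assumes one observation per consecutive year, with missing values stored as `None`, and the function name `interpolate_gaps` is hypothetical. Only gaps with an observation on both sides are filled.

```python
from typing import List, Optional


def interpolate_gaps(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior runs of missing values (None) by linear interpolation.

    Assumes one observation per consecutive year. Leading and trailing
    gaps are left as None because there is no observation on both sides.
    """
    filled = list(values)
    observed = [i for i, v in enumerate(values) if v is not None]
    # Walk through each pair of consecutive observed positions.
    for start, end in zip(observed, observed[1:]):
        span = end - start
        for i in range(start + 1, end):
            f = (i - start) / span  # fraction of the way from start to end
            filled[i] = values[start] + f * (values[end] - values[start])
    return filled


# The trade example: 1998..2001 with 1999 and 2000 missing.
trade = [0.6, None, None, 0.9]
print([round(v, 2) for v in interpolate_gaps(trade)])  # [0.6, 0.7, 0.8, 0.9]
```

Users of pandas can obtain the same interior-only filling with `Series.interpolate(limit_area="inside")`, whose default linear method likewise treats the observations as equally spaced.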

Another problem is that, for many countries, some variables in the index are available for the most recent two or three years while others are not, because different kinds of data are produced with different lags. We deal with this problem by extrapolation: specifically, we assume that a missing variable takes the value of the last year for which it is available.
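Extrapolation by carrying the last available value forward can be sketched as follows. As before, this is an illustration under stated assumptions rather than the exact procedure: missing values are stored as `None`, the series runs over consecutive years, the function name `extrapolate_last` is hypothetical, and the sample data are invented. Only the trailing gap at the end of the series is filled.

```python
from typing import List, Optional


def extrapolate_last(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill trailing missing values (None) with the last available observation.

    Only the gap at the end of the series is filled; interior and leading
    gaps are left untouched.
    """
    filled = list(values)
    # Find the position of the last observed value, if any.
    last = max((i for i, v in enumerate(values) if v is not None), default=None)
    if last is None:
        return filled  # nothing observed, nothing to carry forward
    for i in range(last + 1, len(values)):
        filled[i] = values[last]
    return filled


# Hypothetical variable observed for three years but not yet published for the last two.
series = [0.4, 0.5, 0.55, None, None]
print(extrapolate_last(series))  # [0.4, 0.5, 0.55, 0.55, 0.55]
```

Restricting the fill to the trailing gap keeps this step separate from interpolation. A plain forward fill such as pandas `Series.ffill()` would also overwrite interior gaps, so it would need to run after interpolation to give the same result.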