Against All Odds: An Empirical Investigation into the Factors Affecting the Probability of Being Shot Down in World War Two

by Robin Hudson[1], Department of Economics, University of Nottingham

Abstract

Millions of lives were lost during World War Two. This article examines a specific subset of lives lost: those of RAF Bomber Command from 1939 to 1945. In all 55,573 men died serving the bomber squadrons, embarking on courageous and dangerous missions, often with little hope of success. By applying econometric techniques this article aims to examine the specific factors that caused such catastrophic losses. The probability investigation is extended to see just how likely a bomber crew was to be shot down in any given raid. Factors such as the time of day, the altitude of attack and the population density of the target are considered, and the results show how the different variables influenced survival rates, with the time of day and altitude being the most significant factors.

Keywords: RAF, Econometrics, Probit Regression, WW2, Bomber Command, Probability

Introduction

When I began flying operations in March 1943 losses were averaging five percent a night. Our tour would consist of thirty operations and probability said we'd be lucky to make it to the twentieth raid.

L. F. Bradfield (2003) 49 Squadron, shot down 10 August 1943 on his sixteenth raid.

In the face of such daunting figures it is a wonder how any of the commanders could send out so many men, night after night, to their deaths. Bomber Command was formed in 1936 and, by the end of the war in 1945, some 55,573 men had lost their lives in its service (RAF, 2011). This article will examine the factors that caused these catastrophic losses of life and aims to determine the degree to which exogenous variables, such as the weather, and endogenous variables, such as the aircraft used, influenced survival rates. The main purpose is thus to determine whether Bomber Command could conceivably have reduced the number of men who lost their lives.

There is little existing literature that examines this subject, despite the fact that the Royal Air Force (RAF) employed a team of scientists and statisticians to conduct operational research from September 1941. The Operational Research Section (ORS) focused on increasing the destruction inflicted by Allied bombers on enemy targets (Wakelam, 2009). A subsection of the team was tasked with reducing aircraft losses; however it is the viewpoint of this article that they did not place enough emphasis on this task. The overall aim of the ORS was to improve efficiency, defined as increasing the tonnage of bombs dropped per aircraft lost (Wakelam, 2009). If Bomber Command could maximise destruction and minimise losses it could operate efficiently and potentially bring the war to an end. However, its ability to affect policy was questionable and its analysis failed to utilise fully the data that it was recording (Dyson, 1981). Hence this retrospective study aims to make full use of the relevant data and combine it with various discoveries since the war in order to come to a clear-cut, depoliticised conclusion.

Theoretical framework and data

It is important to define exactly what is meant by 'loss' for the purposes of this investigation. Losses can be viewed in a number of ways; the exact definition may refer to the number of aircraft that failed to return from a sortie or it could denote the number of individual crewmembers killed. When an aircraft was shot down it did not necessarily mean that all crew lost their lives; many parachuted to safety or were captured. Hence the accepted definition of loss in this article refers to the number of entire aircraft that failed to land back at base - generally denoted in the operational record books as 'Failed To Return' (FTR).

As this model has not been previously conceived it is not immediately apparent as to which variables will have high explanatory powers. Hence the data will largely dictate the initial independent variables to be included. A significant amount of the primary data was obtained from the original Operational Record Books (ORBs). These are kept at the National Archives in Kew on microfilm and contain a wealth of data relating to the individual operations. As there were over one hundred squadrons filling in ORBs each day from 1939-1945 it was not feasible to gather data for the entire population, consisting of 389,809 individual sorties (Middlebrook and Everitt, 2000: 707). Therefore a random sample of the population was drawn and three squadrons were selected to provide the data. This produced 1623 individual observations of which 129 were losses.

The observations required manually inputting the data from the microfilm scans, while simultaneously cross-referencing the dates with war diaries containing additional information. This ensured that the information was as accurate as possible and, in the case of any inconsistencies, an average could be taken from all available accounts of the operation. For the sample covered the average flight was a Lancaster aircraft flying at approximately 10,900ft, in medium cloud, with roughly 250 other planes, for a 5½-hour nighttime mission, bombing a medially populated area. All of the data gathered was then used to construct the desired model. The econometric model has a binary dependent variable, y, which represents the outcome of an operation, with y=1 pertaining to the 'failure' of a flight (shot down). This article considers eleven independent variables, six of which are dichotomous: day, wind, blen, well, lanc_i and mine. The binary variable day is equal to unity if the flight took place during daylight hours, where 'daylight' is defined (in this article) as being over the target area at least 2 hours before sunset. This variable will give an idea of the difference between flying at daytime (day=1) compared to night (day=0).

The different types of aircraft required individual binary variables, leaving one out as the base group to avoid perfect collinearity. As the sample selected covers four different aircraft (Blenheim, Wellington, Lancaster-I and Lancaster-III) three dummy variables (blen, well, lanc_i) were used, leaving the Lancaster-III (lanc_iii) out as the control group. Thus the coefficients of the other aircraft will show the difference in probability of failure between that specific aircraft and the Lancaster-III benchmark.
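The dummy coding described above can be sketched in a few lines of Python. This is only an illustration: the function and dictionary names are assumptions and were not part of the original Stata workflow.

```python
# Dummy coding for aircraft type, with the Lancaster-III as the omitted base group.
# All names here are hypothetical; they only illustrate the scheme described in the text.
AIRCRAFT_DUMMIES = ("blen", "well", "lanc_i")

DUMMY_FOR_AIRCRAFT = {
    "Blenheim": "blen",
    "Wellington": "well",
    "Lancaster-I": "lanc_i",
    "Lancaster-III": None,  # base group: every dummy stays at zero
}

def encode_aircraft(aircraft):
    """Return the three dummy values for a single observation."""
    active = DUMMY_FOR_AIRCRAFT[aircraft]
    return {name: int(name == active) for name in AIRCRAFT_DUMMIES}

print(encode_aircraft("Wellington"))
print(encode_aircraft("Lancaster-III"))
```

Because the Lancaster-III row is all zeros, the three dummies never sum to a constant across the four types, which is what avoids perfect collinearity with the intercept.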

The binary variable mine denotes mining operations (mine=1) as opposed to bombing missions (mine=0), hence its coefficient will show whether the type of operation affected the probability of being shot down. The final binary variable, wind, denotes the introduction of the radar-scrambling device, Window, from 24 July 1943 onwards (wind=1) otherwise wind=0 (Middlebrook and Everitt, 2000: 411).

There are two variables for which continuous data was not available and hence they have been constructed trichotomously (visibility (cloud) and population (pop)). The data on cloud cover and visibility was combined then deconstructed into three groups; '0' for good or perfect visibility, '1' for average visibility and '2' for poor to zero visibility, which produced the variable 'cloud'. The categorisation was formed from the individual reports on cloud thickness and additional comments on the visibility over the target from both the ORBs and the war diaries.

The target population (pop) was similarly categorised into low, medium and high as measured by contemporary estimates, where 100,000 citizens or fewer was classified as 'low' (variable equal to 0), 100,001 to 500,000 citizens as 'medium' (1) and more than 500,000 citizens as 'high' (2). The contemporary estimates were drawn from a study of Germany during WWII, and the category boundaries (low, medium, high) are exactly as defined by the author (Knopp, 2001).
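As a small illustration of these boundaries, the categorisation can be written as a function. The function name is an assumption introduced here; only the thresholds come from the text.

```python
def pop_category(citizens):
    """Map a target's estimated population to the trichotomous pop variable.

    0 = low (100,000 or fewer), 1 = medium (100,001 to 500,000),
    2 = high (500,001 or more). The function name is hypothetical.
    """
    if citizens <= 100_000:
        return 0
    if citizens <= 500_000:
        return 1
    return 2

for citizens in (80_000, 100_001, 500_000, 500_001):
    print(citizens, pop_category(citizens))
```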

The size of the force (force) was taken from The Bomber Command War Diaries (Middlebrook and Everitt, 2000), which provides daily figures of the aircraft attacking each location. This discrete variable corresponds directly to the number of planes flying out on the same mission, taking on values between 3 and 1055 in the sample. The final two variables are continuous, taking on values other than pure whole numbers. They denote the length of the flight in hours (dura) and the altitude of attack in thousands of feet (alt). All of the data gathered is summarised below in Table 1.

Variable Mean Min Max St. Dev. Observations
failure 0.0794824 0 1 0.2705736 1623
alt 10.92017 0.06 24.3 7.09831 1623
cloud 0.9235983 0 2 0.8047816 1623
force 251.1842 3 1055 249.6827 1623
dura 5.606593 2 11 1.565771 1623
pop 1.235367 0 2 0.8457079 1623
day 0.1041282 0 1 0.3055209 1623
wind 0.0837954 0 1 0.2771662 1623
blen 0.0653112 0 1 0.2471502 1623
well 0.3203943 0 1 0.4667719 1623
lanc_i 0.2236599 0 1 0.4168251 1623
lanc_iii 0.3906346 0 1 0.488043 1623
mine 0.0326556 0 1 0.1777883 1623

Table 1: Summary statistics for the data gathered
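A quick consistency check on Table 1 is possible because, for a binary variable with sample mean p over n observations, the sample standard deviation is sqrt(n/(n-1) × p(1-p)). The sketch below applies this to three of the dummy variables; the variable names in the code are illustrative, and the figures are copied from the table.

```python
import math

N_OBS = 1623
LOSSES = 129

def binary_sd(mean, n):
    """Sample standard deviation implied by the mean of a 0/1 variable."""
    return math.sqrt(n / (n - 1) * mean * (1 - mean))

# (mean, reported St. Dev.) pairs copied from Table 1
reported = {
    "failure": (0.0794824, 0.2705736),
    "day": (0.1041282, 0.3055209),
    "mine": (0.0326556, 0.1777883),
}

print("loss rate:", LOSSES / N_OBS)
for name, (mean, sd) in reported.items():
    print(name, round(binary_sd(mean, N_OBS), 7), "reported:", sd)
```

The mean of failure is simply 129/1623, so the table and the sample description agree.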

Empirical methods and model

The empirical model is of limited dependent variable form and will herein be modelled under the linear probability model (LPM), logit and probit functions. In order to conduct the regression, ordinary least squares (OLS) shall be employed for the LPM, and maximum likelihood estimation (MLE) for the logit and probit models. The statistical software used is STATA 12 (StataCorp., 2012). Each variable has its own coefficient (βi) denoting the effect that it has on the probability of failure. There is also an intercept value (β0) to prevent regression through the origin and to enable a benchmark failure probability to be estimated. The preliminary regression (Equation 1) was run in three forms and the results of each can be seen in Figures 1, 2 and 3 respectively; they are then compared side-by-side in Figure 4.

Equation 1:

failure= β0+ β1alt + β2cloud + β3force + β4dura+ β5pop + β6day + β7wind + β8blen + β9well + β10lanc_i + β11mine

Coef. P>|t|
alt -0.0101096 0.000
cloud -0.037961 0.000
force -0.000082 0.064
dura -0.0119746 0.013
pop 0.0186692 0.054
day 0.1797232 0.000
wind 0.0115501 0.643
blen -0.075668 0.074
well -0.1258673 0.000
lanc_i 0.0158763 0.429
mine -0.0138382 0.720
_cons 0.3121064 0.000

Figure 1: Results for the LPM. Coef. denotes the value of the coefficient, βi, for each variable. β0 is given by _cons, denoting the constant value.

Coef. P>|z|
alt -0.1040272 0.000
cloud -0.6172395 0.000
force -0.0029569 0.006
dura -0.1753292 0.022
pop 0.2557653 0.084
day 1.240943 0.000
wind 0.2780382 0.527
blen -1.141466 0.045
well -1.468348 0.001
lanc_i 0.0673689 0.854
mine -0.2110147 0.630
_cons 0.4291426 0.560

Figure 2: Results for equation 1 under the Logit model. The results are not directly comparable at this stage, however with a few alterations it will later be shown how they can be compared.

Coef. P>|z|
alt -0.0554868 0.000
cloud -0.2892341 0.000
force -0.0011229 0.013
dura -0.0864561 0.025
pop 0.1530681 0.048
day 0.7099118 0.000
wind 0.0530936 0.810
blen -0.4831314 0.093
well -0.7078556 0.000
lanc_i 0.1451363 0.386
mine -0.0457096 0.846
_cons -0.0670042 0.855

Figure 3: Results for equation 1 under the Probit model. The Coef. values will be manipulated in the results section of this article to show the size of impact that each variable has on the probability of failure.

LPM Logit Probit
alt -0.010*** -0.104*** -0.055***
cloud -0.038*** -0.617*** -0.289***
force -0.000 -0.003** -0.001*
dura -0.012* -0.175* -0.086*
pop 0.019 0.256 0.153*
day 0.180*** 1.241*** 0.710***
wind 0.012 0.278 0.053
blen -0.076 -1.141* -0.483
well -0.126*** -1.468*** -0.708***
lanc_i 0.016 0.067 0.145
mine -0.014 -0.211 -0.046
_cons 0.312*** 0.429 -0.067
Log Likelihood n/a -357.38398 -357.96513
Prob>chi2 n/a 0.0000 0.0000
LR-chi2 n/a 186.01 184.85

Figure 4: A comparison of the three different models of Equation 1. Estimated coefficients reported; * sig. at 5% level; ** sig. at 1% level; *** sig. at 0.1% level.

As previously mentioned, the coefficients of the logit and probit models are not directly comparable without individual manipulations; however, while it is not possible immediately to infer the size of the effects it is possible to determine their sign and significance. It is reassuring to see identical coefficient signs for each variable under the three different models; a negative coefficient is indicative of negative correlation to the probability of failure and the converse is true of a positive coefficient. The significance levels are also very similar when considering each variable under the different models.

It is immediately clear that both wind and mine appear to have little effect on the probability of being shot down in this model. As mining missions formed fewer than 5% of all observations this does not come as much of a surprise; a wider collection of mining observations would be required to say definitively whether or not the type of mission affected the probability of being shot down. Window was expected to reduce losses; however, it has an extremely high P-value and hence is not significant in any of the models. One explanation of this may lie in the alternative technologies being implemented by the Germans at the same time (RAF, 2005). The decision was therefore made to drop the variables wind and mine.

lanc_i also appears to be insignificant and shall be removed from the subsequent model. Incidentally, the Lancaster-III was only a minor development of the Lancaster-I, having newer Merlin engines but otherwise being identical (RAF, 2005). Hence it is possible to combine the two and create one dummy variable, thus making all Lancaster aircraft the base group. The coefficients of blen and well therefore show the divergence in probability from the Lancaster control group.

Population is borderline significant at the 5% level so will be left in as its explanatory power may increase when the irrelevant variables are dropped. As expected altitude appears to be negatively correlated to the probability of being shot down as does cloud cover and the size of the force. Interestingly duration has a negative sign, indicating that the longer the aircraft were flying for the less chance they had of being shot down: this is contrary to the initial predictions. Extensive commentary on all coefficients, their values and discussion of logit and probit manipulations will be conducted once the set of variables has been settled upon. Now that the individual variables have been examined, it will be informative to consider the model as a whole.

The log-likelihood values for the logit and probit models can be used to construct the Likelihood Ratio Chi-Squared statistic to test if all variable coefficients are simultaneously zero. The Prob>chi2 tells us the probability of observing a value as extreme as the LR-chi2 under the null hypothesis (Wooldridge, 2009: 581). As these values for both models are 0.0000 we can conclude that at least one of the coefficients in these models is statistically different from zero, thus rejecting the null hypothesis of jointly insignificant variables at the 0.1% level. The log-likelihood value can also be used to compare two models and determine joint significance of omitted variables; this shall be conducted after analysing the modified regression (Equation 2).

Equation 2:

failure= β0+ β1alt + β2cloud + β3force + β4dura+ β5pop + β6day + β7blen + β8well

Coef. P>|t|
alt -0.0103843 0.000
cloud -0.0376896 0.000
force -0.0000878 0.031
dura -0.0127211 0.008
pop 0.0193000 0.044
day 0.1806444 0.000
blen -0.0893907 0.017
well -0.1375313 0.000
_cons 0.3282849 0.000

Figure 5: Results for the secondary equation under the LPM. Note the reduced number of variables, having dropped those that were insignificant in the preliminary model.

Coef. P>|z|
alt -0.1029829 0.000
cloud -0.6043778 0.000
force -0.0030801 0.003
dura -0.1878656 0.012
pop 0.2649053 0.069
day 1.226018 0.000
blen -1.217718 0.003
well -1.551334 0.000
_cons 0.5590895 0.332

Figure 6: Results for the secondary equation under the Logit model. It is also interesting to note the minor change in values for the coefficients now that the insignificant variables have been dropped.

Coef. P>|z|
alt -0.0579073 0.000
cloud -0.2845579 0.000
force -0.0012371 0.005
dura -0.0931593 0.014
pop 0.1586388 0.038
day 0.7176236 0.000
blen -0.6265824 0.007
well -0.8322317 0.000
_cons 0.1000143 0.744

Figure 7: Results for equation 2 under the Probit model. The significance levels have also improved from the preliminary model.

LPM Logit Probit
alt -0.010*** -0.103*** -0.058***
cloud -0.038*** -0.604*** -0.285***
force -0.000* -0.003** -0.001**
dura -0.013** -0.188* -0.093*
pop 0.019* 0.265 0.159*
day 0.181*** 1.226*** 0.718***
blen -0.089* -1.218** -0.627**
well -0.138*** -1.551*** -0.832***
_cons 0.328*** 0.559 0.100
Log Likelihood n/a -357.80245 -358.46298
Prob>chi2 n/a 0.0000 0.0000
LR-chi2 n/a 185.17 183.85

Figure 8: A comparison of the three different models of Equation 2. Estimated coefficients reported; * sig. at 5% level; ** sig. at 1% level; *** sig. at 0.1% level.

All variables are now significant at the 5% level under all models except the logit of population, which is significant at the 10% level. The log-likelihood values can be used to construct the likelihood ratio statistic (LRS) to test for the joint significance of variables in much the same way that an F-test is conducted under OLS. Wooldridge (2009: 580) explains how the LRS is just twice the difference in the log-likelihoods of the unrestricted and restricted models. Hence if we consider the initial (unrestricted) probit model, with a log-likelihood of -357.97, and compare it to the second (restricted) probit model with a log-likelihood of -358.46 we obtain an LRS of approximately 1.00 (0.9957 from the unrounded log-likelihoods in Figures 4 and 8).

The multiplication by two is required so that the LRS has an approximate chi-squared distribution under the null hypothesis. As we are considering three restrictions (mine, wind, lanc_i), a chi-squared distribution with three degrees of freedom can be used, generating a P-value of 0.8023. We can therefore conclude that the omitted variables added no significant explanatory power to the model, justifying their exclusion ex-post. The model's finalised form is therefore that of Equation 2.
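The likelihood ratio test can be reproduced from the reported probit log-likelihoods. The following is a minimal sketch using only the Python standard library; it relies on the closed-form upper-tail probability of a chi-squared distribution with three degrees of freedom, and the function name is an assumption introduced for illustration.

```python
import math

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Probit log-likelihoods copied from Figures 4 and 8
ll_unrestricted = -357.96513  # Equation 1 (includes wind, mine, lanc_i)
ll_restricted = -358.46298    # Equation 2 (those three variables dropped)

lrs = 2 * (ll_unrestricted - ll_restricted)
p_value = chi2_sf_3df(lrs)

print("LRS:", round(lrs, 4))
print("P-value:", round(p_value, 4))
```

The doubled difference in log-likelihoods is just under 1, and the associated P-value is consistent with the 0.8023 reported in the text.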

In order to interpret these coefficients fully it is first important to understand the nature of each model. We are interested in the marginal effect of xi on the probability of y=failure taking the value 1, which is given by the partial derivative of the probit or logit model with respect to xi (Wooldridge, 2009: 574). The LPM treats the marginal effects as constant and as such the coefficients are directly comprehensible as the marginal effects on y of a change in xi. However, the logit and probit functions are non-linear, which is why MLE has been used. MLE finds the value of βi that maximises the likelihood of the observed yi given xi (Maddala, 1996). This in turn produces logit and probit estimators that are asymptotically normal and account for heteroskedasticity as MLE is based on the distribution of y given x, hence they are asymptotically efficient (Maddala, 1996: 71). The resulting marginal effects for the average observation in the sample are referred to as the Partial Effects at the Average (PEAs) and can be computed in STATA. The hypothesis tests on the coefficients are conducted in exactly the same way as they are under OLS. The following section will explore the results obtained from these methods of calculation and thus determine the predictive power of the constructed model.
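For reference, the three specifications and their marginal effects for a continuous regressor can be written in standard textbook notation. The symbols Λ (logistic distribution function), Φ and φ (standard normal distribution and density functions) and the index xβ = β0 + β1x1 + ... + βkxk are introduced here for illustration and do not appear in the original text.

```latex
\begin{align*}
\text{LPM:}    \quad & P(y=1 \mid x) = x\beta,
  & \frac{\partial P}{\partial x_j} &= \beta_j \\
\text{Logit:}  \quad & P(y=1 \mid x) = \Lambda(x\beta) = \frac{e^{x\beta}}{1+e^{x\beta}},
  & \frac{\partial P}{\partial x_j} &= \Lambda(x\beta)\,[1-\Lambda(x\beta)]\,\beta_j \\
\text{Probit:} \quad & P(y=1 \mid x) = \Phi(x\beta),
  & \frac{\partial P}{\partial x_j} &= \phi(x\beta)\,\beta_j
\end{align*}
```

For a dummy variable the derivative is usually replaced by the discrete change in predicted probability when the variable moves from 0 to 1, for example Φ(xβ with x_j = 1) − Φ(xβ with x_j = 0) under the probit model.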

Results

The probit's PEAs are displayed in the dy/dx column of Figure 9, denoting the partial derivative of the model with respect to each xi. Multiplying these values by one hundred gives the percentage-point change in the probability of failure when the variable increases by one unit, given the fixed values denoted in column X.

Y = Pr(failure) (predict) = 0.07603754

Variable dy/dx P>|z| X
alt -0.0082834 0.000 10.9202
cloud -0.0407046 0.000 0.923598
force -0.000177 0.005 251.184
dura -0.013326 0.024 5.60659
pop 0.0226925 0.040 1.23537
day 0.1613855 0.001 0
blen -0.0562819 0.000 0
well -0.064265 0.000 0

Figure 9: Marginal effects at the mean variable values for Equation 2 under the Probit model.

As an average for a binary variable is not very informative medians have been used for blen, well and day and hence they have all been fixed at zero as suggested by Wooldridge (2009: 581). Thus their coefficients denote the effect on the probability of failure for a change in their value from 0 to 1. The other coefficients show the marginal effects at the average, i.e. for a Lancaster aircraft flying at approximately 10,900ft, in medium cloud, with roughly 250 other planes, for a 5½-hour nighttime mission, bombing a medially populated area.
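The values in Figure 9 can be recovered from the probit coefficients in Figure 7. The sketch below assumes that continuous variables use the derivative φ(xβ)βj and that the three dummies use the discrete change from 0 to 1; this assumption reproduces the reported figures closely, but the code is an illustration rather than the original STATA procedure, and all function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Probit coefficients for Equation 2, copied from Figure 7
CONS = 0.1000143
BETA = {"alt": -0.0579073, "cloud": -0.2845579, "force": -0.0012371,
        "dura": -0.0931593, "pop": 0.1586388, "day": 0.7176236,
        "blen": -0.6265824, "well": -0.8322317}
DUMMIES = ("day", "blen", "well")

# Evaluation point copied from column X of Figure 9
X = {"alt": 10.9202, "cloud": 0.923598, "force": 251.184, "dura": 5.60659,
     "pop": 1.23537, "day": 0, "blen": 0, "well": 0}

def probit_index(x):
    return CONS + sum(BETA[name] * x[name] for name in BETA)

def predicted_failure(x):
    return STD_NORMAL.cdf(probit_index(x))

def partial_effects(x):
    effects = {}
    for name in BETA:
        if name in DUMMIES:
            # discrete change in probability when the dummy switches from 0 to 1
            effects[name] = (predicted_failure({**x, name: 1})
                             - predicted_failure({**x, name: 0}))
        else:
            # derivative of the normal CDF with respect to a continuous regressor
            effects[name] = STD_NORMAL.pdf(probit_index(x)) * BETA[name]
    return effects

print("Pr(failure):", round(predicted_failure(X), 5))
for name, value in partial_effects(X).items():
    print(name, round(value, 5))
```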

A cursory examination shows that flying in daylight (versus nighttime) would increase the probability of being shot down by approximately 16.14% holding other factors fixed. This result is significant at the 0.1% level. It can also be seen that flying in a Blenheim or Wellington (as opposed to a Lancaster) decreases the probability of being shot down by approximately 5.63% and 6.43% respectively. These are similarly significant at the 0.1% level as the P-values in the P>|z| column show.

Figure 10 shows a comparison of the three models' PEAs (without the median adjustments to the dummy variables). This allows the predicted probability of failure at the exact average to be observed.

LPM Logit Probit
alt -0.0104*** -0.0044*** -0.0058***
cloud -0.0377*** -0.0256*** -0.0284***
force -0.0001* -0.0001*** -0.0001**
dura -0.0127** -0.0079* -0.0093*
pop 0.0193* 0.0112 0.0158*
day 0.1806*** 0.0828** 0.1124**
blen -0.0894* -0.0331*** -0.0400***
well -0.1375*** -0.0549*** -0.0681***
Pr(failure) 0.0795 0.0443 0.0479

Figure 10: A comparison of the marginal effects at the mean of the three different models of Equation 2. Estimated coefficients reported; * sig. at 5% level; ** sig. at 1% level; *** sig. at 0.1% level.

The Pr(Failure) denotes the predicted probability of failure for the fixed variable values (the averages in this instance). The logit and probit estimates are fairly similar at 4.43% and 4.79% respectively, which are close to the average 5% loss rates of the time. The LPM is higher (7.95%) and equals the exact proportion of failures in the data set (129 of 1,623 flights), since an OLS regression with an intercept passes through the sample means.

The coefficients of the logit and probit models are fairly similar as the econometric literature suggests (Maddala, 1996). There is however a divergence between these models and the LPM, most notably on day and well. If we examine the variable coefficients individually we can see that, according to the LPM, increasing altitude by one unit (1,000ft) reduces the probability of being shot down by approximately 1%, holding other factors fixed. As this model assumes constant marginal effects this is true whether the aircraft was flying at 100ft or 20,000ft, implying that changing altitude from 1,000ft to 21,000ft would reduce the probability of being shot down by 20%, which may not be a realistic assumption. It is more likely there are diminishing marginal returns to flying at higher altitudes, which shall be explored in depth later. The probit model predicts that flying 1,000ft above the average reduces the probability of being shot down by approximately 0.58%, all else being equal. This implies that flying at 12,000ft as opposed to 11,000ft (the approximate average) reduces the chances of being shot down by 0.58% or, conversely, that flying at 10,000ft as opposed to 11,000ft would increase the chance of being shot down by 0.58%. All these results are significant at the 0.1% level, regardless of the model considered.

As it is widely accepted that the probit and logit models do not differ greatly, and having shown the similarity of their results, the focus will now be on the probit model as it is considered to be marginally better (Maddala, 1996). The LPM has been useful as a preliminary investigator; however, as it is inherently linear in its prediction of the marginal effects, the consideration of results shall now solely be with the probit model.

The primary results of the PEAs give a good overview of the marginal effects and their significance, however in order to build up a better understanding of these effects it will be informative to fix the variables at levels other than their averages. It is possible to compute the marginal effects for any given variable values and hence observe how these effects change with the variables. Figure 11 explores the marginal effects at various altitudes.

Alt_100ft Alt_1000ft Alt_10000ft Alt_20000ft
alt -0.017** -0.016** -0.009*** -0.003***
cloud -0.082*** -0.079*** -0.044*** -0.017**
force -0.000** -0.000** -0.000** -0.000
dura -0.027* -0.026* -0.014* -0.005*
pop 0.046 0.044 0.024* 0.009*
day 0.255*** 0.249*** 0.170*** 0.082*
blen -0.134** -0.127** -0.062*** -0.020**
well -0.159*** -0.150*** -0.070*** -0.022**
Pr(failure) 0.2102 0.1955 0.0840 0.0251

Figure 11: Marginal effects for Equation 2 under the Probit model; comparison of different altitudes. Each column represents the differing altitudes. Estimated coefficients reported; * sig. at 5% level; ** sig. at 1% level; *** sig. at 0.1% level.

By systematically varying altitude while fixing all other variables at their averages (binary variables set to medians) we can see the marginal returns at specific altitudes. The coefficients of alt show how, at 100ft, the marginal return to flying 1,000ft higher decreases the probability of being shot down by approximately 1.7% (all else being equal); a similar return is true at 1,000ft (1.6%). However, at 10,000ft the return to flying 1,000ft higher diminishes to 0.9% and at 20,000ft the effect is 0.3%; these are all statistically significant at the 1% level. Interestingly the coefficients of day show how flying at 100ft in the daytime (as opposed to nighttime) increases the probability of being shot down by a remarkable 25.5%. At 20,000ft the marginal effect is less, yet still significant and substantial at 8.2%. The diminishing marginal effects for all variables are graphically displayed below (Figure 12).
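The diminishing return to altitude follows directly from the shape of the normal density. As a sketch, the predicted probabilities and altitude effects in Figure 11 can be approximated from the Figure 7 coefficients, holding the remaining continuous variables at their Table 1 means and the dummies at zero. The helper names are assumptions made for this illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Probit coefficients from Figure 7 (dummies are held at zero, so they drop out)
CONS = 0.1000143
BETA_ALT = -0.0579073
BETA_OTHER = {"cloud": -0.2845579, "force": -0.0012371,
              "dura": -0.0931593, "pop": 0.1586388}
MEANS = {"cloud": 0.923598, "force": 251.184, "dura": 5.60659, "pop": 1.23537}

def probit_index(alt):
    """Index for a night-time Lancaster flight at the given altitude (thousands of feet)."""
    return CONS + BETA_ALT * alt + sum(BETA_OTHER[k] * MEANS[k] for k in MEANS)

def failure_probability(alt):
    return STD_NORMAL.cdf(probit_index(alt))

def altitude_effect(alt):
    """Change in failure probability per additional 1,000ft at this altitude."""
    return STD_NORMAL.pdf(probit_index(alt)) * BETA_ALT

for alt in (0.1, 1.0, 10.0, 20.0):
    print(alt, round(failure_probability(alt), 4), round(altitude_effect(alt), 3))
```

Because the index moves further into the left tail of the normal distribution as altitude rises, the density term shrinks and each extra 1,000ft buys a smaller reduction in risk.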

Figure 12: Graphical presentation of the marginal returns to increased altitude in actual values.

Figure 12 highlights how day has the largest impact on the probability of being shot down. At low altitudes (anything less than 820ft) this model predicts that flying in the daytime increases the probability of being shot down by at least 25%, all else being equal. These coefficients are highly significant at the 0.1% level. The Wellington and Blenheim both have high, negative coefficients predicting that the probability of being shot down was higher for a Lancaster (the control group). The Wellington appears to have been the safest plane to fly, offering an approximate 15% reduction in probability of being shot down on an average flight at 1000ft when compared to a Lancaster on the exact same mission. However as the altitude increases all three planes converge until, at around 24,000ft, there is little difference between them.

Cloud cover at 100ft has a significant impact (at the 0.1% level), implying that a change in cloud cover from average to cloudy would offer a reduction in probability of being shot down by approximately 8.2% or, conversely, if the cloud cover dropped from average to clear it would increase the probability of being shot down by 8.2%. At the higher altitudes this effect diminishes to around 1%. Duration and force have minimal effects but shall be discussed in more depth later.

As expected higher population results in increased probability of being shot down and the results show how, at 100ft, increasing the target population by one unit (from 100,001-500,000 citizens to 500,001+ citizens) increases the probability of being shot down by approximately 4.6%, however this is only significant at the 10% level. The reason for this may lie in the large population boundaries used; hence a more precise measurement would be required in order to refine this model further. Having examined the marginal effects at various altitudes it is possible to speculate on the predicted probability of failure of a hypothetical flight, which shall be analysed in the following section.

Analysis and discussion

If each variable is assigned its extreme value it is possible to predict the probability of being shot down on the most dangerous flight. A lone daytime flight in a Lancaster, bombing a high population target from 50ft with no cloud cover comes out with an approximate 85% predicted probability of failure (Figure 13).

Y = Pr(failure) (predict) = 0.85027737

Variable dy/dx P>|z| X
alt -0.0134849 0.004 0.05
cloud -0.0662653 0.007 0
force -0.0002881 0.027 1
dura -0.0216941 0.000 1
pop 0.0369424 0.022 2
day 0.2247615 0.000 1
blen -0.1907985 0.002 0
well -0.2689103 0.000 0

Figure 13: Marginal effects for the Probit estimation where each variable is taken to its extreme.

However, although an astonishing and harrowing prediction, this may be unrealistic for a number of reasons. There was seldom a lone bombing mission, and virtually none at 50ft over a heavily populated area. Hence Figure 14 compares the coefficients and predicted probabilities from three real missions: one of the most catastrophic (worst), the closest observed operation to the mean (average) and one of the most successful (best).

One of the worst cases was a daytime raid of eleven Lancasters on 17 April 1942 over Augsburg, at 200ft with no clouds. Here the model predicts an approximate 71% probability of failure (eight aircraft predicted to be shot down). On this day seven of the eleven planes (64%) were shot down (Middlebrook and Everitt, 2000: 258), thus highlighting the model's predictive power. Had the same operation been carried out at night the model predicts a 43% loss rate (five aircraft lost), had it also been conducted at 20,000ft the model predicts a 9.5% loss rate (one aircraft lost). Hence, on this mission alone, it may have been possible to save six aircraft (42 men).
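The night-time and high-altitude counterfactuals can be approximated without knowing every input for the Augsburg raid. The sketch below backs out the probit index from the reported 71.01% prediction (Figure 14), then removes the day coefficient and adds the altitude change using the Figure 7 coefficients. This shortcut is an assumption made for illustration and may differ slightly from the original calculation.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

BETA_DAY = 0.7176236   # probit coefficient on day (Figure 7)
BETA_ALT = -0.0579073  # probit coefficient on alt (Figure 7)
AIRCRAFT = 11          # size of the force in the raid as described

p_day_low = 0.7101                           # reported prediction for the actual raid
index_day_low = STD_NORMAL.inv_cdf(p_day_low)

# Same raid flown at night (day switches from 1 to 0)
index_night_low = index_day_low - BETA_DAY
p_night_low = STD_NORMAL.cdf(index_night_low)

# Night raid flown at 20,000ft instead of 200ft (alt is in thousands of feet)
index_night_high = index_night_low + BETA_ALT * (20.0 - 0.2)
p_night_high = STD_NORMAL.cdf(index_night_high)

for label, p in (("day, 200ft", p_day_low),
                 ("night, 200ft", p_night_low),
                 ("night, 20,000ft", p_night_high)):
    print(label, round(p, 3), "expected losses:", round(AIRCRAFT * p))
```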

One of the most successful missions observed was a night flight in a Lancaster over Essen on 23 October 1944 at 22,000ft in thick cloud with 1055 planes. On this night only eight aircraft out of the entire force were shot down (0.76% lost) (Middlebrook and Everitt, 2000: 606). Here the model predicts a loss rate of 0.69%, which is extremely close to the realised value and further highlights the model's predictive power. These two examples give a good indication that the variables contained within the regression have jointly high explanatory power.

worst average best
alt -0.020*** -0.006*** -0.000
cloud -0.097*** -0.028*** -0.001
force -0.000** -0.000** -0.000
dura -0.032** -0.009* -0.000
pop 0.054* 0.016* 0.000
day 0.275*** 0.112** 0.006
blen -0.239** -0.040*** -0.001
well -0.320*** -0.068*** -0.001
Pr(Failure) 0.7101 0.0479 0.0069

Figure 14: Probit estimation of Equation 2 with marginal effects in three different missions; the worst, the best and an average flight. Estimated coefficients reported; * sig. at 5% level; ** sig. at 1% level; *** sig. at 0.1% level.

From Figure 14 it is apparent that in the best-case scenario the coefficients are extremely small and not statistically different from zero at the 10% level. In these cases the model, while able to predict accurately the probability of being shot down, is not able to attribute significant effects to the variables under consideration. Thus the factors affecting the probability of being shot down in these cases are likely to be made up of omitted variables and random, unaccountable factors. There will always be a stochastic element that plays its part, however it is assumed to be fairly minimal in this model, thus it should be possible to accurately predict and suggest ways of reducing the probability of being shot down.

Had this model been constructed contemporaneously the commanders could have adjusted their operational instructions in order to reduce the predicted probability of losses. For example if Bomber Command ordered a night raid on Berlin, from 20,000ft, in a Lancaster with a force of 250 planes, in medium cloud cover, the model predicts a 3.2% probability of failure (eight aircraft lost). In order to reduce this figure the endogenous variables such as altitude, timing of the raid, size of the force and type of aircraft could have been adjusted accordingly.

Some of the factors were, of course, exogenous to the commanders' choices such as cloud cover, population and distance to the target. However, although variables like cloud cover were exogenous it would still have been informative to calculate an approximate range of loss rates, depending on the feasible cloud formations. In the aforementioned mission the model predicts a range of losses from 1.7% (cloudy) to 5.9% (clear). Hence, even though Bomber Command could do nothing to influence the weather, they could have weighed up the costs of bombing in clear conditions against the benefits of destroying a German target and considered rescheduling to a day with favourable weather.

It is interesting to note that, in the above instance, doubling the force from 250 to 500 planes would reduce the predicted loss rate to 1.6%. However, this still equates to eight aircraft lost, exactly the same number predicted to be shot down in a force of 250 planes. This point is poignant, as it highlights the disregard for life that statistical handling permits. The analysis in this form is insensitive to individual lives: on paper it appears as if the loss rate has been cut in half, but in reality it translates to 56 lives lost in both scenarios. In order to determine how many lives might have been saved it will be necessary to examine the composition of raids.
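The arithmetic behind this comparison can be sketched as follows. The loss rates and force sizes are those quoted above; the crew size of seven is an assumption inferred from the article's figure of 56 lives for eight aircraft, and the function name is hypothetical.

```python
def expected_aircraft_lost(force_size: int, loss_rate: float) -> float:
    """Expected number of aircraft shot down: force size times loss rate."""
    return force_size * loss_rate


CREW_PER_AIRCRAFT = 7  # assumption: implied by 56 lives / 8 aircraft in the text

# Berlin night raid example: 250 planes at 3.2% versus 500 planes at 1.6%
baseline_lost = expected_aircraft_lost(250, 0.032)
doubled_lost = expected_aircraft_lost(500, 0.016)

baseline_lives = baseline_lost * CREW_PER_AIRCRAFT
doubled_lives = doubled_lost * CREW_PER_AIRCRAFT

print(f"250 planes: {baseline_lost:.1f} aircraft, {baseline_lives:.0f} lives")
print(f"500 planes: {doubled_lost:.1f} aircraft, {doubled_lives:.0f} lives")
```

Halving the rate while doubling the force leaves the expected number of aircraft, and therefore of crew, unchanged, which is why the loss rate alone is a misleading measure of lives at risk.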

Daylight raids made up 10% of the sample considered in this article and overall accounted for 20% of all sorties flown (Middlebrook and Everitt, 2000). Under this model daytime raids have been shown to increase the loss rate by approximately 16% on the average flight, equating to a 3.2% increase in total losses throughout the war. In all, 8953 aircraft were shot down (Middlebrook and Everitt, 2000: 707); thus, if no daytime raids had taken place, this model calculates that 287 fewer aircraft would have been shot down and the lives of 1779 men might have been spared (assuming 6.2 deaths per plane crash on average (Middlebrook and Everitt, 2000: 707)). There were, however, inevitable trade-offs between daytime and night missions. While aircraft losses were lower during night raids, bombing accuracy also diminished as targets were less readily identified. One commander called it '[…] a never-ending struggle to circumvent the law that we cannot see in the dark' (Wakelam, 2009: 20).
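The daylight counterfactual can be retraced with the rounded figures quoted in the text. This is a sketch of the arithmetic only, not the article's original calculation; the variable names are hypothetical, and small differences from the reported values are to be expected because the published inputs are rounded.

```python
DAY_SHARE_OF_SORTIES = 0.20     # daylight share of all sorties flown
DAY_EFFECT_ON_LOSS_RATE = 0.16  # increase in loss rate on the average flight
TOTAL_AIRCRAFT_LOST = 8953      # Middlebrook and Everitt (2000: 707)
DEATHS_PER_AIRCRAFT = 6.2       # average deaths per aircraft lost

# Share of total losses attributed to flying by day (about 3.2%)
share_of_losses = DAY_SHARE_OF_SORTIES * DAY_EFFECT_ON_LOSS_RATE

# Aircraft that might not have been lost without daylight raids
aircraft_saved = share_of_losses * TOTAL_AIRCRAFT_LOST

# Lives implied by the article's reported figure of 287 aircraft
REPORTED_AIRCRAFT_SAVED = 287
lives_saved = REPORTED_AIRCRAFT_SAVED * DEATHS_PER_AIRCRAFT

print(f"Share of losses: {share_of_losses:.1%}")
print(f"Aircraft saved (rounded inputs): {aircraft_saved:.1f}")
print(f"Lives saved from 287 aircraft: {lives_saved:.0f}")
```

With the rounded inputs the aircraft figure comes out at roughly 286.5, in line with the 287 reported, and 287 aircraft at 6.2 deaths each gives the 1779 lives quoted.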

When considering altitude, if the average flight were raised from 10,000ft to 20,000ft the model predicts a 6% drop in the loss rate, all else being equal, equating to 537 planes and 3330 men. As discussed, this altitude increase would also reduce the variance in loss rates between the different aircraft, allowing the commanders to safely use the different planes interchangeably. Combining the daylight and altitude results yields 824 planes and 5109 lives that could have been spared. However, the same trade-off between accuracy and loss rates also existed with altitude. Lower altitudes allowed greater accuracy, as pilots could get much closer to their target, whilst higher altitudes offered greater protection from ground armaments. In 1941, however, there was a shift towards area bombing (Wakelam, 2009: 22), which required less accurate results; hence high-altitude night flights could have been sustained without hindering the offensive campaign.
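The altitude and combined figures follow the same arithmetic. The sketch below uses the rounded values quoted in the text and simply adds the daylight and altitude effects, which assumes, as the combined total in the text does, that the two savings do not overlap. Variable names are hypothetical.

```python
TOTAL_AIRCRAFT_LOST = 8953    # Middlebrook and Everitt (2000: 707)
DEATHS_PER_AIRCRAFT = 6.2     # average deaths per aircraft lost
ALTITUDE_EFFECT = 0.06        # drop in losses from flying at 20,000ft
DAYLIGHT_AIRCRAFT_SAVED = 287 # daylight figure reported in the article

# Aircraft saved by raising the average flight from 10,000ft to 20,000ft
altitude_aircraft_saved = round(ALTITUDE_EFFECT * TOTAL_AIRCRAFT_LOST)
altitude_lives_saved = altitude_aircraft_saved * DEATHS_PER_AIRCRAFT

# Combined total, assuming the two effects are additive
combined_aircraft_saved = DAYLIGHT_AIRCRAFT_SAVED + altitude_aircraft_saved
combined_lives_saved = combined_aircraft_saved * DEATHS_PER_AIRCRAFT

print(f"Altitude: {altitude_aircraft_saved} aircraft, about {altitude_lives_saved:.0f} lives")
print(f"Combined: {combined_aircraft_saved} aircraft, about {combined_lives_saved:.0f} lives")
```

The aircraft totals match the text exactly, and the combined lives figure rounds to the 5109 reported. The altitude lives figure computed from the rounded aircraft count differs from the article's 3330 by one, which is attributable to rounding at intermediate steps.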

While acknowledging these trade-offs, it is also important to note that the RAF had limited resources. The number of serviceable aircraft limited their bombing capacity and restricted their ability to send out the safest planes. However, as has been shown, provided the altitude was high enough the difference in loss rates between the aircraft was fairly negligible. Thus, all of these factors could have been controlled for to varying degrees in order to reduce the loss rates sustained.

Conclusion

Bomber Command undoubtedly played a crucial role in WWII, maintaining an offensive assault on German soil by destroying many key military targets and civilian morale, but it cost the RAF 55,573 lives. This article has identified some of the key factors that influenced this figure. These factors, while being made up of exogenous and endogenous variables, were well within the hands of the commanders to control. The catastrophic losses sustained during daytime and low-altitude raids could have been mitigated via a more in-depth analysis of the available data. There is no doubt that, had these types of raid not been conducted, the death toll would have been considerably lower. The question of course remains as to whether this would have been to the detriment of the offensive campaign on German soil. In order to answer this fully it would be necessary to examine the amount of destruction inflicted in a given raid, to see just how significant the trade-off between safety and accuracy was.

While the constructed model has been shown to have high predictive power, there are a number of alterations that would strengthen it. Measurements, such as population, could be adjusted to include exact figures, thus allowing for more precise analysis. There is also scope for incorporating additional variables to investigate the effectiveness of the new technologies that were developed. However, as they were not standard fittings across all aircraft, it would take many hours of research to determine where they were installed. The research time required is one of the biggest constraints on additional investigation, and it applies to all of these improvements.

Thus, given the constraints, the model presented has incorporated all feasible variables and data sets. When applied to actual operations it has accurately calculated the number of aircraft that were shot down. Through this methodology it has shown how the lives of over 5000 men might have been spared. However, the question of whether the 55,573 death toll could have been significantly reduced not only comes down to the statistical analysis constructed above, but also to the moral views of whether those lives lost were a fair price to pay for the freedom of the world from tyranny.

List of tables

Table 1: Summary statistics of the data gathered from the National Archives

List of figures

Figure 1: LPM Estimation of Equation 1 STATA Results

Figure 2: Logit Estimation of Equation 1 STATA Results

Figure 3: Probit Estimation of Equation 1 STATA Results

Figure 4: Comparison of Estimations of Equation 1 STATA Results

Figure 5: LPM Estimation of Equation 2 STATA Results

Figure 6: Logit Estimation of Equation 2 STATA Results

Figure 7: Probit Estimation of Equation 2 STATA Results

Figure 8: Comparison of Estimations of Equation 2 STATA Results

Figure 9: Marginal Effects After Probit

Figure 10: Marginal Effects Comparison of Three Models

Figure 11: Marginal Altitude Effects Probit STATA Results

Figure 12: Graphical Marginal Altitude Effects Probit STATA Results

Figure 13: Extreme Marginal Effects After Probit

Figure 14: Best, Median and Worst Flights; Probit STATA Results Comparison

Notes

[1] Robin Hudson recently completed his undergraduate degree in Economics BSc at the University of Nottingham, obtaining first class honours. He was awarded the Oliver Pawels economics prize and Best Economics Dissertation, 2013. He now plans to spend time developing business ideas and expanding his existing business, The Last Armoury, an antique arms and armour website that he set up whilst at university.

References

Bradfield, L. F. (2003, July 24), 'Life in Bomber Command', [Interview] With Hudson, R., Norwich, Norfolk

Churchill Centre And Museum (2008), 'The Few', available at http://www.winstonchurchill.org/learn/speeches/speeches-of-winston-churchill/1940-finest-hour/113-the-few, accessed 16 March 2013

Dickins, B. D. (1946), 'Operational Research In Bomber Command: Chapters 1 to 10', [Manuscript] The Ronald W. Shephard Operational Research Archive, Laurier Centre for Military Strategic and Disarmament Studies, Wilfrid Laurier University. Box 2-BC, File 0032

Dyson, F. (1981), Disturbing The Universe, London: Pan Books

Dyson, F. (2006), A Failure of Intelligence, available at http://www.technologyreview.com/article/406789/a-failure-of-intelligence/, accessed 11 December 2012

Knopp, G. (2001), Der Jahrhundertkrieg, Berlin: Ullstein Taschenbuchvlg

Maddala, G. S. (1996), Limited-Dependent And Qualitative Variables In Econometrics, Cambridge: Cambridge University Press

Middlebrook, M. and C. Everitt (2000), The Bomber Command War Diaries - An Operational Reference Book 1939-1945, Leicester: Midland Publishing

RAF (2005), 'The Avro Lancaster', available at http://www.raf.mod.uk/rafbramptonwyton/aboutus/avrolancaster.cfm, accessed 20 February 2013

RAF (2005), 'The Ju.88 and the Schrage Musik System', available at http://www.raf.mod.uk/bombercommand/ju88.html, accessed 18 November 2012

RAF (2008), 'Tactics of Electronic Warfare', available at http://www.rafbombercommand.com/tactics_elecwarfare.html, accessed 14 November 2012

RAF (2011), 'Bomber Command Memorial', available at http://www.rafbf.org/1794/bomber-command-memorial.html?gclid=CPfe6ZbxxLMCFUbKtAodnQUA0Q, accessed 13 November 2012

Wakelam, R. T. (2009), The Science of Bombing - Operational Research in RAF Bomber Command, Toronto: University of Toronto Press

Wooldridge, J. M. (2009), Introductory Econometrics - A Modern Approach, Canada: South Western Cengage Learning

To cite this paper please use the following details: Hudson, R. (2013), 'Against All Odds: An Empirical Investigation into the Factors Affecting the Probability of Being Shot Down in World War Two,' Reinvention: an International Journal of Undergraduate Research, BCUR/ICUR 2013 Special Issue, http://www.warwick.ac.uk/reinventionjournal/issues/bcur2013specialissue/hudson/. Date accessed [insert date]. If you cite this article or use it in any teaching or other related activities please let us know by e-mailing us at Reinventionjournal at warwick dot ac dot uk.