# Using ARIMA To Investigate COVID Death (part 3)

### Cranking the handle on the latest daily data sitting in the UK GOV coronavirus dashboard

Today I am going to use the methodology developed for parts 1 and 2 of this series to look at daily certified death across England for the next slice, this being the period Sep 2020 - May 2021, which saw the second and third waves of cases, viral variants and vaccination roll-out (among other things). Patrick Swayze once took to *Dirty Dancing* but here we are ogling dirty data - I have lost track of changes in definitions and other sleights of hand. It’s all in the hip movement below camera I guess. We shall start with one of those dual time series plots of deaths and cases for the period 3 Sep 2020 - 22 May 2021 (n=262):

This is a cracking mirrored pairing, with cases building in number a few weeks prior to the deaths that invariably followed. My eyes suggest a 3 - 4 week delay and they wouldn’t be wrong because we can reach for a cross-correlation plot that confirms this:

In this plot the tallest bars stick out at lags of 14 and 21 days, though we also see statistically significant correlations at lags of 0, 7, 23 and 29 days. This would suggest I need to dial in a 14 or 21 day lag from case onset to death when I come to run the ARIMA model. The peculiar negative correlations (= more cases, fewer deaths or more deaths, fewer cases) may arise from genuine population-level dynamics and/or artefact.
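For anyone who fancies trying a cross-correlation at home, here is a minimal Python sketch of the idea. It is not the software output discussed in this post: the data are invented random numbers (with a 21-day echo baked in on purpose), the function name `cross_correlation` is my own, and the ±1.96/√n band is only a rough large-sample stand-in for the 95% confidence limits.

```python
import math
import random


def cross_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag].

    A positive lag pairs each x value with the y value `lag` days later.
    """
    if lag >= 0:
        a, b = x[:len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[:len(y) + lag]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


# Invented data: 262 days of random "cases", and "deaths" that echo
# the cases from 21 days earlier. None of this is dashboard data.
random.seed(1)
n = 262
cases = [random.gauss(0, 1) for _ in range(n)]
deaths = [
    0.8 * cases[t - 21] + random.gauss(0, 0.5) if t >= 21 else random.gauss(0, 1)
    for t in range(n)
]

# Rough large-sample 95% band for a single bar (an approximation).
threshold = 1.96 / math.sqrt(n)

ccf = {lag: cross_correlation(cases, deaths, lag) for lag in range(-30, 31)}
significant = [lag for lag, r in ccf.items() if abs(r) > threshold]
best_lag = max(ccf, key=lambda lag: abs(ccf[lag]))

print("tallest bar at lag", best_lag)
print("lags outside the 95% band:", significant)
```

With 61 bars on show, a handful will poke past a 95% band by chance alone, which is worth bearing in mind when reading any plot of this kind.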

#### ARIMA Results

Being a belt and braces sort of guy I decided to run both 14 and 21 day lagged case counts to see which provided the lowest RMSE. The contest was won by the 21-day lagged case count series, which fits in with what our eyeballs suggest. Herewith the tabular heart of the model run:
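The lag contest can be sketched in a few lines of Python. To keep it dependency-free I have swapped ARIMA for a plain straight-line regression of deaths on lagged cases, so treat this as an illustration of choosing a lag by RMSE and nothing more. The data are invented, and `fit_line` and `rmse_for_lag` are hypothetical helper names of my own.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse_for_lag(cases, deaths, lag, max_lag):
    """RMSE when deaths[t] is predicted from cases[t - lag].

    Every lag is scored on the same days (t >= max_lag) so the
    RMSE values are directly comparable.
    """
    x = cases[max_lag - lag:len(cases) - lag]
    y = deaths[max_lag:]
    intercept, slope = fit_line(x, y)
    sq_err = [(b - (intercept + slope * a)) ** 2 for a, b in zip(x, y)]
    return math.sqrt(sum(sq_err) / len(sq_err))


# Invented data with a 21-day echo built in; not the dashboard series.
random.seed(1)
n = 262
cases = [random.gauss(0, 1) for _ in range(n)]
deaths = [
    0.8 * cases[t - 21] + random.gauss(0, 0.5) if t >= 21 else random.gauss(0, 1)
    for t in range(n)
]

candidate_lags = (14, 21)
rmse = {
    lag: rmse_for_lag(cases, deaths, lag, max(candidate_lags))
    for lag in candidate_lags
}
winner = min(rmse, key=rmse.get)

for lag in candidate_lags:
    print(f"lag {lag:2d} days: RMSE = {rmse[lag]:.3f}")
print("lowest RMSE at lag", winner)
```

Scoring both lags over the same stretch of days matters: otherwise the shorter lag gets extra data points and the two RMSE values are not like for like.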

Over in the column marked ‘Sig.’ we see a value of p=0.002 for 21-day lagged case counts which reveals that 21-day lagged case counts are a highly statistically significant predictor of certified death. The all important coefficient of determination that is derived by comparing the null model with the model incorporating case counts fetched-up at R-square = 0.236; that is to say, 23.6% of the variance observed for certified daily death for this period can be explained by the variation in case counts. This is a slight increase on the 19.0% found for the first wave (see part 1) but the broad comparability gives me confidence in both models: cases are translating into deaths in much the same manner.
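As a rough guide to how an R-square of this kind can be read, here is a small Python sketch. I am assuming the usual definition (one minus the ratio of squared errors for the fuller model versus the null model); a given statistics package may well compute its figure differently, and the numbers below are toy values invented purely to show the arithmetic.

```python
def r_squared_vs_null(actual, null_fit, model_fit):
    """Share of the null model's squared error removed by the fuller model."""
    sse_null = sum((a - f) ** 2 for a, f in zip(actual, null_fit))
    sse_model = sum((a - f) ** 2 for a, f in zip(actual, model_fit))
    return 1 - sse_model / sse_null


# Toy values, invented for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
null_fit = [2.5, 2.5, 2.5, 2.5]    # null model: predict the mean every day
model_fit = [1.5, 2.0, 3.0, 3.5]   # fuller model: closer to the actual values

r2 = r_squared_vs_null(actual, null_fit, model_fit)
print(round(r2, 3))  # 0.9
```

On that reading, a value of 0.236 would mean the fuller model trims 23.6% off the squared error of the null model.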

##### The Pudding

Another tasty dish!

##### The First Jab

With a decent baseline model in our pocket providing a well-tasty pudding we can now go explore interesting things like vaccination roll-out. When I run ARIMA again with the time series for 21-day lagged cases and daily first dose we observe the following:

Thus daily counts of the first dose are not a statistically significant predictor of certified COVID death (p=0.829). Some may argue that initial doses are causing cases, in which case we can run the model for the first dose alone. This is what transpires:

Again we find no evidence of a statistically significant predictive series for the first dose (p=0.283).

##### Handling Delays

At this point we may concern ourselves with delays between vaccination and death (or lack thereof) in which case we can concoct the acid test of a cross-correlation for first dose with certified COVID death:

There’s nothing much happening here at positive lags (i.e. stuff happening *after* vaccination). A causal link between first dose harm and certified COVID death would result in statistically significant positive bars at positive lags. There’s a hint of such at a lag of 5 days that just crosses the 95% confidence threshold but that’s about it.

In terms of vaccine benefit (revealed by negative correlations at positive lags) all we see is a modest result at a lag of 1 day. When taken at face value this appears to be evidence of jab benefit; namely, *a decline in deaths following the first jab,* but if we stop and think about it we realise this is clinically impossible and the result must be artefact. What we haven’t got in this telling plot is a palisade of negative correlations that build from day 14 onward. With no obvious structure to the outcome of vaccination I decided not to introduce any lagged effects into the dose 1 independent series.

##### Swings & Things

There are some peculiar goings-on in that we see two significant positive correlations at lags of -9 and -23 days (a rise in deaths happening *before* a rise in jabs) but we must remember we are looking at trends in *population-level* numbers: whilst it is impossible for an individual to get jabbed after death, it is entirely feasible for death counts across the nation to rise before jab counts!

Swings at the population level will also explain the mind-bending negative correlations at negative lags. These point to vaccine benefits that accrued *before* death. Now THAT is what I call a darn clever vaccine! Either this is a numerical illusion arising from the interplay of two ever-changing population dynamics or vaccine manufacturers have access to a time machine. My concern is that pro-vax analysts will ignore such artefact and claim illusory benefits.
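The reading rules used above (sign of the lag, sign of the bar, and whether the bar clears the 95% band) can be written down as a small Python sketch. The correlation values below are invented to echo the pattern described, not read off the plot; `classify_bars` is a hypothetical helper of my own, and the ±1.96/√n band is a rough large-sample approximation.

```python
import math


def classify_bars(ccf, n):
    """Label each bar that clears the approximate 95% band.

    `ccf` maps lag (days; positive = after the jab) to correlation.
    The labels are candidate readings only, not conclusions.
    """
    threshold = 1.96 / math.sqrt(n)
    labels = {}
    for lag, r in sorted(ccf.items()):
        if abs(r) <= threshold:
            continue  # inside the band: nothing to report
        if lag > 0:
            labels[lag] = "candidate harm" if r > 0 else "candidate benefit"
        elif lag < 0:
            labels[lag] = "deaths move before jabs (population swing)"
        else:
            labels[lag] = "same-day association"
    return labels


# Invented correlations for illustration; n matches the 262-day series.
toy_ccf = {-23: 0.15, -9: 0.14, 1: -0.13, 5: 0.125, 14: -0.05}

for lag, label in classify_bars(toy_ccf, n=262).items():
    print(f"lag {lag:+3d}: {label}")
```

A label is only a prompt for further thought: a "candidate benefit" at a lag of 1 day, for example, still has to pass the test of clinical plausibility.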

##### In Plain English

In plain English the initial jab simply didn’t provide any measurable benefit or significant harm during the second and third waves across England, which is also the conclusion we must come to if we think about the non-significant results attached to the independent variable *Dose 1 (million)* in the above ARIMA models. Bugger all use is a phrase that springs to mind!

In the next newsletter I shall reveal the results for the second jab. Until then…

**Kettle On!**
