In part 7 of this series I introduced the potential of T4253H smoothing to handle outliers in the data originating from year-end and holiday-period certificate processing artefacts. I rumbled on about issues with the derivation of excess deaths using a 5-year mean baseline (the method favoured by the Office for National Statistics) and pointed out the nonsense that will arise if disease of any sort comes along in an asynchronous manner. I also grunted on about unreliable certification of COVID death that will lead to wonky weekly figures and suggested we start looking at all-cause death to avoid the mess. All-cause death brings up the spectre of confounding factors, and so I suggested judicious use of covariates to level the playing field. I ended by mentioning I had managed to secure weekly dose 4 and dose 5 data for NHS England by trawling through archives which may be found here.
This morning as I tackled the washing-up I realised that smoothing, though utterly magical, can lead to problems down the line with any form of time series analysis. An analogy is climbing a telegraph pole fitted with little iron stepping grips as opposed to a smooth pole that somebody has greased. Time series analysis is all about traction! Thus, I set about creating three (event) indicator variables that mark end-of-year periods, holiday periods and the remains of the monster 2020/21 death peak that coincided with the start of the vaccination programme in 2020/w50. These three indicators absorb the excess variance brought about by administrative hiccoughs, as well as marking a period of elevated death (COVID or otherwise).
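For readers who like to see the nuts and bolts, here is a minimal sketch of how three such indicators might be built as 0/1 columns over the weekly series. The exact weeks flagged are not spelled out above, so everything named here is an assumption for illustration: `END_OF_YEAR_WEEKS`, `HOLIDAY_WEEKS` and `PEAK_END` are hypothetical placeholders to be replaced with whatever weeks the data suggest.

```python
from datetime import date, timedelta

# Hypothetical choices, for illustration only (not taken from the analysis):
END_OF_YEAR_WEEKS = {52, 53, 1}                  # ISO weeks treated as year end
HOLIDAY_WEEKS = {(2021, 13), (2021, 14),         # placeholder holiday weeks
                 (2022, 15), (2022, 16)}
PEAK_END = (2021, 10)                            # assumed last week of the peak


def iso_weeks(start, end):
    """Return every ISO (year, week) pair from start to end inclusive."""
    day = date.fromisocalendar(start[0], start[1], 1)   # Monday of first week
    last = date.fromisocalendar(end[0], end[1], 1)
    weeks = []
    while day <= last:
        iso = day.isocalendar()
        weeks.append((iso[0], iso[1]))
        day += timedelta(weeks=1)
    return weeks


def build_indicators(weeks):
    """Return one dict per week holding three 0/1 event indicators."""
    rows = []
    for year, week in weeks:
        rows.append({
            "week": (year, week),
            "end_of_year": int(week in END_OF_YEAR_WEEKS),
            "holiday": int((year, week) in HOLIDAY_WEEKS),
            "peak": int((year, week) <= PEAK_END),   # tuple comparison
        })
    return rows


weeks = iso_weeks((2020, 50), (2022, 33))
indicators = build_indicators(weeks)
```

Each indicator is simply a column of ones and zeros, which lets a model soak up the oddities of the flagged weeks without smoothing away the grip in the rest of the series.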
But What About The Virus?
In addition to these three indicators of great cunning we need to concoct a covariate that represents what the virus was genuinely doing from 2020/w50 onward. I say ‘genuinely’ because a raw count of positives from a zillion people poking their noses tells us absolutely nothing about changes in disease prevalence, this being the rate of infection within the population. For this task I elected to use my trusty case detection rate (CDR) time series, this being COVID cases per 100 viral tests. I first introduced this handy variable in Hunting for Vaccine Benefit (part 2), though it gained fame in Vaccines & Death (part 3).
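The arithmetic behind the CDR is simple enough to sketch. The function name `case_detection_rate` and the figures below are made up for illustration; only the definition (cases per 100 viral tests) comes from the text.

```python
def case_detection_rate(cases, tests):
    """Return COVID cases per 100 viral tests, or None if no tests were run."""
    if tests <= 0:
        return None          # rate is undefined without any tests
    return 100.0 * cases / tests


# Illustrative (invented) weekly figures: testing doubles, so do raw cases,
# yet the rate of detection is unchanged.
quiet_week = case_detection_rate(cases=500, tests=10_000)
busy_week = case_detection_rate(cases=1_000, tests=20_000)
```

In this toy example the raw case count doubles purely because twice as many tests were run, whereas the CDR stays put at 5 per 100 tests, which is the reason for preferring it as a covariate.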
Bolting It Together
With three indicator variables and one covariate all ready for use in the prediction of all-cause death across England for the period 2020/w50 to 2022/w33, all that remained was to crank the handle on the ARIMA machine to obtain a decent baseline model prior to introduction of the time series for vaccine dosing. It was then I accidentally pressed the wrong button and came face to face with an enigma or three…
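For those wondering what such a baseline looks like on paper, one common way to write it is as a regression with ARIMA errors. This is a generic sketch rather than the fitted model: the symbols are assumptions introduced here, with $D_t$ for weekly all-cause deaths, $E_t$, $H_t$ and $P_t$ for the end-of-year, holiday and peak indicators, $C_t$ for the CDR, and the orders $p$, $d$, $q$ left unspecified.

$$
D_t = \beta_0 + \beta_1 E_t + \beta_2 H_t + \beta_3 P_t + \beta_4 C_t + \eta_t,
\qquad
\phi_p(B)\,(1 - B)^d\,\eta_t = \theta_q(B)\,\varepsilon_t
$$

Here $B$ is the backshift operator ($B\eta_t = \eta_{t-1}$), $\phi_p$ and $\theta_q$ are the autoregressive and moving-average polynomials, and $\varepsilon_t$ is white noise. The vaccine dosing series would enter later as a further term on the right-hand side.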