Baking Even Better With ARIMA (part 1)

Assessing the influence of mass viral testing programmes on case positivity: are we experiencing a testdemic? (rev 1.1)

Jul 19, 2022

With the arrival of some hot weather here in the UK, climate alarmism has understandably gone through the roof, pushing out headlines on Ukraine and COVID. Best not to waste a golden opportunity to peddle some globalism, I guess! I thus decided to place a damp napkin on my head and conjure some equally hot COVID-flavoured stats.

In my recent series on Baking Better with Cochrane-Orcutt (best to start reading that series at this point) I used an econometric technique to handle the problem of serial correlation within linear regression. The aim was to check that my bold claim of a testdemic was justified, after a spate of generalised linear modelling (see here) suggested this could well be the case.

Introducing ARIMA

Cochrane-Orcutt gets round the problem by estimating the level of serial correlation using a single first-order autoregressive, or AR(1), parameter, so why not go the whole hog and use Autoregressive Integrated Moving Average (ARIMA) time series modelling instead?!
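For readers who like to see the cogs turn, here is a minimal sketch of what estimating that single AR(1) parameter amounts to. It is not my actual workflow: the series is synthetic, and the function name `lag1_autocorrelation` and the value `rho_true = 0.7` are assumptions made purely for illustration.

```python
import random

def lag1_autocorrelation(residuals):
    """Estimate rho in e_t = rho * e_{t-1} + u_t by least squares through the origin."""
    num = sum(prev * curr for prev, curr in zip(residuals, residuals[1:]))
    den = sum(prev ** 2 for prev in residuals[:-1])
    return num / den

# Synthetic, serially correlated errors (illustrative only, not real COVID data).
random.seed(0)
rho_true = 0.7  # assumed value for the demonstration
errors = [0.0]
for _ in range(2000):
    errors.append(rho_true * errors[-1] + random.gauss(0, 1))

rho_hat = lag1_autocorrelation(errors)
print(f"estimated rho: {rho_hat:.2f}")
```

The estimate should land close to the assumed 0.7. ARIMA generalises this idea by allowing more autoregressive lags, differencing of the series, and moving average terms.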

ARIMA is a mighty powerful technique for handling time series data and is often used to determine the impact of interventions over time: for example, the impact of introducing seat belt laws. In our case we can use the technique to determine the impact of virus testing regimes on daily case counts.

The Procedure In A Nutshell

We start by selecting some data. I chose the period 30 Aug 2020 - 7 Jun 2022 so that the virus had got a good foothold on the nation and test protocols were firmly established.

We follow through by developing the most parsimonious ARIMA base model for predicting the dependent variable. After some thought I opted for new first episodes by specimen date as that variable, to avoid the wrinkle of the counting of reinfections from Dec 2021 onward (see my short note here).

Once the best base model is established we then start adding in independent variables like new LFD tests by specimen date to see if they pop out as statistically significant predictors of new first episodes by specimen date. Once the hottest of hot expanded models is obtained we can then compare the expanded model with the base model to see if we’ve managed to explain anything extra by considering test activity.
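The base-versus-expanded comparison can be sketched in a few lines. To keep it dependency-free I have swapped in a much simpler stand-in for ARIMA: a one-lag autoregression fitted by ordinary least squares, with and without a test-count regressor, compared on AIC. Everything here is an assumption for illustration: the data are synthetic, and the names `lfd_tests`, `cases`, `ols` and `aic` are hypothetical, not drawn from any real dataset or package.

```python
import math
import random

def ols(X, y):
    """Fit y = X b via the normal equations (Gauss-Jordan); return (b, RSS)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    rss = sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, yi in zip(X, y))
    return beta, rss

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; lower is better."""
    return n * math.log(rss / n) + 2 * k

# Synthetic series in which test volume genuinely drives cases (assumed effect 0.3).
random.seed(1)
n = 500
lfd_tests = [random.gauss(100, 10) for _ in range(n)]
cases = [87.5]
for t in range(1, n):
    cases.append(5 + 0.6 * cases[t - 1] + 0.3 * lfd_tests[t] + random.gauss(0, 1))

y = cases[1:]
X_base = [[1.0, cases[t - 1]] for t in range(1, n)]               # constant + lagged cases
X_exp = [[1.0, cases[t - 1], lfd_tests[t]] for t in range(1, n)]  # ... plus tests

beta_base, rss_base = ols(X_base, y)
beta_exp, rss_exp = ols(X_exp, y)
aic_base = aic(rss_base, len(y), 2)
aic_exp = aic(rss_exp, len(y), 3)

print(f"base AIC: {aic_base:.1f}, expanded AIC: {aic_exp:.1f}")
print(f"estimated test coefficient: {beta_exp[2]:.3f}")
```

If the expanded model has the lower AIC and the test coefficient sits well away from zero, test activity is explaining something the base model could not. A proper ARIMA fit with regressors would also handle differencing and moving average terms, which this sketch deliberately leaves out.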
