Baking Better With Cochrane-Orcutt (part 1)
Tackling serial correlation in generalised linear modelling of rolling 7-day new cases detected (rev 1.0)
Sausage Rolls & Tarts
Imagine you are a judge at a country fair and an entrant in the baked savoury goods section hands you a 2-foot sausage roll. If that first bite tastes heavenly, then the second bite will taste heavenly, and so will the third, until you’ve scoffed all 24 inches. This is akin to the phenomenon of serial correlation (a.k.a. autocorrelation) that lurks within the vast majority of time series data.
Now imagine moving along to the baked sweet goods section to face a table laden with 24 individual tarts made by 24 individual bakers. Some are going to taste heavenly, some will have soggy bottoms and some will have burnt edges that the baker tried to scrape off. This is akin to data that has been randomly sampled from a large population: you never know what you are going to get on any given day!
In my newsletters Pandemic En Croûte and Pandemic En Croûte – another slice I committed the cardinal sin of ignoring serial correlation within the rolling 7-day new cases detected data series that was subject to generalised linear modelling (GLM). The sausage roll tasted heavenly from start to finish and was always going to win best prize for testdemic of the millennium.
It is quite acceptable to ignore serial correlation when modelling data, provided you have a priori (a.k.a. jolly good) reasons for doing so, since the issue tends to end in minor academic argument and/or very little to show for addressing the problem. That being said, I can’t get enough baked goods so always prefer to visit the next table.
In Search of ρ for rolling 7-day new cases detected
This morning I am going to take a leaf from Neter et al. (1996) and go in search of ρ (Greek letter rho), this being an index measure of autocorrelation within a data series, whereby a value of 1.0 means today’s value is precisely in step with yesterday’s (perfect serial correlation or heavenly sausage roll) and a value of 0 means yesterday’s value tells you nothing about today’s (a table of individual tarts). Cochrane-Orcutt estimation is the place to start, though later we will also go and taste goodies baked by Prais-Winsten.
For Cochrane-Orcutt we kick off by grabbing the residuals left over from the modelling process (a time series denoted by the letter ‘e’) and running them through this machine…
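For readers who like to see the oven as well as the pastry, here is a minimal Python sketch of the first Cochrane-Orcutt step as it usually appears in regression textbooks: ρ is estimated as the slope of a no-intercept regression of each residual on the one before it. The function name `estimate_rho` and the toy residuals are my own illustrative assumptions, not output from the GLM discussed here, and the exact machine used for the new cases series may differ in detail.

```python
def estimate_rho(residuals):
    """Estimate rho as the slope of a no-intercept regression of e[t] on e[t-1].

    rho_hat = sum(e[t-1] * e[t]) / sum(e[t-1] ** 2), with t running from the
    second residual to the last. Hypothetical helper for illustration only.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Pair each residual with its predecessor: (e[t-1], e[t])
    pairs = list(zip(residuals[:-1], residuals[1:]))
    numerator = sum(e_prev * e_now for e_prev, e_now in pairs)
    denominator = sum(e_prev ** 2 for e_prev, _ in pairs)
    if denominator == 0:
        raise ValueError("lagged residuals are all zero; rho is undefined")
    return numerator / denominator


# Made-up residuals in which each value is exactly half of the one before
toy_residuals = [1.0, 0.5, 0.25, 0.125]
print(estimate_rho(toy_residuals))  # 0.5
```

In the full procedure the estimate of ρ is then used to transform the data (each value minus ρ times the previous value), the model is refitted, and the cycle is repeated until ρ settles down.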