Cross-Correlation: Cases, Admissions & Causality
Now that we’ve got the ball rolling using techniques from the world of signal processing, I thought we might as well mix metaphors and dive right into the deep end to see whether COVID cases are translating into increased admissions to hospital.
I appreciate this is a tricky stats subject to get your head around so I’ve attached a very useful Wiki link for background. Back in the day time series techniques were taught in the third year for statistics undergrads, and 99% of standard textbooks don’t cover them, which is why you won’t see them popping up that often in published papers. Open any clinical paper at random and you’ll see Pearson bivariate correlation, ordinary least squares regression, logistic regression, analysis of variance, multinomial regression and the like, despite all of these resting on the assumption that observations are independent of one another, which successive points in a time series rarely are.
The first slide of the morning reveals the cross-correlation function for daily new COVID cases with COVID admissions to hospital for England over the period Mar 2020 – Sep 2021. See that tall bar at lag zero reaching to r = 0.327? That is telling us there is a same-day hike in admissions alongside a hike in new cases, though the correlation is modest. See that tall bar at lag 8 reaching to r = 0.296? That is telling us that there is a delayed hike in admissions 8 days after a hike in new cases, and the same goes for lags of 16 days and 21 days. So far so good, this is pretty much what we’d expect with folk getting sick over a week or two and ending up in hospital. Now for the head bender…
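For anyone who wants to see how those bars are built, here is a minimal sketch of a cross-correlation function in Python. It is not the software used for the slide, and it uses made-up data rather than the England figures. The function name `cross_correlation` and the synthetic `cases` and `admissions` series are my own assumptions for illustration. Statistical packages usually normalise by the whole-series mean and variance, so their values will differ slightly from this simplified version, which just takes Pearson's r over the overlapping part of the two series at each lag.

```python
import numpy as np

def cross_correlation(x, y, max_lag):
    """Pearson r between x[t] and y[t + lag] for each lag in [-max_lag, max_lag].

    A positive lag pairs x with LATER values of y; a negative lag pairs x
    with EARLIER values of y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:n - lag], y[lag:]      # x today vs y `lag` days later
        else:
            a, b = x[-lag:], y[:n + lag]     # x today vs y `-lag` days earlier
        result[lag] = float(np.corrcoef(a, b)[0, 1])
    return result

# Made-up data (NOT the England figures): "admissions" is simply "cases"
# delayed by 8 days with some independent noise added on top.
rng = np.random.default_rng(0)
cases = rng.normal(size=500)
admissions = np.roll(cases, 8) + 0.5 * rng.normal(size=500)

ccf = cross_correlation(cases, admissions, max_lag=21)
peak_lag = max(ccf, key=ccf.get)
print(peak_lag, round(ccf[peak_lag], 3))
```

Because the toy admissions series is built as an 8-day echo of cases, the tallest bar lands at lag 8, and nothing of note appears at negative lags. That is what a clean, one-directional delay looks like.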
See that tall bar at lag -7 days reaching to r = 0.255? That is telling us there is a hike in admissions 7 days BEFORE a hike in new cases. How can this be? A clue comes from the regularly spaced positive spikes, which again point to week-on-week administrative procedures rather than anything the virus may be genuinely doing. Thus people are being squeezed into hospital in an elective manner and the blame is being put on COVID if they happen to test positive, i.e. we are not looking at a purely causal relationship in the numbers presented to the public.
The negative spikes are another head bender. At a lag of 26 days we observe a whopping great negative correlation at r = -0.296. This means there is a drop in COVID admissions 26 days after a hike in cases. There’s another drop at a lag of 18 days, another drop at a lag of 12 days and a sizeable drop (r = -0.234) at a lag of 5 days. How can it be possible for there to be an inverse relationship between cases and admissions over time?
Look at the regular weekly spacing again for the negative bars. Now look at how the positive bars flip to negative then positive in a steady flip-flop progression. This is the really big clue because I’ve seen this pattern many times before between two data series that are only vaguely causally related. The pattern they present is largely an illusion generated by the passage of time, my favourite example being incidence of swimming pool drownings and films by Nicolas Cage.
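The flip-flop is easy to reproduce with a toy example. The sketch below is an illustration under stated assumptions, not a model of the real data: it builds two series that share nothing except a hypothetical 7-day rhythm (a plain sine wave I have called `weekly`), each with its own independent noise. Neither series is built from the other. The helper `ccf_at`, the series length of 560 days and the noise level are all my own choices.

```python
import numpy as np

def ccf_at(x, y, lag):
    """Pearson r between x[t] and y[t + lag] (positive lag: y comes later)."""
    n = len(x)
    if lag >= 0:
        a, b = x[:n - lag], y[lag:]
    else:
        a, b = x[-lag:], y[:n + lag]
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(1)
days = np.arange(560)                     # roughly 18 months of daily data (assumed length)
weekly = np.sin(2 * np.pi * days / 7)     # hypothetical shared 7-day reporting rhythm

# Two series with independent noise; neither one drives the other.
series_a = weekly + rng.normal(size=days.size)
series_b = weekly + rng.normal(size=days.size)

ccf = {lag: ccf_at(series_a, series_b, lag) for lag in range(-14, 15)}
for lag in (-7, -4, -3, 0, 3, 4, 7):
    print(f"lag {lag:+d}: r = {ccf[lag]:+.3f}")
```

Run it and you get positive bars at lags of -7, 0 and +7 with negative bars in between, in both directions, even though there is no causal link at all between the two series. A shared weekly cycle is enough to manufacture the pattern.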
What analysts are doing worldwide is confusing time-induced artefactual correlation with genuine causality because they want to see a result that makes sense and fits their neat viral models. Many are being paid to get those results and if I were in receipt of a grant right now there’s no way you’d be seeing this slide – I’d be showing you one that ‘proves’ that my grant award was well placed. All this being said, there may well be some genuinely causal correlation here, but it is buried beneath a pile of artefact that few are bothering to fathom.


