Do COVID Cases Translate Into Deaths?
Using the cross correlation function to avoid nonsense
This may sound like a daft question to ask at this stage because we’ve seen COVID cases rise and fall and COVID certified deaths rise and fall. Given the tortuous and ever-changing definitions adopted by the various authorities surely nobody can deny that there is a strong correlation between daily counts of COVID cases and daily counts of COVID deaths at the national level: to query this a person must be mad!
A fanfare offstage. John Dee enters from stage left.
DEE: My name, good subscribers to my letters, is John Dee though this be of no consequence; for I shall wager that I am stark raving mad; thrice mad, in fact!
John Dee exits stage right. A cockerel is heard.
In a quaint English public house in rural England several years ago a well-dressed man sporting a fine silk scarf was blocking the entrance to the bar, ignoring all polite attempts to allow me to pass to get to my beer. Polite throat clearing done I barged through. That man was Nicholas Cage, and I’d like to start by having subscribers read about the correlation between deaths by swimming pool drowning and Nicholas Cage film appearances. Can you see where this may be going?
A Brief History Of Time Traps
When two quantities are moving steadily up and down over time like swimming pool drowning and Nicholas Cage film appearances they can become highly correlated in a statistical manner through no virtue other than they are moving up and down in time together. Two things going up together does not imply causality, neither does two things going down together; neither does one thing going up whilst another goes down. If the price of my beer has gone up over the last decade chances are that the price of Nicholas Cage’s fine silk scarf has gone up also because their common factor is inflation rising over time.
Only a mammering flap-mouthed clotpole would assume the price of silk scarves is forcing the price of beer (or the price of beer is forcing the price of silk scarves). But when it comes to assuming the daily COVID case count is forcing the daily COVID death count we suddenly stop being mammering flap-mouthed clotpoles and become sober citizens of the narrative. In our sobriety we are in danger of falling into the time trap and a long time ago I tried to explain this to young statisticians using this very diagram…
If this is bending your brain it may be best to think of a time series not as a univariate data series changing in value over time but as a bivariate relationship between the variable of interest and another variable we call time. Thus, a plot of daily new COVID cases over time is effectively a plot of daily new COVID cases against time. Likewise, a plot of certified COVID death over time is a plot of certified COVID death against time. Since time is a common variable we will obtain a phantom correlation. To be precise we obtain a correlation that comprises a genuine correlation between COVID cases and COVID death that is embedded within a phantom correlation.
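For anyone who would like to see the time trap rather than take it on trust, here is a minimal sketch in Python. The beer and scarf prices are entirely made up (the starting prices, daily drift and noise levels are my assumptions for illustration only); the two series share nothing except a steady rise over time, yet the correlation between them comes out very high.

```python
import random

def pearson_r(x, y):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(42)  # fixed seed so the sketch is repeatable
days = range(365)

# Two invented price series: each is a steady upward drift plus its own
# independent noise. Neither series has any influence on the other.
beer = [3.00 + 0.005 * t + rng.gauss(0, 0.10) for t in days]
scarf = [40.0 + 0.050 * t + rng.gauss(0, 1.00) for t in days]

r_levels = pearson_r(beer, scarf)
print(f"correlation of the raw series: r = {r_levels:.2f}")
```

The only thing linking the two lists is the variable `t`, which is exactly the common time variable described above.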
A fanfare offstage. John Dee and a citizen enter from stage left.
CITIZEN: How do we eliminate the phantom correlation?
DEE: By eliminating time!
CITIZEN: But how do we eliminate that which we call God’s arrow?
DEE: By taking the first order differential!
John Dee and citizen exit stage right. A cockerel is heard.
If the phrase first order differential sounds übergeek simply think of this as the difference between successive values in a time series. For example, if our time series was 100, 200, 400, 300 then the differences would be +100, +200, -100 and these three numbers would be our first order differential series. To avoid falling into the time trap we must first difference our time series data; any correlation that is then left is going to be the real McCoy.
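The differencing step is simple enough to sketch in a few lines of Python. The first part reproduces the 100, 200, 400, 300 example; the second part applies the same step to two invented trending series (the prices, drift and noise are my assumptions, not real data) to show how the phantom correlation might collapse once time has been removed.

```python
import random

def first_difference(series):
    """Return the change between successive values: series[t] - series[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]

def pearson_r(x, y):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# The worked example from the text.
print(first_difference([100, 200, 400, 300]))  # [100, 200, -100]

rng = random.Random(7)  # fixed seed so the sketch is repeatable
days = range(365)

# Invented series that rise together but are otherwise unrelated.
beer = [3.00 + 0.005 * t + rng.gauss(0, 0.10) for t in days]
scarf = [40.0 + 0.050 * t + rng.gauss(0, 1.00) for t in days]

r_levels = pearson_r(beer, scarf)
r_diffs = pearson_r(first_difference(beer), first_difference(scarf))
print(f"raw series:        r = {r_levels:.2f}")
print(f"differenced series: r = {r_diffs:.2f}")
```

Note that a differenced series is always one value shorter than the original, which is why four values yield three differences.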
Instant Death Trap
We may have avoided the time trap through the magic of the differential but another trap we need to avoid when we come to look at the correlation between the time series for daily COVID cases and the time series for daily COVID death is the trap of instant death. What I mean by this is that if there are 1,000 COVID cases detected on 28th June, and 1% of these are destined to end in death, we can’t expect those deaths to occur on the same day! An infection has to build, the body has to react, the situation needs to deteriorate, the person needs to be admitted to hospital, the hospital team needs to do what they can and the person needs to die. This takes time. How much time we cannot be exactly sure since there are reports of rapid deterioration and not so rapid deterioration, with a consensus around 14 days. Fortunately there is an elegant way of dealing with the time trap and the instant death trap in one swoop and that is to resort to cross correlation function (CCF) analysis of the first order differential series.
Cross Correlation Function Analysis – Stage 1
Let’s wade straight in with a cross correlation function plot for the period Mar ’20 – Dec ’21 for the nation of England and see if we can fathom a few things…
Anyone familiar with the ubiquitous Pearson correlation coefficient (a.k.a Pearson's r, the Pearson product-moment correlation coefficient, bivariate correlation or the correlation coefficient) is staring at this very value at lag zero. Lag zero is where we correlate COVID cases occurring on, say, Tuesday 20th June with certified COVID deaths occurring on Tuesday 20th June. If we want to correlate COVID cases occurring on Tuesday 20th June with certified COVID deaths occurring on Thursday 22nd June we require a positive 2-day lag. If we want to correlate COVID cases occurring on Tuesday 20th June with certified COVID deaths occurring on Wednesday 28th June we require a positive 8-day lag. Negative lags in this instance don’t make much sense since they’re correlating incidence of death before incidence of a PCR test for infection (though this could occur at post mortem).
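To make the idea of a lag concrete, here is a rough sketch in Python of how a cross correlation at a given lag might be computed on differenced series. Everything in it is invented for illustration: the case counts, the 1% fatality fraction, the built-in 14-day delay and the noise levels are all assumptions, and the function is a simple pairwise version rather than the exact estimator any particular statistics package uses.

```python
import random

def pearson_r(x, y):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def first_difference(series):
    """Return the change between successive values."""
    return [b - a for a, b in zip(series, series[1:])]

def cross_correlation(cases, deaths, lag):
    """Correlate cases on day t with deaths on day t + lag.

    A positive lag means deaths follow cases; a negative lag means
    deaths come before cases.
    """
    if lag >= 0:
        x, y = cases[:len(cases) - lag], deaths[lag:]
    else:
        x, y = cases[-lag:], deaths[:len(deaths) + lag]
    return pearson_r(x, y)

rng = random.Random(1)        # fixed seed so the sketch is repeatable
n_days, true_delay = 300, 14  # both invented for this illustration

# Invented daily case counts that wander up and down.
cases = [1000.0]
for _ in range(n_days - 1):
    cases.append(cases[-1] + rng.gauss(0, 20))

# Invented deaths: 1% of the cases seen 14 days earlier, plus noise.
deaths = [0.01 * cases[max(t - true_delay, 0)] + rng.gauss(0, 0.2)
          for t in range(n_days)]

d_cases, d_deaths = first_difference(cases), first_difference(deaths)
ccf = {lag: cross_correlation(d_cases, d_deaths, lag) for lag in range(-21, 22)}
best_lag = max(ccf, key=ccf.get)
print(f"strongest correlation at lag {best_lag:+d} days: r = {ccf[best_lag]:.2f}")
```

Lag zero in this dictionary is the ordinary Pearson coefficient of the two differenced series; every other entry simply slides one series against the other before correlating.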
See that tall red bar sticking out beyond the dashed 95% confidence interval boundary at +14 days? That is telling us that COVID cases correlate positively with certified COVID deaths after a delay of 14 days. There is a second significant correlation at a delay of 21 days. We may take these as being indicative of incubation times from positive PCR test to death for severe COVID cases. We may note that the correlation coefficients are not that strong (r < 0.20) so this is only a weak correlation that will be hampered by noisy data. What I mean by ‘noise’ in this instance are false positive test results, infection in younger generations who are not at all likely to die and infection in older generations who are not suffering from significant comorbidities and thus less likely to die. This makes total sense but hang on to your hats, we are going for a spin…
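For those wondering where a dashed 95% boundary of this kind comes from, a common rule of thumb is ±1.96/√n, where n is the number of observations, on the assumption that the two series are unrelated white noise. The sketch below uses that rule of thumb; it may not be exactly how the boundary in the plot was calculated. The figure of 671 days for Mar ’20 – Dec ’21 and the lag range of -21 to +21 are my assumptions for illustration.

```python
import math

def significance_bound(n, z=1.96):
    """Approximate 95% bound for a correlation when the series are unrelated."""
    return z / math.sqrt(n)

# Assumed sample sizes: about 22 months of daily data, and a six-month window.
for label, n in [("Mar '20 - Dec '21 (assumed 671 days)", 671),
                 ("a six-month window of 182 days", 182)]:
    print(f"{label}: bars beyond +/-{significance_bound(n):.3f} look significant")

# With many lags on one plot, a few bars will poke past the boundary by
# chance alone. Assuming 43 lags (-21 to +21) at the 5% level:
n_lags = 43
expected_chance_hits = n_lags * 0.05
print(f"bars expected to cross the boundary by chance: {expected_chance_hits:.2f}")
```

The last figure is worth keeping in mind when reading any CCF plot: an isolated bar that only just clears the boundary at an odd lag may be nothing more than chance.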
Cross Correlation Function Analysis – Stage 2
For the above slide I took pretty much the entire pandemic period of Mar ’20 – Dec ’21 to ease folk into looking at CCF plots. Let us now do the very same, but for each pandemic phase: first wave (Feb ’20 – May ’20), first respite (Jun ’20 – Sep ’20), second & third seasonal wave (Oct ’20 – Mar ’21), second respite (Apr ’21 – Jun ’21), fourth mini wave (Jul ’21 – Oct ’21), fifth seasonal wave (Nov ’21 – Feb ’22).
First wave
Here's the CCF plot for the first wave period. There are some interesting features in that we observe significant positive correlations at lags of 1, 2, 7 and 8 days. This makes sense because these first few cases were already in hospital and rather ill before COVID took hold; PCR testing was in its infancy and was used in response to onset of symptoms. Folk may ask what the positive correlations at lags of -7 and -19 days mean (deaths before testing) and the answer is we may be looking at spurious correlations as well as tests that were undertaken during post mortem.
First respite
Here’s the CCF plot for the first respite period. The really interesting feature here is that there is no interesting feature! There is absolutely no evidence of a correlation between daily COVID case counts and daily COVID death counts once we remove the time trap. This plot is telling us that how we went about testing the population during this period was completely pointless; that is to say, detected cases within the population at large bore no relationship to death as a clinical outcome. We need to note the phrase ‘at large’ because at the level of the individual there obviously will be a connection between severe COVID infection and death, but this is at the level of the individual. When we consider the population as a whole we discover such meaningful relationships are diluted into insignificance. This is what happens when you engage in blunderbuss mass testing programmes instead of testing people when they fall sick.
Second and third (seasonal) waves
The CCF plot for the second and third waves resembles the very first CCF plot presented in that we see statistically significant positive correlations at lags of 14 and 21 days. Note the coefficients for these are larger at around r = 0.35 owing to focusing the time window down to a particularly heavyweight period. During this seasonal peak we once again detect a correlation between daily COVID case counts and daily COVID deaths occurring 14 and 21 days later. The strongest correlation of r = 0.345 (n=182, p<0.001) at a lag of 21 days isn’t particularly strong and indicates that only 11.9% of the variance in daily certified COVID death can be explained by variation in the daily COVID case count. In plain English case counts are largely meaningless in terms of predicting death even during peak season!
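The arithmetic behind those figures can be checked in a few lines of Python. The r and n values are the ones quoted above; the p-value uses the Fisher z approximation, which is my choice for this sketch and treats the daily values as independent observations, so it should be read as a rough check rather than the exact test used for the plot.

```python
import math
from statistics import NormalDist

r, n = 0.345, 182  # values quoted in the text

# Share of variance in deaths explained by cases: the square of r.
r_squared = r ** 2
print(f"variance explained: {100 * r_squared:.1f}%")  # 11.9%

# Rough two-sided p-value via the Fisher z transformation (an approximation).
z = math.atanh(r) * math.sqrt(n - 3)
p_value = 2 * (1 - NormalDist().cdf(z))
print(f"p < 0.001? {p_value < 0.001}")  # True
```

So the correlation is comfortably significant in the statistical sense while still leaving roughly 88% of the variance unexplained, which is the distinction the paragraph above is drawing.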
Second respite
Here’s the CCF plot for the second respite period. Once again we are looking at the statistically insignificant effects of a blunderbuss. No targeting here, just scatter-gun testing that produced a whole load of meaningless numbers unless you are rather keen on fooling the public.
Fourth mini wave
Here’s the CCF plot for the fourth mini-wave, being an inexplicable creep in case counts over the summer months toward autumn. Variants were thrashing about amidst vaccination issues and more blunderbuss testing. There is one statistically significant positive correlation at a lag of 3 days, which is rather peculiar. This short delay can only come about if there was a tendency to test folk in the last few days of their life since it is generally acknowledged that it takes longer than 3 days for COVID to run its full course from detection to death in the majority of people.
Another peculiar feature of this plot is the palisade of statistically significant negative coefficients spaced at exactly 7 days. These are undoubtedly an administrative artefact and it is rather interesting that they do not feature in other plots. This is the hallmark of a national policy that required employers, employees, schools, hospitals etc. to test other people or themselves on a strict weekly basis regardless. These weekly pulses in detected cases have modulated the signal and have no sensible statistical relationship with death as the final clinical outcome. If anything they have muddied the water even more!
Fifth (seasonal) wave
Finally we arrive at the CCF plot for the fifth (seasonal) wave. I had fully expected to find a similar pattern to the second and third (seasonal) waves but, alas, we observe nothing but insignificance, even during the greatest peak in case counts to date! The vast case peak that outstrips everything that came before, and that sits boldly on the UK GOV coronavirus main page, is the biggest load of hot air yet. What is happening here is that they are testing anything and everything that breathes and the consequence is complete annihilation of any case signal that actually correlates with death. We have reached a point where the case count means nothing.
The conclusion I must reach at this point is not a good one. Setting aside the clinical reality of a number of genuinely severe cases that end in death there is plentiful evidence here to suggest that the way we have gone about defining and counting COVID cases holds no clinical value whatsoever; it is an irrelevant and misleading statistic suited to scaring the public and that’s about it.










You say: "... there is plentiful evidence here to suggest that the way we have gone about defining and counting COVID cases holds no clinical value whatsoever; it is an irrelevant and misleading statistic suited to scaring the public and that’s about it..." - 100% true!
There are two strong reasons for defining and counting deaths as COVID deaths:
1. To scare the public and push it to vaccinate.
2. Hospitals (at least in the U.S.) have a very strong monetary incentive to classify a death as a COVID death.
Therefore, the only way to prove or disprove a pandemic is to analyze overall excess mortality regardless of cause, removing of course a few categories like drug overdose, homicide, suicide, car and other accidental deaths.
Your vocation is as a stats professor! Professor John Dee has a lovely ring to it!!