A step toward understanding (part 3)
Using factor analysis to get a handle on the pandemic
In A step toward understanding (part 1) I kicked off by boiling down 38 pandemically-flavoured independent variables into a mélange of just 5 orthogonal factors I called nationwide testing, COVID cases + clinical tests, COVID deaths, prevalence + CFR, and in-hospital test rate. These five beauties explained 93.4% of the variation available in the 38-dimension dataset and glistened like edible jewels in the sun.
In A step toward understanding (part 2) I used the same technique to boil down 27 NHS England hospital activity variables into a set of just 3 orthogonal factors I called COVID workload, non-COVID workload and Case balance. These three gems explained 93.5% of the variation available in the 27-dimension dataset.
Pearson Correlation
This morning I am going to bring these 8 factors together and see how they mesh, and I shall start with something as plain and simple as a Pearson bivariate correlation matrix with a dash of colour[1]:
First row…
Now this is rather interesting. We observe a weak negative correlation between the COVID workload faced by hospitals across England and the nationwide (pillar 2) test machine (r = -0.154, p<0.001, n=609). This has likely arisen because the outburst of testing everybody who moved - especially those in education - grew after the first wave had declined. So much for the value of community testing! We also observe a positive correlation between the non-COVID workload faced by hospitals across England and the nationwide (pillar 2) test machine when we’d expect a negative correlation (r = 0.585, p<0.001, n=609). This suggests to me that the correlation has come about by happenstance, in that hospitals started treating patients of all kinds again at the same time that community testing kicked off. This result is coupled to the negative correlation sitting to the right, which indicates that an increase in community testing is associated with a case balance moving in favour of COVID cases as front door admissions rather than COVID cases as inpatients; that is, more people came into hospital - COVID or otherwise - when the doors were opened again. This sort of bias (we may call it access to service bias) will wreak havoc with analyses if not accounted for!
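It may seem odd that a correlation as weak as r = -0.154 can carry p<0.001, so here is a rough sanity check of how the sample size does the heavy lifting. This is a minimal sketch and not the statistics package output behind the table: the function names are my own, and the p-value uses a normal approximation to Student's t, which I am assuming is adequate because there are 607 degrees of freedom. With n = 609 the weak correlation works out to a t statistic of roughly -3.8.

```python
import math


def pearson_t(r: float, n: int) -> float:
    """t statistic for the null hypothesis of zero correlation,
    given a sample correlation r from n paired observations."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def two_sided_p_normal(t: float) -> float:
    """Approximate two-sided p-value for t.

    ASSUMPTION: uses the standard normal in place of Student's t,
    which is only sensible when n - 2 is large (here it is 607).
    """
    return math.erfc(abs(t) / math.sqrt(2.0))


n = 609  # number of paired observations quoted in the text
quoted = [
    ("nationwide testing vs COVID workload", -0.154),
    ("nationwide testing vs non-COVID workload", 0.585),
]

for label, r in quoted:
    t = pearson_t(r, n)
    p = two_sided_p_normal(t)
    print(f"{label}: r = {r:+.3f}, t = {t:+.2f}, approx p = {p:.2g}")
```

The point of the exercise is that statistical significance says nothing about strength: with several hundred observations even a feeble association clears the p<0.001 bar, so it is the size and sign of r that deserve the attention.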
Second row…
Moving down a row we have three pink cells that tell another interesting story. The correlation between COVID cases + clinical tests and COVID workload is what we fully expect. What we don’t expect is a rise in COVID cases across England in general that is associated with a rise in non-COVID workload in our hospitals (r = 0.573, p<0.001, n=609). This relationship seriously muddies the water and may once again be temporal happenstance, in that beds tend to fill up on a seasonal basis. I wonder how many senior NHS managers took this into account when standing in front of the cameras screaming that they were full to bursting? The final pink cell in this row reveals an association between COVID cases across England and a swing from front door admissions to inpatients testing positive (r = 0.473, p<0.001, n=609). This supports my notion of COVID as a nosocomial phenomenon.
Third row…
The single pink cell in this row also yields the strongest correlation in the entire table (r = 0.950, p<0.001, n=609), which tells us that COVID deaths across England are strongly associated with COVID caseloads within England’s hospitals, as we may expect.
Fourth row…
The first pink cell makes a great deal of sense in that it reveals the association between disease prevalence/CFR[2] and hospital COVID workload. The second cell, in blue, indicates a negative relationship between disease prevalence/CFR and the non-COVID workload, which also makes total sense: when the virus was out and about, hospitals were less likely to be caring for other types of patients. The pink cell to the right of this indicates that the virus was more likely to be found within the hospitals themselves; that is, the inpatient to admissions ratio swung high when disease prevalence swung high. We may deduce that COVID-19 is very much an institutional disease.
Fifth row…
Not much going on down here but we do have a blue cell that reveals a most curious negative relationship between in-hospital test rate and case balance (r = -0.432, p<0.001, n=609). This indicates that a surge in repeated testing of the same inpatients is associated with a swing toward front door admissions. Logic would dictate the opposite; that the more we test inpatients the more we are going to push the case balance in favour of inpatient cases! So what is going on? Well, one explanation is that when many cases are coming in through the front door, hospitals have less time to go around looking for more cases among the inpatient population. This is an awkward finding for analysts because it introduces another source of bias that we may call response bias.
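For anyone wanting to build this kind of table themselves, what has been walked through row by row is a cross-correlation block: the five part 1 factors as rows against the three part 2 factors as columns. The sketch below shows the mechanics only. The factor scores in it are random placeholder numbers, not my data, so the correlations it prints are meaningless; the function names and the rule of shading purely by sign are assumptions of mine for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_correlation(rows, cols):
    """Correlate every row factor with every column factor."""
    return {
        row_name: {col_name: pearson_r(row_scores, col_scores)
                   for col_name, col_scores in cols.items()}
        for row_name, row_scores in rows.items()
    }


def shade(r):
    """ASSUMPTION: colour by sign alone (pink = positive, blue = negative)."""
    return "pink" if r > 0 else "blue" if r < 0 else "none"


# PLACEHOLDER DATA: random noise standing in for the real factor scores.
random.seed(0)
n_obs = 609
part1_names = ["nationwide testing", "COVID cases + clinical tests",
               "COVID deaths", "prevalence + CFR", "in-hospital test rate"]
part2_names = ["COVID workload", "non-COVID workload", "Case balance"]
part1 = {name: [random.gauss(0, 1) for _ in range(n_obs)] for name in part1_names}
part2 = {name: [random.gauss(0, 1) for _ in range(n_obs)] for name in part2_names}

matrix = cross_correlation(part1, part2)
for row_name, row in matrix.items():
    cells = ", ".join(f"{col}: {r:+.3f} ({shade(r)})" for col, r in row.items())
    print(f"{row_name} -> {cells}")
```

Swapping the placeholder dictionaries for real factor scores aligned on the same dates is all that is needed; the date alignment is the part that deserves the most care.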
Brimming with bias
So far everything seems to hang together quite nicely in that the two sets of factors mesh to tell a sensible story. The big fly buzzing in the ointment is the discovery of two sources of bias, which I have named access to service bias and response bias, and which need to be accounted for in any historical analysis. In my next newsletter in this series I’ll have a look at some of these relationships over time.
[1] Pink = positive, blue = negative.
[2] Case Fatality Rate.


