Calibrating COVID (part 2)
Using n-dimensional space to determine what COVID is and is not
In Calibrating COVID (part 1) I used a machine learning approach (Multilayer Perceptron) to give me a handle on the reliability of COVID-19 designation for 11,156 in-hospital deaths occurring between 1 Feb ‘20 and 7 Dec ‘20 in a sizeable NHS Trust somewhere in England. We observed a reasonable correspondence between PCR test result and severely symptomatic COVID death (74.5% agreement) and poor correspondence between PCR test result and asymptomatic COVID death (22.15% agreement), though all hinged on what was meant by ‘symptomatic’. For this initial analysis I played a straight bat by relying on acute respiratory diagnoses to flag what is deemed to be a respiratory illness. But is COVID a respiratory illness?
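For readers who like to see the machinery, here is a minimal sketch of how such an MLP classification might be set up, assuming a scikit-learn workflow; part 1 does not pin down the software or the exact feature set, so the file name and column names below are hypothetical placeholders rather than the Trust's actual fields.

```python
# A minimal sketch, assuming scikit-learn. The file and the columns
# ('pcr_positive', 'acute_respiratory_dx', 'age', 'covid_death') are
# hypothetical placeholders, not the Trust's real field names.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Hypothetical extract covering the in-hospital deaths
deaths = pd.read_csv("in_hospital_deaths.csv")

# Features assumed to be numerically coded (0/1 flags, age in years)
X = deaths[["pcr_positive", "acute_respiratory_dx", "age"]]
y = deaths["covid_death"]  # coded COVID-19 designation on the death record (0/1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)

# 'Agreement' here is simply held-out classification accuracy
print(f"Agreement: {accuracy_score(y_test, mlp.predict(X_test)):.1%}")
```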
As 2020 unfolded I sat scratching my head as clinical teams and physicians around the globe reported all manner of symptoms. Diffuse alveolar damage I could understand; hair loss not so much. This led us down the path of cytokine storms, possibly toward the bradykinin hypothesis. All very interesting indeed, as were all those lung sections gathered in Northern Italy - but not everybody suffered a cytokine storm, lung damage or hair loss. Anybody digging into my work will discover that symptomatic/severe COVID cases - in terms of admissions to the emergency department or in-hospital death - were fewer than we might imagine (at least for the sizeable NHS Trust under study), with asymptomatic COVID being a popular choice amongst clinical coders.
Therein lies the rub: does asymptomatic indicate a false positive test result, a genuine SARS-CoV-2 infection that is doing pretty much nothing, or a SARS-CoV-2 infection that is doing something other than induce acute respiratory conditions? Just what damage does COVID actually seem to have done at the level of an entire NHS Trust over a period of 11 months, as opposed to handfuls of case studies? And what about false negative test results? Could these be leading to compounded error in diagnosis?
We need to make a distinction here. There is a difference between a thoracic surgeon saying “just look how peculiar these lung sections are” (from the few patients under their care) and an epidemiologist saying, “yes, well, they’re interesting lung sections but not at all representative!” I thus decided to see what COVID actually looks like from a top-down perspective, using as many variables as I could muster. In stats-speak, this meant placing COVID diagnosis (whether ‘true’ or ‘false’)1 as the dependent variable in a staged multivariate logistic regression analysis, using in-hospital death as the clinical outcome.
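To make that concrete, here is a hedged sketch of what a staged (hierarchical) logistic regression of this kind might look like, assuming Python's statsmodels; the variable names and the predictor blocks entered at each stage are illustrative, not the actual fields or model specification used.

```python
# A sketch of staged multivariate logistic regression with COVID diagnosis as the
# dependent variable, restricted to the in-hospital deaths. statsmodels is an
# assumption; 'covid_dx', 'pcr_positive' and friends are hypothetical column names.
import pandas as pd
import statsmodels.formula.api as smf

deaths = pd.read_csv("in_hospital_deaths.csv")  # hypothetical extract of the deaths

# Each stage adds a block of predictors to the previous one
stages = [
    "covid_dx ~ age + sex",                                        # demographics
    "covid_dx ~ age + sex + pcr_positive",                         # + test result
    "covid_dx ~ age + sex + pcr_positive + acute_respiratory_dx",  # + respiratory flag
]

for i, formula in enumerate(stages, start=1):
    fit = smf.logit(formula, data=deaths).fit(disp=0)
    print(f"Stage {i}: McFadden pseudo R-squared = {fit.prsquared:.3f}")
    print(fit.params.round(3), end="\n\n")
```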
All very exciting indeed, for rarely does one throw every darn independent variable in a database into the pot just to see what comes out in the wash! Being brought up ‘statistically proper’, I decided against a vast list of independent variables that would end in a plate of tangled spaghetti going by the name of multicollinearity, and instead reached for a mighty splendid statistical tool known as factor analysis to reduce that vast list to a handful of orthogonal factors.
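As an illustration of that reduction step, the sketch below assumes scikit-learn's FactorAnalysis with a varimax rotation, which keeps the retained factors orthogonal. The input table, the column names and the choice of five factors are assumptions made for the example; in practice the number retained would be guided by eigenvalues or a scree plot, and the actual package used may well differ.

```python
# A minimal sketch of reducing a long list of candidate variables to a handful of
# orthogonal factors. The file, column names and the choice of five factors are
# illustrative; varimax is an orthogonal rotation, so the factors stay uncorrelated.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import FactorAnalysis

candidates = pd.read_csv("candidate_variables.csv")   # hypothetical wide table

# Standardise so no single variable dominates the loadings
scaled = StandardScaler().fit_transform(candidates)

fa = FactorAnalysis(n_components=5, rotation="varimax")
factor_scores = fa.fit_transform(scaled)              # one column per factor, usable as regressors

loadings = pd.DataFrame(
    fa.components_.T,
    index=candidates.columns,
    columns=[f"Factor {i + 1}" for i in range(5)],
)
print(loadings.round(2))  # which raw variables load on which factor
```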
This process became a journey of discovery in itself, and in part 3 of this series I reveal what happened when I ran an array of pandemic measures derived from the UK GOV coronavirus dashboard through factor analysis to obtain my first five factors.
Kettle on!
1. I’m not sure what ‘true’ and ‘false’ mean in the context of a diagnostic test that is, at heart, a probability score with associated error.


Mmmm... that reminds me. Must put pasta on the shopping list!