The PCR Test As A Predictor Of Acute Respiratory Conditions (part 3)
I utilise data from an unknown NHS Trust to determine the real-world value of a COVID-19 diagnosis in the EPR of in-hospital deaths
It would appear from part 2 of this series that what the PCR test has gone and done is paint a mirage on the canvas of respiratory illness, with two massive spikes of seeming COVID death that fail to translate into acute respiratory conditions. But how does my newly forged catch-all indicator variable - DDx_VIBAC - square with the acute respiratory condition flag (DDx_Acute)? Time for another cross-tabulation of those 19,457 in-hospital deaths:
Now this is interesting because I would have bet money on a stronger relationship. We observe that 2,216/3,898 (56.8%) of cases with an alleged respiratory pathogen of some sort were also diagnosed with an acute respiratory condition such as pneumonia, ARDS or respiratory arrest. Ideally we would take that 43.2% as a measure of the resilience of the human body supported by advances in modern medicine, but we can’t rule out false positives at this stage. This table also reveals 1,948/15,559 (12.5%) of deaths with an associated acute respiratory condition but no infectious agent, these invariably being folk whose chronic disease state has deteriorated.
I guess we can be pretty sure about those 2,216 infected folk with an acute respiratory condition and those 13,611 non-infected folk not suffering an acute respiratory condition which makes for 15,827/19,457 (81.3%) of the data sample we can have some sort of confidence in. That’s not bad going!
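For anyone who wants to check my sums, the full two-by-two table can be rebuilt from the counts quoted above. What follows is a minimal sketch in Python rather than the output of my stats package; the variable names are made up for illustration, and the two cells not quoted directly are derived by subtraction.

```python
# Counts quoted in the text; the variable names are illustrative only.
total_deaths = 19_457
infected = 3_898             # DDx_VIBAC = 1
infected_acute = 2_216       # DDx_VIBAC = 1 and DDx_Acute = 1
not_infected = 15_559        # DDx_VIBAC = 0
not_infected_acute = 1_948   # DDx_VIBAC = 0 and DDx_Acute = 1

# The two remaining cells follow by subtraction.
infected_not_acute = infected - infected_acute
not_infected_not_acute = not_infected - not_infected_acute


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Cases where the two flags agree (both present or both absent).
concordant = infected_acute + not_infected_not_acute

print(f"{'':<14}{'Acute':>8}{'Not acute':>11}{'Total':>8}")
print(f"{'Infected':<14}{infected_acute:>8}{infected_not_acute:>11}{infected:>8}")
print(f"{'Not infected':<14}{not_infected_acute:>8}{not_infected_not_acute:>11}{not_infected:>8}")
print("Infected with acute condition:", pct(infected_acute, infected), "%")
print("Not infected with acute condition:", pct(not_infected_acute, not_infected), "%")
print("Concordant share of sample:", pct(concordant, total_deaths), "%")
```

The row totals sum to the 19,457 deaths, which is a handy sanity check on the quoted figures.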
Battle Of The Giants
Subsets are one way of going about unravelling this diagnostic tangle; another is to model the diagnosis of interest and then use the predicted values in place of the observed values, a technique related to propensity score methods. That is to say, we use a fancy multivariate model to formulate the propensity for a COVID diagnosis to be given to an individual patient. This should iron out some of the sample bias and push a positive test result toward something akin to a valid test for infection. That’s the theory in any case, and I’ll now have a stab at this.
In my data sample the indicator variable COVID-19 Dx encodes the performance of the PCR test in a clinical setting. It is a subset of my newly forged DDx_VIBAC, which throws the net around all acute respiratory infections. Let us start with COVID-19 Dx and use two modelling techniques: classic multivariate logistic regression and a new-fangled feed-forward neural network approach known as the multilayer perceptron.
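The idea of swapping an observed flag for a modelled propensity can be sketched in a few lines. This is a toy illustration only and not the model I ran: the data are invented, there is a single hypothetical risk flag instead of 20 main effects, and the fit is done with plain gradient descent in place of a proper stats package.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Fit p(y=1) = sigmoid(b0 + b1*x) by plain gradient descent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y   # gradient of the log-loss
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Invented toy data: x is a hypothetical risk flag, y the recorded diagnosis.
xs = [1, 1, 1, 1, 0, 0, 0, 0]
ys = [1, 1, 1, 0, 1, 0, 0, 0]

b0, b1 = fit_logistic(xs, ys)

# The propensity replaces the binary flag with a probability per patient.
propensity = [sigmoid(b0 + b1 * x) for x in xs]
for x, y, p in zip(xs, ys, propensity):
    print(f"risk flag={x}  observed={y}  propensity={p:.2f}")
```

With one binary predictor the fitted propensities simply settle on the observed proportion within each group; the real interest lies in what happens when many predictors and their interactions are in play.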
Logistic Regression Method
The model structure for the logistic regression was sizeable: 20 main effects, no fewer than 190 two-way effects and 1,140 three-way effects. I don’t have a tea pot big enough for all those three-way effects so I stuck to two-way as my nod to model sophistication. Herewith a summary table listing those 20 main effects so you can see what was thrown into the pot:
Regular readers will have come across this listing a fair few times but it may be worth me mentioning once again that Diagnoses represents the total number of medical diagnoses made on the medical record, which may range from 1 to 10, this being a rough proxy for how sick people are. CDR (case detection rate) is derived from the UK GOV coronavirus dashboard, being the rolling 7-day count of new COVID cases detected in England divided by the rolling 7-day count of viral tests undertaken. This figure is a reasonable proxy for disease prevalence across the nation and thus a crude proxy for individual risk of infection.
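The CDR calculation described above can be sketched as follows. The daily figures are invented purely for illustration and the function name is hypothetical; the real series comes from the dashboard.

```python
def rolling_cdr(daily_cases, daily_tests, window=7):
    """Rolling-window case count divided by rolling-window test count.

    Returns one value per day from the first day a full window is available.
    """
    if len(daily_cases) != len(daily_tests):
        raise ValueError("cases and tests must cover the same days")
    rates = []
    for end in range(window, len(daily_cases) + 1):
        cases = sum(daily_cases[end - window:end])
        tests = sum(daily_tests[end - window:end])
        rates.append(cases / tests if tests else float("nan"))
    return rates


# Invented daily counts for eight consecutive days.
cases = [10, 12, 8, 15, 20, 18, 17, 25]
tests = [100, 100, 100, 100, 100, 100, 100, 100]

for rate in rolling_cdr(cases, tests):
    print(f"{rate:.3f}")
```

Eight days of data yield two rolling values, one for each complete 7-day window.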
I’m hoping all other independent variables are largely self-explanatory, and you can determine their incidence within the population of 19,457 adult deaths over the period 2020/w11 – 2021/w36 by eyeballing their mean value. Thus we see that a cancer diagnosis leads the way with mention in 30% of all in-hospital deaths. If the vaccinated figure of 26% causes you to blink/blurt tea then please do realise this is the average for the entire pandemic period!
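The counts of two-way and three-way effects quoted earlier follow directly from the number of ways to pick 2 or 3 variables out of 20. A minimal sketch, using just four of the variable names mentioned in this article to show how the interaction terms are formed:

```python
from itertools import combinations
from math import comb

n_main = 20
print("two-way effects:", comb(n_main, 2))
print("three-way effects:", comb(n_main, 3))

# A small subset of the 20 main effects, for illustration only.
names = ["Acute Respiratory", "Injury & Trauma", "Vaccinated", "Hypertension"]

# Every unordered pair of main effects gives one two-way interaction term.
two_way = [f"{a} by {b}" for a, b in combinations(names, 2)]
for term in two_way:
    print(term)
```

Four variables give six pairs; twenty give 190, which is why the coefficient table gets so unwieldy.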
It took my quad core workstation a fair while to run the logistic model, but it eventually did so in 66 steps. Herewith the resulting monster table of coefficients:
Something this big is rather boggling so my advice is to skim down the Exp(B) column that is conveniently sorted by size of odds ratio. Right at the top is Acute Respiratory (OR = 5.17, p<0.001) which is what we’d expect if the PCR test was doing something useful. In plain English this is telling us that in-patients suffering an acute respiratory condition had roughly five times the odds of being associated with a positive test result.
However, just below this is the interactive term for Injury & Trauma by Inflammatory conditions (OR = 5.00, p<0.001). This is most curious for the injury and trauma category is characterised by falls in the elderly, with generalised inflammatory conditions being characterised by pneumonitis due to food and vomit (31.4% of all cases). We’re talking about elderly folk in a sorry state, being trolleyed into hospital from home or (more likely) a care home. This group is five times more likely to return a positive test result so we’re talking about nosocomial disease and/or excessive rates of testing by nursing staff following protocols. Scanning down this list is rather sobering for it tells the story of hospital life and the rapidly ageing population of England.
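A word on reading odds ratios such as the 5.17 and 5.00 above. Exp(B) is the exponential of the model coefficient B, and it multiplies the odds rather than the probability, so what it means in terms of probability depends on where you start. The sketch below uses baseline probabilities that are assumptions chosen purely for illustration; they are not figures from my data.

```python
import math


def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability obtained after multiplying the baseline odds by an odds ratio."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


odds_ratio = 5.17
coefficient_b = math.log(odds_ratio)   # Exp(B) = OR, so B = ln(OR)
print(f"B = {coefficient_b:.3f}")

# Illustrative baseline probabilities (assumed, not taken from the data).
for baseline in (0.01, 0.10, 0.30):
    p = apply_odds_ratio(baseline, odds_ratio)
    print(f"baseline {baseline:.2f} -> {p:.3f} (probability ratio {p / baseline:.2f})")
```

When the baseline is small the probability ratio sits close to the odds ratio, and it shrinks as the baseline grows.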
The Illusion Uncovered
Being curious I did wonder where vaccination sat in all of this and found a single main effect cowering near the bottom (OR = 0.18, p<0.001). If we are to follow the shallow reasoning of the pro-vax cult then this indicates a near six-fold magical reduction in the likelihood of a positive test result for everything and everybody. I don’t think so!
The reason I don’t think so is that this main effect is not embedded in an interactive term with an odds ratio less than unity that is pointing to a reduction in acute respiratory conditions or chronic respiratory disease in association with vaccination. In stats-bod speak the indicator variable Vaccination is exhibiting independence; that is to say, it is a variable that bears no associative relation to any other variable in a clinical sense. This situation will come about if the decision to run a PCR test is governed by vaccine status and not by medical need. This is the illusion uncovered!
In fact, what we do find are two interactive terms indicating vaccine harm, namely: Inflammatory conditions by Vaccinated (OR = 1.61, p<0.001); Hypertension by Vaccinated (OR = 1.61, p<0.001).
The First Pudding
This brings us to the first pudding, this being the table of classification results and attendant ROC curve. Try a spoon of this:
The model handles the true negative side well, correctly classifying 15,394/16,045 (95.9%) of deaths. Performance for the true positive side is lacking at 1,213/3,412 (35.6%) of deaths, though the overall performance of 85.4% correctly classified deaths is not to be sniffed at. But can we improve on this?
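The headline rates can be recomputed from the four counts quoted above. A minimal sketch, with variable names of my own choosing; the misclassified cells are derived by subtraction:

```python
# Counts quoted in the text.
observed_negative = 16_045   # deaths without a COVID-19 diagnosis
true_negative = 15_394       # of those, predicted negative by the model
observed_positive = 3_412    # deaths with a COVID-19 diagnosis
true_positive = 1_213        # of those, predicted positive by the model

# Misclassified cells follow by subtraction.
false_positive = observed_negative - true_negative
false_negative = observed_positive - true_positive

specificity = true_negative / observed_negative
sensitivity = true_positive / observed_positive
accuracy = (true_negative + true_positive) / (observed_negative + observed_positive)

print(f"false positives: {false_positive}, false negatives: {false_negative}")
print(f"specificity: {100 * specificity:.1f}%")
print(f"sensitivity: {100 * sensitivity:.1f}%")
print(f"overall accuracy: {100 * accuracy:.1f}%")
```

With negatives outnumbering positives by nearly five to one, the overall figure is driven largely by how well the negatives are classified.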
Multilayer Perceptron Method
The same 20 main effects listed at the outset were fed into a neural network package using the multilayer perceptron (MLP) technique to tease out patterns and associations using machine learning. The output is huge and funky so I’ll restrict myself to the classification table, which is split into the results arrived at in training and testing modes:
There is an agreeable consistency in performance between training and testing modes, and the method just pips logistic regression by a short hair, though detailed ‘meaning’ of the model structure is lost to the land of the ‘black box’. Variables are evaluated according to their ‘importance’ in the building of the neural network and this is presented below in descending order:
Right at the top is the variable indicating the complexity of the diagnostic record and thus the general health of the patient (Diagnoses). There’s injury and trauma again, grabbing the silver medal, with other cardiac conditions (chiefly chronic ischaemic heart disease/atherosclerotic heart disease) grabbing the bronze. Vaccinated once again features way down on the list but we cannot easily tell from a neural network what this means.
Final Score
With two sets of predictions for incidence of positive test result arising from similarly performing but very different models we might want to compare them in a final head-to-head, so here’s a cross-tabulation that does just this:
We observe agreement in 18,438/19,457 (94.8%) of deaths, which is rather splendid and a sign that the outputs of either model will do just as well in replacing COVID-19 Dx as a more truthful indicator of disease status. Since MLP just has the edge in terms of correctly classified deaths, along with a slightly greater area under the ROC curve, I shall adopt the output of this method for further exploration of the data.
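The head-to-head comparison boils down to cross-tabulating two columns of predictions and counting the cases on the diagonal. Here is a minimal sketch using ten invented predictions per model; only the final line uses figures from this article.

```python
def agreement_table(pred_a, pred_b):
    """Cross-tabulate two lists of 0/1 predictions into a 2x2 table."""
    table = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for a, b in zip(pred_a, pred_b):
        table[(a, b)] += 1
    return table


def percent_agreement(table):
    """Share of cases where both models give the same prediction."""
    total = sum(table.values())
    return 100 * (table[(0, 0)] + table[(1, 1)]) / total


# Invented predictions for ten deaths from two hypothetical models.
logistic_pred = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
mlp_pred = [0, 0, 0, 1, 1, 1, 0, 0, 1, 0]

table = agreement_table(logistic_pred, mlp_pred)
print(table)
print(f"toy agreement: {percent_agreement(table):.1f}%")

# The agreement quoted in the text, recomputed from its two counts.
print(f"quoted agreement: {100 * 18_438 / 19_457:.1f}%")
```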
Summary
A diagnosis of COVID-19 in the EPR of in-hospital deaths does not imply causality, and may not even imply infection with active SARS-CoV-2. It is thus an unreliable indicator for use in vaccine benefit studies.
Two methods of multivariate statistical modelling (logistic regression & multilayer perceptron) were employed to produce probabilities for genuine SARS-CoV-2 infection leading to COVID-19 symptomatology.
A major source of bias within vaccine benefit studies will be eliminated by using estimated probabilities of genuine infection instead of a binary indicator with acknowledged limitations.
Kettle On!
Thanks for adding the summary at the top. A couple thoughts: some (not all) of the early Covid vaccine published trials used symptomatic Covid as a key biomarker/endpoint, rather than just positive tests, recognizing at least implicitly that a positive test by itself was not enough.
Historical "case definitions" have almost always required both a positive test PLUS symptoms. Covid was bizarrely anomalous in that the US and then the WHO, in that order, adopted case definitions that required only a positive test: first only PCR tests, but then modified to allow even a positive antigen test to qualify as a "case" of Covid.
Anyway, here's the published trial for the Moderna vaccine, Baden et al. 2021, which did require both a positive test and Covid symptoms ("symptomatic Covid"). This is better than not requiring symptoms, but as you surely know, with the symptom list for Covid so incredibly long and non-specific to Covid, even requiring symptoms alongside a positive Covid test probably did little to weed out large numbers of false positives. https://www.nejm.org/doi/full/10.1056/NEJMoa2035389
And here's the US CDC "case definition" for Covid: https://ndc.services.cdc.gov/case-definitions/coronavirus-disease-2019-2021/
The big table only lists about 60 of the 190+ two-way effects, most notably missing AcuteRespiratory by Vaccinated and ChronicRespiratory by Vaccinated. Is there any reason for this, or is this table just a random illustration?