COVID vs. COVID (part 1)
A quick comparison of probabilistic COVID scores, as derived by applying machine learning (an MLP), with COVID diagnoses as recorded in the EPR
Part 12 of this series packed quite a punch, so I’ll just do a quick plain-English recap to ensure we’re all up to speed. What I’m doing is using a popular statistical technique within the medical sciences (Logistic Regression) to determine the factors that influence the incidence of acute respiratory conditions (pneumonia, respiratory failure, respiratory arrest, ARDS) in patients who went on to die. The technique simultaneously takes account of age, sex, prevalence of COVID-19, major comorbidities, case complexity, PCR test result, vaccination status, and the many interactions between all of these.
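For the statistically minded, here’s a minimal sketch of that kind of model, using synthetic data and invented column names (purely illustrative, not the actual study variables):

```python
# A minimal sketch of the modelling approach described above, on synthetic
# data with hypothetical column names. The outcome flags an acute
# respiratory condition; main effects plus one illustrative interaction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2_000
df = pd.DataFrame({
    "acute_resp": rng.integers(0, 2, n),     # pneumonia/resp. failure/ARDS flag
    "age": rng.integers(18, 100, n),
    "sex": rng.integers(0, 2, n),
    "comorbidities": rng.integers(0, 5, n),  # count of major comorbidities
    "pcr_positive": rng.integers(0, 2, n),
    "vaccinated": rng.integers(0, 2, n),
})

formula = ("acute_resp ~ age + sex + comorbidities + pcr_positive"
           " + vaccinated + vaccinated:pcr_positive")  # example interaction
fit = smf.logit(formula, data=df).fit(disp=False)
print(fit.summary())
```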
When we replace the declared diagnosis of COVID-19 within the electronic patient record with a probabilistic function, derived from a sophisticated model designed to predict the presence of genuine infection (true positive) from a diagnostic array, all notion of vaccine benefit evaporates and we are left with results that provide evidence of vaccine harm.
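As a hypothetical sketch of that substitution (invented features and a toy architecture, nothing like the full diagnostic array), the idea in code looks like this:

```python
# Hypothetical sketch: train an MLP on a diagnostic array (feature names
# invented for illustration) and replace the EPR flag with the predicted
# probability of genuine infection.
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
n = 2_000
X = pd.DataFrame({
    "pcr_positive": rng.integers(0, 2, n),
    "cxr_consistent": rng.integers(0, 2, n),  # chest X-ray finding
    "crp": rng.normal(50, 20, n),             # inflammatory marker
})
epr_covid_dx = rng.integers(0, 2, n)          # EPR-recorded diagnosis (label)

mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
mlp.fit(X, epr_covid_dx)
p_covid = mlp.predict_proba(X)[:, 1]          # probabilistic COVID score
probable_covid = p_covid > 0.5                # binary 'probable COVID'
```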
In essence, I am using statistical methods to remotely audit the EPR of deceased inpatients, records whose real-world clinical value should be questioned. Most folk these days appreciate the limitations of the PCR test, with its tendency to produce false positives (especially under ridiculously high maximum cycle thresholds), but this is just the tip of the iceberg. Healthcare professionals have confided that over-enthusiastic use of chest X-ray also leads to spurious conclusions, and several clinical coders have confided that they’ve been instructed by management to code a case as COVID even though Senior House Officers have scrawled ‘NOT COVID’ on the casenotes. To say that this is all rather murky is an understatement! Hence we cannot trust the EPR coding as given, and hence my development of a probable COVID variable in an attempt to circumvent matters.
Is There Much Difference?
At this point we may ask how probable COVID stacks up against the original COVID diagnosis, and the answer comes courtesy of a classification table that looks like this:

[Classification table: EPR COVID-19 diagnosis vs. probable COVID for 19,457 deaths]
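For anyone wanting to check the headline number, the matched rate falls straight out of the diagonal of that table:

```python
# Reconstructing the headline agreement figure from the counts quoted in
# the text: matched non-COVID (15,396) plus matched COVID (1,294) over all
# 19,457 deaths. The off-diagonal split isn't given, so only the matched
# rate is computed here.
matched = 15_396 + 1_294
total = 19_457
print(f"Agreement: {matched / total:.1%}")  # 85.8%
```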
Thus, in terms of identically matched diagnoses, we have a rather healthy rate of (15,396 + 1,294)/19,457 = 85.8%. That’s not bad going and means we can be confident of the accuracy of the original diagnosis in 86% of patients, with just 14% coming into question. In my model run for part 12 I didn’t question the veracity of the machine learning decision and ploughed ahead with the predicted binary outcome. To give readers a flavour of how the underlying probability scores translate into the diagnosis given in the EPR, here’s a boxplot of probable COVID factored by COVID-19 Dx:

[Boxplot: probable COVID score factored by COVID-19 Dx]
As may readily be seen, there’s a pretty solid distinction for the great majority of cases, with the mean score for non-COVID cases lying down at P = 0.12, compared with P = 0.41 for declared COVID cases. This difference is hugely statistically significant (p < 0.000001), so we’re not talking about ‘maybe’!
Thus, the machine learning black box has done a cracking job and, ergo, healthcare professionals must also have done a good job in identifying cases. That being said, we must note the whiskers, which point to COVID cases that likely were not (false positives) and non-COVID cases that likely were (false negatives). In this regard the highly skewed distribution of scores on the left is rather interesting, with a long tail indicative of false negatives (assuming the machine learning model is doing a sensible job). I’m hoping that by sitting with this slide and a decent coffee, readers will reach the happy conclusion that my fiddling hasn’t been a fiddle!
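For those who want to reproduce the flavour of this chart, here’s a sketch continuing from the MLP snippet above; a Mann-Whitney U is one sensible choice of test for skewed scores like these (the actual test used isn’t the point here):

```python
# Sketch of the boxplot and a significance check, continuing with the
# `p_covid` scores and `epr_covid_dx` labels from the MLP sketch above.
import matplotlib.pyplot as plt
from scipy.stats import mannwhitneyu

no_dx = p_covid[epr_covid_dx == 0]   # scores for non-COVID EPR cases
dx = p_covid[epr_covid_dx == 1]      # scores for declared COVID cases

plt.boxplot([no_dx, dx], labels=["No COVID Dx", "COVID Dx"])
plt.ylabel("Probable COVID (MLP score)")
plt.show()

print(mannwhitneyu(no_dx, dx))       # test for a difference in score distributions
```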
But here’s the extraordinary thing…
By choosing a simple binary variable (probable COVID), derived from the probabilistic (MLP) values on the y-axis, for my logistic model instead of the dichotomous EPR-derived values on the x-axis, we’ve gone from seeing vaccine benefit all over the place to seeing evidence of vaccine harm. This suggests that the driver of this 180° flip is a modest number of critical patients sitting in those whiskers, which brings us back to that 14% figure.
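In code terms the swap amounts to refitting the same logistic model with the thresholded MLP score in place of the EPR flag (again using the illustrative variables from the sketches above):

```python
# The substitution behind the flip: the same logistic model fitted twice,
# once with the EPR flag and once with thresholded probable COVID, so the
# vaccination coefficient can be compared. Variable names follow the
# earlier sketches and remain illustrative.
df["epr_covid_dx"] = epr_covid_dx
df["probable_covid"] = (p_covid > 0.5).astype(int)

m_epr = smf.logit("acute_resp ~ vaccinated + epr_covid_dx + age + sex",
                  data=df).fit(disp=False)
m_mlp = smf.logit("acute_resp ~ vaccinated + probable_covid + age + sex",
                  data=df).fit(disp=False)
print(m_epr.params["vaccinated"], m_mlp.params["vaccinated"])  # compare vaccine terms
```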
Summary
Diagnostic agreement between probable COVID, as derived using machine learning, and the diagnosis recorded in the EPR was found for 16,690 (85.8%) of the 19,457 adult in-hospital deaths.
Use of Probable COVID-19 disease status eliminates a major source of bias that gives rise to the illusion of blanket vaccine benefit.
Individualised probabilities for risk of COVID infection may be viewed as a proxy for diagnostic confidence.
Kettle On!
Comments

I think this must be wrong, John, since very good analysis suggests that we had far, far higher rates of false positives at most times during the pandemic than the 15% or so that your analysis suggests. False positives were more like 70-90% most of the time, particularly in areas (most places in the world) that had strong screening programs in place. Or am I misunderstanding your analysis? Is it limited to particular kinds of tests or testing milieus? Here's our writeup on the false positive paradox resulting from screening for Covid: https://www.bmj.com/content/373/bmj.n1411/rr
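A quick illustration of the false positive paradox the comment invokes, with purely illustrative sensitivity/specificity figures (the false share of positives is driven by prevalence):

```python
# False positive paradox: with mass screening of a low-prevalence
# population, even a highly specific test yields mostly false positives.
# Sensitivity/specificity figures below are illustrative only.
def false_discovery_rate(prevalence, sensitivity, specificity):
    tp = prevalence * sensitivity              # true positives per person screened
    fp = (1 - prevalence) * (1 - specificity)  # false positives per person screened
    return fp / (tp + fp)

for prev in (0.001, 0.01, 0.05):
    print(prev, round(false_discovery_rate(prev, 0.95, 0.99), 2))
# ~0.91, ~0.51, ~0.17: the false share of positives collapses as prevalence rises
```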
- Is the count of MLP COVID cases the sum of fractional cases, or is it a count of cases with probability > 50%?
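For concreteness, the two counting conventions the comment distinguishes, applied to the same vector of scores:

```python
# Expected count (sum of fractional probabilities) versus thresholded
# count (cases with p > 0.5), on illustrative scores.
import numpy as np

p_covid = np.array([0.9, 0.6, 0.4, 0.1])  # illustrative MLP scores
print(p_covid.sum())                      # expected (fractional) count: 2.0
print((p_covid > 0.5).sum())              # thresholded count: 2
```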
- As we previously brainstormed: "I have date of vaccination and date of death so it will be interesting to look at the distribution of time elapsed for COVID and non-COVID cases." This was with regard to questioning whether an EPR COVID Dx remained in the system after the person had already recovered. Another incentive to look at that is these newfound outlier cases among the non-COVID Dx. If something is really funny about their timing relative to vaccination or death, that may reveal an issue.
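A sketch of that timing check (column names assumed, toy dates for illustration):

```python
# Distribution of days from vaccination to death, split by diagnosis
# group, as proposed in the comment. Column names are assumptions.
import pandas as pd

deaths = pd.DataFrame({
    "date_of_vaccination": pd.to_datetime(["2021-02-01", "2021-03-15"]),
    "date_of_death": pd.to_datetime(["2021-04-01", "2021-03-20"]),
    "probable_covid": [1, 0],
})
deaths["days_vax_to_death"] = (
    deaths["date_of_death"] - deaths["date_of_vaccination"]
).dt.days
print(deaths.groupby("probable_covid")["days_vax_to_death"].describe())
```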