Reliability Of Vaccination Status in the EPR (part 2)
I utilise data from an unknown NHS Trust to determine the reliability of vaccination status as encoded in the electronic patient record (EPR).
In part 1 of this series I unleashed an attack on the EPR using machine learning once again, this time to establish the reliability of vaccination status flags. Identification of the NHS number for each and every hospital admission is far from straightforward: once upon a time I managed a dedicated team whose job was to fill the holes left by admissions clerks in order to link as many people as possible to national audits. Data holes are a fact of clinical life and the hallowed NHS ID is no exception. Holes, combined with transcription errors across separate patient management systems, are going to lead to incomplete data capture. In terms of the vaccination record this means a default coding of unvaccinated at death. Whoops indeed. This is not a good thing if you want to assess vaccine efficacy in an honest manner.
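The failure mode is easy to sketch. Assuming the linkage is essentially a left join on NHS number (my assumption; the column names and records below are invented, not the Trust's actual schema), any hole in the key silently becomes "unvaccinated":

```python
import pandas as pd

# Illustrative records only; names and values are invented.
# A death with a missing or mistyped NHS number cannot be matched
# to the national immunisation record, so the join comes back empty.
deaths = pd.DataFrame({"nhs_no": ["111", "222", None],
                       "patient": ["A", "B", "C"]})
nims = pd.DataFrame({"nhs_no": ["111"], "vaccinated": [1]})

linked = deaths.merge(nims, on="nhs_no", how="left")
# Unmatched rows get the default coding: unvaccinated.
linked["vaccinated"] = linked["vaccinated"].fillna(0).astype(int)
print(linked)
```

Patient C (no NHS number at all) ends up coded exactly the same as a genuinely unmatched patient: unvaccinated, whatever the truth may be.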
A predictive model was produced that performed exceptionally well, indicating the NHS ID hole could be as big as 11% of cases or thereabouts. However, the model was forged using the diagnosis for COVID status as given in the EPR, which we suspect to be inaccurate if not downright misleading. This morning, therefore, I decided to run the MLP model again but omitting this variable. What came back was most pleasing and somewhat soothing. Herewith the gubbins in one geeky dollop:
What we have here, in plain English, is a near-identical model in terms of classification performance and structure but without the worry of a misleading COVID diagnosis. Job's a good 'un! What we better look at next is the overall classification for the new model, and here it is:
Now this is rather interesting because that all-important false positive rate (the model predicting vaccination where the EPR records none) has jumped from 10.7% to 11.2%, whilst the false negative rate (the model predicting unvaccinated where the EPR flags vaccination) has dropped from 6.2% to 4.9%. Thus, by ignoring the mess that is COVID-19 Dx, we are pushing the model in a rather interesting direction, and so this is the model I shall rely on for the time being.
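Both rates can be checked on the back of an envelope from the summary counts. In the sketch below the unvaccinated-side counts (1,612 and 12,806) come from the summary figures; the vaccinated-side split (251 and 4,878) is my own inference from the quoted 4.9% rate and the 19,547 total, not a figure taken from the model output:

```python
# EPR-unvaccinated deaths (14,418) split by model prediction:
pred_vax_epr_unvax = 1_612      # model says vaccinated, EPR says not
pred_unvax_epr_unvax = 12_806   # both agree: unvaccinated

# EPR-vaccinated deaths (19,547 - 14,418 = 5,129); split inferred from 4.9%:
pred_unvax_epr_vax = 251        # model says unvaccinated, EPR says vaccinated
pred_vax_epr_vax = 4_878        # both agree: vaccinated

fp_rate = pred_vax_epr_unvax / (pred_vax_epr_unvax + pred_unvax_epr_unvax)
fn_rate = pred_unvax_epr_vax / (pred_unvax_epr_vax + pred_vax_epr_vax)
print(f"false positive rate: {fp_rate:.1%}")  # 11.2%
print(f"false negative rate: {fn_rate:.1%}")  # 4.9%
```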
Difference Over Time
We now have a weekly count of vaccinated in-hospital deaths according to a matching process between national and local databases and a weekly count of vaccinated in-hospital deaths according to a predictive model. Let us then derive a weekly excess (EPR count minus MLP count) and see how this pans out over time:
Well that’s torn it! Instead of a random process bobbing around the zero axis we have a flip-up followed by a massive drop, followed by a steady state of around 40 missing vaccinated cases per week. This points to systematic bias within the EPR rather than a bit of model jitter, and make no mistake!
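The excess series itself is trivial to derive. In this toy version the weekly counts are invented; only the two excess figures quoted later in the post (+27 for 2021/w4 and −80 for 2021/w5) reflect the actual analysis:

```python
# Invented weekly counts of vaccinated in-hospital deaths; only the
# resulting excesses for 2021/w4 and 2021/w5 match the real analysis.
epr_count = {"2021/w4": 95, "2021/w5": 40}   # EPR/NIMS-matched count
mlp_count = {"2021/w4": 68, "2021/w5": 120}  # model-predicted count

# Weekly excess = EPR count minus MLP count
excess = {wk: epr_count[wk] - mlp_count[wk] for wk in epr_count}
print(excess)  # {'2021/w4': 27, '2021/w5': -80}
```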
That flip-up peaking during 2021/w3 indicates a sudden surge in vaccinated deaths over and above that predicted by the model, and can be construed as evidence of vaccine harm (all things being equal). Either that or NHS/NIMS database managers enjoyed a temporary period of NHS ID matching brilliance that they couldn’t quite achieve again for some mysterious reason. This doesn’t wash so I’m putting my money on that flip-blip pointing to excess vaccinated death and I am sure Occam is going to place his money on that horse also.
So how about that cliff face? During 2021/w4 we still have a positive excess running at +27 vaccinated deaths; one week later we are into a negative excess of -80 deaths. Did the database systems suddenly fail in NHS ID matching or did somebody decide not to go chasing vaccinated deaths with as much vim and vigour? Cliff faces like this are usually the result of a policy change, so my money is on a centralised decision not to chase down NHS ID holes that could turn out to be embarrassing to the government and its vaccination programme.
Consistency in data capture failure over time is what I would expect and this is what we observe over much of the period (there’s always a residual core of hospital admissions you cannot nail on NHS ID no matter how hard you try). Why, then, is that residual rate showing signs of decline toward the end of the period? Did the most fastidious admissions clerk in the world decide to change jobs or are we looking at another political decision point by senior management? Or something else maybe?
Funny Business
Funny business, ain’t it? COVID cases that are probably not COVID, non-COVID cases that are probably COVID, and unvaccinated deaths that were probably vaccinated. It’s enough to make data analysts feel seasick! So here’s an idea… how about I junk the COVID-19 Dx and Vaccinated binary indicators derived from the EPR and use the predictions derived from my modelling in a big, fat model for acute respiratory conditions? What would we see then in terms of vaccine efficacy, I wonder?
Sounds like an excuse for a really big cake!
Summary
Machine learning (Multilayer Perceptron) was employed in the prediction of vaccination status for 19,547 in-hospital deaths for the period 2020/w11-2021/w6 whilst omitting data regarding COVID status.
Exceptional levels of classification performance were achieved, with an overall true positive classification rate of 95.1% (EPR-vaccinated deaths correctly classified) and an overall true negative classification rate of 88.8% (EPR-unvaccinated deaths correctly classified) when compared with declarations of vaccination status made in the EPR.
The overall false positive rate (predicting vaccination in the absence of an EPR flag) was estimated at 11.2% (1,612/14,418), which suggests that linking local Trust records to the national immunisation database - a process critically dependent on NHS number identification and verification - may have failed in around 11% of cases.
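For anyone wanting to replicate the refit described above, it amounts to nothing more exotic than dropping one column and retraining. A toy sketch with invented data follows (the original toolchain, features, and hyperparameters are not specified in the post, so everything here is an assumption for illustration):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the admission extract; columns are invented
# (e.g. age, sex, admission week, frailty score, COVID-19 Dx flag).
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=2000) > 0).astype(int)

COVID_DX = 4                          # hypothetical index of the suspect flag
X_no_dx = np.delete(X, COVID_DX, axis=1)  # refit without COVID-19 Dx

X_tr, X_te, y_tr, y_te = train_test_split(X_no_dx, y, random_state=0)
mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000, random_state=0)
mlp.fit(X_tr, y_tr)
print(f"hold-out accuracy without COVID-19 Dx: {mlp.score(X_te, y_te):.3f}")
```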
Kettle On!
I can't help but notice the curve looks like an inverted vaccine rollout curve. Or it also looks like an inverted temporal mortality risk curve, with the first hump being like an inverted healthy vaccinee effect. What would it look like if the most important variable - vaccine uptake - were removed?
You are the right man to be doing this vital work. I used to trust my NHS stats, but not any more. Are you planning to publish any of this at some stage - it's an object lesson in poor record keeping, isn't it?
I remember as a young NHS management trainee (at the Countess of Chester Hospital oddly enough) trying to count beds, and everyone thinks it's easy because, well, a bed is a bed is a bed: it's actually fiendish!