Primary Clinical Outcomes For A Single Emergency Department 2017 - 2021 (part 3)
SMLR modelling of 1.9 million admissions records for the emergency departments of an undisclosed NHS Trust for the period Jan 2017 – Sep 2021: predicting hospitalisation
I have to admit that part 2 of this series was a humdinger of a brain mangler. I normally would have provided a pithy bullet point summary at the outset (as I do for all logistic regression work) but I ran foul of the length limit for email distribution and had to cut the pastry short. Short crust, if you will. And there goes the first groan of the day!
What I shall do, therefore, is try something skimpy in this paragraph and say that part 2 was all about predicting which admissions to the ED got treated there and which did not. Age came into it, as did sex, mode of arrival, total diagnoses made and nine indicator variables identifying cardiac, cancer, respiratory, infective, CNS, endocrine, blood disorders/CVA, GI diseases and physical injury, along with many interactions between these factors.
Prediction using logistic regression in this manner yields a probability score for each individual, which we can then aggregate to see if scores were raised during the COVID era compared to the same time frame in the previous year. It would have been lovely if the answer were a plain ‘yes’ or ‘no’, but we discovered the situation on the ground was more complicated than this: compared with the previous year, the less sick cases were more likely to be treated within the ED during the COVID era, and the very sick folk were less likely to be. We could call this medicine in reverse, and to me it suggests that senior management and/or protocols were interfering with emergency medicine, and not in a good way.
Today I am going to repeat the entire process but this time we are going to look at the risk of hospitalisation for ED admissions during the COVID era compared to the same time frame for the previous year. Strap yourselves in…
Raw Ingredients
Let’s remind ourselves of the raw ingredients in terms of summary statistics for the entire sample of 1,530,522 adult admissions over the period January 2017 – September 2021:
As before the mean of any 0,1 binary indicator variable provides a sample estimate of the proportion of cases. Thus we see that adult females slightly outpace males over this period with a proportion of 0.52 (52%). We also see that the leading condition was physical injury (including intoxication and poisoning) at 0.36 (36%). The mean admission age remains low at 49.31 years: front door medicine is for youngsters doing young things and coming a cropper. The arrival mode figure of 0.29 (29%) refers to the proportion arriving by emergency ambulance as opposed to folk making their own way. Treatment Status – the dependent variable for part 2 – now serves as an independent variable in the prediction of hospitalisation, and we note that 0.59 (59%) of admissions received treatment of some kind in the department.
The A&E Disposal route proportion of 0.22 (22%) refers to those who were hospitalised instead of being discharged home or discharged with a referral to another service provider (e.g. physiotherapy). Thus we discover that only around 1 in 5 admissions over the study period required hospitalisation, this being the dependent variable I shall now attempt to model.
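The point about the mean of a 0,1 indicator being a proportion can be sketched in a few lines of Python. The field names (`female`, `hospitalised`) and the five toy records are hypothetical and not drawn from the Trust's data.

```python
from statistics import mean

def indicator_proportions(records, fields):
    """Mean of each 0/1 indicator column, i.e. the sample proportion of cases."""
    return {f: mean(r[f] for r in records) for f in fields}

# Hypothetical toy records; field names are illustrative only
records = [
    {"female": 1, "hospitalised": 0},
    {"female": 0, "hospitalised": 1},
    {"female": 1, "hospitalised": 0},
    {"female": 1, "hospitalised": 0},
    {"female": 0, "hospitalised": 0},
]

props = indicator_proportions(records, ["female", "hospitalised"])
print(props)  # {'female': 0.6, 'hospitalised': 0.2}
# A proportion p corresponds to roughly "1 in 1/p" cases
print(f"about 1 in {round(1 / props['hospitalised'])}")  # about 1 in 5
```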
All in all we’ve got 14 independent variables and 1 dependent variable in that list. When submitted as such in a logistic regression these are known as ‘main effects’. We can procure a more sophisticated model by looking at two-way, three-way, four-way, five-way and higher-order interactions, but these don’t always make for a better predictive model. Over-parameterisation is always a danger, as is the risk of analysing a sparse cross-classification full of empty cells or cells with just one or two cases. Over the years I’ve stuck with two-way and occasionally three-way structures to good effect in the field of medical science and shall do the same here.
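Mechanically, a two-way interaction is just the product of two predictors entered as an extra column. Here is a minimal Python sketch with hypothetical predictor names. It assumes nothing about how the original software built its terms, and the optional `allowed` set is my own device for dropping redundant pairs.

```python
from itertools import combinations
from math import comb

def two_way_terms(row, names, allowed=None):
    """Build x_a * x_b product terms for pairs of predictors.

    `allowed` is an optional set of (a, b) pairs to keep; this is where
    redundant pairs (e.g. two conditions rarely coded together) would be
    dropped. With allowed=None every pair is generated.
    """
    terms = {}
    for a, b in combinations(names, 2):
        if allowed is None or (a, b) in allowed:
            terms[f"{a}:{b}"] = row[a] * row[b]
    return terms

# 14 predictors give comb(14, 2) = 91 candidate two-way interactions
names = [f"x{i}" for i in range(1, 15)]  # hypothetical placeholder names
print(comb(len(names), 2))  # 91

# Usage with three hypothetical predictors
row = {"age": 49, "female": 1, "ambulance": 0}
print(two_way_terms(row, ["age", "female", "ambulance"]))
# {'age:female': 49, 'age:ambulance': 0, 'female:ambulance': 0}
```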
We may note that 14 independent variables give rise to a possible 91 two-way interactions (14 × 13 / 2), but in reality many of these are redundant because ED staff tend to code just one primary condition per admission. After due consideration (and the obligatory beard stroking) I settled upon 56 two-way interactions. As before the model was run using the pre-pandemic sample of 1,082,962 adult admissions to arrive at unbiased scores. By ‘unbiased’ I mean scores generated from the ED doing its thing under ‘normal’ conditions between January 2017 and February 2020. A score in this sense could also be called a propensity, probability, risk or likelihood of an individual being hospitalised given their personal circumstances, with values ranging from 0 (no chance of hospitalisation) to 1 (guaranteed hospitalisation).
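The "fit on pre-pandemic data, then score everyone" idea can be sketched with a bare-bones logistic regression fitted by gradient ascent on the log-likelihood. This is a swapped-in toy fitter, not the SMLR software used for the analysis. The record layout (`period`, `x`, `y`) and the data are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, x):
    """Predicted probability for feature vector x; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Maximum-likelihood logistic regression via batch gradient ascent."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - score(w, xi)  # residual on the probability scale
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

# Hypothetical records: one 0/1 feature, a period label and the outcome
records = [
    {"period": "pre", "x": [0], "y": 0},
    {"period": "pre", "x": [0], "y": 0},
    {"period": "pre", "x": [0], "y": 1},
    {"period": "pre", "x": [1], "y": 0},
    {"period": "pre", "x": [1], "y": 1},
    {"period": "pre", "x": [1], "y": 1},
    {"period": "covid", "x": [0], "y": 0},
    {"period": "covid", "x": [1], "y": 1},
]

# Fit on the pre-pandemic rows only...
pre = [r for r in records if r["period"] == "pre"]
w = fit_logistic([r["x"] for r in pre], [r["y"] for r in pre])
# ...then score every record, COVID era included, with those fixed weights
scores = [score(w, r["x"]) for r in records]
```

Fitting only on the pre-pandemic rows is what keeps the scores ‘unbiased’ in the sense used above. The weights encode normal-times behaviour and are then applied unchanged to the COVID-era admissions.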
Risk Of Hospitalisation
Risk Score
I guess I’d better start with a summary table of all the scores generated over the period January 2017 – September 2021 so we can get a feel for the distribution within the sample:
We observe a mean of 0.213 (21.3% probability of hospitalisation), which may be favourably compared with the 20.1% declared in the second table presented in part 1: no funny business here!
Again I’d like to draw your attention to the distribution of score values for the 2019 sample period, which, as we might well imagine, is complex:
A double-humper with a monster tail of high risk cases. Lots of low risk admissions (sprains, bruises, things in eye) then a burst of might-possibly-need-a-bed folk (injured limbs, back etc).
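Since the plot itself didn't survive, here's a hedged sketch of how one might bin the scores to see humps and a tail, using equal-width bins over 0 to 1. The bin width of 0.1 is an arbitrary choice, and the demo scores are made up.

```python
def score_histogram(scores, width=0.1):
    """Count scores in equal-width bins over [0, 1]; a score of 1.0 goes in the last bin."""
    nbins = round(1 / width)
    counts = [0] * nbins
    for s in scores:
        counts[min(int(s / width), nbins - 1)] += 1
    return counts

def text_histogram(counts, width=0.1):
    """Crude text rendering: one row of '#' per bin."""
    for i, c in enumerate(counts):
        print(f"{i * width:.2f}-{(i + 1) * width:.2f} {'#' * c}")

# Hypothetical scores, not the Trust's
demo = [0.02, 0.03, 0.12, 0.51, 0.97, 1.0]
counts = score_histogram(demo)
print(counts)  # [2, 1, 0, 0, 0, 1, 0, 0, 0, 2]
text_histogram(counts)
```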
Geek-nerds will be keen on that ROC stuff, so here it is:
I must confess that I’m pretty chuffed with this! Not a bad bake even if I say so myself.
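The ROC curve itself didn't make it into this text, but the area under it has a handy interpretation: it is the probability that a randomly chosen hospitalised admission scores higher than a randomly chosen non-hospitalised one. Here is a minimal Python sketch via the Mann-Whitney rank formulation (ties count half), with made-up scores.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels are 1 (hospitalised) or 0; tied scores get their average rank,
    which counts each tied positive/negative pair as one half.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

# Made-up scores and outcomes
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```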
So that’s the risk of hospitalisation score in basic terms; and now we ought to put such a good score to mighty good use…
Comparison of Periods: Crude Cross-tabulation
We start by doing what most analysts will do and that is to cross-tabulate the study/control phases against the dependent variable:
At a superficial level (and this is very superficial) we might conclude that the risk of hospitalisation for the control period, at 24.1%, was marginally higher than that for the study period, at 23.8%. Owing to the huge sample size (n=466,226) we might expect this pathetically small difference to be off-the-scale significant. Except it isn’t quite off the scale, with a Fisher Exact Test for a 2 x 2 table yielding p=0.016. In odds terms we may calculate a risk factor of 1.013 for hospitalisation during 2019 compared with 2020 and realise this isn’t anything to write home about. We’ve been here before!
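For a 2 x 2 table of period by outcome, the Fisher exact test and the two ‘relative’ summaries can be computed from scratch. Working with log-gamma keeps counts in the hundreds of thousands from overflowing. This is a sketch, not a reproduction of the software used, and the toy counts are made up. Note that the odds ratio and the risk ratio are different quantities and only agree closely when the outcome is rare.

```python
from math import exp, lgamma

def log_comb(n, k):
    """log of n choose k via log-gamma, safe for very large n."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    log_denom = log_comb(n, c1)

    def log_p(x):
        return log_comb(r1, x) + log_comb(r2, c1 - x) - log_denom

    observed = log_p(a)
    tol = 1e-7  # small slack so tables tied with the observed one are counted
    total = 0.0
    for x in range(max(0, c1 - r2), min(r1, c1) + 1):
        lp = log_p(x)
        if lp <= observed + tol:
            total += exp(lp)
    return min(1.0, total)

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

def risk_ratio(a, b, c, d):
    return (a / (a + b)) / (c / (c + d))

# Rows: period (control, study); columns: (hospitalised, not hospitalised).
# Toy counts only; these are NOT the Trust's figures.
table = (8, 2, 1, 5)
print(round(fisher_exact_two_sided(*table), 4))  # 0.035
print(odds_ratio(*table), risk_ratio(*table))
```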
Comparison of Periods: Mean Risk Score
So… the rate of hospitalisation didn’t change much between control and study periods but how about the mean risk score for hospitalisation? Try this…
Well, it looks like a marginally higher risk score for the study period, and if I run a nonparametric test for medians across groups I get a p-value off the scale as expected (p<0.001) owing to the huge sample size. In odds terms the ratio for median values is only 1.042, so again it’s nothing to write home about. What I had better do now is finish up by settling the score.
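The text doesn't name the nonparametric test used. One common choice for comparing medians is Mood's median test, sketched below in Python purely as an assumption. The score samples are hypothetical, and the plain ratio of medians is shown alongside for comparison.

```python
from math import erfc, sqrt
from statistics import median

def mood_median_test(x, y):
    """Mood's median test for two groups.

    Counts values above the pooled median in each group and applies a
    2 x 2 chi-square test (1 df, no continuity correction).
    """
    grand = median(list(x) + list(y))
    a = sum(v > grand for v in x)
    b = len(x) - a
    c = sum(v > grand for v in y)
    d = len(y) - c
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / den if den else 0.0
    p = erfc(sqrt(chi2 / 2))  # upper-tail probability of chi-square with 1 df
    return chi2, p

# Hypothetical score samples for two periods
control = [0.10, 0.12, 0.20, 0.35, 0.60]
study = [0.11, 0.14, 0.22, 0.36, 0.62]
chi2, p = mood_median_test(control, study)
ratio = median(study) / median(control)
```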
Settling The Score
Once again we must settle the score by running a logistic regression for the prediction of hospitalisation using both the period indicator (control vs. COVID) and the risk score. As before we are using a hunk of bread (pre-pandemic risk of hospitalisation score) to mop up the gravy (variance inherent in the dependent variable) in order to see the peas (truth of the situation). This may sound complicated but all we are doing is accounting for the predisposition of ED admissions to be hospitalised given their age, sex, mode of arrival and condition before we say anything about control vs. COVID period observed rates of hospitalisation. Herewith the final mini-model:
Ignore the ultra massive black hole that is the OR of 611.09 for the hospitalisation score, for this is simply doing a grand job of levelling the playing field across samples. It looks so monstrous because an OR for a continuous predictor refers to a one-unit change, and a one-unit change in this score spans its entire range from 0 to 1. Geeks might like to think of this in terms of propensity matching by modulating variance space. Others might like to do something more sensible like munching on a biscuit whilst waiting for my final conclusion.
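To put that OR on a more digestible scale, take its log to recover the logistic coefficient and rescale it to a 0.1 step in the score. This assumes the usual coding, in which, with a score × period interaction present, the main-effect OR applies to the reference (control) period.

```python
import math

or_per_unit = 611.09  # OR reported for the hospitalisation score
beta = math.log(or_per_unit)  # logistic coefficient for a one-unit (0 to 1) change
or_per_tenth = math.exp(0.1 * beta)  # equivalently or_per_unit ** 0.1
print(round(beta, 3), round(or_per_tenth, 3))  # 6.415 1.899
```

So each 0.1 rise in the score multiplies the odds of hospitalisation by roughly 1.9, which sounds far less alarming than 611.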
Again we have an interaction between risk score and the period indicator that makes it tricky to understand what is going on, so I shall resort to a colourful slide to put us all in the picture:
Thar she blows! There’s not much in it, innit? If we take the control period of 2019/w11 – 2019/w49 as the reference then we observe a greater tendency not to hospitalise admissions during the COVID era if the risk score was on the low side. The crossover point is observed around a risk of 0.62 after which both curves are pretty much matched.
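With a score × period interaction, the control and COVID curves are logistic functions of the score whose linear predictors differ by b_covid + b_inter × score. They therefore cross exactly where that difference is zero. The coefficients below are hypothetical, chosen merely to produce a crossover; only the score OR of 611.09 comes from the text, and the fitted values are not reproduced here. In this specification the curves cross only once, so beyond the crossover the gap reverses sign, though it may well be too small to matter.

```python
import math

def p_hosp(score, covid, b0, b_score, b_covid, b_inter):
    """Predicted probability of hospitalisation from the linear predictor
    b0 + b_score*score + b_covid*covid + b_inter*score*covid (covid is 0 or 1)."""
    z = b0 + b_score * score + b_covid * covid + b_inter * score * covid
    return 1.0 / (1.0 + math.exp(-z))

def crossover(b_covid, b_inter):
    """Score at which the two period curves meet: b_covid + b_inter*score = 0."""
    return -b_covid / b_inter

# HYPOTHETICAL coefficients for illustration only; b_score uses ln(611.09)
coefs = dict(b0=-3.0, b_score=math.log(611.09), b_covid=-0.3, b_inter=0.6)

s_star = crossover(coefs["b_covid"], coefs["b_inter"])
print(s_star)  # 0.5
for s in (0.1, 0.3, 0.5, 0.7, 0.9):
    control = p_hosp(s, 0, **coefs)
    covid = p_hosp(s, 1, **coefs)
    print(f"score {s:.1f}: control {control:.3f}  covid {covid:.3f}")
```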
What does this mean?
It means the ED medics of this NHS Trust were eminently sensible and only sent home those few low risk cases that could be sent home whilst they and the nation reeled under draconian measures better suited to a dystopian novel.
On The Stove
I rather like this modelling approach and it appears to be giving us sensible answers with real world meaning. What I’m inclined to do next is borrow the method and use it to compare the post-vaccine era of 2020/w50 – 2021/w37 within the sample against the control period of 2018/w50 – 2019/w37, with risk of treatment and risk of hospitalisation set as the primary clinical outcomes as before.
Since I have vaccination status data I might explore this as a factor once I’ve established the base models, but the problem here is that the vaccinated cohort is almost certainly going to differ from the unvaccinated cohort in ways I may not be able to account for.
Kettle On!