I utilise data from an unknown NHS Trust in the development of a staged multivariate logistic regression model in the prediction of acute respiratory conditions
Good work, John Dee! When you're done, I have 8 years of death certificate data you will definitely be interested to delve into.
Cheers boss. This sort of work is akin to constipation - I've been building up to a release for some time! I'm currently fiddling with ideas for modelling probabilistic assessment of vaccination status, this being another loophole for cultists. Wowie, that's some lump of data to render!
That is just brilliant work. The way you work the reader through the explanation is outstanding. It helps that my bias is it’s all about the vaccine harm. Can’t imagine your kettle is ever off.
Might have to share this, well the headline at least, with some of the mutton crew on Twitter that have significant cognitive bias at play. And of course they would never pay to receive access to actual facts.
Thank you kindly! The mutton crew will tell you that I wear the wrong colour socks.
I would be happier with that. Instead they ignore it, as it’s right over the target and irrefutable (if that’s a thing), and prefer the name-calling approach to silence dissent: “you have mental health problems”, “conspiracy theorist’s addiction”, “here comes the nutritionist”, “you’re on the dole, your website has no visitors”, etc. They certainly don’t like Substack as a medium to exchange views that don’t support the narrative.
The 'devil' is to be found in the detail once more.
I'm confused as to how acute respiratory diagnosis itself originates, what it is, and wonder if it is tied to any other unreliable variable?
Well there's another can of worms. I've brewed a shortlist of 33 ICD-10 codes centred around pneumonia, respiratory failure, ARDS and respiratory arrest, but to quote our most senior cardiac surgeon (who was grilling a registrar at the time), "nobody dies of respiratory failure these days!" Multicausal death is the reality upon which we try to lever simplistic notions in order to get our models to work.
Are covid PCR tests really out of the picture in this model? Like what if:
Vaccination --> No PCR --> AcuteResp dx instead of Covid dx
This would throw off ProbableCovid by Vaccination term, right?
How well does ProbableCovid predict PCR test (or I guess Covid dx as a proxy for PCR)?
Not completely out of the picture - have another look at the following article...
https://jdee.substack.com/p/the-pcr-test-as-a-predictor-of-acute-d9b
Just run a quick crosstabs - probable COVID assigned binary grouping (by MLP) yielded a diagnostic match of 16,690/19,457 (85.8%), which is handsome, but I preferred to use the probability scores in the logistic regression instead for added tang. All this assumes that vaccination status, as recorded in the EPR, is error-free and I doubt that it is. It's like wrestling a greased pig!
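For anyone wanting to check the arithmetic, here is a minimal sketch of the crosstab figure quoted above. Only the two counts (16,690 and 19,457) come from the comment; the probability scores, the observed diagnoses, the 0.5 cut-off and the `crosstab` helper are hypothetical and just show how a binary grouping might be derived from probability scores before tabulating.

```python
from collections import Counter

# Counts quoted in the comment: 16,690 matches out of 19,457 cases.
matches, total = 16_690, 19_457
match_rate = matches / total
print(f"diagnostic match: {match_rate:.1%}")

def crosstab(predicted, observed):
    """Count (predicted, observed) pairs, i.e. a 2x2 table held in a Counter."""
    return Counter(zip(predicted, observed))

# Hypothetical probability scores and observed diagnoses, for illustration only.
scores = [0.91, 0.55, 0.48, 0.12, 0.73, 0.30]
observed = [1, 1, 1, 0, 0, 0]

cutoff = 0.5  # assumed cut-off; the thread does not state the one actually used
predicted = [int(s >= cutoff) for s in scores]

table = crosstab(predicted, observed)
agreement = (table[(1, 1)] + table[(0, 0)]) / len(observed)
print(dict(table), f"agreement = {agreement:.2f}")
```

Collapsing a score of 0.48 and a score of 0.12 into the same "0" group is the information that is kept when the probability scores go into the regression directly.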
Fascinating stuff!
Regarding the potential harms, please could you clarify whether age confounding is taken into account.
For example, vx take up was most likely higher in the older age groups in your study, and older age groups are more likely to become chronically ill?
Or does your heuristic probable covid propensity score take care of this confounding?
Apologies if this is a dumb question!
It sure is accounted for! Age sits right at the top of the coefficient table as a main effect. It also sits further down as an interaction with comorbidity, total diagnoses, disease prevalence and probable COVID Dx. You are also correct in inferring that probable COVID also takes account of age related biases, so it's a bit belt 'n' braces!
My gast is flabbered.
Thanks for the nice analyses! If you'll please permit some statistical nitpicking offered in the spirit of making the workflow as rigorous and robust as possible:
1. I was always taught it is inadvisable to drop a main effect (e.g. Vaccinated) while leaving in interactions that contain it. The interpretation of a main effect test in the presence of significant interactions is already tricky because we are averaging over heterogeneous estimates, and it seems better to just leave the main effects in the model in order to properly adjust all other effects.
2. When doing stepwise, are you looking at out-of-fold tests and adjusting for multiple testing? Concerned that you may be overfitting the data.
3. Gradient boosted tree methods like XGBoost gracefully find and accommodate higher order interactions automatically and efficiently and typically outperform polynomial regression models (often by a large margin) in honest out-of-sample predictive comparisons. If you are willing to share the data I'll happily run this and share back.
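Point 1 can be illustrated with a little arithmetic. This is a sketch under stated assumptions, not the model from the article: it assumes a logistic model with a Vaccinated term and a Vaccinated-by-Age interaction, and the coefficient values, the `age_origin` parameter and the function name are all hypothetical.

```python
import math

def vaccinated_odds_ratio(age, b_vax, b_vax_age, age_origin=0.0):
    """Odds ratio, vaccinated vs unvaccinated, at a given age (all else equal).

    Assumed model: logit(p) = ... + b_vax*V + b_vax_age*V*(age - age_origin)
    """
    return math.exp(b_vax + b_vax_age * (age - age_origin))

B_VAX_AGE = 0.01  # hypothetical interaction coefficient

# Main effect dropped (b_vax = 0): the odds ratio is pinned to exactly 1
# at whichever age happens to be coded as zero.
or_at_zero_raw = vaccinated_odds_ratio(0, 0.0, B_VAX_AGE)
or_at_zero_centred = vaccinated_odds_ratio(60, 0.0, B_VAX_AGE, age_origin=60)

# Main effect retained (hypothetical b_vax = 0.1): the odds ratio at the
# origin is free to take whatever value the data support.
or_with_main = vaccinated_odds_ratio(60, 0.1, B_VAX_AGE, age_origin=60)

print(or_at_zero_raw, or_at_zero_centred, round(or_with_main, 3))
```

Because the pinned age depends on how age is coded (raw years versus centred), a model without the main effect is not invariant to a harmless recoding of the covariate, which is one reason the usual advice is to keep it in.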
1. I've been in two minds over the handling of main effects which is why you'll see me force these in certain models, but this time round I let the automated procedure do its thing. I shall be developing the model in a number of ways (part 13 already written) and forcing main effects would be a nice touch, but all depends on what is coming out in the wash since their retention doesn't always make a difference.
2. I've been avoiding stepwise for these models and permit forward selection only. What you see is the final table but the work starts with exploration of subsets of main effects and one or two key interactions to get a feel of how things are hanging together. If there is stability then I'm happy to let the module rip, but if contradictory results are popping up I halt the process and try to figure what might be happening. Folk get to see less than 1% of the work I put in!
3. Now that sounds like it could save me a great deal of work! However, owing to confidentiality agreements I am not permitted to share the data.
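On the multiple-testing part of point 2, one common adjustment is Holm's step-down method. The sketch below is not something the thread says was used; the `holm_adjust` helper is hypothetical and the raw p-values are invented for illustration, apart from 0.548, which is borrowed from this thread purely as an example of a large p-value.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    # Visit the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Illustrative p-values for four candidate terms (hypothetical).
raw = [0.030, 0.001, 0.548, 0.012]
adjusted = holm_adjust(raw)
for p, q in zip(raw, adjusted):
    print(f"raw p = {p:.3f}  Holm-adjusted p = {q:.3f}")
```

An adjustment like this only covers the terms tested at the final step. P-values from a model arrived at by automated selection tend to be optimistic regardless, which is why out-of-fold checks are usually suggested alongside.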
I've just re-run the model for part 13 but forcing the vaccination main effect at stage 3 and allowing interactions to come to the party at stage 4. This popped out as OR = 1.12, p=0.548 and made a teeny weeny bit of difference to other coefficients but it's nice to know it's sitting pretty if nothing else!
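To put the forced main effect in context, the reported OR and p-value can be turned into an approximate confidence interval. This is a back-of-envelope sketch that assumes the p-value comes from a two-sided Wald z-test on the log-odds coefficient, which the thread does not state. Both inputs are rounded, so the result is only indicative.

```python
import math
from statistics import NormalDist

odds_ratio = 1.12  # reported in the comment above
p_value = 0.548    # reported in the comment above

std_normal = NormalDist()

beta = math.log(odds_ratio)                   # log-odds coefficient
z = std_normal.inv_cdf(1 - p_value / 2)       # |z| implied by a two-sided p-value
std_err = beta / z                            # implied standard error (Wald assumption)

z95 = std_normal.inv_cdf(0.975)               # roughly 1.96
ci_low = math.exp(beta - z95 * std_err)
ci_high = math.exp(beta + z95 * std_err)

print(f"beta = {beta:.3f}, SE = {std_err:.3f}")
print(f"approx. 95% CI for OR: {ci_low:.2f} to {ci_high:.2f}")
```

Under that assumption the interval comfortably straddles 1, consistent with the non-significant p-value.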