I utilise data from an unknown NHS Trust to develop a staged multivariate logistic regression model for the prediction of acute respiratory conditions.
Thank you very much for this article series!!
Thanks! I think I'm done with this type of modelling for the time being - I'm flipping to survival analysis next to see what that brings.
Looking forward to that!
1. "Nine independent variables will generate 36 two-way interactions"
[Are the terms independent and interaction oxymoronic?]
2. How can an OR of 1.00 have a low p-value? What does that mean?
3. How can these p-values be explained?:
ChronicResp OR = 0.82 p = 0.244
ChronicResp by ProbCovid OR = 0.01 p = 0.000
Shouldn't interaction terms have higher p-values due to smaller samples and thus lower confidence?
4. Budesonide inhaler was reported early on to be effective for COVID. Not sure how that claim has aged, but I suspect it has aged very well. The same goes for nasal sprays, and nasal and mouth rinses. I don't understand why ChronicResp alone has OR = 0.82 whereas the interaction of ChronicResp by ProbCovid would take it down to 0.01. I would think everyone with chronic respiratory disease would be on fairly similar amounts of regular steroid inhaler, so maybe that can't explain it.
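Two of the points above can be checked with a quick standard-library sketch. The coefficient and standard error used for point 2 are made-up illustrative numbers, not figures from the article's model:

```python
import math

# Point 1: nine predictors really do give 36 distinct two-way
# interactions, i.e. the number of unordered pairs C(9, 2).
pairs = math.comb(9, 2)
print(pairs)  # 36

# Point 2 (toy numbers): an OR printed as 1.00 can still carry a tiny
# p-value when it is estimated very precisely. Suppose the fitted
# log-odds coefficient is b = 0.004 with standard error se = 0.001.
b, se = 0.004, 0.001
odds_ratio = math.exp(b)               # 1.004..., displayed as 1.00 at 2 d.p.
z = b / se                             # Wald z-statistic = 4.0
p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
print(f"OR = {odds_ratio:.2f}, p = {p:.5f}")  # OR = 1.00, p = 0.00006
```

So a two-decimal "1.00" and a p-value below 0.001 are perfectly compatible: the interval is tight but does not sit exactly on unity.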
1. They're called 'independent' variables by convention (though I prefer 'explanatory' or 'predictor' variables): https://en.wikipedia.org/wiki/Dependent_and_independent_variables. 'Independent' refers to their role as model inputs rather than to statistical independence from one another, so pairing the term with 'interaction' is not oxymoronic.
2. An OR displayed as 1.00 is a rounded figure. With a large enough sample, an OR of, say, 1.004 prints as 1.00 yet can be estimated so precisely that its confidence interval excludes exact unity, hence the low p-value: statistically significant, but practically no modification of risk.
3. ChronicResp as a pooled main effect is not statistically significant (a surprising finding that needs checking). Everything depends on how the sample sizes break down in relation to the dependent variable. Ideally I'd crosstabulate everything to see what sample sizes we are dealing with and whether ProbCOVID is throwing a spanner in the works.
4. Yes, it doesn't make sense. I'll need to pull things apart to see why the main effect failed when the evidence suggests otherwise. It could easily be a typo, or I might have looked at the wrong line of output.
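The crosstabulation suggested in reply 3 can be sketched in a few lines of standard-library Python. The records below are invented stand-ins; the real rows would come from the Trust extract:

```python
from collections import Counter

# Invented patient records: (ChronResp, ProbCOVID, AcuteResp) flags.
records = [
    (1, 1, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1),
    (0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1),
]

# Cell counts for every observed combination of the three flags.
# Tiny cells are exactly what can destabilise an interaction term.
cells = Counter(records)
for combo in sorted(cells):
    print(combo, cells[combo])
```

With the real data, any cell falling to a handful of cases would be a prime suspect for the flipping significance.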
Just ran a stripped-down model with only three terms and it makes more sense:
ChronResp OR = 2.10, p<0.001; ProbCOVID OR = 51.00, p<0.001; ChronResp*ProbCOVID OR = 0.241, p<0.001.
Cell-wise n drops to 154 for ChronResp with COVID among those without acute respiratory conditions. Small, yes, but that shouldn't throw the model. I'll need to start from scratch to see at what stage ChronResp flips to an unexpected non-significance.
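One way to read the stripped-down model quoted above: the interaction OR multiplies the main effect, so for probable-COVID patients the effective ChronResp odds ratio is the product of the two.

```python
import math

# ORs from the stripped-down model quoted above.
or_chronresp = 2.10      # main effect of ChronResp
or_interaction = 0.241   # ChronResp*ProbCOVID interaction

# For probable-COVID patients the ChronResp effect is the product,
# i.e. exp(b1 + b3) on the log-odds scale.
or_combined = math.exp(math.log(or_chronresp) + math.log(or_interaction))
print(f"{or_combined:.2f}")  # 0.51
```

That is, within the probable-COVID group, chronic respiratory disease halves rather than doubles the odds of an acute outcome, which is the pattern being puzzled over.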
I have found the culprit! The interaction term Dxtotal*PROD has bent the covariance matrix out of shape. Without it I get a very similar model, but with ChronResp fetching up at OR = 2.15, p<0.001. ChronResp*ProbCOVID yields OR = 0.01, p<0.001 as before. I'll revise the article with the new model, but the conclusions will be virtually the same apart from the ChronResp result. So... ChronResp isn't a good place to be for risk of acute conditions in general, but use of inhalers saves your bacon if you contract severe COVID. Sure makes more sense!
Article updated with the new model and blurb. Many thanks for pushing me toward a tasty revision!
JD, I've followed your series of statistical analyses of the efficacy of the vaccines with the a priori (biased) assumption that the vaccines (on the balance of probability) might improve the chance of survival given a Covid infection. Now the empirical evidence: Mrs N and I were "lucky" enough to have the full vaccination program. We have no co-morbid risk, but in our late 70s we avoid any obvious health risk. We did indeed change our lifestyle to avoid infection as much as we could. So, what do we observe given 7 doses to date (mainly Pfizer and one Moderna)? After 5 jabs (including the one Moderna), we both contracted Covid (flow test and PCR detected; we were also enrolled in the ONS survey). Both of us were hit quite hard by the virus: I did not recover full lung function for 6 months. We are a peculiar data item: identical lifestyle, identical contacts, identical vaccination program, yet we both contracted Covid at the same time (February 2023). Do the vaccines work? No. Might we have been hospitalised without the jabs? I doubt it! Did they do us harm? Well, my arm still hurts from the latest jab 3 days ago!
I'm sorry to hear that, Norman. I've just had a text from my aunt with the same story - boosted as far as she can go and now suffering a second bout of flu-like illness despite being clear of such for many years.
Thank you
I would like to add my thanks for creating this very detailed analysis series. It really has served to illustrate the bear traps created by poor data with biases.
I have a couple of general questions:
1) Based on your experience, how confident are you in the main findings of the study?
2) If you were to apply a similar analysis of observational data to 'known' non-experimental medical treatments (hopefully where we have good RCT data), what kind of results would emerge? I.e. how good can a retrospective observational trial be, even when you try to correct for biases?
Showing folk the holes by falling down them myself is part of my cunning plan!
1. Given the sample size, the provenance of the data, and the stability of the umpteen models readers have not seen (I publish less than 1% of my research output), I'm highly confident that the jabs haven't reduced severity of symptoms, if we define this as onset of acute respiratory conditions. In a future lengthy series you'll get to see the results of work I've done on risk of hospitalisation for cases coming through the 'front door', as well as the likelihood of high-dependency care, which will confirm matters for a large sample of those who managed to stay alive.
2. Now that's a pudding that needs to be eaten before we can pass judgement!
Thanks John.
1) That's good science!
John Campbell did a great series of interviews with immunologist Prof Clancy - your finding accords with his expectations given the biological mechanisms of immunology.
I look forward to your further adventures in medical data...
2) In the past I never really considered the robustness of the medical data on which interventions and treatments are based. Experience of the last few years has made me extremely wary. Treatments and interventions such as statins, flu jabs and various screening tests spring to mind.