Date: 2023-07-31

It would appear from part 2 of this series that what the PCR test has gone and done is paint a mirage on the canvas of respiratory illness, with two massive spikes of seeming COVID death that fail to translate into acute respiratory conditions. But how does my newly forged catch-all indicator variable - DDx_VIBAC - square with the acute respiratory condition flag (DDx_Acute)? Time for another cross-tabulation of those 19,457 in-hospital deaths:

Now this is interesting because I would have bet money on a stronger relationship. We observe that 2,216/3,898 (56.8%) of cases with an alleged respiratory pathogen of some sort were also diagnosed with an acute respiratory condition such as pneumonia, ARDS or respiratory arrest. Ideally we would take the remaining 43.2% (1,682 cases with a pathogen but no acute respiratory condition) as a measure of the resilience of the human body supported by advances in modern medicine, but we can’t rule out false positives at this stage. The table also reveals that 1,948/15,559 (12.5%) of deaths with no infectious agent nevertheless carried an acute respiratory condition, these invariably being folk whose chronic disease state has deteriorated.

I guess we can be pretty sure about those 2,216 infected folk with an acute respiratory condition and those 13,611 non-infected folk not suffering an acute respiratory condition, which together make up 15,827/19,457 (81.3%) of the data sample, a portion we can have some sort of confidence in. That’s not bad going!
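For anyone wanting to check the arithmetic, here is a minimal Python sketch that rebuilds the four cells of the cross-tabulation from the counts quoted above. The variable names are mine, purely for illustration, and the two "not acute" cells are derived by subtraction rather than read off the original table.

```python
# Counts quoted in the text for the 19,457 in-hospital deaths.
total_deaths = 19_457
infected = 3_898              # DDx_VIBAC = 1
infected_acute = 2_216        # DDx_VIBAC = 1 and DDx_Acute = 1
not_infected = 15_559         # DDx_VIBAC = 0
not_infected_acute = 1_948    # DDx_VIBAC = 0 and DDx_Acute = 1

# Remaining cells follow by subtraction from the row totals.
infected_not_acute = infected - infected_acute
not_infected_not_acute = not_infected - not_infected_acute


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Cases where the two flags agree (both present or both absent).
concordant = infected_acute + not_infected_not_acute

print("Infected, no acute condition:    ", infected_not_acute)
print("Not infected, no acute condition:", not_infected_not_acute)
print("Infected with acute condition %: ", pct(infected_acute, infected))
print("Non-infected with acute cond. %: ", pct(not_infected_acute, not_infected))
print("Concordant share of sample %:    ", pct(concordant, total_deaths))
```

The row totals add back up to the 19,457 deaths, which is a handy sanity check that the quoted figures hang together.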

Battle Of The Giants

Subsets are one way of going about unravelling this diagnostic tangle; another is to model the diagnosis of interest and then use the predicted values in place of the observed values, a technique that is related to propensity score methods. That is to say, we use a fancy multivariate model to formulate the propensity for a COVID diagnosis to be given to an individual patient. This should iron out some of the sample bias and push a positive test result toward something akin to a valid test for infection. That’s the theory in any case, and I’ll now have a stab at this.
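To make the idea concrete, here is a toy sketch in plain Python of swapping an observed yes/no diagnosis for a modelled propensity. Everything in it is an assumption for illustration: the patients are synthetic, the two predictors (`age_scaled`, `acute_flag`) and their effect sizes are made up, and the model is a bare-bones logistic regression fitted by gradient descent rather than anything used on the real data.

```python
import math
import random

random.seed(42)  # fixed seed so the synthetic example is repeatable


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_patient():
    """One synthetic patient: ([intercept, age_scaled, acute_flag], diagnosis)."""
    age_scaled = random.uniform(-1.0, 1.0)               # hypothetical predictor
    acute_flag = 1.0 if random.random() < 0.3 else 0.0   # hypothetical predictor
    true_p = sigmoid(-1.5 + 1.0 * age_scaled + 2.0 * acute_flag)
    diagnosis = 1 if random.random() < true_p else 0
    return [1.0, age_scaled, acute_flag], diagnosis


def predict(weights, row):
    """Modelled probability (propensity) of a positive diagnosis."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)))


def fit_logistic(rows, labels, learning_rate=1.0, epochs=300):
    """Fit logistic regression by full-batch gradient descent on the log-loss."""
    weights = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for row, label in zip(rows, labels):
            error = predict(weights, row) - label
            for j, x in enumerate(row):
                gradient[j] += error * x
        weights = [w - learning_rate * g / n for w, g in zip(weights, gradient)]
    return weights


patients = [make_patient() for _ in range(1000)]
rows = [row for row, _ in patients]
observed = [dx for _, dx in patients]

weights = fit_logistic(rows, observed)

# The propensity replaces the observed 0/1 diagnosis in later analysis.
propensity = [predict(weights, row) for row in rows]

n_pos = sum(observed)
mean_pos = sum(p for p, dx in zip(propensity, observed) if dx == 1) / n_pos
mean_neg = sum(p for p, dx in zip(propensity, observed) if dx == 0) / (len(observed) - n_pos)
print("Mean propensity, diagnosed:    ", round(mean_pos, 3))
print("Mean propensity, not diagnosed:", round(mean_neg, 3))
```

The point to notice is that every patient ends up with a score between 0 and 1 rather than a hard yes or no, so a patient flagged positive who looks nothing like the typical positive case gets a low propensity.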

In my data sample the indicator variable COVID-19 Dx encodes the performance of the PCR test in a clinical setting. The cases it flags are a subset of those flagged by my newly forged DDx_VIBAC, which throws the net around all acute respiratory infections. Let us start with COVID-19 Dx and use two modelling techniques: classic multivariate logistic regression and a new-fangled feedforward neural network approach known as the multilayer perceptron.
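The subset relationship between the two indicators can be sketched as follows. This is only an illustration: I am assuming the catch-all flag behaves like a logical OR over individual infection flags, and the component names other than the COVID one (`Influenza_Dx`, `Bacterial_Dx`) as well as the five records are hypothetical.

```python
# Hypothetical component flags; only the COVID diagnosis and DDx_VIBAC are named in the text.
COMPONENT_FLAGS = ["COVID_Dx", "Influenza_Dx", "Bacterial_Dx"]

# Five made-up patient records.
records = [
    {"COVID_Dx": 1, "Influenza_Dx": 0, "Bacterial_Dx": 0},
    {"COVID_Dx": 0, "Influenza_Dx": 1, "Bacterial_Dx": 0},
    {"COVID_Dx": 1, "Influenza_Dx": 0, "Bacterial_Dx": 1},
    {"COVID_Dx": 0, "Influenza_Dx": 0, "Bacterial_Dx": 0},
    {"COVID_Dx": 0, "Influenza_Dx": 0, "Bacterial_Dx": 1},
]

# Assumed construction: the catch-all flag is set if any component flag is set.
for record in records:
    record["DDx_VIBAC"] = int(any(record[flag] for flag in COMPONENT_FLAGS))

covid_cases = {i for i, r in enumerate(records) if r["COVID_Dx"] == 1}
vibac_cases = {i for i, r in enumerate(records) if r["DDx_VIBAC"] == 1}

# Every COVID case is also a DDx_VIBAC case, but not the other way round.
is_subset = covid_cases <= vibac_cases
print("COVID cases:", sorted(covid_cases))
print("VIBAC cases:", sorted(vibac_cases))
print("COVID is a subset of VIBAC:", is_subset)
```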

Logistic Regression Method

The model structure for the logistic regression was sizeable, there being 20 main effects and no fewer than 190 two-way effects and 1,140 three-way effects (the number of ways of picking 2 and 3 variables from 20). I don’t have a teapot big enough for all those three-way effects so I stuck to two-way as my nod to model sophistication. Herewith a summary table listing those 20 main effects so you can see what was thrown into the pot: