Beyond the Odds

Unravelling the Enigma of Non-Collapsibility

Author: Ryan Batten

Published: June 9, 2023

Measures of Effect

Epidemiologists and clinical researchers use different metrics to quantify the relationship between an exposure and an outcome. Commonly used measures include the risk ratio (RR), risk difference (RD), odds ratio (OR), mean difference (MD) and hazard ratio (HR). Each of these has benefits and drawbacks, relating to the type of outcome (binary, continuous, time-to-event), how well estimates apply to other samples/populations (often referred to as transportability or portability) (Doi et al. 2022), and collapsibility. For this post, we're going to focus on the last one.

Example time!

Rather than bore you with a bunch of jargon up front, it’s always better to work through an example first and then bore you with jargon (just kidding!….maybe…maybe not).

Let's say that we have two animal types (turtles and lions) and whether they caught a ball (yes/no). There is also sometimes a zookeeper around. The zookeeper can throw the animals a ball, which affects whether they catch it or not. The first thing to do is to look at our fancy pants data. Let's separate it by whether a zookeeper was present or not.

Code
library(tidyverse)

set.seed(123)

n <- 500

animal_type <- rbinom(n, size = 1, prob = 0.5) # 0 for turtle, 1 for lion
zookeeper_present <- rbinom(n, size = 1, prob = 0.5) # 0 for absent, 1 for present
# Catch probability depends on both animal type and zookeeper presence
ball_catch <- rbinom(n, 1, plogis(-1 + 2*animal_type + zookeeper_present))

df <- data.frame(animal_type, zookeeper_present, ball_catch)
Zookeeper Present

         Ball Not Caught   Ball Caught   Total
Turtle                65            70     135
Lion                  13           110     123
Total                 78           180     258

Zookeeper Absent

         Ball Not Caught   Ball Caught   Total
Turtle                95            35     130
Lion                  30            82     112
Total                125           117     242

Overall

         Ball Not Caught   Ball Caught   Total
Turtle               160           105     265
Lion                  43           192     235
Total                203           197     500

Now let's get down to some analysis. If we calculate the odds ratio for each of these tables, we find ORs of 7.86 when the zookeeper is present, 7.42 when the zookeeper is absent, and 6.80 overall. You may be wondering where this difference comes from. Perhaps these are just different numbers and we should expect this? You might be tempted to think it's just noise, and that averaging the stratum-specific ORs would fix it. However, that's not the case: no weighted average of 7.86 and 7.42 can give us 6.80, because 6.80 lies outside the range of the two stratum-specific values. So what do we do? Maybe we'll try a GLM.
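As a quick check, these ORs can be computed straight from the 2x2 counts in the tables above (the `odds_ratio` helper below is just for illustration):

```r
# Odds ratio for lions vs. turtles from the counts in a 2x2 table
odds_ratio <- function(caught_lion, missed_lion, caught_turtle, missed_turtle) {
  (caught_lion / missed_lion) / (caught_turtle / missed_turtle)
}

or_present <- odds_ratio(110, 13, 70, 65)   # zookeeper present
or_absent  <- odds_ratio(82, 30, 35, 95)    # zookeeper absent
or_overall <- odds_ratio(192, 43, 105, 160) # collapsed (overall) table

round(c(or_present, or_absent, or_overall), 2)
# 7.86 7.42 6.80
```

Note that the overall OR of 6.80 sits below both stratum-specific values, which is why no weighted average of the two can reproduce it.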

What if we use a GLM?

Let's fit two different models. One will be unadjusted, to estimate the marginal effect, while in the other we will adjust for zookeeper presence (i.e., include it in the model).

Code
fit1 <- glm(ball_catch ~ animal_type, 
            family = binomial(link = "logit"), 
            data = df)

fit2 <- glm(ball_catch ~ animal_type + zookeeper_present, 
            family = binomial(link = "logit"),
            data = df)

round(exp(coef(fit1)['animal_type']), 2)
animal_type 
        6.8 
Code
round(exp(coef(fit2)['animal_type']), 2)
animal_type 
        7.6 

As you can see, the marginal effect is still 6.8 while the conditional effect is 7.6. "Gotta be confounding…or…some weird magical bias or…". If you're thinking that, relax. It's a known "issue" (more on whether it's actually an issue later) that happens with the odds ratio as a measure. Because we simulated the data, we know it's not due to confounding: zookeeper presence isn't a confounder, but it is predictive of the outcome. What is happening here is something called non-collapsibility.
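To convince yourself the gap isn't bias, you can recover a marginal OR from the adjusted model by standardization (g-computation): average everyone's predicted risk under each animal type, then form the OR from those averages. A self-contained sketch re-using the simulation from above:

```r
# Rebuild the simulated data and adjusted model from earlier
set.seed(123)
n <- 500
animal_type <- rbinom(n, size = 1, prob = 0.5)
zookeeper_present <- rbinom(n, size = 1, prob = 0.5)
ball_catch <- rbinom(n, 1, plogis(-1 + 2*animal_type + zookeeper_present))
df <- data.frame(animal_type, zookeeper_present, ball_catch)

fit2 <- glm(ball_catch ~ animal_type + zookeeper_present,
            family = binomial(link = "logit"), data = df)

# Average predicted catch probability with everyone set to lion (1) or turtle (0)
p1 <- mean(predict(fit2, transform(df, animal_type = 1), type = "response"))
p0 <- mean(predict(fit2, transform(df, animal_type = 0), type = "response"))

# Marginal OR from the standardized risks: closer to the unadjusted 6.8
# than to the conditional 7.6
marginal_or <- (p1 / (1 - p1)) / (p0 / (1 - p0))
round(marginal_or, 2)
```

The conditional model and the marginal estimand can both be "right" at the same time; they just answer different questions.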

What exactly do we mean by collapsible?

Greenland and Pearl define collapsibility as “when an adjustment does not alter a measure, the measure is said to be collapsible over C or invariant with respect to the adjustment. Conversely, if an adjustment alters a measure the measure is said to be non-collapsible over C” (Greenland and Pearl 2011). So what exactly does this mean?

Well, it means that even when there is no confounding, whether we include a covariate in our model affects the magnitude of our estimated treatment effect (if that covariate affects the outcome) (Daniel, Zhang, and Farewell 2021). The aforementioned paper illustrates a good way to tell whether we should expect a measure to be collapsible or not. I won't go deep into the mathematics behind this. Based on these magical mathematics, we know that odds ratios and hazard ratios are non-collapsible.

Magical Mathematics

If you are actually interested in what I'm referring to, here's the nitty gritty (keep in mind it's a very, very brief summary). For generalized linear models, an important component is the link function. Daniel, Zhang, and Farewell (2021) highlight five link functions: identity, log, logit, complementary log-log and probit. They use a function called the characteristic collapsibility function, which explains the difference between collapsible and non-collapsible measures:

\[ g_\nu(\cdot) = f^{-1}\{f(\cdot) + \nu\} \]

Using this, they demonstrate that forcing predicted probabilities to stay within [0, 1] is what puts the bendy bits in the graph of the link function, and that this nonlinearity is what produces non-collapsibility for ORs and HRs.
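A tiny illustration of that "bendiness" (my own toy example, not from the paper): for the identity link, shifting by ν and then averaging over strata gives the same answer as averaging first, but for the logit link it does not. That is the same averaging failure behind the OR example above.

```r
# Characteristic collapsibility function for the logit link:
# g_nu(p) = f^{-1}(f(p) + nu), with f = qlogis and f^{-1} = plogis
g_logit <- function(p, nu) plogis(qlogis(p) + nu)
g_ident <- function(p, nu) p + nu  # identity link: a straight-line shift

p  <- c(0.2, 0.8)  # baseline risks in two equally sized strata
nu <- 2            # a shift on the linear-predictor scale

# Identity link: shift-then-average equals average-then-shift (collapsible)
all.equal(mean(g_ident(p, nu)), g_ident(mean(p), nu))  # TRUE

# Logit link: the two differ because g_logit is nonlinear in p
mean(g_logit(p, nu))   # about 0.81
g_logit(mean(p), nu)   # about 0.88
```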

So what’s the problem?

Selecting the appropriate effect measure and deciding which variables to include in a model are important parts of any study. This non-collapsibility issue isn't the be-all and end-all. There's a great blog post by Frank Harrell that goes into more detail if you're interested ("Unadjusted Odds Ratios Are Conditional" 2020).

Basically, as long as we know that this is a property of the OR, we can account for it. Everything in life has faults, and knowing about this one allows us to be careful of it. I work mostly with observational data, where not adjusting for covariates isn't really an option, whether through weighting, matching or outcome regression.

We just need to be mindful when comparing ORs across studies, especially if pooling them together. There are great resources on how to tackle this, and on converting between marginal and conditional ORs. To sum up, it's important to ask "what is being estimated?", including whether the measure is marginal or conditional.

References

Daniel, Rhian, Jingjing Zhang, and Daniel Farewell. 2021. “Making Apples from Oranges: Comparing Noncollapsible Effect Estimators and Their Standard Errors After Adjustment for Different Covariate Sets.” Biometrical Journal 63 (3): 528–57.
Doi, Suhail A, Luis Furuya-Kanamori, Chang Xu, Tawanda Chivese, Lifeng Lin, Omran AH Musa, George Hindy, Lukman Thalib, and Frank E Harrell Jr. 2022. “The Odds Ratio Is ‘Portable’ Across Baseline Risk but Not the Relative Risk: Time to Do Away with the Log Link in Binomial Regression.” Journal of Clinical Epidemiology 142: 288–93.
Greenland, Sander, and Judea Pearl. 2011. “Adjustments and Their Consequences—Collapsibility Analysis Using Graphical Models.” International Statistical Review 79 (3): 401–26.
“Unadjusted Odds Ratios Are Conditional.” 2020. https://www.fharrell.com/post/marg/.