Generalized Linear Mixed-Effect Regression Models: Binary Outcomes (work in progress)

Dataset description

The following tutorial is based on chapter 9 from Donald Hedeker and Robert Gibbons’ textbook “Longitudinal Analysis” and focuses on specifying mixed effect regression models in R and their interpretation using helper functions from the broom.mixed and gt_summary packages.

The example dataset comes from the National Institutes of Mental Health Schizophrenia Collaborative Study. In this study, patients with schizophrenia were randomized into one of two groups, a placebo group and a drug group. The drug group received 1 of 3 anti-psychotic medications including chlorpromazine, fluphenazine, or thioridizine. All patients receiving an anti-psychotic medication were combined into one group for the following analyses. Most patients were measured at weeks 0, 1, 3, and 6, but the some were measured at weeks 2, 4, and 5.

Hedeker & Gibbons focus on Item 79 (“Severity of Illness”) of the Inpatient Multidimensional Psychiatric Scale (IMPS). Item 79 was originally captured as a 7-point scale with responses ranging from “normal, not at all ill” to “among the most extremely ill.” In order to illustrate the implementation of mixed effect regression models for binary outcomes, Item 79 was dichotomized.

The primary research question is to determine if there is a difference between the two treatment groups in how Imps79 scores change over time.

library(tidyverse)
library(gtsummary)
library(dfmtbx)

Exploratory data analysis

# Table 9.1
# Number of non-missing values by week and treatment group
data %>%
  drop_na(tx) %>%
  mutate(tx = ifelse(tx == 0, "Placebo", "Drug")) %>%
  group_by(tx) %>%
  summarize(
    N = sum(week == 0),
    Week0 = sum(!is.na(imps79) & week == 0),
    Week1 = sum(!is.na(imps79) & week == 1),
    Week2 = sum(!is.na(imps79) & week == 2),
    Week3 = sum(!is.na(imps79) & week == 3),
    Week4 = sum(!is.na(imps79) & week == 4),
    Week5 = sum(!is.na(imps79) & week == 5),
    Week6 = sum(!is.na(imps79) & week == 6)
  ) %>%
  format_gt_tbl()
Reproduction of Table 9.1 Number of non-missing values by week and treatment group
tx N Week0 Week1 Week2 Week3 Week4 Week5 Week6
Drug 329 327 321 9 287 9 7 265
Placebo 108 107 105 5 87 2 2 70

Generate summary statistics

# Tabulate the proportion of participants with an imps79 score of 3.5 or more
tab_9.2.1 <- data %>%
  drop_na(tx) %>%
  filter(week %in% c(0, 1, 3, 6)) %>%
  mutate(week = str_c("Week", week)) %>%
  group_by(tx, week) %>%
  summarise(
    prop = mean(imps79_bin, na.rm = TRUE), .groups = "drop"
  ) %>%
  pivot_wider(
    names_from = week,
    values_from = prop) %>%
  mutate(tx = ifelse(tx == 0, "Placebo", "Drug")) %>%
  ungroup()

tab_9.2.1 %>%
  format_gt_tbl()
Observed proportions of >= “moderately ill”
tx Week0 Week1 Week3 Week6
Placebo 0.981 0.905 0.885 0.714
Drug 0.988 0.822 0.659 0.423
# Calculate the odds for each group in weeks 0, 1, 3, and 6
tab_9.2.2 <- data %>%
  drop_na(tx) %>%
  filter(week %in% c(0, 1, 3, 6)) %>%
  group_by(tx, week) %>%
  summarise(
    ill = sum(imps79_bin, na.rm = TRUE),
    not_ill = sum(imps79_bin == 0, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(odds = ill / not_ill) %>%
  select(tx, week, odds) %>%
  mutate(week = str_c("Week", week)) %>%
  pivot_wider(names_from = week, values_from = odds) %>%
  mutate(tx = ifelse(tx == 0, "Placebo", "Drug"))

tab_9.2.2 %>%
  format_gt_tbl()
Observed odds of >= “moderately ill”
tx Week0 Week1 Week3 Week6
Placebo 52.50 9.50 7.70 2.50
Drug 80.75 4.63 1.93 0.73
# Calculate the log(odds) [logits]
tab_9.2.3 <- tab_9.2.2 %>%
  mutate(across(-tx, ~ log(.x)))

tab_9.2.3 %>%
  format_gt_tbl()
Observed log odds (logits) of >= “moderately ill”
tx Week0 Week1 Week3 Week6
Placebo 3.96 2.25 2.04 0.92
Drug 4.39 1.53 0.66 −0.31
# Differences in the log(odds) for each group in each week
tab_9.2.4 <- tab_9.2.3 %>%
  arrange(tx) %>%
  select(-tx) %>%
  summarise_all( ~ diff(.x))
  
# Exponentiated differences in log odds
tab_9.2.5 <- tab_9.2.4 %>%
  mutate(across(everything(), ~ exp(.x)))

# Bind rows and display
bind_rows(
  tab_9.2.4,
  tab_9.2.5) %>%
  mutate(value = c("diff", "exp(diff)")) %>%
  select(value, everything()) %>%
  format_gt_tbl()
Difference in logits bettween groups and the exponentiated difference
value Week0 Week1 Week3 Week6
diff −0.43 0.72 1.38 1.23
exp(diff) 0.65 2.05 3.99 3.42

Visualize data

One approach to visualizing data for a logistic regression is to plot the logits, since these effectively serve as the dependendent variables. In the first plot, we notice an “elbow” or knot at week 1 that shows a steeper rate of change when compared to time > week 1. In the text, Hedeker & Gibbons mention that one approach to address this situation would be to employ polynomial trends in the model. However, they choose to model time expressed as the square root of week, shown in the 2nd plot. Using the square root of week produces a much more linear relationship and is more parsimonious than adding multiple polynomial terms. The drawback lies in the interpretation of the time variable.

# Figure 9.6
# Plot the logits (log(odds)) vs time in weeks
tab_9.2.3 %>%
  pivot_longer(cols = -tx, names_to = "week", values_to = "logit") %>%
  mutate(week = as.numeric(substr(week, 5, 5))) %>%
  ggplot(aes(x = week, y = logit, color = tx, group = tx)) +
  geom_line() +
  geom_point() +
  theme_minimal() +
  theme(legend.position = "bottom")

Observed logits vs Time in weeks
# Figure 9.7
# Plot the logits (log(odds)) vs time in square root of weeks
tab_9.2.3 %>%
  pivot_longer(cols = -tx, names_to = "week", values_to = "logit") %>%
  mutate(week = as.numeric(substr(week, 5, 5))) %>%
  mutate(week = sqrt(as.numeric(week))) %>%
  ggplot(aes(x = week, y = logit, color = tx, group = tx)) +
  geom_line() +
  geom_point() +
  theme_minimal() +
  theme(legend.position = "bottom")

Observed logits vs Time in square root of weeks

Random intercept model

The random intercept model presented in the text was likely fitted using SAS’s glimmix procedure and the quadrature option. To obtain reasonably similar results we set the nAGQ option to 15 in the specification of the glmer() function.

# Random Intercept logistic regression model
model_tab_9.4 <- lme4::glmer(
  imps79b ~ tx*sweek + (1 | id),
  family = "binomial",
  data = data,
  nAGQ = 15
)

broom.mixed::tidy(model_tab_9.4, conf.int = TRUE) %>%
  select(-(effect:group)) %>%
  format_gt_tbl()
term estimate std.error statistic p.value conf.low conf.high
(Intercept) 5.39 0.631 8.54 < 0.001 4.15 6.62
tx −0.02 0.653 −0.04 0.970 −1.31 1.26
sweek −1.50 0.291 −5.16 < 0.001 −2.07 −0.93
tx:sweek −1.01 0.334 −3.04 0.002 −1.67 −0.36
sd__(Intercept) 2.12




Interpretation

  1. The coefficient for the (Intercept) represents the predicted log odds of being moderately ill or greater in the placebo group at baseline when the time variable sweek is equal to 0.
  2. The treatment main effect is -0.02. This represents the difference in odds of being moderately ill or greater between the drug and placebo baseline (sweek = 0). The coefficient, -0.02, is approximately equal to 0.98 when exponentiated. This suggests that the odds of moderately ill or greater are 2% lower the drug group compared to the placebo group while holding time constant. However this effect is not statistically significant.
  3. The time main effect is -1.5. This coefficient represents the change in odds of being moderately ill or greater for each 1-unit increase in time for the placebo group. The exponentiated coefficient is approximately equal to 0.22 and suggests that each 1-unit increase in time is associated with a 78% decrease in the odds of being moderately ill or greater. This effect is statisitically significant and suggest improvement in Imps79 scores over time.
  4. The interaction effect is -1.01. This coefficient represents how the time slope in the drug group differs from the placebo group. The exponentiated coefficient is approximately equal to 0.36 and suggests that the effect of time in the drug group is 64% lower than that of the placebo group. -1.5 + (-1.01) = -2.51. The exponentiated value of -2.51 is approximately 0.08. This suggests that each 1-unit increase in time is associated with a 92% decrease in the odds of being moderately ill or greater in the drug group.

Random intercept and trend model

One limitation of the glmer() function is that setting the nAGC option to a value of greater than 1 is only available for models with one random effect. Adding a random trend to the model and using nAGC = 15 will produce an error. To overcome this, we can add the 2nd random effect for trend and use the nAGC option with a value of greater than 1 in the mixed_model() function of the GLMMadaptive package. The GLMMadaptive package uses a slightly different syntax. The random effects are specified as a separate argument rather than including them in the model formula with the fixed effects. In this respect, the structure more closely mimics how random effects are specified in SAS using a combination of the MODEL and RANDOM statements.

model_tab_9.6 <- GLMMadaptive::mixed_model(
  imps79b ~ tx * sweek,
  ~ 1 + sweek | id,
  family = "binomial",
  data = data,
  nAGQ = 15
)

broom.mixed::tidy(model_tab_9.6) %>%
  bind_rows(.,
    tibble("term" = "sd__(Intercept)",
    "estimate" = 2.62,
    "std.error" = NA,
    "statistic" = NA,
    "p.value" = NA,
    "conf.low" = NA, 
    "conf.high" = NA)
    ) %>%
  format_gt_tbl()
term estimate std.error statistic p.value conf.low conf.high
(Intercept) 5.91 0.941 6.28 < 0.001 4.07 7.76
tx 0.28 0.741 0.38 0.702 −1.17 1.74
sweek −1.40 0.472 −2.96 0.003 −2.32 −0.47
tx:sweek −1.61 0.479 −3.35 < 0.001 −2.54 −0.67
sd__(Intercept) 2.62




Interpretation

  1. The coefficient for the (Intercept) represents the predicted log odds of being moderately ill or greater in the placebo group at baseline when the time variable sweek is equal to 0.
  2. The treatment main effect is 0.28. This represents the difference in odds of being moderately ill or greater between the drug and placebo baseline (sweek = 0). The coefficient, 0.28, is approximately equal to 1.32 when exponentiated. This suggests that the odds of moderately ill or greater are 32% higher in the drug group compared to the placebo group while holding time constant. However this effect is not statistically significant.
  3. The time main effect is -1.4. This coefficient represents the change in odds of being moderately ill or greater for each 1-unit increase in time for the placebo group. The exponentiated coefficient is approximately equal to 0.25 and suggests that each 1-unit increase in time is associated with a 75% decrease in the odds of being moderately ill or greater. This effect is statisitically significant and suggest improvement in Imps79 scores over time.
  4. The interaction effect is -1.61. This coefficient represents how the time slope in the drug group differs from the placebo group. The exponentiated coefficient is approximately equal to 0.20 and suggests that the effect of time in the drug group is 80% lower than that of the placebo group. -1.40 + (-1.61) = -3.01. The exponentiated value of -3.01 is approximately 0.05. This suggests that each 1-unit increase in time is associated with a 95% decrease in the odds of being moderately ill or greater in the drug group.

References

Hedeker, Donald; Gibbons, Robert D. (2006). Longitudinal Data Analysis. Wiley.