Bayesian logistic regression

Learning Objectives

Learn how to fit logistic regression model with brms
Understand how to interpret and report regression posterior results

1. Logistic regression

1.1 Models for Binary Data

For binary outcome Y, we are interested to model the probability of $\pi_i = P(Y_i=1)$, $i=1, \ldots, n$

\[Y_i \mid \pi_i, n_i \sim Bern(\pi_i)\]

\[ E(Y_i \mid \pi_i) = \pi\]

\[logit( \pi_i) = \beta_0 + \beta_1 X_{i1}\]

logistic regression model featuring logit link that connects binary outcome to a conventional linear regression format, is part of the broader class of generalized linear models
Logit link convert probabilities to log odds.

\[\log(\frac{\pi_i}{1-\pi_i}) = \log(\frac{P(Y_i=1)}{1- P(Y_i=1)})\]

\[\log(\frac{\pi_i}{1-\pi_i}) = \beta_0 + \beta_1 X_{i1}\]

Code

p1<-ggplot(data.frame(x = c(0, 1)), aes(x)) + 
  stat_function(fun = qlogis, n = 501) + 
  xlab(expression(pi)) + 
  ylab(expression(logit(pi))) + 
  ggtitle("logit function")+
  theme_bw()
p2<- ggplot(data.frame(x = c(-5, 5)), aes(x)) + 
  stat_function(fun = plogis, n = 501) + 
  ylab(expression(pi)) +
  xlab(expression(logit(pi)))+
  ggtitle("logit function")+
  theme_bw()
ggarrange(p1, p2, nrow = 1)

1.2 Logistic regression model and assumptions

Consider the logistic regression model $Y$ with covariates $X_i, \ldots, X_p$:

\[ \log(\frac{\pi_i}{1-\pi_i}) = \beta_0 + \beta_1 X_{i1} + \ldots + \beta_p X_{ip} \]

logistic regression coefficients is interpreted as log odds ratio!
Furthermore, we can recalculate probability using the inverse logit function (aka, expit function)

\[\pi_i = \frac{\exp(\beta_0 + \beta_1 X_{i1} + \ldots + \beta_1 X_{ip})}{1+\exp(\beta_0 + \beta_1 X_{i1} + \ldots + \beta_p X_{ip})}\]

Assumptions of logistic regression models
1. Independent observations. This assumption can be modified if working with clustered data.
2. Linear relationship between continuous covariates and log-odds We can check this assumption by examining marginal plots comparing the model predicted relationship between outcome (log-odds or logit-probability) and each continuous covariates.
3. Multicollinearity Multicollinearity is the phenomenon when a number of the explanatory variables are strongly correlated.
4. Correctly specified regression model This means that all relevant predictors for the response variable have been included in the model. This is often difficult to verify, but we can use posterior predictive distribution to check for regression fit. We can also use WAIC and LOO to compare different models.
Comparing to linear regression model we no-longer require the residuals to be normally distributed and homoscedasticity.

1.3 Working example: diagnostic Sensitivity analysis

10 studies taken from (Bastos et al. 2020) on Diagnostic accuracy of serological tests for covid-19: systematic review and meta-analysis. BMJ. 2020
Each study reports observed positive chemiluminescent immunoassa (CLIA, $r_i$) and number having positive reference standard reverse transcriptase polymerase chain reaction (RT-PCR, $n_i$)
Interest lies in summarizing this collection of values

Code

dat <- read.table("data/COVIDtest.txt", 
                  sep = "",
                  header=T)
dat <- dat %>% 
  mutate(r = TP,
         n = TP + FN,
         age = round(rnorm(10, 70*sqrt(TP/(TP + FN)), 8)))


dat %>% 
  select(Study, r, n, age) %>%
  datatable(
  rownames = T,
  class = "compact",
  options = list(
    dom = 't',
    ordering = FALSE,
    paging = FALSE,
    searching = FALSE,
    columnDefs = list(list(className = 'dt-center', 
                      targets = 0:3))))

Unpooled model

recall from last week, We can assume a binomial likelihood for each study (unpooled),

\[r_i \sim Bin(p_i, n_i) \]

we can assume that the 10 true sensitivities $p_i$ are independent by fitting different prior to each one: \[p_i \sim Beta(1,1)\]
we will obtain 10 separate posteriors $P(p_i \mid r_i, n_i)$
- For example, we do not assume that $p_1$ tells us anything about the value of $p_2$

Pooled

Suppose we wanted to estimate the overall sensitivity of the CLIA test
- We do not want 10 separate estimates but a single estimate
If we thought that the studies all had the same true sensitivity p0, we could fit a model like this:

Assume a binomial likelihood for each study

\[r_i \sim Bin(p_i, n_i) \]

Assume that all studies have the same true sensitivity, $p_0$: \[ p_i = p_0\]
Put a prior on $p_0$: $p_0 \sim Beta(1,1)$

Different prior on $p_0$

The beta prior is useful for models with little additional structure
- e.g., estimating a single proportion; estimating a collection of proportions
It is difficult to use it to a model, where we want to include predictors of $p_i$
For most modelling approaches to binary data, we will use
- the logit transformation of $p_i$
- normal priors for parameters on this logit scale (e.g., normal prior for log-odds)

Logistic models

We still have the same likelihood $r_i \sim Bin(p_i, n_i)$
We define a new parameter using the logit transformation function:

\[logit(p_i) = log(\frac{p_i}{1-p_i}) = \alpha_i\] \[p_i = \frac{\exp(\alpha_i)}{1+\exp(\alpha_i)}\]

$0<p_i<1 \Rightarrow -\infty < \alpha_i < \infty$
Thus, it’s quite reasonable to use normal prior to characterize $\alpha_i$

Model 1. Pooled model with non-informative prior

\[r_i \mid p_i, n_i \sim Bin(p_i, n_i)\] \[log(\frac{p_i}{1-p_i}) = \alpha\] \[\alpha \sim N(0, \sigma = 5)\]

fit1 <- brm(r | trials(n) ~ 1,
  data = dat,
  prior = prior(normal(0,5), class=Intercept),
  family = binomial,
  sample_prior = T, #asking brms to generate prior sample!
  seed = 123)
saveRDS(fit1, "data/chp5_binary1")

Code

fit1 <- readRDS("data/chp5_binary1")
print(fit1)

 Family: binomial 
  Links: mu = logit 
Formula: r | trials(n) ~ 1 
   Data: dat (Number of observations: 10) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     1.33      0.06     1.21     1.46 1.01     1425     1875

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Model 2. Unpooled model with non-informative prior

\[r_i \mid p_i, n_i \sim Bin(p_i, n_i)\] \[log(\frac{p_i}{1-p_i}) = \beta_i \ \text{Study}_i\]

\[\beta_i \sim N(0, \sigma = 5)\]

fit2 <- brm(r | trials(n) ~ 0+Study,
  data = dat,
  prior =  c(prior(normal(0, 5), class = b)),
  family = binomial,
  seed = 123)
saveRDS(fit2, "data/chp5_binary2")

Code

fit2 <- readRDS("data/chp5_binary2")
print(fit2)

 Family: binomial 
  Links: mu = logit 
Formula: r | trials(n) ~ 0 + Study 
   Data: dat (Number of observations: 10) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
StudyCai           0.29      0.12     0.06     0.53 1.00     7694     2738
StudyInfantino     0.97      0.29     0.41     1.54 1.00     7996     2987
StudyJin          -0.07      0.38    -0.84     0.68 1.00     7548     2814
StudyLin           1.56      0.29     1.02     2.13 1.01     7693     3507
StudyLou           1.86      0.32     1.26     2.53 1.00     6928     2783
StudyMa            3.45      0.39     2.76     4.27 1.00     7110     3015
StudyQian          1.81      0.13     1.56     2.07 1.00     7295     2846
StudyXie           2.99      1.12     1.20     5.54 1.00     6428     2723
StudyYangchun      0.86      0.15     0.57     1.16 1.00     8293     2770
StudyZhong         4.13      1.13     2.37     6.87 1.00     5127     1952

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Model 3. Partial pooling model with non-informative prior

\[r_i \mid p_i, n_i \sim Bin(p_i, n_i)\] \[log(\frac{p_i}{1-p_i}) = \alpha + b_i\] \[\alpha \sim N(0, \sigma = 5)\] \[b_i \sim N(0, \sigma )\] \[\sigma \sim half-t(0,10, df=3)\]

fit3 <- brm(r | trials(n) ~ 1+(1|Study),
  data = dat,
  prior =  c(prior(normal(0, 5), class = Intercept),
             prior(student_t(3, 0, 10), class = sd)),
  family = binomial,
  seed = 123)
saveRDS(fit3, "data/chp5_binary3")

Code

fit3 <- readRDS("data/chp5_binary3")
print(fit3)

 Family: binomial 
  Links: mu = logit 
Formula: r | trials(n) ~ 1 + (1 | Study) 
   Data: dat (Number of observations: 10) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Group-Level Effects: 
~Study (Number of levels: 10) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.38      0.45     0.77     2.47 1.00      725     1311

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     1.62      0.45     0.74     2.59 1.01      698     1030

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

# comparing models;
loo1 <- loo(fit1)
loo2 <- loo(fit2)
loo3 <- loo(fit3)
loo_compare(loo1, loo2, loo3)

     elpd_diff se_diff
fit2   0.0       0.0  
fit3  -1.1       0.6  
fit1 -96.7      45.0

# define expit function; 
# transform model estimated log odds to probability scale;
expit <- function(x){exp(x)/(1+exp(x))}

## 1.  Create one row per Study with n = 1 so that
##     the expected number of successes equals the success-probability
newdat <- dat %>% 
  distinct(Study) %>% 
  mutate(n = 1)      # one Bernoulli trial

# 2. Draw posterior expectations that INCLUDE the random intercept
draws <- posterior_epred(
  fit3,
  newdata    = newdat,
  re_formula = NULL   # keep group-level effects
)                   

# summarize posterior probability;
prob_summaries <- t(apply(draws, 2, quantile, probs = c(.025, .5, .975))) %>% 
  as.data.frame() %>% 
  setNames(c("median", "lower_95",  "upper_95")) %>% 
  mutate(Study = newdat$Study)

prob_summaries

      median  lower_95  upper_95     Study
1  0.7372276 0.8262236 0.8970372       Lin
2  0.9338387 0.9640941 0.9834530        Ma
3  0.5183541 0.5753933 0.6332119       Cai
4  0.6106512 0.7307572 0.8314371 Infantino
5  0.8869331 0.9625789 0.9934809     Zhong
6  0.3354143 0.5192255 0.6983388       Jin
7  0.7578684 0.9154367 0.9854626       Xie
8  0.6438882 0.7064076 0.7646523  Yangchun
9  0.8266485 0.8589106 0.8881886      Qian
10 0.7807128 0.8639924 0.9254228       Lou

both model 2 and model 3 fit the data well. There is no substantial difference in model fit using LOO between model 2 and 3. If we are to select the best model, we will proceed with model 3.

Generalized linear models

We can specify the type of model by two features - Family: type of outcome (binomial, normal, Poisson, gamma) - Link: relationship between mean parameter and predictor (identity, logic, log)

we have models in the form

\[ Y_i \sim Dist(\mu_i, \tau)\] \[g(\mu_i) = \beta_0 + \beta_1 X_{i1} + \ldots + \beta_1 X_{ip}\]

$g()$ is called the link function
Distributions family (exponential family) can be specified as Normal, Poisson, Binomial, Bernoulli, etc.
Usually, we use the logit for binary data and gets odds ratios (logistic model)
- If we use the log link, we can estimate relative risks

Sample from priors in brms

We can specify in brms to generate samples from our specified prior as part of the output
we can use these prior samples
1. to compare with the posteriors to see how much the prior is updated by the data
2. to check what information is in the prior
using the pooled model as an example

\[r_i \mid p_i, n_i \sim Bin(p_i, n_i)\] \[log(\frac{p_i}{1-p_i}) = \alpha\] \[\alpha \sim N(0, \sigma = 5)\]

fit1 <- brm(r | trials(n) ~ 1,
  data = dat,
  prior = prior(normal(0,5), class=Intercept),
  family = binomial,
  sample_prior = T, #asking brms to generate prior sample!
  seed = 123)

checking posterior results
- computing posterior predicted probability of event (in this case the sensitivity of the CLIA test)

#posterior draws of log-odds estimate - the intercept;
posterior <- posterior_samples(fit1)

#posterior predicted probability of event;
prob.pooled <- expit(posterior$b_Intercept)

mean(prob.pooled)

[1] 0.7914033

quantile(prob.pooled, c(0.025, 0.975))

     2.5%     97.5% 
0.7711078 0.8119101

comparing prior and posterior distribution visually

data.frame(Probability = c(inv_logit_scaled(posterior[,"b_Intercept"]),
                inv_logit_scaled(posterior[,"prior_Intercept"])),
           Type=rep(c("Posterior","Prior"),each=nrow(posterior))) %>%
  ggplot(aes(x=Probability,fill=Type))+
  geom_density(size=1,alpha=0.5)+
  labs(title="Prior: alpha ~ N(0,5); probability = exp(alpha)/(1+exp(alpha))")+
  theme_bw()+
  scale_fill_manual(values=c("goldenrod","steelblue"))

Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

2. Logistic regression with predictors

Adding one predictor, age, for each study in modelling CLIA sensitivity
We have the following Bayesian model

\[r_i \mid p_i, n_i \sim Bin(p_i, n_i)\] \[log(\frac{p_i}{1-p_i}) = \alpha + \beta \ age_i\] \[\alpha \sim N(0, \sigma = 5)\] \[\beta \sim N(0, \sigma = 5)\]

$\beta$ is interpreted as the changes on the log-OR associate 1 unit increase in age.

fit4 <- brm(r | trials(n) ~ age,
  data = dat,
  prior = c(prior(normal(0, 5), class = Intercept),
             prior(normal(0, 5), class = b)),
  family = binomial,
  cores = 4,
  seed = 123)
saveRDS(fit4, "data/chp5_binary4")

checking posterior results
- computing posterior predicted probability of event (in this case the sensitivity of the CLIA test)

fit4 <- readRDS("data/chp5_binary4")

#posterior draws of log-odds estimate - the intercept;
posterior <- posterior_samples(fit4)

# beta posterior summary
exp(mean(posterior$b_age))

[1] 1.070023

exp(quantile(posterior$b_age, c(0.025, 0.975)))

    2.5%    97.5% 
1.057597 1.083063

#posterior predicted probability testing positive for each study;
pp<-posterior_predict(fit4)
mean.pp<-colMeans(pp)/dat$n
lci.pp<- apply(pp, 2, function(x) quantile(x, 0.025))/dat$n
uci.pp<- apply(pp, 2, function(x) quantile(x, 0.975))/dat$n

pp.table <- data.frame(Study = dat$Study, 
                       r = dat$r, 
                       n = dat$n,
                       Prob_obs = round(dat$r/dat$n,2),
                       Prob_post = round(mean.pp,2),
                       CI_post = paste0("(",round(lci.pp,2),",",round(uci.pp,2),")"))

pp.table %>% 
  datatable(
  rownames = F,
  class = "compact",
  options = list(
    dom = 't',
    ordering = FALSE,
    paging = FALSE,
    searching = FALSE,
    columnDefs = list(list(className = 'dt-center', 
                      targets = 0:5))))

2.1 Centring continuous variable

our model says that sensitivity is different across studies according to the average age

\[P(\text{test positive}) = \frac{\exp(\alpha + \beta \ age)}{1+ \exp(\alpha + \beta \ age)}\]

We could obtain a prediction at the average of the ages mean(dat$age) across the studies

\[P(\text{test positive}) = \frac{\exp(\alpha + \beta \times mean( age))}{1+ \exp(\alpha + \beta \times mean( age))}\]

We could “centre” age at its average so that the intercept refers to the probability at the average age

\[r_i \mid p_i, n_i \sim Bin(p_i, n_i)\] \[log(\frac{p_i}{1-p_i}) = \alpha + \beta (\ age_i - mean(age))\] \[\alpha \sim N(0, \sigma = 5)\] \[\beta \sim N(0, \sigma = 5)\]

Using a centred predictor also reduces correlation between intercept and slope. This can speed up MCMC convergence in some models (computationally more efficient).

#centring variable age;
dat$age.c <- dat$age - mean(dat$age)

fit4c <- brm(r | trials(n) ~ age.c,
            data=dat,
            prior = c(prior(normal(0, 5), class = Intercept),
                      prior(normal(0, 5), class = b)),
            family = binomial,
            cores = 4,
            seed = 123)

saveRDS(fit4c, "data/chp5_binary4c")

comparing posterior distribution between centred and uncentred models

fit4c <- readRDS("data/chp5_binary4c")

#posterior draws of log-odds estimate - the intercept;
posteriorc <- posterior_samples(fit4c)

# beta posterior summary
exp(mean(posteriorc$b_age))

[1] 1.069927

exp(quantile(posteriorc$b_age, c(0.025, 0.975)))

    2.5%    97.5% 
1.057407 1.082997

posterior predicted probability testing positive for each study comparing between the centred and uncentred models
we can see the results are almost identical and we achieved this with only 5000 iterations for the centred model.

We can investigate the correlation between two posterior regression parameters, $\alpha$ (intercept) and $\beta$ (log-OR for age).
The two parameters should be uncorrelated by model assumption! If you see visible correlation, you can consider thinning your MCMC or run more iterations.

Code

p1 <- ggplot(posterior, 
             aes(b_Intercept, b_age))+
  geom_point(alpha=.1)+
  theme_bw()+
  ggtitle("Uncentred age")

p2 <- ggplot(posteriorc, aes(b_Intercept, b_age.c))+
  geom_point(alpha=0.25)+
  theme_bw()+
  ggtitle("Centred age")

ggarrange(p1,p2, nrow=1)

2.2 Model diagnostics

Posterior predictive graphic check for model fit

p1<-pp_check(fit4c, ndraws = 50)
p2<-pp_check(fit4c, type = "stat_2d", stat = c("max", "min"))
ggarrange(p1,p2, nrow = 1)

3 Additional topics on Bayesian logistic regression models

In this example, we will conduct an analysis using a Bayesian logistic regression model.

Data come from a cross-sectional study of 1,225 smokers over the age of 40.
Each participant was assessed for chronic obstructive pulmonary disease (COPD), and characteristics of the type of cigarette they most frequently smoke were recorded.
The objective of the study was to identify associations between COPD diagnosis and cigarette characteristics

We will use the following variables from this data set:

TYPE: Type of cigarette, 1 = Menthol, 0 = Regular
NIC: Nicotine content, in mg
TAR: Tar content, in mg
LEN: Length of cigarette, in mm
FLTR: 1 = Filter, 0 = No Vlter
copd: 1 = COPD diagnosis, 0 = no COPD diagnosis

We fit a simple logistic regression, predicting the risk of Copd by Type of cigarette, Nicotine content, and Filter status.
We specify uninformative prior on log.odds, $N(0,10)$

m1_bern <- brm(copd ~ TYPE + NIC + FLTR, 
               data = copd, 
               family = bernoulli(link = "logit"), 
               prior = c(prior(normal(0,10), class = "b"), 
                         prior(normal(0,10), class = "Intercept")),
               seed = 123, silent = 2, refresh = 0)

saveRDS(m1_bern, file="data/chp6_logistic")

m1_bern <- readRDS("data/chp6_logistic")
summary(m1_bern)

 Family: bernoulli 
  Links: mu = logit 
Formula: copd ~ TYPE + NIC + FLTR 
   Data: copd (Number of observations: 1225) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -2.94      0.64    -4.20    -1.71 1.00     3108     2483
TYPE          0.25      0.21    -0.19     0.66 1.00     3480     2501
NIC           1.30      0.35     0.61     1.99 1.00     3306     2723
FLTR         -0.60      0.44    -1.42     0.29 1.00     3344     2878

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

# Directly from the posterior of exp(beta):
draws_beta_TYPE <- as.matrix(m1_bern, pars = "b_TYPE")
exp_beta_TYPE <- exp(draws_beta_TYPE)
quantile(exp_beta_TYPE, probs = c(.025, 0.5, .975))

     2.5%       50%     97.5% 
0.8286483 1.2832394 1.9293951

plot(m1_bern)

Posterior Distribution + Trace plot combining all chains

Good mixing from trace plot
Rhats’ are all less then 1.01, convergence!
Other diagnostic plots below

mcmc_dens_overlay(m1_bern, pars = c("b_Intercept", "b_TYPE", "b_NIC", "b_FLTR"))

mcmc_acf_bar(m1_bern, pars = c("b_Intercept", "b_TYPE", "b_NIC", "b_FLTR"))

Warning: The `facets` argument of `facet_grid()` is deprecated as of ggplot2 2.2.0.
ℹ Please use the `rows` argument instead.
ℹ The deprecated feature was likely used in the bayesplot package.
  Please report the issue at <https://github.com/stan-dev/bayesplot/issues/>.

3.1 Regression regularization and shrinkage priors

When the number of parameters to be estimated is large relative to sample size, estimation using non-informative or weakly informative priors tend to overfit.
One way to avoid overfitting is to perform regularization, that is, to shrink some of the parameters to closer to zero. This makes the model fit less well to the existing data, but will be much more generalizable to an independent data set.
Sparsity-Inducing Priors
The horseshoe prior is a type of hierarchical prior for regression models by introducing a global scale and local scale parameters on the priors for the regression coefficients.
Specifically, with $p$ predictors

\[Y_i \sim N(\mu_i, \sigma^2)\] \[\mu_i = \beta_0 + \sum_{m=1}^p\beta_m X_m\] \[\beta_0 \sim N(0,10)\] \[\beta_m \sim N(0,\tau\lambda_m)\] \[\lambda_m \sim Cauchy(10)\] \[\tau \sim Cauchy(\tau_0)\]

local scale $\lambda_m$ can flexibly shrink the coefficient to close to zero.

Code

dhs <- Vectorize(
  function(y, df = 1) {
  ff <- function(lam) dnorm(y, 0, sd = lam) * dt(lam, df) * 2
  if (y != 0) integrate(ff, lower = 0, upper = Inf)$value
  else Inf
  }
)
ggplot(data.frame(x = c(-6, 6)), aes(x = x)) +
  stat_function(fun = dhs, args = list(df = 1), n = 501,
                aes(col = "HS"), linetype = 1) +
  stat_function(fun = dnorm, n = 501, 
                aes(col = "norm"), linetype = 2) +
  scale_color_manual("", values = c("steelblue", "black"),
                     labels = c("horseshoe(1)", "N(0, 1)")) +
    xlab("y") + ylab("density") + ylim(0, 0.75)+
  theme_bw()

3.2 COPE example revisit

We fit a simple logistic regression, predicting the risk of copd by NIC, TYPE, FLTR, as well as all interaction terms between the three.

copd2 <- copd %>% select("NIC","TYPE","FLTR","copd")
fit4.1 <- brm(copd ~ (.)^2, 
               data = copd2, 
               family = bernoulli(link = "logit"), 
               prior = c(
                 prior(normal(0,10), class = "Intercept"),
                 # Prior guess of 25% of the terms are non-zero
                 prior(horseshoe(par_ratio = 2 / 8), class = "b")),
            iter = 7500,
            warmup = 5000,
            chains = 4,
            cores = getOption("mc.cores", 5), 
            seed = 123, 
            silent = 2, 
            refresh = 0)
fit4.2 <- brm(copd ~ (.)^2, 
               data = copd2, 
               family = bernoulli(link = "logit"), 
               prior = c(
                 prior(normal(0,10), class = "Intercept"),
                 prior(normal(0,10), class = "b")),
            iter = 7500,
            warmup = 5000,
            chains = 4,
            cores = getOption("mc.cores", 5), 
            seed = 123, 
            silent = 2, 
            refresh = 0)
saveRDS(fit4.1, file="data/chap7_h_example4.1")
saveRDS(fit4.2, file="data/chap7_h_example4.2")

fit4.1 <- readRDS("data/chap7_h_example4.1")
fit4.2 <- readRDS("data/chap7_h_example4.2")

summary(fit4.2)

 Family: bernoulli 
  Links: mu = logit 
Formula: copd ~ (NIC + TYPE + FLTR)^2 
   Data: copd2 (Number of observations: 1225) 
  Draws: 4 chains, each with iter = 7500; warmup = 5000; thin = 1;
         total post-warmup draws = 10000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -4.14      1.64    -7.70    -1.24 1.00     4232     4591
NIC           2.10      1.06     0.24     4.40 1.00     4173     4468
TYPE          0.36      7.00   -13.20    14.35 1.00     5180     4838
FLTR          0.62      1.71    -2.44     4.27 1.00     4082     4299
NIC:TYPE     -0.18      0.78    -1.72     1.39 1.00     6997     6004
NIC:FLTR     -0.82      1.17    -3.30     1.32 1.00     4112     4156
TYPE:FLTR     0.04      6.99   -13.85    13.62 1.00     5163     4770

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

summary(fit4.1)

 Family: bernoulli 
  Links: mu = logit 
Formula: copd ~ (NIC + TYPE + FLTR)^2 
   Data: copd2 (Number of observations: 1225) 
  Draws: 4 chains, each with iter = 7500; warmup = 5000; thin = 1;
         total post-warmup draws = 10000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -3.35      0.48    -4.13    -2.19 1.00     3095     4336
NIC           1.35      0.37     0.58     2.02 1.00     3972     4408
TYPE         -0.02      0.39    -1.34     0.46 1.03      174       28
FLTR         -0.10      0.29    -0.97     0.18 1.00     2445     1961
NIC:TYPE      0.01      0.15    -0.30     0.35 1.01      988      399
NIC:FLTR     -0.04      0.17    -0.50     0.18 1.01      711     5295
TYPE:FLTR     0.09      0.43    -0.22     1.55 1.03      175       29

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Using the informative data-driven (variable selection) shrinkage priors, we can see that TYPE and NIC:TYPE as minimal effect on the risk of COPD. Therefore, we can reduce our model by removing these two fixed effects.

fit4.3 <- brm(copd ~ NIC + FLTR + NIC:FLTR + TYPE:FLTR, 
               data = copd2, 
               family = bernoulli(link = "logit"), 
               prior = c(
                 prior(normal(0,10), class = "Intercept"),
                 prior(normal(0,10), class = "b")),
            iter = 7500,
            warmup = 5000,
            chains = 4,
            cores = getOption("mc.cores", 5), 
            seed = 123, 
            silent = 2, 
            refresh = 0)
saveRDS(fit4.3, file="data/chap7_h_example4.3")

fit4.3 <- readRDS("data/chap7_h_example4.3")
summary(fit4.3)

 Family: bernoulli 
  Links: mu = logit 
Formula: copd ~ NIC + FLTR + NIC:FLTR + TYPE:FLTR 
   Data: copd2 (Number of observations: 1225) 
  Draws: 4 chains, each with iter = 7500; warmup = 5000; thin = 1;
         total post-warmup draws = 10000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -4.15      1.64    -7.66    -1.25 1.00     3205     3950
NIC           2.10      1.06     0.21     4.37 1.00     3153     3764
FLTR          0.69      1.67    -2.28     4.23 1.00     3196     3918
NIC:FLTR     -0.89      1.12    -3.28     1.17 1.00     3213     4343
FLTR:TYPE     0.25      0.21    -0.15     0.66 1.00     6182     5308

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

# comparing models;
loo1 <- loo(fit4.1)
loo2 <- loo(fit4.2)
loo3 <- loo(fit4.3)
loo_compare(loo1, loo2, loo3)

       elpd_diff se_diff
fit4.1  0.0       0.0   
fit4.3 -0.9       1.3   
fit4.2 -1.9       1.3

References

Bastos, Mayara Lisboa, Gamuchirai Tavaziva, Syed Kunal Abidi, Jonathon R Campbell, Louis-Patrick Haraoui, James C Johnston, Zhiyi Lan, et al. 2020. “Diagnostic Accuracy of Serological Tests for Covid-19: Systematic Review and Meta-Analysis.” Bmj 370.