Learn how to fit logistic regression model with brms
Understand how to interpret and report regression posterior results
1. Logistic regression
1.1 Models for Binary Data
For binary outcome Y, we are interested to model the probability of \(\pi_i = P(Y_i=1)\), \(i=1, \ldots, n\)
\[Y_i \mid \pi_i, n_i \sim Bern(\pi_i)\]
\[ E(Y_i \mid \pi_i) = \pi\]
\[logit( \pi_i) = \beta_0 + \beta_1 X_{i1}\]
logistic regression model featuring logit link that connects binary outcome to a conventional linear regression format, is part of the broader class of generalized linear models
Independent observations. This assumption can be modified if working with clustered data.
Linear relationship between continuous covariates and log-odds We can check this assumption by examining marginal plots comparing the model predicted relationship between outcome (log-odds or logit-probability) and each continuous covariates.
Multicollinearity Multicollinearity is the phenomenon when a number of the explanatory variables are strongly correlated.
Correctly specified regression model This means that all relevant predictors for the response variable have been included in the model. This is often difficult to verify, but we can use posterior predictive distribution to check for regression fit. We can also use WAIC and LOO to compare different models.
Comparing to linear regression model we no-longer require the residuals to be normally distributed and homoscedasticity.
1.3 Working example: diagnostic Sensitivity analysis
Each study reports observed positive chemiluminescent immunoassa (CLIA, \(r_i\)) and number having positive reference standard reverse transcriptase polymerase chain reaction (RT-PCR, \(n_i\))
Interest lies in summarizing this collection of values
Family: binomial
Links: mu = logit
Formula: r | trials(n) ~ 1
Data: dat (Number of observations: 10)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.33 0.06 1.21 1.46 1.01 1425 1875
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Model 2. Unpooled model with non-informative prior
fit3 <-brm(r |trials(n) ~1+(1|Study),data = dat,prior =c(prior(normal(0, 5), class = Intercept),prior(student_t(3, 0, 10), class = sd)),family = binomial,seed =123)saveRDS(fit3, "data/chp5_binary3")
Code
fit3 <-readRDS("data/chp5_binary3")print(fit3)
Family: binomial
Links: mu = logit
Formula: r | trials(n) ~ 1 + (1 | Study)
Data: dat (Number of observations: 10)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Group-Level Effects:
~Study (Number of levels: 10)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.38 0.45 0.77 2.47 1.00 725 1311
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.62 0.45 0.74 2.59 1.01 698 1030
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
# define expit function; # transform model estimated log odds to probability scale;expit <-function(x){exp(x)/(1+exp(x))}## 1. Create one row per Study with n = 1 so that## the expected number of successes equals the success-probabilitynewdat <- dat %>%distinct(Study) %>%mutate(n =1) # one Bernoulli trial# 2. Draw posterior expectations that INCLUDE the random interceptdraws <-posterior_epred( fit3,newdata = newdat,re_formula =NULL# keep group-level effects) # summarize posterior probability;prob_summaries <-t(apply(draws, 2, quantile, probs =c(.025, .5, .975))) %>%as.data.frame() %>%setNames(c("median", "lower_95", "upper_95")) %>%mutate(Study = newdat$Study)prob_summaries
median lower_95 upper_95 Study
1 0.7372276 0.8262236 0.8970372 Lin
2 0.9338387 0.9640941 0.9834530 Ma
3 0.5183541 0.5753933 0.6332119 Cai
4 0.6106512 0.7307572 0.8314371 Infantino
5 0.8869331 0.9625789 0.9934809 Zhong
6 0.3354143 0.5192255 0.6983388 Jin
7 0.7578684 0.9154367 0.9854626 Xie
8 0.6438882 0.7064076 0.7646523 Yangchun
9 0.8266485 0.8589106 0.8881886 Qian
10 0.7807128 0.8639924 0.9254228 Lou
both model 2 and model 3 fit the data well. There is no substantial difference in model fit using LOO between model 2 and 3. If we are to select the best model, we will proceed with model 3.
Generalized linear models
We can specify the type of model by two features - Family: type of outcome (binomial, normal, Poisson, gamma) - Link: relationship between mean parameter and predictor (identity, logic, log)
computing posterior predicted probability of event (in this case the sensitivity of the CLIA test)
#posterior draws of log-odds estimate - the intercept;posterior <-posterior_samples(fit1)#posterior predicted probability of event;prob.pooled <-expit(posterior$b_Intercept)mean(prob.pooled)
[1] 0.7914033
quantile(prob.pooled, c(0.025, 0.975))
2.5% 97.5%
0.7711078 0.8119101
comparing prior and posterior distribution visually
data.frame(Probability =c(inv_logit_scaled(posterior[,"b_Intercept"]),inv_logit_scaled(posterior[,"prior_Intercept"])),Type=rep(c("Posterior","Prior"),each=nrow(posterior))) %>%ggplot(aes(x=Probability,fill=Type))+geom_density(size=1,alpha=0.5)+labs(title="Prior: alpha ~ N(0,5); probability = exp(alpha)/(1+exp(alpha))")+theme_bw()+scale_fill_manual(values=c("goldenrod","steelblue"))
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
2. Logistic regression with predictors
Adding one predictor, age, for each study in modelling CLIA sensitivity
computing posterior predicted probability of event (in this case the sensitivity of the CLIA test)
fit4 <-readRDS("data/chp5_binary4")#posterior draws of log-odds estimate - the intercept;posterior <-posterior_samples(fit4)# beta posterior summaryexp(mean(posterior$b_age))
[1] 1.070023
exp(quantile(posterior$b_age, c(0.025, 0.975)))
2.5% 97.5%
1.057597 1.083063
#posterior predicted probability testing positive for each study;pp<-posterior_predict(fit4)mean.pp<-colMeans(pp)/dat$nlci.pp<-apply(pp, 2, function(x) quantile(x, 0.025))/dat$nuci.pp<-apply(pp, 2, function(x) quantile(x, 0.975))/dat$npp.table <-data.frame(Study = dat$Study, r = dat$r, n = dat$n,Prob_obs =round(dat$r/dat$n,2),Prob_post =round(mean.pp,2),CI_post =paste0("(",round(lci.pp,2),",",round(uci.pp,2),")"))pp.table %>%datatable(rownames = F,class ="compact",options =list(dom ='t',ordering =FALSE,paging =FALSE,searching =FALSE,columnDefs =list(list(className ='dt-center', targets =0:5))))
2.1 Centring continuous variable
our model says that sensitivity is different across studies according to the average age
Using a centred predictor also reduces correlation between intercept and slope. This can speed up MCMC convergence in some models (computationally more efficient).
comparing posterior distribution between centred and uncentred models
fit4c <-readRDS("data/chp5_binary4c")#posterior draws of log-odds estimate - the intercept;posteriorc <-posterior_samples(fit4c)# beta posterior summaryexp(mean(posteriorc$b_age))
[1] 1.069927
exp(quantile(posteriorc$b_age, c(0.025, 0.975)))
2.5% 97.5%
1.057407 1.082997
posterior predicted probability testing positive for each study comparing between the centred and uncentred models
we can see the results are almost identical and we achieved this with only 5000 iterations for the centred model.
We can investigate the correlation between two posterior regression parameters, \(\alpha\) (intercept) and \(\beta\) (log-OR for age).
The two parameters should be uncorrelated by model assumption! If you see visible correlation, you can consider thinning your MCMC or run more iterations.
p1<-pp_check(fit4c, ndraws =50)p2<-pp_check(fit4c, type ="stat_2d", stat =c("max", "min"))ggarrange(p1,p2, nrow =1)
3 Additional topics on Bayesian logistic regression models
In this example, we will conduct an analysis using a Bayesian logistic regression model.
Data come from a cross-sectional study of 1,225 smokers over the age of 40.
Each participant was assessed for chronic obstructive pulmonary disease (COPD), and characteristics of the type of cigarette they most frequently smoke were recorded.
The objective of the study was to identify associations between COPD diagnosis and cigarette characteristics
We will use the following variables from this data set:
TYPE: Type of cigarette, 1 = Menthol, 0 = Regular
NIC: Nicotine content, in mg
TAR: Tar content, in mg
LEN: Length of cigarette, in mm
FLTR: 1 = Filter, 0 = No Vlter
copd: 1 = COPD diagnosis, 0 = no COPD diagnosis
We fit a simple logistic regression, predicting the risk of Copd by Type of cigarette, Nicotine content, and Filter status.
We specify uninformative prior on log.odds, \(N(0,10)\)
m1_bern <-brm(copd ~ TYPE + NIC + FLTR, data = copd, family =bernoulli(link ="logit"), prior =c(prior(normal(0,10), class ="b"), prior(normal(0,10), class ="Intercept")),seed =123, silent =2, refresh =0)saveRDS(m1_bern, file="data/chp6_logistic")
Family: bernoulli
Links: mu = logit
Formula: copd ~ TYPE + NIC + FLTR
Data: copd (Number of observations: 1225)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -2.94 0.64 -4.20 -1.71 1.00 3108 2483
TYPE 0.25 0.21 -0.19 0.66 1.00 3480 2501
NIC 1.30 0.35 0.61 1.99 1.00 3306 2723
FLTR -0.60 0.44 -1.42 0.29 1.00 3344 2878
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
# Directly from the posterior of exp(beta):draws_beta_TYPE <-as.matrix(m1_bern, pars ="b_TYPE")exp_beta_TYPE <-exp(draws_beta_TYPE)quantile(exp_beta_TYPE, probs =c(.025, 0.5, .975))
2.5% 50% 97.5%
0.8286483 1.2832394 1.9293951
plot(m1_bern)
Posterior Distribution + Trace plot combining all chains
Good mixing from trace plot
Rhats’ are all less then 1.01, convergence!
Other diagnostic plots below
mcmc_dens_overlay(m1_bern, pars =c("b_Intercept", "b_TYPE", "b_NIC", "b_FLTR"))
Posterior density plot by chain
mcmc_acf_bar(m1_bern, pars =c("b_Intercept", "b_TYPE", "b_NIC", "b_FLTR"))
Warning: The `facets` argument of `facet_grid()` is deprecated as of ggplot2 2.2.0.
ℹ Please use the `rows` argument instead.
ℹ The deprecated feature was likely used in the bayesplot package.
Please report the issue at <https://github.com/stan-dev/bayesplot/issues/>.
Posterior density plot by chain
3.1 Regression regularization and shrinkage priors
When the number of parameters to be estimated is large relative to sample size, estimation using non-informative or weakly informative priors tend to overfit.
One way to avoid overfitting is to perform regularization, that is, to shrink some of the parameters to closer to zero. This makes the model fit less well to the existing data, but will be much more generalizable to an independent data set.
Sparsity-Inducing Priors
The horseshoe prior is a type of hierarchical prior for regression models by introducing a global scale and local scale parameters on the priors for the regression coefficients.
Family: bernoulli
Links: mu = logit
Formula: copd ~ (NIC + TYPE + FLTR)^2
Data: copd2 (Number of observations: 1225)
Draws: 4 chains, each with iter = 7500; warmup = 5000; thin = 1;
total post-warmup draws = 10000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -4.14 1.64 -7.70 -1.24 1.00 4232 4591
NIC 2.10 1.06 0.24 4.40 1.00 4173 4468
TYPE 0.36 7.00 -13.20 14.35 1.00 5180 4838
FLTR 0.62 1.71 -2.44 4.27 1.00 4082 4299
NIC:TYPE -0.18 0.78 -1.72 1.39 1.00 6997 6004
NIC:FLTR -0.82 1.17 -3.30 1.32 1.00 4112 4156
TYPE:FLTR 0.04 6.99 -13.85 13.62 1.00 5163 4770
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
summary(fit4.1)
Family: bernoulli
Links: mu = logit
Formula: copd ~ (NIC + TYPE + FLTR)^2
Data: copd2 (Number of observations: 1225)
Draws: 4 chains, each with iter = 7500; warmup = 5000; thin = 1;
total post-warmup draws = 10000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -3.35 0.48 -4.13 -2.19 1.00 3095 4336
NIC 1.35 0.37 0.58 2.02 1.00 3972 4408
TYPE -0.02 0.39 -1.34 0.46 1.03 174 28
FLTR -0.10 0.29 -0.97 0.18 1.00 2445 1961
NIC:TYPE 0.01 0.15 -0.30 0.35 1.01 988 399
NIC:FLTR -0.04 0.17 -0.50 0.18 1.01 711 5295
TYPE:FLTR 0.09 0.43 -0.22 1.55 1.03 175 29
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Using the informative data-driven (variable selection) shrinkage priors, we can see that TYPE and NIC:TYPE as minimal effect on the risk of COPD. Therefore, we can reduce our model by removing these two fixed effects.
fit4.3<-brm(copd ~ NIC + FLTR + NIC:FLTR + TYPE:FLTR, data = copd2, family =bernoulli(link ="logit"), prior =c(prior(normal(0,10), class ="Intercept"),prior(normal(0,10), class ="b")),iter =7500,warmup =5000,chains =4,cores =getOption("mc.cores", 5), seed =123, silent =2, refresh =0)saveRDS(fit4.3, file="data/chap7_h_example4.3")
Bastos, Mayara Lisboa, Gamuchirai Tavaziva, Syed Kunal Abidi, Jonathon R Campbell, Louis-Patrick Haraoui, James C Johnston, Zhiyi Lan, et al. 2020. “Diagnostic Accuracy of Serological Tests for Covid-19: Systematic Review and Meta-Analysis.”Bmj 370.