Bayesian linear regression

Learning Objectives
  • Learn how to specify a Bayesian linear regression model for a continuous outcome
  • Learn how to fit Bayesian linear regression models in brms, including prior specification
  • Learn how to interpret posterior regression coefficients, fitted values, and predictions
  • Learn how to conduct model diagnostics and model comparison
  • Learn the basic ideas behind spline regression and shrinkage priors in the Bayesian framework

1. Linear regression

What is regression? (McElreath 2018)

Regression uses one or more predictor variables to model the distribution of an outcome variable.

In this lecture, we focus only on continuous outcomes, so we will use Bayesian linear regression.

The main ideas are the same as in ordinary linear regression:

  • we model the conditional mean of the outcome using predictors
  • we quantify residual variability around that mean
  • we interpret coefficients as conditional associations, given the other predictors in the model

What changes in the Bayesian framework is that:

  • we place prior distributions on the unknown parameters
  • inference is based on the posterior distribution
  • we can make direct probability statements about parameters and predictions
Bayesian linear regression model

For subject \(i\),

\[ Y_i \mid \mu_i, \sigma \sim N(\mu_i, \sigma) \]

with linear predictor

\[ \mu_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}. \]

Equivalently,

\[ Y_i = \mu_i + e_i, \qquad e_i \sim N(0, \sigma). \]

Interpretation

  • \(\beta_0\) is the intercept
  • \(\beta_j\) is the conditional change in the mean outcome associated with a one-unit increase in predictor \(x_j\), holding the other predictors fixed
  • \(\sigma\) is the residual standard deviation

1.2 Main assumptions of linear regression

  1. Independent observations
  2. Linear or adequately modeled mean relationship between the outcome and predictors
  3. Residuals are approximately normal
  4. Homoscedasticity: residual variation is roughly constant across fitted values
  5. Multicollinearity is not too severe
  6. Model is adequately specified for the main patterns in the data

In practice, Bayesian workflow places strong emphasis on model checking, not only on writing down assumptions.

2. Model specification

2.1 Likelihood and linear predictor

For a continuous outcome, a standard Bayesian linear regression model is

\[ Y_i \mid \mu_i, \sigma \sim N(\mu_i, \sigma) \]

\[ \mu_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}. \]

This model has two main components:

  • the likelihood, which describes the distribution of the outcome given the mean and residual standard deviation
  • the linear predictor, which describes how the mean depends on the predictors

2.2 A group comparison is also a regression model

A two-group comparison is just a special case of linear regression.

If \(x_i = 0\) for group 1 and \(x_i = 1\) for group 2, then

\[ \mu_i = \beta_0 + \beta_1 x_i. \]

  • for group 1, the mean is \(\beta_0\)
  • for group 2, the mean is \(\beta_0 + \beta_1\)
  • so \(\beta_1\) is the mean difference between groups

This is one reason why regression provides a unified framework for many familiar analyses.

3. Running example: Howell data

We will use the Howell data on anthropometric measurements.

Code
dat <- read.table("data/Howell.txt", header = TRUE, sep = ";") %>%
  mutate(male = factor(male, labels = c("female", "male")))

3.1 Descriptive summary

Code
dat %>% 
  select(height, weight, age, male) %>% 
  tbl_summary(
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} / {N} ({p}%)"
    ),
    digits = all_continuous() ~ 2
  )
Characteristic N = 5441
height 138.26 (27.60)
weight 35.61 (14.72)
age 29.34 (20.75)
male
    female 287 / 544 (53%)
    male 257 / 544 (47%)
1 Mean (SD); n / N (%)

3.2 Visual overview

Code
p1 <- ggplot(aes(x = weight, y = height), data = dat) + 
  geom_point(size = 0.7, alpha = 0.7) +
  geom_smooth(se = FALSE)

p2 <- ggplot(aes(x = age, y = height), data = dat) + 
  geom_point(size = 0.7, alpha = 0.7) +
  geom_smooth(se = FALSE)

p3 <- ggplot(aes(x = male, y = height), data = dat) + 
  geom_boxplot()

ggarrange(p1, p2, p3, nrow = 1)

From these plots:

  • height increases with weight
  • the relationship between age and height is not linear
  • there is some difference between males and females

These plots suggest a model such as

\[ height_i \sim N(\mu_i, \sigma) \]

\[ \mu_i = \beta_0 + \beta_1\,weight_i + \beta_2\,age_i + \beta_3\,age_i^2 + \beta_4\,I(male_i = \text{male}). \]

4. Prior specification

For the Howell example, we will use the following weakly informative priors:

\[ \beta_0 \sim N(0, 100) \]

\[ \beta_j \sim N(0, 10), \qquad j=1,2,3,4 \]

\[ \sigma \sim \text{Student-}t(3, 0, 20), \qquad \sigma > 0 \]

These priors are broad enough to let the data dominate, but still give some regularization.

  • the intercept prior is diffuse
  • the slope priors allow both positive and negative associations
  • the prior on \(\sigma\) ensures positivity and allows a wide range of residual standard deviations

4.1 Visualizing the priors

Code
p1 <- tibble(x = seq(from = -350, to = 350, by = 0.2)) %>% 
  ggplot(aes(x = x, y = dnorm(x, mean = 0, sd = 100))) +
  geom_line() +
  scale_x_continuous(breaks = seq(from = -300, to = 300, by = 100)) +
  labs(title = "Prior: N(0, 100)",
       y = "Density",
       x = "Intercept")

p2 <- tibble(x = seq(from = -35, to = 35, by = 0.1)) %>% 
  ggplot(aes(x = x, y = dnorm(x, mean = 0, sd = 10))) +
  geom_line() +
  scale_x_continuous(breaks = seq(from = -35, to = 35, by = 5)) +
  labs(title = "Prior: N(0, 10)",
       y = "Density",
       x = expression(beta))

p3 <- tibble(x = seq(from = 0, to = 100, by = 0.1)) %>% 
  ggplot(aes(x = x, y = 2 * dt(x / 20, df = 3) / 20)) +
  geom_line() +
  labs(title = "Prior: Student-t(3, 0, 20)",
       y = "Density",
       x = expression(sigma))

ggarrange(p1, p2, p3, nrow = 1)

4.2 Checking parameter classes in brms

Code
get_prior(
  height ~ weight + age + I(age^2) + male,
  data = dat,
  family = gaussian()
)
                   prior     class     coef group resp dpar nlpar lb ub tag
                  (flat)         b                                         
                  (flat)         b      age                                
                  (flat)         b   IageE2                                
                  (flat)         b malemale                                
                  (flat)         b   weight                                
 student_t(3, 148.6, 16) Intercept                                         
     student_t(3, 0, 16)     sigma                                 0       
       source
      default
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
      default
      default

5. Model implementation in brms

We now fit the Bayesian linear regression model in brms.

Code
fit1 <- brm(
  data = dat,
  family = gaussian(),
  height ~ weight + age + I(age^2) + male,
  prior = c(
    prior(normal(0, 100), class = "Intercept"),
    prior(normal(0, 10), class = "b"),
    prior(student_t(3, 0, 20), class = "sigma")
  ),
  iter = 10000,
  warmup = 5000,
  chains = 4,
  cores = 4,
  seed = 123,
  refresh = 0,
  silent = 2
)

saveRDS(fit1, "data/chp4_reg_example")

5.1 What the brms formula means

The formula

height ~ weight + age + I(age^2) + male

means that:

  • height is the continuous outcome
  • the expected mean height depends on weight, age, age squared, and sex
  • I(age^2) adds a quadratic age term
  • male is treated as a factor, and brms creates the necessary indicator variable automatically

6. Posterior results and interpretation

6.1 Posterior summaries

Code
fit1 <- readRDS("data/chp4_reg_example")
summary(fit1)
 Family: gaussian 
  Links: mu = identity 
Formula: height ~ weight + age + I(age^2) + male 
   Data: dat (Number of observations: 544) 
  Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
         total post-warmup draws = 20000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    75.52      1.00    73.54    77.48 1.00    22865    17295
weight        1.26      0.06     1.15     1.37 1.00     8032    11579
age           1.06      0.11     0.84     1.29 1.00     7384    10434
IageE2       -0.01      0.00    -0.01    -0.01 1.00     7736    11039
malemale      1.86      0.79     0.29     3.39 1.00    12359    12021

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     8.70      0.27     8.19     9.24 1.00    14863    13046

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Using posterior means, the fitted regression equation can be written approximately as

\[ \widehat{height}_i = \beta_0 + \beta_1\,weight_i + \beta_2\,age_i + \beta_3\,age_i^2 + \beta_4\,I(male_i = \text{male}). \]

Each coefficient is interpreted conditionally on the other predictors.

For example:

  • the coefficient of weight describes the expected change in height for a one-unit increase in weight, holding age and sex fixed
  • the coefficient of male describes the expected height difference between males and females of the same age and weight
  • the age effect is nonlinear because both age and age^2 are included

6.2 Posterior probability statements

What is the posterior evidence that males are taller than females, adjusting for age and weight?

Code
hypothesis(fit1, "malemale > 0", alpha = 0.025)
Hypothesis Tests for class b:
      Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (malemale) > 0     1.86      0.79     0.29     3.39     104.82      0.99    *
---
'CI': 95%-CI for one-sided and 97.5%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 97.5%;
for two-sided hypotheses, the value tested against lies outside the 97.5%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.

What is the posterior evidence that males are taller than females by 2 cm, adjusting for age and weight?

Code
hypothesis(fit1, "malemale > 2", alpha = 0.025)
Hypothesis Tests for class b:
          Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
1 (malemale)-(2) > 0    -0.14      0.79    -1.71     1.39       0.74      0.43
  Star
1     
---
'CI': 95%-CI for one-sided and 97.5%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 97.5%;
for two-sided hypotheses, the value tested against lies outside the 97.5%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.

This is a nice example of a Bayesian probability statement that is more direct than a classical p-value.

6.3 Conditional effects plots

Code
plot(conditional_effects(fit1, "weight"), points = TRUE, point_args = list(size = 0.5))

Code
plot(conditional_effects(fit1, "age"), points = TRUE, point_args = list(size = 0.5))

Code
plot(conditional_effects(fit1, "male"))

These plots are often easier to interpret than the coefficient table alone.

  • they show the fitted mean relationship between each predictor and the outcome
  • they display posterior uncertainty using interval bands
  • they help check whether linear terms appear adequate

6.4 Bayesian \(R^2\)

Code
bayes_R2(fit1)
    Estimate   Est.Error      Q2.5     Q97.5
R2 0.9011375 0.002571426 0.8955066 0.9055716

Bayesian \(R^2\) summarizes how much of the variation in the outcome is explained by the model, while accounting for posterior uncertainty.

6.5 Fitted means vs predictions for a new individual

Suppose we want to predict height for a female who is 33 years old and weighs 60 kg.

Posterior for the expected mean height

Code
mean_pred <- posterior_epred(
  fit1,
  newdata = data.frame(weight = 50, age = 33, male = "female")
)

mean(mean_pred)
[1] 161.6473
Code
posterior_interval(mean_pred, prob = 0.95)
         2.5%    97.5%
[1,] 160.2126 163.0788

Posterior predictive distribution for a new observed height

Code
ind_predic <- posterior_predict(
  fit1,
  newdata = data.frame(weight = 50, age = 33, male = "female")
)

mean(ind_predic)
[1] 161.6597
Code
posterior_interval(ind_predic, prob = 0.95)
         2.5%    97.5%
[1,] 144.4122 178.8027
Code
mean(ind_predic > 160) #P(ind_predic > 160);
[1] 0.57235
Fitted value vs posterior prediction
  • posterior_epred() gives the posterior distribution of the mean outcome
  • posterior_predict() gives the posterior predictive distribution for a new observed outcome

The predictive interval is wider because it includes both parameter uncertainty and residual variation.

7. Model diagnostics

Bayesian workflow puts strong emphasis on checking both:

  • whether the MCMC algorithm converged properly
  • whether the model fits the data adequately

7.1 MCMC convergence

Code
plot(fit1)

Useful diagnostics include:

  • trace plots
  • posterior density by chain
  • \(\hat{R}\) values close to 1
  • adequate effective sample size (ESS)
    • ESS compares your autocorrelated MCMC draws to an equivalent number of independent posterior draws.
    • e.g., ???How many independent posterior samples would give me roughly the same Monte Carlo precision as these MCMC samples????
    • The Bulk_ESS and Tail_ESS values are all very large, far above common practical thresholds such as several hundred effective draws. This means that posterior intervals and tail summaries are estimated with little Monte Carlo error. Good Convergence!
Code
summary(fit1)$fixed[, c("Rhat", "Bulk_ESS", "Tail_ESS")]
              Rhat  Bulk_ESS Tail_ESS
Intercept 1.000293 22865.009 17294.68
weight    1.000921  8031.831 11579.04
age       1.001066  7384.440 10434.08
IageE2    1.000847  7735.917 11038.66
malemale  1.000326 12359.105 12020.54
Code
summary(fit1)$spec_pars[, c("Rhat", "Bulk_ESS", "Tail_ESS")]
          Rhat Bulk_ESS Tail_ESS
sigma 1.000058 14863.04 13046.32

7.2 Posterior predictive checks

A good Bayesian model should generate replicated data that resemble the observed data.

Code
pp_check(fit1, ndraws = 50)

Code
pp_check(fit1, type = "stat_2d", stat = c("max", "min"))

Questions to ask:

  • Does the model reproduce the overall shape of the data?
  • Does it reproduce the range of the data?
  • Are there obvious discrepancies between observed and replicated outcomes?

7.3 Residual plot

Code
res_df <- fit1$data %>% 
  mutate(
    predict_y = predict(fit1)[, "Estimate"], 
    std_resid = residuals(fit1, type = "pearson")[, "Estimate"]
  )

ggplot(res_df, aes(predict_y, std_resid)) + 
  labs(y = "Standardized residuals", x = "Fitted mean") +
  geom_point(size = 0.8) + 
  stat_smooth(se = FALSE) + 
  theme_bw()

This plot helps assess:

  • remaining nonlinearity
  • heteroscedasticity
  • gross lack of fit

7.4 Posterior dependence among coefficients

Code
pairs(
  fit1,
  off_diag_args = list(size = 0.5, alpha = 0.25)
)

If some coefficients are strongly correlated in the posterior, this may indicate:

  • multicollinearity
  • weak identification
  • the need for stronger priors
  • possible simplification of the model

8. Model comparison

We often compare multiple plausible regression models rather than assuming one model is automatically correct.

8.1 Example: add a quadratic term for weight

We compare two models:

  • Model 1: height depends on weight, age, age\(^2\), and sex
  • Model 2: same as Model 1, but also includes weight\(^2\)
Code
fit2 <- brm(
  data = dat,
  family = gaussian(),
  height ~ weight + I(weight^2) + age + I(age^2) + male,
  prior = c(
    prior(normal(0, 100), class = "Intercept"),
    prior(normal(0, 10), class = "b"),
    prior(student_t(3, 0, 20), class = "sigma")
  ),
  iter = 10000,
  warmup = 5000,
  chains = 4,
  cores = 4,
  seed = 123,
  refresh = 0,
  silent = 2
)
saveRDS(fit2, "data/chp4_height2")

8.2 WAIC and LOO

Two common Bayesian model comparison tools are:

  • WAIC: Watanabe-Akaike Information Criterion, a smaller value indicates a better-fitting model with higher predictive accuracy.
  • LOO: leave-one-out cross-validation

Both are based on predictive performance.

Code
fit2 <- readRDS("data/chp4_height2")

waic1 <- waic(fit1)
waic2 <- waic(fit2)
compare_ic(waic1, waic2)
               WAIC    SE
fit1        3902.84 35.22
fit2        3383.70 38.88
fit1 - fit2  519.14 40.30
Code
loo1 <- loo(fit1)
loo2 <- loo(fit2)
loo_compare(loo1, loo2)
     elpd_diff se_diff
fit2    0.0       0.0 
fit1 -259.6      20.1 
Interpreting LOO and ELPD
  • Larger ELPD means better estimated predictive performance
  • If the ELPD difference is very small, the models have very similar predictive performance
  • A more complex model should only be preferred if it gives a meaningful predictive improvement

Which model has better fit in this case?

8.3 Posterior predictive comparison

Code
pp_check(fit2, ndraws = 50)

Model comparison is not only about a numerical criterion. We should also ask whether the more flexible model improves fit in a meaningful way.

9. Extension 1: Bayesian splines

9.1 Why splines?

A quadratic term allows some curvature, but it imposes a specific shape.

A spline is more flexible. Instead of forcing a straight line or a quadratic curve, we allow the data to inform a smooth function.

In brms, this is specified using s(age).

9.2 Fit a spline model in brms

Code
fit_spline <- brm(
  data = dat,
  family = gaussian(),
  height ~ weight + s(age) + male,
  prior = c(
    prior(normal(0, 100), class = "Intercept"),
    prior(normal(0, 10), class = "b"),
    prior(student_t(3, 0, 20), class = "sigma")
  ),
  iter = 10000,
  warmup = 5000,
  chains = 4,
  cores = 4,
  seed = 123,
  refresh = 0,
  silent = 2
)
saveRDS(fit_spline, "data/chp4_heightsp")

9.3 Visualize the smooth age effect

Code
fit_spline <- readRDS("data/chp4_heightsp")
plot(conditional_effects(fit_spline, "age"), points = TRUE, point_args = list(size = 0.5))

9.4 Compare spline and polynomial models

Code
loo_spline <- loo(fit_spline)
loo_compare(loo2, loo_spline)
           elpd_diff se_diff
fit_spline   0.0       0.0  
fit2       -13.9      20.4  
Code
pp_check(fit_spline, ndraws = 50)

Key message about splines

Splines are useful when:

  • the mean relationship is clearly nonlinear
  • a straight line is too restrictive
  • a polynomial is too rigid or unstable

Spline effects are usually interpreted graphically, not coefficient by coefficient.

10. Extension 2: Bayesian lasso and shrinkage priors

10.1 Why shrinkage?

When the number of predictors is large relative to the sample size, weak, standard priors can lead to:

  • unstable coefficient estimates
  • overfitting
  • poor out-of-sample prediction

Shrinkage priors pull many coefficients toward zero unless the data strongly support a larger effect.

10.2 Bayesian lasso as an idea

The Bayesian lasso uses a Laplace prior on each regression coefficient:

\[ \beta_j \sim \text{Laplace}(0, \tau). \]

Compared with a normal prior, the Laplace prior places more mass near zero, so smaller coefficients are shrunk more strongly.

10.3 Visual intuition: normal vs Laplace prior

Practical note for brms

In current brms workflow, shrinkage is commonly implemented using priors such as the horseshoe prior.

So in this lecture:

  • we introduce Bayesian lasso as the main shrinkage idea
  • we implement shrinkage in brms using the horseshoe prior, which is convenient and effective in practice

10.4 Simulated example with a continuous outcome

To illustrate shrinkage, consider a linear regression with many candidate predictors, but only a few truly matter.

Code
set.seed(123)

n <- 150
p <- 10

X <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(X) <- paste0("x", 1:p)

sim_dat <- as_tibble(X) %>%
  mutate(y = 2 * x1 - 1.5 * x3 + 1.2 * x7 + rnorm(n, sd = 2))

10.5 Compare standard priors vs shrinkage priors

Code
fit_full <- brm(
  y ~ .,
  data = sim_dat,
  family = gaussian(),
  prior = c(
    prior(normal(0, 5), class = "Intercept"),
    prior(normal(0, 2), class = "b"),
    prior(student_t(3, 0, 5), class = "sigma")
  ),
  iter = 5000,
  warmup = 2500,
  chains = 4,
  cores = 4,
  seed = 123,
  refresh = 0
)

fit_hs <- brm(
  y ~ .,
  data = sim_dat,
  family = gaussian(),
  prior = c(
    prior(normal(0, 5), class = "Intercept"),
    prior(horseshoe(par_ratio = 3 / 10), class = "b"),
    prior(student_t(3, 0, 5), class = "sigma")
  ),
  iter = 5000,
  warmup = 2500,
  chains = 4,
  cores = 4,
  seed = 123,
  refresh = 0
)
saveRDS(fit_full, "data/chp4_vs_full")
saveRDS(fit_hs, "data/chp4_vs_hs")

10.6 Compare posterior coefficient summaries

Code
fit_full <- readRDS("data/chp4_vs_full")
fit_hs <- readRDS("data/chp4_vs_hs")

coef_full <- as.data.frame(fixef(fit_full)) %>%
  rownames_to_column("term") %>%
  filter(term != "Intercept") %>%
  transmute(term, estimate = Estimate, lower = Q2.5, upper = Q97.5, model = "regular prior")

coef_hs <- as.data.frame(fixef(fit_hs)) %>%
  rownames_to_column("term") %>%
  filter(term != "Intercept") %>%
  transmute(term, estimate = Estimate, lower = Q2.5, upper = Q97.5, model = "Horseshoe prior")

coef_plot <- bind_rows(coef_full, coef_hs) %>%
  mutate(term = fct_reorder(term, estimate))

ggplot(coef_plot, aes(x = estimate, y = term, xmin = lower, xmax = upper, colour = model)) +
  geom_point(position = position_dodge(width = 0.5)) +
  geom_errorbarh(position = position_dodge(width = 0.5), height = 0) +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "grey50") +
  labs(x = "Posterior estimate", y = NULL, colour = NULL)

10.7 Compare predictive performance

Code
loo_full <- loo(fit_full)
loo_hs   <- loo(fit_hs)
loo_compare(loo_full, loo_hs)
         elpd_diff se_diff
fit_hs    0.0       0.0   
fit_full -1.7       1.7   

The shrinkage model may improve out-of-sample prediction by shrinking noise coefficients more aggressively toward zero.

10.8 More on Bayesian variable selection

  • Projection predictive variable selection is probably the most principled approach if we want a sparse model for interpretability or prediction.
  • The projpred package implements this ??? it projects the full posterior onto submodels and finds the minimal set of variables that preserves predictive performance without refitting in the model. (Package website)[https://mc-stan.org/projpred/index.html] (Piironen et al. 2025)
Code
h <- c("x1 > 0","x2 > 0","x3 > 0","x4 > 0","x5 > 0","x6 > 0","x7 > 0","x8 > 0","x9 > 0","x10 > 0")
hypothesis(fit_hs, h)  # P(effect exceeds MCID)
Hypothesis Tests for class b:
   Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1    (x1) > 0     1.81      0.17     1.52     2.10        Inf      1.00    *
2    (x2) > 0    -0.09      0.14    -0.36     0.08       0.36      0.27     
3    (x3) > 0    -1.30      0.16    -1.58    -1.03       0.00      0.00     
4    (x4) > 0    -0.05      0.12    -0.27     0.12       0.55      0.35     
5    (x5) > 0    -0.02      0.10    -0.21     0.14       0.73      0.42     
6    (x6) > 0    -0.05      0.11    -0.27     0.10       0.49      0.33     
7    (x7) > 0     1.45      0.16     1.20     1.72        Inf      1.00    *
8    (x8) > 0     0.12      0.14    -0.05     0.40       4.29      0.81     
9    (x9) > 0    -0.15      0.15    -0.44     0.04       0.19      0.16     
10  (x10) > 0    -0.03      0.11    -0.23     0.14       0.72      0.42     
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
Code
library(projpred)

ncores = 10
refm_obj <- get_refmodel(fit_hs)

# via cross-validation;
cv_fits <- run_cvfun(refm_obj, k=4)

# For running projpred's CV in parallel (see cv_varsel()'s argument `parallel`):
# Note: Parallel processing is disabled during package building to avoid issues
use_parallel <- FALSE  # Set to TRUE for actual parallel processing
if (use_parallel) {
  doParallel::registerDoParallel(ncores)
}
# Final cv_varsel() run:
cvvs <- cv_varsel(
  refm_obj,
  cv_method = "kfold",
  cvfits = cv_fits,
  ### for the sake of speed (not recommended in general):
  method = "L1",
  nclusters_pred = 10,
  ###
  nterms_max = 8,
  parallel = use_parallel,
  ### In interactive use, we recommend not to deactivate the verbose mode:
  verbose = 0
  ### 
)
# Tear down the CV parallelization setup:
if (use_parallel) {
  doParallel::stopImplicitCluster()
  foreach::registerDoSEQ()
}

options(projpred.plot_vsel_show_cv_proportions = TRUE)
plot(cvvs, stats = "mlpd", deltas = TRUE)

Code
suggest_size(cvvs, stat = "mlpd") # suggested minimal variable set;
[1] 3
Code
ranking(cvvs) #ranking of the top predictors across validation set;
$fulldata
[1] "x1" "x3" "x7" "x9" "x8" "x2" "x6" "x4"

$foldwise
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "x1" "x3" "x7" "x9" "x8" "x2" "x4" "x6"
[2,] "x1" "x7" "x3" "x2" "x9" "x5" "x4" "x8"
[3,] "x1" "x7" "x3" "x4" "x8" "x2" "x9" "x5"
[4,] "x1" "x3" "x7" "x9" "x8" "x2" "x5" "x4"
[5,] "x1" "x7" "x3" "x6" "x9" "x8" "x2" "x4"

attr(,"class")
[1] "ranking"

References

McElreath, Richard. 2018. Statistical Rethinking: A Bayesian Course with Examples in r and Stan. Chapman; Hall/CRC.
Piironen, Juho, Markus Paasiniemi, Alejandro Catalina, Frank Weber, Osvaldo Martin, and Aki Vehtari. 2025. projpred: Projection Predictive Feature Selection.” https://mc-stan.org/projpred/.