A quick LCA example in R using the poLCA package (from the package developer)

Author

Kuan Liu

2000 National Election Studies survey

  • Survey data from the 2000 American National Election Study.

  • Two sets of six questions with four responses each, asking respondents’ opinions of how well various traits (moral, caring, knowledgeable, good leader, dishonest, intelligent) describe presidential candidates Al Gore and George W. Bush.

  • The responses are (1) Extremely well; (2) Quite well; (3) Not too well; (4) Not well at all. Many respondents have varying numbers of missing values on these variables.

  • The data set also includes potential covariates:

    • VOTE3, the respondent’s 2000 vote choice (when asked); (1) Gore; (2) Bush; (3) Other.
    • AGE, the respondent’s age;
    • EDUC, the respondent’s level of education; (1) 8 grades or less; (2) 9-11 grades, no further schooling; (3) High school diploma or equivalency; (4) More than 12 years of schooling, no higher degree; (5) Junior or community college level degree; (6) BA level degrees, no advanced degree; (7) Advanced degree.
    • GENDER, the respondent’s gender; (1) Male; (2) Female.
    • and PARTY, the respondent’s Democratic-Republican partisan identification; (1) Strong Democrat; (2) Weak Democrat; (3) Independent-Democrat; (4) Independent-Independent; (5) Independent-Republican; (6) Weak Republican; (7) Strong Republican.
  • The data frame has 1,785 observations on 17 survey variables. Of these, 1,311 individuals provided responses on all twelve candidate evaluations.

  • Source: The National Election Studies (https://electionstudies.org/). THE 2000 NATIONAL ELECTION STUDY [dataset]. Ann Arbor, MI: University of Michigan, Center for Political Studies.

library(poLCA)
library(DT)
library(tidyverse)
library(gtsummary)
library(cardx)
data(election)
datatable(election)
election %>%
  tbl_summary(missing_text = "(Missing)")
Characteristic N = 1,785¹
MORALG
    1 Extremely well 423 (25%)
    2 Quite well 820 (49%)
    3 Not too well 287 (17%)
    4 Not well at all 133 (8.0%)
    (Missing) 122
CARESG
    1 Extremely well 277 (16%)
    2 Quite well 713 (42%)
    3 Not too well 464 (28%)
    4 Not well at all 232 (14%)
    (Missing) 99
KNOWG
    1 Extremely well 461 (27%)
    2 Quite well 997 (58%)
    3 Not too well 212 (12%)
    4 Not well at all 59 (3.4%)
    (Missing) 56
LEADG
    1 Extremely well 258 (15%)
    2 Quite well 728 (43%)
    3 Not too well 522 (31%)
    4 Not well at all 185 (11%)
    (Missing) 92
DISHONG
    1 Extremely well 133 (8.2%)
    2 Quite well 312 (19%)
    3 Not too well 629 (39%)
    4 Not well at all 557 (34%)
    (Missing) 154
INTELG
    1 Extremely well 494 (28%)
    2 Quite well 995 (57%)
    3 Not too well 182 (10%)
    4 Not well at all 65 (3.7%)
    (Missing) 49
MORALB
    1 Extremely well 340 (21%)
    2 Quite well 841 (52%)
    3 Not too well 330 (21%)
    4 Not well at all 98 (6.1%)
    (Missing) 176
CARESB
    1 Extremely well 155 (9.2%)
    2 Quite well 625 (37%)
    3 Not too well 562 (33%)
    4 Not well at all 342 (20%)
    (Missing) 101
KNOWB
    1 Extremely well 274 (16%)
    2 Quite well 933 (54%)
    3 Not too well 379 (22%)
    4 Not well at all 133 (7.7%)
    (Missing) 66
LEADB
    1 Extremely well 266 (16%)
    2 Quite well 842 (50%)
    3 Not too well 407 (24%)
    4 Not well at all 166 (9.9%)
    (Missing) 104
DISHONB
    1 Extremely well 70 (4.4%)
    2 Quite well 288 (18%)
    3 Not too well 653 (41%)
    4 Not well at all 574 (36%)
    (Missing) 200
INTELB
    1 Extremely well 329 (19%)
    2 Quite well 967 (56%)
    3 Not too well 306 (18%)
    4 Not well at all 110 (6.4%)
    (Missing) 73
VOTE3
    1 586 (51%)
    2 529 (46%)
    3 45 (3.9%)
    (Missing) 625
AGE 45 (34, 58)
    (Missing) 9
EDUC
    1 61 (3.4%)
    2 111 (6.2%)
    3 512 (29%)
    4 373 (21%)
    5 167 (9.4%)
    6 372 (21%)
    7 183 (10%)
    (Missing) 6
GENDER
    1 786 (44%)
    2 999 (56%)
PARTY
    1 344 (20%)
    2 272 (15%)
    3 266 (15%)
    4 201 (11%)
    5 230 (13%)
    6 212 (12%)
    7 235 (13%)
    (Missing) 25
¹ n (%); Median (IQR)
election2 <- election[complete.cases(election),]
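Note that `complete.cases(election)` drops any respondent with a missing value in any of the 17 columns, covariates included. That is why the models below report 880 observations rather than the 1,311 respondents who answered all twelve items. A minimal sketch of the difference, using a small made-up data frame (`toy`, `item1`, `item2`, and `covar` are hypothetical names, not columns of `election`):

```r
# Hypothetical toy data: two survey items and one covariate
toy <- data.frame(
  item1 = c(1, 2, NA, 4, 1),
  item2 = c(2, 2, 3, NA, 1),
  covar = c(NA, 1, 2, 1, 1)
)

items <- c("item1", "item2")

# Rows complete on every column (what complete.cases(toy) keeps)
n_all <- sum(complete.cases(toy))

# Rows complete on the items only (the covariate may still be missing)
n_items <- sum(complete.cases(toy[, items]))

c(all_columns = n_all, items_only = n_items)
```

Whether to filter on all columns or only on the items depends on whether the covariates are needed later. Here they are used to describe the classes afterwards, so filtering on all 17 variables is a defensible choice.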
  • Run LCA with 2 latent classes
f.party <- cbind(MORALG,CARESG,KNOWG,LEADG,DISHONG,INTELG,
                 MORALB,CARESB,KNOWB,LEADB,DISHONB,INTELB)~1
nes.party2 <- poLCA(f.party,
                   election2,
                   nclass=2,
                   verbose = FALSE,
                   graphs = TRUE)

# maximum log-likelihood: -11352.91 (as reported in the fit summary)

nes.party2
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 
 
$MORALG
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.4334       0.5145         0.0446            0.0075
class 2:            0.1132       0.4533         0.2910            0.1425

$CARESG
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.3091       0.5685         0.1047            0.0177
class 2:            0.0280       0.3018         0.4323            0.2379

$KNOWG
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.4713       0.5095         0.0093            0.0098
class 2:            0.1397       0.5811         0.2260            0.0532

$LEADG
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.2547       0.6116         0.1199            0.0137
class 2:            0.0160       0.2513         0.5105            0.2222

$DISHONG
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.0204       0.0571         0.4274            0.4951
class 2:            0.1483       0.2992         0.3782            0.1743

$INTELG
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.4596       0.4941         0.0338            0.0125
class 2:            0.1371       0.6350         0.1728            0.0551

$MORALB
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.0788       0.4863         0.3397            0.0952
class 2:            0.3697       0.5439         0.0695            0.0169

$CARESB
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.0081       0.1238         0.4968            0.3712
class 2:            0.1930       0.5923         0.1898            0.0249

$KNOWB
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.0554       0.3843         0.3902            0.1701
class 2:            0.2369       0.6756         0.0849            0.0026

$LEADB
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.0199       0.3457         0.4628            0.1716
class 2:            0.3148       0.6177         0.0577            0.0098

$DISHONB
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.0471       0.2493         0.5086            0.1950
class 2:            0.0099       0.0846         0.3626            0.5429

$INTELB
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.0901       0.4196         0.3563             0.134
class 2:            0.2918       0.6639         0.0443             0.000

Estimated class population shares 
 0.4663 0.5337 
 
Predicted class memberships (by modal posterior prob.) 
 0.4705 0.5295 
 
========================================================= 
Fit for 2 latent classes: 
========================================================= 
number of observations: 880 
number of estimated parameters: 73 
residual degrees of freedom: 807 
maximum log-likelihood: -11352.91 
 
AIC(2): 22851.82
BIC(2): 23200.76
G^2(2): 11011.59 (Likelihood ratio/deviance statistic) 
X^2(2): 7146792398 (Chi-square goodness of fit) 
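The parameter count in the fit summary above can be checked by hand. Assuming the usual unrestricted latent class parameterization without covariates, a model with R classes, J items, and K response categories per item has (R - 1) free class shares plus R × J × (K - 1) free item-response probabilities. The helper `n_par` below is a hypothetical name introduced for illustration; it is not part of poLCA.

```r
# Free parameters of an unrestricted latent class model without covariates
# (n_par is an illustrative helper, not a poLCA function)
n_par <- function(n_class, n_items, n_cat) {
  (n_class - 1) + n_class * n_items * (n_cat - 1)
}

npar2 <- n_par(n_class = 2, n_items = 12, n_cat = 4)
npar2        # 73, matching "number of estimated parameters" above

# In this output the residual degrees of freedom equal
# the number of observations minus the parameter count
880 - npar2  # 807
```

The same formula gives 110 parameters for three classes, so each additional class costs 37 parameters with these twelve four-category items.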
 
  • Run LCA with 3 latent classes
f.party <- cbind(MORALG,CARESG,KNOWG,LEADG,DISHONG,INTELG,
                 MORALB,CARESB,KNOWB,LEADB,DISHONB,INTELB)~1
nes.party3 <- poLCA(f.party,
                   election2,
                   nclass=3,
                   verbose = FALSE,
                   graphs = TRUE)

nes.party3
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 
 
$MORALG
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.5468       0.4099         0.0321            0.0112
class 2:            0.1134       0.6108         0.2278            0.0479
class 3:            0.1685       0.3582         0.2623            0.2111

$CARESG
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.4308       0.4909         0.0637            0.0146
class 2:            0.0260       0.5077         0.3726            0.0937
class 3:            0.0542       0.2182         0.3834            0.3443

$KNOWG
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.6552       0.3344         0.0000            0.0104
class 2:            0.0577       0.7852         0.1482            0.0089
class 3:            0.2515       0.4154         0.2347            0.0983

$LEADG
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.3593       0.5486         0.0772            0.0150
class 2:            0.0188       0.4459         0.4534            0.0819
class 3:            0.0292       0.2241         0.4224            0.3242

$DISHONG
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.0238       0.0407         0.3620            0.5735
class 2:            0.0565       0.2020         0.5130            0.2285
class 3:            0.2169       0.3326         0.2666            0.1838

$INTELG
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.6352       0.3464         0.0000            0.0184
class 2:            0.0587       0.7987         0.1343            0.0084
class 3:            0.2475       0.4615         0.1926            0.0984

$MORALB
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.0945       0.3981         0.3729            0.1345
class 2:            0.0803       0.7354         0.1762            0.0080
class 3:            0.6467       0.3045         0.0175            0.0312

$CARESB
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.0050       0.0827         0.4375            0.4748
class 2:            0.0000       0.4975         0.4307            0.0718
class 3:            0.3992       0.5170         0.0520            0.0317

$KNOWB
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.0651       0.2799         0.4086            0.2464
class 2:            0.0149       0.7579         0.2203            0.0069
class 3:            0.4767       0.4935         0.0250            0.0048

$LEADB
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.0224       0.2591         0.4787            0.2398
class 2:            0.0721       0.6893         0.2197            0.0190
class 3:            0.5295       0.4436         0.0168            0.0102

$DISHONB
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.0662       0.2931         0.4430            0.1978
class 2:            0.0051       0.1219         0.5795            0.2935
class 3:            0.0173       0.0699         0.1759            0.7369

$INTELB
          1 Extremely well 2 Quite well 3 Not too well 4 Not well at all
class 1:            0.1189       0.3026         0.3772            0.2013
class 2:            0.0445       0.7848         0.1706            0.0000
class 3:            0.5379       0.4621         0.0000            0.0000

Estimated class population shares 
 0.3105 0.4258 0.2637 
 
Predicted class memberships (by modal posterior prob.) 
 0.308 0.4284 0.2636 
 
========================================================= 
Fit for 3 latent classes: 
========================================================= 
number of observations: 880 
number of estimated parameters: 110 
residual degrees of freedom: 770 
maximum log-likelihood: -10915.77 
 
AIC(3): 22051.54
BIC(3): 22577.33
G^2(3): 10137.3 (Likelihood ratio/deviance statistic) 
X^2(3): 3084652868 (Chi-square goodness of fit) 
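Both information criteria follow from the maximum log-likelihood and the parameter count through the standard definitions AIC = -2·logLik + 2·npar and BIC = -2·logLik + npar·log(N). A short check against the values printed in the two fit summaries (the helper names `aic` and `bic` and the data frame `fits` are illustrative):

```r
aic <- function(loglik, npar) -2 * loglik + 2 * npar
bic <- function(loglik, npar, n) -2 * loglik + npar * log(n)

# Values copied from the two fit summaries above
fits <- data.frame(
  nclass = c(2, 3),
  loglik = c(-11352.91, -10915.77),
  npar   = c(73, 110)
)

fits$AIC <- aic(fits$loglik, fits$npar)
fits$BIC <- bic(fits$loglik, fits$npar, n = 880)

# AIC and BIC agree with the printed values up to rounding of logLik
fits
```

Because BIC multiplies the parameter count by log(880), roughly 6.78, it penalizes the 37 extra parameters of the three-class model more heavily than AIC does.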
 
  • Compare the two LCA models
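entropy.R2 <- function(fit) {
  # Shannon entropy of a probability vector
  entropy <- function(p) sum(-p * log(p))
  error_prior <- entropy(fit$P) # entropy of the estimated class proportions
  # mean entropy of each respondent's posterior class probabilities
  error_post <- mean(apply(fit$posterior, 1, entropy), na.rm = TRUE)
  # proportional reduction in classification uncertainty
  R2_entropy <- (error_prior - error_post) / error_prior
  R2_entropy
}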
nes.party2$bic
[1] 23200.76
entropy.R2(nes.party2)
[1] 0.8585686
nes.party3$bic
[1] 22577.33
entropy.R2(nes.party3)
[1] 0.8535891
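To see what `entropy.R2` measures, it helps to apply the same formula to a tiny hand-made example. The sketch below uses two hypothetical lists, `fuzzy` and `sharp`, that mimic only the two components the function reads (`P` and `posterior`). It also uses a variant of the entropy helper that treats 0 · log(0) as 0, an assumption added here so that perfectly certain assignments do not produce `NaN`.

```r
# Entropy with the convention 0 * log(0) = 0 (assumption for this sketch)
entropy0 <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Same formula as entropy.R2, written for a list with P and posterior
entropy_r2_sketch <- function(fit) {
  prior <- entropy0(fit$P)
  post  <- mean(apply(fit$posterior, 1, entropy0))
  (prior - post) / prior
}

# Hypothetical two-class results for four respondents
fuzzy <- list(P = c(0.5, 0.5),
              posterior = rbind(c(0.9, 0.1), c(0.1, 0.9),
                                c(0.9, 0.1), c(0.1, 0.9)))
sharp <- list(P = c(0.5, 0.5),
              posterior = rbind(c(1, 0), c(0, 1), c(1, 0), c(0, 1)))

r2_fuzzy <- entropy_r2_sketch(fuzzy)  # strictly between 0 and 1
r2_sharp <- entropy_r2_sketch(sharp)  # 1: no classification uncertainty left
```

Values near 1 indicate well-separated classes. Both models above score about 0.85, so they are similar on this criterion.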
  • What do you conclude?

  • Interpret LCA results
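The next step labels each respondent with a single class. As the phrase "by modal posterior prob." in the printed output suggests, this amounts to picking the class with the highest posterior probability in each row. A minimal base-R sketch with a made-up posterior matrix (`post` is hypothetical and stands in for `nes.party3$posterior`):

```r
# Hypothetical posterior probabilities: 4 respondents x 3 classes
post <- rbind(
  c(0.80, 0.15, 0.05),
  c(0.10, 0.70, 0.20),
  c(0.05, 0.25, 0.70),
  c(0.45, 0.50, 0.05)
)

# Modal assignment: index of the largest probability in each row
modal_class <- apply(post, 1, which.max)
modal_class  # 1 2 3 2

# Share of respondents assigned to each class
class_share <- prop.table(table(factor(modal_class, levels = 1:3)))
class_share
```

The last respondent is assigned to class 2 even though class 1 is almost as likely. Modal assignment discards that uncertainty, which is worth keeping in mind when reading the descriptive table that follows.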

election2$LCA <- nes.party3$predclass
election2 %>% 
  tbl_summary(
    by = LCA) %>%
  add_overall() %>%
  add_p()
Characteristic | Overall, N = 880¹ | Class 1, N = 271¹ | Class 2, N = 377¹ | Class 3, N = 232¹ | p-value²
MORALG <0.001
    1 Extremely well 231 (26%) 147 (54%) 43 (11%) 41 (18%)
    2 Quite well 424 (48%) 112 (41%) 230 (61%) 82 (35%)
    3 Not too well 155 (18%) 9 (3.3%) 86 (23%) 60 (26%)
    4 Not well at all 70 (8.0%) 3 (1.1%) 18 (4.8%) 49 (21%)
CARESG <0.001
    1 Extremely well 140 (16%) 117 (43%) 10 (2.7%) 13 (5.6%)
    2 Quite well 375 (43%) 133 (49%) 192 (51%) 50 (22%)
    3 Not too well 246 (28%) 17 (6.3%) 139 (37%) 90 (39%)
    4 Not well at all 119 (14%) 4 (1.5%) 36 (9.5%) 79 (34%)
KNOWG <0.001
    1 Extremely well 259 (29%) 178 (66%) 21 (5.6%) 60 (26%)
    2 Quite well 482 (55%) 90 (33%) 298 (79%) 94 (41%)
    3 Not too well 110 (13%) 0 (0%) 56 (15%) 54 (23%)
    4 Not well at all 29 (3.3%) 3 (1.1%) 2 (0.5%) 24 (10%)
LEADG <0.001
    1 Extremely well 112 (13%) 98 (36%) 7 (1.9%) 7 (3.0%)
    2 Quite well 369 (42%) 148 (55%) 169 (45%) 52 (22%)
    3 Not too well 289 (33%) 21 (7.7%) 171 (45%) 97 (42%)
    4 Not well at all 110 (13%) 4 (1.5%) 30 (8.0%) 76 (33%)
DISHONG <0.001
    1 Extremely well 78 (8.9%) 6 (2.2%) 21 (5.6%) 51 (22%)
    2 Quite well 164 (19%) 11 (4.1%) 76 (20%) 77 (33%)
    3 Not too well 353 (40%) 100 (37%) 193 (51%) 60 (26%)
    4 Not well at all 285 (32%) 154 (57%) 87 (23%) 44 (19%)
INTELG <0.001
    1 Extremely well 253 (29%) 172 (63%) 22 (5.8%) 59 (25%)
    2 Quite well 501 (57%) 94 (35%) 302 (80%) 105 (45%)
    3 Not too well 95 (11%) 0 (0%) 50 (13%) 45 (19%)
    4 Not well at all 31 (3.5%) 5 (1.8%) 3 (0.8%) 23 (9.9%)
MORALB <0.001
    1 Extremely well 206 (23%) 24 (8.9%) 29 (7.7%) 153 (66%)
    2 Quite well 455 (52%) 105 (39%) 282 (75%) 68 (29%)
    3 Not too well 172 (20%) 104 (38%) 64 (17%) 4 (1.7%)
    4 Not well at all 47 (5.3%) 38 (14%) 2 (0.5%) 7 (3.0%)
CARESB <0.001
    1 Extremely well 94 (11%) 1 (0.4%) 0 (0%) 93 (40%)
    2 Quite well 329 (37%) 19 (7.0%) 189 (50%) 121 (52%)
    3 Not too well 293 (33%) 118 (44%) 164 (44%) 11 (4.7%)
    4 Not well at all 164 (19%) 133 (49%) 24 (6.4%) 7 (3.0%)
KNOWB <0.001
    1 Extremely well 134 (15%) 16 (5.9%) 5 (1.3%) 113 (49%)
    2 Quite well 475 (54%) 75 (28%) 287 (76%) 113 (49%)
    3 Not too well 200 (23%) 112 (41%) 83 (22%) 5 (2.2%)
    4 Not well at all 71 (8.1%) 68 (25%) 2 (0.5%) 1 (0.4%)
LEADB <0.001
    1 Extremely well 156 (18%) 6 (2.2%) 28 (7.4%) 122 (53%)
    2 Quite well 432 (49%) 67 (25%) 261 (69%) 104 (45%)
    3 Not too well 217 (25%) 132 (49%) 81 (21%) 4 (1.7%)
    4 Not well at all 75 (8.5%) 66 (24%) 7 (1.9%) 2 (0.9%)
DISHONB <0.001
    1 Extremely well 24 (2.7%) 18 (6.6%) 2 (0.5%) 4 (1.7%)
    2 Quite well 142 (16%) 79 (29%) 48 (13%) 15 (6.5%)
    3 Not too well 379 (43%) 122 (45%) 218 (58%) 39 (17%)
    4 Not well at all 335 (38%) 52 (19%) 109 (29%) 174 (75%)
INTELB <0.001
    1 Extremely well 174 (20%) 32 (12%) 15 (4.0%) 127 (55%)
    2 Quite well 484 (55%) 78 (29%) 301 (80%) 105 (45%)
    3 Not too well 167 (19%) 106 (39%) 61 (16%) 0 (0%)
    4 Not well at all 55 (6.3%) 55 (20%) 0 (0%) 0 (0%)
VOTE3 <0.001
    1 430 (49%) 254 (94%) 146 (39%) 30 (13%)
    2 423 (48%) 7 (2.6%) 217 (58%) 199 (86%)
    3 27 (3.1%) 10 (3.7%) 14 (3.7%) 3 (1.3%)
AGE 47 (36, 58) 49 (37, 61) 43 (33, 57) 49 (38, 60) <0.001
EDUC
    1 15 (1.7%) 7 (2.6%) 4 (1.1%) 4 (1.7%)
    2 33 (3.8%) 14 (5.2%) 9 (2.4%) 10 (4.3%)
    3 201 (23%) 55 (20%) 85 (23%) 61 (26%)
    4 187 (21%) 46 (17%) 91 (24%) 50 (22%)
    5 82 (9.3%) 31 (11%) 34 (9.0%) 17 (7.3%)
    6 246 (28%) 74 (27%) 110 (29%) 62 (27%)
    7 116 (13%) 44 (16%) 44 (12%) 28 (12%)
GENDER 0.061
    1 403 (46%) 109 (40%) 187 (50%) 107 (46%)
    2 477 (54%) 162 (60%) 190 (50%) 125 (54%)
PARTY <0.001
    1 181 (21%) 127 (47%) 41 (11%) 13 (5.6%)
    2 122 (14%) 60 (22%) 50 (13%) 12 (5.2%)
    3 127 (14%) 55 (20%) 59 (16%) 13 (5.6%)
    4 49 (5.6%) 9 (3.3%) 28 (7.4%) 12 (5.2%)
    5 126 (14%) 11 (4.1%) 71 (19%) 44 (19%)
    6 113 (13%) 5 (1.8%) 65 (17%) 43 (19%)
    7 162 (18%) 4 (1.5%) 63 (17%) 95 (41%)
¹ n (%); Median (IQR)
² Pearson’s Chi-squared test; Kruskal-Wallis rank sum test
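One way to read the table is to look at how vote choice splits within each class. The sketch below re-enters the VOTE3 counts from the table above by hand and converts them to within-class percentages; the object names `vote_by_class` and `vote_pct` are introduced here for illustration.

```r
# VOTE3 counts by assigned class, copied from the table above
vote_by_class <- matrix(
  c(254,   7, 10,   # class 1: Gore, Bush, Other
    146, 217, 14,   # class 2
     30, 199,  3),  # class 3
  nrow = 3, byrow = TRUE,
  dimnames = list(class = c("1", "2", "3"),
                  vote  = c("Gore", "Bush", "Other"))
)

# Row percentages: distribution of vote choice within each class
vote_pct <- round(100 * prop.table(vote_by_class, margin = 1), 1)
vote_pct
```

On these numbers, class 1 consists overwhelmingly of Gore voters, class 3 overwhelmingly of Bush voters, and class 2 is mixed with a lean toward Bush.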