7.1 Group Invariance (Quantitative)#
Measurement invariance across groups for quantitative item responses#
# General imports
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
pd.set_option('display.float_format', '{:.6f}'.format) # Set float format for pandas display
# Rpy2 imports
from rpy2 import robjects as ro
from rpy2.robjects import pandas2ri, numpy2ri
from rpy2.robjects.packages import importr
# Automatic conversion of arrays and dataframes
pandas2ri.activate()
numpy2ri.activate()
# Set random seed for reproducibility
ro.r('set.seed(123)')
# IPython extension for magic plotting
%load_ext rpy2.ipython
# R imports
importr('base')
importr('lavaan')
importr('semTools')
importr('MPsychoR')
importr('lordif')
importr('Hmisc')
The rpy2.ipython extension is already loaded. To reload it, use:
%reload_ext rpy2.ipython
rpy2.robjects.packages.Package as a <module 'Hmisc'>
Measurement invariance means that we measure the same psychological construct across groups and time (see lectures 8 & 9 for examples). In the following notebook we will look at how we can assess measurement invariance across…
groups for quantitative item responses
groups for qualitative item responses
time for quantitative responses
The four most important measurement invariance structures are the following:
Configural invariance: same factor structure but unrestricted loadings and intercepts across groups.
Metric invariance: the factor structure and the loadings are constrained to be equal across groups; the intercepts are free. (In the book this is referred to as weak invariance.)
Scalar invariance: the factor structure, the loadings and the indicator intercepts are restricted to be equal across groups. (In the book this is referred to as strong invariance.)
Strict invariance: the factor structure, the loadings, the indicator intercepts and the residual variances (and hence the reliabilities) are restricted to be equal across groups.
Note. In the book there is also a model in which, in addition to the indicator intercepts, the means of the latent variables are restricted to be equal as well; we will not cover this here.
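As a preview of the lavaan syntax used below, here is a minimal sketch (on lavaan's built-in HolzingerSwineford1939 data, not today's dataset) of how the four levels map onto the group.equal argument of lavaan::cfa():
# Sketch: the four invariance levels expressed via lavaan's group.equal argument
ro.r('data("HolzingerSwineford1939", package = "lavaan")')
ro.r("""
hs_model <- 'visual =~ x1 + x2 + x3'
fit_configural <- lavaan::cfa(hs_model, data = HolzingerSwineford1939, group = "school")
fit_metric     <- lavaan::cfa(hs_model, data = HolzingerSwineford1939, group = "school",
                              group.equal = c("loadings"))
fit_scalar     <- lavaan::cfa(hs_model, data = HolzingerSwineford1939, group = "school",
                              group.equal = c("loadings", "intercepts"))
fit_strict     <- lavaan::cfa(hs_model, data = HolzingerSwineford1939, group = "school",
                              group.equal = c("loadings", "intercepts", "residuals"))
""")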
We now load and inspect the Bergh dataset found within the MPsychoR package from R.
Load, prepare and inspect the dataset#
ro.r('data("Bergh")') # Load the Bergh dataset from R
Bergh = pandas2ri.rpy2py(ro.globalenv['Bergh']) # Convert the R dataframe to a pandas dataframe
Bergh.head()
| | EP | SP | HP | DP | A1 | A2 | A3 | O1 | O2 | O3 | gender |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2.666667 | 3.125000 | 1.400000 | 2.818182 | 3.437500 | 3.600000 | 3.352941 | 2.875000 | 3.400000 | 3.176471 | 1 |
| 2 | 2.666667 | 3.250000 | 1.400000 | 2.545455 | 2.312500 | 2.666667 | 3.117647 | 4.437500 | 3.866667 | 4.470588 | 1 |
| 3 | 1.000000 | 1.625000 | 2.700000 | 2.000000 | 3.562500 | 4.600000 | 3.941176 | 4.250000 | 3.666667 | 3.705882 | 1 |
| 4 | 2.666667 | 2.750000 | 1.800000 | 2.818182 | 2.750000 | 3.200000 | 3.352941 | 2.875000 | 3.400000 | 3.117647 | 1 |
| 5 | 2.888889 | 3.250000 | 2.700000 | 3.000000 | 3.250000 | 4.200000 | 3.764706 | 3.937500 | 4.400000 | 4.294118 | 1 |
The dataset#
We illustrate today’s topic using a dataset from Bergh et al. (2016).
In the study by Bergh et al. (2016), a multigroup confirmatory factor analysis (CFA) was conducted to assess whether a generalized prejudice factor (GP) could explain shared variance across four prejudice domains:
Ethnic prejudice (EP)
Sexism (SP)
Sexual prejudice against non-heterosexuals (HP)
Prejudice toward people with disabilities (DP)
The multigroup aspect comes in by exploring differences in CFA parameters across gender.
Fit the models#
Luckily, we do not have to specify all the models mentioned above by ourselves. Instead, we can use the measurementInvariance() function from the semTools package. We only need to specify a few elements:
1) The model equation (i.e. which items we want to include),
2) The dataset,
3) The grouping variable and
4) The estimator (we use a robust ML estimator since some of the prejudice measures deviate from normality).
Note. strict = T gives us a configural, metric (weak), scalar (strong), strict and a fifth model. With strict = F we will not get the strict model.
ro.r("GP_model <- 'GP =~ EP + HP + DP + SP'") # Define the measurement model for General Personality (GP)
ro.r("""
minvfit <- measurementInvariance( # Function call: measurementInvariance()
model = GP_model, # 1. Pass the measurement model
data = Bergh, # 2. Pass the dataset
group = "gender", # 3. Specify the grouping variable
estimator = "MLR", # 4. Maximum Likelihood with Robust standard errors
strict = T) # Get strict model
""")
Measurement invariance models:
Model 1 : fit.configural
Model 2 : fit.loadings
Model 3 : fit.intercepts
Model 4 : fit.residuals
Model 5 : fit.means
Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
lavaan->lavTestLRT():
lavaan NOTE: The "Chisq" column contains standard test statistics, not the
robust test that should be reported per model. A robust difference test is
a function of two standard (not robust) statistics.
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fit.configural 4 7376.1 7490.2 1.6552
fit.loadings 7 7382.9 7482.8 14.4960 11.834 3 0.007972 **
fit.intercepts 10 7417.7 7503.3 55.2875 47.833 3 2.311e-10 ***
fit.residuals 14 7429.8 7496.4 75.3689 12.468 4 0.014189 *
fit.means 15 7482.4 7544.3 130.0375 49.267 1 2.234e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Fit measures:
cfi.scaled rmsea.scaled cfi.scaled.delta rmsea.scaled.delta
fit.configural 1.000 0.000 NA NA
fit.loadings 0.988 0.046 0.012 0.046
fit.intercepts 0.914 0.102 0.074 0.056
fit.residuals 0.904 0.091 0.010 0.011
fit.means 0.816 0.122 0.088 0.031
R[write to console]: In addition:
R[write to console]: Warning message:
R[write to console]: In measurementInvariance(model = GP_model, data = Bergh, group = "gender", :
R[write to console]:
R[write to console]: The measurementInvariance function is deprecated, and it will cease to be included in future versions of semTools. See help('semTools-deprecated) for details.
This code chunk fits the four invariance models (configural, metric/weak, scalar/strong and strict) plus a fifth model in which, in addition to the indicator intercepts, the means of the latent variables are restricted to be equal as well. As mentioned above, we will not deal with the last model.
Each model is tested against the previous one, and a significant result indicates that the more restricted model fits significantly worse than the previous one. In our example, only configural invariance holds, since already restricting the loadings to be equal across groups (metric invariance) results in a significantly worse fit.
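The measurementInvariance() function is deprecated (see the warning in the output above). As a sketch of one possible replacement, the same ladder of models can be fit directly with lavaan::cfa() and compared with lavaan's lavTestLRT(); GP_model and Bergh are the objects defined above.
# Sketch: fit the configural, metric and scalar models by hand and compare them
ro.r("""
fit_conf   <- lavaan::cfa(GP_model, data = Bergh, group = "gender", estimator = "MLR")
fit_metric <- lavaan::cfa(GP_model, data = Bergh, group = "gender", estimator = "MLR",
                          group.equal = c("loadings"))
fit_scalar <- lavaan::cfa(GP_model, data = Bergh, group = "gender", estimator = "MLR",
                          group.equal = c("loadings", "intercepts"))
""")
print(ro.r("lavTestLRT(fit_conf, fit_metric, fit_scalar)")) # Scaled chi-square difference tests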
Select the best fitting model#
In our example we would go with the configural model, for which the output can be requested as follows:
# Extract only the configural model
ro.r("Fit_conf <- minvfit$fit.configural")
We now investigate Fit_conf using summary(). Remember to set standardized = TRUE and fit.measures = TRUE.
summary_conf = ro.r("summary(Fit_conf, standardized = TRUE, fit.measures = TRUE)")
print(summary_conf)
lavaan 0.6-19 ended normally after 45 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 24
Number of observations per group:
male 249
female 612
Model Test User Model:
Standard Scaled
Test Statistic 1.655 1.549
Degrees of freedom 4 4
P-value (Chi-square) 0.799 0.818
Scaling correction factor 1.068
Yuan-Bentler correction (Mplus variant)
Test statistic for each group:
male 0.348 0.348
female 1.201 1.201
Model Test Baseline Model:
Test statistic 716.580 530.342
Degrees of freedom 12 12
P-value 0.000 0.000
Scaling correction factor 1.351
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000 1.000
Tucker-Lewis Index (TLI) 1.010 1.014
Robust Comparative Fit Index (CFI) 1.000
Robust Tucker-Lewis Index (TLI) 1.011
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -3664.027 -3664.027
Scaling correction factor 1.284
for the MLR correction
Loglikelihood unrestricted model (H1) -3663.200 -3663.200
Scaling correction factor 1.253
for the MLR correction
Akaike (AIC) 7376.055 7376.055
Bayesian (BIC) 7490.249 7490.249
Sample-size adjusted Bayesian (SABIC) 7414.031 7414.031
Root Mean Square Error of Approximation:
RMSEA 0.000 0.000
90 Percent confidence interval - lower 0.000 0.000
90 Percent confidence interval - upper 0.046 0.042
P-value H_0: RMSEA <= 0.050 0.961 0.971
P-value H_0: RMSEA >= 0.080 0.003 0.002
Robust RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.046
P-value H_0: Robust RMSEA <= 0.050 0.961
P-value H_0: Robust RMSEA >= 0.080 0.003
Standardized Root Mean Square Residual:
SRMR 0.008 0.008
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Group 1 [male]:
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
GP =~
EP 1.000 0.489 0.717
HP 1.452 0.289 5.024 0.000 0.709 0.387
DP 0.766 0.100 7.670 0.000 0.375 0.685
SP 1.137 0.153 7.409 0.000 0.556 0.805
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.EP 2.141 0.043 49.594 0.000 2.141 3.143
.HP 1.379 0.116 11.883 0.000 1.379 0.753
.DP 2.204 0.035 63.623 0.000 2.204 4.032
.SP 2.466 0.044 56.376 0.000 2.466 3.573
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.EP 0.225 0.041 5.435 0.000 0.225 0.486
.HP 2.851 0.362 7.873 0.000 2.851 0.850
.DP 0.159 0.022 7.197 0.000 0.159 0.531
.SP 0.168 0.031 5.341 0.000 0.168 0.352
GP 0.239 0.044 5.450 0.000 1.000 1.000
Group 2 [female]:
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
GP =~
EP 1.000 0.536 0.752
HP 0.771 0.126 6.137 0.000 0.413 0.289
DP 0.691 0.054 12.798 0.000 0.371 0.718
SP 0.774 0.059 13.104 0.000 0.415 0.661
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.EP 1.927 0.029 66.895 0.000 1.927 2.704
.HP 1.150 0.058 19.893 0.000 1.150 0.804
.DP 1.999 0.021 95.793 0.000 1.999 3.872
.SP 1.972 0.025 77.676 0.000 1.972 3.140
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.EP 0.221 0.029 7.526 0.000 0.221 0.434
.HP 1.874 0.184 10.200 0.000 1.874 0.917
.DP 0.129 0.017 7.420 0.000 0.129 0.484
.SP 0.222 0.019 11.533 0.000 0.222 0.563
GP 0.287 0.034 8.357 0.000 1.000 1.000
Model Fit Summary#
The model shows excellent fit in both gender groups:
Chi-square test non-significant overall (p = .799; per-group test statistics: male 0.348, female 1.201)
CFI/TLI ≥ 1.00, RMSEA = 0.000, SRMR = 0.008
For a recap on fit indexes see Model Fit from the previous session.
Differences in Factor Loadings#
HP loading is notably higher in males (1.452 / 0.387 standardized) than in females (0.771 / 0.289)
Suggests that HP contributes more strongly to GP for males
Interpretation: The latent factor GP is expressed somewhat differently by gender, especially due to the role of HP.
Differences in Intercepts#
Intercepts for all indicators are lower in females than in males
Implies potential mean differences in observed scores, with females generally scoring lower on prejudice indicators when controlling for GP
However, these differences should not be interpreted directly unless scalar invariance is confirmed.
Yes, but why?
You can only compare means of latent variables across groups (e.g., gender) if scalar invariance holds. We saw before that scalar invariance means:
Factor loadings are equal across groups (metric invariance), and
Intercepts of the observed variables are also equal across groups.
If scalar invariance holds, it implies that the measurement model behaves the same way across groups, so:
A 1-unit change in the latent variable produces the same observed value across groups
Any mean differences in observed variables reflect true differences in the latent trait, not measurement bias
If scalar invariance doesn’t hold…
Differences in intercepts could reflect bias, group-specific interpretations, or item functioning
As a result, you can’t validly interpret observed or latent mean differences
e.g., females scoring lower on SP might reflect differences in how SP is measured, not truly lower prejudice (see the small simulated illustration below)
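To make this concrete, here is a minimal simulated illustration (toy numbers, not the Bergh data): both groups have the same latent trait distribution and the same loading, but group B has a lower intercept. The observed means then differ although the latent means do not.
# Simulated example of intercept (scalar) non-invariance
rng = np.random.default_rng(123)
eta = rng.normal(0.0, 1.0, size=10_000)   # latent trait, identical distribution in both groups
lam, tau_a, tau_b = 0.8, 2.0, 1.7         # equal loading, unequal intercepts
x_a = tau_a + lam * eta + rng.normal(0, 0.5, eta.size)  # observed item, group A
x_b = tau_b + lam * eta + rng.normal(0, 0.5, eta.size)  # observed item, group B
print(f"Observed mean difference: {x_a.mean() - x_b.mean():.3f}")  # ~0.30, purely measurement bias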
Residual Variances and Reliability#
Standardized residual variance for HP is higher in females (0.917) than in males (0.850); see the extraction sketch below
Suggests lower measurement reliability (a larger share of unexplained variance) for HP in the female group
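These Std.all values can be pulled out of the fitted object directly; a small sketch using lavaan's standardizedSolution() on the Fit_conf object from above (op == "~~" with lhs == rhs selects the variances, residual and factor, per group):
# Extract standardized (residual) variances for both groups
print(ro.r('subset(standardizedSolution(Fit_conf), op == "~~" & lhs == rhs)'))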
Preliminary Conclusion#
The model fits well in both gender groups
But key factor loadings and intercepts differ, particularly for HP
These findings suggest partial measurement invariance
Another Option#
Instead of using these generic invariance sequences of models, we can approach multigroup CFA from a more hypothesis-driven direction. In the first model we are going to fit, the loadings for ethnic minorities (EP), non-heterosexuals (HP), and people with disabilities (DP) are held equal across men and women. We keep the sexism (SP) loadings free. Note that all intercepts are free to vary.
In lavaan, this can be achieved by using one of the following two equivalent model formulations.
In the first variant, we explicitly incorporate the loading restrictions in the model formulation. We specify a vector of length 2, the first element denoting the loading for the first gender category and the second element denoting the loading for the second category. Since we use the same loading labels for both categories, they are restricted to be equal.
Option 1 to specify the model#
# First option to specify the model
ro.r("GP_model <-'GP =~ c(v1,v1)*EP + c(v2,v2)*HP + c(v3,v3)*DP + SP'") # The loading of SP is not fixed.
ro.r('fitBase <-lavaan::cfa(GP_model, data = Bergh, group = "gender", estimator = "MLR")')
ro.r("fitMeasures(fitBase)")
array([ 2.20000000e+01, 4.25165558e-03, 7.32135091e+00, 6.00000000e+00,
2.92148336e-01, 6.85411314e+00, 6.00000000e+00, 3.34549818e-01,
1.06816896e+00, 7.16580288e+02, 1.20000000e+01, 0.00000000e+00,
5.30342427e+02, 1.20000000e+01, 0.00000000e+00, 1.35116530e+00,
9.98124627e-01, 9.96249254e-01, 9.98352222e-01, 9.96704444e-01,
9.98697343e-01, 9.97394686e-01, 9.96249254e-01, 9.79565860e-01,
9.89782930e-01, 4.94891465e-01, 9.98140462e-01, 9.98124627e-01,
9.96704444e-01, 9.74152122e-01, 9.87076061e-01, 4.93538031e-01,
9.98371078e-01, 9.98352222e-01, 9.97394686e-01, 9.98697343e-01,
-3.66686045e+03, -3.66319978e+03, 7.37772090e+03, 7.48239898e+03,
8.61000000e+02, 7.41253284e+03, 1.25325348e+00, 1.19508682e+00,
2.26176188e-02, 0.00000000e+00, 6.94429738e-02, 9.00000000e-01,
7.89925895e-01, 5.00000000e-02, 1.69311900e-02, 8.00000000e-02,
1.81842544e-02, 0.00000000e+00, 6.57201349e-02, 8.29630172e-01,
1.05640614e-02, 1.87938378e-02, 0.00000000e+00, 6.94032961e-02,
8.00895830e-01, 1.76915675e-02, 3.45567328e-02, 4.08880776e-02,
2.78965693e-02, 2.78965693e-02, 3.30076659e-02, 2.89250947e-02,
3.73421367e-02, 2.79544637e-02, 3.12849052e-02, 1.48178637e+03,
1.97809969e+03, 9.99596007e-01, 9.98114702e-01, 2.14199144e-01,
9.99232959e-01, 5.96066793e-02])
Unfortunately, some of the information from the R object (the names of the fit measures) got lost during the printing of fitMeasures(fitBase). So we create a small workaround:
# Helper function to extract named fit measures from a fitted lavaan model
def fit_measures(model_name):
    ro.r(f"""
    df_fit <- data.frame(measure = names(fitMeasures({model_name})),
                         value = unname(fitMeasures({model_name})),
                         row.names = NULL)
    """)
    fit_df = pandas2ri.rpy2py(ro.r["df_fit"])
    return fit_df
# Extract fit measures from the fitBase model
fit_df = fit_measures('fitBase')
fit_df = fit_df.set_index('measure').T # Transpose the DataFrame for better readability
fit_df
measure | npar | fmin | chisq | df | pvalue | chisq.scaled | df.scaled | pvalue.scaled | chisq.scaling.factor | baseline.chisq | ... | crmr_nomean | srmr_mplus | srmr_mplus_nomean | cn_05 | cn_01 | gfi | agfi | pgfi | mfi | ecvi |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
value | 22.000000 | 0.004252 | 7.321351 | 6.000000 | 0.292148 | 6.854113 | 6.000000 | 0.334550 | 1.068169 | 716.580288 | ... | 0.037342 | 0.027954 | 0.031285 | 1481.786368 | 1978.099687 | 0.999596 | 0.998115 | 0.214199 | 0.999233 | 0.059607 |
1 rows × 78 columns
Option 2 to specify the model#
In the second variant, we restrict all the loadings to be equal across groups
using the group.equal
argument. We then free up the SP loading through the
group.partial
argument.
# Second option to specify the model
ro.r("GP_model <- 'GP =~ EP + HP + DP + SP'")
ro.r('fitBase <- lavaan::cfa(GP_model,data = Bergh, group = "gender", group.equal = c("loadings"), group.partial = c("GP=~ SP"), estimator = "MLR")')
fit_df = fit_measures('fitBase')
# This time we print the fit measures we are interested in
def print_fit_summary(fit_df):
    """
    Print a summary of model fit statistics from a fit_measures df.
    """
    fit_series = fit_df.set_index('measure')['value']  # Convert to series to facilitate indexing
    chisq = fit_series['chisq']
    df = int(fit_series['df'])
    pval = fit_series['pvalue']
    rmsea = fit_series['rmsea']
    rmsea_low = fit_series['rmsea.ci.lower']
    rmsea_up = fit_series['rmsea.ci.upper']
    cfi = fit_series['cfi']
    srmr = fit_series['srmr']
    print(
        f"It gives a χ²-value of {chisq:.3f} (df = {df}, p = {pval:.3g}), "
        f"a RMSEA of {rmsea:.3f} with a 90% CI of [{rmsea_low:.3f}, {rmsea_up:.3f}], "
        f"a CFI of {cfi:.3f}, and an SRMR of {srmr:.3f}."
    )
print_fit_summary(fit_df)
It gives a χ²-value of 7.321 (df = 6, p = 0.292), a RMSEA of 0.023 with a 90% CI of [0.000, 0.069], a CFI of 0.998, and an SRMR of 0.028.
Both variants lead to the same results. Using our function fit_measures(), which calls fitMeasures(fitBase), we get a \(\chi^2\)-value of 7.321 (df = 6, p = 0.292), a RMSEA of 0.023 with a 90% CI of [0, 0.069], a CFI of 0.998, and an SRMR of 0.028. The model fits well.
Since the difference in intercepts between men and women is negligible for the EP indicator, we can force the intercepts to be equal for this indicator. Let us build on the second specification variant from above to set up this model. Through group.equal we constrain all loadings and intercepts to be equal across groups and, subsequently, free up the SP loading as well as the intercepts for DP, HP, and SP using group.partial:
ro.r("GP_model <- 'GP =~ EP + HP + DP + SP'")
ro.r("""
fitBase1 <- lavaan::cfa(GP_model, data = Bergh,
group = "gender", group.equal = c("loadings", "intercepts"),
group.partial = c("GP=~SP", "DP~1", "HP~1", "SP~1"),
estimator = "MLR")
""")
fitBase1_df = fit_measures('fitBase1')
print_fit_summary(fitBase1_df)
It gives a χ²-value of 7.321 (df = 6, p = 0.292), a RMSEA of 0.023 with a 90% CI of [0.000, 0.069], a CFI of 0.998, and an SRMR of 0.028.
Again, this model fits well. We now use this model as the baseline model and compare it to two additional models. First (1), we constrain the SP loading to be 0 for the women (ingroup-outgroup model). We can specify this restriction directly in the model specification through c(NA,0): NA means the parameter is freely estimated for the men (first group), whereas the second element fixes it to 0 for the women. Note that this is quite a restrictive assumption.
This step turns our CFA from being purely descriptive into a theory-driven hypothesis test:
Men view sexism (SP) as part of generalized prejudice (i.e., it loads onto GP)
Women do not, i.e., sexism is not psychologically integrated into a broader prejudice construct for them
ro.r("GP_model2 <- 'GP =~ c(v1,v1)*EP + c(v2,v2)*HP + c(v3,v3)*DP + c(NA, 0)*SP'")
ro.r("""
fitIO <- lavaan::cfa(GP_model2, data = Bergh,
group = "gender", group.equal = c("intercepts"),
group.partial = c("DP~1", "HP~1", "SP~1"),
estimator = "MLR")
""")
fitIO_df = fit_measures('fitIO')
print_fit_summary(fitIO_df)
It gives a χ²-value of 233.756 (df = 7, p = 0), a RMSEA of 0.274 with a 90% CI of [0.245, 0.305], a CFI of 0.678, and an SRMR of 0.148.
Bad fit.
In the next model (2), we restrict all the loadings to be equal (marginalization model). The intercepts have the same constraints as above. The model assumes:
The structure of prejudice (GP) is shared across gender (i.e., loadings are equal)
But certain prejudices are experienced or expressed differently in terms of baseline levels (intercepts) — particularly HP, DP, SP — likely because women are more often targets of these forms of prejudice
This aligns with theories suggesting men and women perceive prejudice similarly, but women internalize or report it differently, especially on issues that affect them directly.
ro.r("GP_model <- 'GP =~ EP + HP + DP + SP'")
ro.r("""
fitMarg <- lavaan::cfa(GP_model, data = Bergh,
group = "gender", group.equal = c("loadings", "intercepts"),
group.partial = c("DP~1", "HP~1", "SP~1"),
estimator = "MLR")
""")
fitMarg_df = fit_measures('fitMarg')
print_fit_summary(fitMarg_df)
It gives a χ²-value of 14.496 (df = 7, p = 0.043), a RMSEA of 0.050 with a 90% CI of [0.008, 0.086], a CFI of 0.989, and an SRMR of 0.036.
We get a \(\chi^2\)-value of 14.496 (df = 7, p = 0.043), a RMSEA of 0.05 with a 90% CI of [0.008, 0.086], a CFI of 0.989, and an SRMR of 0.036. In terms of goodness-of-fit statistics, we do not see much of a difference compared to the baseline model. Let us do some model comparison. Since the marginalization model is nested in the base model, we can apply the following \(\chi^2\)-difference test. Note that a significant result suggests that the more restrictive model fits significantly worse.
print(ro.r("anova(fitMarg, fitBase1)"))
Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
lavaan->lavTestLRT():
lavaan NOTE: The "Chisq" column contains standard test statistics, not the
robust test that should be reported per model. A robust difference test is
a function of two standard (not robust) statistics.
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fitBase1 6 7377.7 7482.4 7.3214
fitMarg 7 7382.9 7482.8 14.4960 6.4063 1 0.01137 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation#
The scaled chi-square difference test comparing the marginalization model (fitMarg
) to the baseline model (fitBase1
) yielded a significant result:
χ²_diff(1) = 6.41, p = .011. This suggests that constraining the loadings to be equal across gender (as done in fitMarg
) results in a statistically worse fit compared to the less constrained model.
However, this statistical significance is small in practical terms: the AIC and BIC differences are minimal, and both models show good overall fit in the absolute fit indices (CFI ≥ 0.99, RMSEA ≤ 0.05).
Therefore, despite the statistical difference, the marginalization model remains theoretically and empirically defensible, especially given its alignment with the idea that prejudice has a shared structure across gender, but differently experienced or expressed (i.e., varying intercepts for personally relevant domains like SP, HP, and DP). In contexts prioritizing interpretability and theoretical grounding, the marginalization model may be a preferable choice.
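We can make the "minimal AIC and BIC differences" point concrete by pulling both criteria out of our fit_measures() helper (lavaan's fitMeasures() names them aic and bic):
# Compare information criteria of the two models
for name in ("fitBase1", "fitMarg"):
    s = fit_measures(name).set_index("measure")["value"]
    print(f"{name}: AIC = {s['aic']:.1f}, BIC = {s['bic']:.1f}")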
Conclusions#
The alternative approach presented here may be quite confusing as it wasn't covered in the lecture. So let's put it into words.
The goal here is to investigate the between-group measurement (in)variance of one specific indicator. In the approach mentioned before (and in the lecture) we investigated whether measurement invariance holds for all indicators.
As you might have guessed, the indicator we are investigating in this alternative approach is SP. Let us revisit the steps:
1) We fit a model in which all loadings except the one of SP are held equal across groups (fitBase1).
2) We fit alternative models with alternative constraints on the SP loading (all the other loadings stay the same): GP_model2 (fitIO) fixes the SP loading to 0 for women but not for men; fitMarg constrains all loadings to be equal.
3) We compare model fits. Let us check the comparison of fitMarg and fitBase1 again:
print(ro.r("anova(fitMarg, fitBase1)"))
Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")
lavaan->lavTestLRT():
lavaan NOTE: The "Chisq" column contains standard test statistics, not the
robust test that should be reported per model. A robust difference test is
a function of two standard (not robust) statistics.
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fitBase1 6 7377.7 7482.4 7.3214
fitMarg 7 7382.9 7482.8 14.4960 6.4063 1 0.01137 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Before we look at the values, let's think about what this comparison is good for. In the fitBase1 model we allow the loading of SP to vary between men and women. In the fitMarg model we assume it to be equal for both groups. Now we can apply our 'normal' principle again: if the more restrictive model that assumes measurement invariance (fitMarg) is not significantly worse than the model that allows the loading to vary (fitBase1), we assume measurement invariance. The critical point here is that we only investigated measurement invariance for one indicator (SP) and not for the whole model (as done in the previous section).
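As a complementary, exploratory sketch (not part of the original analysis), lavaan's lavTestScore() computes score tests for releasing each equality constraint of a fitted model; applied to fitMarg, large significant statistics would point at the constraints (plausibly the SP loading) that drive the misfit.
# Score tests for each equality constraint in the marginalization model
print(ro.r("lavTestScore(fitMarg)"))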