7.1 Group Invariance (Quantitative)#

Measurement invariance across groups for quantitative item responses#

# General imports
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
pd.set_option('display.float_format', '{:.6f}'.format) # Set float format for pandas display

# Rpy2 imports
from rpy2 import robjects as ro
from rpy2.robjects import pandas2ri, numpy2ri
from rpy2.robjects.packages import importr

# Automatic conversion of arrays and dataframes
pandas2ri.activate()
numpy2ri.activate()

# Set random seed for reproducibility
ro.r('set.seed(123)')

# IPython extension for magic plotting
%load_ext rpy2.ipython

# R imports
importr('base')
importr('lavaan')
importr('semTools')
importr('MPsychoR')
importr('lordif')
importr('Hmisc')

Measurement invariance means that we measure the same psychological construct across groups and time (see lectures 8 & 9 for examples). In the following notebook we will look at how we can assess measurement invariance across…

  • groups for quantitative item responses

  • groups for qualitative item responses

  • time for quantitative responses

The four most important measurement invariance structures are the following:

  • Configural invariance: same factor structure but unrestricted loadings and intercepts across groups.

  • Metric invariance: the factor structure and the loadings are constrained to be equal across groups; the intercepts are free. (In the book this is referred to as Weak invariance)

  • Scalar invariance: the factor structure, the loadings and the indicator intercepts are restricted to be equal across groups. (In the book this is referred to as Strong invariance)

  • Strict invariance: the factor structure, the loadings, the indicator intercepts and the reliabilities are restricted to be equal across groups.

Note. In the book there is also a model in which, in addition to the indicator intercepts, the means of the latent variables are restricted to be equal as well; however, we will not cover this here.
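For orientation, this hierarchy can also be specified by hand in lavaan by progressively adding parameter classes to the group.equal argument of cfa(). The following is a minimal sketch of that hand-rolled route, using the GP model and Bergh data that we introduce below (run it after the data-loading cell; the default ML estimator is used for brevity):

ro.r("""
data("Bergh")
GP_model <- 'GP =~ EP + HP + DP + SP'
fit_configural <- lavaan::cfa(GP_model, data = Bergh, group = "gender")
fit_metric     <- lavaan::cfa(GP_model, data = Bergh, group = "gender",
                              group.equal = "loadings")
fit_scalar     <- lavaan::cfa(GP_model, data = Bergh, group = "gender",
                              group.equal = c("loadings", "intercepts"))
fit_strict     <- lavaan::cfa(GP_model, data = Bergh, group = "gender",
                              group.equal = c("loadings", "intercepts", "residuals"))
""")
print(ro.r("anova(fit_configural, fit_metric, fit_scalar, fit_strict)"))  # chi-square difference tests

We will see below that semTools wraps this sequence for us.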

We now load and inspect the Bergh dataset found within the MPsychoR package from R.

Load, prepare and inspect the dataset#

ro.r('data("Bergh")')                             # Load the Bergh dataset from R
Bergh = pandas2ri.rpy2py(ro.globalenv['Bergh'])   # Convert the R dataframe to a pandas dataframe
Bergh.head()                                           
EP SP HP DP A1 A2 A3 O1 O2 O3 gender
1 2.666667 3.125000 1.400000 2.818182 3.437500 3.600000 3.352941 2.875000 3.400000 3.176471 1
2 2.666667 3.250000 1.400000 2.545455 2.312500 2.666667 3.117647 4.437500 3.866667 4.470588 1
3 1.000000 1.625000 2.700000 2.000000 3.562500 4.600000 3.941176 4.250000 3.666667 3.705882 1
4 2.666667 2.750000 1.800000 2.818182 2.750000 3.200000 3.352941 2.875000 3.400000 3.117647 1
5 2.888889 3.250000 2.700000 3.000000 3.250000 4.200000 3.764706 3.937500 4.400000 4.294118 1

The dataset#

We illustrate today’s topic using a dataset from Bergh et al. (2016).

In the study by Bergh et al. (2016), a multigroup confirmatory factor analysis (CFA) was conducted to assess whether a generalized prejudice factor (GP) could explain shared variance across four prejudice domains:

  • Ethnic prejudice (EP)

  • Sexism (SP)

  • Sexual prejudice against non-heterosexuals (HP)

  • Prejudice toward people with disabilities (DP)

The multigroup aspect comes in by exploring differences in CFA parameters across gender.

Fit the models#

Luckily, we do not have to specify all the models mentioned above ourselves. Instead, we can use the measurementInvariance() function from the semTools package (deprecated in recent versions, as the warning below shows, but still usable here). We only need to specify a few elements:

  1. The model equation (i.e. which items we want to include),

  2. The dataset,

  3. The grouping variable and

  4. The estimator (we use a robust ML estimator since some of the prejudice measures deviate from normality).

Note. strict = T gives us a configural, metric (weak), scalar (strong), strict and a fifth model. With strict = F we will not get the strict model.

ro.r("GP_model <- 'GP =~ EP + HP + DP + SP'")  # Define the measurement model for General Personality (GP)
ro.r(""" 
     minvfit <- measurementInvariance(         # Function call: measurementInvariance()
            model = GP_model,                  # 1. Pass the measurement model
            data = Bergh,                      # 2. Pass the dataset  
            group = "gender",                  # 3. Specify the grouping variable
            estimator = "MLR",                 # 4. Maximum Likelihood with Robust standard errors
            strict = T)                        # Get strict model
     """)
Measurement invariance models:

Model 1 : fit.configural
Model 2 : fit.loadings
Model 3 : fit.intercepts
Model 4 : fit.residuals
Model 5 : fit.means


Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")

lavaan->lavTestLRT():  
   lavaan NOTE: The "Chisq" column contains standard test statistics, not the 
   robust test that should be reported per model. A robust difference test is 
   a function of two standard (not robust) statistics.
               Df    AIC    BIC    Chisq Chisq diff Df diff Pr(>Chisq)    
fit.configural  4 7376.1 7490.2   1.6552                                  
fit.loadings    7 7382.9 7482.8  14.4960     11.834       3   0.007972 ** 
fit.intercepts 10 7417.7 7503.3  55.2875     47.833       3  2.311e-10 ***
fit.residuals  14 7429.8 7496.4  75.3689     12.468       4   0.014189 *  
fit.means      15 7482.4 7544.3 130.0375     49.267       1  2.234e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Fit measures:

               cfi.scaled rmsea.scaled cfi.scaled.delta rmsea.scaled.delta
fit.configural      1.000        0.000               NA                 NA
fit.loadings        0.988        0.046            0.012              0.046
fit.intercepts      0.914        0.102            0.074              0.056
fit.residuals       0.904        0.091            0.010              0.011
fit.means           0.816        0.122            0.088              0.031
R[write to console]: Warning message:
R[write to console]: In measurementInvariance(model = GP_model, data = Bergh, group = "gender",  :
R[write to console]:  The measurementInvariance function is deprecated, and it will cease to be included in future versions of semTools. See help('semTools-deprecated') for details.

This code chunk fits five models: a configural, metric (weak), scalar (strong), and strict model, plus a fifth model in which, in addition to the indicator intercepts, the means of the latent variables are restricted to be equal as well. As mentioned above, we will not deal with the last model.

Each model is tested against the previous one, and a significant result indicates that the more restricted model fits significantly worse than the previous model. In our example, only configural invariance holds, as restricting the model parameters to match the assumptions of metric invariance results in a significantly worse fit.

Select the best fitting model#

In our example we would go with the configural model, for which the output can be requested as follows:

# Extract only the configural model
ro.r("Fit_conf <- minvfit$fit.configural")

We now investigate the Fit_conf using summary(). Remember to set standardized = TRUE & fit.measures = TRUE.

summary_conf = ro.r("summary(Fit_conf, standardized = TRUE, fit.measures = TRUE)")
print(summary_conf)
lavaan 0.6-19 ended normally after 45 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        24

  Number of observations per group:                   
    male                                           249
    female                                         612

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                                 1.655       1.549
  Degrees of freedom                                 4           4
  P-value (Chi-square)                           0.799       0.818
  Scaling correction factor                                  1.068
    Yuan-Bentler correction (Mplus variant)                       
  Test statistic for each group:
    male                                         0.348       0.348
    female                                       1.201       1.201

Model Test Baseline Model:

  Test statistic                               716.580     530.342
  Degrees of freedom                                12          12
  P-value                                        0.000       0.000
  Scaling correction factor                                  1.351

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000       1.000
  Tucker-Lewis Index (TLI)                       1.010       1.014
                                                                  
  Robust Comparative Fit Index (CFI)                         1.000
  Robust Tucker-Lewis Index (TLI)                            1.011

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -3664.027   -3664.027
  Scaling correction factor                                  1.284
      for the MLR correction                                      
  Loglikelihood unrestricted model (H1)      -3663.200   -3663.200
  Scaling correction factor                                  1.253
      for the MLR correction                                      
                                                                  
  Akaike (AIC)                                7376.055    7376.055
  Bayesian (BIC)                              7490.249    7490.249
  Sample-size adjusted Bayesian (SABIC)       7414.031    7414.031

Root Mean Square Error of Approximation:

  RMSEA                                          0.000       0.000
  90 Percent confidence interval - lower         0.000       0.000
  90 Percent confidence interval - upper         0.046       0.042
  P-value H_0: RMSEA <= 0.050                    0.961       0.971
  P-value H_0: RMSEA >= 0.080                    0.003       0.002
                                                                  
  Robust RMSEA                                               0.000
  90 Percent confidence interval - lower                     0.000
  90 Percent confidence interval - upper                     0.046
  P-value H_0: Robust RMSEA <= 0.050                         0.961
  P-value H_0: Robust RMSEA >= 0.080                         0.003

Standardized Root Mean Square Residual:

  SRMR                                           0.008       0.008

Parameter Estimates:

  Standard errors                             Sandwich
  Information bread                           Observed
  Observed information based on                Hessian


Group 1 [male]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  GP =~                                                                 
    EP                1.000                               0.489    0.717
    HP                1.452    0.289    5.024    0.000    0.709    0.387
    DP                0.766    0.100    7.670    0.000    0.375    0.685
    SP                1.137    0.153    7.409    0.000    0.556    0.805

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .EP                2.141    0.043   49.594    0.000    2.141    3.143
   .HP                1.379    0.116   11.883    0.000    1.379    0.753
   .DP                2.204    0.035   63.623    0.000    2.204    4.032
   .SP                2.466    0.044   56.376    0.000    2.466    3.573

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .EP                0.225    0.041    5.435    0.000    0.225    0.486
   .HP                2.851    0.362    7.873    0.000    2.851    0.850
   .DP                0.159    0.022    7.197    0.000    0.159    0.531
   .SP                0.168    0.031    5.341    0.000    0.168    0.352
    GP                0.239    0.044    5.450    0.000    1.000    1.000


Group 2 [female]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  GP =~                                                                 
    EP                1.000                               0.536    0.752
    HP                0.771    0.126    6.137    0.000    0.413    0.289
    DP                0.691    0.054   12.798    0.000    0.371    0.718
    SP                0.774    0.059   13.104    0.000    0.415    0.661

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .EP                1.927    0.029   66.895    0.000    1.927    2.704
   .HP                1.150    0.058   19.893    0.000    1.150    0.804
   .DP                1.999    0.021   95.793    0.000    1.999    3.872
   .SP                1.972    0.025   77.676    0.000    1.972    3.140

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .EP                0.221    0.029    7.526    0.000    0.221    0.434
   .HP                1.874    0.184   10.200    0.000    1.874    0.917
   .DP                0.129    0.017    7.420    0.000    0.129    0.484
   .SP                0.222    0.019   11.533    0.000    0.222    0.563
    GP                0.287    0.034    8.357    0.000    1.000    1.000

Model Fit Summary#

  • The model shows excellent fit in both gender groups:

    • Chi-square test non-significant (p = 0.799, scaled p = 0.818); the per-group test statistics are small (male: 0.348, female: 1.201)

    • CFI/TLI ≥ 1.00, RMSEA = 0.000, SRMR = 0.008

    • For a recap on fit indexes see Model Fit from the previous session.
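If you only want these headline indices rather than the full summary() output, lavaan's fitMeasures() also accepts a vector of measure names; a quick sketch:

# Request only selected (scaled) fit indices for the configural model
print(ro.r('fitMeasures(Fit_conf, c("cfi.scaled", "tli.scaled", "rmsea.scaled", "srmr"))'))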

Differences in Factor Loadings#

  • HP loading is notably higher in males (1.452 / 0.387 standardized) than in females (0.771 / 0.289)

    • Suggests that HP contributes more strongly to GP for males

Interpretation: The latent factor GP is expressed somewhat differently by gender, especially due to the role of HP.

Differences in Intercepts#

  • Intercepts for all indicators are lower in females than in males

    • Implies potential mean differences in observed scores, with females generally scoring lower on prejudice indicators when controlling for GP

However, these differences should not be interpreted directly unless scalar invariance is confirmed.


Yes, but why?

You can only compare means of latent variables across groups (e.g., gender) if scalar invariance holds. We saw before that scalar invariance means:

  1. Factor loadings are equal across groups (metric invariance), and

  2. Intercepts of the observed variables are also equal across groups.

If scalar invariance holds, it implies that the measurement model behaves the same way across groups, so:

  • A given value of the latent variable produces the same expected indicator scores across groups

  • Any mean differences in observed variables reflect true differences in the latent trait, not measurement bias

If scalar invariance doesn’t hold

  • Differences in intercepts could reflect bias, group-specific interpretations, or item functioning

  • As a result, you can’t validly interpret observed or latent mean differences

    • e.g., Females scoring lower on SP might reflect differences in how SP is measured, not true lower prejudice
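To make this concrete: when the intercepts are constrained equal across groups, lavaan fixes the latent means of the first group to 0 and frees them in the remaining groups, so a latent mean difference can be read directly from the output. A sketch, for illustration only, since scalar invariance was rejected for these data:

# Fit a scalar-invariance model; the GP~1 row of group 2 then estimates the
# latent mean difference (female vs. male) -- interpretable only if the
# intercept constraints are defensible, which they are not here.
ro.r("""
fitScalar <- lavaan::cfa(GP_model, data = Bergh, group = "gender",
                         group.equal = c("loadings", "intercepts"),
                         estimator = "MLR")
""")
print(ro.r('subset(parameterEstimates(fitScalar), lhs == "GP" & op == "~1")'))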


Residual Variances and Reliability#

  • The standardized residual variance for HP is higher in females (0.917) than in males (0.850)

    • Suggests lower measurement reliability for HP in the female group

Preliminary Conclusion#

  • The model fits well in both gender groups

  • But key factor loadings and intercepts differ, particularly for HP

  • These findings suggest partial measurement invariance
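One way to locate the non-invariant parameters behind such a partial-invariance verdict is a score test on the constrained model. A sketch using lavaan's lavTestScore(), applied here to the metric model from the sequence fitted above:

# Univariate score tests for releasing each equality constraint in the
# metric (equal-loadings) model; large statistics flag candidate loadings
# to free. The .p-labels can be mapped to parameters via parTable().
print(ro.r("lavTestScore(minvfit$fit.loadings)"))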

Another Option:#

Instead of using these generic sequences of invariance models, we can approach multigroup CFA from a more hypothesis-driven direction. In the first model we are going to fit, the loadings for ethnic minorities (EP), non-heterosexuals (HP), and people with disabilities (DP) are held equal across men and women. We keep the sexism (SP) loading free. Note that all intercepts are free to vary.

In lavaan, this can be achieved using one of the following two equivalent model formulations. In the first variant, we explicitly incorporate the loading restrictions in the model formulation. We specify a vector of length 2, the first element denoting the loading for the first gender category and the second element denoting the loading for the second category. Since we use the same loading labels for both categories, the loadings are restricted to be equal.

Option 1 to specify the model#

# First option to specify the model 
ro.r("GP_model <-'GP =~ c(v1,v1)*EP + c(v2,v2)*HP + c(v3,v3)*DP + SP'") # The loading of SP is not fixed.
ro.r('fitBase <-lavaan::cfa(GP_model, data = Bergh, group = "gender", estimator = "MLR")')
ro.r("fitMeasures(fitBase)")
array([ 2.20000000e+01,  4.25165558e-03,  7.32135091e+00,  6.00000000e+00,
        2.92148336e-01,  6.85411314e+00,  6.00000000e+00,  3.34549818e-01,
        1.06816896e+00,  7.16580288e+02,  1.20000000e+01,  0.00000000e+00,
        5.30342427e+02,  1.20000000e+01,  0.00000000e+00,  1.35116530e+00,
        9.98124627e-01,  9.96249254e-01,  9.98352222e-01,  9.96704444e-01,
        9.98697343e-01,  9.97394686e-01,  9.96249254e-01,  9.79565860e-01,
        9.89782930e-01,  4.94891465e-01,  9.98140462e-01,  9.98124627e-01,
        9.96704444e-01,  9.74152122e-01,  9.87076061e-01,  4.93538031e-01,
        9.98371078e-01,  9.98352222e-01,  9.97394686e-01,  9.98697343e-01,
       -3.66686045e+03, -3.66319978e+03,  7.37772090e+03,  7.48239898e+03,
        8.61000000e+02,  7.41253284e+03,  1.25325348e+00,  1.19508682e+00,
        2.26176188e-02,  0.00000000e+00,  6.94429738e-02,  9.00000000e-01,
        7.89925895e-01,  5.00000000e-02,  1.69311900e-02,  8.00000000e-02,
        1.81842544e-02,  0.00000000e+00,  6.57201349e-02,  8.29630172e-01,
        1.05640614e-02,  1.87938378e-02,  0.00000000e+00,  6.94032961e-02,
        8.00895830e-01,  1.76915675e-02,  3.45567328e-02,  4.08880776e-02,
        2.78965693e-02,  2.78965693e-02,  3.30076659e-02,  2.89250947e-02,
        3.73421367e-02,  2.79544637e-02,  3.12849052e-02,  1.48178637e+03,
        1.97809969e+03,  9.99596007e-01,  9.98114702e-01,  2.14199144e-01,
        9.99232959e-01,  5.96066793e-02])

Unfortunately, the names of the fit measures are lost when fitMeasures(fitBase) is converted to a NumPy array during printing.

So we create a small work around:

# Helper function to extract named fit measures from a fitted lavaan model
def fit_measures(model_name):
    ro.r(f"""
        df_fit <- data.frame( measure = names(fitMeasures({model_name})),
                              value   = unname(fitMeasures({model_name})),
                              row.names = NULL )
        """)
    fit_df = pandas2ri.rpy2py(ro.r["df_fit"])
    return fit_df

# Extract fit measures from the fitBase model
fit_df = fit_measures('fitBase') 
fit_df = fit_df.set_index('measure').T  # Transpose the DataFrame for better readability
fit_df
measure npar fmin chisq df pvalue chisq.scaled df.scaled pvalue.scaled chisq.scaling.factor baseline.chisq ... crmr_nomean srmr_mplus srmr_mplus_nomean cn_05 cn_01 gfi agfi pgfi mfi ecvi
value 22.000000 0.004252 7.321351 6.000000 0.292148 6.854113 6.000000 0.334550 1.068169 716.580288 ... 0.037342 0.027954 0.031285 1481.786368 1978.099687 0.999596 0.998115 0.214199 0.999233 0.059607

1 rows × 78 columns

Option 2 to specify the model#

In the second variant, we restrict all the loadings to be equal across groups using the group.equal argument. We then free up the SP loading through the group.partial argument.

# Second option to specify the model
ro.r("GP_model <- 'GP =~ EP + HP + DP + SP'")
ro.r('fitBase <- lavaan::cfa(GP_model,data = Bergh, group = "gender", group.equal = c("loadings"), group.partial = c("GP=~ SP"), estimator = "MLR")')
fit_df = fit_measures('fitBase')

# This time we print the fit measures we are interested in
def print_fit_summary(fit_df):
    """
    Print a summary of model fit statistics from a fit_measures df.
    """
    fit_series = fit_df.set_index('measure')['value']  # Convert to series to facilitate indexing
    chisq = fit_series['chisq']
    df = int(fit_series['df'])
    pval = fit_series['pvalue']
    rmsea = fit_series['rmsea']
    rmsea_low = fit_series['rmsea.ci.lower']
    rmsea_up = fit_series['rmsea.ci.upper']
    cfi = fit_series['cfi']
    srmr = fit_series['srmr']

    print(
        f"It gives a χ²-value of {chisq:.3f} (df = {df}, p = {pval:.3g}), "
        f"a RMSEA of {rmsea:.3f} with a 90% CI of [{rmsea_low:.3f}, {rmsea_up:.3f}], "
        f"a CFI of {cfi:.3f}, and an SRMR of {srmr:.3f}."
    )

print_fit_summary(fit_df)
It gives a χ²-value of 7.321 (df = 6, p = 0.292), a RMSEA of 0.023 with a 90% CI of [0.000, 0.069], a CFI of 0.998, and an SRMR of 0.028.

Both variants lead to the same results. Using our function fit_measures(), which calls fitMeasures(fitBase), we get a \(\chi^2\)-value of 7.321 (df = 6, p = 0.292), a RMSEA of 0.023 with a 90% CI of [0, 0.069], a CFI of 0.998, and an SRMR of 0.028. The model fits well.
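Because variant 2 overwrote fitBase, a quick way to convince yourself of the equivalence is to refit variant 1 under a different name and compare the test statistics; a small sketch (fitBase_v1 is our own name):

# Refit specification variant 1 and check that chi-square and df match variant 2
ro.r("""
GP_model_v1 <- 'GP =~ c(v1,v1)*EP + c(v2,v2)*HP + c(v3,v3)*DP + SP'
fitBase_v1  <- lavaan::cfa(GP_model_v1, data = Bergh, group = "gender",
                           estimator = "MLR")
""")
print(ro.r('fitMeasures(fitBase,    c("chisq", "df"))'))
print(ro.r('fitMeasures(fitBase_v1, c("chisq", "df"))'))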

Since the difference in intercepts between men and women is negligible for the EP indicator, we can force the intercepts to be equal for this indicator. Let us build on the second specification variant from above in order to set up this model. Through group.equal we constrain all loadings and intercepts to be equal across groups and subsequently free up the SP loading as well as the intercepts for DP, HP, and SP using group.partial:

ro.r("GP_model <- 'GP =~ EP + HP + DP + SP'")

ro.r("""
      fitBase1 <- lavaan::cfa(GP_model, data = Bergh, 
      group = "gender", group.equal = c("loadings", "intercepts"),
      group.partial = c("GP=~SP", "DP~1", "HP~1", "SP~1"),
      estimator = "MLR")
     """)

fitBase1_df = fit_measures('fitBase1')
print_fit_summary(fitBase1_df)
It gives a χ²-value of 7.321 (df = 6, p = 0.292), a RMSEA of 0.023 with a 90% CI of [0.000, 0.069], a CFI of 0.998, and an SRMR of 0.028.

Again, this model fits well. We now use this model as the baseline model and compare it to two additional models. First (1), we constrain the SP loading to be 0 for the women (ingroup-outgroup model). We can specify this restriction directly in the model specification through c(NA,0). NA leaves the loading free for the men (first group), whereas the second element fixes the parameter to 0 for the women. Note that this is quite a restrictive assumption.

This step turns our CFA from being purely descriptive into a theory-driven hypothesis test:

  • Men view sexism (SP) as part of generalized prejudice (i.e., it loads onto GP)

  • Women do not, i.e., sexism is not psychologically integrated into a broader prejudice construct for them

ro.r("GP_model2 <- 'GP =~ c(v1,v1)*EP + c(v2,v2)*HP + c(v3,v3)*DP + c(NA, 0)*SP'")

ro.r("""
      fitIO <- lavaan::cfa(GP_model2, data = Bergh,
      group = "gender", group.equal = c("intercepts"),
      group.partial = c("DP~1", "HP~1", "SP~1"),
      estimator = "MLR")
     """)

fitIO_df = fit_measures('fitIO')
print_fit_summary(fitIO_df)
It gives a χ²-value of 233.756 (df = 7, p = 0), a RMSEA of 0.274 with a 90% CI of [0.245, 0.305], a CFI of 0.678, and an SRMR of 0.148.

Bad fit.

In the next model (2), we restrict all the loadings to be equal (marginalization model). The intercepts have the same constraints as above. The model assumes:

  • The structure of prejudice (GP) is shared across gender (i.e., loadings are equal)

  • But certain prejudices are experienced or expressed differently in terms of baseline levels (intercepts) — particularly HP, DP, SP — likely because women are more often targets of these forms of prejudice

This aligns with theories suggesting men and women perceive prejudice similarly, but women internalize or report it differently, especially on issues that affect them directly.

ro.r("GP_model <- 'GP =~ EP + HP + DP + SP'")

ro.r("""
     fitMarg <- lavaan::cfa(GP_model, data = Bergh,
      group = "gender", group.equal = c("loadings", "intercepts"),
      group.partial = c("DP~1", "HP~1", "SP~1"),
      estimator = "MLR")
     """)

fitMarg_df = fit_measures('fitMarg')
print_fit_summary(fitMarg_df)
It gives a χ²-value of 14.496 (df = 7, p = 0.043), a RMSEA of 0.050 with a 90% CI of [0.008, 0.086], a CFI of 0.989, and an SRMR of 0.036.

We get a \(\chi^2\)-value of 14.496 (df = 7, p = 0.043), a RMSEA of 0.05 with a 90% CI of [0.008, 0.086], a CFI of 0.989, and an SRMR of 0.036. In terms of goodness-of-fit statistics, we do not see much of a difference compared to the baseline model. Let us do some model comparison. Since the marginalization model is nested in the base model, we can apply the following \(\chi^2\)-difference test. Note that a significant result suggests that the more restricted model fits significantly worse.

print(ro.r("anova(fitMarg, fitBase1)"))
Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")

lavaan->lavTestLRT():  
   lavaan NOTE: The "Chisq" column contains standard test statistics, not the 
   robust test that should be reported per model. A robust difference test is 
   a function of two standard (not robust) statistics.
         Df    AIC    BIC   Chisq Chisq diff Df diff Pr(>Chisq)  
fitBase1  6 7377.7 7482.4  7.3214                                
fitMarg   7 7382.9 7482.8 14.4960     6.4063       1    0.01137 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation#

The scaled chi-square difference test comparing the marginalization model (fitMarg) to the baseline model (fitBase1) yielded a significant result:
χ²_diff(1) = 6.41, p = .011. This suggests that constraining the loadings to be equal across gender (as done in fitMarg) results in a statistically worse fit compared to the less constrained model.

However, this statistical significance is small in practical terms: the AIC and BIC differences are minimal, and both models show good overall fit in the absolute fit indices (CFI ≥ 0.99, RMSEA ≤ 0.05).

Therefore, despite the statistical difference, the marginalization model remains theoretically and empirically defensible, especially given its alignment with the idea that prejudice has a shared structure across gender, but differently experienced or expressed (i.e., varying intercepts for personally relevant domains like SP, HP, and DP). In contexts prioritizing interpretability and theoretical grounding, the marginalization model may be a preferable choice.
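The "minimal AIC and BIC differences" claim is easy to check with the fit_measures() helper defined earlier; a short sketch:

# Compute the information-criterion differences between the two models
ic_base = fit_measures('fitBase1').set_index('measure')['value']
ic_marg = fit_measures('fitMarg').set_index('measure')['value']
print(f"ΔAIC = {ic_marg['aic'] - ic_base['aic']:.1f}, "
      f"ΔBIC = {ic_marg['bic'] - ic_base['bic']:.1f}")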

Conclusions#

The alternative approach presented here may be quite confusing as it wasn't covered in the lecture. So let's put it into words.

The goal here is to investigate the between-group measurement (in)variance of one specific indicator. In the approach mentioned before (and in the lecture) we investigated whether measurement invariance holds for all indicators.

As you might have guessed, the indicator we are investigating in this alternative approach is SP. Let us revisit the steps:

  1. We fit a model in which all loadings except the one for SP are constrained to be equal across groups (fitBase1).

  2. We fit alternative models with alternative properties of the SP loading (all the other loadings stay the same): GP_model2 (fitIO) fixes the SP loading to 0 for women but leaves it free for men; fitMarg constrains all loadings to be equal.

  3. We compare model fits. Let us check the comparison of fitMarg and fitBase1 again:

print(ro.r("anova(fitMarg, fitBase1)"))
Scaled Chi-Squared Difference Test (method = "satorra.bentler.2001")

lavaan->lavTestLRT():  
   lavaan NOTE: The "Chisq" column contains standard test statistics, not the 
   robust test that should be reported per model. A robust difference test is 
   a function of two standard (not robust) statistics.
         Df    AIC    BIC   Chisq Chisq diff Df diff Pr(>Chisq)  
fitBase1  6 7377.7 7482.4  7.3214                                
fitMarg   7 7382.9 7482.8 14.4960     6.4063       1    0.01137 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Before we look at the values, let's think about what this comparison is good for. In the fitBase1 model we allow the loading of SP to vary between men and women. In the fitMarg model we assume it to be equal for both groups. Now we can apply our 'normal' principle again: if the more restrictive model that assumes measurement invariance (fitMarg) is not significantly worse than the model that allows the loading to vary (fitBase1), we assume measurement invariance. The critical point here is that we only investigated measurement invariance for one indicator (SP) and not for the whole model (as done in the previous section).
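Since steps 1–3 form a recipe, we can wrap the single-indicator check into a small helper. This function and its name are our own convenience wrapper, not part of lavaan or semTools, and it uses a simplified version of the comparison that only involves loading constraints (no intercept constraints, unlike fitBase1/fitMarg above):

def test_loading_invariance(indicator):
    """Chi-square difference test for metric invariance of a single indicator."""
    ro.r(f"""
        fit_eq   <- lavaan::cfa(GP_model, data = Bergh, group = "gender",
                                group.equal = c("loadings"), estimator = "MLR")
        fit_free <- lavaan::cfa(GP_model, data = Bergh, group = "gender",
                                group.equal = c("loadings"),
                                group.partial = c("GP=~{indicator}"),
                                estimator = "MLR")
        """)
    return ro.r("anova(fit_eq, fit_free)")

# Example: test whether the SP loading is invariant across gender
print(test_loading_invariance("SP"))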