6.1 Tau-Congeneric Models#

Usage#

This notebook illustrates the tau-congeneric measurement model using Data_EmotionalClarity.dat. Load the dataset, explore descriptive statistics, and fit the model via lavaan with rpy2.

The dataset#

For this exercise we use a dataset from Lischetzke (2003). The construct we want to measure is emotional clarity, by means of reaction times (RT) on a mood intensity scale. The assumption is that the faster people can assess their mood, the greater their emotional clarity.

Load and inspect the full data set#

# General imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Rpy2 imports
from rpy2 import robjects as ro
from rpy2.robjects import pandas2ri, numpy2ri
from rpy2.robjects.packages import importr

# Automatic conversion of arrays and dataframes
pandas2ri.activate()
numpy2ri.activate()

# Set random seed for reproducibility
ro.r('set.seed(123)')

# IPython extension for R magics (e.g. %%R cells)
%load_ext rpy2.ipython

# R imports
importr('base')
importr('lavaan')
importr('psych')
importr('stats')
file_name = "data/Data_EmotionalClarity.dat"
dat = pd.read_csv(file_name, sep="\t")
print(dat.head())
   sex    item_1    item_2    item_3    item_4    item_5    item_6
0    1  1.463255  1.739589  1.384292  1.568408  1.457452  1.628260
1    1  1.689358  1.789256  1.771557  1.696533  1.395997  1.842294
2    0  1.300736  1.492455  1.347294  1.178347  1.784903  1.221125
3    0  1.588419  1.459545  1.300736  1.278152  1.145496  1.446213
4    0  1.182953  0.914289  0.997686  1.357895  0.875052  1.232852

Extract items 1 to 6 for the analysis#

dat2 = dat.iloc[:, 1:7]
print(dat2.head())
     item_1    item_2    item_3    item_4    item_5    item_6
0  1.463255  1.739589  1.384292  1.568408  1.457452  1.628260
1  1.689358  1.789256  1.771557  1.696533  1.395997  1.842294
2  1.300736  1.492455  1.347294  1.178347  1.784903  1.221125
3  1.588419  1.459545  1.300736  1.278152  1.145496  1.446213
4  1.182953  0.914289  0.997686  1.357895  0.875052  1.232852
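Before fitting, it is worth glancing at the items' descriptive statistics and intercorrelations. As a minimal sketch, the snippet below rebuilds only the five rows printed above (a stand-in for the full dataset of 238 observations); in the live notebook you would simply call `dat2.describe()` and `dat2.corr()` on the full data.

```python
import pandas as pd

# First five rows of the six items, copied from the output above
# (illustration only -- the real analysis uses all 238 rows in dat2)
rows = [
    [1.463255, 1.739589, 1.384292, 1.568408, 1.457452, 1.628260],
    [1.689358, 1.789256, 1.771557, 1.696533, 1.395997, 1.842294],
    [1.300736, 1.492455, 1.347294, 1.178347, 1.784903, 1.221125],
    [1.588419, 1.459545, 1.300736, 1.278152, 1.145496, 1.446213],
    [1.182953, 0.914289, 0.997686, 1.357895, 0.875052, 1.232852],
]
demo = pd.DataFrame(rows, columns=[f"item_{i}" for i in range(1, 7)])

# Per-item means and SDs, plus the intercorrelations lavaan will model
print(demo.describe().loc[["mean", "std"]])
print(demo.corr().round(2))
```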

Tau-Congeneric Measurement Model#

We will now start testing the measurement models that were covered in the lecture part of this course.

The tau-congeneric measurement model is the least restrictive of the measurement models we will use today. It assumes that:

  • items differ in their difficulty

  • items differ in their discrimination power

  • items vary in their reliability

We therefore obtain estimates for the loadings (Latent Variables section), the intercepts (Intercepts section), and the errors (Variances section).

Fit the model#

We are now going to define the model using lavaan syntax.

# Put data into R
ro.globalenv['dat2'] = dat2
# Specify the model
ro.r("mtc <- 'eta =~ item_1 + item_2 + item_3 + item_4 + item_5 + item_6'")
# Fit the model
ro.r('fitmtc <- sem(mtc, data=dat2, meanstructure=TRUE)')
# Print the output of the model for interpretation
summary_fitmtc = ro.r("summary(fitmtc, fit.measures=TRUE, standardized=TRUE)")
print(summary_fitmtc)
lavaan 0.6-19 ended normally after 38 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        18

  Number of observations                           238

Model Test User Model:
                                                      
  Test statistic                                 9.568
  Degrees of freedom                                 9
  P-value (Chi-square)                           0.387

Model Test Baseline Model:

  Test statistic                               435.847
  Degrees of freedom                                15
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.999
  Tucker-Lewis Index (TLI)                       0.998

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)               -432.180
  Loglikelihood unrestricted model (H1)       -427.396
                                                      
  Akaike (AIC)                                 900.360
  Bayesian (BIC)                               962.861
  Sample-size adjusted Bayesian (SABIC)        905.806

Root Mean Square Error of Approximation:

  RMSEA                                          0.016
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.076
  P-value H_0: RMSEA <= 0.050                    0.763
  P-value H_0: RMSEA >= 0.080                    0.036

Standardized Root Mean Square Residual:

  SRMR                                           0.021

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  eta =~                                                                
    item_1            1.000                               0.229    0.638
    item_2            1.098    0.131    8.404    0.000    0.251    0.682
    item_3            1.194    0.140    8.535    0.000    0.273    0.697
    item_4            1.294    0.147    8.790    0.000    0.296    0.728
    item_5            1.032    0.131    7.886    0.000    0.236    0.628
    item_6            1.049    0.133    7.886    0.000    0.240    0.628

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .item_1            1.504    0.023   64.757    0.000    1.504    4.198
   .item_2            1.423    0.024   59.647    0.000    1.423    3.866
   .item_3            1.392    0.025   54.862    0.000    1.392    3.556
   .item_4            1.305    0.026   49.486    0.000    1.305    3.208
   .item_5            1.346    0.024   55.221    0.000    1.346    3.579
   .item_6            1.306    0.025   52.662    0.000    1.306    3.414

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .item_1            0.076    0.008    9.363    0.000    0.076    0.593
   .item_2            0.072    0.008    8.941    0.000    0.072    0.534
   .item_3            0.079    0.009    8.770    0.000    0.079    0.514
   .item_4            0.078    0.009    8.362    0.000    0.078    0.471
   .item_5            0.086    0.009    9.449    0.000    0.086    0.606
   .item_6            0.089    0.009    9.449    0.000    0.089    0.606
    eta               0.052    0.010    5.051    0.000    1.000    1.000

Alternatively, you can have your parameters organized tidily in a dataframe, similar to the semopy outputs from last semester:

pe = ro.r('parameterEstimates(fitmtc)')        # R → pandas
pe = pandas2ri.rpy2py(pe)
pe
lhs op rhs est se z pvalue ci.lower ci.upper
1 eta =~ item_1 1.000000 0.000000 NaN NaN 1.000000 1.000000
2 eta =~ item_2 1.098183 0.130681 8.403543 0.000000e+00 0.842053 1.354313
3 eta =~ item_3 1.193622 0.139846 8.535263 0.000000e+00 0.919529 1.467715
4 eta =~ item_4 1.294106 0.147228 8.789826 0.000000e+00 1.005545 1.582667
5 eta =~ item_5 1.032055 0.130870 7.886123 3.108624e-15 0.775555 1.288555
6 eta =~ item_6 1.049491 0.133084 7.885902 3.108624e-15 0.788650 1.310332
7 item_1 ~~ item_1 0.076077 0.008125 9.362691 0.000000e+00 0.060151 0.092002
8 item_2 ~~ item_2 0.072360 0.008093 8.941056 0.000000e+00 0.056498 0.088222
9 item_3 ~~ item_3 0.078730 0.008978 8.769699 0.000000e+00 0.061135 0.096326
10 item_4 ~~ item_4 0.077839 0.009309 8.361802 0.000000e+00 0.059594 0.096085
11 item_5 ~~ item_5 0.085767 0.009077 9.449296 0.000000e+00 0.067977 0.103556
12 item_6 ~~ item_6 0.088700 0.009387 9.449471 0.000000e+00 0.070302 0.107097
13 eta ~~ eta 0.052306 0.010357 5.050545 4.405515e-07 0.032008 0.072604
14 item_1 ~1 1.504005 0.023225 64.756711 0.000000e+00 1.458484 1.549526
15 item_2 ~1 1.422903 0.023855 59.647002 0.000000e+00 1.376148 1.469659
16 item_3 ~1 1.392156 0.025376 54.862182 0.000000e+00 1.342421 1.441892
17 item_4 ~1 1.304696 0.026365 49.485916 0.000000e+00 1.253021 1.356370
18 item_5 ~1 1.346359 0.024381 55.220698 0.000000e+00 1.298572 1.394145
19 item_6 ~1 1.305712 0.024794 52.661972 0.000000e+00 1.257117 1.354308
20 eta ~1 0.000000 0.000000 NaN NaN 0.000000 0.000000

Model fit#

Before we look at the model parameters, let’s first review the model fit indices.
The insignificant p-value for the \(\chi^2\) test indicates that our model implied correlation matrix doesn’t deviate significantly from the data implied correlation matrix, suggesting a good fit. Furthermore, the CFI and TLI are > .95, also indicating a good fit. AIC and BIC can’t be interpreted individually but will be later used for comparing models (see below). Lastly, the RMSEA and SRMR are also < .08, also suggesting a good fit. In summary, all indices suggest that the models fits our data well.
As a reminder - the usual limit value / criteria for the various fit indices:

  • Chi-square / Chi-square p-value: The \(\chi^2\)-Test tests the null hypothesis that the model implied covariance matrix is equal to the empirical (actual) covariance matrix. Therefore, a low test statistic (and a non-significant p-value) indicate good fit.

  • CFI: The CFI compares the fit of your user-specified model to the baseline model, with values closer to 1 indicating that the user model fits much better than the baseline. Values > .95 are desirable.

  • AIC & BIC: measures of the relative quality of the statistical model for a given set of data (BIC includes a penalty for the number of parameters in the model). Lower AIC & BIC values indicate a better model. These statistics can only be used for comparison, not as an absolute criterion.

  • RMSEA: The RMSEA can be seen as a statistic derived from the \(\chi^2\) test, adjusted for model complexity and less influenced by sample size. An RMSEA value of <0.08 indicates an adequate fit.
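The checklist above can be turned into a small helper. In the live session you could pull these values directly with lavaan's `fitMeasures(fitmtc)` via `ro.r`; here, as a plain-Python sketch, the values are copied from the summary output above.

```python
# Fit measures copied from the lavaan summary above
fit = {"chisq_pvalue": 0.387, "cfi": 0.999, "tli": 0.998,
       "rmsea": 0.016, "srmr": 0.021}

def acceptable_fit(fit):
    """Apply the conventional cutoffs discussed above."""
    return (fit["chisq_pvalue"] > 0.05   # model not rejected by the chi-square test
            and fit["cfi"] > 0.95
            and fit["tli"] > 0.95
            and fit["rmsea"] < 0.08
            and fit["srmr"] < 0.08)

print(acceptable_fit(fit))  # the tau-congeneric model meets all criteria
```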

Latent variables section#

Larger loadings can be interpreted as the respective item having a higher discrimination power. Note that the loading of item_1 is fixed to 1.000 to set the scale of the latent variable. For example, item_2 has a loading of 1.098 while item_4 has a loading of 1.294, meaning that the same increase in the latent variable (i.e., the trait we measure) results in a larger difference in item_4 than in item_2. Graphically, this is represented by item_4 having a steeper slope. You might notice that the loadings are quite similar across items; keep this in mind for later.
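A loading is literally a slope, so the model-implied effect of a one-SD shift in the trait is just loading × SD(eta). The sketch below uses the unstandardized loadings and the eta variance (0.052) from the output above; the results approximately reproduce the Std.lv column.

```python
# Unstandardized loadings for two items, from the Latent Variables section above
loadings = {"item_2": 1.098, "item_4": 1.294}

# One standard deviation of the latent trait: SD(eta) = sqrt(0.052) ~ 0.228
delta_eta = 0.052 ** 0.5

# Model-implied change in each item's expected RT for a one-SD trait increase;
# item_4's larger loading (steeper slope) yields the larger response
for item, lam in loadings.items():
    print(item, round(lam * delta_eta, 3))
```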

Intercepts section#

The intercepts can be used to interpret the difficulty of an item: larger values indicate that an item is more difficult. However, watch out: the interpretation can differ in other cases. Here, larger intercepts correspond to longer reaction times, meaning that, according to the theory, the mood assessed by this item is ‘less emotionally clear’. In contrast, if we wanted to assess intelligence via the percentage of correct answers on a test, a larger intercept would mean that even individuals with zero (or average, if centered) intelligence would obtain a high percentage of correct answers, meaning the item is actually too easy. (Technically, you can always say that an item with a larger intercept is more difficult; the substantive interpretation, however, can differ.)
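Because the latent mean is fixed to 0, each intercept is the expected item score for a person with average (zero) trait level. The simulation below illustrates this for item_4, plugging in the fitted parameters from the output above (intercept 1.305, loading 1.294, eta variance 0.052, residual variance 0.078): the simulated mean recovers the intercept.

```python
import numpy as np

rng = np.random.default_rng(123)
n = 100_000

# Generate item_4 scores from the fitted measurement equation:
#   y = intercept + loading * eta + error
eta = rng.normal(0.0, np.sqrt(0.052), n)    # latent trait, mean fixed to 0
error = rng.normal(0.0, np.sqrt(0.078), n)  # item_4's residual variance
y4 = 1.305 + 1.294 * eta + error

# With E[eta] = 0, the expected item score equals the intercept
print(round(y4.mean(), 2))
```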

Variances section#

The variances refer to the reliability of the items. In CFA terms, they represent the residual (error) variances associated with the items: the larger an item’s residual variance, the less reliable the item. The last row shows the variance of the latent variable.
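In a congeneric model, each item's reliability is the square of its standardized loading, and the standardized residual variance is its complement. As a check, the sketch below squares the Std.all loadings from the Latent Variables section above; the complements match the Std.all residual variances up to rounding.

```python
# Standardized loadings (Std.all) from the Latent Variables section above
std_loadings = {"item_1": 0.638, "item_2": 0.682, "item_3": 0.697,
                "item_4": 0.728, "item_5": 0.628, "item_6": 0.628}

# Reliability = squared standardized loading;
# 1 - reliability = standardized residual variance (Variances section)
for item, lam in std_loadings.items():
    rel = lam ** 2
    print(item, "reliability:", round(rel, 3), "residual:", round(1 - rel, 3))
```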