Solved – Estimating effect size for mean difference (Cohen’s d) in a structural equation model

I would like to ask for your help in the following matter. I have a structural equation model (SEM), one of whose regression parts estimates the effect of gender and a few other factors on a latent variable E.

I would like to know whether the effect of gender is practically significant, obtaining something along the lines of Cohen's d. The material on effect size measures in SEM is surprisingly scarce. Having read this, I imagined it is possible to use the fact that gender is dichotomous, and to treat the unstandardized SEM regression coefficient as the difference between men and women. Let's say that coefficient is -0.224 (men as reference).

The standard deviation of the outcome E is 0.967 in the sample. The pooled SD for the sample is 0.964. The residual variance of E based on the SEM model is 0.781 (giving a residual SD of 0.884).

My question is whether this is a proper way to obtain Cohen's d, i.e. by dividing the between-group difference (controlled for other factors) by the residual SD (which I guess would be more appropriate in this situation than the pooled sample SD). More specifically, 0.224/0.884 = 0.253, indicating a small but still meaningful effect?
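To make the choice of denominator concrete, here is a minimal Python sketch that repeats the arithmetic above with each of the three candidate SDs. The numbers are the ones quoted in the question; the variable and function names are illustrative only, and the sign of the coefficient is dropped as in the calculation above.

```python
import math

# Values quoted in the question (absolute value of the coefficient).
coef_gender = 0.224   # unstandardized coefficient for gender, men as reference
sd_sample = 0.967     # SD of E in the whole sample
sd_pooled = 0.964     # pooled SD, as reported in the question
resid_var = 0.781     # residual variance of E from the SEM

# Residual SD is the square root of the residual variance.
sd_resid = math.sqrt(resid_var)

def standardized_difference(diff, sd):
    """Mean difference divided by a chosen standard deviation."""
    return diff / sd

d_by_denominator = {
    "sample SD": standardized_difference(coef_gender, sd_sample),
    "pooled SD": standardized_difference(coef_gender, sd_pooled),
    "residual SD": standardized_difference(coef_gender, sd_resid),
}

for label, d in d_by_denominator.items():
    print(f"{label}: d = {d:.3f}")
```

Because the residual SD excludes the variance explained by the other predictors, it is smaller than the sample or pooled SD and therefore yields the largest of the three values. The result is a difference standardized conditional on the covariates, which is not quite the same quantity as a conventional Cohen's d.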

Or are there better methods to estimate the effect size in this case? Thank you for your comments and clarifications.

I think probably the best way to do this is with a multi-group SEM approach. Here's how I would go about it (you can find descriptions of this approach in many intro CFA/SEM texts, like Beaujean, 2014; Brown, 2015; and Little, 2013):

  1. Given that you are interested in comparing means between groups, I would consider identifying and setting the scale for E using the effects-coding approach (see Little, Slegers, & Card, 2006 for a description). This will ensure that E is scaled on the same metric as its original indicators, which might help with interpretation compared with other methods of scale-setting. In a nutshell, the effects-coding method requires you to constrain the loadings of E to average 1, and the indicator intercepts to average 0.
  2. Fit a global measurement model of E, and ensure that it fits well by conventional standards (e.g., Hu & Bentler, 1999), since fit will only get worse once you move to multi-group evaluations.
  3. Using a multi-group approach, test for measurement invariance of E between your gender groups. Specifically, you need to ensure you have evidence of configural (i.e., same general pattern of factors/loadings), weak (i.e., equality of factor loadings) and strong invariance (i.e., equality of observed variable intercepts), in order to make valid inferences about comparisons of latent means (see Vandenberg & Lance, 2000, for a review). Consider both traditional $\chi^2$ difference tests, and evaluating the magnitude of change in $CFI$ as means of conducting invariance-related nested model comparisons (Cheung & Rensvold, 2002).
  4. Fit a further model that constrains the latent means to equality between gender groups, and compare it with the strong invariance model to test whether the means differ significantly. If you are only interested in estimating Cohen's $d$, and not in testing whether the mean difference differs from zero, you can skip this step.
  5. Look back at the output from the strong invariance model, in which latent means were allowed to vary between groups. From this model, you now have an estimate of each group's latent mean and variance, which should be sufficient to calculate $d$ in the traditional manner. If you want to save yourself some calculations, you could use phantom variables (see Little, 2013, for a nice description of this technique) to estimate latent standard deviations for you, which you can then plug into your calculations for $d$.
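As a rough illustration of the calculation in step 5, the following Python sketch computes $d$ from each group's latent mean and variance using the usual pooled SD. The function name and all of the numbers are hypothetical placeholders, not estimates from the model in the question; substitute the values from your own strong invariance output.

```python
import math

def cohens_d_from_latent(mean_1, var_1, n_1, mean_2, var_2, n_2):
    """Cohen's d from two groups' latent means and variances.

    The pooled variance weights each group's variance by (n - 1).
    The sign follows group 2 minus group 1.
    """
    pooled_var = ((n_1 - 1) * var_1 + (n_2 - 1) * var_2) / (n_1 + n_2 - 2)
    return (mean_2 - mean_1) / math.sqrt(pooled_var)

# Hypothetical estimates from a strong invariance model (assumed values).
men = {"mean": 3.10, "var": 0.95, "n": 250}
women = {"mean": 2.88, "var": 0.91, "n": 270}

d = cohens_d_from_latent(men["mean"], men["var"], men["n"],
                         women["mean"], women["var"], women["n"])
print(f"d = {d:.3f}")
```

With men entered as group 1, a negative $d$ means that women have the lower latent mean, which matches the sign convention of a regression coefficient with men as the reference category.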

The main perk of this approach, as I see it, is that it allows you to test the measurement invariance assumptions that are implicit in your comparison of group means. The approach you describe above, while generally reasonable, assumes that the construct E "means" the same thing for both genders, without actually testing whether it does or not. In my opinion, there is little point in using an SEM approach to test these sorts of hypotheses unless you are going to take full advantage of the benefits the SEM approach provides, so I'd strongly advocate for the steps I've described above.
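The nested model comparisons used in the invariance tests reduce to simple arithmetic on the fit statistics of two models. Below is a minimal Python sketch, assuming ordinary (unscaled) maximum likelihood $\chi^2$ values; the fit values are made up for illustration, and `chi2_sf` and `compare_nested` are hypothetical helper names rather than functions from any package. The upper-tail probability is computed with a plain series expansion so that the example needs only the standard library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate.

    Uses the series for the regularized lower incomplete gamma function,
    which is adequate for the moderate values typical of difference tests.
    """
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def compare_nested(chisq_constrained, df_constrained, cfi_constrained,
                   chisq_free, df_free, cfi_free):
    """Chi-square difference test and change in CFI for two nested models."""
    delta_chisq = chisq_constrained - chisq_free
    delta_df = df_constrained - df_free
    return {
        "delta_chisq": delta_chisq,
        "delta_df": delta_df,
        "p_value": chi2_sf(delta_chisq, delta_df),
        "delta_cfi": cfi_constrained - cfi_free,
    }

# Hypothetical fit statistics: weak invariance (constrained) vs. configural.
result = compare_nested(chisq_constrained=53.9, df_constrained=19,
                        cfi_constrained=0.979,
                        chisq_free=48.2, df_free=16, cfi_free=0.981)
print(result)
```

A non-significant $\chi^2$ difference together with a drop in $CFI$ of no more than about .01 (the rule of thumb usually attributed to Cheung & Rensvold, 2002) is commonly read as support for the more constrained model. If your software reports a scaled $\chi^2$ (for example with robust estimators), the simple difference shown here does not apply and a scaled difference test is needed instead.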

If you're not used to testing measurement invariance, you should check out Beaujean's (2014) book, and the lavaan and semTools packages for R, which make evaluating measurement invariance (and latent mean equivalence) between groups a breeze, though doing so with effects-coding requires a bit more coding work.


Beaujean, A. A. (2014). Latent Variable Modeling Using R: A Step-by-Step Guide. New York, NY: Routledge.

Brown, T. A. (2015). Confirmatory Factor Analysis for Applied Research (2nd ed.). New York, NY: Guilford Press.

Cheung, G. W., & Rensvold, R. B. (2002). Evaluating Goodness-of-Fit Indexes for Testing Measurement Invariance. Structural Equation Modeling, 9, 233-255.

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.

Little, T. D. (2013). Longitudinal Structural Equation Modeling. New York, NY: Guilford Press.

Little, T. D., Slegers, D. W., & Card, N. A. (2006). A Non-arbitrary Method of Identifying and Scaling Latent Variables in SEM and MACS Models. Structural Equation Modeling, 13, 59-72.

Vandenberg, R. J., & Lance, C. E. (2000). A Review and Synthesis of the Measurement Invariance Literature: Suggestions, Practices, and Recommendations for Organizational Research. Organizational Research Methods, 3, 4-70.
