In my previous post I introduced what mediation analysis means and how mediation is commonly tested. This post shows how to test a simple mediation model with one mediator in Stata, using Structural Equation Modeling (SEM). To illustrate, I will use the same example as before, in which job satisfaction mediates the relationship between work incentives and job performance.

The advantage of SEM over separate linear regression models in mediation analysis is that it lets you define latent variables and estimates all paths simultaneously. For the sake of simplicity, let’s first assume that all three variables are observed. The following command fits the mediation model described above:

**sem (satisfaction <- incentives) (performance <- satisfaction incentives)**

The direct effect of work incentives on job performance is estimated by the coefficient for performance<-incentives. The indirect effect is estimated by the product of the coefficient for satisfaction<-incentives and the coefficient for performance<-satisfaction, also known as the *ab* product.
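Written out as equations, the model and the two effects look roughly as follows. The labels *a* and *b* follow the *ab* product mentioned above; the label c′ for the direct path, the intercepts i₁ and i₂, and the error terms e₁ and e₂ are conventional notation assumed here, not taken from the Stata output.

$$
\begin{aligned}
\text{satisfaction} &= i_1 + a \cdot \text{incentives} + e_1 \\
\text{performance} &= i_2 + c' \cdot \text{incentives} + b \cdot \text{satisfaction} + e_2 \\
\text{indirect effect} &= a \cdot b \\
\text{total effect} &= c' + a \cdot b
\end{aligned}
$$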

We have two options to estimate the indirect and total effects. We can use the **estat teffects** postestimation command, or we can compute them ourselves using the **nlcom** command. The syntax to compute the indirect effect ourselves in this example is:

**nlcom _b[satisfaction:incentives]*_b[performance:satisfaction]**

and for the total effect:

**nlcom _b[satisfaction:incentives]*_b[performance:satisfaction] + _b[performance:incentives]**
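The other option mentioned above, **estat teffects**, can be sketched as follows. To keep the example runnable on its own, it simulates a small dataset first; the simulated values (sample size, path coefficients, seed) are assumptions made purely for illustration and are not data from this post.

```stata
* Simulated data for illustration only (assumed values, not real data)
clear
set seed 12345
set obs 500
generate incentives   = rnormal()
generate satisfaction = 0.5*incentives + rnormal()
generate performance  = 0.4*satisfaction + 0.3*incentives + rnormal()

* Fit the observed-variable mediation model
sem (satisfaction <- incentives) (performance <- satisfaction incentives)

* Report direct, indirect and total effects in one table
estat teffects
```

With your own data you would skip the simulation step and run only the last two commands. The indirect effect that **estat teffects** reports for performance<-incentives should match the *ab* product computed with **nlcom**.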

Now imagine that our study variables are latent; in other words, each is measured via a set of observed items. Let’s assume that satisfaction is measured by four Likert-scale questions (satis1 to satis4), work incentives by three Likert-scale questions (incen1 to incen3), and job performance by two indicators of performance (perf1 and perf2).

To incorporate the measurement model into the SEM equations, as reflected in the figure above, we would use the following syntax:

**sem (satis1 satis2 satis3 satis4 <- Satisfaction) /// measurement piece**

**(incen1 incen2 incen3 <- Incentives) (perf1 perf2 <- Performance) /// measurement piece**

**(Satisfaction <- Incentives) (Performance <- Satisfaction Incentives) // structural piece**

Note that in a do-file the first two lines must end with ///, which continues the command onto the next line; a plain // comment would end the command there.
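The indirect and total effects can be computed for the latent model in the same way, using the latent variable names in the coefficient references. The sketch below simulates indicator data so that it runs on its own; the simulated factor scores (f_inc, f_sat, f_perf), loadings, path values and seed are assumptions for illustration, and the indicators are continuous rather than true Likert items. Running **sem, coeflegend** is a convenient way to check the exact coefficient names before passing them to **nlcom**.

```stata
* Simulated indicators for illustration only (assumed values, not real data)
clear
set seed 2024
set obs 1000
generate double f_inc  = rnormal()
generate double f_sat  = 0.5*f_inc + rnormal()
generate double f_perf = 0.4*f_sat + 0.3*f_inc + rnormal()
forvalues i = 1/3 {
    generate incen`i' = f_inc + rnormal()
}
forvalues i = 1/4 {
    generate satis`i' = f_sat + rnormal()
}
forvalues i = 1/2 {
    generate perf`i' = f_perf + rnormal()
}
drop f_inc f_sat f_perf

* Measurement and structural pieces in one command
sem (satis1 satis2 satis3 satis4 <- Satisfaction) ///
    (incen1 incen2 incen3 <- Incentives) (perf1 perf2 <- Performance) ///
    (Satisfaction <- Incentives) (Performance <- Satisfaction Incentives)

* Replay the results showing the _b[] names of each coefficient
sem, coeflegend

* Indirect and total effects among the latent variables
nlcom (indirect: _b[Satisfaction:Incentives]*_b[Performance:Satisfaction]) ///
      (total: _b[Satisfaction:Incentives]*_b[Performance:Satisfaction] + _b[Performance:Incentives])
```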

Bootstrapping is recommended when estimating the indirect effect, since the product of two coefficients is generally not normally distributed and the bootstrap provides more accurate standard errors and confidence intervals. I particularly recommend the percentile or bias-corrected intervals, since they reflect the asymmetry in the sampling distribution of the indirect effect. In other words, they do not force symmetric CIs the way normal-based CIs do, which simply plug the bootstrap standard error into the conventional normal CI formula.

The following syntax creates a program that computes percentile and bias-corrected (BC) bootstrap intervals. The advantage of this program is that it can be adapted to a more complex model, such as one with two mediators.

**capture program drop bootm1**

**program bootm1, rclass**

**sem (satisfaction <- incentives) (performance <- satisfaction incentives)**

**return scalar indeff = _b[satisfaction:incentives]*_b[performance:satisfaction]**

**end**

**bootstrap r(indeff), reps(1000): bootm1**

**estat bootstrap, percentile bc**
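As a rough sketch of how the program might be adapted to two mediators, the example below adds a second, hypothetical mediator called engagement. The variable engagement, the program name bootm2, the simulated data, and the seed are all assumptions introduced for illustration. Whether to also let the two mediators' errors covary (for example with the sem option cov(e.satisfaction*e.engagement)) is a modeling choice that this sketch leaves out.

```stata
* Simulated data for illustration only (assumed values, not real data)
clear
set seed 12345
set obs 500
generate incentives   = rnormal()
generate satisfaction = 0.5*incentives + rnormal()
generate engagement   = 0.4*incentives + rnormal()   // hypothetical second mediator
generate performance  = 0.4*satisfaction + 0.3*engagement + 0.2*incentives + rnormal()

capture program drop bootm2
program bootm2, rclass
    * Two parallel mediators between incentives and performance
    sem (satisfaction <- incentives) (engagement <- incentives) ///
        (performance <- satisfaction engagement incentives)
    * Specific indirect effect through each mediator
    return scalar ind1 = _b[satisfaction:incentives]*_b[performance:satisfaction]
    return scalar ind2 = _b[engagement:incentives]*_b[performance:engagement]
    * Total indirect effect: sum of the two specific indirect effects
    return scalar indtot = _b[satisfaction:incentives]*_b[performance:satisfaction] ///
        + _b[engagement:incentives]*_b[performance:engagement]
end

* Bootstrap all three quantities; seed() makes the resampling reproducible
bootstrap r(ind1) r(ind2) r(indtot), reps(1000) seed(12345): bootm2
estat bootstrap, percentile bc
```

Returning each specific indirect effect separately, alongside their sum, lets you see which mediator carries the effect rather than only the combined total.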

Note that this post has covered only mediation analysis with one mediator. I have also assumed that all variables are measured at the same level. However, it is common to have more than one mediator and to have data measured at different levels (e.g. employees nested within firms), which is also known as multilevel or clustered data. More to come…