## Mediation Analysis: an introduction

In statistics, mediation refers to a model that tries to explain the mechanism through which an independent variable (X) influences an outcome (Y) via the intervening effect of a third variable, known as the mediator or mediating variable (M). It is important to keep in mind that this is a causal model, since it is hypothesized that M influences Y and not vice versa.

[Figure: mediation diagram]

Looking at the diagram above, the top half represents the unmediated causal relationship between X and Y. The path c is also known as the total effect. In the bottom half of the diagram, a third variable, the mediator, has been included to explain the relationship between X and Y, in the sense that part or all of the effect of X on Y is channeled through M. The coefficient a indicates the effect of X on M, b indicates the effect of M on Y, and c’ is the unique effect of X on Y after M has been controlled for. The latter is also known as the direct effect. The indirect effect is defined as the product ab.

This can be illustrated with the following example. Having work incentives is expected to increase job performance. We can hypothesize that having work incentives increases job satisfaction, and that job satisfaction has a positive impact on job performance. Hence, job satisfaction mediates the relationship between work incentives and job performance.

Let’s write the equations represented by the diagram:

1) Y = cX + e1

2) M = aX + e2

3) Y = c’X + bM + e3

Here e1, e2 and e3 are the error terms of each equation.
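To make the three equations more concrete, here is a minimal Python sketch that fits them by ordinary least squares on simulated data. Everything in it is an assumption made for illustration: the data are generated at random, the "true" coefficients (0.5, 0.3, 0.4) are hypothetical, and the helper names `cov`, `slope` and `slopes2` are my own. It only shows how the paths a, b, c and c’ relate to each other.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance of two equally long lists."""
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

def slope(x, y):
    """OLS slope of y on x, with an intercept in the model."""
    return cov(x, y) / cov(x, x)

def slopes2(x, m, y):
    """OLS slopes of y on x and m (intercept included): returns (c_prime, b)."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    det = sxx * smm - sxm ** 2
    c_prime = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    return c_prime, b

# Simulated data; the coefficients below are hypothetical values.
random.seed(0)
n = 500
incentives = [random.gauss(0, 1) for _ in range(n)]                    # X
satisfaction = [0.5 * xi + random.gauss(0, 1) for xi in incentives]    # M
performance = [0.3 * xi + 0.4 * mi + random.gauss(0, 1)
               for xi, mi in zip(incentives, satisfaction)]            # Y

c = slope(incentives, performance)                            # equation 1
a = slope(incentives, satisfaction)                           # equation 2
c_prime, b = slopes2(incentives, satisfaction, performance)   # equation 3

print(f"total effect c      = {c:.3f}")
print(f"direct effect c'    = {c_prime:.3f}")
print(f"indirect effect a*b = {a * b:.3f}")
print(f"c' + a*b            = {c_prime + a * b:.3f}")
```

When all three equations are fitted by least squares on the same sample, c and c’ + ab coincide up to rounding, which is why the last two printed lines should agree with the first.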

Mediation can be tested via multiple regression or via Structural Equation Modeling (SEM). I suggest using the latter, since estimating the equations simultaneously provides more accurate estimates.

Baron and Kenny (1986) proposed the following steps to establish mediation.

Step 1 consists in showing that c, from equation 1, is significantly different from zero; in other words, that there is a significant relationship between work incentives (X) and job performance (Y). This makes sense, since there is no point in talking about mediating a relationship that does not exist.

Step 2 estimates path a from equation 2: we need to establish that there is a significant relationship between X and M or, in our example, that greater work incentives lead to increased job satisfaction.

Step 3 consists in estimating path b, in equation 3, and testing that it is significantly different from zero. This step establishes that there is a relationship between M and Y once X is controlled for. In our example, we need to show that a more satisfied worker will have increased job performance.

Finally, step 4 consists in testing whether c’ is statistically indistinguishable from zero, which would establish complete or full mediation. If steps 1 to 3 are met but we cannot conclude that c’ equals zero, then partial mediation is established.

It is recommended to infer the total effect by summing the direct and indirect effects from equations 2 and 3 (c’ + ab), rather than directly from c in equation 1.
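The four steps can be sketched in a few lines of Python using the regression approach. This is only an illustration under stated assumptions: the data are simulated with hypothetical coefficients (0.6, 0.3, 0.6), the helpers `ols` and `significant` are my own, and significance is judged with a large-sample cutoff of |t| > 1.96 instead of exact t-distribution p-values.

```python
import math
import random

def centre(v):
    mu = sum(v) / len(v)
    return [vi - mu for vi in v]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def ols(y, *xs):
    """OLS of y on one or two predictors, intercept included.
    Returns one (slope, standard_error) pair per predictor."""
    n, k = len(y), len(xs)
    yc, xc = centre(y), [centre(x) for x in xs]
    if k == 1:
        s11 = dot(xc[0], xc[0])
        betas, inv_diag = [dot(xc[0], yc) / s11], [1 / s11]
    else:
        s11, s22, s12 = dot(xc[0], xc[0]), dot(xc[1], xc[1]), dot(xc[0], xc[1])
        g1, g2 = dot(xc[0], yc), dot(xc[1], yc)
        det = s11 * s22 - s12 ** 2
        betas = [(s22 * g1 - s12 * g2) / det, (s11 * g2 - s12 * g1) / det]
        inv_diag = [s22 / det, s11 / det]
    resid = [yc[i] - sum(bj * xj[i] for bj, xj in zip(betas, xc)) for i in range(n)]
    sigma2 = dot(resid, resid) / (n - k - 1)   # residual variance
    return [(bj, math.sqrt(sigma2 * d)) for bj, d in zip(betas, inv_diag)]

def significant(estimate, se, critical=1.96):
    # Assumption: normal approximation to a two-sided 5% test.
    return abs(estimate / se) > critical

# Simulated data with a hypothetical direct effect of X on Y.
random.seed(1)
n = 1000
x = [random.gauss(0, 1) for _ in range(n)]
m = [0.6 * xi + random.gauss(0, 1) for xi in x]
y = [0.3 * xi + 0.6 * mi + random.gauss(0, 1) for xi, mi in zip(x, m)]

c, se_c = ols(y, x)[0]                            # step 1: equation 1
a, se_a = ols(m, x)[0]                            # step 2: equation 2
(c_prime, se_cp), (b, se_b) = ols(y, x, m)        # steps 3 and 4: equation 3

steps_1_to_3 = (significant(c, se_c) and significant(a, se_a)
                and significant(b, se_b))
if not steps_1_to_3:
    verdict = "no mediation established"
elif significant(c_prime, se_cp):
    verdict = "partial mediation"
else:
    verdict = "full mediation"
print(verdict)
```

Note that a non-significant c’ is only a failure to reject zero, so the "full mediation" branch should be read with that caveat in mind.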

More recently, following simulations by Hayes and Scharkow (2013), the suggestion is to establish mediation by testing the null hypothesis that ab = 0. Since the standard test of this hypothesis assumes that a and b are uncorrelated, it is recommended to use bootstrapping instead.
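As a rough idea of what bootstrapping the indirect effect looks like, here is a minimal Python sketch of a percentile bootstrap for ab. The simulated data, the hypothetical coefficients (0.6, 0.3, 0.6), the number of resamples and the function name `indirect_effect` are all assumptions made for illustration; dedicated software offers refinements such as bias-corrected intervals that are not shown here.

```python
import random

def indirect_effect(x, m, y):
    """Return a*b, with a from M = aX + e and b from Y = c'X + bM + e (OLS)."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b

# Simulated data; the coefficients are hypothetical values.
rng = random.Random(42)
n = 300
x = [rng.gauss(0, 1) for _ in range(n)]
m = [0.6 * xi + rng.gauss(0, 1) for xi in x]
y = [0.3 * xi + 0.6 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

ab_hat = indirect_effect(x, m, y)

# Resample cases with replacement and re-estimate a*b each time.
n_boot = 1000
boot = []
for _ in range(n_boot):
    idx = [rng.randrange(n) for _ in range(n)]
    boot.append(indirect_effect([x[i] for i in idx],
                                [m[i] for i in idx],
                                [y[i] for i in idx]))

# 95% percentile interval from the sorted bootstrap estimates.
boot.sort()
lower = boot[int(0.025 * n_boot)]
upper = boot[int(0.975 * n_boot) - 1]

print(f"indirect effect a*b = {ab_hat:.3f}")
print(f"95% bootstrap CI    = [{lower:.3f}, {upper:.3f}]")
print("reject ab = 0" if lower > 0 or upper < 0 else "cannot reject ab = 0")
```

The null hypothesis ab = 0 is rejected when the interval excludes zero. Whole cases are resampled (the same index for X, M and Y), so any dependence between the estimates of a and b is carried into the bootstrap distribution.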

In my next post I will show how to test for mediation using Stata. 