The effect size provides an estimate of the *magnitude of change*. For example, in a research study that estimates the efficacy of a new intervention to improve post-surgery recovery after hip replacement, the effect size measures how large the difference is between the treatment group and the control or placebo group; in other words, how much larger the effect of the new intervention is compared with the standard of care.

Reporting the effect size is as important as, or more important than, reporting the p-value. What’s more, a significant p-value alone may not be relevant, because it may be associated with a very small effect size or magnitude of change. This can happen when a study has a very large sample size. When working with a sample of thousands of observations, your study is well powered to detect even the smallest differences. Under this type of scenario we can find a highly significant p-value corresponding to a very small change that is not clinically relevant. It is the job of the researcher or clinician to establish a priori what is considered a substantive change: a change that, in our example, would have a meaningful impact on a patient’s life.
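As a quick sketch of this point, the simulation below (with entirely made-up numbers, and assuming NumPy and SciPy are available) draws two very large groups whose true recovery times differ by only 0.2 days. The difference is statistically significant, yet the standardized effect size is trivially small:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical recovery times (in days): the true difference between
# groups is only 0.2 days -- far too small to matter clinically.
n = 50_000
control = rng.normal(loc=30.0, scale=5.0, size=n)
treatment = rng.normal(loc=29.8, scale=5.0, size=n)

t_stat, p_value = stats.ttest_ind(treatment, control)

# Standardized effect size (Cohen's d) for context
pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
d = (control.mean() - treatment.mean()) / pooled_sd

print(f"p-value   = {p_value:.2e}")  # highly significant
print(f"Cohen's d = {d:.3f}")        # yet a very small effect
```

With 50,000 observations per group the test easily flags the 0.2-day difference as significant, even though no clinician would consider it a substantive change.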

Going back to the example, imagine that the average recovery times for the control and treatment groups are found to be 30 and 21 days respectively; this represents a 30% reduction in recovery time when comparing treatment to control. Now compare this result with the hypothetical finding where the mean difference between the two groups is only 1 day, or equivalently about a 3% reduction relative to control. Both changes, or effect sizes, may be accompanied by significant p-values, provided we have a large sample. However, a 30% reduction has a much bigger impact on patients’ quality of life and on healthcare-related costs.
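The arithmetic behind these percentages is simple; a minimal sketch using the numbers from the example:

```python
control_mean = 30.0    # average recovery time (days), control group
treatment_mean = 21.0  # average recovery time (days), treatment group

# Absolute reduction in days, and relative reduction versus control
absolute_reduction = control_mean - treatment_mean       # 9 days
relative_reduction = absolute_reduction / control_mean   # 0.30 -> 30%

print(f"{absolute_reduction:.0f} days ({relative_reduction:.0%} reduction)")
# prints "9 days (30% reduction)"
```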

When talking about effect size it is important to differentiate between the *absolute effect size* (or raw effect size) and the *standardized effect size*. In our example, the difference between the average recovery times of the two groups (a reduction of 9 days) is an absolute effect size, because it is measured in real units (i.e. days). However, sometimes the study outcome is measured in units that are not as easy to interpret, such as a Likert scale or a composite score from a psychological test. In these circumstances it is better to transform the mean difference by dividing it by the pooled standard deviation of the two groups. This is known as *Cohen’s d* and it is a *standardized effect size*. The main advantage of working with a standardized measure is that there is usually an established convention that allows you to determine the strength of an effect. Cohen (1988) offers the following interpretation for the standardized mean difference: 0.8 is considered large, 0.5 moderate and 0.2 small. Another advantage of standardized effect sizes is that they allow you to compare different studies, which is very useful when conducting a meta-analysis.
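A minimal sketch of the computation, applied to the recovery-time example (the 10-day standard deviations and group sizes are assumed for illustration, not taken from the post):

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# Hypothetical summary statistics: control mean 30, treatment mean 21,
# both groups with an assumed SD of 10 days and 100 patients each.
d = cohens_d(mean1=30.0, sd1=10.0, n1=100, mean2=21.0, sd2=10.0, n2=100)
print(round(d, 2))  # prints 0.9 -> "large" by Cohen's convention
```

With equal group sizes and equal standard deviations the pooled SD is just the common SD, so here d = 9/10 = 0.9, which falls in Cohen's "large" range.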

In this post we have focused on the mean difference as a measure of effect size. Other common standardized measures of effect are the correlation between two variables (*r*), the odds ratio (*OR*) and eta-squared (*η²*). These I leave for the reader to investigate…