## Cross tabulation: What’s the appropriate test?

Let’s start by defining what a cross tabulation (more commonly called a crosstab) is. A cross tabulation, also known as a contingency table, is a two-dimensional table that reports the number of participants (i.e. observations) whose characteristics fall in each cell of the table. It is widely used in market research and in scientific research. It represents the relationship between two categorical variables, which can be nominal or ordinal in nature. In a nominal variable the categories can’t be ranked or ordered (e.g. gender), as opposed to the categories in an ordinal variable (e.g. level of education).
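To make the definition concrete, here is a minimal sketch of how a crosstab can be built from raw observations using only the Python standard library. The data and the `crosstab` helper are made up for illustration; they are not from any particular package.

```python
from collections import Counter

# Hypothetical raw data: one (gender, smoking status) pair per participant.
observations = [
    ("female", "non-smoker"),
    ("male", "regular smoker"),
    ("female", "occasional smoker"),
    ("male", "non-smoker"),
    ("female", "non-smoker"),
    ("male", "regular smoker"),
]


def crosstab(pairs, row_levels, col_levels):
    """Count how many pairs fall in each (row level, column level) cell."""
    counts = Counter(pairs)
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


row_levels = ["female", "male"]
col_levels = ["non-smoker", "occasional smoker", "regular smoker"]
table = crosstab(observations, row_levels, col_levels)

for level, row in zip(row_levels, table):
    print(level, row)
# female [2, 1, 0]
# male [1, 0, 2]
```

Each cell holds a count of participants, and the cell counts add up to the sample size.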

The most widely used statistical test for a cross tabulation between two categorical variables (nominal or ordinal) is the Chi-square test, or test for independence. The null hypothesis for this test is that the two categorical variables are statistically independent. For example, if we have a contingency table between gender (male, female) and smoking (non-smoker, occasional smoker, regular smoker), we could use the Chi-square test to test H0: there is no association between gender and smoking, versus Ha: there is an association between gender and smoking.
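As a rough sketch of what the test computes, the snippet below derives the expected counts under independence and the Chi-square statistic for a gender by smoking table. The counts are invented for illustration, and the p-value shortcut `exp(-x / 2)` is valid only because this table has 2 degrees of freedom; for other table sizes you would look up the Chi-square distribution instead.

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected counts) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Under independence, expected count = row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Hypothetical counts: rows = gender (male, female),
# columns = smoking (non-smoker, occasional smoker, regular smoker).
observed = [
    [30, 10, 20],
    [40, 15, 5],
]
statistic, dof, expected = chi_square_independence(observed)

# Closed-form survival function of the Chi-square distribution, df = 2 only.
p_value = math.exp(-statistic / 2)

print(round(statistic, 2), dof, round(p_value, 4))
# 11.43 2 0.0033
```

With these made-up counts the p-value is well below 0.05, so H0 (no association) would be rejected. In practice a statistics library such as SciPy (`scipy.stats.chi2_contingency`) handles tables of any size.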

Fisher’s exact test is useful when some cells have very low frequencies (even zero), either because the sample is small or because a category occurs rarely (e.g. a particular type of complication during surgery). In this case the Chi-square test, which relies on a large-sample approximation, would not be appropriate. A common rule of thumb is to prefer Fisher’s exact test when expected cell counts fall below 5.
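For a 2×2 table the exact p-value can be computed directly from the hypergeometric distribution. The sketch below uses one common two-sided convention (summing the probabilities of all tables that are no more likely than the observed one); the surgery counts and the function name are assumptions made for illustration.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Add up every table that is as unlikely as, or less likely than, the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Hypothetical data: rows = procedure (A, B), columns = complication (yes, no).
surgery = [
    [0, 5],
    [4, 1],
]
p_value = fisher_exact_2x2(surgery)
print(round(p_value, 4))
# 0.0476
```

Note that the table contains a zero cell and only ten patients, which is exactly the situation where the Chi-square approximation breaks down. SciPy offers the same test as `scipy.stats.fisher_exact`.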

However, the Chi-square test will only tell you whether the relationship between two categorical variables is significant. Other statistics measure the strength of the association as well. Lambda, used with nominal variables, ranges from 0 (no relationship) to 1 (perfect association). You can interpret it as the proportional reduction in error when predicting one variable from the values of the other. One potential problem with Lambda is that it tends to underestimate the relationship, which is why it is advisable to use it together with the Chi-square test.
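Here is a small sketch of the asymmetric version of Lambda (Goodman and Kruskal's lambda), predicting the column variable from the row variable. It compares the prediction errors made when always guessing the overall most frequent column category with the errors made when guessing the most frequent category within each row. Both tables are invented, and the function name is my own.

```python
def goodman_kruskal_lambda(table):
    """Asymmetric lambda for predicting the column variable from the row variable."""
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(col_totals)
    # Errors when always guessing the overall modal column category.
    errors_without = n - max(col_totals)
    # Errors when guessing the modal column category within each row.
    errors_with = n - sum(max(row) for row in table)
    if errors_without == 0:
        raise ValueError("lambda is undefined when all observations share one column")
    return (errors_without - errors_with) / errors_without


# Hypothetical table with a clear association: lambda is well above zero.
strong = [
    [40, 10],
    [10, 40],
]
print(goodman_kruskal_lambda(strong))
# 0.6

# Hypothetical gender x smoking table: the row distributions differ,
# but the modal column is the same in both rows, so lambda is 0.
same_mode = [
    [30, 10, 20],
    [40, 15, 5],
]
print(goodman_kruskal_lambda(same_mode))
# 0.0
```

The second table shows the underestimation problem in action: whenever every row shares the same most frequent category, Lambda comes out as 0 no matter how different the rows otherwise are.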

When both variables are ordinal, the following measures capture the strength and direction of the relationship, and each can be tested for significance: Gamma, Somers’ D and Kendall’s tau.

Finally, there is one test you need when you have before-and-after (paired) data. The McNemar test can be understood as a paired version of the Chi-square test, and it applies only to 2×2 tables. With this test you assess whether the outcome variable (e.g. acceptance of a new app) has changed significantly between before and after an experiment or intervention. The McNemar test can also be extended to larger square tables through tests of symmetry and marginal homogeneity.
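Two of the ideas above are simple enough to sketch by hand. Gamma counts concordant and discordant pairs of observations in an ordinal table, and the exact form of the McNemar test looks only at the participants who changed their answer. All counts and function names below are assumptions chosen for illustration, and the McNemar variant shown is the exact binomial one rather than the Chi-square approximation.

```python
from math import comb


def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a table whose rows and columns are both ordered."""
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):  # only rows ranked higher than row i
                for m in range(n_cols):
                    if m > j:  # higher on both variables: concordant pairs
                        concordant += table[i][j] * table[k][m]
                    elif m < j:  # higher on one, lower on the other: discordant
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


def mcnemar_exact(table):
    """Exact two-sided McNemar p-value for a paired 2x2 table [[a, b], [c, d]]."""
    b, c = table[0][1], table[1][0]  # the two kinds of participants who changed
    n = b + c
    # Under H0 each change is equally likely to go in either direction.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical ordinal table: rows and columns both run from low to high.
ordinal = [
    [20, 10, 5],
    [10, 15, 10],
    [5, 10, 20],
]
print(round(goodman_kruskal_gamma(ordinal), 3))
# 0.553

# Hypothetical paired data: rows = accepted the app before (yes, no),
# columns = accepted the app after (yes, no).
before_after = [
    [20, 2],
    [10, 18],
]
print(round(mcnemar_exact(before_after), 4))
# 0.0386
```

A positive Gamma means that higher categories of one variable tend to go with higher categories of the other. For McNemar, the 38 participants who gave the same answer both times do not affect the result; only the 2 versus 10 who switched do.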

I hope this post has helped you feel more informed when it comes to analyzing contingency tables.