## A simple comparison of box plots and violin plots

2015-09-15T15:27:01Z (GMT)

This figure compares box plots and violin plots (or cat-eye plots) for three different sample sizes of standard normal data (the four "groups" in each plot are independent samples; the box and violin plots use the same data).

Violin plots seem to be rapidly gaining popularity. I believe they are useful for large sample sizes (e.g. n>250 or ideally even larger), where the kernel density plots provide a reasonably accurate representation of the distributions, potentially showing nuances such as bimodality or other forms of non-normality that would be invisible or less clear in box plots.

However, I believe violin plots are potentially misleading for smaller sample sizes, where the density plots can appear to show interesting features (and group differences therein) even when produced for standard normal data. For example, in the figure here, for n=64 there appear to be differences in kurtosis between groups 1 and 2 and differences in skew between groups 2 and 3; for n=16, group 4 hints at bimodality.
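As a rough numerical check of this point, here is a minimal Python sketch (not the code used to produce the figure) that repeatedly draws standard normal samples and measures how much the sample skewness and excess kurtosis vary from sample to sample. The function names, the number of replications, and the seed are my own illustrative choices; the sample sizes match those in the figure.

```python
import random
import statistics


def sample_skewness(xs):
    """Moment-based sample skewness (no small-sample bias correction)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def sample_excess_kurtosis(xs):
    """Moment-based sample excess kurtosis (0 for a normal distribution)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0


def spread_of_shape_statistics(n, replications=2000, seed=0):
    """Standard deviation, across replications, of the skewness and excess
    kurtosis of standard normal samples of size n.

    The replication count and seed are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    skews, kurts = [], []
    for _ in range(replications):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        skews.append(sample_skewness(xs))
        kurts.append(sample_excess_kurtosis(xs))
    return statistics.stdev(skews), statistics.stdev(kurts)


if __name__ == "__main__":
    # Sample sizes used in the figure.
    for n in (16, 64, 256):
        sd_skew, sd_kurt = spread_of_shape_statistics(n)
        print(f"n={n:3d}  sd(skewness)={sd_skew:.2f}  "
              f"sd(excess kurtosis)={sd_kurt:.2f}")
```

Both spreads shrink as n grows, so two small samples from the same normal distribution can easily differ in apparent skew or kurtosis by chance alone.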

Because box plots provide a coarser summary of the data, they seem safer to use with smaller samples. Even so, with n=64 here, it is still tempting to infer (false) differences between the groups visually, despite the overlapping notches.

For very small sample sizes, violin plots provide no direct indication that the sample is very small, worsening the above problem. In contrast, with very small samples, notched box plots show notches that are wide relative to their interquartile ranges. This ugliness helpfully indicates that they might not be the best way to display very small samples (showing the individual data points and indicating only the mean and/or median seems better in such cases).
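To make the notch behaviour concrete, the sketch below assumes the common notch convention of median ± 1.57 × IQR / √n, which is the form given in MATLAB's boxplot documentation; other implementations may use a slightly different constant. Under that assumption, the full notch width divided by the IQR depends only on n. The function names are hypothetical.

```python
import math

# Assumption: notch extremes are median -/+ 1.57 * IQR / sqrt(n), the
# convention described in MATLAB's boxplot documentation.
NOTCH_CONSTANT = 1.57


def notch_half_width(iqr, n, constant=NOTCH_CONSTANT):
    """Distance from the median to either notch extreme."""
    return constant * iqr / math.sqrt(n)


def notch_to_iqr_ratio(n, constant=NOTCH_CONSTANT):
    """Full notch width divided by the IQR; depends only on n."""
    return 2.0 * constant / math.sqrt(n)


if __name__ == "__main__":
    for n in (4, 16, 64, 256):
        print(f"n={n:3d}  notch width / IQR = {notch_to_iqr_ratio(n):.3f}")
```

Under this convention the notch spans roughly 79% of the IQR at n=16 and exceeds the IQR once n drops below about 10, which is when the notches start to extend past the ends of the box.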

The above issues could be a particular concern when the sample sizes differ between groups within a single plot (though this is not allowed in the cateye function used here, it is possible in MATLAB's boxplot and probably in other violin plot implementations). For example, here none of the four groups for n=16 look anywhere near as "normal" as the four groups for n=256.