This post demonstrates how small multiples can be used to highlight different parts of a distribution. In principle, ggplot2 offers many easy-to-use options for setting apart different groups in your data. The aesthetics color, size, and shape come to mind. Small multiples, in turn, show the same (bivariate) association across different groups in your data. However, both approaches have drawbacks. When mapping categories to an aesthetic like color, all groups remain on the same canvas. The result may be hard to read because of overplotting, especially when you work with big datasets. Small multiples, in contrast, draw out each group but deemphasize the big picture. Wouldn’t it be nice to find some middle ground?
We are going to work with the diamonds data, which ships with the ggplot2 package. The goal is to highlight each cut in a scatter plot of price against carat without falling into either of the extremes mentioned above. Here is what the data look like:
rm(list = ls())
library("tidyverse")
data(diamonds)
select(diamonds, price, carat, cut)
## # A tibble: 53,940 x 3
##    price carat cut
##    <int> <dbl> <ord>
##  1   326 0.23  Ideal
##  2   326 0.21  Premium
##  3   327 0.23  Good
##  4   334 0.290 Premium
##  5   335 0.31  Good
##  6   336 0.24  Very Good
##  7   336 0.24  Very Good
##  8   337 0.26  Very Good
##  9   337 0.22  Fair
## 10   338 0.23  Very Good
## # … with 53,930 more rows
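For comparison, the two extremes mentioned above can be sketched roughly as follows. This is only an illustration: the object names `p_colour` and `p_facets` and the opacity value are assumptions made for this sketch, not part of the original analysis.

```r
library("ggplot2")

# Extreme 1: map cut to colour. All groups share one canvas,
# so the points of different cuts are drawn on top of each other.
p_colour <- ggplot(diamonds, aes(x = carat, y = price, colour = cut)) +
  geom_point(alpha = 0.4) +
  scale_y_log10()

# Extreme 2: plain small multiples. Each cut gets its own panel,
# but no panel shows where the group sits in the overall distribution.
p_facets <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.4) +
  scale_y_log10() +
  facet_wrap(vars(cut))

p_colour
p_facets
```

Printing both plots side by side makes the trade-off visible: the first keeps the overall shape but buries individual cuts, while the second isolates each cut without any reference to the rest.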
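As always, smart layering is the answer. We are going to plot the diamonds data twice using different colors: once for all diamonds in the data, and once for each cut. The first layer receives a copy of the data without the cut column. Because that layer lacks the faceting variable, ggplot2 repeats it in every panel, which provides the backdrop against which each cut is highlighted. The code also includes minor finishing touches (opacity and color).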
alpha_lvl <- .4
ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(
    data = select(diamonds, -cut), # Dropping <cut> plots all our data.
    colour = "#3288bd",
    alpha = alpha_lvl
  ) +
  geom_point(colour = "#d53e4f", alpha = alpha_lvl) +
  scale_y_log10() +
  facet_wrap(vars(cut))
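If you need this pattern more than once, the layering could be wrapped in a small function. The following is a minimal sketch, assuming a hypothetical helper called `highlight_facets()` (it is not part of ggplot2 or the tidyverse) whose argument names and defaults are chosen purely for illustration. The last lines inspect the built plot to check that the background layer is indeed drawn once per panel.

```r
library("ggplot2")
library("dplyr")

# Hypothetical helper (an assumption for this sketch, not a ggplot2 function):
# draws all observations as a backdrop in every panel, then overlays the
# observations that belong to the panel's own group.
highlight_facets <- function(data, x, y, facet,
                             bg_colour = "#3288bd",
                             fg_colour = "#d53e4f",
                             alpha = 0.4) {
  ggplot(data, aes(x = {{ x }}, y = {{ y }})) +
    # Backdrop: the faceting column is dropped, so this layer is
    # repeated in every panel.
    geom_point(
      data = select(data, -{{ facet }}),
      colour = bg_colour, alpha = alpha
    ) +
    # Highlight: only the observations belonging to each panel.
    geom_point(colour = fg_colour, alpha = alpha) +
    facet_wrap(vars({{ facet }}))
}

p <- highlight_facets(diamonds, carat, price, cut) + scale_y_log10()
p

# Compare how many points each layer contributes across all panels.
built <- ggplot_build(p)
n_background <- nrow(built$data[[1]])
n_highlight <- nrow(built$data[[2]])
c(background = n_background, highlight = n_highlight)
```

The background count should come out as a multiple of the highlight count, one copy of the data per cut. That is also the cost of the approach: every panel redraws the full dataset, so rendering gets slower as the number of facets grows.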
This post demonstrated how small multiples can highlight different segments within a distribution without losing sight of its overall shape. The key is smart layering. Plot the data twice: once ignoring your facets and once highlighting them.