No more loops

I like writing in R, and I often make visualizations with ggplot2. I frequently need to generate a multi-page PDF document from several disjoint groups in a single data frame. In dplyr, groups are created with the group_by() function; the data.table package provides a special by syntax. Grouping in R is analogous to grouping in SQL. After doing the computations, it is often useful to visualize the result for each group separately. For that, ggplot2 provides the functions facet_grid and facet_wrap. The syntax looks as follows:

ggplot(df, aes(x, y)) + geom_line() + facet_wrap(~z)
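As a runnable version of the call above, here is a minimal sketch that assumes the built-in mtcars data set in place of df, with wt, mpg and cyl standing in for x, y and z. Those names are illustrative only.

```r
library(ggplot2)

# Assumption: mtcars replaces `df`; cyl plays the role of the grouping column z.
g <- ggplot(mtcars, aes(wt, mpg)) +
    geom_point() +
    facet_wrap(~cyl)

# Draws a single figure with one panel per value of cyl.
print(g)
```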

Unfortunately, if the results are printed to a PDF document, all the panels end up on the same page. This is not always what I want. Previously I wrapped the ggplot call in a loop, selecting the subset of rows for each group and printing the groups one by one. The code always looked ugly. Recently I discovered a simple and neat way to get rid of the loop: if you are using dplyr, you can combine group_by, do and ggplot.

The resulting code will look something like this:

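library(dplyr)
library(ggplot2)

p <- df %>%
    group_by(a, b, c) %>%
    do(
        # one ggplot object per group, stored in the list-column `plots`
        plots = ggplot(data = ., aes(x, y)) +
            geom_line()
    )
# printing the list-column draws the plots one after another
print(p$plots)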

The data frame is split into groups, and then do evaluates an expression on each group. do has to return a data frame, which is why the series of plots has to be given a name: a named argument becomes a list-column in the result. Inside do, dplyr provides the variable ., which represents the data piped into do (the rows of the current group). For example, to get a single column of data, one would write .$col. The last line prints the whole series of plots one by one.
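
To get one plot per page, the same pattern can be wrapped in a PDF device. The following is a minimal sketch, assuming the built-in mtcars data set in place of df, cyl as the only grouping column, and a temporary output file; none of these names come from the original code. It also uses .$cyl to put the group value into each plot title. Note that do() is marked as superseded in dplyr 1.0 and later, although it still works.

```r
library(dplyr)
library(ggplot2)

# Assumption: mtcars stands in for `df`, grouped by a single column.
p <- mtcars %>%
    group_by(cyl) %>%
    do(
        # `.` is the current group; .$cyl pulls one column out of it
        plots = ggplot(data = ., aes(wt, mpg)) +
            geom_point() +
            ggtitle(paste("cyl =", unique(.$cyl)))
    )

# Hypothetical output path; each printed plot starts a new page.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 7, height = 5)
invisible(lapply(p$plots, print))
dev.off()
```

Because every print call on an open pdf device begins a new page, the resulting file should contain one page per group rather than one page of facets.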