1 What is this cookbook?

This cookbook serves as a place to exchange frequently needed plot recipes using standardized data sets. To understand how the recipes work, it helps to first familiarize yourself with the data sets used. In addition, we rely on a set of conventions to keep the code in this document readable. This document is a work in progress and might contain mistakes.

1.1 Code conventions

The data that we typically work with comes in the so-called wide format: each measurement has its own column, and each row is one observation (in our case, one participant). We use the tidyverse package for data wrangling. The recipes follow these conventions:

Include data wrangling, if needed for visualization.
Select the variables needed in the ggplot call.
Format the code using the Reformat Code tool in RStudio. Then re-indent the parameter assignments inside function calls by one additional tab.
End all ggplot pipes in a + NULL line. This allows commenting out single lines (including the last line) without breaking the code.
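
The conventions above can be sketched on a small made-up data set. The tibble `toy` and its columns `id`, `robo_bed`, and `robo_food` are hypothetical stand-ins, not the cookbook's data. The sketch only illustrates wrangling wide data inside the example, selecting the needed variables, and ending the pipe with + NULL.

```r
library(tidyverse)

# Hypothetical wide-format data: one row per participant,
# one column per measurement
toy <- tibble(
  id = 1:4,
  robo_bed = c(5, 4, 6, 5),
  robo_food = c(2, 3, 1, 2)
)

# The wrangling needed for the plot is part of the example
toy_long <- toy %>%
  select(starts_with("robo")) %>%   # select only the variables needed
  pivot_longer(
    cols = everything(),
    names_to = "item",
    values_to = "rating"
  )

p <- toy_long %>%
  ggplot() +
  aes(x = item, y = rating) +
  geom_boxplot() +
  # labs(title = "This line can be commented out safely") +
  NULL

p
```

Because adding NULL to a ggplot leaves the plot unchanged, any line above it, including the last real layer, can be commented out without leaving a dangling `+`.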

1.2 Required Libraries

This cookbook uses several packages. The rwththeme package must be installed from GitHub: https://github.com/Sumidu/rwththeme
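
One possible way to install from GitHub is sketched below. It assumes the remotes package is used as the installer; this is an assumption of the sketch, and the repository itself may recommend a different route.

```r
# Assumption: 'remotes' is used as the GitHub installer.
# Install it from CRAN first if it is missing.
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

# Install rwththeme from the GitHub repository named above,
# but only if it is not already available.
if (!requireNamespace("rwththeme", quietly = TRUE)) {
  remotes::install_github("Sumidu/rwththeme")
}
```

The installation only needs to run once per machine; afterwards the library() calls are sufficient.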

library(tidyverse)
library(rwththeme)

df <- read_rds("data.rds")

1.3 Data Familiarization

TODO: Provide an introduction to the dataset that is being used here.

2 Data Manipulation Techniques

3 Statistical Recipes

TODO: Here some typical statistical tricks are shown.

4 Basic Plotting Recipes

4.1 Data Overview

4.1.1 Creating a basic plot. Example: Histogram

Each plot has labels. The title should be a short sentence summarizing what the plot shows. The subtitle should state what is actually plotted (e.g., plot type, variables). The caption should give additional information without which the plot might be ambiguous to read. Sources can also be added here.

The bins of a histogram become more readable when the border color is set to "white", as in this example, because the border separates adjacent bars from one another.

df %>%
  dplyr::select(age) %>%
  ggplot() +
  aes(age) +
  geom_histogram(bins = 30, color = "white") +
  labs(
    title = "The sample is of a very young age",
    subtitle = "Histogram of the age variable",
    x = "Age",
    y = "Frequency (absolute)",
    caption = "Histogram using 30 bins. Source: hcictools"
  ) +
  NULL

4.1.2 Creating a chart to show basic summary statistics (SE Version)

Sometimes, when exploring a whole set of variables, it helps to visualize their summary statistics.

df %>% 
  select(starts_with("robo")) %>%    # pick all variables that start with robo
  psych::describe() %>%              # get summary statistics
  as.data.frame() %>%                # convert the result to a data.frame
  rownames_to_column() %>%           # convert the non-tidy rowname to a column
  ggplot() +
  aes(y = mean, x = rowname, ymin = mean - se, ymax = mean + se) +
  geom_point() +
  geom_errorbar(width = 0.5) +
  scale_y_continuous( breaks = 1:6, limits = c(1,6)) +
  coord_flip() +
  labs(y = "Mean of the variable", x = "Variable", title = "Acceptance for robo_bed is highest", 
       subtitle = "Means of different scale items", caption = "Errorbar denotes standard error")+
  NULL
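
To make explicit what the error bars show: the standard error of the mean is the standard deviation divided by the square root of the number of observations. The following sketch computes it by hand with tidyverse functions instead of psych::describe(). The tibble `toy` and its values are made up for illustration, and the sketch assumes there are no missing values.

```r
library(tidyverse)

# Hypothetical data: two scale items rated by five participants
toy <- tibble(
  robo_bed = c(5, 4, 6, 5, 6),
  robo_food = c(2, 3, 1, 2, 2)
)

toy_summary <- toy %>%
  pivot_longer(
    cols = everything(),
    names_to = "rowname",
    values_to = "value"
  ) %>%
  group_by(rowname) %>%
  summarise(
    mean = mean(value),
    se = sd(value) / sqrt(n())   # standard error of the mean
  )

toy_summary %>%
  ggplot() +
  aes(y = mean, x = rowname, ymin = mean - se, ymax = mean + se) +
  geom_point() +
  geom_errorbar(width = 0.5) +
  coord_flip() +
  NULL
```

The resulting columns `mean`, `se`, and `rowname` match the names used in the recipe above, so the same aes() call works on either summary.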

4.1.3 Creating a chart to show basic summary statistics (CI Version)

df %>% 
  select(starts_with("robo")) %>%    # pick all variables that start with robo
  psych::describe() %>%              # get summary statistics
  as.data.frame() %>%                # convert the result to a data.frame
  rownames_to_column() %>%           # convert the non-tidy rowname to a column
  ggplot() +
  aes(y = mean, x = rowname, ymin = mean - se * 1.96, ymax = mean + se * 1.96) +   # 1.96 = normal quantile for a 95% interval
  geom_point() +
  geom_errorbar(width = 0.5) +
  scale_y_continuous( breaks = 1:6, limits = c(1,6)) +
  coord_flip() +
  labs(y = "Mean of the variable", x = "Variable", title = "Acceptance for robo_bed is highest", 
       subtitle = "Means of different scale items", caption = "Errorbar denotes 95% confidence interval")+
  NULL
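
The factor by which the standard error is multiplied for a 95% interval comes from a quantile function. The sketch below shows where it comes from, using base R only. The sample size `n`, the mean `m`, and the standard error `se` are hypothetical values chosen for illustration.

```r
# Hypothetical summary values for one variable
n <- 30       # sample size
m <- 4.2      # sample mean
se <- 0.15    # standard error of the mean

# Multiplier under the normal approximation (approximately 1.96)
z <- qnorm(0.975)

# Multiplier from the t distribution, which is wider for small samples
t_crit <- qt(0.975, df = n - 1)

ci_normal <- c(lower = m - z * se, upper = m + z * se)
ci_t <- c(lower = m - t_crit * se, upper = m + t_crit * se)

ci_normal
ci_t
```

For large samples the two multipliers are nearly identical; for small samples the t-based interval is noticeably wider, so it may be the safer choice.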

5 Advanced Plotting Recipes

5.1 Radarplot

6 Tips

6.1 Assigning a plot to a variable and still showing it

Simply wrap the assignment in parentheses. This assigns the plot to the variable and also displays it.

(p <- 
   df %>% 
    ggplot() +
    aes(x = age) + 
    geom_histogram(bins=30)
)
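
Why this works can be checked in base R: an assignment returns its value invisibly, while wrapping it in parentheses makes the value visible, which triggers automatic printing. For a ggplot object, printing means drawing the plot. The variable names in this sketch are arbitrary.

```r
# A plain assignment returns its value invisibly, so nothing is printed
plain <- withVisible(y <- 42)
plain$visible      # FALSE

# Wrapping the assignment in parentheses makes the value visible
wrapped <- withVisible((y <- 42))
wrapped$visible    # TRUE

# The value is assigned in both cases
y
```
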

RWTH Cookbook for R

André Calero Valdez

Last updated: 2019-04-10