This cookbook is a place to exchange frequently needed plot recipes that use standardized data sets. To understand how the recipes work, it helps to first familiarize yourself with the data sets used. In addition, we rely on a set of conventions to keep the code in this document readable. This document is a work in progress and might contain mistakes.
The data we typically work with comes in the so-called wide format. This means that each measurement has its own column and each row is one observation (in our case, one participant). We use the tidyverse package for data wrangling. If data wrangling is needed for a visualization, it should be included in the example.
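As a minimal sketch of what wide format means, the following example builds a tiny wide table and reshapes it to long format with `tidyr::pivot_longer()`. The column `robo_food` and all values are hypothetical and are not taken from the cookbook's data set.

```r
library(tibble)
library(tidyr)

# Hypothetical wide-format data: one row per participant,
# one column per measurement.
wide <- tibble(
  id = 1:3,
  robo_bed = c(5, 4, 6),
  robo_food = c(3, 2, 4)
)

# Reshape to long format: one row per participant-measurement pair.
long <- pivot_longer(
  wide,
  cols = starts_with("robo"),
  names_to = "item",
  values_to = "rating"
)

print(long)
```

Long format is often the more convenient shape when one variable should be mapped to an aesthetic such as color or facet.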
By convention, plot code ends with a + NULL line. This allows commenting out single lines (including the last line) without breaking the code.
This cookbook uses several packages. The rwththeme package must be installed from GitHub. You can go to https://github.com/Sumidu/rwththeme to install it.
library(tidyverse)
library(rwththeme)
df <- read_rds("data.rds")
TODO: Provide an introduction to the dataset that is being used here.
TODO: Here some typical statistical tricks are shown.
Each plot has labels. The title should contain a short sentence summarizing what the plot shows. The subtitle should state what is actually plotted (e.g., plot type, variables). The caption should give additional information without which the plot might be ambiguous to read. Sources can also be added here.
The bins of a histogram become more readable when the border color is set to "white", as in this example, because the white border visually separates adjacent bins.
df %>%
  dplyr::select(age) %>%
  ggplot() +
  aes(age) +
  geom_histogram(bins = 30, color = "white") +
  labs(
    title = "The sample is of a very young age",
    subtitle = "Histogram of the age variable",
    x = "Age",
    y = "Frequency (absolute)",
    caption = "Histogram using 30 bins. Source: hcictools"
  ) +
  NULL
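The trailing + NULL seen in the recipe above can be illustrated in isolation. This is a minimal sketch that uses the built-in `mtcars` data set as a stand-in (an assumption made only to keep the example self-contained). Adding `NULL` to a ggplot leaves the plot unchanged, so any line before it can be commented out without leaving a dangling `+`.

```r
library(ggplot2)

# mtcars ships with R and stands in for the cookbook's data here.
p <- ggplot(mtcars) +
  aes(mpg) +
  geom_histogram(bins = 10, color = "white") +
  # labs(title = "Temporarily disabled") +  # commented out, chain still valid
  NULL

p
```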
When exploring a whole set of variables, it can be helpful to visualize their summary statistics.
df %>%
  select(starts_with("robo")) %>% # pick all variables that start with robo
  psych::describe() %>%           # get summary statistics
  as.data.frame() %>%             # convert the result to a data.frame
  rownames_to_column() %>%        # convert the non-tidy rowname to a column
  ggplot() +
  aes(y = mean, x = rowname, ymin = mean - se, ymax = mean + se) +
  geom_point() +
  geom_errorbar(width = 0.5) +
  scale_y_continuous(breaks = 1:6, limits = c(1, 6)) +
  coord_flip() +
  labs(y = "Mean of the variable", x = "Variable", title = "Acceptance for robo_bed is highest",
       subtitle = "Means of different scale items", caption = "Errorbar denotes standard error") +
  NULL
df %>%
  select(starts_with("robo")) %>% # pick all variables that start with robo
  psych::describe() %>%           # get summary statistics
  as.data.frame() %>%             # convert the result to a data.frame
  rownames_to_column() %>%        # convert the non-tidy rowname to a column
  ggplot() +
  # 1.96 is the normal-approximation multiplier for a 95% confidence interval
  aes(y = mean, x = rowname, ymin = mean - se * 1.96, ymax = mean + se * 1.96) +
  geom_point() +
  geom_errorbar(width = 0.5) +
  scale_y_continuous(breaks = 1:6, limits = c(1, 6)) +
  coord_flip() +
  labs(y = "Mean of the variable", x = "Variable", title = "Acceptance for robo_bed is highest",
       subtitle = "Means of different scale items", caption = "Errorbar denotes 95% confidence interval") +
  NULL
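The width of a 95% confidence interval depends on the multiplier applied to the standard error. The following sketch computes that multiplier instead of hard-coding it, using the usual definition of the standard error of the mean (standard deviation divided by the square root of the sample size). The ratings are hypothetical and are not taken from the cookbook's data set.

```r
# Hypothetical ratings on a 1-6 scale.
x <- c(4, 5, 3, 6, 5, 4, 5, 2, 6, 5)

n  <- length(x)
m  <- mean(x)
se <- sd(x) / sqrt(n)            # standard error of the mean

z      <- qnorm(0.975)           # normal-approximation multiplier, about 1.96
t_mult <- qt(0.975, df = n - 1)  # t-based multiplier, wider for small samples

ci_normal <- c(lower = m - z * se,      upper = m + z * se)
ci_t      <- c(lower = m - t_mult * se, upper = m + t_mult * se)

print(ci_normal)
print(ci_t)
```

For large samples the two multipliers are nearly identical. For small samples the t-based interval is noticeably wider.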
To assign a plot to a variable and display it at the same time, simply put the assignment in parentheses. R then prints the assigned value, which draws the plot.
(p <-
  df %>%
  ggplot() +
  aes(x = age) +
  geom_histogram(bins = 30)
)
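The same behavior can be observed without any plotting code. In this minimal sketch, a plain assignment prints nothing, while the parenthesized form also prints the assigned value. `capture.output()` is used here only to make the printed text visible as a value.

```r
x <- 1:3     # plain assignment: nothing is printed
(y <- 1:3)   # parenthesized assignment: the value is also printed

# capture.output() records what the parenthesized form prints.
printed <- capture.output((z <- 1:3))
printed
```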