Scatterplots in ggplot2
Scatterplots are a useful way to visualize the relationship between two numeric variables. Plotting each observation as a point can reveal patterns, such as a trend, clusters, or outliers, that are hard to spot by scanning the values in a spreadsheet. This guide covers my approach to making scatterplots with the ggplot2 package.
In this guide, I’ll use the data from the correlation walkthrough here. It contains a hypothetical anxiety score (Anxiety), an exam score (Exam), the number of hours spent studying (Revise), and each student’s gender (Gender), with the aim of understanding which factors are associated with better grades.
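library(tidyverse)   # ggplot2, dplyr, readr, and friends
library(kableExtra)  # kable() table formatting
library(here)        # project-relative file paths
# `data` is assumed to already hold the exam anxiety dataset
# from the correlation walkthrough.
# Preview the first six rows as a table
kable(head(data))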
Code | Revise | Exam | Anxiety | Gender |
---|---|---|---|---|
1 | 4 | 40 | 86.298 | Male |
2 | 11 | 65 | 88.716 | Female |
3 | 27 | 80 | 70.178 | Male |
4 | 53 | 80 | 61.312 | Male |
5 | 4 | 40 | 89.522 | Male |
6 | 22 | 70 | 60.506 | Female |
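If you want to try the plots without the original file, one option is a small stand-in built from the six preview rows above. This is only a sketch: `exam_preview` is a hypothetical name, and six rows are far too few to draw conclusions from, but they are enough to check that the column names and types match what the plotting code expects.

```r
# Hypothetical stand-in: only the six preview rows from the table above,
# not the full dataset from the correlation walkthrough.
exam_preview <- data.frame(
  Code    = 1:6,
  Revise  = c(4, 11, 27, 53, 4, 22),
  Exam    = c(40, 65, 80, 80, 40, 70),
  Anxiety = c(86.298, 88.716, 70.178, 61.312, 89.522, 60.506),
  Gender  = c("Male", "Female", "Male", "Male", "Male", "Female")
)

# Confirm the structure: numeric Revise/Exam/Anxiety columns and a
# character Gender column that can be mapped to color.
str(exam_preview)
```

Running `data <- exam_preview` lets the snippets in this guide run on this toy subset; the plots will look much sparser than those built from the full dataset.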
Basic ggplot scatterplot
# Map Anxiety to the x-axis and Exam to the y-axis; draw one point per row
ggplot(data, aes(x = Anxiety, y = Exam)) +
  geom_point()
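Each plot is built in layers: `ggplot()` stores the data and the `aes()` mappings, and each `geom_*()` added with `+` contributes a layer. A minimal sketch, using the six preview rows as a hypothetical stand-in named `exam_preview`, calls ggplot2's `layer_data()` to show how the mapping turned Anxiety and Exam into the x and y positions of the points.

```r
library(ggplot2)

# Hypothetical toy data: the six preview rows, standing in for `data`.
exam_preview <- data.frame(
  Exam    = c(40, 65, 80, 80, 40, 70),
  Anxiety = c(86.298, 88.716, 70.178, 61.312, 89.522, 60.506)
)

# ggplot() stores the data and aes() mappings; geom_point() adds a layer.
p <- ggplot(exam_preview, aes(x = Anxiety, y = Exam)) +
  geom_point()

# layer_data() returns the data ggplot2 computed for a layer: one row per
# point, with Anxiety resolved into the x column and Exam into y.
pts <- layer_data(p, 1)
pts[, c("x", "y")]
```

Printing `p` (or leaving it as the last expression in a chunk) is what actually draws the plot; building the object alone does not.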
Adding a linear regression line
ggplot(data, aes(x = Anxiety, y = Exam)) +
  geom_point() +
  # Fit a linear model and draw it; se = TRUE adds a 95% confidence band
  geom_smooth(method = "lm", se = TRUE)
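`geom_smooth(method = "lm")` fits an ordinary least-squares regression of y on x, and with `se = TRUE` it shades a confidence band around the line (95% by default). A sketch that checks this against `lm()` directly, assuming the six preview rows as a hypothetical stand-in (`exam_preview`). Here `formula = y ~ x` is the default for this method and is written out only to silence the message ggplot2 otherwise prints.

```r
library(ggplot2)

# Hypothetical toy data: the six preview rows, standing in for `data`.
exam_preview <- data.frame(
  Exam    = c(40, 65, 80, 80, 40, 70),
  Anxiety = c(86.298, 88.716, 70.178, 61.312, 89.522, 60.506)
)

# Fit the model directly: Exam regressed on Anxiety.
fit <- lm(Exam ~ Anxiety, data = exam_preview)
coef(fit)  # intercept and slope of the plotted line

p <- ggplot(exam_preview, aes(x = Anxiety, y = Exam)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)

# Layer 2 is the smoother: y, ymin, and ymax evaluated on a grid of x values
# (80 by default) spanning the range of Anxiety.
fitted_line <- layer_data(p, 2)

# 95% confidence interval for the mean Exam score at the same x values.
ci <- predict(fit, newdata = data.frame(Anxiety = fitted_line$x),
              interval = "confidence", level = 0.95)

# The drawn line and the shaded band should agree with the lm() results.
all.equal(unname(ci[, "fit"]), fitted_line$y)
all.equal(unname(ci[, "lwr"]), fitted_line$ymin)
```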
Visualize points by gender
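# Mapping color to Gender colors the points and fits one line per gender
ggplot(data, aes(x = Anxiety, y = Exam, color = Gender)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)  # lines only, no confidence bands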
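Mapping `color = Gender` does more than color the points: it also splits the data into groups, so `geom_smooth()` fits a separate regression line for each gender. The sketch reproduces the per-gender slopes with `lm()` and compares them with the slopes of the drawn lines, again on the hypothetical six-row stand-in `exam_preview`. With only two Female rows in this toy subset, that line simply passes through both points.

```r
library(ggplot2)

# Hypothetical toy data: the six preview rows, standing in for `data`.
exam_preview <- data.frame(
  Exam    = c(40, 65, 80, 80, 40, 70),
  Anxiety = c(86.298, 88.716, 70.178, 61.312, 89.522, 60.506),
  Gender  = c("Male", "Female", "Male", "Male", "Male", "Female")
)

# One lm() per gender; split() orders the groups alphabetically.
slopes <- sapply(
  split(exam_preview, exam_preview$Gender),
  function(d) coef(lm(Exam ~ Anxiety, data = d))[["Anxiety"]]
)
slopes

p <- ggplot(exam_preview, aes(x = Anxiety, y = Exam, color = Gender)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Each group's fitted line is straight, so its slope is rise over run
# between the first and last points of that group's x grid.
lines_df <- layer_data(p, 2)
plot_slopes <- sapply(split(lines_df, lines_df$group), function(d) {
  n <- nrow(d)
  (d$y[n] - d$y[1]) / (d$x[n] - d$x[1])
})
plot_slopes
```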
Changing colors, backgrounds, and axes
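# Colors for the two Gender levels, matched to the levels in order
colors <- c("#440154FF", "#1565c0")

ggplot(data, aes(x = Anxiety, y = Exam, color = Gender)) +
  geom_point(alpha = 0.7) +              # semi-transparent points
  scale_color_manual(values = colors) +  # use the custom colors
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal() +                      # no panel background or border
  theme(axis.line = element_line(color = "grey70"))  # grey axis lines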
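With an unnamed vector, `scale_color_manual()` assigns colors in the order of the Gender levels (alphabetically, Female then Male, for a character column). Naming the vector ties each color to a specific value instead, which is safer if the levels are reordered or one is missing. A sketch on the hypothetical six-row stand-in `exam_preview`, deliberately listing Male first:

```r
library(ggplot2)

# Hypothetical toy data: the six preview rows, standing in for `data`.
exam_preview <- data.frame(
  Exam    = c(40, 65, 80, 80, 40, 70),
  Anxiety = c(86.298, 88.716, 70.178, 61.312, 89.522, 60.506),
  Gender  = c("Male", "Female", "Male", "Male", "Male", "Female")
)

# Named values: each color is matched to the Gender value with the same
# name, so the order of the entries no longer matters.
colors <- c(Male = "#1565c0", Female = "#440154FF")

p <- ggplot(exam_preview, aes(x = Anxiety, y = Exam, color = Gender)) +
  geom_point(alpha = 0.7) +
  scale_color_manual(values = colors)

# Inspect the color given to each group of points. Groups follow the
# sorted Gender values, so group 1 is Female and group 2 is Male.
pts <- layer_data(p, 1)
unique(pts[, c("group", "colour")])
```

Because the colors are matched by name, reordering the entries of `colors` leaves the plot unchanged.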