Scatterplots in ggplot2

Scatterplots show how two variables relate to each other, revealing patterns that are hard to spot by scanning values in a spreadsheet. This guide covers my approach to making scatterplots with the ggplot2 package.

In this guide, I’ll use data from the correlation walkthrough here. The data describe a hypothetical anxiety measure, exam scores, and the number of hours spent studying, with the goal of understanding which factors are associated with better grades.

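library(tidyverse)   # loads ggplot2, dplyr, readr, and friends
library(kableExtra)  # table formatting via kable()
library(here)        # project-relative file paths

# `data` is assumed to be loaded already; the import step is not shown here
kable(head(data))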
Code  Revise  Exam  Anxiety  Gender
   1       4    40   86.298  Male
   2      11    65   88.716  Female
   3      27    80   70.178  Male
   4      53    80   61.312  Male
   5       4    40   89.522  Male
   6      22    70   60.506  Female
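
If you don't have the original file at hand, the six preview rows above can be typed in directly to try the plots that follow. This is only a sketch: `exam_preview` is a name I made up for illustration, and six rows are far too few to say anything about the real relationships in the full dataset.

```r
# Rebuild the six rows shown in the preview above (base R, no packages needed).
# `exam_preview` is a hypothetical name; the guide itself uses `data`.
exam_preview <- data.frame(
  Code    = 1:6,
  Revise  = c(4, 11, 27, 53, 4, 22),
  Exam    = c(40, 65, 80, 80, 40, 70),
  Anxiety = c(86.298, 88.716, 70.178, 61.312, 89.522, 60.506),
  Gender  = c("Male", "Female", "Male", "Male", "Male", "Female")
)

# Quick structural check before plotting
str(exam_preview)
```

Passing `exam_preview` in place of `data` in the calls below should give a small version of each plot.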

Basic ggplot scatterplot

ggplot(data, aes(x = Anxiety, y = Exam)) +
  geom_point()

Adding a linear regression line

ggplot(data, aes(x = Anxiety, y = Exam)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE)
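
With `method = "lm"` and the default formula `y ~ x`, `geom_smooth()` draws an ordinary least-squares line, so the same intercept and slope can be recovered with `lm()`. The sketch below uses a hypothetical helper, `fit_line()`, and toy data (not the guide's dataset) built so the answer is known in advance.

```r
# Hypothetical helper: intercept and slope of the straight line that
# geom_smooth(method = "lm") would draw for Exam against Anxiety.
fit_line <- function(df) {
  fit <- lm(Exam ~ Anxiety, data = df)
  coef(fit)
}

# Toy data lying exactly on Exam = 100 - 0.5 * Anxiety (made-up values)
toy <- data.frame(Anxiety = c(20, 40, 60, 80))
toy$Exam <- 100 - 0.5 * toy$Anxiety

# By construction the intercept is 100 and the slope is -0.5
# (up to floating-point error)
fit_line(toy)
```

Calling `fit_line(data)` on the real data should return the coefficients of the line in the plot above, assuming `data` has `Exam` and `Anxiety` columns as shown in the preview.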

Visualize points by gender

ggplot(data, aes(x = Anxiety, y = Exam, color = Gender)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
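
Because `color = Gender` sits in the top-level `aes()`, every layer inherits it, and `geom_smooth()` fits one line per gender. If you want coloured points but a single overall line, one option is to map colour only inside `geom_point()`. A minimal sketch with made-up toy data (the `toy` values and the plot names are assumptions for illustration):

```r
library(ggplot2)

# Toy data standing in for the guide's `data`; values are made up
toy <- data.frame(
  Anxiety = c(60, 70, 80, 90, 65, 75, 85, 95),
  Exam    = c(80, 70, 55, 40, 85, 72, 60, 45),
  Gender  = rep(c("Female", "Male"), each = 4)
)

# Colour in the global aes(): geom_smooth() inherits it, one line per gender
p_grouped <- ggplot(toy, aes(x = Anxiety, y = Exam, color = Gender)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# Colour only in geom_point(): geom_smooth() sees one group, one overall line
p_single <- ggplot(toy, aes(x = Anxiety, y = Exam)) +
  geom_point(aes(color = Gender)) +
  geom_smooth(method = "lm", se = FALSE)
```

Printing `p_grouped` and `p_single` side by side makes the difference easy to see.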

Changing colors, backgrounds, and axes

colors <- c("#440154FF", "#1565c0")

ggplot(data, aes(x = Anxiety, y = Exam, color = Gender)) +
  geom_point(alpha = 0.7) +
  scale_color_manual(values = colors) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal() +
  theme(axis.line = element_line(color = "grey70"))
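
An unnamed colour vector is matched to the levels of `Gender` in order, which is easy to get backwards. Naming the vector ties each colour to a level explicitly, and `labs()` sets the axis and legend titles. The sketch below uses made-up toy data, and the label text ("Anxiety score", "Exam score") is my assumption, not taken from the dataset's documentation.

```r
library(ggplot2)

# Toy data standing in for the guide's `data`; values are made up
toy <- data.frame(
  Anxiety = c(60, 70, 80, 90, 65, 75, 85, 95),
  Exam    = c(80, 70, 55, 40, 85, 72, 60, 45),
  Gender  = rep(c("Female", "Male"), each = 4)
)

# Named vector: each colour is bound to a specific level of Gender
gender_colors <- c(Female = "#440154FF", Male = "#1565c0")

p_labeled <- ggplot(toy, aes(x = Anxiety, y = Exam, color = Gender)) +
  geom_point(alpha = 0.7) +
  scale_color_manual(values = gender_colors) +
  # Axis and legend titles (wording is illustrative)
  labs(x = "Anxiety score", y = "Exam score", color = "Gender") +
  theme_minimal() +
  theme(axis.line = element_line(color = "grey70"))
```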
