Displaying Likert Style Responses
Surveys often contain responses to a given question in a Likert-style format, where the available responses are something like “Strongly Agree”, “Agree”, “Neutral”, “Disagree”, or “Strongly Disagree.” The following is a “cheat-sheet” for using {ggplot2} to display unweighted Likert-style survey responses in a bar chart format.
Packages
library(tidyverse)
library(gtsummary)
library(bstfun)
Bar charts for factored survey responses
sample_question_1.factor | sample_question_2.factor | sample_question_3.factor |
---|---|---|
Moderately important | Somewhat effective | Somewhat effective |
Extremely important | Somewhat effective | Very effective |
Extremely important | Very effective | Very effective |
Extremely important | Very effective | Very effective |
Extremely important | Very effective | Very effective |
Extremely important | Somewhat effective | Somewhat effective |
Bar chart with percentage of factored responses relative to the whole sample
This version calculates and displays the percentage of responses across the entire survey sample. The key to this type of chart is to set group = 1 in the aes() call so that proportions are computed across all responses, and to set clip = "off" in the coord_flip() layer in combination with limits in the scale_y_continuous() layer to prevent the percent labels from getting clipped. The legend and the x label (which appears on the vertical axis because of coord_flip()) have been removed for simplicity. The axis.text.x setting in theme() is in place in case the x-axis tick labels need to be rotated to prevent overplotting.
Requires:
- One survey question with factored and ordered responses
data %>%
drop_na(sample_question_1.factor) %>%
ggplot(., aes(sample_question_1.factor, group = 1)) +
geom_bar(aes(y = ..prop.., fill = factor(..x..)), position = position_dodge()) +
geom_text(aes(label = scales::percent(..prop..), y= ..prop.. ), stat= "count", size = 3,
hjust = -.15, colour = "black") +
coord_flip(clip = "off") +
scale_y_continuous(labels = scales::percent, limits = c(0, 1)) + # limits extends the chart to prevent clipping
labs(y = "Percent", x = "") +
theme_minimal() +
theme(legend.position = "top", axis.text.x = element_text(angle = -0, hjust = 0)) +
guides(fill = "none")
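To see why group = 1 matters, the following minimal sketch builds two versions of the same chart on a small made-up data frame (toy and its values are hypothetical, not the survey data) and inspects the computed proportions with layer_data(). It uses after_stat(prop), the newer spelling of ..prop.., and so assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical responses: 1 Disagree, 1 Neutral, 2 Agree
toy <- data.frame(
  response = factor(
    c("Agree", "Agree", "Neutral", "Disagree"),
    levels = c("Disagree", "Neutral", "Agree")
  )
)

# With group = 1, all responses form a single group
p_grouped <- ggplot(toy, aes(response, group = 1)) +
  geom_bar(aes(y = after_stat(prop)))

# Without it, each response level becomes its own group
p_default <- ggplot(toy, aes(response)) +
  geom_bar(aes(y = after_stat(prop)))

# layer_data() returns the values computed by the stat for the first layer
prop_grouped <- layer_data(p_grouped)$prop
prop_default <- layer_data(p_default)$prop

print(prop_grouped)
print(prop_default)
```

With group = 1 the proportions sum to 1 across the whole sample. Without it, each response level is treated as its own group, so every bar gets a proportion of 1.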
Bar chart faceted by a grouping variable
This version is an extension of the basic bar chart, but adds the facet_wrap() layer to display responses separately for each level of a grouping variable. In this example, the survey responses were collected from various “sites” that can be displayed separately. This option can be useful if the number of groups is small. The two alternative geom_bar() layers (one of which is commented out) control which variable is filled with color: either the response or the grouping variable (site).
Requires:
- One survey question with factored and ordered responses
- One categorical grouping variable (e.g. gender, site, or age group such as young/old)
data %>%
drop_na(sample_question_1.factor, site) %>%
ggplot(., aes(sample_question_1.factor, group = 1)) +
#geom_bar(aes(y = ..prop.., fill = site), position = position_dodge()) + # Applies fill to site
geom_bar(aes(y = ..prop.., fill = factor(..x..)), position = position_dodge()) + # Applies fill to response
geom_text(aes(label = scales::percent(round(..prop..,2)), y= ..prop.. ), stat= "count", size = 3,
hjust = -.15, colour = "black") +
coord_flip(clip = "off") +
scale_y_continuous(labels = scales::percent) +
labs(y = "Percent", x = "") +
theme_minimal() +
theme(legend.position = "top", axis.text.x = element_text(angle = -0, hjust = 0)) +
guides(fill = "none") +
facet_wrap(~ site, ncol = 2)
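As a rough check on how faceting interacts with group = 1, the sketch below uses a made-up data frame (toy, with hypothetical sites "A" and "B") and inspects the computed values with layer_data(). It assumes ggplot2 3.3.0 or later for after_stat(); the point is that the proportions are computed within each panel, so each site's bars sum to 100%.

```r
library(ggplot2)

# Hypothetical data: site A has 3 responses, site B has 4
toy <- data.frame(
  response = factor(
    c("Agree", "Agree", "Disagree", "Agree", "Disagree", "Disagree", "Disagree"),
    levels = c("Disagree", "Agree")
  ),
  site = c("A", "A", "A", "B", "B", "B", "B")
)

p <- ggplot(toy, aes(response, group = 1)) +
  geom_bar(aes(y = after_stat(prop))) +
  facet_wrap(~ site)

# One row per bar, with the PANEL it belongs to and its computed proportion
built <- layer_data(p)

# Proportions add up to 1 within each panel (site), not across the whole sample
panel_totals <- tapply(built$prop, built$PANEL, sum)
print(built[, c("PANEL", "x", "prop")])
print(panel_totals)
```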
Bar chart with percentage of factored responses relative to a grouping variable
This style displays the same information as the faceted bar chart above, but places all of the bars in one panel. Again, this style works best when the number of values in the grouping variable is small, to prevent overcrowding at each x-axis tick.
Requires:
- One survey question with factored and ordered responses
- One categorical grouping variable
data %>%
drop_na(sample_question_1.factor, site) %>%
ggplot(., aes(sample_question_1.factor, fill = site)) +
geom_bar(aes(y=..count../tapply(..count.., ..fill.. ,sum)[..fill..]), position="dodge2") +
geom_text(aes(y=..count../tapply(..count.., ..fill.. ,sum)[..fill..], label = scales::percent(round(..count../tapply(..count.., ..fill.. ,sum)[..fill..],2))), stat="count", position=position_dodge(1), hjust=-0.15, size = 3) +
coord_flip(clip = "off") +
scale_y_continuous(labels = scales::percent) +
theme_minimal() +
theme(legend.position = "top") +
guides(fill = guide_legend(title = "Site")) +
labs(y = "Percent", x = "")
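The expression ..count../tapply(..count.., ..fill.., sum)[..fill..] can be hard to read inside aes(). The base R sketch below reproduces the same arithmetic on a small hypothetical table of counts (the counts data frame and its values are made up for illustration): tapply() sums the counts per site, and indexing those totals by site lines each total up with its own rows.

```r
# Hypothetical counts of each response within each site
counts <- data.frame(
  site     = factor(c("A", "A", "B", "B")),
  response = c("Agree", "Disagree", "Agree", "Disagree"),
  count    = c(3, 1, 1, 4)
)

# Total number of responses per site (A = 4, B = 5)
site_totals <- tapply(counts$count, counts$site, sum)

# Indexing the totals by site repeats each site's total on its own rows,
# so the division yields the share of each response within its site
counts$prop <- as.numeric(counts$count / site_totals[counts$site])

print(counts)
```

Each site's proportions sum to 1, which is what makes the dodged bars comparable across sites of different sizes.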
Bar chart faceted by related survey items
Rather than faceting by a grouping variable, this style of chart creates one panel per survey item. The key to this approach is to select the columns to be displayed and then convert the data to long format. Once the data are in long format, the percentage calculations can be done within the geom_bar() layer and displayed by the geom_text() layer. This style can be useful for displaying related survey items that a reader may want to compare.
Requires:
- At least two survey questions with the same factored and ordered responses
- Data arranged in long format
participant | question | response |
---|---|---|
1 | sample_question_2.factor | Somewhat effective |
1 | sample_question_3.factor | Somewhat effective |
2 | sample_question_2.factor | Somewhat effective |
2 | sample_question_3.factor | Very effective |
3 | sample_question_2.factor | Very effective |
3 | sample_question_3.factor | Very effective |
data %>%
select(sample_question_2.factor:sample_question_3.factor) %>%
pivot_longer(cols = everything(),
names_to = "question",
values_to = "response") %>%
drop_na() %>%
ggplot(., aes(response, group = 1)) +
geom_bar(aes(y = ..prop.., fill = factor(..x..)), position = position_dodge()) +
geom_text(aes(label = scales::percent(..prop.., accuracy = 0.1), y= ..prop.. ), stat= "count", size = 3,
hjust = -.15, colour = "black") +
scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
coord_flip() +
theme_minimal() +
guides(fill = "none") +
labs(y = "Percent", x = "") +
facet_wrap(~question)
Responses in this type of chart can be double-checked with a call to tbl_likert() from the {bstfun} package. Note, however, that the percentages from tbl_likert() and the ggplot call may differ slightly because of rounding.
data %>%
select(sample_question_2.factor:sample_question_3.factor) %>%
tbl_likert(digits = list(everything() ~ 1))
Characteristic | Not effective¹ | Somewhat effective¹ | Very effective¹ | Not sure¹ |
---|---|---|---|---|
sample_question_2.factor | 2.0 (1.4%) | 58.0 (40.8%) | 74.0 (52.1%) | 8.0 (5.6%) |
sample_question_3.factor | 2.0 (1.4%) | 54.0 (37.8%) | 83.0 (58.0%) | 4.0 (2.8%) |
¹ n (%)
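If {bstfun} is not available, a similar check can be sketched with {dplyr} and {tidyr} alone. The example below is a minimal illustration on a made-up tibble (toy and its values are hypothetical); it counts responses per question and divides by the per-question total, which is the same quantity the faceted chart plots.

```r
library(dplyr)
library(tidyr)

# Hypothetical responses to two items, including one missing value
toy <- tibble(
  sample_question_2.factor = c("Somewhat effective", "Somewhat effective",
                               "Very effective", NA),
  sample_question_3.factor = c("Somewhat effective", "Very effective",
                               "Very effective", "Very effective")
)

check <- toy %>%
  pivot_longer(cols = everything(),
               names_to = "question",
               values_to = "response") %>%
  drop_na() %>%                      # same NA handling as the chart
  count(question, response) %>%      # n = responses per question and level
  group_by(question) %>%
  mutate(pct = n / sum(n)) %>%       # share within each question
  ungroup()

print(check)
```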
Bar chart with grouping variable and faceted by related survey item
Building on the previous chart, this style facets two or more survey items and shows the percentage of responses relative to the grouping variable. As with the preceding chart, a key step is to convert the columns of interest into long format. From there, we group by all three variables (response, site, and question) and count the number of responses. The result is then ungrouped, rows with NAs are dropped, and the data are grouped again by the item (question) and the grouping variable (site). Finally, two mutate() calls follow: one calculates the numeric percentage to plot, and the other creates the label to display at the end of each bar.
Requires:
- At least two survey questions with the same factored and ordered responses
- One categorical grouping variable
- Data arranged in long format
data %>%
select(sample_question_2.factor:sample_question_3.factor, site) %>%
pivot_longer(cols = sample_question_2.factor:sample_question_3.factor,
names_to = "question",
values_to = "response") %>%
group_by(response, site, question) %>%
summarise(freq = n(), .groups = "drop") %>% # Count responses; .groups = "drop" returns ungrouped data
drop_na() %>%
group_by(question, site) %>%
mutate(prop = round(freq/sum(freq, na.rm = TRUE),3) * 100) %>% # Get the % to plot
mutate(prop_label = scales::percent(freq/sum(freq, na.rm = TRUE), accuracy = 0.1)) %>% # Get the % label
ggplot(., aes(x = response, y = prop, fill = site, label = prop_label)) +
geom_col(position = "dodge2") +
geom_text(position = position_dodge(.9), size = 3, hjust = -.1) +
coord_flip(clip = "off") +
scale_y_continuous(labels = scales::percent_format(scale = 1), limits = c(0, 100)) +
theme_minimal() +
guides(fill = guide_legend(title = "Site")) +
theme(legend.position = "top", axis.text.x = element_text(angle = -0, hjust = 0)) +
labs(y = "Percent", x = "") +
facet_wrap(~question)
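The chart above keeps two versions of each percentage: a number on a 0 to 100 scale for the bar height and a formatted string for the label. The short sketch below, using hypothetical counts (21 of 52 respondents), illustrates how the two relate and why scale = 1 is passed to percent_format() for the axis. The explicit accuracy = 1 on the axis formatter is an addition for this illustration and is not part of the original call.

```r
library(scales)

# Hypothetical example: 21 of 52 respondents chose a given response
prop_fraction <- 21 / 52

# Value plotted on the y axis, already on a 0-100 scale
prop_percent <- round(prop_fraction, 3) * 100

# Text label drawn next to the bar; percent() multiplies by 100 itself
prop_label <- percent(prop_fraction, accuracy = 0.1)

# Axis labels for values that are already percentages: scale = 1 stops
# percent_format() from multiplying by 100 a second time
axis_labels <- percent_format(scale = 1, accuracy = 1)(c(0, 50, 100))

print(prop_percent)
print(prop_label)
print(axis_labels)
```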
To double-check our work, we can use the tbl_summary() function from the {gtsummary} package. As noted before, the values may differ slightly because of rounding.
data %>%
drop_na(site) %>%
select(sample_question_2.factor:sample_question_3.factor, site) %>%
tbl_summary(by = "site",
digits = all_categorical() ~ 1)
Characteristic | Albuquerque, N = 52¹ | San Diego, N = 23¹ | Denver, N = 35¹ | El Paso, N = 17¹ | Los Angeles, N = 15¹ |
---|---|---|---|---|---|
sample_question_2.factor | | | | | |
Not effective | 0.0 (0.0%) | 1.0 (4.3%) | 1.0 (3.0%) | 0.0 (0.0%) | 0.0 (0.0%) |
Somewhat effective | 21.0 (40.4%) | 13.0 (56.5%) | 11.0 (33.3%) | 8.0 (47.1%) | 4.0 (26.7%) |
Very effective | 29.0 (55.8%) | 6.0 (26.1%) | 20.0 (60.6%) | 7.0 (41.2%) | 11.0 (73.3%) |
Not sure | 2.0 (3.8%) | 3.0 (13.0%) | 1.0 (3.0%) | 2.0 (11.8%) | 0.0 (0.0%) |
Unknown | 0 | 0 | 2 | 0 | 0 |
sample_question_3.factor | | | | | |
Not effective | 0.0 (0.0%) | 1.0 (4.3%) | 1.0 (2.9%) | 0.0 (0.0%) | 0.0 (0.0%) |
Somewhat effective | 25.0 (48.1%) | 12.0 (52.2%) | 9.0 (26.5%) | 6.0 (35.3%) | 2.0 (13.3%) |
Very effective | 27.0 (51.9%) | 9.0 (39.1%) | 22.0 (64.7%) | 10.0 (58.8%) | 13.0 (86.7%) |
Not sure | 0.0 (0.0%) | 1.0 (4.3%) | 2.0 (5.9%) | 1.0 (5.9%) | 0.0 (0.0%) |
Unknown | 0 | 0 | 1 | 0 | 0 |
¹ n (%)
Bar charts for numeric survey responses
Bar chart with means of numeric responses
In some cases, the responses to survey items may be represented by integers (e.g. 1, 2, 3, 4, 5), and it may be useful to plot the means of these responses. The key to this style of plot is to summarize each question into its respective mean before arranging the data into long format (if there is more than one question).
Requires:
- One or more survey questions with numerical responses
- Each question summarized to a mean
- Data arranged in long format if more than one question
sample_question_1.integer | sample_question_2.integer | sample_question_3.integer | sample_question_4.integer |
---|---|---|---|
3 | 4 | 4 | 3 |
4 | 5 | 5 | 5 |
3 | 4 | 4 | 4 |
3 | 4 | 4 | 3 |
3 | 3 | 4 | 3 |
4 | 3 | 3 | 3 |
data %>%
select(sample_question_1.integer:sample_question_4.integer) %>%
summarise(across(everything(), ~ mean(.x, na.rm = TRUE))) %>%
pivot_longer(cols = sample_question_1.integer:sample_question_4.integer,
names_to = "question",
values_to = "mean") %>%
ggplot(., aes(x = factor(question, levels = rev(question)), y = mean, fill = question)) +
geom_col() +
geom_text(aes(label = round(mean,2)), hjust = -.3, size = 3) +
geom_hline(aes(yintercept = mean(mean)), color = "black", linetype = "dotted") +
coord_flip() +
scale_y_continuous(limits = c(0,5), breaks = scales::breaks_pretty(11)) +
theme_minimal() +
guides(fill = "none") +
labs(y = "Survey item mean", x = "")
The summarized data passed to ggplot() in the chart above looks like this:
data %>%
select(sample_question_1.integer:sample_question_4.integer) %>%
summarise(across(everything(), ~ mean(.x, na.rm = TRUE))) %>%
pivot_longer(cols = sample_question_1.integer:sample_question_4.integer,
names_to = "question",
values_to = "mean")
question | mean |
---|---|
sample_question_1.integer | 3.152778 |
sample_question_2.integer | 3.326389 |
sample_question_3.integer | 3.496552 |
sample_question_4.integer | 2.439716 |