Displaying Likert Style Responses

Surveys often collect responses to a question in a Likert-style format, where the available responses are something like “Strongly Agree”, “Agree”, “Neutral”, “Disagree”, or “Strongly Disagree.” The following is a “cheat-sheet” for using {ggplot2} to display non-weighted Likert-style survey responses as bar charts.

Packages

library(tidyverse)
library(gtsummary)
library(bstfun)

Bar charts for factored survey responses

Table 1: Example survey responses.

| sample_question_1.factor | sample_question_2.factor | sample_question_3.factor |
|---|---|---|
| Moderately important | Somewhat effective | Somewhat effective |
| Extremely important | Somewhat effective | Very effective |
| Extremely important | Very effective | Very effective |
| Extremely important | Very effective | Very effective |
| Extremely important | Very effective | Very effective |
| Extremely important | Somewhat effective | Somewhat effective |

Bar chart with percentage of factored responses relative to the whole sample

This version calculates and displays the percentage of responses relative to the entire survey sample. The key for this type of chart is to set group = 1 in the aes() call, and to set clip = "off" in the coord_flip() layer together with limits in the scale_y_continuous() layer so that the percent labels do not get clipped. The legend and the x label (which is displayed on the y axis because of coord_flip()) have been removed for simplicity. The axis.text.x setting in theme() is in place in case the x-axis tick labels need to be rotated to prevent overplotting.

Requires:

  • One survey question with factored and ordered responses
data %>%
  drop_na(sample_question_1.factor) %>%
  ggplot(., aes(sample_question_1.factor, group = 1)) +
  geom_bar(aes(y = ..prop.., fill = factor(..x..)), position = position_dodge()) +
  geom_text(aes(label = scales::percent(..prop..), y= ..prop.. ), stat= "count", size = 3,
              hjust = -.15, colour = "black") +
  coord_flip(clip = "off") +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1)) + # limits extends the chart to prevent clipping 
  labs(y = "Percent", x = "") +
  theme_minimal() +
  theme(legend.position = "top", axis.text.x = element_text(angle = -0, hjust = 0)) +
  guides(fill = "none")
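
The percentages that geom_bar() computes with group = 1 can be reproduced outside of the plot as a sanity check. The following is a minimal sketch, assuming a small made-up data frame; toy_data, its values, and whole_sample_props are hypothetical names for illustration and are not the survey data used here.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (assumption): four answers and one missing response
toy_data <- tibble(
  sample_question_1.factor = factor(
    c("Extremely important", "Extremely important", "Extremely important",
      "Moderately important", NA),
    levels = c("Not important", "Moderately important", "Extremely important")
  )
)

# Count each response, then divide by the total number of non-missing responses.
# This is the share of the whole sample that each bar represents.
whole_sample_props <- toy_data %>%
  drop_na(sample_question_1.factor) %>%
  count(sample_question_1.factor, name = "n") %>%
  mutate(prop = n / sum(n))

whole_sample_props
```

Because every row belongs to the same group, the proportions sum to 1 across all responses, which is what the bars in the chart above show.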

Bar chart faceted by a grouping variable

This version is an extension of the basic bar chart that adds a facet_wrap() layer to display responses separately for each level of a grouping variable. In this example, the survey responses were collected from various “sites” that can be displayed in separate panels. This option is useful when the grouping variable has only a few levels. The two geom_bar() layers (one of which is commented out) control which variable is mapped to the fill color: either the response or the grouping variable (site).

Requires:

  • One survey question with factored and ordered responses
  • One categorical grouping variable (for example gender, site, or an age group such as young/old)
data %>%
  drop_na(sample_question_1.factor, site) %>%
  ggplot(., aes(sample_question_1.factor, group = 1)) +
  #geom_bar(aes(y = ..prop.., fill = site), position = position_dodge()) + # Applies fill to site
  geom_bar(aes(y = ..prop.., fill = factor(..x..)), position = position_dodge()) + # Applies fill to response
  geom_text(aes(label = scales::percent(round(..prop..,2)), y= ..prop.. ), stat= "count", size = 3,
              hjust = -.15, colour = "black") +
  coord_flip(clip = "off") +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Percent", x = "") +
  theme_minimal() +
  theme(legend.position = "top", axis.text.x = element_text(angle = -0, hjust = 0)) +
  guides(fill = "none") +
  facet_wrap(~ site, ncol = 2)

Bar chart with percentage of factored responses relative to a grouping variable

This style displays the same information as the faceted bar chart above, but places all of the bars in one panel. Again, this style works best when the grouping variable has only a few levels, which prevents overcrowding at each x-axis tick.

Requires:

  • One survey question with factored and ordered responses
  • One categorical grouping variable
data %>%
  drop_na(sample_question_1.factor, site) %>%
  ggplot(., aes(sample_question_1.factor, fill = site)) +
  geom_bar(aes(y=..count../tapply(..count.., ..fill.. ,sum)[..fill..]), position="dodge2") +
  geom_text(aes(y=..count../tapply(..count.., ..fill.. ,sum)[..fill..], label = scales::percent(round(..count../tapply(..count.., ..fill.. ,sum)[..fill..],2))), stat="count", position=position_dodge(1), hjust=-0.15, size = 3) +
  coord_flip(clip = "off") +
  scale_y_continuous(labels = scales::percent) +
  theme_minimal() +
  theme(legend.position = "top") +
  guides(fill = guide_legend(title = "Site")) +
  labs(y = "Percent", x = "")
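
The expression ..count../tapply(..count.., ..fill.., sum)[..fill..] divides each bar's count by the total count for its site. The same arithmetic can be sketched outside of ggplot2; the data frame below (toy_data) and the object names are hypothetical and only illustrate the calculation.

```r
library(dplyr)

# Hypothetical toy data (assumption): four responses from one site, two from another
toy_data <- tibble(
  site = c("Denver", "Denver", "Denver", "Denver", "El Paso", "El Paso"),
  response = c("Very effective", "Very effective", "Very effective",
               "Somewhat effective", "Very effective", "Somewhat effective")
)

# One row per site/response combination
counts <- toy_data %>% count(site, response, name = "n")

# The tapply() idiom: site totals, looked up by site name for every row
site_totals <- tapply(counts$n, counts$site, sum)
prop_tapply <- as.numeric(counts$n / site_totals[counts$site])

# The equivalent grouped calculation with dplyr
within_site <- counts %>%
  group_by(site) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()
```

Both approaches give proportions that sum to 1 within each site rather than across the whole sample.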

Rather than faceting by a grouping variable, this style of chart creates one panel per survey item. The key to this approach is to select the columns to be displayed and then convert the data to long format. Once the data are in long format, the percentages are calculated within the geom_bar() layer and displayed by the geom_text() layer. This style is useful for displaying related survey items that a reader may want to compare.

Requires:

  • At least two survey questions with the same factored and ordered responses
  • One categorical grouping variable
  • Data arranged in long format

Table 2: Example long format data.

| participant | question | response |
|---|---|---|
| 1 | sample_question_2.factor | Somewhat effective |
| 1 | sample_question_3.factor | Somewhat effective |
| 2 | sample_question_2.factor | Somewhat effective |
| 2 | sample_question_3.factor | Very effective |
| 3 | sample_question_2.factor | Very effective |
| 3 | sample_question_3.factor | Very effective |

data %>% 
  select(sample_question_2.factor:sample_question_3.factor) %>%
  pivot_longer(cols = everything(),
               names_to = "question",
               values_to = "response") %>%
  drop_na() %>%
  ggplot(., aes(response, group = 1)) +
  geom_bar(aes(y = ..prop.., fill = factor(..x..)), position = position_dodge()) +
  geom_text(aes(label = scales::percent(..prop.., accuracy = 0.1), y= ..prop.. ), stat= "count", size = 3,
              hjust = -.15, colour = "black") +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
  coord_flip() +
  theme_minimal() +
  guides(fill = "none") +
  labs(y = "Percent", x = "") +
  facet_wrap(~question)
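
The reshaping step and the per-panel percentages can also be worked through by hand. The sketch below assumes a three-participant data frame with the same values as the example long format table; toy_wide, toy_long, and per_question are hypothetical names used only for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data (assumption): one row per participant, one column per item
toy_wide <- tibble(
  participant = 1:3,
  sample_question_2.factor = c("Somewhat effective", "Somewhat effective", "Very effective"),
  sample_question_3.factor = c("Somewhat effective", "Very effective", "Very effective")
)

# Long format: one row per participant and question
toy_long <- toy_wide %>%
  pivot_longer(cols = -participant,
               names_to = "question",
               values_to = "response")

# Share of each response within each question (one facet panel per question)
per_question <- toy_long %>%
  count(question, response, name = "n") %>%
  group_by(question) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()
```

Each question's proportions sum to 1, which is what each panel of the faceted chart displays.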

Responses in this type of chart can be double-checked with a call to tbl_likert() from the {bstfun} package. Note, however, that the percentages from tbl_likert() and the ggplot call may differ slightly because of rounding.

data %>% 
  select(sample_question_2.factor:sample_question_3.factor) %>%
  tbl_likert(digits = list(everything() ~ 1))

| Characteristic | Not effective¹ | Somewhat effective¹ | Very effective¹ | Not sure¹ |
|---|---|---|---|---|
| sample_question_2.factor | 2.0 (1.4%) | 58.0 (40.8%) | 74.0 (52.1%) | 8.0 (5.6%) |
| sample_question_3.factor | 2.0 (1.4%) | 54.0 (37.8%) | 83.0 (58.0%) | 4.0 (2.8%) |

¹ n (%)

Building on the previous chart, this style facets two or more survey items and shows the percentage of responses relative to the grouping variable. As with the preceding chart, a key to this approach is to convert the columns of interest into long format. From there, the next step is to group by all variables and count the number of responses. Then we ungroup, drop the NAs, and group again by the item (question) and the grouping variable (site). Finally, there are two mutate() calls: one calculates the numerical percentage and the other creates the label displayed at the end of each bar.

Requires:

  • At least two survey questions with the same factored and ordered responses
  • One categorical grouping variable
  • Data arranged in long format
data %>% 
  select(sample_question_2.factor:sample_question_3.factor, site) %>%
  pivot_longer(cols = sample_question_2.factor:sample_question_3.factor,
               names_to = "question",
               values_to = "response") %>%
  group_by(response, site, question) %>%
  summarise(freq = n(), .groups = "drop") %>%
  ungroup() %>%
  drop_na() %>%
  group_by(question, site) %>%
  mutate(prop = round(freq/sum(freq, na.rm = TRUE),3) * 100) %>% # Get the % to plot
  mutate(prop_label = scales::percent(freq/sum(freq, na.rm = TRUE), accuracy = 0.1)) %>% # Get the % label
  ggplot(., aes(x = response, y = prop, fill = site, label = prop_label)) +
  geom_col(position = "dodge2") +
  geom_text(position = position_dodge(.9), size = 3, hjust = -.1) +
  coord_flip(clip = "off") +
  scale_y_continuous(labels = scales::percent_format(scale = 1), limits = c(0, 100)) +
  theme_minimal() +
  guides(fill = guide_legend(title = "Site")) +
  theme(legend.position = "top", axis.text.x = element_text(angle = -0, hjust = 0)) +
  labs(y = "Percent", x = "") +
  facet_wrap(~question)

To double-check our work, we can use the tbl_summary() function from the {gtsummary} package. As noted before, the values may differ slightly because of rounding.

data %>% 
  drop_na(site) %>%
  select(sample_question_2.factor:sample_question_3.factor, site) %>%
  tbl_summary(by = "site",
              digits = all_categorical() ~ 1)

| Characteristic | Albuquerque, N = 52¹ | San Diego, N = 23¹ | Denver, N = 35¹ | El Paso, N = 17¹ | Los Angeles, N = 15¹ |
|---|---|---|---|---|---|
| sample_question_2.factor | | | | | |
|     Not effective | 0.0 (0.0%) | 1.0 (4.3%) | 1.0 (3.0%) | 0.0 (0.0%) | 0.0 (0.0%) |
|     Somewhat effective | 21.0 (40.4%) | 13.0 (56.5%) | 11.0 (33.3%) | 8.0 (47.1%) | 4.0 (26.7%) |
|     Very effective | 29.0 (55.8%) | 6.0 (26.1%) | 20.0 (60.6%) | 7.0 (41.2%) | 11.0 (73.3%) |
|     Not sure | 2.0 (3.8%) | 3.0 (13.0%) | 1.0 (3.0%) | 2.0 (11.8%) | 0.0 (0.0%) |
|     Unknown | 0 | 0 | 2 | 0 | 0 |
| sample_question_3.factor | | | | | |
|     Not effective | 0.0 (0.0%) | 1.0 (4.3%) | 1.0 (2.9%) | 0.0 (0.0%) | 0.0 (0.0%) |
|     Somewhat effective | 25.0 (48.1%) | 12.0 (52.2%) | 9.0 (26.5%) | 6.0 (35.3%) | 2.0 (13.3%) |
|     Very effective | 27.0 (51.9%) | 9.0 (39.1%) | 22.0 (64.7%) | 10.0 (58.8%) | 13.0 (86.7%) |
|     Not sure | 0.0 (0.0%) | 1.0 (4.3%) | 2.0 (5.9%) | 1.0 (5.9%) | 0.0 (0.0%) |
|     Unknown | 0 | 0 | 1 | 0 | 0 |

¹ n (%)

Bar charts for numeric survey responses

Bar chart with means of numeric responses

In some cases, the responses to survey items are represented by integers (e.g. 1, 2, 3, 4, 5), and it may be useful to plot the means of these responses. The key to this style of plot is to summarize each question into its mean before arranging the data into long format (if there is more than one question).

Requires:

  • One or more survey questions with numerical responses
  • Each question summarized to a mean
  • Data arranged in long format if more than one question
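
Table 3: Example survey responses.

| sample_question_1.integer | sample_question_2.integer | sample_question_3.integer | sample_question_4.integer |
|---|---|---|---|
| 3 | 4 | 4 | 3 |
| 4 | 5 | 5 | 5 |
| 3 | 4 | 4 | 4 |
| 3 | 4 | 4 | 3 |
| 3 | 3 | 4 | 3 |
| 4 | 3 | 3 | 3 |
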
data %>% 
  select(sample_question_1.integer:sample_question_4.integer) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE))) %>%
  pivot_longer(cols = sample_question_1.integer:sample_question_4.integer,
               names_to = "question",
               values_to = "mean") %>%
  ggplot(., aes(x = factor(question, levels = rev(question)), y = mean, fill = question)) +
  geom_col() +
  geom_text(aes(label = round(mean,2)), hjust = -.3, size = 3) +
  geom_hline(aes(yintercept = mean(mean)), color = "black", linetype = "dotted") +
  coord_flip() +
  scale_y_continuous(limits = c(0,5), breaks = scales::breaks_pretty(11)) +
  theme_minimal() +
  guides(fill = "none") +
  labs(y = "Survey item mean", x = "")

data %>% 
  select(sample_question_1.integer:sample_question_4.integer) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE))) %>%
  pivot_longer(cols = sample_question_1.integer:sample_question_4.integer,
               names_to = "question",
               values_to = "mean")

Table 4: Example summarized responses.

| question | mean |
|---|---|
| sample_question_1.integer | 3.152778 |
| sample_question_2.integer | 3.326389 |
| sample_question_3.integer | 3.496552 |
| sample_question_4.integer | 2.439716 |