Correlation in R Pt. 2 - Comparing Correlation Coefficients

In this second part, we continue with the Exam Anxiety dataset, this time computing the correlations between the variables of interest separately for males and females. We then test whether those two correlations differ.

Plot by Gender

We can plot the data by Gender to see whether the relationship between pre-test anxiety and exam scores differs between genders. This is done by adding the color = Gender argument to the aes() call inside ggplot(), which colors the points by gender and makes geom_smooth() fit a separate regression line for each gender.

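library(tidyverse)  # loads ggplot2 and dplyr (including the %>% pipe)

# Assumes `data` holds the Exam Anxiety data from Part 1
# One color per Gender level (Female, Male)
colors <- c("#440154FF", "#1565c0")

ggplot(data, aes(x = Anxiety, y = Exam, color = Gender)) +
  geom_point(alpha = 0.7) +                                  # semi-transparent points
  scale_color_manual(values = colors) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # one linear fit per gender
  theme_minimal() +
  theme(axis.line = element_line(color = "grey70"))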

Correlations by Gender

The group_by() function works with rstatix's cor_test() to produce separate correlations for each gender: one set for males and one for females. We simply pipe data to group_by() and then pipe the grouped output to cor_test(). Here, Anxiety is correlated with both Exam and Revise.

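library(rstatix)  # cor_test()
library(knitr)    # kable()

# Correlate Anxiety with Exam and Revise separately within each gender
correlations <- data %>%
  group_by(Gender) %>%
  cor_test(vars = "Anxiety",
           vars2 = c("Exam", "Revise"),
           method = "pearson",
           use = "pairwise.complete.obs")
kable(correlations)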
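| Gender | var1    | var2   |   cor |  statistic |        p |   conf.low |  conf.high | method  |
|:-------|:--------|:-------|------:|-----------:|---------:|-----------:|-----------:|:--------|
| Female | Anxiety | Exam   | -0.38 |  -2.887975 | 5.76e-03 | -0.5944857 | -0.1182269 | Pearson |
| Male   | Anxiety | Exam   | -0.51 |  -4.144756 | 1.31e-04 | -0.6841707 | -0.2700554 | Pearson |
| Female | Anxiety | Revise | -0.82 | -10.079994 | 0.00e+00 | -0.8944820 | -0.7054746 | Pearson |
| Male   | Anxiety | Revise | -0.60 |  -5.267088 | 2.90e-06 | -0.7482821 | -0.3876660 | Pearson |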
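If you want to check grouped results like these without rstatix, the same per-group computation can be sketched in base R: split the data by group and run stats::cor.test() on each piece. The helper name group_cor_summary() is hypothetical (it is not from the original post or any package). Dropping rows with a missing x or y mirrors use = "pairwise.complete.obs" for a single pair of variables. So that the block runs on its own, the demo uses R's built-in mtcars data (mpg vs. wt, split by transmission type am) rather than the Exam Anxiety data; with the post's data you would call group_cor_summary(data, "Gender", "Anxiety", "Exam").

```r
# Hypothetical helper: per-group Pearson correlation, sample size, 95% CI,
# and p-value, using only base R (stats::cor.test).
group_cor_summary <- function(df, group, x, y) {
  pieces <- split(df, df[[group]])              # one data frame per group level
  rows <- lapply(names(pieces), function(g) {
    d <- pieces[[g]]
    d <- d[complete.cases(d[[x]], d[[y]]), ]    # drop rows missing x or y
    ct <- cor.test(d[[x]], d[[y]], method = "pearson")
    data.frame(group     = g,
               n         = nrow(d),
               r         = unname(ct$estimate),
               conf.low  = ct$conf.int[1],
               conf.high = ct$conf.int[2],
               p         = ct$p.value)
  })
  do.call(rbind, rows)                          # stack the per-group rows
}

# Self-contained demo on a built-in dataset (not the Exam Anxiety data)
summary_mtcars <- group_cor_summary(mtcars, "am", "mpg", "wt")
print(summary_mtcars)
```

The n and r columns returned here are exactly the inputs needed by the comparison function in the next section.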

Comparing Correlations Between Genders

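Field, Miles, and Field (2012) provide code for a function that compares two independent correlation coefficients.[1] The function takes the correlation and the sample size for each group as input. Entering our values for the Anxiety and Exam correlation (males: r = -0.51, n = 52; females: r = -0.38, n = 51) gives a z-difference score of -0.80, with a one-tailed p-value of .21 and a two-tailed p-value of .42. Because we had no prior hypothesis about which gender should show the stronger correlation, the two-tailed p-value is the appropriate one to report. Either way, the difference is not statistically significant, so we have no evidence that the correlation between Anxiety and Exam scores differs between females and males.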
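Written out, the test converts each correlation to Fisher's z with atanh() and divides the difference by its standard error; Φ below is the standard normal CDF (pnorm() in R). Plugging in the values above (r1 = -0.51, n1 = 52 for males; r2 = -0.38, n2 = 51 for females) reproduces the z-difference of -0.80. Intermediate values are rounded to four decimals.

```latex
z_{\text{diff}}
  = \frac{\operatorname{atanh}(r_1) - \operatorname{atanh}(r_2)}
         {\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}}
  = \frac{-0.5627 - (-0.4001)}{\sqrt{\dfrac{1}{49} + \dfrac{1}{48}}}
  \approx \frac{-0.1627}{0.2031}
  \approx -0.80

p_{\text{one-tailed}} = 1 - \Phi\left(\lvert z_{\text{diff}} \rvert\right) \approx .21,
\qquad
p_{\text{two-tailed}} = 2\left[1 - \Phi\left(\lvert z_{\text{diff}} \rvert\right)\right] \approx .42
```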

kable(data %>% count(Gender))
| Gender |  n |
|:-------|---:|
| Female | 51 |
| Male   | 52 |
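# Compare two independent correlations (Field, Miles, and Field, 2012):
# convert each r to Fisher's z with atanh(), then divide the difference
# by its standard error.
zdifference <- function(r1, r2, n1, n2) {
  zd <- (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
  p <- 1 - pnorm(abs(zd))  # one-tailed p-value
  print(paste("Z Difference: ", zd))
  print(paste("One-Tailed P-Value: ", p))
  print(paste("Two-Tailed P-Value: ", (2 * p)))
}

zdifference(-0.51, -0.38, 52, 51)  # males (r1, n1) vs. females (r2, n2)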
## [1] "Z Difference:  -0.801014866962209"
## [1] "One-Tailed P-Value:  0.211561519563112"
## [1] "Two-Tailed P-Value:  0.423123039126223"
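The zdifference() function only prints its results. If you want to reuse the numbers (for example, to place them in a table with kable()), a variant that returns them can be sketched as follows. The name compare_correlations() and its list fields are hypothetical; the arithmetic is the same Fisher r-to-z test for two independent correlations, with a basic input check added.

```r
# Hypothetical variant of zdifference() that returns its results instead of
# only printing them. Same Fisher r-to-z test for two independent correlations.
compare_correlations <- function(r1, r2, n1, n2) {
  stopifnot(abs(r1) < 1, abs(r2) < 1, n1 > 3, n2 > 3)  # atanh() and SE need these
  se    <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))           # SE of the difference in Fisher z
  z     <- (atanh(r1) - atanh(r2)) / se                # z-difference score
  p_one <- pnorm(abs(z), lower.tail = FALSE)           # one-tailed p-value
  list(z = z, p_one_tailed = p_one, p_two_tailed = 2 * p_one)
}

# Males (r = -0.51, n = 52) vs. females (r = -0.38, n = 51), Anxiety vs. Exam
res <- compare_correlations(-0.51, -0.38, 52, 51)
round(unlist(res), 3)
```

Using pnorm(..., lower.tail = FALSE) instead of 1 - pnorm(...) gives the same value here; it simply avoids losing precision for very large z.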

References

Field, Andy, Jeremy Miles, and Zoe Field. 2012. Discovering Statistics Using R. Sage.

Kassambara, Alboukadel. 2020. rstatix: Pipe-Friendly Framework for Basic Statistical Tests. https://CRAN.R-project.org/package=rstatix.

Wickham, Hadley. 2021. tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.

Zhu, Hao. 2021. kableExtra: Construct Complex Table with Kable and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.


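  1. Pearson correlation values can be converted to Fisher z values with the inverse hyperbolic tangent function, atanh() (Fisher's r-to-z transformation). This conversion is needed before running further tests on correlations because r is bounded by -1 and +1, so its sampling distribution becomes increasingly skewed as the population correlation moves away from zero; Fisher's z, by contrast, is approximately normally distributed with a standard error of 1/sqrt(n - 3). Conversely, the hyperbolic tangent function, tanh(), converts z values back to correlations.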
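As a quick illustration of this footnote, the sketch below shows that atanh() and tanh() undo each other, and that the approximate standard error of 1/sqrt(n - 3) on the z scale also yields a confidence interval for r. It uses the male Anxiety and Exam correlation from the table above (r = -0.51, n = 52). Because that r is rounded to two decimals, the resulting interval only approximately matches the conf.low and conf.high reported by cor_test().

```r
r <- -0.51   # male Anxiety-Exam correlation from the table above (rounded)
n <- 52      # number of males

z <- atanh(r)   # Fisher r-to-z transformation
tanh(z)         # back-transforms to r = -0.51

# Approximate 95% CI for r: build it on the z scale, then convert back with tanh()
se   <- 1 / sqrt(n - 3)
z_ci <- z + c(-1, 1) * qnorm(0.975) * se
r_ci <- tanh(z_ci)
r_ci
```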
