Correlation in R Pt. 2 - Comparing Correlation Coefficients

In this second part, we continue to examine the Exam Anxiety dataset, looking at the correlations between the variables of interest separately for males and females. We then test whether those two correlations differ.

Plot by Gender

We can plot the data by Gender to see whether the relationship between pre-test anxiety and exam scores differs between genders. This is accomplished by adding the color = Gender argument to the aes() portion of the ggplot() call.

colors <- c("#440154FF", "#1565c0")
ggplot(data, aes(x = Anxiety, y = Exam, color = Gender)) +
  geom_point(alpha = 0.7) +
  scale_color_manual(values = colors) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal() +
  theme(axis.line = element_line(color = "grey70"))

Correlations by Gender

The group_by() function works with the rstatix cor_test() function to produce separate correlations, one set for males and one for females. We simply pipe data to group_by() and then pipe that output to cor_test().

correlations <- data %>% 
  group_by(Gender) %>%  
  cor_test(.,
           vars = c("Anxiety"),
           vars2 = c("Exam", "Revise"),
           method = "pearson",
           use = "pairwise.complete.obs")
kable(correlations)

Gender	var1	var2	cor	statistic	p	conf.low	conf.high	method
Female	Anxiety	Exam	-0.38	-2.887975	5.76e-03	-0.5944857	-0.1182269	Pearson
Male	Anxiety	Exam	-0.51	-4.144756	1.31e-04	-0.6841707	-0.2700554	Pearson
Female	Anxiety	Revise	-0.82	-10.079994	0.00e+00	-0.8944820	-0.7054746	Pearson
Male	Anxiety	Revise	-0.60	-5.267088	2.90e-06	-0.7482821	-0.3876660	Pearson
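The same split-then-correlate pattern can be sketched in base R as a cross-check. The data frame toy below is simulated purely for illustration (it is not the Exam Anxiety data), and the names toy, by_gender, and summary_tbl are hypothetical.

```r
# Simulated stand-in for the real data (values are NOT from the Exam Anxiety dataset)
set.seed(1)
toy <- data.frame(
  Gender  = rep(c("Female", "Male"), each = 30),
  Anxiety = rnorm(60, mean = 75, sd = 15)
)
toy$Exam <- 90 - 0.4 * toy$Anxiety + rnorm(60, sd = 20)

# Split the rows by Gender, then run a Pearson correlation test within each group
by_gender <- split(toy, toy$Gender)
results <- lapply(by_gender, function(d) {
  cor.test(d$Anxiety, d$Exam, method = "pearson")
})

# Collect the correlation and p-value for each group into one table
summary_tbl <- data.frame(
  Gender = names(results),
  cor    = sapply(results, function(x) unname(x$estimate)),
  p      = sapply(results, function(x) x$p.value),
  row.names = NULL
)
print(summary_tbl)
```

Replacing toy with the real data frame should give the same per-gender correlations as the grouped cor_test() call, since cor_test() is built on cor.test().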

Compare correlation values between genders

Field, Miles, and Field provide the code for a function to compare two correlation values¹. The function takes the correlation and sample size of each group as input. Entering our values into the function gives a z-difference score of -.80 with a one-tailed p-value of .21 (two-tailed p = .42), which suggests that the correlation between Anxiety and Exam scores does not differ significantly between females and males.
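Written out, the statistic this function computes is the difference between the two Fisher-transformed correlations divided by the standard error of that difference. The symbols follow the function's arguments (r1, r2, n1, n2); the notation is ours and is only meant as a reading aid.

```latex
z_{\text{diff}} =
  \frac{\operatorname{atanh}(r_1) - \operatorname{atanh}(r_2)}
       {\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}},
\qquad
\operatorname{atanh}(r) = \tfrac{1}{2}\ln\!\left(\frac{1 + r}{1 - r}\right)
```

With r1 = -0.51, n1 = 52 for males and r2 = -0.38, n2 = 51 for females, the numerator is about -0.163 and the denominator about 0.203, giving a z-difference of roughly -0.80.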

kable(data %>% count(Gender))

Gender	n
Female	51
Male	52

zdifference <- function(r1, r2, n1, n2) {
  # Fisher z-transform each r, then divide the difference by its standard error
  zd <- (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
  p <- 1 - pnorm(abs(zd))
  print(paste("Z Difference: ", zd))
  print(paste("One-Tailed P-Value: ", p))
  print(paste("Two-Tailed P-Value: ", (2 * p)))
}
	
zdifference(-0.51, -0.38, 52, 51)

## [1] "Z Difference:  -0.801014866962209"
## [1] "One-Tailed P-Value:  0.211561519563112"
## [1] "Two-Tailed P-Value:  0.423123039126223"
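If the results are needed later in a script, it can be handier to return them than to print them. The following is a minimal sketch of such a variant; the name zdifference_values and the structure of the returned list are our own assumptions, not part of the book's code.

```r
# Hypothetical variant of zdifference() that returns its results in a list
zdifference_values <- function(r1, r2, n1, n2) {
  # Standard error of the difference between two Fisher-transformed correlations
  se <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
  zd <- (atanh(r1) - atanh(r2)) / se
  # Upper-tail probability of |zd| under the standard normal distribution
  p_one <- pnorm(abs(zd), lower.tail = FALSE)
  list(z = zd, p_one_tailed = p_one, p_two_tailed = 2 * p_one)
}

res <- zdifference_values(-0.51, -0.38, 52, 51)
print(round(unlist(res), 3))
```

The rounded values should agree with the printed output above (z of about -0.801, one-tailed p of about 0.212, two-tailed p of about 0.423), and individual elements such as res$p_two_tailed can be reused in later code.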

References

Kassambara, Alboukadel. 2020. Rstatix: Pipe-Friendly Framework for Basic Statistical Tests. https://CRAN.R-project.org/package=rstatix.

Wickham, Hadley. 2021. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.

Zhu, Hao. 2021. KableExtra: Construct Complex Table with Kable and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.

Field, Andy, Jeremy Miles, and Zoe Field. 2012. Discovering Statistics Using R. Sage.

¹ Pearson correlation values can be converted to z-scores with the inverse hyperbolic tangent function, atanh() (Fisher's z-transformation). Converting correlations to z-scores before conducting other statistics is necessary because correlation values are bounded by -1 and +1, so their sampling distribution is skewed rather than normal. Conversely, the hyperbolic tangent function, tanh(), can be used to convert z-scores back to correlation values.
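As a rough illustration of this round trip, the sketch below transforms the rounded female Anxiety-Exam correlation with atanh(), converts it back with tanh(), and builds an approximate 95% confidence interval on the z scale. The variable names are ours, and using the rounded r = -0.38 with n = 51 is an assumption made for brevity.

```r
r <- -0.38  # rounded female Anxiety-Exam correlation from the table
n <- 51     # number of females

z <- atanh(r)        # Fisher z-transformation
r_back <- tanh(z)    # back-transformation recovers the original r
print(all.equal(r, r_back))

# 95% confidence interval: build it on the z scale, then convert back to r
se <- 1 / sqrt(n - 3)
ci_z <- z + c(-1, 1) * qnorm(0.975) * se
ci_r <- tanh(ci_z)
print(round(ci_r, 2))
```

The interval comes out at roughly -0.59 to -0.12, close to the conf.low and conf.high values reported by cor_test(); small discrepancies are expected because r is rounded to two decimals here.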

Last updated on Oct 8, 2023