Correlation in R Pt. 2 - Comparing Correlation Coefficients
In this second part, we continue to examine the Exam Anxiety dataset to look at the separate correlations between the variables of interest in males and females. We then proceed to test if there is a difference between those two correlations.
Plot by Gender
We can plot the data by Gender to see whether the relationship between pre-test anxiety and exam scores differs between genders. This is accomplished by adding the color = Gender argument to the aes() portion of the ggplot() call.
colors <- c("#440154FF", "#1565c0")
ggplot(data, aes(x = Anxiety, y = Exam, color = Gender)) +
  geom_point(alpha = 0.7) +
  scale_color_manual(values = colors) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal() +
  theme(axis.line = element_line(color = "grey70"))
Correlations by Gender
The group_by() function is compatible with the rstatix cor_test() function, so we can produce separate correlations for males and for females. We simply pipe data to group_by() and then pipe that output to cor_test().
correlations <- data %>%
  group_by(Gender) %>%
  cor_test(vars = c("Anxiety"),
           vars2 = c("Exam", "Revise"),
           method = "pearson",
           use = "pairwise.complete.obs")
kable(correlations)
Gender | var1 | var2 | cor | statistic | p | conf.low | conf.high | method |
---|---|---|---|---|---|---|---|---|
Female | Anxiety | Exam | -0.38 | -2.887975 | 5.76e-03 | -0.5944857 | -0.1182269 | Pearson |
Male | Anxiety | Exam | -0.51 | -4.144756 | 1.31e-04 | -0.6841707 | -0.2700554 | Pearson |
Female | Anxiety | Revise | -0.82 | -10.079994 | 0.00e+00 | -0.8944820 | -0.7054746 | Pearson |
Male | Anxiety | Revise | -0.60 | -5.267088 | 2.90e-06 | -0.7482821 | -0.3876660 | Pearson |
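To see what the grouped call is doing, the same idea can be sketched in base R: split the rows by group, then run a Pearson test within each subset. This is only an illustration under stated assumptions. The data frame toy below is simulated and is not the Exam Anxiety dataset, so its numbers will not match the table above, and the names toy, by_gender, and toy_correlations are hypothetical.

```r
# Toy data only: simulated values standing in for the Exam Anxiety dataset.
set.seed(42)
n <- 50
toy <- data.frame(
  Gender  = rep(c("Female", "Male"), each = n),
  Anxiety = rnorm(2 * n, mean = 75, sd = 10)
)
# Exam scores are constructed to fall as Anxiety rises, plus noise.
toy$Exam <- 100 - 0.5 * toy$Anxiety + rnorm(2 * n, sd = 10)

# Split the rows by Gender and run a Pearson test within each subset.
by_gender <- lapply(split(toy, toy$Gender), function(d) {
  ct <- cor.test(d$Anxiety, d$Exam, method = "pearson")
  data.frame(
    cor       = unname(ct$estimate),
    statistic = unname(ct$statistic),
    p         = ct$p.value,
    conf.low  = ct$conf.int[1],
    conf.high = ct$conf.int[2]
  )
})

# One row per group, analogous to the grouped cor_test() output.
toy_correlations <- do.call(rbind, by_gender)
print(toy_correlations)
```

The grouped rstatix call does this bookkeeping for us and also handles several variable pairs at once, which is why it is the more convenient choice here.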
Compare Correlation Values Between Genders
Field, Miles, and Field (2012) provide the code for a function to compare two correlation values.¹ This function takes the correlation and sample size for each group as input. Entering our values into the function yields a z-difference score of -0.80 with a one-tailed p-value of .21 (two-tailed p = .42), which suggests there is no statistically significant difference between males and females in the correlation between Anxiety and Exam scores.
kable(data %>% count(Gender))
Gender | n |
---|---|
Female | 51 |
Male | 52 |
zdifference <- function(r1, r2, n1, n2) {
  zd <- (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
  p <- 1 - pnorm(abs(zd))
  print(paste("Z Difference: ", zd))
  print(paste("One-Tailed P-Value: ", p))
  print(paste("Two-Tailed P-Value: ", (2 * p)))
}
zdifference(-0.51, -0.38, 52, 51)
## [1] "Z Difference: -0.801014866962209"
## [1] "One-Tailed P-Value: 0.211561519563112"
## [1] "Two-Tailed P-Value: 0.423123039126223"
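If the result is needed later, for example in a table, it may be more convenient to return the values instead of printing them. The following is a minimal sketch of that variation, not the authors' code: the function name compare_correlations and the names of the list elements are hypothetical, while the computation is the same as in zdifference().

```r
# Hypothetical helper: same computation as zdifference(), but returns a list.
compare_correlations <- function(r1, r2, n1, n2) {
  z1 <- atanh(r1)                          # Fisher z for group 1
  z2 <- atanh(r2)                          # Fisher z for group 2
  se <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of z1 - z2
  z  <- (z1 - z2) / se
  p_one <- pnorm(abs(z), lower.tail = FALSE)
  list(z = z, p_one_tailed = p_one, p_two_tailed = 2 * p_one)
}

# Males (r = -0.51, n = 52) versus females (r = -0.38, n = 51).
res <- compare_correlations(-0.51, -0.38, 52, 51)
round(unlist(res), 3)
# z = -0.801, p_one_tailed = 0.212, p_two_tailed = 0.423

# tanh() undoes atanh(), recovering the original correlation.
all.equal(tanh(atanh(-0.51)), -0.51)
```

Because no direction was predicted in advance for the difference, the two-tailed p-value is usually the one to report.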
References
Kassambara, Alboukadel. 2020. Rstatix: Pipe-Friendly Framework for Basic Statistical Tests. https://CRAN.R-project.org/package=rstatix.
Wickham, Hadley. 2021. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.
Zhu, Hao. 2021. KableExtra: Construct Complex Table with Kable and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.
Field, Andy, Jeremy Miles, and Zoe Field. 2012. Discovering Statistics Using R. Sage.
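1. Pearson correlation values can be converted to z-scores with the inverse hyperbolic tangent function, atanh() (Fisher's z-transformation). Converting correlations to z-scores before comparing them is necessary because correlation values are bounded by -1 and +1, so their sampling distribution is skewed, whereas the transformed values are approximately normally distributed with a standard error of 1/sqrt(n - 3). Conversely, the hyperbolic tangent function, tanh(), can be used to convert z-scores back to correlation values.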