vignettes/example-correlation.Rmd
The cor() function calculates the correlation between the variables in a data.frame.
A common question to ask of this correlation matrix is: which variables are strongly correlated, positively or negatively?
Hunting for the strong and weak correlations in the raw cor() output is difficult.
cor(mtcars)
#>             mpg        cyl       disp         hp        drat         wt
#> mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684  0.68117191 -0.8676594
#> cyl  -0.8521620  1.0000000  0.9020329  0.8324475 -0.69993811  0.7824958
#> disp -0.8475514  0.9020329  1.0000000  0.7909486 -0.71021393  0.8879799
#> hp   -0.7761684  0.8324475  0.7909486  1.0000000 -0.44875912  0.6587479
#> drat  0.6811719 -0.6999381 -0.7102139 -0.4487591  1.00000000 -0.7124406
#> wt   -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065  1.0000000
#> qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476 -0.1747159
#> vs    0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846 -0.5549157
#> am    0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113 -0.6924953
#> gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013 -0.5832870
#> carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980  0.4276059
#>             qsec         vs          am       gear        carb
#> mpg   0.41868403  0.6640389  0.59983243  0.4802848 -0.55092507
#> cyl  -0.59124207 -0.8108118 -0.52260705 -0.4926866  0.52698829
#> disp -0.43369788 -0.7104159 -0.59122704 -0.5555692  0.39497686
#> hp   -0.70822339 -0.7230967 -0.24320426 -0.1257043  0.74981247
#> drat  0.09120476  0.4402785  0.71271113  0.6996101 -0.09078980
#> wt   -0.17471588 -0.5549157 -0.69249526 -0.5832870  0.42760594
#> qsec  1.00000000  0.7445354 -0.22986086 -0.2126822 -0.65624923
#> vs    0.74453544  1.0000000  0.16834512  0.2060233 -0.56960714
#> am   -0.22986086  0.1683451  1.00000000  0.7940588  0.05753435
#> gear -0.21268223  0.2060233  0.79405876  1.0000000  0.27407284
#> carb -0.65624923 -0.5696071  0.05753435  0.2740728  1.00000000
{emphatic}
One way to highlight the correlations is to colour the strong negative correlations in red and the strong positive correlations in blue.
For this type of colouring, the ggplot2 colour scale scale_colour_gradient2() is a natural fit: it is a diverging scale, centred on zero by default.
Issue: everything is now coloured - including locations with low correlation.
cor(mtcars) %>%
hl_mat(scale_colour_gradient2())
             mpg         cyl        disp          hp        drat          wt        qsec          vs          am        gear        carb
mpg   1.00000000 -0.85216196 -0.84755138 -0.77616837  0.68117191 -0.86765938  0.41868403  0.66403892  0.59983243  0.48028476 -0.55092507
cyl  -0.85216196  1.00000000  0.90203287  0.83244745 -0.69993811  0.78249579 -0.59124207 -0.81081180 -0.52260705 -0.49268660  0.52698829
disp -0.84755138  0.90203287  1.00000000  0.79094859 -0.71021393  0.88797992 -0.43369788 -0.71041589 -0.59122704 -0.55556920  0.39497686
hp   -0.77616837  0.83244745  0.79094859  1.00000000 -0.44875912  0.65874789 -0.70822339 -0.72309674 -0.24320426 -0.12570426  0.74981247
drat  0.68117191 -0.69993811 -0.71021393 -0.44875912  1.00000000 -0.71244065  0.09120476  0.44027846  0.71271113  0.69961013 -0.09078980
wt   -0.86765938  0.78249579  0.88797992  0.65874789 -0.71244065  1.00000000 -0.17471588 -0.55491568 -0.69249526 -0.58328700  0.42760594
qsec  0.41868403 -0.59124207 -0.43369788 -0.70822339  0.09120476 -0.17471588  1.00000000  0.74453544 -0.22986086 -0.21268223 -0.65624923
vs    0.66403892 -0.81081180 -0.71041589 -0.72309674  0.44027846 -0.55491568  0.74453544  1.00000000  0.16834512  0.20602335 -0.56960714
am    0.59983243 -0.52260705 -0.59122704 -0.24320426  0.71271113 -0.69249526 -0.22986086  0.16834512  1.00000000  0.79405876  0.05753435
gear  0.48028476 -0.49268660 -0.55556920 -0.12570426  0.69961013 -0.58328700 -0.21268223  0.20602335  0.79405876  1.00000000  0.27407284
carb -0.55092507  0.52698829  0.39497686  0.74981247 -0.09078980  0.42760594 -0.65624923 -0.56960714  0.05753435  0.27407284  1.00000000
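To see why a diverging scale colours every cell, it may help to sketch one in base R. The ramp below is only an illustration and not ggplot2's implementation: scale_colour_gradient2() uses muted end colours and interpolates in Lab space by default, so its exact colours will differ. The helper cor_to_colour() is a hypothetical name introduced for this sketch.

```r
# Illustration only: a linear red-white-blue ramp from grDevices.
# This approximates the idea of a diverging scale, not ggplot2's exact colours.
ramp <- colorRamp(c("red", "white", "blue"))

# Hypothetical helper: rescale correlations from [-1, 1] to the ramp's [0, 1]
# input range, so that a correlation of 0 lands on the white midpoint.
cor_to_colour <- function(r) {
  rgb(ramp((r + 1) / 2), maxColorValue = 255)
}

cor_to_colour(c(-1, 0, 1))
#> [1] "#FF0000" "#FFFFFF" "#0000FF"
```

Only a correlation of exactly 0 maps to the white midpoint; any other value, however small, receives some tint, which is why weak correlations are still coloured.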
{emphatic} to only locations with high correlation
The selection of locations to colour is controlled with the selection argument to hl_mat().
Here, abs(.x) > 0.7 limits the colouring to locations where the magnitude of the correlation is above 0.7.
Issue: the diagonal of the matrix is still highlighted, but it is totally uninformative, since every variable has a correlation of 1 with itself.
cor(mtcars) %>%
hl_mat(scale_colour_gradient2(), selection = abs(.x) > 0.7)
             mpg         cyl        disp          hp        drat          wt        qsec          vs          am        gear        carb
mpg   1.00000000 -0.85216196 -0.84755138 -0.77616837  0.68117191 -0.86765938  0.41868403  0.66403892  0.59983243  0.48028476 -0.55092507
cyl  -0.85216196  1.00000000  0.90203287  0.83244745 -0.69993811  0.78249579 -0.59124207 -0.81081180 -0.52260705 -0.49268660  0.52698829
disp -0.84755138  0.90203287  1.00000000  0.79094859 -0.71021393  0.88797992 -0.43369788 -0.71041589 -0.59122704 -0.55556920  0.39497686
hp   -0.77616837  0.83244745  0.79094859  1.00000000 -0.44875912  0.65874789 -0.70822339 -0.72309674 -0.24320426 -0.12570426  0.74981247
drat  0.68117191 -0.69993811 -0.71021393 -0.44875912  1.00000000 -0.71244065  0.09120476  0.44027846  0.71271113  0.69961013 -0.09078980
wt   -0.86765938  0.78249579  0.88797992  0.65874789 -0.71244065  1.00000000 -0.17471588 -0.55491568 -0.69249526 -0.58328700  0.42760594
qsec  0.41868403 -0.59124207 -0.43369788 -0.70822339  0.09120476 -0.17471588  1.00000000  0.74453544 -0.22986086 -0.21268223 -0.65624923
vs    0.66403892 -0.81081180 -0.71041589 -0.72309674  0.44027846 -0.55491568  0.74453544  1.00000000  0.16834512  0.20602335 -0.56960714
am    0.59983243 -0.52260705 -0.59122704 -0.24320426  0.71271113 -0.69249526 -0.22986086  0.16834512  1.00000000  0.79405876  0.05753435
gear  0.48028476 -0.49268660 -0.55556920 -0.12570426  0.69961013 -0.58328700 -0.21268223  0.20602335  0.79405876  1.00000000  0.27407284
carb -0.55092507  0.52698829  0.39497686  0.74981247 -0.09078980  0.42760594 -0.65624923 -0.56960714  0.05753435  0.27407284  1.00000000
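The selection expression can be explored outside of hl_mat(). The sketch below uses base R only and assumes that .x stands for the matrix being highlighted, which the usage above suggests but which is not confirmed here; the names m and strong are introduced just for this illustration.

```r
# Base R only; `m` plays the role that `.x` appears to play inside hl_mat().
m <- cor(mtcars)

# Logical matrix: TRUE where the magnitude of the correlation exceeds 0.7
strong <- abs(m) > 0.7

dim(strong)
#> [1] 11 11

# Number of selected cells, diagonal included
sum(strong)
#> [1] 49

# Every diagonal cell is selected, because each variable correlates
# perfectly with itself
all(diag(strong))
#> [1] TRUE
```

Of the 49 selected cells, 11 are the diagonal, which is the problem described in the issue above.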
Since the magnitude of the correlations is now indicated by colour, we can de-emphasise the text by reducing its contrast with the fill colour.
Also, the diagonal is now excluded from the colouring, i.e. highlighting is further limited to locations where row(.x) != col(.x).
cor(mtcars) %>%
hl_mat(scale_colour_gradient2(), selection = abs(.x) > 0.7 & row(.x) != col(.x)) %>%
hl_opt(text_contrast = 0.2)
             mpg         cyl        disp          hp        drat          wt        qsec          vs          am        gear        carb
mpg   1.00000000 -0.85216196 -0.84755138 -0.77616837  0.68117191 -0.86765938  0.41868403  0.66403892  0.59983243  0.48028476 -0.55092507
cyl  -0.85216196  1.00000000  0.90203287  0.83244745 -0.69993811  0.78249579 -0.59124207 -0.81081180 -0.52260705 -0.49268660  0.52698829
disp -0.84755138  0.90203287  1.00000000  0.79094859 -0.71021393  0.88797992 -0.43369788 -0.71041589 -0.59122704 -0.55556920  0.39497686
hp   -0.77616837  0.83244745  0.79094859  1.00000000 -0.44875912  0.65874789 -0.70822339 -0.72309674 -0.24320426 -0.12570426  0.74981247
drat  0.68117191 -0.69993811 -0.71021393 -0.44875912  1.00000000 -0.71244065  0.09120476  0.44027846  0.71271113  0.69961013 -0.09078980
wt   -0.86765938  0.78249579  0.88797992  0.65874789 -0.71244065  1.00000000 -0.17471588 -0.55491568 -0.69249526 -0.58328700  0.42760594
qsec  0.41868403 -0.59124207 -0.43369788 -0.70822339  0.09120476 -0.17471588  1.00000000  0.74453544 -0.22986086 -0.21268223 -0.65624923
vs    0.66403892 -0.81081180 -0.71041589 -0.72309674  0.44027846 -0.55491568  0.74453544  1.00000000  0.16834512  0.20602335 -0.56960714
am    0.59983243 -0.52260705 -0.59122704 -0.24320426  0.71271113 -0.69249526 -0.22986086  0.16834512  1.00000000  0.79405876  0.05753435
gear  0.48028476 -0.49268660 -0.55556920 -0.12570426  0.69961013 -0.58328700 -0.21268223  0.20602335  0.79405876  1.00000000  0.27407284
carb -0.55092507  0.52698829  0.39497686  0.74981247 -0.09078980  0.42760594 -0.65624923 -0.56960714  0.05753435  0.27407284  1.00000000
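The combined selection can also be checked in base R, and the same mask can list the strongly correlated pairs as a table. This is a sketch under the assumption that .x refers to the matrix passed to hl_mat(); the names m, sel, idx and strong_pairs are hypothetical and are not part of {emphatic}.

```r
# Base R only; `m` stands in for `.x`.
m <- cor(mtcars)

# row(m) and col(m) give each cell's row and column index, so
# row(m) != col(m) is TRUE everywhere except on the diagonal.
sel <- abs(m) > 0.7 & row(m) != col(m)

any(diag(sel))
#> [1] FALSE
sum(sel)
#> [1] 38

# The matrix is symmetric, so each pair appears twice;
# keep only the upper triangle to list each pair once.
idx <- which(sel & upper.tri(m), arr.ind = TRUE)
strong_pairs <- data.frame(
  var1 = rownames(m)[idx[, "row"]],
  var2 = colnames(m)[idx[, "col"]],
  r    = m[idx]
)
nrow(strong_pairs)
#> [1] 19

# Strongest correlations first
strong_pairs[order(-abs(strong_pairs$r)), ]
```

The 38 selected cells correspond to 19 distinct variable pairs, which are the locations that remain coloured in the output above.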