The cor() function calculates the correlation between every pair of variables in a data.frame.
A common question to ask of this correlation matrix is: which variables are strongly correlated (either positively or negatively)?
Hunting for the strong and weak correlations in the raw cor() output is difficult.
cor(mtcars)
#> mpg cyl disp hp drat wt
#> mpg 1.0000000 -0.8521620 -0.8475514 -0.7761684 0.68117191 -0.8676594
#> cyl -0.8521620 1.0000000 0.9020329 0.8324475 -0.69993811 0.7824958
#> disp -0.8475514 0.9020329 1.0000000 0.7909486 -0.71021393 0.8879799
#> hp -0.7761684 0.8324475 0.7909486 1.0000000 -0.44875912 0.6587479
#> drat 0.6811719 -0.6999381 -0.7102139 -0.4487591 1.00000000 -0.7124406
#> wt -0.8676594 0.7824958 0.8879799 0.6587479 -0.71244065 1.0000000
#> qsec 0.4186840 -0.5912421 -0.4336979 -0.7082234 0.09120476 -0.1747159
#> vs 0.6640389 -0.8108118 -0.7104159 -0.7230967 0.44027846 -0.5549157
#> am 0.5998324 -0.5226070 -0.5912270 -0.2432043 0.71271113 -0.6924953
#> gear 0.4802848 -0.4926866 -0.5555692 -0.1257043 0.69961013 -0.5832870
#> carb -0.5509251 0.5269883 0.3949769 0.7498125 -0.09078980 0.4276059
#> qsec vs am gear carb
#> mpg 0.41868403 0.6640389 0.59983243 0.4802848 -0.55092507
#> cyl -0.59124207 -0.8108118 -0.52260705 -0.4926866 0.52698829
#> disp -0.43369788 -0.7104159 -0.59122704 -0.5555692 0.39497686
#> hp -0.70822339 -0.7230967 -0.24320426 -0.1257043 0.74981247
#> drat 0.09120476 0.4402785 0.71271113 0.6996101 -0.09078980
#> wt -0.17471588 -0.5549157 -0.69249526 -0.5832870 0.42760594
#> qsec 1.00000000 0.7445354 -0.22986086 -0.2126822 -0.65624923
#> vs 0.74453544 1.0000000 0.16834512 0.2060233 -0.56960714
#> am -0.22986086 0.1683451 1.00000000 0.7940588 0.05753435
#> gear -0.21268223 0.2060233 0.79405876 1.0000000 0.27407284
#> carb -0.65624923 -0.5696071 0.05753435 0.2740728 1.00000000
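As a point of comparison, the strong correlations can be pulled out of the raw matrix programmatically. The following is a minimal base-R sketch: strong_pairs() is a hypothetical helper (not part of {emphatic} or base R), and the 0.7 cut-off is an assumption chosen to match the threshold used with hl_mat() later on. Keeping only the upper triangle reports each pair once and also drops the diagonal.

```r
# Hypothetical helper: list variable pairs whose absolute correlation
# exceeds `threshold` (0.7 is an assumed cut-off, not a standard).
strong_pairs <- function(cm, threshold = 0.7) {
  # upper.tri() keeps each pair once and excludes the diagonal
  idx <- which(abs(cm) > threshold & upper.tri(cm), arr.ind = TRUE)
  pairs <- data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r    = cm[idx]  # two-column index matrix selects the (row, col) cells
  )
  # Strongest correlations (by magnitude) first
  pairs[order(-abs(pairs$r)), , drop = FALSE]
}

cm <- cor(mtcars)
strong_pairs(cm)
```

A table like this answers "which pairs are strong?" but loses the overall layout of the matrix, which is what colour highlighting preserves.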
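One way to highlight the correlations with {emphatic} is to colour the strong negative correlations red and the strong positive correlations blue.
For this type of colouring, the ggplot2 colour scale scale_colour_gradient2() is a perfect fit: it is a diverging scale, so values below its midpoint (0 by default) shade towards one colour, values above it shade towards another, and values near 0 stay close to white.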
Issue: with this approach every cell is coloured, including locations with low correlation.
library(emphatic)  # hl_mat(), hl_opt()
library(ggplot2)   # scale_colour_gradient2()
cor(mtcars) %>%
  hl_mat(scale_colour_gradient2())
             mpg         cyl        disp          hp        drat          wt
mpg   1.00000000 -0.85216196 -0.84755138 -0.77616837  0.68117191 -0.86765938
cyl  -0.85216196  1.00000000  0.90203287  0.83244745 -0.69993811  0.78249579
disp -0.84755138  0.90203287  1.00000000  0.79094859 -0.71021393  0.88797992
hp   -0.77616837  0.83244745  0.79094859  1.00000000 -0.44875912  0.65874789
drat  0.68117191 -0.69993811 -0.71021393 -0.44875912  1.00000000 -0.71244065
wt   -0.86765938  0.78249579  0.88797992  0.65874789 -0.71244065  1.00000000
qsec  0.41868403 -0.59124207 -0.43369788 -0.70822339  0.09120476 -0.17471588
vs    0.66403892 -0.81081180 -0.71041589 -0.72309674  0.44027846 -0.55491568
am    0.59983243 -0.52260705 -0.59122704 -0.24320426  0.71271113 -0.69249526
gear  0.48028476 -0.49268660 -0.55556920 -0.12570426  0.69961013 -0.58328700
carb -0.55092507  0.52698829  0.39497686  0.74981247 -0.09078980  0.42760594
            qsec          vs          am        gear        carb
mpg   0.41868403  0.66403892  0.59983243  0.48028476 -0.55092507
cyl  -0.59124207 -0.81081180 -0.52260705 -0.49268660  0.52698829
disp -0.43369788 -0.71041589 -0.59122704 -0.55556920  0.39497686
hp   -0.70822339 -0.72309674 -0.24320426 -0.12570426  0.74981247
drat  0.09120476  0.44027846  0.71271113  0.69961013 -0.09078980
wt   -0.17471588 -0.55491568 -0.69249526 -0.58328700  0.42760594
qsec  1.00000000  0.74453544 -0.22986086 -0.21268223 -0.65624923
vs    0.74453544  1.00000000  0.16834512  0.20602335 -0.56960714
am   -0.22986086  0.16834512  1.00000000  0.79405876  0.05753435
gear -0.21268223  0.20602335  0.79405876  1.00000000  0.27407284
carb -0.65624923 -0.56960714  0.05753435  0.27407284  1.00000000
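To build intuition for why a diverging scale suits correlations, here is a rough base-R sketch of the idea behind scale_colour_gradient2(): each value in [-1, 1] is mapped onto a red to white to blue ramp centred on 0. This is an approximation for illustration only, not how ggplot2 or {emphatic} actually compute colours. ggplot2's defaults are muted red and muted blue with a white midpoint at 0, whereas the hypothetical helper diverging_colour() uses plain "red", "white" and "blue" and assumes its input lies in [-1, 1].

```r
# Hypothetical helper: map correlations in [-1, 1] onto a diverging palette.
# -1 -> red, 0 -> white, +1 -> blue (plain colours, not ggplot2's muted defaults)
diverging_colour <- function(r, n = 201) {
  pal <- grDevices::colorRampPalette(c("red", "white", "blue"))(n)
  # Rescale r from [-1, 1] onto palette positions 1..n
  idx <- round((r + 1) / 2 * (n - 1)) + 1
  out <- pal[as.vector(idx)]
  # Restore the matrix shape (and row/column names) of the input, if any
  dim(out) <- dim(r)
  dimnames(out) <- dimnames(r)
  out
}

cell_colours <- diverging_colour(cor(mtcars))
cell_colours["cyl", "disp"]  # strong positive correlation: near the blue end
cell_colours["mpg", "wt"]    # strong negative correlation: near the red end
```

Because the midpoint sits at 0, weak correlations of either sign land near white, so only the strong ones stand out.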
To keep the strong correlations from being lost among the weak ones, restrict the colouring to only locations with high correlation.
The selection of locations to colour is controlled with the selection argument to hl_mat(); here, only cells with an absolute correlation greater than 0.7 are coloured.
Issue: the diagonal of the matrix is still highlighted, but it is totally uninformative, since every variable has a correlation of 1 with itself.
cor(mtcars) %>%
hl_mat(scale_colour_gradient2(), selection = abs(.x) > 0.7)
             mpg         cyl        disp          hp        drat          wt
mpg   1.00000000 -0.85216196 -0.84755138 -0.77616837  0.68117191 -0.86765938
cyl  -0.85216196  1.00000000  0.90203287  0.83244745 -0.69993811  0.78249579
disp -0.84755138  0.90203287  1.00000000  0.79094859 -0.71021393  0.88797992
hp   -0.77616837  0.83244745  0.79094859  1.00000000 -0.44875912  0.65874789
drat  0.68117191 -0.69993811 -0.71021393 -0.44875912  1.00000000 -0.71244065
wt   -0.86765938  0.78249579  0.88797992  0.65874789 -0.71244065  1.00000000
qsec  0.41868403 -0.59124207 -0.43369788 -0.70822339  0.09120476 -0.17471588
vs    0.66403892 -0.81081180 -0.71041589 -0.72309674  0.44027846 -0.55491568
am    0.59983243 -0.52260705 -0.59122704 -0.24320426  0.71271113 -0.69249526
gear  0.48028476 -0.49268660 -0.55556920 -0.12570426  0.69961013 -0.58328700
carb -0.55092507  0.52698829  0.39497686  0.74981247 -0.09078980  0.42760594
            qsec          vs          am        gear        carb
mpg   0.41868403  0.66403892  0.59983243  0.48028476 -0.55092507
cyl  -0.59124207 -0.81081180 -0.52260705 -0.49268660  0.52698829
disp -0.43369788 -0.71041589 -0.59122704 -0.55556920  0.39497686
hp   -0.70822339 -0.72309674 -0.24320426 -0.12570426  0.74981247
drat  0.09120476  0.44027846  0.71271113  0.69961013 -0.09078980
wt   -0.17471588 -0.55491568 -0.69249526 -0.58328700  0.42760594
qsec  1.00000000  0.74453544 -0.22986086 -0.21268223 -0.65624923
vs    0.74453544  1.00000000  0.16834512  0.20602335 -0.56960714
am   -0.22986086  0.16834512  1.00000000  0.79405876  0.05753435
gear -0.21268223  0.20602335  0.79405876  1.00000000  0.27407284
carb -0.65624923 -0.56960714  0.05753435  0.27407284  1.00000000
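Assuming hl_mat() evaluates the selection expression with .x bound to the matrix being highlighted (which is what the example suggests), the cells it picks out can be inspected directly in base R. In this minimal sketch, select_strong() is a hypothetical wrapper used only for illustration; it returns the same logical matrix that abs(.x) > 0.7 would produce.

```r
# Hypothetical helper: the selection rule from the example,
# written as an ordinary R function of the matrix .x
select_strong <- function(.x) abs(.x) > 0.7

mask <- select_strong(cor(mtcars))
sum(mask)        # 49 cells: 38 off-diagonal cells plus the 11 diagonal cells
all(diag(mask))  # TRUE: every diagonal cell (r = 1) passes the threshold
```

The second check shows why the diagonal is still highlighted: a variable's correlation with itself always clears any threshold below 1.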
Since the strength of each correlation is now indicated by colour, we can de-emphasise the text by reducing its contrast with the fill colour, using hl_opt(text_contrast = 0.2).
The diagonal is also excluded from the colouring: highlighting is further limited to cells where row(.x) != col(.x).
cor(mtcars) %>%
hl_mat(scale_colour_gradient2(), selection = abs(.x) > 0.7 & row(.x) != col(.x)) %>%
hl_opt(text_contrast = 0.2)
             mpg         cyl        disp          hp        drat          wt
mpg   1.00000000 -0.85216196 -0.84755138 -0.77616837  0.68117191 -0.86765938
cyl  -0.85216196  1.00000000  0.90203287  0.83244745 -0.69993811  0.78249579
disp -0.84755138  0.90203287  1.00000000  0.79094859 -0.71021393  0.88797992
hp   -0.77616837  0.83244745  0.79094859  1.00000000 -0.44875912  0.65874789
drat  0.68117191 -0.69993811 -0.71021393 -0.44875912  1.00000000 -0.71244065
wt   -0.86765938  0.78249579  0.88797992  0.65874789 -0.71244065  1.00000000
qsec  0.41868403 -0.59124207 -0.43369788 -0.70822339  0.09120476 -0.17471588
vs    0.66403892 -0.81081180 -0.71041589 -0.72309674  0.44027846 -0.55491568
am    0.59983243 -0.52260705 -0.59122704 -0.24320426  0.71271113 -0.69249526
gear  0.48028476 -0.49268660 -0.55556920 -0.12570426  0.69961013 -0.58328700
carb -0.55092507  0.52698829  0.39497686  0.74981247 -0.09078980  0.42760594
            qsec          vs          am        gear        carb
mpg   0.41868403  0.66403892  0.59983243  0.48028476 -0.55092507
cyl  -0.59124207 -0.81081180 -0.52260705 -0.49268660  0.52698829
disp -0.43369788 -0.71041589 -0.59122704 -0.55556920  0.39497686
hp   -0.70822339 -0.72309674 -0.24320426 -0.12570426  0.74981247
drat  0.09120476  0.44027846  0.71271113  0.69961013 -0.09078980
wt   -0.17471588 -0.55491568 -0.69249526 -0.58328700  0.42760594
qsec  1.00000000  0.74453544 -0.22986086 -0.21268223 -0.65624923
vs    0.74453544  1.00000000  0.16834512  0.20602335 -0.56960714
am   -0.22986086  0.16834512  1.00000000  0.79405876  0.05753435
gear -0.21268223  0.20602335  0.79405876  1.00000000  0.27407284
carb -0.65624923 -0.56960714  0.05753435  0.27407284  1.00000000
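The effect of adding row(.x) != col(.x) can be checked the same way in base R. row() and col() return the row and column index of every cell, so the comparison is FALSE only on the diagonal. In this minimal sketch, select_strong_offdiag() is a hypothetical helper that mirrors the final selection expression, assuming hl_mat() binds .x to the matrix; rowSums() then counts how many strong partners each variable has.

```r
# Hypothetical helper: strong correlations only, with the diagonal excluded
select_strong_offdiag <- function(.x) abs(.x) > 0.7 & row(.x) != col(.x)

mask <- select_strong_offdiag(cor(mtcars))
sum(mask)        # 38: each strong pair appears twice (above and below the diagonal)
any(diag(mask))  # FALSE: no diagonal cell is selected any more
# Number of strongly correlated partners for each variable
rowSums(mask)
```

Variables such as disp and hp have the most strong partners, while gear and carb each have only one, which is the same pattern the coloured matrix makes visible at a glance.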