1.7 Exploring Multivariate Hypotheses with Visualizations
Now let’s get crazy. Maybe we could make a better prediction about body mass if we knew both the flipper length and whether the penguin was a gentoo penguin!
This is called a multivariate hypothesis because it doesn’t have just one predictor variable; it has two! (A multivariate model has more than one predictor variable.)
We can explore multivariate hypotheses with data visualizations in a few ways. One way is to start with a basic scatter plot (such as the one below) and use color to represent the other predictor variable by adding the argument color = ~gentoo. (We did this earlier when we added the variable female to the plot.)
Try adding a color argument in the code block below to color gentoo penguins differently from non-gentoo in the scatter plot of body mass by flipper length.
require(coursekata)
# starting point: body mass by flipper length, no color yet
gf_point(body_mass_kg ~ flipper_length_m, data = penguins)
# solution: color the points according to the gentoo variable
gf_point(body_mass_kg ~ flipper_length_m, data = penguins, color = ~gentoo)
Is It Possible to Have More Than Two Predictor Variables?
Exploring variation with graphs is like a detective game. Patterns you notice when graphing data will often lead to new hypotheses and new word equations. And yes, you can have many predictor variables. Let’s look at an example.
When we looked at the plot above, where gentoo penguins are a different color from the others, it reminded us of a puzzle we encountered earlier when we used the color argument to represent female. Here are the two graphs side by side.
[Figure: side-by-side scatter plots of body mass by flipper length. Left panel: colored by female. Right panel: colored by gentoo.]
Earlier we were puzzled by the fact that the female vs. male difference appeared to be repeated in two clumps of dots. Now we can see that the two clumps were defined by species: gentoo vs. others. It now looks as though, in addition to flipper length, both female and gentoo explain variation in body mass.
Size, Shape, and Facets
In addition to arguments like color, you might also want to explore arguments like size and shape with gf_point(). You can do almost anything you want when graphing in R; the sky’s the limit.
In the following line of code we added size = 3 to make the dots larger. Try experimenting with different sizes.
gf_point(body_mass_kg ~ flipper_length_m, data = penguins,
color = ~female, size = 3)
If you want to represent gentoo as well as female and flipper_length_m in the same plot, you could add the argument shape = ~gentoo to the line of code above.
gf_point(body_mass_kg ~ flipper_length_m, data = penguins,
color = ~female, size = 3, shape = ~gentoo)
In addition to the males being teal and females being purple, the gentoo penguins are represented by triangles and the non-gentoo by circles. The possibilities, really, are endless.
Just for fun, we will teach you one more way to look at a multivariate hypothesis. We can make separate facets (or panels) of scatter plots, one for each category of a categorical variable such as gentoo, by piping (%>%) the plot into a new function, gf_facet_wrap().
gf_point(body_mass_kg ~ flipper_length_m, data = penguins,
color = ~female, shape = ~gentoo) %>%
gf_facet_wrap(~ gentoo)
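gf_facet_wrap() makes one panel per category of a single variable. If you want one panel for every combination of two categorical variables, ggformula also provides gf_facet_grid(), which takes a formula of the form rows ~ columns. The sketch below assumes, as the examples above do, that penguins contains the female and gentoo variables; the object name p is just for illustration.

```r
require(coursekata)

# Assumes penguins has the female and gentoo variables used above.
# gf_facet_grid(rows ~ columns): here female defines the rows and
# gentoo defines the columns, giving one panel per combination.
p <- gf_point(body_mass_kg ~ flipper_length_m, data = penguins,
              color = ~female) %>%
  gf_facet_grid(female ~ gentoo)

p
```

Coloring by female as well as faceting by it is redundant, but it can make the rows easier to tell apart at a glance.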
Try playing around with gf_facet_wrap() in the code block below. Run it with the categorical variable island. Then run it with the quantitative variable bill_length_cm. Which faceted visualization do you think is most helpful?
require(coursekata)
# try faceting by island
# then try faceting by bill_length_cm
gf_point(body_mass_kg ~ flipper_length_m, data = penguins) %>%
  gf_facet_wrap(~ gentoo)
# one possible solution: facet by island
gf_point(body_mass_kg ~ flipper_length_m, data = penguins) %>%
  gf_facet_wrap(~ island)
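When you facet by bill_length_cm, gf_facet_wrap() makes one panel for every distinct value of the variable, so you will likely get a large number of tiny panels that are hard to compare. One possible workaround, sketched below, is to first cut the quantitative variable into a few groups with base R's cut() and then facet by those groups. The new variable name bill_bins and the choice of 3 equal-width bins are illustrative assumptions, not part of the exercise.

```r
require(coursekata)

# How many panels would faceting directly by bill_length_cm produce?
length(unique(penguins$bill_length_cm))

# Sketch: cut bill length into 3 equal-width bins.
# bill_bins is a hypothetical new variable added to a local copy of penguins.
penguins$bill_bins <- cut(penguins$bill_length_cm, breaks = 3)

# Facet by the binned version instead: at most 3 panels to compare.
gf_point(body_mass_kg ~ flipper_length_m, data = penguins) %>%
  gf_facet_wrap(~ bill_bins)
```

Binning throws away some detail within each group, so the number of bins is a judgment call; try a few values of breaks and see which plot is easiest to read.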