1.7 Exploring Multivariate Hypotheses with Visualizations
Now let’s get crazy. Maybe we could make a better prediction about body mass if we knew both the flipper length and whether the penguin was a gentoo penguin!
This is called a multivariate hypothesis because it doesn’t have just one predictor variable; it has two! (A multivariate model has more than one predictor variable.)
We can explore multivariate hypotheses with data visualizations in a few ways. One way is to start with a basic scatter plot (such as the one below) and add in color to represent the other predictor variable (by adding the argument color = ~gentoo). (We did this earlier when we added the variable female to the plot.)
Try adding a color argument in the code block below to color gentoo penguins differently from non-gentoo in the scatter plot of body mass by flipper length.
require(coursekata)
# add color according to the gentoo variable
gf_point(body_mass_kg ~ flipper_length_m, data = penguins)
# solution: add color according to the gentoo variable
gf_point(body_mass_kg ~ flipper_length_m, data = penguins, color = ~gentoo)
Is It Possible to Have More Than Two Predictor Variables?
Exploring variation with graphs is like a detective game. Patterns you notice when graphing data will often lead to new hypotheses and new word equations. And yes, you can have many predictor variables. Let’s look at an example.
When we looked at the plot above where we put gentoo penguins in a different color than the others, it reminded us of a puzzle we encountered earlier when we used the color argument to represent female. Here are the two graphs side by side.
[Figure: two scatter plots side by side. Left: colored by female. Right: colored by gentoo.]
Earlier we were puzzled by the fact that the female vs. male difference appeared to be repeated in two clumps of dots. Now we can see that the two clumps were defined by species, gentoo vs. others. It now looks like, in addition to flipper length, both female and gentoo explain variation in body mass.
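To put rough numbers on a pattern like this one, here is a minimal sketch using base R's aggregate() function. The toy data frame and its values are made up for illustration (they are not the real penguins measurements), and coding female and gentoo as TRUE/FALSE is an assumption here.

```r
# Hypothetical toy data: 8 made-up penguins, NOT the real penguins data frame
toy <- data.frame(
  body_mass_kg = c(3.4, 3.8, 4.6, 5.4, 3.5, 3.9, 4.8, 5.6),
  female       = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  gentoo       = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)
)

# Mean body mass for each combination of female and gentoo
group_means <- aggregate(body_mass_kg ~ female + gentoo, data = toy, FUN = mean)
print(group_means)
```

If both variables help explain body mass, the means should differ between females and males within each species group, and between gentoo and non-gentoo within each sex.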
Size, Shape, and Facets
Note that in addition to arguments like color, you might also want to try exploring arguments like size and shape with gf_point(). You can do almost anything you want to do when graphing in R; the sky’s the limit.
In the following line of code we added size = 3 to make the dots larger. You can try experimenting with different sizes.
gf_point(body_mass_kg ~ flipper_length_m, data = penguins,
color = ~female, size = 3)
If you want to represent gentoo as well as female and flipper_length_m in the same plot, you could add the argument shape = ~gentoo to the line of code above.
gf_point(body_mass_kg ~ flipper_length_m, data = penguins,
color = ~female, size = 3, shape = ~gentoo)
In addition to the males being teal and females being purple, the gentoo penguins are represented by triangles and the non-gentoo by circles. The possibilities, really, are endless.
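The code above mixes two kinds of arguments: color = ~female and shape = ~gentoo use a tilde to map a variable onto the plot, while size = 3 sets one fixed value for every dot. Here is a minimal sketch of the difference, assuming the ggformula package (which supplies gf_point()) is installed. The toy data frame and its values are invented for illustration.

```r
library(ggformula)

# Hypothetical toy data standing in for the penguins data frame
toy <- data.frame(
  flipper_length_m = c(0.18, 0.19, 0.20, 0.21, 0.22, 0.23),
  body_mass_kg     = c(3.4, 3.8, 3.9, 4.6, 5.0, 5.6),
  gentoo           = c("no", "no", "no", "yes", "yes", "yes")
)

# With a tilde: each dot's color depends on that row's value of gentoo
mapped <- gf_point(body_mass_kg ~ flipper_length_m, data = toy,
                   color = ~gentoo)

# Without a tilde: every dot gets the same fixed color and size
fixed <- gf_point(body_mass_kg ~ flipper_length_m, data = toy,
                  color = "purple", size = 3)

print(mapped)
print(fixed)
```

The first plot should come with a legend for gentoo; the second should not, because no variable is being represented by the color.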
Just for fun, we will teach you one more way to look at a multivariate hypothesis. We can make separate facets (or panels) of scatter plots – one for each category of a categorical variable (such as gentoo) – by piping on (%>%) a new function, gf_facet_wrap().
gf_point(body_mass_kg ~ flipper_length_m, data = penguins,
color = ~female, shape = ~gentoo) %>%
gf_facet_wrap(~ gentoo)
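Piping works because %>% hands the plot on its left to the function on its right as that function's first argument. Here is a minimal sketch that spells this out in two steps without the pipe, assuming the ggformula package is installed. The toy data frame is made up for illustration.

```r
library(ggformula)

# Hypothetical toy data standing in for the penguins data frame
toy <- data.frame(
  flipper_length_m = c(0.18, 0.19, 0.20, 0.21, 0.22, 0.23),
  body_mass_kg     = c(3.4, 3.8, 3.9, 4.6, 5.0, 5.6),
  gentoo           = c("no", "no", "no", "yes", "yes", "yes")
)

# Step 1: save the scatter plot in an object
scatter <- gf_point(body_mass_kg ~ flipper_length_m, data = toy)

# Step 2: pass the saved plot to gf_facet_wrap() as its first argument.
# This should draw the same panels as:
#   gf_point(...) %>% gf_facet_wrap(~ gentoo)
faceted <- gf_facet_wrap(scatter, ~ gentoo)

print(faceted)
```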
Try playing around with gf_facet_wrap() in the code block below. <Run> it with the categorical variable island. Then <Run> it with the quantitative variable bill_length_cm. Use the <Submit> button when you have the faceted visualization that you think is most helpful.
require(coursekata)
# try faceting by island
# then try faceting by bill_length_cm
gf_point(body_mass_kg ~ flipper_length_m, data = penguins) %>%
gf_facet_wrap(~ gentoo)
# solution: facet by island
gf_point(body_mass_kg ~ flipper_length_m, data = penguins) %>%
gf_facet_wrap(~ island)
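Faceting draws one panel for each distinct value of the faceting variable, so counting distinct values first gives a rough idea of how many panels to expect. Here is a minimal base R sketch. The toy data frame and its bill lengths are invented for illustration.

```r
# Hypothetical toy data: 6 made-up penguins, NOT the real penguins data frame
toy <- data.frame(
  island         = c("Biscoe", "Dream", "Torgersen", "Biscoe", "Dream", "Biscoe"),
  bill_length_cm = c(3.91, 4.65, 3.87, 5.02, 4.98, 4.61)
)

# Count the distinct values in each column;
# each distinct value would become its own panel when faceting
n_panels <- sapply(toy, function(column) length(unique(column)))
print(n_panels)
```

A categorical variable like island has only a few distinct values, while a quantitative variable can have nearly as many distinct values as there are rows.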