16.2 Fitting and Visualizing an Interaction Model with Two Quantitative Predictors
In the interaction model, we allow not only the intercepts of the
regression lines but also their slopes to differ for different years
built. To the extent that the slopes do, in fact, differ for different
values of YearBuilt, it means that the relationship between price and
home size depends on when the home was built, at least in the data if
not in the DGP. Allowing the slopes to differ costs us an additional
degree of freedom, but may lead to a better-fitting model than the
additive model.
The GLM notation for the interaction model with two quantitative predictors is the same as it was for the model with one categorical and one quantitative predictor. So is the R code! But because both predictor variables are quantitative, the interpretation of the model is a little different.
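For reference, the notation can be sketched as follows. This assumes the same symbols used for the earlier interaction model, and the labeling of X_1 as YearBuilt and X_2 as HomeSizeK is a choice made here for illustration. Grouping the terms that multiply X_2 shows why the slope for home size is no longer a single number:

```latex
\begin{aligned}
Y_i &= b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{1i} X_{2i} + e_i \\
    &= (b_0 + b_1 X_{1i}) + (b_2 + b_3 X_{1i})\, X_{2i} + e_i
\end{aligned}
```

Read this way, a home built in year X_1 has a price-by-size line with intercept b_0 + b_1 X_1 and slope b_2 + b_3 X_1. If b_3 were 0, every year would share the same slope, which is the additive model.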
Add some code to the code window below to fit the interaction model
and save it as interaction_model. Also, chain gf_model() onto the
gf_point() call to visualize the predictions of the interaction model
on the scatter plot.
require(coursekata)

# fit and save the interaction model
interaction_model <- lm(PriceK ~ YearBuilt * HomeSizeK, data = Ames)

# add the model to this scatter plot
gf_point(PriceK ~ HomeSizeK, data = Ames, color = ~YearBuilt) %>%
  gf_model(interaction_model)
Earlier, when we had one categorical predictor and one quantitative
predictor, the gf_model() function overlaid two regression lines, one
for each level of the categorical variable. When both predictors are
quantitative, however, a different approach is required.
gf_model() can’t overlay a regression line for each possible value of
YearBuilt because there are too many possible values! Instead, it
selects three representative values of YearBuilt and overlays a line
for each. The values it chooses are the mean of YearBuilt, one standard
deviation above the mean, and one standard deviation below the mean.
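The three lines can be approximated by hand. The sketch below is not how gf_model() is implemented; it assumes only that the representative values are the mean of YearBuilt plus or minus one standard deviation, as described above. The names year_values and line_grid are made up for this illustration.

```r
library(coursekata)  # assumed to provide the Ames data frame, as in the exercise

interaction_model <- lm(PriceK ~ YearBuilt * HomeSizeK, data = Ames)

# three representative values of YearBuilt: mean - 1 SD, mean, mean + 1 SD
year_mean <- mean(Ames$YearBuilt, na.rm = TRUE)
year_sd <- sd(Ames$YearBuilt, na.rm = TRUE)
year_values <- c(year_mean - year_sd, year_mean, year_mean + year_sd)

# predicted price at the smallest and largest home size for each year;
# two points are enough to pin down each straight line
size_range <- range(Ames$HomeSizeK, na.rm = TRUE)
line_grid <- expand.grid(HomeSizeK = size_range, YearBuilt = year_values)
line_grid$predicted_PriceK <- predict(interaction_model, newdata = line_grid)
line_grid
```

Each pair of rows that shares a YearBuilt value gives the two endpoints of one line. Any other year could be substituted into year_values to see the line the model predicts for it.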
In the graph, the middle (greenish) line shows the model predictions
for the average YearBuilt (1978), while the two flanking lines
represent one standard deviation above the mean (2014, yellowish) and
one standard deviation below the mean (1942, bluish).
Just because we graphed three lines doesn’t mean there are only three
possible lines. Theoretically there could be an infinite number of
lines. The gf_model() function just shows a few representative examples
to help us see what the interaction pattern looks like.
The important thing to notice is that the slope of the line is steeper
for newer homes than for older homes. A way to describe this pattern of
increasing steepness is that the effect of home size on price gets
larger as houses get newer. In other words, there is an interaction
between HomeSizeK and YearBuilt.
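The changing steepness can also be read from the fitted coefficients. In the sketch below, home_size_slope() is a hypothetical helper written for this illustration, not a coursekata function. It relies on the standard lm() coefficient names for this formula and uses the three years quoted for the graph.

```r
library(coursekata)  # assumed to provide the Ames data frame

interaction_model <- lm(PriceK ~ YearBuilt * HomeSizeK, data = Ames)
b <- coef(interaction_model)

# hypothetical helper: slope of the price-by-home-size line for a given year,
# i.e. the HomeSizeK coefficient plus the interaction coefficient times the year
home_size_slope <- function(year) {
  unname(b["HomeSizeK"] + b["YearBuilt:HomeSizeK"] * year)
}

years <- c(1942, 1978, 2014)  # the values quoted in the text for the three lines
data.frame(YearBuilt = years, slope_for_HomeSizeK = home_size_slope(years))
```

If the slopes increase from the first row to the last, that matches the pattern in the graph: each additional unit of home size is associated with a larger increase in price for newer homes.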
Different Graphs Can Highlight Different Interpretations
You might wonder why we chose to represent HomeSizeK on the x-axis, and
YearBuilt with the different lines. Actually, there is no reason you
couldn’t present the same model in a different way, as in the graph
below.
interaction_model <- lm(PriceK ~ YearBuilt * HomeSizeK, data = Ames)

gf_point(PriceK ~ YearBuilt, data = Ames, color = ~HomeSizeK) %>%
  gf_model(interaction_model)
Now each line represents a particular value of HomeSizeK (the mean,
+1 SD, and -1 SD). Although the model and the data are the same as in
the previous graph, plotting them in a different way may lead to a
different way of describing the pattern of results.
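One way to see that both graphs come from a single model is to regroup the same equation around the other predictor. The sketch below assumes the usual GLM form of the interaction model, with X_1 standing for YearBuilt and X_2 for HomeSizeK (a labeling chosen here for illustration):

```latex
\begin{aligned}
Y_i &= b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{1i} X_{2i} + e_i \\
    &= (b_0 + b_2 X_{2i}) + (b_1 + b_3 X_{2i})\, X_{1i} + e_i
\end{aligned}
```

With YearBuilt on the x-axis, each line has slope b_1 + b_3 X_2, so the same b_3 that made the home-size lines fan out across years also makes the year-built lines fan out across home sizes.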