10.3 Interpreting Parameter Estimates of Interaction Models with Two Quantitative Predictors
We know from the graph that the interaction model produces a larger effect of home size on price for newer homes than it does for older homes. This is what makes it an interaction model: the effects of one predictor on the outcome are different for different values of the second predictor.
But how is it that multiplying YearBuilt by HomeSizeK results in different slopes and intercepts for homes of different ages? Let’s dig in and see how this works. Write some code to fit and print out the parameter estimates from the interaction model of PriceK using HomeSizeK and YearBuilt as predictors.
require(coursekata)
# find the best-fitting parameter estimates for the interaction model
lm(PriceK ~ YearBuilt * HomeSizeK, data = Ames)
# or alternatively: lm(PriceK ~ HomeSizeK * YearBuilt, data = Ames)
Call:
lm(formula = PriceK ~ YearBuilt * HomeSizeK, data = Ames)

Coefficients:
        (Intercept)            YearBuilt            HomeSizeK
          -157.0888               0.1037            -837.6424
YearBuilt:HomeSizeK
             0.4686
How the Interaction Model Generates Predictions
Here, again, is the interaction model in GLM notation:
\[\text{PriceK}_i=b_0+b_1\text{YearBuilt}_i+b_2\text{HomeSizeK}_i+b_3\text{YearBuilt}_i*\text{HomeSizeK}_i+e_i\]
If we replace the \(b\)s with their corresponding parameter estimates we get this function that we can use to predict the price of any home based on its size and the year it was built:
\[\text{PriceK}=-157.09 + 0.1\text{YearBuilt} - 837.64\text{HomeSizeK}+0.47\text{YearBuilt}*\text{HomeSizeK}\]
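The prediction function can also be sketched in R. This is only an illustration, not part of the course code: the helper name predict_price() and the example home (built in 2000, with a HomeSizeK of 1.5) are made up, and the estimates are typed in at full precision from the printed lm() output rather than pulled from a fitted model.

```r
# Parameter estimates copied from the printed lm() output
b0 <- -157.0888   # (Intercept)
b1 <- 0.1037      # YearBuilt
b2 <- -837.6424   # HomeSizeK
b3 <- 0.4686      # YearBuilt:HomeSizeK

# Hypothetical helper: predicted PriceK for a given year and home size
predict_price <- function(year_built, home_size_k) {
  b0 + b1 * year_built + b2 * home_size_k + b3 * year_built * home_size_k
}

# Illustrative home: built in 2000, HomeSizeK of 1.5
predict_price(2000, 1.5)
```

With the Ames data loaded and the model saved to an object, calling predict() on that object with the same two values should give essentially the same prediction, apart from rounding in the printed estimates.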
Because this is an interaction model, we know that it will generate many lines, one for each value of YearBuilt. To see how this works, it is helpful to label the part of the function that generates the predicted y-intercept, and the part that generates the predicted slope. Let’s start by looking at the y-intercept.
\[\text{PriceK}=\underbrace{-157.09 + 0.1\text{YearBuilt}}_\text{y-intercept}+\underbrace{-837.64\text{HomeSizeK}+0.47\text{YearBuilt}*\text{HomeSizeK}}_\text{the part that produces slope}\]
The y-intercept part of the function generates a different y-intercept for each value of YearBuilt, just as it did in the additive model. To get the y-intercept, we start at -157.09, then add 0.1 for each year of YearBuilt.
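To make the y-intercept adjustment concrete, here is a minimal sketch in R. The helper name y_intercept() and the three example years are assumptions chosen for illustration; the two estimates are copied at full precision from the printed lm() output.

```r
# Estimates copied from the printed lm() output
b0 <- -157.0888   # (Intercept)
b1 <- 0.1037      # YearBuilt

# Hypothetical helper: y-intercept of the HomeSizeK line for a given year
y_intercept <- function(year_built) {
  b0 + b1 * year_built
}

# Illustrative years; each one gets its own y-intercept
years <- c(1950, 1975, 2000)
data.frame(YearBuilt = years, y_intercept = y_intercept(years))
```

Each additional year raises the y-intercept by the YearBuilt estimate, so lines for newer homes start a little higher than lines for older homes.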
To see how the interaction model generates a different slope for each value of YearBuilt, it helps to simplify the remaining part of the function by doing a little algebraic manipulation.
We start with the part that produces the slope:
\[-837.64\text{HomeSizeK}+0.47\text{YearBuilt}*\text{HomeSizeK}\]
and use the distributive property (\(ac+bc=(a+b)c\)) to turn it into this:
\[(-837.64+0.47\text{YearBuilt})\text{HomeSizeK}\]
We can put this rewritten slope back into the function to show more clearly how this equation generates a different slope for each value of YearBuilt:
\[\text{PriceK}=\underbrace{-157.09 + 0.1\text{YearBuilt}}_\text{y-intercept}+\underbrace{(-837.64+0.47\text{YearBuilt})}_\text{slope}\text{HomeSizeK}\]
Similar to the adjustment made for y-intercepts, the function gets the slope for each value of YearBuilt by starting with -837.64, then adding 0.47 for each year of YearBuilt. A home built in the year 2000, therefore, would have a predicted slope of -837.64 + (0.47*2000), or 102.36.
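The same slope calculation can be sketched in R. The helper name slope_for_year() is hypothetical, and the estimates are copied at full precision from the printed lm() output. Because of that, the result for the year 2000 differs by a few units from a hand calculation with two-decimal estimates: the small difference between 0.4686 and 0.47 gets multiplied by 2000.

```r
# Estimates copied from the printed lm() output
b2 <- -837.6424   # HomeSizeK
b3 <- 0.4686      # YearBuilt:HomeSizeK

# Hypothetical helper: slope of the HomeSizeK line for a given year
slope_for_year <- function(year_built) {
  b2 + b3 * year_built
}

# Illustrative comparison: an older home and a newer home
slope_for_year(c(1950, 2000))
```

The slope for the newer home is the larger of the two, which matches the steeper lines for newer homes in the graph.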
We could use this same logic to make regression lines of YearBuilt predicting PriceK, yielding a different regression line for each value of HomeSizeK. The key is that the y-intercept and slope for the regression lines of one predictor are adjusted based on the value of the other predictor. This, indeed, is the very definition of an interaction model.
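As a rough check on this idea, the sketch below regroups the same function around YearBuilt instead of HomeSizeK. The helper names intercept_for_size() and slope_for_size() and the example values are assumptions for illustration; the estimates are copied from the printed lm() output.

```r
# Parameter estimates copied from the printed lm() output
b0 <- -157.0888   # (Intercept)
b1 <- 0.1037      # YearBuilt
b2 <- -837.6424   # HomeSizeK
b3 <- 0.4686      # YearBuilt:HomeSizeK

# Hypothetical helpers: y-intercept and slope of the YearBuilt line
# for a given value of HomeSizeK
intercept_for_size <- function(home_size_k) b0 + b2 * home_size_k
slope_for_size     <- function(home_size_k) b1 + b3 * home_size_k

# Illustrative home: HomeSizeK of 1.5, built in 2000
size <- 1.5
year <- 2000

# Prediction from the regrouped form and from the original function
regrouped <- intercept_for_size(size) + slope_for_size(size) * year
direct    <- b0 + b1 * year + b2 * size + b3 * year * size

all.equal(regrouped, direct)
```

Both groupings give the same prediction because they are the same function written two ways; only the predictor treated as the x-axis changes.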