CourseKata - 3.4 Summing Residuals From a Model

Having learned a bit about vectors, let’s go back to our comparison of the better_function model with the worse_function model.

`better_function <- function(X){-6 + 51*X}`
`worse_function <- function(X){-7 + 49*X}`

We observed before that the better model appears to have shorter residuals than the worse model, and that the residuals seem to hover more closely around (above and below) the line of predictions. Of course, we were only looking closely at the 6 residuals highlighted in the figure above.

Let’s see if we can test this idea by summing up all of the 333 residuals (one for each penguin) from each model.

Using Vectors to Calculate the Residuals

Let’s start by putting our outcome (Y) and predictor (X) variables into vectors:

Y <-  penguins$body_mass_kg
X <-  penguins$flipper_length_m

Using the vectors X and Y, we can now calculate the predictions and residuals from the better_function model like this:

prediction <-  better_function(X) 
residual <-  Y - prediction
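
To see what these two lines do element by element, here is a small sketch that uses three made-up penguins instead of the real penguins data frame. The numbers in X and Y are invented for illustration; only better_function comes from the text above.

```r
# made-up flipper lengths (m) and body masses (kg) for three hypothetical penguins
X <- c(0.18, 0.20, 0.22)
Y <- c(3.5, 4.0, 5.5)

# the model from the text
better_function <- function(X){-6 + 51*X}

# the function is applied to every element of X, giving one prediction per penguin
prediction <- better_function(X)

# subtraction also works element by element, giving one residual per penguin
residual <- Y - prediction

prediction
residual
```

Because X holds three values, better_function(X) returns three predictions (about 3.18, 4.20, and 5.22), and Y - prediction returns three residuals (about 0.32, -0.20, and 0.28), one for each made-up penguin.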

Note that we have now saved two new vectors: prediction and residual. But we didn’t need to save a vector called prediction in order to calculate the residuals. We could have skipped that step and replaced prediction with the function call that produces it: better_function(X). Try it in the code block below.

require(coursekata)

# defines our Y and X
Y <-  penguins$body_mass_kg
X <-  penguins$flipper_length_m

# defines better_function
better_function <- function(X){-6 + 51*X}

# calculates the residuals directly, without saving a prediction vector
residual <-  Y - better_function(X)

# prints out residual vector
residual

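
As a quick check that skipping the prediction vector changes nothing, the sketch below computes the residuals both ways on three made-up penguins and compares the results. The data values and the names residual_two_step and residual_one_step are invented for this illustration.

```r
# made-up flipper lengths (m) and body masses (kg) for three hypothetical penguins
X <- c(0.18, 0.20, 0.22)
Y <- c(3.5, 4.0, 5.5)

# the model from the text
better_function <- function(X){-6 + 51*X}

# two-step version: save the predictions, then subtract them from Y
prediction <- better_function(X)
residual_two_step <- Y - prediction

# one-step version: subtract the function call directly
residual_one_step <- Y - better_function(X)

# TRUE when the two vectors are exactly the same
identical(residual_two_step, residual_one_step)
```

Both versions run the same arithmetic on the same numbers, so the comparison should come back TRUE. Saving prediction is only worth it if you want to look at the predictions themselves.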

Using Vectors to Sum the Residuals

Now that we’ve found an easy way to calculate all the residuals from the better_function model, let’s try summing them up to get an idea of what the total error might be around this model. We will then compare the total error from the better_function model with that from the worse_function model to get a sense of which model has less total error.

To do this we will use the sum() function to sum up the residuals from each of the two models. Try getting both of these sums in the code block below.

require(coursekata)

# defines our Y and X
Y <-  penguins$body_mass_kg
X <-  penguins$flipper_length_m

# defines the functions (models)
better_function <- function(X){-6 + 51*X}
worse_function <- function(X){-7 + 49*X}

# assume Y, X, and the functions have been defined
# this code calculates the better and worse residuals
better_residual <- Y - better_function(X)
worse_residual <- Y - worse_function(X)

# write code to sum up each set of residuals

sum(better_residual)
sum(worse_residual)

-14.072
452.772

In the figure below we show you the two models overlaid on the scatter plot of body mass by flipper length along with each model’s total residuals. We’ve also added a third model (on the right) and its total residuals.

Model: `worse_function`, sum of residuals = 452.772
Model: `better_function`, sum of residuals = -14.072
Model: some other function, sum of residuals = -213.228

We previously surmised that the residuals from our better models tend to be both positive and negative, and the better the model, the closer the residuals are to 0. We have now confirmed this idea by summing up all the residuals in the data frame. Residuals from our better_function model add up to -14.072, much closer to 0 than the 452.772 we got from the worse_function model.

When the body masses are sometimes higher and sometimes lower than the predictions, the sum of the residuals should be closer to 0 than when the predictions are always too high or too low. This suggests that in our quest for the best model, we should be looking for a model that perfectly balances the residuals.
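
The sketch below illustrates this contrast on a tiny made-up data set. The five data points and the names balanced_function and too_low_function are hypothetical, chosen only so that one model’s predictions fall among the points while the other’s are always too low; they are not the models or data from this page.

```r
# made-up flipper lengths (m) and body masses (kg) for five hypothetical penguins
X <- c(0.18, 0.19, 0.20, 0.21, 0.22)
Y <- c(3.4, 3.5, 4.3, 4.6, 5.4)

# hypothetical model whose predictions fall among the points
balanced_function <- function(X){-6 + 51*X}

# hypothetical model whose predictions are all 1 kg lower
too_low_function <- function(X){-7 + 51*X}

balanced_residual <- Y - balanced_function(X)
too_low_residual <- Y - too_low_function(X)

# a mix of positive and negative residuals, which partly cancel when summed
balanced_residual
sum(balanced_residual)

# all positive residuals, so nothing cancels and the sum is far from 0
too_low_residual
sum(too_low_residual)
```

With these numbers the first sum comes out to about 0.2 and the second to about 5.2. The positive and negative residuals from the first model offset each other, while the second model’s residuals all pile up on one side.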
