
3.9 SSE, MSE, and RMSE

Shortcut to SSE

As always, you can calculate the residual errors from a model, then square and sum them. But there is also a shortcut for calculating the SSE from a model:

sse(Y, my_function(X))

In the code block below, we’ve put in code to calculate SSE in the old way. Try using the sse() function to see if you get the same SSE.

require(coursekata)
require(Metrics)

Y <- penguins$body_mass_kg
X <- penguins$flipper_length_m

# this creates the best-fitting model
best_function <- function(X){-5.872 + 50.153*X}

# the old way of calculating SSE: residuals, squared, then summed
residual <- Y - best_function(X)
sum(residual^2)

# the shortcut: use sse() to calculate SSE
sse(Y, best_function(X))
51.21196324697
51.21196324697

Mean Square Error (MSE) or Variance

Sum of squares has the wonderful quality of pointing the way to the best-fitting model, but it’s a weird number to actually interpret. The best-fitting model (\(-5.872 + 50.153*X\)) has an SSE of 51.21 kilograms-squared. What’s a square-kilogram and what does it mean for a model’s error to have a total of 51 of them?

In addition to the difficulty of interpreting squared units, SSE is also awkward because it represents total squared error, so it gets bigger and bigger simply because there is more data. To address these difficulties, statisticians have devised measures based on SSE that are more interpretable: Mean Square Error (MSE) and Root Mean Square Error (RMSE).

Mean Square Error (MSE) is the average squared error. We divide SSE by the total number of squares (or penguins in this case). That gives us a square that isn’t the biggest one, or the smallest one; it’s the average one.

[Figure: a scatter plot of body_mass_kg predicted by flipper_length_m, with the best-fitting line in green running through the mean of the data points. A couple of residuals are drawn as vertical lines from the green line and made into squares; the square in the middle is darkened to show the average square size.]

In the case of the penguins data, to calculate MSE, we would divide SSE by the number of data points (e.g., number of penguins, which is 333). More generally:

\[MSE = \frac{SSE}{\text{number of data points}}\]
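Plugging in the numbers from our penguin model, the SSE of 51.21 divided across the 333 penguins gives:

\[MSE = \frac{51.21}{333} \approx 0.154\]

which matches the output of mse() in the exercise below.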

MSE is also called variance. Just like SSE, there is a shortcut for calculating it in R, and it looks very similar to the sse() function: mse(). Try it out in the code window below.

require(coursekata)
require(Metrics)

Y <- penguins$body_mass_kg
X <- penguins$flipper_length_m

# this creates the best-fitting model
best_function <- function(X){-5.872 + 50.153*X}

# use mse() to calculate the MSE
mse(Y, best_function(X))
0.15378967942033

MSE is slightly better than SSE because it corrects for the number of penguins that went into calculating it. But it is still measured in kilograms-squared, which is a hard number to interpret. Squaring the residuals helped us figure out the best-fitting model, but now we might want to un-square this number to put it back into units we understand better: kilograms.

Root Mean Square Error (RMSE)

RMSE is the square root of MSE.

\[RMSE = \sqrt{MSE} = \sqrt{\frac{SSE}{\text{number of data points}}}\]
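Since RMSE is just the square root of MSE, one quick check (a minimal sketch, assuming Y, X, and best_function are defined as in the code windows above) is to confirm that taking sqrt() of the mse() result gives the same number as rmse():

sqrt(mse(Y, best_function(X)))   # square root of MSE
rmse(Y, best_function(X))        # same number, computed directly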

Visually, you can think of it as just one side of the average square. The nice thing about this concept is that it’s roughly the average length of the residuals!

[Figure: a scatter plot of body_mass_kg predicted by flipper_length_m, with the best-fitting line in green running through the center of the data points. The residual for each data point is drawn as a vertical line to the green line; the residual in the middle is darkened to show the average residual length.]

We can calculate it in the same way we calculate sse() and mse(): with a function called rmse().

require(coursekata)
require(Metrics)

Y <- penguins$body_mass_kg
X <- penguins$flipper_length_m

# this creates the best-fitting model
best_function <- function(X){-5.872 + 50.153*X}

# use rmse() to calculate the RMSE
rmse(Y, best_function(X))
0.392160272618645

This number represents the average error in kilograms. Roughly, on average, the actual body masses are 0.39 kilograms away from the predictions. No more kilograms squared!

RMSE is very similar to the statistics concept of standard deviation (but slightly simpler). In more advanced data science, like machine learning, having a single number to evaluate a model’s performance is incredibly valuable, and RMSE happens to be one of the most widely used metrics for this purpose.
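If you're curious about that connection, here is a minimal sketch (assuming Y, X, and best_function are defined as in the code windows above). The only difference between the two formulas is the divisor: RMSE divides the total squared error by n, while R's sd() divides by n - 1, so the two numbers are close but not identical.

residual <- Y - best_function(X)

# RMSE: total squared error divided by n, then square-rooted
sqrt(sum(residual^2) / length(residual))

# sd() centers on the mean residual (roughly 0 here) and divides by n - 1
sd(residual)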

SSE, MSE, and RMSE are all measures of how bad a model is – how much error there is in its predictions. The model that is least bad is crowned the best-fitting model. The beauty of these measures is that the best-fitting model (found with lm()) minimizes all three: SSE, MSE, and RMSE. Any other values of \(b_0\) and \(b_1\) will lead to higher error.
