4.7 Contingency Tables
Bar graphs are one way to visualize the hypothesis WtLost = Condition + Other Stuff. Another way to explore the hypothesis is with a contingency table, which shows the distribution of cases across two categorical variables.
You already know the R function we use to make tables, tally(). Here we will extend its use to look at an outcome variable by an explanatory variable:

```r
tally(WtLost ~ Condition, data = MindsetMatters)
```
Try using the tally() function in the code block below to generate the contingency table for our MindsetMatters hypothesis.
```r
require(coursekata)

# Create WtLost: "lost" if the later weight (Wt2) is less than
# the initial weight (Wt), otherwise "not lost"
MindsetMatters <- MindsetMatters %>%
  mutate(WtLost = ifelse(Wt2 < Wt, "lost", "not lost"))

# Make a contingency table
tally(WtLost ~ Condition, data = MindsetMatters)
```
```
          Condition
WtLost     Informed Uninformed
  lost           28         20
  not lost       13         14
```
Each value in the table is the frequency of one particular combination of levels in the dataset: “lost” and “Informed”, “lost” and “Uninformed”, “not lost” and “Informed”, or “not lost” and “Uninformed”.
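You can see how these counts fit together without loading the data. One option is to rebuild the table in base R from the four counts printed above and then add row and column totals. This is a minimal sketch that uses base R's matrix(), as.table() and addmargins() in place of tally(). The name wt_counts is hypothetical, and the sketch assumes the counts are exactly those in the output above.

```r
# Rebuild the WtLost by Condition table from the counts in the tally() output.
# matrix() fills column by column: first Informed, then Uninformed.
wt_counts <- as.table(matrix(
  c(28, 13,   # Informed: lost, not lost
    20, 14),  # Uninformed: lost, not lost
  nrow = 2,
  dimnames = list(WtLost = c("lost", "not lost"),
                  Condition = c("Informed", "Uninformed"))
))

# Add a "Sum" row and a "Sum" column holding the marginal totals
totals <- addmargins(wt_counts)
print(totals)
```

The "Sum" row shows 41 housekeepers in the Informed condition and 34 in the Uninformed condition, 75 in total. This imbalance makes raw counts hard to compare across the two conditions.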
If you want proportions instead of counts (more appropriate in this case because of the unequal sample sizes across conditions), you can add the argument format = "proportion":

```r
tally(WtLost ~ Condition, data = MindsetMatters, format = "proportion")
```
```
          Condition
WtLost      Informed Uninformed
  lost     0.6829268  0.5882353
  not lost 0.3170732  0.4117647
```
In tables created by tally(), the proportions are normalized by column, meaning that the proportions in each column add up to 1. If the row proportions added up to 1, we would say they are normalized by row.
It is more informative to normalize by columns (that is, so that the levels of WtLost add up to 1 within each Condition) because our main interest is in comparing the proportion of housekeepers who lost weight between the two conditions. If the table were normalized by rows, we would not see the proportion of housekeepers who lost weight, but rather the proportion of those who lost weight who were in each condition.
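For contrast, here is a sketch of the row-normalized version. It is again rebuilt in base R from the counts shown earlier rather than with tally(), and wt_counts and row_props are hypothetical names. With margin = 1, each count is divided by its row total. The "lost" row therefore answers a different question: of the 48 housekeepers who lost weight, what proportion were in each condition?

```r
# Counts from the tally() output (rows: WtLost, columns: Condition)
wt_counts <- as.table(matrix(
  c(28, 13, 20, 14), nrow = 2,
  dimnames = list(WtLost = c("lost", "not lost"),
                  Condition = c("Informed", "Uninformed"))
))

# margin = 1: divide each count by the total of its row
row_props <- prop.table(wt_counts, margin = 1)
print(row_props)

# Check that each row now sums to 1
print(rowSums(row_props))
```

In the "lost" row, about 0.583 (28 / 48) of the housekeepers were Informed and about 0.417 (20 / 48) were Uninformed. These numbers partly reflect the fact that more housekeepers were in the Informed condition to begin with. That is why the column-normalized table is the better fit for comparing the two conditions.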
Recap of Visualizations
So far we have considered both quantitative (e.g., Thumb) and categorical (e.g., WtLost) outcomes. We have also looked at some categorical explanatory variables (e.g., Gender and Condition) and quantitative explanatory variables (e.g., Height).
We haven’t yet looked at any situations where there is a categorical outcome and a quantitative explanatory variable. But there isn’t any reason to think that we couldn’t! Perhaps a quantitative variable like age or initial weight might help us predict whether a housekeeper will lose weight or not.
Let’s review when each type of visualization is appropriate to use.
| Variable | Visualization Type | R Code |
|---|---|---|
| Categorical | Frequency Table, Bar Graph | tally, gf_bar |
| Quantitative | Histogram, Box Plot | gf_histogram, gf_boxplot |

| Outcome Variable | Explanatory Variable | Visualization Type | R Code |
|---|---|---|---|
| Categorical | Categorical | Frequency Table, Faceted Bar Graph | tally, gf_bar %>% gf_facet_grid() |
| Quantitative | Categorical | Faceted Histogram, Box Plot, Jitter Plot, Scatter Plot | gf_histogram %>% gf_facet_grid(), gf_boxplot, gf_jitter, gf_point |
| Categorical | Quantitative | (not yet covered) | |
| Quantitative | Quantitative | Jitter Plot, Scatter Plot | gf_jitter, gf_point |
You have also learned many R functions that you can use to create these visualizations of distributions of data. Even though we are only about halfway through Chapter 4, you have learned most of the code we will use in the entire course!