1.9 Manipulating Data Frames: arrange() & mutate()
arrange() Cases in Order
In many data sets, we might want to order the cases in some way. For example, in SelectWorld, we might want to know which country had, on average, the shortest girls in 1900. To find out, we can simply arrange the data frame in order using this code:
arrange(SelectWorld, GirlsH1900)
Write some code below to arrange the countries in order by GirlsH1900, and then save the resulting data frame as a new object called ArrangeWorld.
require(coursekata)
SelectWorld <- select(World, Country, LifeExpectancy, GirlsH1900, GirlsH1980)
# Save the arranged data frame
ArrangeWorld <- arrange(SelectWorld, GirlsH1900)
# Print out the first 6 rows of the new data frame
head(ArrangeWorld)
The countries are arranged from shortest to tallest (according to how tall 18-year-old girls were in 1900).
      Country LifeExpectancy GirlsH1900 GirlsH1980
1   Guatemala           69.7   140.9926   149.0530
2 El Salvador           71.3   142.0544   153.6128
3  Bangladesh           63.1   142.1550   151.0859
4        Peru           70.7   142.2386   152.2495
5 South Korea           77.9   143.2104   160.9055
6       Japan           82.3   143.3583   158.5073
The function arrange() can also be used to arrange values in the opposite order (descending from tallest to shortest). Adding a negative sign (-) in front of the variable will arrange the data so that the countries with the tallest girls appear at the top.
arrange(SelectWorld, -GirlsH1900)
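As a minimal sketch of the two orderings, the code below builds a small stand-in data frame so that it runs without the full World data. MiniWorld is a hypothetical name used only for illustration; its four rows reuse values from the output shown earlier. The sketch also uses desc() from dplyr, which this chapter does not cover but which should give the same descending order as the negative sign for a numeric variable.

```r
library(dplyr)

# MiniWorld is a hypothetical four-row stand-in for SelectWorld
MiniWorld <- data.frame(
  Country = c("Japan", "Guatemala", "South Korea", "Peru"),
  GirlsH1900 = c(143.3583, 140.9926, 143.2104, 142.2386)
)

# Ascending order (the default): shortest first
shortest_first <- arrange(MiniWorld, GirlsH1900)

# Descending order: a negative sign in front of the variable
tallest_first <- arrange(MiniWorld, -GirlsH1900)

# Descending order with desc(), an alternative to the negative sign
tallest_first_desc <- arrange(MiniWorld, desc(GirlsH1900))

shortest_first
tallest_first
```

The negative sign only makes sense for numeric variables; desc() is the usual choice when the variable holds text, such as Country.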
Out of curiosity, do you think the countries with the shortest girls in 1900 were also the countries with the shortest girls in 1980? Write some code below to find out.
require(coursekata)
SelectWorld <- select(World, Country, LifeExpectancy, GirlsH1900, GirlsH1980)
# Arrange by GirlsH1980 to see the countries with the shortest girls in 1980
head(arrange(SelectWorld, GirlsH1980))
      Country LifeExpectancy GirlsH1900 GirlsH1980
1   Guatemala           69.7   140.9926   149.0530
2 Philippines           71.0   148.1826   149.3036
3  Bangladesh           63.1   142.1550   151.0859
4       Nepal           62.6   144.6591   151.1820
5        Laos           63.2   145.1294   151.5993
6   Indonesia           69.7   145.0053   151.7019
mutate() to Create New Variables
If you want to create a new variable, you can use mutate(). For example, in SelectWorld, we might want to create a variable to indicate how much taller girls in each country were, on average, in 1980 compared with 1900. Girls in Peru, for instance, averaged about 152 cm tall in 1980 and about 142 cm in 1900. We’d like to make a variable, which we might call GirlsHeightChange, that would have a value of about 10 for Peru, indicating that girls in Peru got taller by about 10 cm during those 80 years.
We can create a data frame with a new variable by using the mutate() function, like this:
mutate(SelectWorld, GirlsHeightChange = GirlsH1980 - GirlsH1900)
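To see what mutate() returns, here is a small sketch using a stand-in data frame. MiniWorld and MiniChange are hypothetical names used only for illustration; the two rows reuse the Peru and Japan values shown earlier. The sketch assumes the usual dplyr behavior: mutate() returns a new data frame and leaves the original unchanged unless the result is saved.

```r
library(dplyr)

# MiniWorld is a hypothetical two-row stand-in for SelectWorld
MiniWorld <- data.frame(
  Country = c("Peru", "Japan"),
  GirlsH1900 = c(142.2386, 143.3583),
  GirlsH1980 = c(152.2495, 158.5073)
)

# Create the new variable and save the result under a new name
MiniChange <- mutate(MiniWorld, GirlsHeightChange = GirlsH1980 - GirlsH1900)

# The original data frame still has three columns; the new one has four
names(MiniWorld)
names(MiniChange)

# Peru's value of GirlsHeightChange comes out at about 10 cm
MiniChange
```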
Try running the code below. Then add code to arrange NewWorld in order, from the countries where girls’ heights changed the most to those where they changed the least.
require(coursekata)
SelectWorld <- select(World, Country, LifeExpectancy, GirlsH1900, GirlsH1980)
NewWorld <- mutate(SelectWorld, GirlsHeightChange = GirlsH1980 - GirlsH1900)
# Arrange the data frame from the largest height change to the smallest
arrange(NewWorld, -GirlsHeightChange)
Here is the head() of this newly arranged data frame:
                   Country LifeExpectancy GirlsH1900 GirlsH1980 GirlsHeightChange
1              South Korea           77.9   143.2104   160.9055          17.69510
2                    Japan           82.3   143.3583   158.5073          15.14893
3                  Croatia           75.3   151.1788   165.8835          14.70473
4 Czech Republic (Czechia)           75.9   153.6532   167.5305          13.87726
5              Netherlands           79.2   155.8199   168.9368          13.11695
6                   Greece           78.9   151.0368   163.8708          12.83408
Summary of Data Manipulation Functions
select() | selects a few variables (i.e., a few columns)
filter() | filters for particular cases (i.e., particular rows)
arrange() | arranges the cases according to a particular variable (i.e., arranges rows in order)
mutate() | creates new variables (i.e., creates new columns)
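The four functions in the summary can be used one after another. The sketch below is one possible illustration, not part of the original exercises: MiniWorld and the intermediate object names are hypothetical, the four rows reuse values from the outputs above, and the 150 cm cutoff in filter() is an arbitrary assumption chosen only to show how a condition works.

```r
library(dplyr)

# MiniWorld is a hypothetical four-row stand-in for the World data frame
MiniWorld <- data.frame(
  Country = c("Guatemala", "Peru", "South Korea", "Japan"),
  LifeExpectancy = c(69.7, 70.7, 77.9, 82.3),
  GirlsH1900 = c(140.9926, 142.2386, 143.2104, 143.3583),
  GirlsH1980 = c(149.0530, 152.2495, 160.9055, 158.5073)
)

# select(): keep only a few columns
Heights <- select(MiniWorld, Country, GirlsH1900, GirlsH1980)

# filter(): keep only the rows that meet a condition (150 cm is arbitrary)
TallerIn1980 <- filter(Heights, GirlsH1980 > 150)

# mutate(): add a new column
WithChange <- mutate(TallerIn1980, GirlsHeightChange = GirlsH1980 - GirlsH1900)

# arrange(): order the rows, largest change first
Result <- arrange(WithChange, -GirlsHeightChange)
Result
```

Saving each step under its own name makes it easy to inspect the data frame after every function call.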