CourseKata - 1.9 Manipulating Data Frames: arrange() and mutate()

High School / Algebra + Data Science (G)

1.9 Manipulating Data Frames: `arrange()` & `mutate()`

`arrange()` Cases in Order

In many data sets, we might want to put the cases (rows) in some order. For example, in SelectWorld, we might want to know which country had, on average, the shortest girls in 1900. To find out, we can sort the rows of the data frame by GirlsH1900 using this code:

arrange(SelectWorld, GirlsH1900)
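
To see what arrange() does without the full World data, here is a minimal sketch. It assumes the dplyr package (which provides arrange()) is installed, and MiniWorld is a hypothetical stand-in for SelectWorld built from a few values printed later on this page.

```r
library(dplyr)

# Hypothetical mini version of SelectWorld, using a few values shown on this page
MiniWorld <- data.frame(
  Country    = c("Japan", "Guatemala", "Peru", "El Salvador"),
  GirlsH1900 = c(143.3583, 140.9926, 142.2386, 142.0544),
  stringsAsFactors = FALSE
)

# arrange() returns a new data frame with the rows sorted by GirlsH1900, smallest first
Sorted <- arrange(MiniWorld, GirlsH1900)
Sorted
# Country order: Guatemala, El Salvador, Peru, Japan
```

Only the row order changes; every row keeps all of its values.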

Write some code below to arrange the countries in order by GirlsH1900, and then save the resulting data frame as a new object called ArrangeWorld.

require(coursekata)
SelectWorld <- select(World, Country, LifeExpectancy, GirlsH1900, GirlsH1980)

# Save the arranged data frame
ArrangeWorld <- arrange(SelectWorld, GirlsH1900)

# Print the first 6 rows of the new data frame
head(ArrangeWorld)

The countries are now ordered from shortest to tallest, according to the average height of 18-year-old girls in 1900.

      Country LifeExpectancy GirlsH1900 GirlsH1980
1   Guatemala           69.7   140.9926   149.0530
2 El Salvador           71.3   142.0544   153.6128
3  Bangladesh           63.1   142.1550   151.0859
4        Peru           70.7   142.2386   152.2495
5 South Korea           77.9   143.2104   160.9055
6       Japan           82.3   143.3583   158.5073

The arrange() function can also sort in the opposite (descending) order, from tallest to shortest. Putting a negative sign (-) in front of the variable name places the countries with the tallest girls at the top.

arrange(SelectWorld, -GirlsH1900)
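
The minus sign works because it turns each number into its negative, so sorting the negatives from smallest to largest lists the original values from largest to smallest. That trick only applies to numeric variables; for a text variable such as Country, dplyr's desc() is the usual tool. A small sketch, assuming dplyr is installed and using a hypothetical MiniWorld built from values on this page:

```r
library(dplyr)

# Hypothetical mini data frame built from values shown on this page
MiniWorld <- data.frame(
  Country    = c("Guatemala", "Peru", "Japan"),
  GirlsH1900 = c(140.9926, 142.2386, 143.3583),
  stringsAsFactors = FALSE
)

# A minus sign reverses the order of a numeric variable: tallest first
arrange(MiniWorld, -GirlsH1900)

# desc() gives the same descending order for numbers...
arrange(MiniWorld, desc(GirlsH1900))

# ...and also works for text, where a minus sign would cause an error
arrange(MiniWorld, desc(Country))
```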

Out of curiosity, do you think the countries with the shortest girls in 1900 were also the countries with the shortest girls in 1980? Write some code below to find out.

require(coursekata)
SelectWorld <- select(World, Country, LifeExpectancy, GirlsH1900, GirlsH1980)

# Show the countries with the shortest girls in 1980
head(arrange(SelectWorld, GirlsH1980))

      Country LifeExpectancy GirlsH1900 GirlsH1980
1   Guatemala           69.7   140.9926   149.0530
2 Philippines           71.0   148.1826   149.3036
3  Bangladesh           63.1   142.1550   151.0859
4       Nepal           62.6   144.6591   151.1820
5        Laos           63.2   145.1294   151.5993
6   Indonesia           69.7   145.0053   151.7019
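
One way to answer the question directly is to pull out the Country column from each sorted head() and look for names that appear in both. This is a sketch under stated assumptions: dplyr is installed, HeightWorld is a hypothetical stand-in containing only the ten countries printed in the two tables above, and base R's intersect() finds the common names.

```r
library(dplyr)

# Hypothetical stand-in for SelectWorld: only the ten countries printed in the two tables above
HeightWorld <- data.frame(
  Country    = c("Guatemala", "El Salvador", "Bangladesh", "Peru", "South Korea",
                 "Japan", "Philippines", "Nepal", "Laos", "Indonesia"),
  GirlsH1900 = c(140.9926, 142.0544, 142.1550, 142.2386, 143.2104,
                 143.3583, 148.1826, 144.6591, 145.1294, 145.0053),
  GirlsH1980 = c(149.0530, 153.6128, 151.0859, 152.2495, 160.9055,
                 158.5073, 149.3036, 151.1820, 151.5993, 151.7019),
  stringsAsFactors = FALSE
)

# Names of the six countries with the shortest girls in each year
shortest1900 <- head(arrange(HeightWorld, GirlsH1900))$Country
shortest1980 <- head(arrange(HeightWorld, GirlsH1980))$Country

# Countries that appear on both lists
intersect(shortest1900, shortest1980)
# "Guatemala" "Bangladesh"
```

With these rows, only Guatemala and Bangladesh are on both lists, which matches what the two printed tables show.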

`mutate()` to Create New Variables

[Figure: a 3 by 5 grid of gray squares representing a data frame (header row in dark gray) and, beside it, the same grid with an extra yellow column added at the end, representing the new column created by mutate().]

If you want to create a new variable, you can use mutate(). For example, in SelectWorld, we might want a variable that shows how much taller girls in each country were, on average, in 1980 than in 1900. Girls in Peru, for instance, averaged about 152 cm in 1980 and about 142 cm in 1900. We’d like to make a variable, which we might call GirlsHeightChange, that would have a value of about 10 for Peru, indicating that girls in Peru got taller by about 10 cm over those 80 years.
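
The value of GirlsHeightChange for a single country is just a subtraction. A quick check in R, using Peru's heights from the first table in this section (the variable names here are only for illustration):

```r
# Peru's average heights for 18-year-old girls, taken from the table above (in cm)
peru_1900 <- 142.2386
peru_1980 <- 152.2495

# Height change = later height minus earlier height
peru_change <- peru_1980 - peru_1900
round(peru_change, 1)   # about 10 cm
```

mutate() performs this same subtraction for every row at once.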

We can create a data frame with a new variable by using the mutate() function, like this:

mutate(SelectWorld, GirlsHeightChange = GirlsH1980 - GirlsH1900)
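
Like arrange(), mutate() returns a new data frame and leaves the original unchanged unless you save the result. A minimal sketch, assuming dplyr is installed and using a hypothetical two-country MiniWorld with values from this page:

```r
library(dplyr)

# Hypothetical mini data frame using values shown on this page
MiniWorld <- data.frame(
  Country    = c("Peru", "Japan"),
  GirlsH1900 = c(142.2386, 143.3583),
  GirlsH1980 = c(152.2495, 158.5073),
  stringsAsFactors = FALSE
)

# mutate() computes the new column row by row and adds it at the right-hand end
Changed <- mutate(MiniWorld, GirlsHeightChange = GirlsH1980 - GirlsH1900)
names(Changed)
# "Country" "GirlsH1900" "GirlsH1980" "GirlsHeightChange"

# The original data frame is not modified; only the saved result has the new column
names(MiniWorld)
# "Country" "GirlsH1900" "GirlsH1980"
```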

Run the code below. Then write code to arrange NewWorld in order, from the countries where girls’ heights changed the most to those where they changed the least.

require(coursekata)
SelectWorld <- select(World, Country, LifeExpectancy, GirlsH1900, GirlsH1980)

NewWorld <- mutate(SelectWorld, GirlsHeightChange = GirlsH1980 - GirlsH1900)

# Arrange from the largest height change to the smallest
arrange(NewWorld, -GirlsHeightChange)

Here is the head() of this newly arranged data frame:

                   Country LifeExpectancy GirlsH1900 GirlsH1980  GirlsHeightChange
1              South Korea           77.9   143.2104   160.9055           17.69510
2                    Japan           82.3   143.3583   158.5073           15.14893
3                  Croatia           75.3   151.1788   165.8835           14.70473
4 Czech Republic (Czechia)           75.9   153.6532   167.5305           13.87726
5              Netherlands           79.2   155.8199   168.9368           13.11695
6                   Greece           78.9   151.0368   163.8708           12.83408
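
The two steps can also be written as one nested call, in the same style as head(arrange(...)) earlier in this section: mutate() runs first to create the column, and arrange() then sorts by it. A sketch assuming dplyr is installed, with a hypothetical four-country MiniWorld built from values on this page:

```r
library(dplyr)

# Hypothetical mini data frame using values printed on this page
MiniWorld <- data.frame(
  Country    = c("Peru", "Greece", "South Korea", "Japan"),
  GirlsH1900 = c(142.2386, 151.0368, 143.2104, 143.3583),
  GirlsH1980 = c(152.2495, 163.8708, 160.9055, 158.5073),
  stringsAsFactors = FALSE
)

# Inner call: add GirlsHeightChange. Outer call: sort by it, largest change first
Ranked <- arrange(mutate(MiniWorld, GirlsHeightChange = GirlsH1980 - GirlsH1900),
                  -GirlsHeightChange)
head(Ranked)
# Country order: South Korea, Japan, Greece, Peru
```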

Summary of Data Manipulation Functions

`select()`	selects a few variables (i.e., a few columns)
`filter()`	filters for particular cases (i.e., particular rows)
`arrange()`	arranges the cases according to a particular variable (i.e., puts rows in order)
`mutate()`	creates new variables (i.e., creates new columns)
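
The four functions can be used one after another, each working on the result of the previous step. The sketch below is only an illustration: it assumes dplyr is installed, MiniWorld is a hypothetical stand-in built from values on this page, and the cutoff of 75 years for LifeExpectancy is an arbitrary choice.

```r
library(dplyr)

# Hypothetical mini version of the World data, using values shown on this page
MiniWorld <- data.frame(
  Country        = c("Guatemala", "Bangladesh", "Japan", "Netherlands", "Greece"),
  LifeExpectancy = c(69.7, 63.1, 82.3, 79.2, 78.9),
  GirlsH1900     = c(140.9926, 142.1550, 143.3583, 155.8199, 151.0368),
  GirlsH1980     = c(149.0530, 151.0859, 158.5073, 168.9368, 163.8708),
  stringsAsFactors = FALSE
)

Step1 <- filter(MiniWorld, LifeExpectancy > 75)                        # keep some rows (arbitrary cutoff)
Step2 <- select(Step1, Country, GirlsH1900, GirlsH1980)                # keep some columns
Step3 <- mutate(Step2, GirlsHeightChange = GirlsH1980 - GirlsH1900)    # add a new column
Step4 <- arrange(Step3, -GirlsHeightChange)                            # sort rows, largest change first
Step4
# Country order: Japan, Netherlands, Greece
```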
