1.7 Selecting & Filtering Data
Sometimes you want to focus on a subset of the variables in a data
frame. For example, you might want to look at just the variables
PriceK and PriceR in the Ames
data frame. PriceK represents the sale price of the home in
thousands of dollars. PriceR represents the sale price in
dollars.
We can use the select() function to look at just a
subset of variables. When using select(), we first tell R
which data frame to use, then which variables to select from that data
frame.
select(Ames, PriceK, PriceR)
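To see what select() returns without loading the full data set, here is a minimal sketch on a small hand-built data frame. The name ames_sample and its values are made up for illustration (they are not the real Ames data), and the sketch assumes that select() is the dplyr function, loaded here directly with library(dplyr) rather than through coursekata.

```r
library(dplyr)  # assumption: select() in this section is dplyr's select()

# Hypothetical stand-in for Ames; the values are made up for illustration
ames_sample <- data.frame(
  PriceK = c(260, 210, 155, 125, 110, 100),
  PriceR = c(260000, 210000, 155000, 125000, 110000, 100000),
  Neighborhood = c("CollegeCreek", "CollegeCreek", "OldTown",
                   "OldTown", "CollegeCreek", "OldTown"),
  Floors = c(2, 1, 2, 1, 1, 1)
)

# First argument: the data frame; remaining arguments: the variables to keep
prices <- select(ames_sample, PriceK, PriceR)

names(prices)  # the two selected variables, in the order they were listed
nrow(prices)   # same number of rows as ames_sample: select() drops columns, not rows
```

The result is itself a data frame, so it can be saved under a name (here prices) and used like any other data frame.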
Modify the select() code below to take a look at just
the following variables in Ames: PriceK,
PriceR, and Neighborhood.
require(coursekata)
# Modify this code
select(Ames, ...)
# Solution
select(Ames, PriceK, PriceR, Neighborhood)
Running the select() function will print out the values
of the selected variables for every case. If you want to look at just
the first six rows, you can combine the head() and
select() functions like this:
head(select(Ames, PriceK, PriceR, Neighborhood))
  PriceK PriceR Neighborhood
1    260 260000 CollegeCreek
2    210 210000 CollegeCreek
3    155 155000      OldTown
4    125 125000      OldTown
5    110 110000 CollegeCreek
6    100 100000      OldTown
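A nested call like the one above is evaluated from the inside out: select() runs first, and head() then trims its result. The sketch below writes the same two steps separately. It uses a made-up data frame (ames_sample is a hypothetical name, not part of coursekata) and assumes select() comes from dplyr.

```r
library(dplyr)  # assumption: select() in this section is dplyr's select()

# Hypothetical eight-row stand-in for Ames; the values are made up
ames_sample <- data.frame(
  PriceK = c(260, 210, 155, 125, 110, 100, 95, 180),
  Neighborhood = c("CollegeCreek", "CollegeCreek", "OldTown", "OldTown",
                   "CollegeCreek", "OldTown", "OldTown", "CollegeCreek"),
  Floors = c(2, 1, 2, 1, 1, 1, 1, 2)
)

# Step 1: the inner call picks the columns (all eight rows are still there)
selected <- select(ames_sample, PriceK, Neighborhood)

# Step 2: the outer call keeps the first six rows, which is head()'s default
first_six <- head(selected)

# The nested form gives the same result as the two separate steps
same_result <- identical(first_six,
                         head(select(ames_sample, PriceK, Neighborhood)))

# head() also accepts n to ask for a different number of rows
first_three <- head(selected, n = 3)
```

Writing the steps separately is only a reading aid; the nested one-line form is shorter and does the same work.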
Whereas select() gives you a subset of
variables (or columns of the data frame), the
filter() function will give you a subset of
observations (or rows) of the data frame based on some
criteria. For example, here is some code that will return only the
observations where the sale price is greater than $300,000:
filter(Ames, PriceK > 300)
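To make the row-wise behavior concrete, here is a small sketch on a made-up data frame. The name ames_sample, its values, and the threshold of 150 are all chosen for illustration only, and the sketch assumes filter() is the dplyr function.

```r
library(dplyr)  # assumption: filter() in this section is dplyr's filter()

# Hypothetical stand-in for Ames; the values are made up
ames_sample <- data.frame(
  PriceK = c(260, 210, 155, 125, 110, 100),
  Neighborhood = c("CollegeCreek", "CollegeCreek", "OldTown",
                   "OldTown", "CollegeCreek", "OldTown")
)

# The criterion is checked once per row and is either TRUE or FALSE
# ($ pulls a single column out of the data frame)
meets_criterion <- ames_sample$PriceK > 150
meets_criterion  # TRUE TRUE TRUE FALSE FALSE FALSE

# filter() keeps only the rows where the criterion is TRUE
over_150 <- filter(ames_sample, PriceK > 150)

nrow(over_150)   # 3 rows remain
names(over_150)  # every column is kept: filter() drops rows, not columns
```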
Edit the code below to filter for homes that sold for more than $300K.
require(coursekata)
# Modify this code
filter()
# Solution
filter(Ames, PriceK > 300)
  YearBuilt YearSold Neighborhood HomeSizeR HomeSizeK LotSizeR LotSizeK Floors
1      2007     2007 CollegeCreek      2696     2.696     9965    9.965      2
2      2004     2007 CollegeCreek      2000     2.000    10386   10.386      1
3      2000     2009 CollegeCreek      2153     2.153    11050   11.050      2
4      2006     2007 CollegeCreek      2828     2.828     9965    9.965      2
  BuildQuality     Foundation HasCentralAir Bathrooms Bedrooms TotalRooms
1            7 PouredConcrete             1         2        4         10
2            8 PouredConcrete             1         2        3          8
3            9 PouredConcrete             1         2        3          8
4            8 PouredConcrete             1         3        4         11
  KitchenQuality HasFireplace GarageType GarageCars PriceR PriceK
1      Excellent            1   Attached          3 383970 383.97
2           Good            0   Attached          3 305900 305.90
3      Excellent            1   Attached          3 313000 313.00
4           Good            1   Attached          3 424870 424.87
The function filter(), like select(),
returns a data frame. In this case, the data frame has only four rows
because only four observations in Ames had sale prices
greater than $300K.
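Because both functions return a data frame, the result of one can be handed to the other. The following is a minimal sketch of that idea, not something the exercises above require. It uses a made-up data frame (ames_sample is a hypothetical name with invented values) and assumes filter() and select() are the dplyr functions.

```r
library(dplyr)  # assumption: filter() and select() are dplyr's functions

# Hypothetical stand-in for Ames; the values are made up
ames_sample <- data.frame(
  PriceK = c(260, 210, 155, 125, 110, 100),
  Neighborhood = c("CollegeCreek", "CollegeCreek", "OldTown",
                   "OldTown", "CollegeCreek", "OldTown"),
  Floors = c(2, 1, 2, 1, 1, 1)
)

# filter() returns a data frame with fewer rows...
over_150 <- filter(ames_sample, PriceK > 150)
is.data.frame(over_150)  # TRUE

# ...so it can be passed to select() to also keep fewer columns
result <- select(over_150, PriceK, Neighborhood)
dim(result)  # 3 rows, 2 columns

# The same thing as a single nested call
nested <- select(filter(ames_sample, PriceK > 150), PriceK, Neighborhood)
```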