HomeworkTwo.Rmd
First, we will use a modified version of surveys.csv in which some missing values (NAs) have been entered as strange values. Download the homework file and the data like so:
download.file(url = "https://raw.githubusercontent.com/BiologicalDataAnalysis2019/2021/master/homeworks/HomeworkTwo.Rmd", destfile = "/cloud/project/homeworks/HomeworkTwo.Rmd")
download.file("https://raw.githubusercontent.com/BiologicalDataAnalysis2019/2021/master/homeworks/HW2_data/surveys_odd_NA_values.csv", destfile = "/cloud/project/data/surveys_odd_NA_values.csv")
In your RStudio interface, you will note that there is now a “homeworks” directory. In it, you will find “HomeworkTwo.Rmd”. Open it in your RStudio instance.
Each question will direct you to perform a task. Each question that expects code as an answer will have a space for you to enter the code.
You are welcome, and even encouraged, to work with a partner. I do ask, though, that each partner submit their own homework. To submit your homework, simply save it. I will see it.
Load in the modified surveys.csv data file that you downloaded above (surveys_odd_NA_values.csv, in your data folder), and save it to a variable called surveys. Use the read_csv function in the tidyverse package to do this.
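As a reminder of the general pattern, here is a minimal sketch of reading a CSV file with read_csv and saving the result to a variable. The file, column values, and the names toy_path and toy are made up for illustration; they are not the homework data, so you will need to substitute your own path and variable name.

```r
library(tidyverse)

# Hypothetical toy data written to a temporary file (not the homework data)
toy_path <- tempfile(fileext = ".csv")
writeLines(c("record_id,species_id,weight",
             "1,NL,40",
             "2,DM,48"), toy_path)

# read_csv() returns a tibble; assign it to a variable to keep it around
toy <- read_csv(toy_path)
toy
```

Printing the variable afterwards is a quick way to check that the columns and their types look the way you expect.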
#Enter your answer for Question 1 here
Have a look at the copy of surveys.csv. You will notice that there are some unusual NA values. Particularly, the “species_id” column has some odd values. Look at the help page for read_csv. Can you find a way to read these unusual values as NAs? Are you able to process both NA and the odd value as NAs? Try it out.
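One option the help page describes is the na argument of read_csv, which takes a character vector of strings to treat as missing. The sketch below uses a made-up file in which the placeholder "missing" stands in for an odd value; the placeholder in your data will be different, and you will have to find it yourself.

```r
library(tidyverse)

# Hypothetical toy data: "missing" is an invented placeholder for an odd NA value
toy_path <- tempfile(fileext = ".csv")
writeLines(c("record_id,species_id",
             "1,DM",
             "2,missing",
             "3,NA"), toy_path)

# List every string that should be read as NA, including the usual "NA"
toy <- read_csv(toy_path, na = c("", "NA", "missing"))

# Count how many species_id entries were read as NA
sum(is.na(toy$species_id))
```

Because na replaces the default set of missing-value strings rather than adding to it, the usual "NA" is listed alongside the odd value here.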
#Enter your answer for Question 2 here
Remove the NA values from the hindfoot_length column using a pipe and a filter.
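As a small illustration of the pipe-and-filter pattern, the sketch below drops rows with a missing value from a made-up tibble. The data and the names toy and toy_complete are hypothetical; only the column name matches the homework.

```r
library(tidyverse)

# Hypothetical toy tibble with two missing hindfoot_length values
toy <- tibble(record_id = 1:4,
              hindfoot_length = c(32, NA, 35, NA))

# Pipe the data into filter() and keep only rows where the value is not NA
toy_complete <- toy %>%
  filter(!is.na(hindfoot_length))

toy_complete
```

Note that filter keeps the rows for which the condition is TRUE, so the condition is written as "not NA" rather than "is NA".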
#Enter your answer for Question 3 here
Verify that the NA values were removed. Don’t do this by looking at the column; use code.
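There are several ways to check for missing values in code. The sketch below shows two possibilities on a made-up tibble, one with base R and one with summarize; the data and variable names are hypothetical, and other approaches are equally valid.

```r
library(tidyverse)

# Hypothetical toy tibble, filtered the same way as a cleaned data set might be
toy <- tibble(record_id = 1:4,
              hindfoot_length = c(32, NA, 35, NA))
toy_complete <- toy %>%
  filter(!is.na(hindfoot_length))

# Option 1: anyNA() returns a single TRUE or FALSE for the whole column
has_missing <- anyNA(toy_complete$hindfoot_length)
has_missing

# Option 2: count the missing values; zero means none remain
missing_count <- toy_complete %>%
  summarize(n_missing = sum(is.na(hindfoot_length)))
missing_count
```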
#Enter your answer for Question 4 here
Explain the logic of your answer to Question 4. How did you accomplish this?
Imagine you are testing the hypothesis that mammal body sizes will be larger under climate change to decrease the ratio of surface area to body size. First, which columns in the dataframe will you use to address this question?