ProjectOne.Rmd
Below are the questions for the first data practical assignment. This project uses the “FossilAnts.csv” file, located in the data directory for the project. The point value of each question is denoted next to it. A blank cell is below each for your answer; feel free to create more blank cells as needed.
projects_lastname (sub your last name for lastname), and in it, a subdirectory called project_one_lastname. Use download.file to download the instructions: https://raw.githubusercontent.com/BiologicalDataAnalysis2019/2020/master/vignettes/ProjectOne.Rmd and the data
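A minimal sketch of the directory setup and download step, assuming a hypothetical last name of "smith" and working under a temporary directory so nothing in your real working directory is touched. The download needs network access, so a failure is reported rather than treated as fatal.

```r
# Hypothetical last name used to build the directory names.
lastname <- "smith"

# Build projects_smith/project_one_smith under a temporary directory.
project_dir <- file.path(
  tempdir(),
  paste0("projects_", lastname),
  paste0("project_one_", lastname)
)
dir.create(project_dir, recursive = TRUE, showWarnings = FALSE)

# URL of the instructions, as given in the assignment text.
rmd_url <- "https://raw.githubusercontent.com/BiologicalDataAnalysis2019/2020/master/vignettes/ProjectOne.Rmd"
rmd_path <- file.path(project_dir, "ProjectOne.Rmd")

# download.file() returns 0 on success; catch errors so the sketch
# still runs without a network connection.
status <- tryCatch(
  download.file(rmd_url, destfile = rmd_path),
  error = function(e) {
    message("Download failed: ", conditionMessage(e))
    NA_integer_
  }
)
```

In your own project you would replace the temporary directory with the location where you keep your coursework.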
tidyverse package and load the data using read_csv. The data for this part of the practical is located in the data directory. Save the data in a variable called project_dat. Print the data to the screen to ensure it loaded correctly.
#Enter Your Answer Here
# Answer here
as.character function.
#Answer Here
Tribe column. In your opinion, are these intelligent missing values for the dataset? Why or why not? If not, how would you like to change them?
#Answer here
na_if, which converts nonstandard missing-value markers to NA. Please first look at the help page for na_if.
#Answer here
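As an illustration of how na_if behaves, here is a small sketch on toy data. The column names and the "none" placeholder are hypothetical stand-ins, not values taken from the FossilAnts file.

```r
library(dplyr)

# Toy data; the "tribe" column and the "none" placeholder are hypothetical.
toy <- tibble(
  genus = c("Formica", "Sphecomyrma", "Camponotus"),
  tribe = c("Formicini", "none", "Camponotini")
)

# help("na_if", package = "dplyr")  # run interactively to read the help page

# Replace every occurrence of the placeholder string with a true NA.
toy_clean <- toy %>%
  mutate(tribe = na_if(tribe, "none"))

toy_clean
```

na_if compares each value in the column to its second argument, so it only catches exact matches of the placeholder.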
# Answer here
separate function.
# Show how you would pull up the help
#Answer here
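A brief sketch of tidyr's separate on toy data may help when reading its help page. The column name age_range and its "min-max" layout are assumptions for illustration only.

```r
library(dplyr)
library(tidyr)

# Toy data; "age_range" and its "min-max" layout are hypothetical.
toy <- tibble(
  specimen = c("a", "b"),
  age_range = c("75-100", "0-30")
)

# help("separate", package = "tidyr")  # run interactively to read the help page

# Split one column into two at the "-" and convert the pieces to numbers.
toy_split <- toy %>%
  separate(age_range, into = c("min_ma", "max_ma"), sep = "-", convert = TRUE)

toy_split
```

By default separate drops the original column; pass remove = FALSE to keep it alongside the new ones.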
Next, we will test a hypothesis. Your hypothesis is that there are more specimens in the 75 million years ago (mya) - 100 mya interval than the 30 mya to the present interval.
#Answer Here
#Answer here
#Answer here
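One way to approach the hypothesis is to count the rows that fall in each interval and compare the counts. The sketch below uses toy data, and the column names min_ma and max_ma (ages in millions of years) are assumptions; check them against your own project_dat.

```r
library(dplyr)

# Toy data; "min_ma" and "max_ma" are assumed column names.
toy <- tibble(
  specimen = c("a", "b", "c", "d", "e"),
  min_ma = c(80, 90, 76, 5, 20),
  max_ma = c(95, 100, 99, 28, 30)
)

# Count specimens whose age range falls entirely inside each interval.
n_old <- toy %>%
  filter(min_ma >= 75, max_ma <= 100) %>%
  nrow()

n_young <- toy %>%
  filter(min_ma >= 0, max_ma <= 30) %>%
  nrow()

# TRUE supports the hypothesis for this toy data set.
n_old > n_young
```

Whether a specimen must fall entirely inside an interval, or only overlap it, is a design choice you should state in your answer.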
project_one_data_output_lastname. Save it as a csv file called “column_separated.csv”.
# Answer here
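The save step could look like the following sketch, which again assumes a hypothetical last name of "smith" and writes under a temporary directory. The toy data frame stands in for your separated data.

```r
library(readr)

# Hypothetical last name; the output directory lives under tempdir() here.
lastname <- "smith"
out_dir <- file.path(tempdir(), paste0("project_one_data_output_", lastname))
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Toy stand-in for the data with separated columns.
toy_split <- data.frame(
  specimen = c("a", "b"),
  min_ma = c(75, 0),
  max_ma = c(100, 30)
)

# Write the data to the requested file name.
out_path <- file.path(out_dir, "column_separated.csv")
write_csv(toy_split, out_path)
```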
#Answer here
Do the undergrad part of the exam. It’s actually kind of hard? Find your name below for the additional part of the exam to complete.
Replace your ‘?’ with NAs.
Convert your date column to a date object with lubridate. Store as a year column, month column, day column.
Save in your project one folder as “tyler_modified_data.csv”
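The date steps above can be sketched as follows. The column name date and the month/day/year format are assumptions; if your dates are written differently, lubridate's ymd or dmy may be the better parser.

```r
library(dplyr)
library(lubridate)

# Toy data; the column name and the month/day/year format are assumptions.
toy <- tibble(date = c("3/13/2020", "11/2/2019"))

toy_dates <- toy %>%
  mutate(
    date = mdy(date),     # parse the text into a Date object
    year = year(date),    # then pull out each component
    month = month(date),
    day = day(date)
  )

toy_dates
```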
If you were to group by Oiling Category, and calculate, say, the mean above ground biomass, some of your categories are going to be near zero because of all the zeroes. Replace your 0s with NAs. (Note - I’m aware I might have just described the correct outcome of your analysis. Humor me anyway.)
Convert your date column to a date object with lubridate. Store as a year column, month column, day column.
Save in your project one folder as “ariel_modified_data.csv”
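To see why the zeroes matter for a grouped mean, consider this sketch on toy data. The column names oiling_category and biomass are hypothetical; use the names in your own spreadsheet.

```r
library(dplyr)

# Toy data; column names are hypothetical.
toy <- tibble(
  oiling_category = c("heavy", "heavy", "heavy", "none", "none"),
  biomass = c(0, 0, 6, 10, 12)
)

by_category <- toy %>%
  mutate(biomass = na_if(biomass, 0)) %>%   # turn 0 into NA
  group_by(oiling_category) %>%
  summarize(mean_biomass = mean(biomass, na.rm = TRUE))

by_category
```

With the zeroes kept, the "heavy" mean in this toy example would be 2; with them replaced by NA and dropped via na.rm = TRUE, it is 6.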
Which site has the highest mean diameter for the Acer rubrum in 2016? Does the same site have the largest diameter all years?
Which tree has the largest average diameter?
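A possible pattern for the site question is to average within each year and site, then keep the top site per year. The toy data and the column names site, species, year, and diameter are assumptions.

```r
library(dplyr)

# Toy data; all column names are hypothetical.
toy <- tibble(
  site = c("A", "A", "B", "B", "A", "B"),
  species = rep("Acer rubrum", 6),
  year = c(2016, 2016, 2016, 2016, 2017, 2017),
  diameter = c(10, 14, 20, 22, 30, 12)
)

top_site_by_year <- toy %>%
  filter(species == "Acer rubrum") %>%
  group_by(year, site) %>%
  summarize(mean_diameter = mean(diameter), .groups = "drop") %>%
  group_by(year) %>%
  slice_max(mean_diameter, n = 1) %>%   # keep the largest mean in each year
  ungroup()

top_site_by_year
```

Comparing the site column across rows shows whether the same site leads in every year. The same group_by and summarize pattern, grouped by tree instead of site, applies to the per-tree question.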
I’m going to have you do something a little different. You’re going to reshape your dataset. Ultimately, you will have four columns: Treatment, Specimen, Date Entered, Days until death. The final column should be how many days the animal lived before death. If it was entered 3/13/20 and made it to 3/15/20, put ‘2’.
Export as a CSV and put on your RStudio server. Which treatment had the highest average days until death?
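The days-until-death calculation can be sketched as a date difference. The toy data below assumes one row per specimen with hypothetical date_entered and date_died columns; your spreadsheet is likely laid out differently and will need reshaping first.

```r
library(dplyr)
library(lubridate)

# Toy data; layout and column names are hypothetical.
toy <- tibble(
  treatment = c("control", "control", "heated"),
  specimen = c("s1", "s2", "s3"),
  date_entered = c("3/13/20", "3/13/20", "3/13/20"),
  date_died = c("3/15/20", "3/19/20", "3/14/20")
)

toy_days <- toy %>%
  mutate(
    date_entered = mdy(date_entered),
    date_died = mdy(date_died),
    # Difference between the two dates, expressed in whole days.
    days_until_death = as.numeric(difftime(date_died, date_entered, units = "days"))
  ) %>%
  select(treatment, specimen, date_entered, days_until_death)

# Average days until death per treatment, largest first.
mean_days <- toy_days %>%
  group_by(treatment) %>%
  summarize(mean_days = mean(days_until_death)) %>%
  arrange(desc(mean_days))
```

The first toy specimen reproduces the example from the text: entered 3/13/20, died 3/15/20, giving 2 days.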
I’m looking at the SirenSpecimenRecords spreadsheet. Make sure when you read it in, the blanks read in as NA. If not, replace them.
Split your species column into genus and species columns, but retain what you’re currently calling species as “Specific Epithet”.
Save in your project one folder as “josh_modified_data.csv”
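The following sketch shows one possible reading of these steps: read the file so that blanks become NA, then split a binomial name while keeping the original column. The inline CSV text and all column names are hypothetical stand-ins for the SirenSpecimenRecords spreadsheet.

```r
library(dplyr)
library(readr)
library(tidyr)

# Toy CSV text; the second record has a blank county.
csv_text <- paste(
  "catalog,species,county",
  "1,Siren lacertina,Leon",
  "2,Siren intermedia,",
  sep = "\n"
)

# na = c("", "NA") makes blank cells read in as NA (this is also the default).
toy <- read_csv(I(csv_text), na = c("", "NA"), show_col_types = FALSE)

# Split the name at the space; remove = FALSE keeps the original column.
toy_split <- toy %>%
  separate(species, into = c("genus", "specific_epithet"), sep = " ", remove = FALSE)

toy_split
```

If blanks in your file are stored as something else, such as a single space, add that string to the na vector.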
These spreadsheets are great. Very consistent.
What is the most commonly collected species on here? I’m not sure if any of the species names are shared across genera (i.e., Genus_species, Genus1_species). If so, you might consider using unite() to put the genus and species columns into one cell before counting.
Which site has the most observations?
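Both counting questions can be sketched with unite and count. The toy data uses made-up genus, species, and site values chosen so that one epithet is shared across two genera.

```r
library(dplyr)
library(tidyr)

# Toy data; "alba" is shared by two genera on purpose.
toy <- tibble(
  genus = c("GenusA", "GenusA", "GenusB", "GenusB", "GenusB", "GenusB", "GenusB"),
  species = c("alba", "alba", "alba", "alba", "rubra", "rubra", "rubra"),
  site = c("north", "north", "south", "south", "south", "north", "south")
)

# Combine genus and species into one column, then count, most common first.
species_counts <- toy %>%
  unite("binomial", genus, species, sep = "_") %>%
  count(binomial, sort = TRUE)

# Count observations per site, most common first.
site_counts <- toy %>%
  count(site, sort = TRUE)
```

Counting the species column alone would rank "alba" first with four rows, while counting the united column ranks GenusB_rubra first with three, which is why uniting before counting matters when epithets are shared.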
readxl
library(readxl)
See if you can read in the measurements spreadsheet in your Excel worksheet. Pick two or three columns in the ratios spreadsheet and see if you can reproduce them in R here:
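Since the workbook itself is not available here, the sketch below uses the example file bundled with readxl as a stand-in. In your own code, path would point to your Excel file and sheet would be "measurements"; the ratio computed here is only an illustration of recreating a derived column.

```r
library(dplyr)
library(readxl)

# Stand-in workbook shipped with readxl; replace with the path to your file.
path <- readxl_example("datasets.xlsx")

# List the sheets so you can confirm the name of the one you want.
excel_sheets(path)

# Read one sheet by name (use sheet = "measurements" for your workbook).
measurements <- read_excel(path, sheet = "iris")

# Recreate a ratio column from two measurement columns.
with_ratio <- measurements %>%
  mutate(sepal_ratio = Sepal.Length / Sepal.Width)

head(with_ratio)
```

To check your work, read the ratios sheet the same way and compare its column to the one you computed, for example with all.equal.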