rNOAA

To test our hypothesis, we are going to get some climate data from NOAA, using the rnoaa package.

install.packages("rnoaa")
## Installing package into '/home/rstudio-user/R/x86_64-pc-linux-gnu-library/3.6'
## (as 'lib' is unspecified)

Next, we will get an API key for accessing government data. Go to NOAA's token request page (https://www.ncdc.noaa.gov/cdo-web/token) to obtain one. We will save this key in a file called .Rprofile. API keys are basically like passwords between ourselves and a website: they let us access data securely.
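One common way to store the key is as an option in your .Rprofile; a minimal sketch, assuming the noaakey option name that rnoaa's ncdc functions look up by default (the key shown is a placeholder):

# in your .Rprofile: replace the placeholder with your actual token
options(noaakey = "YOUR-NOAA-API-KEY")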

Once this is complete, we can use the package. Here is the first command you will use:

library(rnoaa)
# list NOAA's city-level locations, sorted by name in descending order
ncdc_locs(locationcategoryid='CITY', sortfield='name', sortorder='desc')

What information do you think you should get from this command?

We can get local weather, basically. But we don't really know which weather station we need! Let's see if we can find out. First, let's install the lawn package.

install.packages("lawn", dependencies = TRUE)

Now we’ll try using it.

library("lawn")
# build a bounding-box polygon (west, south, east, north) and view it on an interactive map
lawn_bbox_polygon(c(-122.2047, 47.5204, -122.1065, 47.6139)) %>% view

How could you use Google Maps to get the information to make a box that covers Portal, AZ? Try it!

library("lawn")
# bbox order is (west, south, east, north), i.e. (min lon, min lat, max lon, max lat)
lawn_bbox_polygon(c(-115, 31, -114, 32)) %>% view

OK, let’s give those coordinates to rNOAA and see if we have weather stations in there.

ncdc_stations(extent = c(Your coordinates here!))

Does this work? If not, try expanding your bounding box.
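For reference, here is a sketch of what a filled-in call could look like. The coordinates are illustrative values for a box around Portal (not the answer), and ncdc_stations() expects the extent ordered as minimum latitude, minimum longitude, maximum latitude, maximum longitude:

# extent = c(min_lat, min_lon, max_lat, max_lon); these Portal-area values are illustrative
ncdc_stations(extent = c(31.5, -109.5, 32.2, -109.0))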

Now, once you have your base station, try pulling the data for it:

ncdc(datasetid='NORMAL_DLY', stationid=?, datatypeid=?, startdate = ?, enddate = ?)

Have a look at the surveys data set to see when you should start and stop.
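As a shape to imitate, here is a sketch only: the station id matches the one used below, and the datatype and dates are illustrative values in the style of the rnoaa documentation, not the answer to the exercise.

# daily climate normals for one station; the datatype and date range here are illustrative
ncdc(datasetid = 'NORMAL_DLY', stationid = 'GHCND:USW00003145',
     datatypeid = 'dly-tmax-normal', startdate = '2010-05-01', enddate = '2010-05-10')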

We then found a good station and pulled the minimum and maximum temperatures for all dates in the database:

# GHCND reports tmin and tmax in tenths of a degree Celsius
temp_data <- meteo_tidy_ghcnd(stationid = "USW00003145", var = c("tmin", "tmax"))

We converted the temperatures to Fahrenheit and plotted them to get a sense of temperature variation over time.
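That conversion and plotting code isn't shown here; a minimal sketch of what it could look like, assuming (as noted in the comment above) that tmin and tmax are stored in tenths of a degree Celsius:

library(tidyverse)

# convert tenths of degrees Celsius to degrees Fahrenheit
temp_data <- temp_data %>%
  mutate(tmin_f = (tmin / 10) * 9 / 5 + 32,
         tmax_f = (tmax / 10) * 9 / 5 + 32)

# plot daily minimum and maximum temperatures over time
ggplot(temp_data, aes(x = date)) +
  geom_line(aes(y = tmin_f), color = "blue") +
  geom_line(aes(y = tmax_f), color = "red") +
  labs(x = "Date", y = "Temperature (F)")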

We then fit a linear model to see whether temperature was increasing over time.
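Again, the code isn't shown; one way to sketch it, assuming the tmax_f column created above:

# linear trend of daily maximum temperature against date
temp_trend <- lm(tmax_f ~ date, data = temp_data)
summary(temp_trend)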

We will want to merge these data with the data from surveys.csv to look at mammal size trends over time, which means we need a column to merge on. Dates make sense, so we massaged the dates in surveys.csv into a more usable format.

library(tidyverse)
## ── Attaching packages ──────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   0.8.3     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ─────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## Parsed with column specification:
## cols(
##   record_id = col_double(),
##   month = col_double(),
##   day = col_double(),
##   year = col_double(),
##   plot_id = col_double(),
##   species_id = col_character(),
##   sex = col_character(),
##   hindfoot_length = col_double(),
##   weight = col_double(),
##   genus = col_character(),
##   species = col_character(),
##   taxa = col_character(),
##   plot_type = col_character()
## )
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
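
The wrangling code itself isn't shown above; here is a sketch of what it might have looked like (the file name surveys.csv comes from the text, while the object name new_surveys is an assumption chosen to match the merge call below):

# read the surveys data and build a single Date column from year, month, and day
surveys <- read_csv("surveys.csv")
new_surveys <- surveys %>%
  mutate(date = ymd(paste(year, month, day, sep = "-")))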

Now, we’ll merge the two data sources. Have a look at the merge function. Try it.

## Parsed with column specification:
## cols(
##   id = col_character(),
##   date = col_date(format = ""),
##   tmax = col_double(),
##   tmin = col_double()
## )
# merge() joins on the columns the two data frames share (here, "date")
merged_data <- merge(noaa_data, new_surveys)

Next, plot an animal body size measure by temp. Is there a relationship? What about body size measures by date?
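Here is one possible sketch of the first plot, assuming the merged_data object created above, hindfoot_length as the body-size measure, and tmax as the temperature column (swap in weight or tmin if you prefer):

library(tidyverse)

# body size (hindfoot length) against daily maximum temperature
ggplot(merged_data, aes(x = tmax, y = hindfoot_length)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm") +
  labs(x = "Daily maximum temperature", y = "Hindfoot length")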