Homework Five: Due Sept 30 at 5 pm.

For the first part of this homework, you can use either surveys_complete.csv or your own data. If you use your own data, it must contain a numerical predictor variable and a numerical response variable. If you use surveys_complete, weight and hindfoot_length will work for this.

  1. Load in your data. Which variable will you use as the predictor, and which as the response? (2 pts)
# Read in data here
# Answer which column is predictor and which is response
  2. Plot the two variables against each other in a scatter plot. Do the data appear to be linearly related? (2 pts)
# Plot here
# Answer here
  3. Fit the linear model. View the summary. (2 pts)
# Code here
  4. Does the summary make sense? Does the model have good predictive power? Evaluate the residual standard error, the intercept, and R-squared in particular. (2 pts)
# Answer here
  5. Add the fitted model to your scatter plot. Increase the size of the text so it is comfortably readable from 5 feet away. (2 pts)
# Code here

MS Students

I’m going to have you start outlining what will eventually be your final project. In homework four, I told you that your final R package will need 5 functions. Now, I need you to start describing them. For your data cleaning function, and for either your plotting function or your statistical test function, tell me the following. You may attach a dataset if needed:

  • Your data cleaning function: What does it need to do? What will be the expected input and output? For example, perhaps you have an Excel data set with multiple sheets, and you want to output one single dataframe with all the NAs removed.
  • Plotting: Which data columns are you looking to plot? Or, if it’s not a dataframe you’re looking to plot from, what other data are you plotting? What is the expected plot to be produced? Histogram? Map?
  • Statistical test: What is the hypothesis you’ll be testing, and what data will you use?
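
To illustrate the level of detail expected, the data cleaning example above (several sheets in, one NA-free dataframe out) could be outlined roughly as follows. This is only a sketch under stated assumptions: the function name `clean_sheets` is hypothetical, the sheets are assumed to have already been read into a list of data frames with identical columns (for instance with a package such as readxl), and only base R is used. Your own function will likely look different.

```r
# Hypothetical helper (name and interface are assumptions, not a requirement):
# combine several sheets into one data frame and drop every row with an NA.
# Input:  `sheets`, a list of data frames that all share the same columns.
# Output: a single data frame containing only complete rows.
clean_sheets <- function(sheets) {
  stopifnot(is.list(sheets), length(sheets) > 0)
  combined <- do.call(rbind, sheets)  # stack the sheets row-wise
  cleaned <- combined[complete.cases(combined), , drop = FALSE]
  rownames(cleaned) <- NULL           # reset row names after filtering
  cleaned
}

# Usage with two small made-up sheets
sheet_a <- data.frame(weight = c(40, NA), hindfoot_length = c(35, 36))
sheet_b <- data.frame(weight = c(48, 52), hindfoot_length = c(NA, 37))
cleaned <- clean_sheets(list(sheet_a, sheet_b))
print(cleaned)
```

Writing the expected input and output into the comments first, as done here, is one way to answer the "expected input and output" question before writing any real code.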