Homework Five: Due Sept 30 at 5 pm.
For this first part of the homework, you can use either
surveys_complete.csv or your own data. If you are using your own data,
it must contain what you think is a numerical predictor variable and a
numerical response variable. If you are using surveys_complete, you can
use weight and hindfoot_length for this.
- Load in your data. Which variable will you be using as a predictor,
and which as a response? (2 pts)
# Answer which column is predictor and which is response
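A minimal sketch of this step in R. It uses R's built-in cars data set (speed in mph, stopping distance in ft) as a stand-in so it runs without any files; the commented-out read.csv line and its data/ path are assumptions about where your copy of surveys_complete.csv lives.

```r
# Stand-in data so the sketch runs without any files.
# With the homework data you might instead write (path is an assumption):
# surveys <- read.csv("data/surveys_complete.csv")
dat <- cars

# Name the predictor (x) and response (y) columns explicitly.
predictor <- "speed"  # with surveys_complete: weight or hindfoot_length
response  <- "dist"   # with surveys_complete: the other of the two

# Confirm both columns exist and are numeric before modeling.
stopifnot(all(c(predictor, response) %in% names(dat)))
stopifnot(is.numeric(dat[[predictor]]), is.numeric(dat[[response]]))
str(dat[, c(predictor, response)])
```

Naming the columns once keeps later plotting and modeling code consistent with whichever choice you make.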
- Plot the two against each other in a scatter plot. Do the data
appear to be linearly related? (2 pts)
# Answer here
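One way to make the scatter plot in base R, again sketched on the built-in cars data as a stand-in for your predictor and response. The correlation line is an optional addition: Pearson's r measures the strength of a linear association, but it does not prove the relationship is linear, so the plot remains the main check.

```r
# Scatter plot of response against predictor (built-in cars data as a stand-in).
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "Stopping distance vs. speed", pch = 19)

# Pearson's r gives a numeric companion to the visual impression of linearity.
r <- cor(cars$speed, cars$dist)
print(r)
```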
- Fit the linear model. View the summary. (2 pts)
- Does the summary make sense? Does our model have good predictive
power? In particular, evaluate the residual standard error, the
intercept, and R-squared. (2 pts)
# Answer here
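A sketch of fitting the model and pulling out the three quantities named above, using lm() and summary() on the built-in cars data as a stand-in (with surveys_complete, the formula would use your chosen response and predictor columns instead).

```r
fit <- lm(dist ~ speed, data = cars)  # formula is response ~ predictor
fit_summary <- summary(fit)
print(fit_summary)

# Extract the quantities the question asks about.
rse       <- fit_summary$sigma            # residual standard error, in response units
intercept <- coef(fit)[["(Intercept)"]]   # predicted response when the predictor is 0
r2        <- fit_summary$r.squared        # share of response variance explained
cat("RSE:", rse, "\nIntercept:", intercept, "\nR-squared:", r2, "\n")
```

For cars, the intercept comes out negative (about -17.6 ft), a stopping distance that cannot physically happen. That is a reminder that the intercept extrapolates to a predictor value of 0, which may lie outside the observed data. The R-squared of about 0.65 means the line explains roughly 65% of the variance in stopping distance, and the residual standard error of about 15.4 ft is the typical size of a prediction miss.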
- Add the fitted regression line to your scatter plot. Increase the size
of the text so it is comfortably readable from 5 feet away. (2 pts)
# Code here
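A base R sketch of drawing the fitted line with enlarged text, on the built-in cars data as a stand-in. The specific cex values are a judgment call for "readable at 5 feet", not a standard, so adjust them after viewing the plot on your own screen.

```r
fit <- lm(dist ~ speed, data = cars)

# Enlarge axis labels, tick labels, and title; widen margins so big labels fit.
old_par <- par(cex.lab = 1.8, cex.axis = 1.6, cex.main = 2,
               mar = c(5, 5.5, 4, 2))
plot(dist ~ speed, data = cars, pch = 19, cex = 1.5,  # cex here sizes the points
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "Linear fit")
abline(fit, col = "red", lwd = 3)  # draw the fitted regression line
par(old_par)                       # restore the previous graphics settings
```

If you prefer ggplot2, the equivalent pieces are geom_smooth(method = "lm") for the line and theme(text = element_text(size = 20)) for the text size.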
MS Students
I’m going to have you start outlining what will eventually be your
final project. In homework four, I told you that your final R package
will need 5 functions. Now I need you to start describing them. For
your data cleaning function, and for either your plotting function or
your statistical test, tell me the following. You may attach a dataset
if needed:
- Your data cleaning function: What does it need to do? What will be
the expected input and output? For example, perhaps you have an Excel
workbook with multiple sheets, and you want to output a single data
frame with all the NAs removed.
- Plotting: Which data columns do you plan to plot? Or, if you’re not
plotting from a data frame, what data are you plotting? What kind of
plot do you expect to produce: a histogram? A map?
- Statistical test: What hypothesis will you be testing, and what data
will you use?
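To make the input/output description concrete, here is a sketch of the Excel example from the data cleaning bullet written as an R function. The name clean_sheets, its list-of-data-frames input, the toy values, and the use of the readxl package are all assumptions for illustration, not a required design.

```r
# Hypothetical cleaning function: input is a list of data frames (one per
# sheet, all with the same columns); output is one data frame with every
# row that contains an NA removed.
clean_sheets <- function(sheets) {
  stopifnot(is.list(sheets), length(sheets) > 0,
            all(vapply(sheets, is.data.frame, logical(1))))
  combined <- do.call(rbind, sheets)  # stack sheets row-wise
  cleaned  <- combined[complete.cases(combined), , drop = FALSE]
  rownames(cleaned) <- NULL           # renumber rows 1..n
  cleaned
}

# Toy input standing in for two sheets (made-up values for illustration).
sheet1 <- data.frame(plot = c(1, 1, 2), weight = c(40, NA, 35))
sheet2 <- data.frame(plot = c(3, 4),    weight = c(28, 52))
cleaned <- clean_sheets(list(sheet1, sheet2))
print(cleaned)

# Reading real sheets might look like this, assuming the readxl package:
# path   <- "my_data.xlsx"
# sheets <- lapply(readxl::excel_sheets(path),
#                  function(s) readxl::read_excel(path, sheet = s))
```

Writing the description this way (what goes in, what comes out, what gets dropped) is the level of detail the bullet above asks for.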