Plotting and choices

10 points, due Sept. 13 at 5 PM

download.file(url = "https://raw.githubusercontent.com/BiologicalDataAnalysis2019/2023/main/vignettes/homework_3.Rmd", destfile = "/cloud/project/homeworks/homework_3.Rmd")
## Warning in download.file(url =
## "https://raw.githubusercontent.com/BiologicalDataAnalysis2019/2023/main/vignettes/homework_3.Rmd",
## : URL
## https://raw.githubusercontent.com/BiologicalDataAnalysis2019/2023/main/vignettes/homework_3.Rmd:
## cannot open destfile '/cloud/project/homeworks/homework_3.Rmd', reason 'No such
## file or directory'
## Warning in download.file(url =
## "https://raw.githubusercontent.com/BiologicalDataAnalysis2019/2023/main/vignettes/homework_3.Rmd",
## : download had nonzero exit status
  1. Read in the surveys.csv dataset. Remove NAs from the datasets.

  2. In this homework, we’re going to explore how different ways of visualizing the same data. First, take a look at geom_col(). Group the surveys data by the species_id column and count how many of each species there are. Use this count data to make a bar plot of the counts per species.

#Answer here
  1. Take a look at the argument fct_reorder. It reorders variables on one or both axes. Try using this to plot the species in order from most to least members.
# Answer here
  1. What we canonically think of as a bar plot can also be made in R. These can have some interesting properties, such as being able to fill in bars by other aesthetics. Using the surveys dataset, try to plot the number of members of each species, with the bar filled in by sex.
ggplot(______, mapping = aes(______, fill=____)) +
  geom_col()
  1. geom_col accepts various arguments. Try postion="dodge" or position=“stack”. How does this change the plot and how you interpret it?
#Answer here.
  1. Even a simple histogram can lead to different interpretations. Make a histogram of hindfoot_lengths in the surveys dataset. Try choosing different binwidths. How does a high bindwith (like 100) this change your interpretation of how the data are distributed?
#Answer here.
  1. Next, let’s take a look at density plots. First, look at the help for geom_density. Density plots are like a smoothed histogram, mostly used for continuous data. But how density is calculated is done using what is called a kernel. To get a sense for what this means, try different kernel types. Some common ones are “triangular”, “rectangular”, and “gaussian.” Try them out, and put the one you think best represents the data in the answer below.
ggplot(______, aes(______)) +
  geom_density(fill = "blue", binwidth = 10, kernel ="______")

Graduate Students:

A common set of journal figure requirements several of you submitted include the following:

  • PDF or PNG
  • At least 300 DPI
  • One column (80 mm) or two column (160mm) wide.

For each of your plots on the above homework, save figures meeting all requirements in your lastname_directory in a directory called output.

As a longer-term goal, you each picked a few figures. See if you can find the data that the authors used to make the figures. One of the best ways to learn to make good figures in your field is to copy from what’s already published.

What data did the author’s use?

Answer here

Is the data you’re using for class similar to these data?

Answer here

Could you process your data to be similar to these data?

Answer here

Finally, in the final project in this class, you’ll need to produce an R package with five functions:

  • One or two for data cleaning and/or data manipulation
  • One or two for a statistical test (ANOVA, linear models, other specialized analyses)
  • One or two for plotting
  • One of your choice for funsies.

In next week’s homework, you’ll be expected to pick and write one. So start thinking about one now.