Want to help out or contribute?

If you find any typos, errors, or places where the text could be improved, please let us know by providing feedback either in the feedback survey (given during class), by using GitLab, or directly in this document with hypothes.is annotations.

  • Open an issue or submit a merge request on GitLab with the feedback or suggestions.
  • Add an annotation using hypothes.is. To add an annotation, select some text and then click the annotate icon on the pop-up menu. To see the annotations of others, click the icon in the upper right-hand corner of the page.

5 Group project assignment

To maximize how much you learn and retain, you as a group will apply what you learn in the course to create a reproducible project, hosted as a GitHub repository, along with a report as an HTML document. Both are based on a simple data analysis of a dataset of your choice. The dataset cannot be the same as the one we used in class, and must be an open dataset obtained from an online archive (we recommend a few below).

On the last day of the course, you as a group will give a ~10 min presentation of your analysis and report. Before the presentation, the lead instructor will download your project from GitHub and re-generate your report to check that it is reproducible. Then, during the presentation, you as a group will:

  • Show your project on GitHub.
  • Show your generated HTML report.
  • Describe what you did and the reasons behind what you did.
  • Explain any challenges you encountered or things you would do differently.
  • Explain anything you liked and would do more of in the future.

When your group isn’t presenting, you as an audience member can participate in the discussion by:

  • Asking questions to clarify anything you may not understand or are confused about.
  • Giving constructive feedback on what they could have improved on.
  • Giving concrete suggestions on how they could handle the things they found challenging.

5.1 Specific tasks

Throughout the course, you will work together as a group on a few exercises, especially the final exercises in each session that are not related to the assignment. Near the end of the course, you will have dedicated time to work on your assignment and complete the tasks given below. Before you start on any tasks, though, decide as a group who will present the project at the end. We recommend assigning one or two people to be the “presenters”.

Note that you will be using Git and GitHub to manage your group assignment, so take care to set this up correctly before doing any substantial work on the assignment. You are likely to push and pull a lot of content (which you will learn how to do later), so you will need to maintain regular and open communication with your team members.

For the group project, your specific tasks (based on lesson order) are to:

  1. Create a new project using the “New Project” setup with the necessary files (covered in Management of R Projects) by using the prodigenr package.
    • Assign one person to be the “coordinator” to make this project.
    • Name the project the same as your team name (which is given for your group).
    • Run these functions from the usethis package to set up the project and to create these files:

          usethis::use_blank_slate()
          usethis::use_data_raw("original-data")
          usethis::use_r("generate-figures")
  2. Put the project under Git version control and upload to GitHub.
    • The coordinator needs to add and commit all the files in the R Project.
    • The coordinator must connect the project to GitHub and push (“upload”) the repository (covered in Version Control).
    • All other team members must clone their team’s GitHub repository onto their computers.
    • Each team member must open up the README.md file and write a few sentences about themselves below the first Markdown header (#). After saving the changes, add and commit them into the Git repository.
    • Push and pull the changes and deal with the resulting merge conflicts.
  3. Find an open dataset to use for your analysis and report. You have two options here: 1) choose one of the datasets provided below; or 2) search for another dataset from an online data archive (e.g., figshare). If you choose the second option, make sure to select a dataset that isn’t too big (at most about 4 MB) and that contains some basic quantitative data you can apply some basic functions to. Don’t spend too much time on this task. Here are some datasets we’ve identified that would be good for the assignment:
  1. Download the dataset and put it into the data-raw/ folder of your project. For the GitHub-linked datasets above, you should be able to click on the “Raw” or “Download” buttons, then use the keyboard shortcut (Ctrl-S) to save the page (as a .csv file). It is advised that you push/pull appropriately and check that all team members can “see” the dataset. If you have documentation on the dataset, e.g. variable definitions, put those files in the data-raw/ folder as well.

  2. Create an R script inside data-raw/ that cleans up and prepares the raw dataset (as covered in Data Management). Save the new dataset in a folder called data/. Again, use Git to make and manage these changes. You can (and probably should) distribute the tasks of exploring the data and figuring out what needs to be cleaned in the dataset. Sharing this work also ensures that you as a group have a good understanding of the data.

  3. Create an R Markdown file named report.Rmd in the doc/ folder of your project (covered in Reproducible Documents). Do some simple analyses of the dataset in this report file and do the tasks below. Distribute the tasks so each team member is (mostly) doing something different. Keep using the “Git workflow” of adding to the staging area, committing, pushing, and pulling the changes you’ve made. You’ll likely deal with merge conflicts, which is a good chance to get more practice with Git.

    • Create section headers (e.g. “Introduction”, “Methods”, “Results”, “Discussion”).
    • Write up a basic description of what the dataset is and where you got it and what you did to process or analyze the data in a “Methods” section.
    • Create a table in the report that is generated from the data in a “Results” section.
    • Create one R Markdown file for each team member, so that each person works on their own file. Later, these files will be merged into the final report.
    • Try to “knit” your document to HTML often, to make sure the analysis is reproducible.
    • Note: For now, do not add and commit the HTML output file.
  4. Create R code chunks in the report.Rmd file that create one or more figures to visualize the cleaned dataset (covered in Data Visualization). A good way to get started is to explore your data first by making some ggplot2 graphs and finding a few you’d like to include in the report. It’s a good idea to distribute the exploration across the group, so that each member gets practice making ggplot2 graphs.

    • Each member creates an R script in the R/ folder to start visually exploring the data. Name these new files with your name at the start and end it with -exploring.R.
    • Add, commit, and push these files to your GitHub repository.
    • Find some visually interesting and insightful plots that you could show in the presentation and in the report.
    • Decide which figures to include in the report and presentation.
    • Save these figures in the doc/images/ folder.
    • Add, commit, and push the new images to GitHub.
  5. Add discussion items to the report.Rmd file:

    • Write up a few sentences on some things you liked about doing the project with the tools you learned and a few sentences on some challenges you had in a “Discussion” section. Add your thoughts as a group.
  6. Generate an HTML version of the report and commit it to Git, then push it up to GitHub. Include all the updated code and files on GitHub for the presentation.
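As a rough illustration of the data-cleaning task in the list above (an R script in data-raw/ that saves a cleaned dataset into data/), here is a minimal sketch. It uses only base R so that it runs anywhere, which may differ from the functions taught in the Data Management session. The helper name clean_raw_data(), the file names, and the tiny example data are all hypothetical and only there to make the sketch runnable. The demonstration writes into a temporary folder rather than a real project.

```r
# Hypothetical helper: read a raw CSV, tidy it, and save a cleaned copy.
clean_raw_data <- function(raw_path, clean_path) {
  # Treat empty cells as missing values.
  raw <- read.csv(raw_path, na.strings = c("", "NA"), stringsAsFactors = FALSE)

  # Convert column names to lower-case snake_case.
  names(raw) <- tolower(gsub("[^A-Za-z0-9]+", "_", names(raw)))

  # Drop rows where every column is missing.
  cleaned <- raw[rowSums(!is.na(raw)) > 0, , drop = FALSE]

  # Create the output folder if needed and save the cleaned dataset.
  dir.create(dirname(clean_path), recursive = TRUE, showWarnings = FALSE)
  write.csv(cleaned, clean_path, row.names = FALSE)
  invisible(cleaned)
}

# Demonstration in a temporary folder, with made-up example data.
project <- file.path(tempdir(), "team-project")
dir.create(file.path(project, "data-raw"), recursive = TRUE, showWarnings = FALSE)
raw_file <- file.path(project, "data-raw", "original-data.csv")
writeLines(c("Participant ID,Body Mass", "a1,70.5", ",", "a2,82"), raw_file)

clean_file <- file.path(project, "data", "cleaned-data.csv")
cleaned <- clean_raw_data(raw_file, clean_file)
print(cleaned)
```

In a real project, the script would instead point at the dataset your team downloaded into data-raw/, and the cleaning steps would depend on what your group finds while exploring the data.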

These tasks may seem like a lot, with a lot of new terminology and tools to use. But don’t worry! We will be going over many of these topics and you will have time to complete the project over the three days.

At the end, the lead instructor will download each of the teams’ Git projects, knit the R Markdown documents, and show them on the screen for each team to present on.
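To check ahead of time that your report can be re-generated, you could knit it from the R Console with something like the sketch below. This is not necessarily how the lead instructor will do it. The helper name render_report() is hypothetical, and it assumes the report is saved as doc/report.Rmd and that the rmarkdown package is installed.

```r
# Hypothetical helper: re-generate the HTML report of a downloaded project.
render_report <- function(project_dir = ".") {
  report_source <- file.path(project_dir, "doc", "report.Rmd")
  if (!file.exists(report_source)) {
    stop("Could not find the report at: ", report_source)
  }
  # Knit in a new environment so that objects created by the report
  # do not end up in the current R session.
  rmarkdown::render(
    input = report_source,
    output_format = "html_document",
    envir = new.env(),
    quiet = TRUE
  )
}

# Usage, from the project's root folder:
# render_report(".")
```

If this runs without errors on a freshly cloned copy of your repository, that is a good sign that the report does not depend on files that exist only on one team member's computer.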

5.2 Quick “checklist” for a good project

  • Project used Git and is on GitHub.
  • Included a good README describing the project and the team.
  • Separated “raw data” from “cleaned data”.
  • Used scripts to clean the data.
  • Included R code within an R Markdown file to show results and make figures.
  • Wrote about methods and datasets.
  • Went over your challenges and general experiences.
  • Generated an HTML file from an R Markdown file.
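Some of the checklist items above can be checked roughly by looking at which files and folders exist. The sketch below is one possible way to do that, using a hypothetical helper called check_project(). It assumes the folder and file names used in the tasks (data-raw/, data/, doc/report.Rmd) and assumes the knitted report is saved as doc/report.html. It cannot judge whether the README is good or whether the methods are well described.

```r
# Hypothetical helper: check which expected files and folders exist.
check_project <- function(project_dir = ".") {
  in_project <- function(...) file.path(project_dir, ...)
  cleaning_scripts <- list.files(in_project("data-raw"), pattern = "\\.R$")
  c(
    uses_git            = dir.exists(in_project(".git")),
    has_readme          = file.exists(in_project("README.md")),
    has_raw_data        = dir.exists(in_project("data-raw")),
    has_cleaned_data    = dir.exists(in_project("data")),
    has_cleaning_script = length(cleaning_scripts) > 0,
    has_report_source   = file.exists(in_project("doc", "report.Rmd")),
    has_report_html     = file.exists(in_project("doc", "report.html"))
  )
}

# Usage, from the project's root folder:
# check_project(".")
```

Any item that comes back as FALSE is worth a second look before the presentation.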

5.3 Expectations for the project

What we expect you to do for the group project:

  • Use Git and GitHub throughout your work.
  • Work collaboratively as a group and share responsibilities and tasks.
  • Use as much as you can of what we covered in the course, to practice what you learned.

What we don’t expect:

  • Complicated analysis or coding. The simpler it is, the easier it is for you to do the coding and understand what is going on. It also helps us to see that you’ve practiced what you’ve learned.
  • Clever or overly concise code. Clearly written and readable code is always better than clever or concise code. Keep it simple and understandable!

Essentially, the group project is a way to reinforce what you learned during the course, but in a more relaxed and collaborative setting.