5 Group project assignment
To maximize how much you learn and how much you will retain, you as a group will take what you learn in the course and apply it to create a reproducible project. This project, based on a simple data analysis of a dataset of your choice, will have a GitHub repository and an HTML document as a report to demonstrate the reproducibility of the analysis. The dataset cannot be the same as the one we used in class, and it must be an open dataset obtained from an online archive (we recommend a few below).
On the last day of the course, you as a group will give a ~10 min presentation of your analysis and report. Before the presentation, the lead instructor will download your project from GitHub and re-generate your report to check that it is reproducible. Then, during the presentation, you as a group will:
- Show your project on GitHub.
- Show your generated HTML report.
- Describe what you did and the reasons behind what you did.
- Explain any challenges you encountered or things you would do differently.
- Explain anything you liked and would do more of in the future.
When your group isn’t presenting, you as an audience member can participate in the discussion by:
- Asking questions to clarify anything you may not understand or are confused about.
- Giving constructive feedback on what could have been improved.
- Giving concrete suggestions on how the things the presenters found challenging could have been handled.
5.1 Specific tasks
Throughout the course, you will work together as a group on a few exercises, especially the final exercises in each session that are not related to the assignment. Near the end of the course you will have dedicated time to work on your assignment and complete the tasks given below. Before you do any tasks though, decide as a group who will present the project at the end. We’d recommend assigning one or two people to be the “presenters”.
It is important to note that you will be using Git and GitHub to manage your group assignment, so you should take care to set this up correctly before doing any substantial work on the assignment. You are likely to push and pull a lot of content (which you will learn how to do later), so you will need to maintain regular and open communication with your team members.
For the group project, your specific tasks (based on lesson order) are to:
- Create a new project with the necessary files through the “New Project” setup, using the prodigenr package (covered in Management of R Projects). Assign one person to be the “coordinator”, who creates this project and completes the following tasks:
- Name the project the same as your team name (which is assigned to your group).
- Run the functions below from the usethis package to set up the project and to create the following files:
- Put the project under Git version control and upload to GitHub.
- The coordinator needs to add and commit all the files in the R Project.
- The coordinator must connect the project to GitHub and push (“upload”) the repository (covered in Version Control).
- All other team members must clone their team’s GitHub repository onto their computers.
- Each team member must open up the README.md file and write a few sentences about themselves below the first Markdown header (#). After saving the changes, add and commit them into the Git repository.
- Push and pull the changes and deal with the resulting merge conflicts.
- Find an open dataset to use for your analysis and report. You have two options: 1) choose one of the datasets provided below; or 2) search for another dataset in an online data archive (e.g., figshare). If you choose the second option, select a dataset that isn’t too big (roughly 4 MB at most) and that contains some basic quantitative data you can apply some basic functions to. Don’t spend too much time on this task. Here are some datasets we’ve identified that would be good for the assignment:
Download the dataset and put it into the data-raw/ folder of your project. For the GitHub-linked datasets above, you should be able to click on the “Raw” or “Download” buttons, then use the keyboard shortcut (Ctrl-S) to save the page (as a .csv file). It is advised that you push/pull appropriately and check that all team members can “see” the dataset. If you have documentation on the dataset, e.g. variable definitions, put those files in the data-raw/ folder as well.
Create an R script inside data-raw/ that cleans up and prepares the raw dataset (as covered in Data Management). Save the new dataset in a folder called data/. Again, use Git to make and manage these changes. You can (and probably should) distribute the tasks of exploring the data and figuring out what should be or needs to be cleaned in the dataset. Sharing this work also makes sure that you as a group have a good understanding of the data.
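As a rough illustration of this cleaning step, the sketch below reads a raw CSV file, tidies it with base R, and saves the result in a separate folder. The file names, column names, and toy data are all hypothetical stand-ins, and a temporary folder replaces the real project folder so the example runs on its own. The functions you learn in Data Management may differ from the base R calls used here.

```r
# Hypothetical sketch of a cleaning script that would live in data-raw/.
# A temporary folder stands in for the project so the example is self-contained.
project_dir <- file.path(tempdir(), "example-project")
dir.create(file.path(project_dir, "data-raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_dir, "data"), showWarnings = FALSE)

# Toy raw data (made up for illustration) with messy names and a missing value.
raw_path <- file.path(project_dir, "data-raw", "dataset.csv")
writeLines(
  c(
    "Participant ID,Body Weight (kg),Group",
    "1,70.5,control",
    "2,,treatment",
    "3,82.1,treatment"
  ),
  raw_path
)

# Read the raw file without letting R alter the column names.
raw_data <- read.csv(raw_path, check.names = FALSE, stringsAsFactors = FALSE)

# Convert the column names to lower snake case.
clean_names <- tolower(names(raw_data))
clean_names <- gsub("[^a-z0-9]+", "_", clean_names)
clean_names <- gsub("^_|_$", "", clean_names)
cleaned_data <- setNames(raw_data, clean_names)

# Drop rows that have a missing value in any column.
cleaned_data <- cleaned_data[complete.cases(cleaned_data), ]

# Save the cleaned dataset separately from the raw one.
clean_path <- file.path(project_dir, "data", "dataset-cleaned.csv")
write.csv(cleaned_data, clean_path, row.names = FALSE)
```

Because the raw file is only read and never overwritten, anyone on the team can rerun the script and get the same cleaned dataset.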
Create an R Markdown file named report.Rmd in the doc/ folder of your project (covered in Reproducible Documents). Do some simple analyses of the dataset in this report file and do the tasks below. Distribute the tasks so each team member is (mostly) doing something different. Keep using the “Git workflow” by adding the changes you’ve made to the staging area, then committing, pushing, and pulling them. You’ll likely deal with merge conflicts, which is a good chance to practice some more with Git.
- Create section headers (e.g. “Introduction”, “Methods”, “Results”, “Discussion”).
- In a “Methods” section, write up a basic description of what the dataset is, where you got it from, and what you did to process or analyze the data.
- Create a table in the report that is generated from the data in a “Results” section.
- Create one R Markdown file for each team member, so that each person works on their own file. Later these files will be merged into the final report.
- Try to “knit” your document to HTML often, to make sure the analysis is reproducible.
- Note: For now, do not add and commit the HTML output file.
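To give an idea of what a table generated from data might look like, here is a minimal sketch of the code that could go inside a chunk in the “Results” section. It uses R’s built-in airquality dataset purely as a stand-in for your own cleaned dataset, and the summary shown (monthly means) is just one arbitrary choice.

```r
# Sketch of a code chunk body for a "Results" section.
# The built-in airquality dataset is a stand-in for your own cleaned dataset.
library(knitr)

# Mean ozone and temperature per month. Rows with missing values are dropped
# by the default na.action of the formula interface of aggregate().
summary_table <- aggregate(
  cbind(Ozone, Temp) ~ Month,
  data = airquality,
  FUN = mean
)

# kable() formats the data frame as a table when the document is knitted.
kable(
  summary_table,
  digits = 1,
  caption = "Mean ozone and temperature by month."
)
```

Since the table is built from the data each time the document is knitted, it updates automatically whenever the cleaned dataset changes.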
Create R code chunks in the report.Rmd file that create one or more figures to visualize the cleaned dataset (covered in Data Visualization). A good place to get started on this is to explore your data by making some ggplot2 graphs and find a few you’d like to include in the report. It’s a good idea to distribute exploration to each group member, so that each member gets practice in making ggplot2 graphs.
- Each member creates an R script in the R/ folder to start visually exploring the data. Name these new files with your name at the start and end it with
- Add, commit, and push these files to your GitHub repository.
- Find some visually interesting and insightful plots that you could show in the presentation and in the report.
- Decide which figures to include in the report and presentation.
- Save these figures in the
- Add, commit, and push the new images to GitHub.
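The exploration steps above could look something like the sketch below, which makes one ggplot2 figure and saves it as an image file. The built-in airquality dataset stands in for your cleaned dataset, and the file name and temporary output folder are hypothetical, so swap in the folder your group agreed on.

```r
# Sketch of an exploration script; the file and folder names are hypothetical.
# The built-in airquality dataset stands in for your own cleaned dataset.
library(ggplot2)

# Remove rows with a missing ozone value so ggplot2 does not warn about them.
plot_data <- airquality[!is.na(airquality$Ozone), ]

# Scatter plot of ozone against temperature, coloured by month.
ozone_plot <- ggplot(plot_data, aes(x = Temp, y = Ozone, colour = factor(Month))) +
  geom_point() +
  labs(
    x = "Temperature (degrees Fahrenheit)",
    y = "Ozone (parts per billion)",
    colour = "Month"
  )

# Save the figure. A temporary folder is used so the example runs anywhere.
figure_path <- file.path(tempdir(), "ozone-by-temperature.png")
ggsave(figure_path, plot = ozone_plot, width = 6, height = 4, dpi = 150)
```

Passing the plot object to ggsave() explicitly, rather than relying on the last plot shown, keeps the script predictable when several figures are made in one file.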
Add discussion items to the report.Rmd file: in a “Discussion” section, write a few sentences on what you liked about doing the project with the tools you learned and a few sentences on the challenges you had. Add your thoughts as a group as well.
Generate an HTML version of the report and commit it to Git, then push it up to GitHub. Include all the updated code and files on GitHub for the presentation.
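Besides the “Knit” button, the HTML file can also be generated from the R console. The sketch below is one way to do that with the render() function from the rmarkdown package. It writes a tiny, made-up report to a temporary folder so that it runs on its own, and it assumes Pandoc is available (it comes bundled with RStudio). In your project you would point render() at your own report file instead.

```r
# Sketch of rendering an R Markdown file to HTML without the "Knit" button.
# The tiny report written here is a hypothetical stand-in for your own report.
library(rmarkdown)

# Build the chunk delimiter from single backticks to keep this example readable.
fence <- strrep("`", 3)

report_path <- file.path(tempdir(), "report.Rmd")
writeLines(
  c(
    "---",
    "title: \"Example report\"",
    "output: html_document",
    "---",
    "",
    "# Results",
    "",
    paste0(fence, "{r}"),
    "summary(airquality$Temp)",
    fence
  ),
  report_path
)

# render() runs the code chunks and returns the path of the output file.
html_path <- render(report_path, output_format = "html_document", quiet = TRUE)
```

By default the HTML file is written next to the R Markdown file, so it is easy to find when you add and commit it.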
These tasks may seem like a lot, with a lot of new terminology and tools to use. But don’t worry! We will be going over many of these topics and you will have time to complete the project over the three days.
At the end, the lead instructor will download each of the teams’ Git projects, knit the R Markdown documents, and show them on the screen for each team to present on.
5.2 Quick “checklist” for a good project
- Project used Git and is on GitHub.
- Included a good README describing the project and the team.
- Separated “raw data” from “cleaned data”.
- Used scripts to clean the data.
- Included R code within an R Markdown file to show results and make figures.
- Wrote about methods and datasets.
- Went over challenges and general experiences.
- Generated an HTML file from an R Markdown file.
5.3 Expectations for the project
What we expect you to do for the group project:
- Use Git and GitHub throughout your work.
- Work collaboratively as a group and share responsibilities and tasks.
- Use as much of what we covered in the course as possible, to practice what you learned.
What we don’t expect:
- Complicated analysis or coding. The simpler it is, the easier it is for you to do the coding and understand what is going on. It also helps us to see that you’ve practiced what you’ve learned.
- Clever or overly concise code. Clearly written and readable code is always better than clever or concise code. Keep it simple and understandable!
Essentially, the group project is a way to reinforce what you learned during the course, but in a more relaxed and collaborative setting.