If you find any typos, errors, or places where the text could be improved, please let us know by providing feedback in the feedback survey (given during class), through GitLab, or directly in this document with hypothes.is annotations.
- Open an issue or submit a merge request on GitLab with the feedback or suggestions.
- Add an annotation using hypothes.is. To add an annotation, select some text and then click the annotate button on the pop-up menu. To see the annotations of others, click the icon in the upper right-hand corner of the page.
2 Syllabus
Reproducibility and open scientific practices are increasingly demanded of scientists and researchers. Training on how to apply these practices in data analysis has not kept up with demand. With this course, we hope to begin meeting that demand. Using a very practical approach based mostly on code-along sessions (instructor and learner coding together), the course will:
- Explain what an open and reproducible data analysis workflow is, what it looks like, and why it is important.
- Explain and demonstrate why R is rapidly becoming the standard program of choice for doing modern data analysis in science.
- Demonstrate and apply collaborative tools and techniques when working in team settings (including working with your future self).
- Show and apply the fundamental tools and skills for conducting a reproducible and modern analysis for a research project.
- Show where to go to get help and to continue learning modern data analysis skills.
In this course, we’ll be addressing the following questions:
- What is R, why should I use it, and how do I use it?
- What does a modern data analysis setup and workflow look like?
- What is reproducibility and how is it different from replicability?
- How can I ensure my data analysis project is reproducible?
- How can I import and work with my data in R?
- How can I visualize my data and make publication-quality figures?
- Why should I and how can I keep track of changes to my analysis files?
- How can I write reports to document, describe, and present analyses in a reproducible way?
By the end of the course, participants will have a basic level of proficiency in using the R statistical computing language, enabling them to improve their data and code literacy, and to conduct a modern and reproducible data analysis. The course will place particular emphasis on research in diabetes and metabolism; it will be taught by instructors working in this field and it will use relevant examples where possible.
2.1 Is this course for you?
To help manage expectations and develop the material for this course, we make a few assumptions about who you are as a participant in the course:
- You are a researcher, likely working in the biomedical field (ranging from experimental to epidemiology).
- You currently or will soon do some quantitative data analysis.
- You:
While we have these assumptions to help focus the content of the course, if you have an interest in learning R but don’t fit any of the above assumptions, you are still welcome to attend the course! We welcome everyone, at least until the course capacity is reached.
In addition to the assumptions, we also have a fairly focused scope for teaching and expectations for learning, which may also help you decide if this course is for you.
- We do teach how to use R, starting from the very basics and targeted to beginners.
- We do not teach statistics (this is already covered by most university curricula).
- We do teach from a team science, reproducible research, and open scientific perspective (i.e. by including a collaborative group project that uses a transparent and reproducible analysis workflow).
- We do teach using practical, applied, and hands-on lessons and exercises, with a few short lectures that introduce a topic.
2.2 General schedule
The course is structured as a series of participatory live-coding sessions interspersed with hands-on exercises and group work, using either a practice dataset or some other real-world dataset. There are some lectures, mainly at the start and end of the course. The general schedule outline is shown in the table below. The timings of each session are not fixed: some sessions may be shorter and others longer. Instead, the schedule is meant to be an approximate guide and overview.
Date and time | Session topic | Type |
---|---|---|
Day 1 | | |
Arrival. Coffee, tea, and snacks | | |
10:00 | Introduction to the course | Lecture |
10:30 | Management of R projects (with short break) | Code-along |
12:30 | Lunch | |
13:30 | Collaboration and teamwork in research | Lecture |
14:00 | Version control and collaborative practices | Code-along |
14:30 | Break with coffee, tea, and snacks | |
15:30 | Version control and collaborative practices | Code-along |
17:30 | End-of-day short survey | |
Day 2 | | |
9:00 | Data management and wrangling | Code-along |
10:15 | Break with coffee, tea, and snacks | |
10:30 | Data management and wrangling (with short break) | Code-along |
12:15 | Lunch | |
13:15 | Research in the era of (ir)reproducibility and open science | Lecture |
14:00 | Creating reproducible documents | Code-along |
14:45 | Break with coffee, tea, and snacks | |
15:00 | Creating reproducible documents | Code-along |
17:00 | End-of-day short survey | |
Day 3 | | |
9:00 | Data visualizations | Code-along |
10:15 | Break with coffee, tea, and snacks | |
11:00 | Data visualizations (with short break) | Code-along |
12:15 | Lunch | |
13:15 | Group work | |
15:30 | Presentation of projects, and discussions | |
16:45 | Closing remarks and short survey | |
17:00 | Farewell | |