10 Analytically reproducible documents
When in RStudio, quickly jump to this page using r3::open_reproducible_documents().
Session objectives:
- Learn what a reproducible document is, how R Markdown helps with reproducibility, and why it can save you time and effort.
- Write and use R code within a document, so that it will automatically insert the R output into the final document.
- Learn about and use Markdown formatting and syntax for writing documents.
- Learn about and create different document types like HTML or Word from an R Markdown document.
10.1 Why try to be reproducible?
Take about 5 min to read over this section.
Both reproducibility and replicability are cornerstones for doing rigorous and sound science. As we’ve learned, reproducibility in science is fairly lacking, which this course aims to address. However, being reproducible isn’t just about doing better science. It can also mean that:
- You are much more efficient and productive, as less time is spent between coding and putting your results in a document. No need to copy and paste!
- You can be confident in your results, since what you report and show as figures or tables will be exactly what you get from your analysis. Again, no copying and pasting required!
Hopefully by the end of this session, you’ll start using R Markdown files for writing your manuscripts and other technical documents. Believe us, you can save so much time and make your analysis/work more reproducible, when you’ve learned how to incorporate text with R code. Plus you can create some very aesthetically appealing reports, way more easily than you could if you did it in Word.
![Have a more reproducible workflow by using R Markdown. Artwork by @allison_horst.](images/art-reproducibility.png)
Figure 10.1: Have a more reproducible workflow by using R Markdown. Artwork by @allison_horst.
10.2 What is R Markdown?
Take about 5 min to read over this section.
R Markdown is a file format (a plain text format like R scripts or .csv files) that allows you to be more reproducible in your analysis and to be more productive in your work. R Markdown is an extension of Markdown that integrates R code with written text (as Markdown formatting).
So, what is Markdown? It is a markup syntax and formatting tool, like HTML, that allows you to write a document in plain text that can then be converted into a vast range of other document types, e.g. HTML, PDF, Word documents, slides, posters, or websites. In fact, this website is built from R and Markdown (plus other things like HTML)! The Markdown used in R Markdown is based on pandoc (“pan” means all and “doc” means document, so “all documents”). Pandoc is a very powerful, popular, and well-maintained software tool for document conversion. You can use R Markdown in conjunction with other packages (e.g., bookdown) to do any number of things. Here are some examples:
- Blogging (e.g., Yihui Xie’s blog, creator of many commonly-used R packages like rmarkdown) with the blogdown package.
- Writing your thesis (e.g., Tyson Barrett’s experience of using pure R Markdown versus Ed Cherry’s experience of using R Markdown with the bookdown package).
- Writing and formatting journal articles (e.g., a reproducibility project).
- Writing book chapters (including the guide to bookdown).
For now, we’re going to focus on the main reason to use it: to incorporate R code and output into a document. By using R code in a document, you can have a seamless integration between data analysis and document-writing.
Why would you use this? There are many reasons, with some of them being:
- There is less time between exploring a new dataset or analysis and sharing your findings with collaborators, because the writing and documenting is woven in with your code for analysis.
- If you have already produced a report and later get new data or find out there are problems with the existing data, updating your report is as easy as clicking a button to regenerate it.
- How you got and present your results is based on the exact sequence of steps given in your R Markdown document, so showing others how the analysis was done is easy because the how is explicitly shown in the document.
- Likewise, by reading others’ R Markdown documents, it is easier to learn what was done in their analysis because the logic and sequence is shown in the document itself.
Let’s take a look at R Markdown together.
10.3 Creating an R Markdown file
Now, we will create and save an R Markdown file.
Go to File -> New File -> R Markdown, and a dialog box will then appear. Enter “Reproducible documents” in the title section and your name in the author section. Choose HTML as the output format. Then save this file as rmarkdown-session.Rmd in the doc/ folder.
Now we have an R Markdown file! Inside the file, there is some text that gives a brief overview of how to use the R Markdown file. For now, let’s ignore the text.
At the top of the R Markdown file, you will see something that looks a bit like this:
---
title: "Reproducible documents"
author: "Your Name"
date: "6/18/2020"
output: html_document
---
This section is called the YAML header and it contains commands and metadata about the document. Most Markdown documents have this YAML header at the top of the document, and it is always surrounded by --- on the top and bottom of the section. YAML is a data format that stores data as key: value pairs. The keys in this case are title, author, date, and output. The values are those that follow the key (e.g. “Your Name” for author). In the case of R Markdown, these key: value pairs store the settings that R Markdown will use to create the output document. The keys listed above are some of the many settings that R Markdown has available to use.
In the case of this YAML header, the R Markdown document will generate an HTML file because of the output: html_document setting. You can also create a Word document with output: word_document. While PDF documents can also be created, they require installing LaTeX through the R package tinytex, which can sometimes be complicated to install. So we will only cover HTML and Word documents in this course.
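To see that the header really is just key: value data, here is a minimal sketch that parses the same header text in R. It assumes the yaml package is installed (rmarkdown depends on it, so it usually is); the object names header_text and header are ours and only for illustration.

```r
# The YAML header from above, stored as plain text (without the --- lines).
header_text <- '
title: "Reproducible documents"
author: "Your Name"
date: "6/18/2020"
output: html_document
'

# Parse the text into a named R list.
header <- yaml::yaml.load(header_text)

# The keys become the names of the list ...
names(header)

# ... and each value can be looked up by its key.
header$output
```

Here header$output holds the text "html_document", which is the setting R Markdown reads to decide what kind of document to create.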
So, how do we create an HTML (or Word) document from the R Markdown document? By “knitting” it! At the top of the pane near the “Save” button, there is a button with the word “Knit” and a yarn symbol beside it, as shown in Figure 10.2. To knit, you either click that button or use the shortcut Ctrl-Shift-K anywhere in the R Markdown document.

Figure 10.2: Location of the Knit button in RStudio.
When you click the “Knit” button, a bunch of processing messages should appear in a new pane beside the Console called “R Markdown,” followed by a new window popping up with the newly created document. Alternatively, the HTML document may pop up in the “Viewer” pane.
Cool, now you’ve created an HTML document! Let’s try making a Word document. Change the YAML value of the output: key from html_document to word_document. Then knit the document again (with the “Knit” button or with Ctrl-Shift-K). A Word document should open up. This is the basic approach to creating documents from R Markdown.
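If you prefer the Console over the button, knitting can also be done with the rmarkdown::render() function. The sketch below is only an illustration: it assumes your working directory is the project folder and that the file was saved as doc/rmarkdown-session.Rmd, as in the steps above. The file.exists() check is our own addition so the code does nothing if the file isn't there.

```r
# Path to the R Markdown file, relative to the project folder (an assumption).
rmd_file <- "doc/rmarkdown-session.Rmd"

if (file.exists(rmd_file)) {
  # Create the HTML version of the document.
  rmarkdown::render(rmd_file, output_format = "html_document")
  # Create the Word version, without editing the YAML header.
  rmarkdown::render(rmd_file, output_format = "word_document")
}
```

Giving output_format here overrides the output: setting in the YAML header for that one run, so the file itself stays unchanged.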
Before doing the following exercise, add and commit the newly created R Markdown file into the Git history.
10.4 Exercise: Create another R Markdown document.
Time: 7 min
- Create another R Markdown document using RStudio’s interface.
- Enter “Trying out R Markdown” as the title.
- Choose “HTML” as the document type.
- Save the document in the doc/ folder and name it exercises.Rmd.
- Knit the document with either Ctrl-Shift-K or with the RStudio “Knit” button.
- Look at the output document, then change the YAML value for the output: key from html_document to word_document. Knit again.
- Open the Word file if it hasn’t been opened already.
- Finally, add and commit only the .Rmd file to the Git history and push to your GitHub repository.
10.5 Inserting R code into your document
Being able to insert R code directly into a document is one of the most powerful characteristics of using R Markdown. This frees you from having to switch between programs when simultaneously writing text and running R code to derive output that you’d then put into the document.
Running and including R code in R Markdown is done through “R code chunks.”
You insert these chunks into the document by placing the cursor at the location where you want the chunk to be, then using the shortcut Ctrl-Alt-I or the menu item Code -> Insert Chunk to insert a new code chunk.
Before we do that, delete all the text in your R Markdown document (the rmarkdown-session.Rmd file), excluding the YAML header. Make sure that the YAML key output: is set to html_document. Then, place your cursor two lines below the YAML header and insert a code chunk (Ctrl-Alt-I or Code -> Insert Chunk). The code chunk should look something like this:
```{r}
```
In the code chunk, type out 2 + 2, so it looks like:
```{r}
2 + 2
```
You can run R code inside the code chunk the same way as you would in an R script. Typing Ctrl-Enter on the line will send the code 2 + 2 to the console, with the output appearing directly below the code chunk in the R Markdown document. This output is only temporary, though. To ensure it is inserted into the HTML document, knit (Ctrl-Shift-K) the document and see what happens in the resulting HTML document. The output 4 should appear below the code chunk in the HTML document, something like this:
2 + 2
#> [1] 4
This is a very simple example of how code chunks work. Things are usually more complicated than this though. Normally, we have to load R packages to use for our subsequent code, and this is no different in an R Markdown document. We will set this up together now.
Create a new code chunk and then type setup right after the r. It should look like:
```{r setup}
```
This area that you just typed in is for code chunk labels. In this case, we labelled the code chunk with the name setup. Code chunk labels should be named without _, spaces, or . and instead should be one word or be separated by -. An error may not necessarily occur if you don’t follow this rule, but there can be unintended side effects that you may not realize and R will likely not tell you about it, probably causing you quite a bit of annoyance and frustration.
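If you want to check a label against this naming rule, the rule can be written as a small test. The function below is a hypothetical helper of our own (it is not part of knitr or rmarkdown): it returns TRUE only when a label is made of letters and digits, optionally separated by single hyphens.

```r
# Hypothetical helper, not part of knitr: checks the naming rule described
# above (letters and digits only, words separated by a single "-").
is_safe_chunk_label <- function(label) {
  grepl("^[A-Za-z0-9]+(-[A-Za-z0-9]+)*$", label)
}

# Two labels that follow the rule, then three that break it
# (an underscore, a space, and a dot).
is_safe_chunk_label(c("setup", "mean-age-bmi-table", "mean_age", "my table", "fig.1"))
#> [1]  TRUE  TRUE FALSE FALSE FALSE
```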
A nifty thing about using chunk labels is that you can see the names of sections when using “Document Outline” (found using Ctrl-Shift-O), but only if you have this option set in Tools -> Global Options -> R Markdown -> Show in document outline.
The name setup also has a special meaning for R Markdown. When you run other code chunks in the document, R Markdown will first run the code in the setup chunk. Therefore, this is a good place to put your library() calls or, in our case, the function source() to load all the packages.
Let’s enter some code to load packages and the dataset we have been using to the setup chunk:
```{r setup}
source(here::here("R/package-loading.R"))
load(here::here("data/nhanes_small.rda"))
```
Let’s insert another code chunk below this one, and simply put nhanes_small in it:
```{r}
nhanes_small
```
You can run this code as you normally would in a script file, by placing the cursor over the code and using the shortcut Ctrl-Enter. We can also knit (Ctrl-Shift-K) the document and see what it looks like. When the HTML document opens, you should see some text below the setup chunk that might look something like this:
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
── Attaching packages ─────────────────────────────────────────── tidyverse 1.3.0 ──
✓ ggplot2 3.3.2 ✓ purrr 0.3.4
✓ tibble 3.0.1 ✓ dplyr 1.0.0.9000
✓ tidyr 1.1.0 ✓ stringr 1.4.0
✓ readr 1.3.1 ✓ forcats 0.5.0
You probably don’t want this text in your generated document, so we will add something to remove this message. You can change how code chunks work by using chunk options. They are available either by clicking on the gear in the top right corner of the chunk (shown in Figure 10.3) or by typing them in the area after the chunk label section.

Figure 10.3: Changing the settings for the code chunk actions.
If you want to run the code but not show those messages and warnings, you can add the options message=FALSE and warning=FALSE:
```{r setup, message=FALSE, warning=FALSE}
source(here::here("R/package-loading.R"))
load(here::here("data/nhanes_small.rda"))
```
If you want to hide the code, the messages, the warnings, and the output, but still run the code, you can use the option include=FALSE.
```{r setup, include=FALSE}
source(here::here("R/package-loading.R"))
load(here::here("data/nhanes_small.rda"))
```
Other common options are:
- echo: To show the code. Default value is TRUE. Use FALSE to hide.
- results: To show the output results. Default is 'markup'. Use 'hide' to hide.
- eval: To evaluate (run) the R code in the chunk. Default value is TRUE, while FALSE does not run the code.
These options all work on the individual code chunk. Note that all the chunk options must be on one line, after the {r tag. If you want to set an option for all the code chunks (e.g. to hide all the code but keep the output), you can use the function knitr::opts_chunk$set(echo = FALSE). We won’t do this in this session, but here is what it looks like:
```{r setup}
source(here::here("R/package-loading.R"))
load(here::here("data/nhanes_small.rda"))
knitr::opts_chunk$set(echo = FALSE)
```
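Several defaults can be set in the same call. As a rough sketch (the particular combination of options is just our example, not something this session requires), the code below hides code, messages, and warnings for every chunk and then checks the current default with knitr::opts_chunk$get():

```r
# Set defaults that apply to every chunk that comes after this code.
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)

# Look up the current default for a single option.
knitr::opts_chunk$get("echo")
#> [1] FALSE
```

An option written on an individual chunk (e.g. echo=TRUE after the chunk label) still takes priority over these defaults for that chunk.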
10.6 Creating tables of results
Let’s try running some R code to get R Markdown to create a table. First, create a new code chunk and name it mean-age-bmi-table. Second, copy the code from the Data Wrangling session, from Section 9.17.
```{r mean-age-bmi-table}
nhanes_small %>%
  filter(!is.na(diabetes)) %>%
  group_by(diabetes, sex) %>%
  summarise(mean_age = mean(age, na.rm = TRUE),
            mean_bmi = mean(bmi, na.rm = TRUE)) %>%
  ungroup()
```
#> # A tibble: 4 x 4
#> diabetes sex mean_age mean_bmi
#> <fct> <fct> <dbl> <dbl>
#> 1 No female 36.5 26.2
#> 2 No male 34.3 26.1
#> 3 Yes female 59.9 33.7
#> 4 Yes male 58.6 31.5
This output is almost in a table format. We have the columns that would be the table headers, and we have rows that would be meaningful table rows too. To convert it into a pretty table in the R Markdown HTML output document, we use the kable() function from the knitr package. Because we don’t want to load all of the knitr functions, we’ll use knitr::kable() instead:
```{r mean-age-bmi-table}
nhanes_small %>%
  filter(!is.na(diabetes)) %>%
  group_by(diabetes, sex) %>%
  summarise(mean_age = mean(age, na.rm = TRUE),
            mean_bmi = mean(bmi, na.rm = TRUE)) %>%
  ungroup() %>%
  knitr::kable(caption = "Table caption. Mean values of Age and BMI for each sex and diabetes status.")
```
diabetes | sex | mean_age | mean_bmi |
---|---|---|---|
No | female | 36.46581 | 26.21885 |
No | male | 34.34953 | 26.10141 |
Yes | female | 59.90476 | 33.70212 |
Yes | male | 58.64764 | 31.53878 |
Now, knit (Ctrl-Shift-K) and view the output in the HTML document.
Pretty eh! Let’s add and commit these changes into the Git history.
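As a side note, knitr::kable() has a few arguments of its own for tidying a table, such as digits and col.names. The sketch below uses a tiny made-up data frame (toy_means is not the NHANES results) just to show the idea; the exercise that follows does the same kind of clean-up with mutate() and rename() instead.

```r
# Toy data frame, made up for illustration (not the NHANES results).
toy_means <- data.frame(
  group = c("a", "b"),
  mean_value = c(36.46581, 59.90476)
)

# `digits` rounds the numeric columns and `col.names` sets the headers shown.
toy_table <- knitr::kable(
  toy_means,
  digits = 1,
  col.names = c("Group", "Mean value"),
  caption = "Toy example of a rounded table."
)
toy_table
```

When run in the Console this prints a plain text version of the table; the pretty version only appears once the document is knitted.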
10.7 Exercise: Creating a table using R code
Time: 12 min
- In the doc/exercises.Rmd file, create a new code chunk and call it setup. Include the source() function to load the packages and use load() with here::here() to load the nhanes_small dataset.
- Create another code chunk and call it prettier-table. Copy the code from above that calculates the mean BMI and age and paste the code into the new chunk. Add the option echo = FALSE to the code chunk.
- Use mutate() to perform the following wrangling tasks:
  - Apply the round() function to the mean_age and mean_bmi columns, to round the values to 1 digit (digits is the second argument of round()).
  - Use str_to_sentence(sex) to capitalize the first letter of the word for “male” and “female” in the sex column.
- Rename diabetes to "Diabetes Status", sex to Sex, and mean_age and mean_bmi to "Mean Age" and "Mean BMI", using the rename() function. Hint: You can rename columns to include spaces by using " around the new column name (e.g. "Diabetes Status" = diabetes). Don’t forget, the renaming form is new = old.
- Run the code chunk to make sure the code works.
- Include the knitr::kable() function at the end of the pipe, with a table caption of your choice.
- Knit the document and check what the created table looks like.
- End the exercise by adding, committing, and pushing the files to your GitHub repository.
A possible solution:
# 1. Loading libraries
source(here::here("R/package-loading.R"))
load(here::here("data/nhanes_small.rda"))
# 2. Calculating mean BMI and Age
nhanes_small %>%
  filter(!is.na(diabetes)) %>%
  group_by(diabetes, sex) %>%
  summarise(mean_age = mean(age, na.rm = TRUE),
            mean_bmi = mean(bmi, na.rm = TRUE)) %>%
  ungroup() %>%
  # 3. Round the means to 1 digit and
  # modify the `sex` column so that male and female get capitalized.
  mutate(mean_age = round(mean_age, 1),
         mean_bmi = round(mean_bmi, 1),
         sex = str_to_sentence(sex)) %>%
  # 4. Rename `diabetes` to `"Diabetes Status"` and `sex` to `Sex`
  rename("Diabetes Status" = diabetes, Sex = sex,
         "Mean Age" = mean_age, "Mean BMI" = mean_bmi) %>%
  # 5. Include the `knitr::kable()` function at the end of the pipe.
  knitr::kable(caption = "A prettier Table. Mean values of Age and BMI for each sex and diabetes status.")
10.8 Formatting text with Markdown syntax
Take about 8 min to read over the first few parts of this section, then move to the next exercise.
You can also access a quick guide to formatting features of Markdown using the RStudio menu: Help -> Cheatsheets -> R Markdown Cheat Sheet. To learn more, check out Appendix B.3.
Formatting text in Markdown is done using characters that are considered “special” and act like commands. These special characters indicate what text is bolded, what is a header, what is a list, and so on. Almost every feature you will need to write a scientific document is available in Markdown, although some are missing. If you can’t get Markdown to do what you want, our suggestion would be to try to fit your writing around Markdown, rather than force or fight with Markdown to do something it wasn’t designed to do. You might actually find that the simpler Markdown approach is easier than what you wanted or were thinking of doing, and that you can actually do quite a lot with Markdown’s capabilities.
10.8.1 Headers
Creating headers (like chapters or sections) is done by using one or more # at the beginning of a line. Headers should always be preceded and followed by an empty line:
# Header 1

Paragraph.

## Header 2

Paragraph.

### Header 3

Paragraph.
10.8.2 Lists
Lists are created by adding either - or 1. to the beginning of a line, and an empty line must be at the start and end of the list. For unnumbered lists, it looks like:
- item 1
- item 2
- item 3
which gives…
- item 1
- item 2
- item 3
And numbered lists look like:
1. item 1
2. item 2
3. item 3
which gives…
1. item 1
2. item 2
3. item 3
10.8.3 General text formatting
- **bold** gives bold.
- *italics* gives italics.
- super^script^ gives superscript.
- sub~script~ gives subscript.
10.8.4 Inline R code
R Markdown also allows you to include numbers (or other output) directly into a paragraph. For instance, if you want to add the mean of some values into the text, it would look like this:
The mean of BMI is `r round(mean(nhanes_small$bmi, na.rm = TRUE), 2)`.
which gives…
The mean of BMI is 26.66.
But note that using inline R code can only insert a single number or character value, and nothing more.
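One way to stay within that limit is to calculate the value in a code chunk first, save it as an object, and only use the object's name inline. The sketch below uses a few made-up BMI values (bmi_values is not the NHANES data) and our own object names, purely for illustration:

```r
# Toy values, made up for illustration (not the NHANES data).
bmi_values <- c(22.1, 27.4, NA, 30.2)

# A single number, so it is fine to use inline.
mean_bmi <- round(mean(bmi_values, na.rm = TRUE), 2)

# Two numbers pasted together into a single character value,
# which is also fine to use inline.
bmi_range <- paste(range(bmi_values, na.rm = TRUE), collapse = " to ")

mean_bmi
#> [1] 26.57
bmi_range
#> [1] "22.1 to 30.2"
```

In the text you would then write the object name as inline R code, for example `r mean_bmi` or `r bmi_range`, instead of repeating the whole calculation in the sentence.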
Let’s add and commit these changes into the Git history.
10.9 Exercise: Practice using Markdown for writing text
Time: 5 min
Complete these tasks in the doc/exercises.Rmd file.
- Right under the YAML header, insert a list item with your name. Put your affiliations and your university or institution as other list items below your name.
- Create three level 1 headers (#), called “Intro,” “Methods and Results,” and “Discussion.”
- Create a level 2 header (##) under “Methods and Results” called “Analysis.”
- Write one random short sentence under each header.
- Bold (**word**) one word in each and italicize (*word*) another.
- Insert a code chunk to make a simple calculation (e.g. 2 + 2).
10.10 Inserting figures as files or from R code
Take about 7-10 mins to read over and work through the next few sections.
Aside from tables, figures are the other most common form of output inserted into documents. Like tables, you can insert figures into the document either with Markdown or R code chunks. We’ll do it with Markdown in this session and with R code in the next session.
First, we need an image to use. Open a browser and search for a picture to use (we’re using a kitten, because they’re cute). Download the image, create a folder in doc/ called images, and save the image in that folder. Then, in your R Markdown document, use the Markdown syntax for images: ![Image caption](path/to/image.png). The image can be in png, jpeg, or pdf formats. If you download an image and intend to use it in an official document, you will need to add text on the source and author of the image for copyright purposes.

Gives…

Image by Dimitri Houtteman from Pixabay
You can also directly include a link to a picture instead of downloading the image, though this may only work in HTML documents and only if you have internet access.
Markdown syntax to control the image is limited. If you want to change the size of the image, it can be difficult. However, using R code chunks can simplify this!
First, let’s create a new code chunk (Ctrl-Alt-I), name the code chunk kitten-image, and add the function knitr::include_graphics(). To make it easier to find the image, use here::here() to point to the picture. It should look like this:
```{r kitten-image}
knitr::include_graphics(here::here("doc/images/kitten.jpg"))
```
Knit the document again (Ctrl-Shift-K) and view the HTML document with the new picture.
Now, let’s change the width and height of the image, and add a figure caption. We do this with these code chunk options:
- fig.cap: To write the figure caption.
- fig.align: To align the figure to the "center", "left", or "right".
- out.width and out.height: To set the image width and height for external images (not created by R). You can use percent to set the size as well, e.g. "75%".
Now, try to change the width and height to "50%", and add a caption like "Kitten attacking flowers!":
```{r kitten-image, out.width="50%", out.height="50%", fig.cap="Kitten attacking flowers!"}
knitr::include_graphics(here::here("doc/images/kitten.jpg"))
```

Figure 10.4: Kitten attacking flowers!
Knit again to see how the image changed. Great!
10.11 Other R Markdown features
Take 5 min to read these sections before proceeding to the final exercise.
10.11.1 Making your report prettier
For HTML documents, customizing the appearance (e.g. fonts) is pretty easy, since settings to change the theme can be used directly in the YAML header. Go back to your rmarkdown-session.Rmd document. You can for instance change a setting within html_document called theme. It would look like this:
---
title: "Reproducible documents"
output:
  html_document:
    theme: sandstone
---
Notice the indentations. Indentation tells YAML what key is related to another key, including if it is a sub-key (an option). The key theme is a sub-key of html_document, which is a sub-key (an option) of output. Check out the R Markdown documentation to see other themes you can use. The themes are all Bootswatch themes, with most of them being available for use in HTML documents.
Modifying the theme and appearance of Word documents, on the other hand, is much more difficult. Since Word can’t easily be programmatically modified like HTML can, changing the appearance of the document itself requires that you manually create a Word template file first, manually modify the appearance, and then link to that template file with the reference_docx option in the YAML header (as a sub-key of word_document). More detail on this can be found in the documentation.
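As a rough sketch of what that looks like, the header below links a Word template to the document. The file name template.docx is only a placeholder we made up; you would use the name and location of the template file you created yourself.

```yaml
---
title: "Reproducible documents"
output:
  word_document:
    # Placeholder file name: replace with your own template file.
    reference_docx: template.docx
---
```

Like theme under html_document, reference_docx is indented one level below word_document so that YAML treats it as an option of that output format.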
Before going further, let’s add and commit these changes into the Git history.
10.11.2 Collaborating on R Markdown documents
In general, there are multiple ways of collaborating on a document:
- One person has the primary task of writing up the report and then gets feedback from other collaborators through the use of “Track Changes” or by inserting comments in Word.
- Multiple people are responsible for writing the report and probably use different documents that will end up being merged later on. Or they email back and forth (or use something like Dropbox or shared folders) and work on a single document.
The first workflow is not possible in an R Markdown document. Instead, you’d use a workflow that probably resembles how peer reviews are done: reading the document and making comments in a separate file to upload to the journal later. Or you’d use a workflow that revolves around GitHub and Git, an efficient workflow that has been tried and tested by tens of thousands of teams in thousands of companies globally. The goal of this course is to slowly move researchers more into the modern era, based on modern technology, tools, and workflows.
The second workflow is pretty similar. You might split up a document into sections that each collaborator may work on, and then later on merge them together. This last approach is what we will get you to do for the group project.
10.12 Exercise: Adding figures and changing the theme
Time: 20 minutes
Complete these tasks in the doc/exercises.Rmd file.
- Search online for a picture that you like and want to put into the R Markdown file. Download it and save the image in doc/.
- In the R Markdown file, insert the image somewhere (e.g. under # Results) and give it an appropriate caption.
  - Use a code chunk with knitr::include_graphics() to insert the image.
  - Use fig.cap to include a caption, center align with fig.align, and resize the image to "75%" with out.width.
- Knit the document to make sure the image gets inserted.
- Go to the R Markdown documentation site and look at the different available themes for the HTML output. Find a theme you like by changing the theme option in the YAML header and re-knit the document.
- Finally, add and commit the changes you’ve made to the doc/exercises.Rmd. For now, don’t add and commit the HTML output file.
10.13 Summary of session
- Making your research reproducible not only improves the scientific quality of your work, but also makes you more efficient, more productive, and more confident in your results.
- Use R Markdown to construct files that can easily be turned into a variety of file types such as HTML or Word.
- Insert R code chunks in R Markdown and automatically include the results in the final document.
- Make tables by using knitr::kable().
- Use headers (# Header 1), text formatting (**bold**) and lists (-) in the R Markdown file.
- Insert pictures directly into the R Markdown file with ![Image caption](path/to/image.png) or knitr::include_graphics("path/to/image.png") in a code chunk.
- For HTML, choose different themes to personalise the appearance of your R Markdown output document.