7 Management of R projects

When in RStudio, quickly jump to this page using r3::open_rproject_management().

Session objectives:

  1. Create self-contained projects to be more reproducible.
  2. Use built-in tools in RStudio to make it easier to manage R projects.
  3. Become familiar with the very basics of R.
  4. Apply tools to use a consistent “grammar” and “styling” when writing R code and making files.
  5. Know of and use different approaches to getting and finding help.

7.1 What is a project and why use it?

Take 5 minutes and read through this section.

Before we create a project, we should first define what we mean by “project”. What is a project? In this case, a project is a set of files that together lead to some type of scientific “output” (for instance a manuscript). Use data for your output? That’s part of the project. Do any analysis on the data to give some results? Also part of the project. Write a document, e.g. a manuscript, based on the data and results? Have figures inserted into the output document? These are also part of the project.

Increasingly, how we arrive at a claim in a scientific product is just as important as the output describing the claim. This includes not only the written description of the methods but also the exact steps taken, i.e. the code used. So, using a project setup can help with keeping things self-contained and easier to track and link with the scientific output. Here are some things to consider when doing projects:

  • Organise all R scripts and files in the same folder (also called “directory”) so it is more self-contained (doesn’t rely on other components in your computer).
  • Use a common and consistent folder and file structure for your projects.
  • Use version control to track changes to your files.
  • Make raw data “read-only” (don’t edit it directly) and use code to show what was done.
  • Whenever possible, use code to create output (figures, tables) rather than manually creating or editing them.
  • Think of your code and project as you would your manuscript or thesis: other people will eventually look at it and review it, and it will likely be published or archived online.

These simple steps can be huge steps toward being reproducible in your analysis. And by managing your projects in a reproducible fashion, you’ll not only make your science better and more rigorous, you’ll also make your life easier!

7.2 RStudio and R Projects

RStudio helps us with managing projects by making use of R Projects. R Projects make it easy to divide your work into “containers” that each have their own working directory (the folder where your analysis occurs), workspace (where all the R activity and output is temporarily saved), history, and documents.

There are many ways one could organise a project folder. We’ll be setting up a project folder and file structure using prodigenr. We’ll use RStudio’s New Project menu item under File -> New Project. We’ll call the new project LearningR. Save it on your Desktop/. See Figure 7.1 for the steps to do it:

Figure 7.1: Creating a new analysis project in RStudio.

You can also use the Console with the below function, but we won’t do that in this session.

prodigenr::setup_project("~/Desktop/LearningR")

Just a reminder, when we use the :: (double colon) here, we are saying:

Hey R, from the prodigenr package use the setup_project function.
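
As a small illustration of the same idea, here is a sketch using the stats package, which ships with every R installation (it is used here only as a stand-in and is not part of the course material). Both calls below run the same function; the first names the package explicitly, the second relies on stats being attached by default:

```r
# Call median() by naming its package explicitly with ::
explicit_result <- stats::median(c(1, 5, 6))

# Call median() directly; this works because stats is attached by default
attached_result <- median(c(1, 5, 6))

explicit_result
#> [1] 5
identical(explicit_result, attached_result)
#> [1] TRUE
```

Using :: is handy when you only need one function from a package, or when you want to make clear where a function comes from.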

After we’ve created a New Project in RStudio, we’ll have a bunch of new files and folders.

LearningR
├── R
│   ├── README.md
│   ├── fetch_data.R
│   └── setup.R
├── data
│   └── README.md
├── doc
│   └── README.md
├── .Rbuildignore
├── .gitignore
├── DESCRIPTION
├── LearningR.Rproj
├── README.md
└── TODO.md

This forces a specific, and consistent, folder structure to all your work. Think of this like the “introduction”, “methods”, “results”, and “discussion” sections of your paper. Each project is then like a single manuscript or report, that contains everything relevant to that specific project. There is a lot of power in something as simple as a consistent structure. Projects are used to make life easier. Once a project is opened within RStudio the following actions are automatically taken:

  • A new R session (process) is started.
  • The R session’s working directory is set to the project directory.
  • RStudio project options are loaded.

Before moving on, let’s go over a bit about how R works, and what the “R session” means. An R session is the way you normally interact with R, where you would write code in the Console to tell R to do something. Normally, when you open an R session without an R Project, the session defaults to assuming you will be working in the ~/Desktop or ~ (your Home folder) location. But this location isn’t where you actually work… you would work in the folder that has your R scripts or data files. The assumption with R Projects on the other hand, is that the R session working directory should be where the R Project is, since that is where you have your R scripts and data files.
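
If you want to check where your current R session is working, a minimal sketch is shown below. The output is not shown because it depends on your computer and on whether an R Project is open:

```r
# Print the folder that the current R session is working in
current_folder <- getwd()
current_folder

# List the files and folders at the top level of that folder
list.files(current_folder)
```

When an R Project is open, the first command should print the project folder (the one that contains the .Rproj file).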

Within the project we created, there are several README files in each folder that explain a bit about what should be placed there. Briefly:

  1. Documents like manuscripts, abstracts, and exploration type documents should be put in the doc/ directory (including R Markdown files which we will cover later).
  2. Data, raw data, and metadata should be in either the data/ directory or in data-raw/ for the raw data. We’ll explain the data-raw/ folder and creating it later in the lesson.
  3. All R files and code should be in the R/ directory.
  4. Name all new files to reflect their content or function. Follow the tidyverse style guide for file naming. Use either _ or - instead of a space; - tends to be the more common choice.

For this course, we’ll need to delete the files fetch_data.R and setup.R in the R/ folder, as well as the .Rbuildignore file, since we won’t need them. For any project, it is highly recommended to use version control, which we’ll cover in more detail later.

7.3 Exercise: Reading the READMEs

Time: 7 min

  1. Briefly read through each of the README.md files by opening them up in RStudio.

7.4 Exercise: Better file naming

Time: 10 min

Take some time to think about file naming. Look at the list of file names below. Which file names are good names and which aren’t? We’ll discuss afterwards why some are good names and others are not.

fit models.R
fit-models.R
foo.r
stuff.r
get_data.R
Manuscript version 10.docx
manuscript.docx
new version of analysis.R
trying.something.here.R
plotting-regression.R
utility_functions.R
code.R

Possible solution:

# Bad: Has a space.
fit models.R
# Good: Descriptive with no space.
fit-models.R
# Bad: Not descriptive.
foo.r
stuff.r
# Good: Descriptive with no space.
get_data.R
# Bad: Has spaces.
Manuscript version 10.docx
# Good: Descriptive.
manuscript.docx
# Bad: Not descriptive and has spaces.
new version of analysis.R
# Bad: Not descriptive and has dots.
trying.something.here.R
# Good: Descriptive with no space.
plotting-regression.R
utility_functions.R
# Bad: Not descriptive.
code.R
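
Part of this check can be done with code. The sketch below uses a hypothetical helper, has_naming_problem() (a name made up for this illustration, not from any package), that flags file names containing a space or more than one dot. It cannot judge whether a name is descriptive; that still takes a human.

```r
# Hypothetical helper: TRUE when a file name has a space or more than
# one dot (that is, dots other than the one before the file extension)
has_naming_problem <- function(file_names) {
  has_space <- grepl(" ", file_names, fixed = TRUE)
  dot_matches <- regmatches(file_names, gregexpr(".", file_names, fixed = TRUE))
  dot_count <- lengths(dot_matches)
  has_space | dot_count > 1
}

file_names <- c(
  "fit models.R",
  "fit-models.R",
  "trying.something.here.R",
  "get_data.R"
)
has_naming_problem(file_names)
#> [1]  TRUE FALSE  TRUE FALSE
```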

7.5 Next steps after creating the project

Now that we’ve created a project and associated folders, let’s add some more options to the project. One option to set is to ensure that every R session you start with is a “blank slate”, by typing and running in the Console:

usethis::use_blank_slate()

Now, let’s add some R scripts that we will use in later sessions of the course.

usethis::use_r("project-session")
usethis::use_r("version-control-session")
usethis::use_r("wrangling-session")

The usethis::use_r() command creates R scripts in the R/ folder. As you may tell, the usethis package can be quite handy.

7.6 RStudio layout and usage

Open up the R/project-session.R file and type out the code in that file for the code-along parts. You’ve already gotten a bit familiar with RStudio in the pre-course tasks, but if you want more details, RStudio has a great cheatsheet on how to use RStudio. The items to know right now are the “Console”, “Files”/“Help”, and “Source” tabs.

Code is written in the “Source” tab, where it saves the code and text as a file. You send code to the console from the opened file by typing Ctrl-Enter (or clicking the “Run” button). In the “Source” tab (where R scripts and R Markdown files are shown), there is a “Document Outline” button (top right beside the “Run” button) that shows you the headers or “Sections” (more on that later). Click it to enable the outline from now on.

7.7 Basics of using R

One useful thing to do to make your R script more readable and understandable is to use “Sections”. They’re like “headers” in Word and they split up an R script into sections, which then show up when you use the “Document Outline” opened by either using Ctrl-Shift-O or by going to Code -> Show Document Outline. You can use sections through the menu item (Code -> Insert Section) or with the keyboard shortcut (Ctrl-Shift-R).
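
For reference, a section is just a comment line that ends with at least four dashes (RStudio also recognises four or more = or # characters). A minimal sketch, with section titles and an object name chosen only for illustration:

```r
# Setup ----
numbers <- c(1, 5, 6)

# Analysis ----
average_number <- mean(numbers)
average_number
#> [1] 4
```

Both “Setup” and “Analysis” would then appear as entries in the Document Outline.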

In R, everything is an object and every action is a function. A function is an object, but an object isn’t always a function. To create an object, also called a variable, we use the <- assignment operator:

weight_kilos <- 100
weight_kilos
#> [1] 100

The new object now stores the value we assigned it. We can read it like:

“weight_kilos contains the number 100” or “put 100 into the object weight_kilos”

You can name an object in R almost anything you want, but it’s best to stick to a style guide. For instance, use snake_case to name things.
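
Once an object exists, you can use it in other code. The sketch below (half_weight_kilos is a name made up for this illustration) shows an object being used in a calculation and then being overwritten:

```r
weight_kilos <- 100

# Use the stored value in a calculation and save the result as a new object
half_weight_kilos <- weight_kilos / 2
half_weight_kilos
#> [1] 50

# Assigning to an existing name replaces the old value
weight_kilos <- 105
weight_kilos
#> [1] 105
```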

There are also several main “classes” (or types) of objects in R: lists, vectors, matrices, and data frames. For now, the only two we will cover are vectors and data frames. Vectors are a string of values put together while data frames are multiple vectors put together as columns. Data frames are a form of data that you’d typically see as a spreadsheet. This type of data is called “rectangular data” since it has two dimensions: columns and rows.

So these are vectors, which have different types like character, logical, numeric, or factor:

# Character vector
c("a", "b", "c")
# Logical vector
c(TRUE, FALSE, FALSE)
# Numeric vector
c(1, 5, 6)
# Factor vector
factor(c("low", "high", "medium", "high"))

While this is what a data frame looks like:

head(CO2)
#>   Plant   Type  Treatment conc uptake
#> 1   Qn1 Quebec nonchilled   95   16.0
#> 2   Qn1 Quebec nonchilled  175   30.4
#> 3   Qn1 Quebec nonchilled  250   34.8
#> 4   Qn1 Quebec nonchilled  350   37.2
#> 5   Qn1 Quebec nonchilled  500   35.3
#> 6   Qn1 Quebec nonchilled  675   39.2

Notice how we use the # to write comments or notes. Whatever we write after the “hash” (#) tells R to ignore it and not run it. The c() function puts values together and head() prints the first 6 rows. Both c() and head() are functions since they do an action and they can be recognized by the () at their end. Functions take an input (known as arguments) and give back an output. Each argument is separated by a comma ,. Some functions can take unlimited arguments if they have a ... as an input (like c()). Others, like head() only can take a few arguments. In the case of head(), the first argument is for the data frame.
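
As a sketch of how arguments work, head() has a second argument called n that controls how many rows are shown. Here it is given by name (first_three_rows is a name made up for this illustration):

```r
# The first argument is the data frame; n sets the number of rows to keep
first_three_rows <- head(CO2, n = 3)
first_three_rows
#>   Plant   Type  Treatment conc uptake
#> 1   Qn1 Quebec nonchilled   95   16.0
#> 2   Qn1 Quebec nonchilled  175   30.4
#> 3   Qn1 Quebec nonchilled  250   34.8
```

Naming arguments like this makes it easier to see what each value is for.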

If we want to get more information from data frames, we can use other functions like:

# Column names
colnames(CO2)
#> [1] "Plant"     "Type"      "Treatment" "conc"      "uptake"

# Structure
str(CO2)
#> Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame':   84 obs. of  5 variables:
#>  $ Plant    : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
#>  $ Type     : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ conc     : num  95 175 250 350 500 675 1000 95 175 250 ...
#>  $ uptake   : num  16 30.4 34.8 37.2 35.3 39.2 39.7 13.6 27.3 37.1 ...
#>  - attr(*, "formula")=Class 'formula'  language uptake ~ conc | Plant
#>   .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
#>  - attr(*, "outer")=Class 'formula'  language ~Treatment * Type
#>   .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
#>  - attr(*, "labels")=List of 2
#>   ..$ x: chr "Ambient carbon dioxide concentration"
#>   ..$ y: chr "CO2 uptake rate"
#>  - attr(*, "units")=List of 2
#>   ..$ x: chr "(uL/L)"
#>   ..$ y: chr "(umol/m^2 s)"

# Summary statistics
summary(CO2)
#>      Plant             Type         Treatment       conc          uptake     
#>  Qn1    : 7   Quebec     :42   nonchilled:42   Min.   :  95   Min.   : 7.70  
#>  Qn2    : 7   Mississippi:42   chilled   :42   1st Qu.: 175   1st Qu.:17.90  
#>  Qn3    : 7                                    Median : 350   Median :28.30  
#>  Qc1    : 7                                    Mean   : 435   Mean   :27.21  
#>  Qc3    : 7                                    3rd Qu.: 675   3rd Qu.:37.12  
#>  Qc2    : 7                                    Max.   :1000   Max.   :45.50  
#>  (Other):42
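
A few more functions give single facts about a data frame. This is only a sketch; the $ used in the last line pulls one column out as a vector, something we haven’t covered yet (the object names are made up for this illustration):

```r
# Number of rows (observations) and columns (variables)
row_count <- nrow(CO2)
row_count
#> [1] 84
column_count <- ncol(CO2)
column_count
#> [1] 5

# Mean of a single column, matching the "Mean" shown by summary()
mean_conc <- mean(CO2$conc)
mean_conc
#> [1] 435
```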

7.8 Using auto-completion in RStudio

To more quickly type out objects in R, use “tab-completion” to finish an object name for you. As you type out an object name, hit the “tab” key to see a list of objects available. RStudio will not only list the objects, but also show the possible options and help associated with each object.

Try it out. In the RStudio Console, start typing:

col

Then hit tab. You should see a list of functions to use. Hit tab again to finish with colnames(). This simple tool can save so much time and can prevent spelling mistakes.
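
Outside of RStudio, or in a script, a rough equivalent is the apropos() function, which lists object names matching a pattern. This is only a sketch, and the output is not shown because it depends on which packages you have loaded:

```r
# List all available objects whose names start with "col"
matching_names <- apropos("^col")
matching_names
```

The list should include colnames, the same function that tab-completion offered above.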

7.9 R object naming practices

Take 5 minutes and read this section, and then complete the exercise.

If you’ve ever seen some old R code, you may notice that function and object names are usually short. For instance, str() is the function to see the “object structure”. Back then, there were no tab-completion tools, so typing out long names was painful. Now we have powerful auto-completion tools. So this also means that when you write R code, you should use descriptive names instead of short ones. For instance, the object weight_kilos might once have been named something like x. But this doesn’t tell us what that is and doesn’t help us write better code.

The ability to read, understand, modify, and write simple pieces of code is an essential skill for modern data analysis tasks and projects. So! Here are some tips for writing R code:

  • Be descriptive with your names.
  • As with natural languages like English, write as if someone will read your code.
  • Stick to a style guide.

Even though R doesn’t care about naming, spacing, and indenting, it really matters how your code looks. Coding is just like writing. Even though you may go through a brainstorming note-taking stage of writing, you eventually need to write correctly so others can read and understand what you are trying to say. In coding, brainstorming is fine, but eventually you need to code in a readable way. That’s why using a style guide is really important.

7.10 Exercise: Make code more readable

Time: 20 min

Read through these specific sections of the style guide:

Then try to make the below code more readable. Copy and paste the code below into the R/project-session.R file. NOTE: Don’t run this code, just edit it to improve the code style and object naming. There are some tricks in here that we haven’t covered yet, but will when we go through the exercise.

The code below is in some way either wrong or incorrectly written. Edit the code so it follows the correct style and so it’s easier to understand and read. You don’t need to understand what the code does, just follow the guide.

# Object names
DayOne
T <- FALSE
c <- 9

# Spacing
x[,1]
x[ ,1]
mean (x, na.rm = TRUE)
mean( x, na.rm = TRUE )
height<-feet*12+inches
df $ z
x <- 1 : 10

# Indenting and brackets
if (y < 0 && debug)
message("Y is negative")

Possible solution:

# Object names

# Should be snake case (looks like `snake_case`)
# DayOne
day_one

# Should not overwrite existing object or function names
# T is a built-in shortcut for TRUE, so don't name anything T
# T <- FALSE
false <- FALSE
# c is a function name already. Plus c is not descriptive
# c <- 9
number_value <- 9

# Spacing
# Commas should be in correct place
# x[,1]
# x[ ,1]
x[, 1]
# Spaces should be in correct place
# mean (x, na.rm = TRUE)
# mean( x, na.rm = TRUE )
mean(x, na.rm = TRUE)
# height<-feet*12+inches
height <- feet * 12 + inches
# df $ z
df$z
# x <- 1 : 10
x <- 1:10

# Use curly brackets and indent the code inside if, for, and else statements
# if (y < 0 && debug)
# message("Y is negative")
if (y < 0 && debug) {
  message("Y is negative")
}

7.11 Automatic styling in RStudio

You may have completed the exercise by hand; however, it is possible to do it automatically. RStudio has an automatic styling tool, found in the menu item Code -> Reformat Code (or with Ctrl-Shift-A). Let’s try this styling out together by copying and pasting the exercise code again and running the reformatting on it.

The tidyverse style guide also has a package called styler that automates fixing code to fit the style guide. With styler you can fix styling on multiple files at once. We won’t be covering styler though, so this is just a reference to a possible future tool to try out.
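
For the curious, here is a minimal sketch of what using styler could look like, assuming the package is installed (the requireNamespace() check skips the code if it is not). The style_text() function takes code as text and returns a restyled version; output is not shown here since it depends on the styler version:

```r
unstyled_code <- "height<-feet*12+inches"

# Only run if the styler package is installed
if (requireNamespace("styler", quietly = TRUE)) {
  styled_code <- styler::style_text(unstyled_code)
  print(styled_code)
}
```

With the default tidyverse style, the printed code should have spaces around the <-, *, and + operators.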

7.12 Packages, data, and file paths

A major strength of R is in its ability for others to easily create packages that simplify doing complex tasks (e.g. running mixed effects models with the lme4 package or creating figures with the ggplot2 package) and for anyone to easily install and use that package. So make use of packages!

You load a package by writing:

library(tidyverse)

When working with multiple R scripts and files, it quickly gets tedious to write out each library() call at the top of each script. One possibly easier way of managing this is to create a new file and keep all package loading code in that file. So:

usethis::use_r("package-loading")

This will create a new R script in the R/ folder called package-loading.R. In this file, add this to the top:

library(tidyverse)

Then, to run this package-loading.R script and load the packages in our other R scripts, you would use the source() function to run an external R script. In the project-session.R file, put this at the top of the file.

source(here::here("R/package-loading.R"))

You see there’s also a new thing here! The here package uses a function called here() that makes it easier to manage file paths within an R Project.
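
To see what source() does on its own, separate from here(), the sketch below writes a one-line script to a temporary file (a stand-in for R/package-loading.R; the file and the greeting object are made up for this illustration) and then runs it:

```r
# Write a tiny script to a temporary file
script_path <- tempfile(fileext = ".R")
writeLines("greeting <- 'Hello from the sourced script'", script_path)

# source() runs every line of that file in the current R session
source(script_path)

# The object created by the script is now available
greeting
#> [1] "Hello from the sourced script"
```

In the same way, sourcing package-loading.R runs the library() calls inside it, so the packages become available in the script that sourced it.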

So, what is a file path and why is this here package necessary? A file path is the list of folders a file is found in. For instance, your CV may be found in /Users/Documents/personal_things/CV.docx. The problem with file paths in R is that when you run a script interactively (e.g. what we do in class and normally), the file path and “working directory” (the R session) are located at the Project level (where the .Rproj file is found). You can see the working directory by looking at the top of the RStudio Console.

But! When you source() an R script or run it non-interactively, the R code may run in the folder it is saved in, e.g. in the R/ folder. So your file path R/package-loading.R won’t work because there isn’t a folder called R/ in the R/ folder.

LearningR <--- R Project working directory starts here.
├── R
│   ├── README.md
│   ├── fetch_data.R <--- Working directory when running non-interactively.
│   └── setup.R
├── data
│   └── README.md
├── doc
│   └── README.md
├── .Rbuildignore
├── .gitignore
├── DESCRIPTION
├── LearningR.Rproj <--- here() moves file path to start in this file's folder.
├── README.md
└── TODO.md

Often people use the function setwd(), but this is never a good idea since using it makes your script runnable only on your computer… which makes it no longer reproducible. We use the here() function to tell R to go to the project root (where the .Rproj file is found) and then use that file path. This simple function can make your work more reproducible and easier for you to use later on.
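
A minimal sketch of how here() builds a file path is shown below, assuming the here package is installed. The printed path is not shown because it starts with the project folder on your own computer:

```r
# Only run if the here package is installed
if (requireNamespace("here", quietly = TRUE)) {
  # Build a path that starts from the project root,
  # wherever that folder is on this computer
  package_loading_path <- here::here("R", "package-loading.R")
  print(package_loading_path)

  # Check whether a file actually exists at that path
  print(file.exists(package_loading_path))
}
```

Because the path is built from the project root rather than typed out in full, the same code works for anyone who opens the project, no matter where they saved it.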

7.13 Encountering problems and finding help

Please take 5 min to read this section.

Figure 7.2: A common and frequent experience when working in R. Artwork by @allison_horst (https://github.com/allisonhorst/stats-illustrations).

You will encounter problems and errors when working with R, and you will encounter them all the time. In fact, a large amount of your time in R will be spent figuring out solutions to these errors (“debugging”). Error messages will appear in red text in your Console and will start with the word “Error:”. Warning messages are also in red text, but are often either harmless or informative, so make sure to read the message and see if it says “Error” or not. Here are some initial steps to take when you encounter an error:

  1. First, try to stay calm; problems happen to everyone, no matter their skill level. You can fix it! 😄
  2. Read through the error message and try to understand what R is telling you. Some common error messages include:
    • “Could not find function”: Usually means that you have misspelled the function name or have not loaded the R package it comes from.
    • “Object not found”: Usually means that you have not created the object yet or have misspelled its name.
    • “Error in…”: This is how most error messages begin; the text after “Error in” names the function call where the problem occurred, and the rest of the message describes the problem.
    • “Unexpected symbol in…”: Usually means that R could not read the code as written, often because of a missing comma, operator, quote, or bracket.
  3. Go over the code again and carefully check for any mistakes:
    • Missing commas or pipes?
    • Missing end brackets like ], ), or }?
    • Capitalized something that didn’t need to be?
    • Object or column name misspelled?
    • Forgot to load your data before working on it?
    • Forgot to load or re-load your packages? Packages are automatically unloaded when you exit out of RStudio and R. So you need to load them in each new session with the library() function.
  4. Go back to the start of the code and run each line one at a time, to see where the problem occurs. You will get an opportunity to practice this later, once you are working with bigger chunks of code.
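
To see two of the common error messages listed above without stopping a script, here is a small sketch. It uses tryCatch() and conditionMessage(), base R functions that are not covered in this session, only to capture the message text. The names colnamez and weight_kilo_typo are deliberate mistakes made up for this illustration, and the output assumes R is showing messages in English:

```r
# Ask R for English messages so the text matches this section
Sys.setenv(LANGUAGE = "en")

# A misspelled function name: colnamez() instead of colnames()
function_error <- tryCatch(
  colnamez(CO2),
  error = function(e) conditionMessage(e)
)
function_error
#> [1] "could not find function \"colnamez\""

# An object that was never created
object_error <- tryCatch(
  weight_kilo_typo,
  error = function(e) conditionMessage(e)
)
object_error
#> [1] "object 'weight_kilo_typo' not found"
```

In both cases the message names the exact thing R could not find, which is usually the best clue for where to look.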

If you still can’t find the problem, here are some other steps to take:

  1. Restart the R session (Session -> Restart R or Ctrl-Shift-F10). Then run the code from the beginning again and track what objects get created, and if the proper object name is used later on.

  2. (Rarely need to do) Close/re-open RStudio and try again.

  3. Use help() or ? to access built-in documentation about a function or package. You may be using the function incorrectly, so find out more about the function by looking at the built-in documentation. The documentation will open up in the “Help” pane of RStudio (bottom right-hand corner). Try it out: Enter either of the following commands into your console and run it (hit Enter).

    ?colnames
    
    help(colnames)

    Sometimes, this documentation can be hard to read and seem overly complex for a beginner. You can also try finding the website for the package you are having trouble with, as they often have guides that are a little easier to understand. The tidyverse packages all have amazing documentation that you can use to help you with problems you may have.

  4. Consider explaining the problem out loud to a colleague or friend. In verbally going through the problem and explaining it, you will often come up with the solution yourself.

  5. Take a break and come back to it later!

  6. Google it. Chances are that someone has already encountered that error and has asked about it online. In fact, those who are “experts” in coding languages like R are experts largely because of their skill in knowing the right words or terms or questions to ask Google. Usually googling the error message will be enough to find the answer, but sometimes you’ll need to include “R” or “rstats” and the relevant package or function as a keyword in your search.

  7. If all else fails, you can always turn to the trusty online R community. Check StackOverflow, a coding-related question and answer website, to see whether your issue has already been asked and solved by others. If it hasn’t and you are considering submitting a question, make sure to read the posting guides beforehand to ensure that you are asking the question in a helpful way.

Final words: It is important to always work towards writing “better” and “neater” code, as this can make it easier to break down pieces of code and troubleshoot problems. Ways to integrate this into your practice are to review documents like the tidyverse style guides regularly and perhaps join an online coding community.

7.14 Summary of session

  • Use R Projects in RStudio (e.g. with prodigenr).
  • Use a standard folder and file structure.
  • Use a consistent style guide for code and files.
  • Keep R scripts simple, focused, short.
  • Use the here() function from the here package.
  • Use tab auto-completion when writing code.
  • Use ? to get help on an R object.