Chapter 2 Good practices

2.1 Coding style

Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.

– Hadley Wickham

Please make your code readable by following e.g. this coding style. Hereinafter I provide some examples from this guide.

You can use package styler (that provides RStudio addins) to correct your style:

You can also use Ctrl+Shift+A in RStudio; we will talk about RStudio shortcuts in section 2.2.

2.1.1 Naming

Be smart with your naming. I can’t tell the number of times I’ve seen df <- as.matrix(mtcars) on Stack Overflow.

2.1.2 Spacing

Put a space before and after = when naming arguments in function calls. Most infix operators (==, +, -, <-, etc.) are also surrounded by spaces, except those with relatively high precedence: ^, :, ::, and :::. Always put a space after a comma, and never before (just like in regular English).

# Good
average <- mean((feet / 12) + inches, na.rm = TRUE)
x <- 1:10
base::sum

# Bad
average<-mean(feet/12+inches,na.rm=TRUE)
x <- 1 : 10
base :: sum

2.1.3 Indenting

Curly braces, {}, define the most important hierarchy of R code. To make this hierarchy easy to see, always indent the code inside {} by two spaces. This should be automatic in RStudio.

# Good
if (y < 0 && debug) {
  message("y is negative")
}

if (y == 0) {
  if (x > 0) {
    log(x)
  } else {
    message("x is negative or zero")
  }
} else {
  y ^ x
}

# Bad
if (y < 0 && debug)
message("Y is negative")

if (y == 0)
{
    if (x > 0) {
      log(x)
    } else {
  message("x is negative or zero")
    }
} else { y ^ x }

2.1.4 Long lines

Strive to limit your code to 80 characters per line. This fits comfortably on your screen with a reasonably sized font. If you find yourself running out of room, this is a good indication that you should encapsulate some of the work in a separate function. In RStudio -> Tools -> Global Options, set

2.1.5 Other

  • Use <-, not =, for assignment. Keep = for parameters.
# Good
x <- 5
system.time(
  x <- rnorm(1e6)
)

# Bad
x = 5
system.time(
  x = rnorm(1e6)
)
  • Don’t put ; at the end of a line, and avoid multiple commands on the same line.

  • Only use return() for early returns. Otherwise rely on R to return the result of the last evaluated expression.

# Good
add_two <- function(x, y) {
  x + y
}

# Bad
add_two <- function(x, y) {
  return(x + y)
}
  • Use ", not ', for quoting text. The only exception is when the text already contains double quotes and no single quotes.
# Good
"Text"
'Text with "quotes"'
'<a href="http://style.tidyverse.org">A link</a>'

# Bad
'Text'
'Text with "double" and \'single\' quotes'

2.2 RStudio

Download a recent enough version of RStudio (>= 1.2) and use it!

Learn more about the new features of RStudio at https://rstudio.com/products/rstudio/release-notes/.

RStudio features:

  • everything you can expect from a good IDE
  • keyboard shortcuts I use
    1. Ctrl + Space (auto-completion, better than Tab)
    2. Ctrl + Up (command history & search)
    3. Ctrl + Click (function source code)
    4. Ctrl + Enter (execute line of code)
    5. Ctrl + Shift + A (reformat code)
    6. Ctrl + Shift + C (comment/uncomment selected lines)
    7. Ctrl + Shift + K (knit)
    8. Ctrl + Shift + B (build package, website or book)
    9. Ctrl + Shift + M (pipe)
    10. Alt + Shift + K to see all shortcuts…
  • Panels (everything is integrated, including Git and a terminal)
  • Interactive data importation from files and connections (see this webinar)

  • RStudio Projects:
    • Meaningful structure in one folder
    • The working directory automatically switches to the project’s folder
    • The File tab displays the associated files and folders in the project
    • History of R commands and open files
    • Any settings associated with the project, such as Git settings, are loaded. Note that you can have a .Rprofile file in the project’s root directory to enable project-specific settings to be loaded each time people open the project.

Read more at https://www.tidyverse.org/articles/2017/12/workflow-vs-script/ and also see chapter Efficient set-up of book Efficient R programming.

2.3 Version control (Git)

2.3.1 Why use Git? You don’t use Git?

You don't use Version Control?

Figure 2.1: You don’t use Version Control?

Have you ever:

  • Made a change to code, realized it was a mistake and wanted to revert back?
  • Lost code or had a backup that was too old?
  • Wanted to submit a change to someone else’s code?
  • Wanted to share your code, or let other people work on your code?

In these cases, and probably many others, a version control system should make your life easier (see https://stackoverflow.com/a/1408464/6103040).

  • Version control for the researcher: don’t do that, use Git

  • Version control for the data analyst: reproducible workflow

Also, see https://stackoverflow.com/q/2712421/6103040.

  • Use version control to work from anywhere

  • Working with GitHub can be a line on your CV (read more):

A lot of students have said to me later, even first-year undergraduates, that using GitHub has helped them a lot when they went for an internship or a research position interview.

They are able to say, “Oh, I already have worked with GitHub. I am familiar with it. I know how it works.” So I think they are at least able to put that on their CV and go into a situation where there’s a research or data analysis team and say, “Yeah, sure. I am actually familiar with the same tools that you use.”

– Mine Cetinkaya-Rundel, Duke University, RStudio

2.3.2 About Git

  • Main Git platforms (share your code, collaborate):

    • GitHub, documentation (only free for public repositories, now owned by Microsoft)
    • GitLab (open source & free)
    • Bitbucket (free when you have less than 5 collaborators)
    • any server..
  • 4 main commands:

    • add: add files to be part of the next commit
    • commit: snapshot of your code at a specified point in time (you can and you should use this even when having no internet connection)
    • push: merge your local modifications with the main project
    • pull: update your local project with the latest version of the main project
  • Simple (solo) use of git to prevent merge conflicts:

    • after opening a project, always pull
    • before closing a project, always commit/push
  • Use git even when you do not have any internet connection! (e.g. on a secure server) Just use commits for version control locally.

  • How to link between an RStudio project and a GitHub repository?

For Mac users, you might need to use the terminal for git clone, then create the RStudio project from the existing directory. If you have some permission denied for the public key, you might also need to run ssh-agent -s && ssh-add <path_to_public_key> (cf. this SO answer).

2.4 Getting help

2.4.1 Help yourself, learn how to debug

A basic solution is to print everything, but it’s usually not working well on complex problems. A convenient solution to see all the variables’ states in your code is to place some browser() from where you want to check the variables’ states. To debug functions, debugonce() is also very useful.

my_log <- function(x) log(x - 1)

my_fun <- function(a, b) {
  # browser()
  la <- my_log(a) 
  lb <- my_log(b)
  la + lb
}
my_fun(1, 0)
#> Warning in log(x - 1): NaNs produced
#> [1] NaN

Try to uncomment browser() or use debugonce(my_fun):

debugonce(my_fun)
my_fun(1, 0)

Learn more with this book chapter, this other book chapter, this webinar and this RStudio article.

2.4.2 External help

Can’t remember useful functions? Use cheat sheets.

You can search for specific R stuff on https://rseek.org/. You should also read documentations carefully. If you’re using a package, search for vignettes and a GitHub repository.

You can also use Stack Overflow. The most common use of Stack Overflow is when you have an error or a question, you google it, and most of the times the first links are Q/A on Stack Overflow.

You can ask questions on Stack Overflow (using the tag r). You need to make a great R reproducible example if you want your question to be answered. Sometimes, while making this minimal reproducible example, you end up understanding and solving the issue on your own.

If you are confident enough with your R skills, you can take the next step and answer questions on Stack Overflow. It’s a good way to increase your skills, or just to procrastinate while writing a scientific manuscript.

You can also join communities, e.g. join the French-speaking R community or join the R-Ladies community on Slack. These are generally much friendlier and welcoming spaces compared to Stack Overflow.