Chapter 2 Good practices

2.1 Coding style

Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.

– Hadley Wickham

Please make your code readable by following e.g. this coding style (examples below come from this guide).

You can use the styler package (which also provides RStudio addins) to correct your style.
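As a minimal sketch, assuming the styler package is installed, style_text() restyles a string of code and prints the result without modifying any file. The variable name messy_code and the file name in the comment are only illustrative.

```r
# Assumes styler is installed: install.packages("styler")
messy_code <- "average<-mean(feet/12+inches,na.rm=TRUE)"

if (requireNamespace("styler", quietly = TRUE)) {
  # Print a restyled version of the code held in the string
  print(styler::style_text(messy_code))
  # styler::style_file("analysis.R") would restyle a (hypothetical) file in place
}
```

In RStudio, the same restyling is available from the Addins menu once styler is installed.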

2.1.1 Naming

Be smart with your naming. I can’t tell the number of times I’ve seen df <- as.matrix(mtcars) on Stack Overflow.
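To spell out what is wrong with that line: the name df suggests a data frame, but as.matrix() returns a matrix (and df is also the name of a function in the stats package). A short sketch, where mtcars_mat is just one possible clearer name:

```r
# Misleading: "df" suggests a data frame, but the object is a matrix
df <- as.matrix(mtcars)
is.data.frame(df)  # FALSE

# Clearer: the name says what the object is (hypothetical name)
mtcars_mat <- as.matrix(mtcars)
is.matrix(mtcars_mat)  # TRUE
```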

2.1.2 Spacing

Put a space before and after = when naming arguments in function calls. Most infix operators (==, +, -, <-, etc.) are also surrounded by spaces, except those with relatively high precedence: ^, :, ::, and :::. Always put a space after a comma, and never before (just like in regular English).

# Good
average <- mean((feet / 12) + inches, na.rm = TRUE)
x <- 1:10
base::sum

# Bad
average<-mean(feet/12+inches,na.rm=TRUE)
x <- 1 : 10
base :: sum

2.1.3 Indenting

Curly braces, {}, define the most important hierarchy of R code. To make this hierarchy easy to see, always indent the code inside {} by two spaces. This should be automatic in RStudio.

# Good
if (y < 0 && debug) {
  message("y is negative")
}

if (y == 0) {
  if (x > 0) {
    log(x)
  } else {
    message("x is negative or zero")
  }
} else {
  y ^ x
}

# Bad
if (y < 0 && debug)
message("Y is negative")

if (y == 0)
{
    if (x > 0) {
      log(x)
    } else {
  message("x is negative or zero")
    }
} else { y ^ x }

2.1.4 Long lines

Strive to limit your code to 80 characters per line. This fits comfortably on a printed page with a reasonably sized font. If you find yourself running out of room, this is a good indication that you should encapsulate some of the work in a separate function.

If a function call is too long to fit on a single line, use one line for the function name, for each argument, and for the closing ). This makes the code easier to read and to modify later.

# Good
do_something_very_complicated(
  something = "that",
  requires = many,
  arguments = "some of which may be long"
)

# Bad
do_something_very_complicated("that", requires, many, arguments,
                              "some of which may be long"
                              )
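The advice to move work into a separate function when a line grows too long can be sketched as follows. The helper standardize() and the choice of columns are hypothetical, chosen only to illustrate the idea.

```r
# Hypothetical helper: center and scale a numeric vector
standardize <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# The call stays short because the arithmetic lives in the helper
scaled <- vapply(
  mtcars[c("mpg", "hp", "wt")],
  standardize,
  FUN.VALUE = numeric(nrow(mtcars))
)
```

Without the helper, the centering and scaling expression would have to be written inline as an anonymous function, which quickly pushes the call past 80 characters.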

2.1.5 Other

  • Use <-, not =, for assignment. Keep = for function arguments.
# Good
x <- 5
system.time(
  x <- rnorm(1e6)
)

# Bad
x = 5
system.time(
  x = rnorm(1e6)
)
  • Don’t put ; at the end of a line, and avoid multiple commands on the same line.

  • Only use return() for early returns. Otherwise rely on R to return the result of the last evaluated expression.

# Good
add_two <- function(x, y) {
  x + y
}

# Bad
add_two <- function(x, y) {
  return(x + y)
}
  • Use ", not ', for quoting text. The only exception is when the text already contains double quotes and no single quotes.
# Good
"Text"
'Text with "quotes"'
'<a href="http://style.tidyverse.org">A link</a>'

# Bad
'Text'
'Text with "double" and \'single\' quotes'
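Coming back to the return() rule above, an early return might look like the sketch below. The function safe_log() is hypothetical: it leaves early for invalid input and otherwise relies on the last evaluated expression.

```r
# Hypothetical function: explicit return() only for the early exit
safe_log <- function(x) {
  if (!is.numeric(x)) {
    return(NA_real_)
  }
  log(x)
}

safe_log("a")  # NA
safe_log(1)    # 0
```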

2.2 RStudio

Download the latest version of RStudio (> 1.1) and use it!

Learn more about new features of RStudio v1.1 there.

RStudio features:

  • everything you can expect from a good IDE
  • keyboard shortcuts I use
    1. Ctrl + Space (auto-completion, better than Tab)
    2. Ctrl + Up (command history & search)
    3. Ctrl + Click (function source code)
    4. Ctrl + Enter (execute line of code)
    5. Ctrl + Shift + A (reformat code)
    6. Ctrl + Shift + C (comment/uncomment selected lines)
    7. Ctrl + Shift + K (knit)
    8. Ctrl + Shift + B (build package, website or book)
    9. Ctrl + Shift + M (pipe)
    10. Alt + Shift + K to see all shortcuts…
  • Panels (everything is integrated, including Git and a terminal)
  • Interactive data import from files and connections (see this webinar)

  • R Projects:
    • Meaningful structure in one folder
    • The working directory automatically switches to the project’s folder
    • The File tab displays the associated files and folders in the project
    • History of R commands and open files
    • Any settings associated with the project, such as Git settings, are loaded. Note that you can have a file set-up.R or .Rprofile in the project’s root directory to enable project-specific settings to be loaded each time people open the project.
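As an illustration of project-specific settings, a .Rprofile in the project's root directory could contain something like the sketch below. The particular options are assumptions chosen for the example, not recommendations from this chapter.

```r
# Possible contents of a project-level .Rprofile (illustrative values)
options(
  digits = 4,                    # print fewer significant digits
  warnPartialMatchDollar = TRUE  # warn when $ partially matches a name
)

if (interactive()) {
  message("Loaded settings for project: ", basename(getwd()))
}
```

R runs this file at startup when the session starts in the project's folder, which is what happens when the project is opened in RStudio.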

Read more at https://www.tidyverse.org/articles/2017/12/workflow-vs-script/ and also see chapter Efficient set-up of book Efficient R programming.

2.3 Version control (Git)

2.3.1 Why use Git? You don’t use Git?

Figure 2.1: You don’t use Version Control?

Have you ever:

  • Made a change to code, realized it was a mistake and wanted to revert back?
  • Lost code or had a backup that was too old?
  • Wanted to submit a change to someone else’s code?
  • Wanted to share your code, or let other people work on your code?

In these cases, and no doubt others, a version control system should make your life easier (see https://stackoverflow.com/a/1408464/6103040).

  • Version control for the researcher: don’t do that, use Git

  • Version control for the data analyst: reproducible workflow

Also, see https://stackoverflow.com/q/2712421/6103040.

  • Use version control to work from anywhere

  • Working with GitHub can be a line on your CV (read more):

A lot of students have said to me later, even first-year undergraduates, that using GitHub has helped them a lot when they went for an internship or a research position interview.

They are able to say, “Oh, I already have worked with GitHub. I’m familiar with it. I know how it works.” So I think they are at least able to put that on their CV and go into a situation where there’s a research or data analysis team and say, “Yeah, sure. I am actually familiar with the same tools that you use.”

– Mine Cetinkaya-Rundel, Duke University, RStudio

2.3.2 Git

  • Main Git platforms (share your code, collaborate):
  • 3 main commands:
    • pull: update your local project with the latest version of the main project
    • commit: snapshot of your code at a specified point in time
    • push: send your local commits to the main project
  • Simple (solo) use of git to prevent merge conflicts:
    • after opening a project, always pull
    • before closing a project, always commit/push
  • How to link between an RStudio project and a GitHub repository?
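The solo routine above (pull after opening, commit and push before closing) can be sketched from R by calling the git command line. This assumes git is installed and on the PATH, and that the project is already linked to a remote repository. The helper names run_git(), start_session() and end_session() are hypothetical.

```r
# Hypothetical helper: run a git command inside the project directory
run_git <- function(args, dir = ".") {
  system2("git", c("-C", shQuote(dir), args))
}

# After opening the project: get the latest version
start_session <- function(dir = ".") {
  run_git("pull", dir)
}

# Before closing the project: snapshot all changes and send them
end_session <- function(message, dir = ".") {
  run_git(c("add", "--all"), dir)
  run_git(c("commit", "-m", shQuote(message)), dir)
  run_git("push", dir)
}

# Usage (not run here):
# start_session()
# end_session("Update analysis")
```

The Git panel in RStudio offers the same three actions as buttons, so this sketch is mainly useful to see which commands are involved.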

2.4 Getting help

2.4.1 Help yourself, learn how to debug

A basic solution is to print everything, but this usually does not work well on complex problems. A more convenient way to see the state of all variables is to place a call to browser() at the point in your code where you want to inspect them.
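A minimal sketch of this technique, using a hypothetical function mean_ratio(). The interactive() guard is an addition so that the script also runs unattended; in an interactive session, execution pauses at browser() and you can inspect x, y and ratio, then type n (next), c (continue) or Q (quit).

```r
# Hypothetical function whose result looks suspicious
mean_ratio <- function(x, y) {
  ratio <- x / y
  # Pause here to inspect the local variables (interactive sessions only)
  if (interactive()) browser()
  mean(ratio)
}

mean_ratio(c(1, 2, 3), c(2, 0, 4))  # Inf, because 2 / 0 is Inf
```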

Learn more with this book chapter, this other book chapter, this webinar and this RStudio article.

2.4.2 External help

Can’t remember useful functions? Use cheat sheets.

You can search for R-specific content on https://rseek.org/. You should also read the documentation carefully. If you’re using a package, look for its vignettes and its GitHub repository.
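Documentation and vignettes can also be reached from the R console. A short sketch using only base R functions, with the packages stats and grid picked as arbitrary examples:

```r
# Open the help page of a function (same as ?mean)
help("mean")

# Show the documentation index of a package
help(package = "stats")

# List the vignettes that ship with a package
grid_vignettes <- vignette(package = "grid")
grid_vignettes
```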

You can also use Stack Overflow. Its most common use is indirect: when you have an error or a question, you google it, and most of the time the first results are questions and answers on Stack Overflow.

You can ask questions on Stack Overflow (using the tag r). You need to make a great R reproducible example if you want your question to be answered. Sometimes, while making this reproducible example, you find the answer to your problem.
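As a rough sketch of what a reproducible example can contain: a small dataset that anyone can recreate, the minimal code that shows the problem, and your R version. The model fit below is only a stand-in for whatever code is causing trouble.

```r
# A few rows of built-in data, so nobody needs your files
small_data <- head(mtcars[c("mpg", "wt")], 3)

# dput() prints code that recreates the object; paste it in your question
dput(small_data)

# The minimal code that shows the problem (placeholder example)
fit <- lm(mpg ~ wt, data = small_data)
coef(fit)

# Version information that helps others reproduce the issue
R.version.string
```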

If you’re confident enough with your R skills, you can take the next step and answer questions on Stack Overflow. It’s a good way to increase your skills, or just to procrastinate while writing a scientific manuscript.

You can also join communities, e.g. join the French-speaking R community or join the R-Ladies community. These are generally much friendlier and more welcoming spaces than Stack Overflow.