Types of documentation in R

There are two main locations in an R package which contain documentation:

  • help pages (sometimes called “man pages”) for each function
  • vignettes (longer form with R code chunks embedded in text)

We will show in this lecture note how to generate documentation for both. For the function help pages, I strongly recommend using roxygen formatting and the document function from the devtools package, which will be covered here. This will automatically generate the .Rd files that go in the man directory and populate the NAMESPACE file, which specifies what is exported and what is imported by your package.

Since we did not use roxygen in the last lecture, let’s recreate our foo package with roxygen in mind from the start. In newer versions of R, this helps to avoid issues where the existing NAMESPACE file is not overwritten when we use the document() function later.

Let’s run the following to prep our foo package. If you do not want to rewrite your DESCRIPTION file, you can elect not to overwrite it when asked below. But definitely choose to overwrite your existing NAMESPACE file.

Alternatively, you can delete your existing foo folder and start from scratch. Just make sure to add the add function to the R/ directory in an R script file.

library(usethis)
create_package("foo", roxygen=TRUE)

Writing help for functions

Suppose we have a function in one of our R scripts, which is located in the R directory of our package.

add <- function(x,y,negative=FALSE) {
  z <- x + y
  if (negative) {
    z <- -1 * z
  }
  z
}

We need to tell our prospective users what the function does, what kind of arguments it takes, and what is the output of the function. It is useful to also supply references to literature or to specific manuscripts associated with the code.

Take a look at the help for the quantile function and the sapply function:

?quantile
?sapply

You can see there is some variation among function help pages, but key elements are repeated. Here we will cover some basics about documenting functions, and for further reference you can look up the notes from Hadley Wickham on writing documentation.

roxygen

To start documenting this function with roxygen, you type a pound key and a single quote mark, #', on a line above the function, and then add a title for the function:

#' Sum of two vectors of numbers
add <- function(x,y,negative=FALSE) {
...

If you press Enter for a new line from the title line, in most R-aware editors (RStudio, Emacs + ESS, although not the R GUI) you will get a new line starting with #'.

The first line becomes the Title of the help page, the second line becomes the Description (briefly describe what the function does), and further lines will populate the Details section of the help page. For example:

#' Sum of two vectors of numbers
#' 
#' This function sums two vectors of numbers and optionally
#' allows for the negative of the sum to be returned.
#' 
#' This paragraph will start the Details section...
#' 
#' This will be the second paragraph of the Details section... 
add <- function(x,y,negative=FALSE) {
...

To refer to the function itself or to another function, one can write \code{add}, which will be rendered as monospace type, add, in the HTML and PDF versions of the help pages and as 'add' with single quotes in the R console version of the help pages.
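As a minimal sketch of how this markup might sit in a roxygen block, the Details paragraph below mentions add itself and links to another function. The wording of the Details paragraph and the choice of linking to base R's sum are illustrative assumptions, not part of the foo package as written so far.

```r
#' Sum of two vectors of numbers
#'
#' This function sums two vectors of numbers and optionally
#' allows for the negative of the sum to be returned.
#'
#' Calling \code{add} with \code{negative=TRUE} flips the sign of the result.
#' To sum the elements of a single vector, see \code{\link[base]{sum}}.
add <- function(x, y, negative = FALSE) {
  z <- x + y
  if (negative) {
    z <- -1 * z
  }
  z
}
```

Wrapping \link inside \code gives a monospace name that is also a hyperlink in the HTML version of the help page.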

Arguments

Next we will document the arguments, which get a special tag, @param, to mark their lines. One may ask why the arguments are tagged with @param when they will show up in the help page under Arguments; I believe the answer is that roxygen for documenting R code is patterned historically on Doxygen, which uses the tag @param.

Here we take out the extra Details paragraphs from above, and just focus on the Title, Description and Arguments:

#' Sum of two vectors of numbers
#' 
#' This function sums two vectors of numbers and optionally
#' allows for the negative of the sum to be returned.
#' 
#' @param x a vector of numbers
#' @param y a vector of numbers
#' @param negative logical, whether to flip the sign of the sum
add <- function(x,y,negative=FALSE) {
...

The format is: @param name description. The description is a bit personal preference, and I tend to put the expected type of the argument (e.g. logical) in the front for certain arguments, and sometimes also the default value: “logical, whether to flip the sign of the sum (default is FALSE)”. The default value will also be printed in the Usage section, which is generated by default, so it’s not strictly necessary.

Returned values

It’s also important to add the Value that is returned by the function, which I tend to put below the @param lines. In this trivial example, it’s not very revealing, but some functions have complex outputs, e.g. a list with different elements, or a complex object, in which case it is useful to describe exactly what is being returned. If there is any ambiguity about the returned values, please indicate it in the help file. For example, if the values are on the log, log2 or log10 scale, this would be the place to say so.

...
#' @param negative logical, whether to flip the sign of the sum
#' 
#' @return the sum of the two vectors
add <- function(x,y,negative=FALSE) {
...

If a list-like object is being returned, one can use the following paradigm to describe each piece of the returned object:

#' @return a list with the following elements:
#' \itemize{
#' \item{...}
#' \item{...}
#' }

This will become a bulleted list in the help page. This paradigm can be used in the other sections, e.g. arguments as well.
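Here is a hedged sketch of documenting a list return value. Note that it swaps in Rd's \describe environment, which takes labelled \item{name}{description} pairs, rather than the \itemize paradigm above. The function add_details and its list elements are hypothetical and exist only for illustration.

```r
#' Sum of two vectors, returned with some bookkeeping
#'
#' @param x a vector of numbers
#' @param y a vector of numbers
#'
#' @return a list with the following elements:
#' \describe{
#'   \item{sum}{the element-wise sum of \code{x} and \code{y}}
#'   \item{length}{the number of elements in the sum}
#' }
add_details <- function(x, y) {
  z <- x + y
  # Return a named list so each element matches an entry documented above
  list(sum = z, length = length(z))
}

res <- add_details(1:3, 4:6)
names(res)
```

Keeping the documented names identical to the names of the returned list makes it easy for a user to go from the help page to res$sum or res$length.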

Examples

In the Examples field, you can provide small examples of using the function that will run when you check the package. These are required by Bioconductor. Ideally the examples should take no more than a few seconds. The R code directly follows the line with the tag @examples:

...
#' @return the sum of the two vectors
#' 
#' @examples
#'
#' add(1:5, 6:10)
#' add(1:5, 6:10, negative=TRUE)
#'
add <- function(x,y,negative=FALSE) {
...
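As a quick sanity check before putting calls into the Examples field, it can help to run them in a plain R session and confirm that they finish quickly and return what the help page promises. The sketch below simply repeats the add function and the two example calls from above, with the printed output shown as comments.

```r
add <- function(x, y, negative = FALSE) {
  z <- x + y
  if (negative) {
    z <- -1 * z
  }
  z
}

add(1:5, 6:10)
# [1]  7  9 11 13 15

add(1:5, 6:10, negative = TRUE)
# [1]  -7  -9 -11 -13 -15
```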

Import and export

While roxygen is not the only way to deal with function import and export, I find it the easiest by far. For any function which you want to export (to make visible to users who load your package), you add the @export tag in the roxygen code. I tend to add it directly above the function definition:

...
#' @export
add <- function(x,y,negative=FALSE) {
...

Imports are only slightly more complicated. If you want to import a specific function from a package (and I recommend this, rather than importing entire packages), you can add the @importFrom tag in the roxygen code block. The format is @importFrom package function. I also recommend using package::function in your R code when you use a function from another package. This clarifies that the function is not defined in your package and will avoid errors when you run R’s package check.

...
#' @importFrom gtools rdirichlet
#' 
#' @export
add <- function(x,y,negative=FALSE) {
  d <- gtools::rdirichlet(1, alpha=c(1,2,3))
  z <- x + y
  ...

Actually making the Rd files

Now that we have created the roxygen code block above our function, we can run the document function from the devtools package to create the .Rd files for our package.

Altogether we have:

#' Sum of two vectors of numbers
#' 
#' This function sums two vectors of numbers and optionally
#' allows for the negative of the sum to be returned.
#' 
#' @param x a vector of numbers
#' @param y a vector of numbers
#' @param negative logical, whether to flip the sign of the sum
#'
#' @return the sum of the two vectors
#' 
#' @examples
#'
#' add(1:5, 6:10)
#' add(1:5, 6:10, negative=TRUE)
#'
#' @importFrom gtools rdirichlet
#' 
#' @export
add <- function(x,y,negative=FALSE) {
  d <- gtools::rdirichlet(1, alpha=c(1,2,3))
  z <- x + y
  if (negative) {
    z <- -1 * z
  }
  z
}

When we run document() the help pages with .Rd ending are created and also the NAMESPACE file is re-written to indicate the imports and exports. The DESCRIPTION file may also be updated with a single field RoxygenNote: x.y.z denoting the version of roxygen2, the R package that handles reading roxygen code chunks and writing .Rd files.

The following file is created in man/add.Rd. Note that it says the file is generated by the roxygen2 package and so should not be edited by hand. This is because the source of the help page is in the R file, and so any edits to this file will be wiped out the next time document() is run.

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/foo.R
\name{add}
\alias{add}
\title{Sum of two vectors of numbers}
\usage{
add(x, y, negative = FALSE)
}
\arguments{
\item{x}{a vector of numbers}

\item{y}{a vector of numbers}

\item{negative}{logical, whether to flip the sign of the sum}
}
\value{
the sum of the two vectors
}
\description{
This function sums two vectors of numbers and optionally
allows for the negative of the sum to be returned.
}
\examples{

add(1:5, 6:10)
add(1:5, 6:10, negative=TRUE)

}

Note that, if you are keeping your package on a repository such as GitHub, you will need to explicitly add man/add.Rd to the repository and push it to the origin, so that others will have access to the documentation when they load your package.

Another note: if you just want to preview the help file, you can simply call load_all() and then type out ?add, which will bring up your most recent edits.
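The document-and-preview cycle described above could be wrapped up as follows. This is only a sketch: refresh_docs is a hypothetical helper, not a devtools function, and it assumes devtools is installed and that pkg points at the root directory of the package.

```r
# Hypothetical helper: regenerate the documentation and load the
# development version of the package in one step.
refresh_docs <- function(pkg = ".") {
  devtools::document(pkg)  # rewrites man/*.Rd and NAMESPACE from the roxygen comments
  devtools::load_all(pkg)  # loads the package so the new help pages can be previewed
  invisible(pkg)
}
```

From the directory that contains the foo folder, one would call refresh_docs("foo") and then type ?add to see the current state of the help page.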

The NAMESPACE file is updated from exporting all functions that do not begin with a period, to only exporting the add function:

# Generated by roxygen2: do not edit by hand

export(add)
importFrom(gtools,rdirichlet)

There is one thing left to do manually, which is to add the following line to the DESCRIPTION file, which indicates that we are importing at least one function from the gtools package:

Imports: gtools
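If you would rather not edit DESCRIPTION by hand, the usethis package has a use_package function that adds a package to the Imports field. The sketch below assumes that usethis is installed and that the working directory is the root of the foo package; the guard makes it do nothing otherwise.

```r
dep <- "gtools"  # the package whose function we import

# Assumption: the working directory is the package root (where DESCRIPTION lives).
if (requireNamespace("usethis", quietly = TRUE) && file.exists("DESCRIPTION")) {
  usethis::use_package(dep, type = "Imports")
}
```

Either route should end with the same Imports line in DESCRIPTION, so it is worth opening the file afterwards to confirm.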

How to add a package vignette

We mentioned at the beginning that there are two types of documentation for an R package: the help/man pages, which we have shown how to build with roxygen, and the package vignettes, which are longer-form discursive examples of using the functions in the package, perhaps also including some of the motivation or theory behind the methods in the package.

Adding vignettes to a package is very simple, technically much simpler than writing the function help pages, but it takes a lot of time to hone the message of a vignette. First we will show the technical aspects of adding a vignette.

I recommend to use Rmarkdown (.Rmd) as the format of the vignette, although the Sweave format (.Rnw) is also possible (Sweave predates Rmd, producing PDF output). Rmarkdown can produce both HTML and PDF output, but typically it is used to produce HTML output. My reasoning for preferring Rmarkdown/HTML is that it allows the documentation to be easily viewed and scrolled through on computers, laptops and phones, whereas PDF is optimal for printing. I don’t think many users are printing vignettes for statistical software packages, but are instead probably often reading the documentation and vignettes on-demand while they are also in a separate window perhaps working on their own particular dataset and analysis. This is just my opinion, but I’ve heard from users that they appreciate HTML vignettes, which are just as easy to create as PDF vignettes.

We don’t teach Rmarkdown in this class explicitly, as the class teaches computing at a high level and builds on other classes which leverage Rmarkdown. It is a very easy format to learn if you haven’t written it yet (as you know, these lecture notes are written in Rmarkdown). A guide to writing Rmarkdown can be found by Googling “Rmarkdown cheat sheet”.

There can be multiple vignettes for a package, but it’s probably most common that a package would have one vignette. The vignettes are then files with .Rmd endings in the vignettes directory of the package.

The preamble to a software vignette can look like the following (not all aspects here are necessary, but I find this preamble useful):

---
title: "A great title for a vignette"
date: "01/27/2021"
author: "Jane Doe, John Doe"
output: rmarkdown::html_document
abstract: |
  Lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum
  lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum
bibliography: library.bib
vignette: |
  %\VignetteIndexEntry{A great title for a vignette}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

In order to have the package recognize and compile this vignette, we also need to do two things: (1) add knitr and rmarkdown to the Suggests field in DESCRIPTION and (2) add the following new field to the DESCRIPTION:

VignetteBuilder: knitr, rmarkdown
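One way to check that the vignette compiles before building the whole package is to render the .Rmd file directly. The helper below is a hypothetical sketch: preview_vignette is not part of any package, the path vignettes/foo.Rmd is an assumed file name, and it presumes that rmarkdown (with pandoc) is installed.

```r
# Hypothetical helper: render a vignette into a temporary directory so that
# the output does not clutter the package source tree.
preview_vignette <- function(rmd = "vignettes/foo.Rmd") {
  rmarkdown::render(rmd, output_dir = tempdir())
}
```

Calling preview_vignette() from the package root would return the path of the rendered file, which can then be opened in a browser.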

That concludes the technical aspects of adding an Rmarkdown vignette to a package, but I’ll add a few notes on what in my opinion should go in a package vignette, as I’ve written a number of these now and they have evolved over time to cover what users need to know.

What to put in a package vignette

Here are some bulleted thoughts on writing a good package vignette:

  • Use sections (and sub-sections) to demarcate meaningful steps in the process, e.g. “Preprocessing”, “Quality control”, “Main analysis”, “Plotting results”, “Diagnostics”, etc. You can specify toc: true (and toc_float: true) in the output section of the preamble in order to add a Table of Contents (and make it floating, that is, unfold/fold and follow as the user scrolls down). See the Rmarkdown cheat sheet for instructions on adding options to the preamble.
  • Most of the code chunks should be evaluated. It’s ok to add a few eval=FALSE code chunks for demonstration, but vignettes where none of the code chunks are evaluated are not as helpful in my opinion: the output that a user gets when running the code may differ from what is shown for the non-evaluated chunks, and out-of-date code may even generate errors!
  • A Frequently Asked Questions (FAQ) section at the end can be useful to put together answers to common questions you expect (or compile) from users.
  • I like to put Acknowledgments and References sections, the first to list people who helped in writing the software, and the second to list relevant literature. See the preamble above for how to add a BibTeX file (and see the Rmarkdown cheat sheet for how to add references to an Rmarkdown file).

Vignette datasets

This is one of the trickiest parts of writing a good package vignette. You want the vignette to demonstrate a typical analysis, but a typical analysis will use files from an arbitrary location on a user’s machine or cluster. The package vignette, however, is built whenever the package is built, and so it cannot point to a missing dataset, or else this would generate an error. Relying on a URL for downloading the dataset is therefore a bad idea. Instead, packages will typically put the example dataset (which may itself be smaller than a normal-sized dataset) into the inst/extdata directory (for non-R data objects) or the data directory (for .rda files) of the package itself or of an associated data package. If the example dataset is more than 1 MB, I’d suggest putting it into a separate data package, as software packages should not be inflated by unnecessary data objects. Then in the vignette, you could use the following code to find the dataset:

dir <- system.file("extdata", package="fooData")
file <- file.path(dir, "example.xyz")

…for an arbitrary file, or for an .rda file named example.rda:

library(fooData)
data(example)

Obviously, these lines of code are not what a user would do when they are running your software on their files. Their data will not reside in an R package, but instead it will be at some location on their machine or cluster. And so before these types of code chunks, I will typically write a sentence that these code chunks are specifically for the vignette, and do not represent typical workflow code.
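To make the two lookup patterns above concrete without needing the fooData package, here is a small self-contained sketch. It uses the stats package (which ships with R) as a stand-in for the data package, and a temporary file as a stand-in for data/example.rda; both substitutions are assumptions made only so the code can be run anywhere.

```r
# system.file() returns "" when a file or package cannot be found;
# mustWork = TRUE turns that silent failure into an error, which is
# usually what you want inside a vignette.
desc_path <- system.file("DESCRIPTION", package = "stats", mustWork = TRUE)
file.exists(desc_path)

# An .rda file is just an R object written with save(); load() (or data()
# for files in a package's data directory) restores it under the same name.
example <- data.frame(x = 1:3, y = c(2.5, 3.5, 4.5))
rda_file <- file.path(tempdir(), "example.rda")
save(example, file = rda_file)
rm(example)
load(rda_file)
nrow(example)
```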