Summary
- Extolling the virtues of replicable, reproducible, and distributable research is commonplace.
- However, there is a shortage of current, descriptive, and detailed guides for enacting such workflows.
- In this series of vignettes, we provide detailed guides for several key components of replicable, reproducible, and distributable workflows.
This vignette is an excerpt from the DANTE Project’s beta release of Open, Reproducible, and Distributable Research with R Packages. To view the entire current release, please visit the bookdown site. If you would like to contribute to this bookdown project, please visit the project GitLab repository.
Default Package Files
RStudio leaves you with a handful of files and directories after creating a new package. We’ll review the new files and create some additional commonly used directories.
.Rbuildignore
The build ignore file is where you list files that sit inside the package root directory because they are used for package development, but that you do not want bundled up with your package. These may include images, notes, files used to pre-process larger embedded datasets, or any other file that is non-essential to the final package. By default, the RStudio project files are listed in .Rbuildignore. Entries use regular expression syntax, but if you're not comfortable with regular expressions you can use usethis::use_build_ignore(), which escapes file names for you.
DESCRIPTION
The package description file contains basic information about your package. By default it's fairly simple. The only mandatory fields are Package, Version, License, Description, Title, Author, and Maintainer; however, it would be very rare not to also have Imports and Suggests fields.
- The man/ directory contains manual and reference materials generated automatically by roxygen2. You do not have to edit this directory. It will be repopulated every time you Install and Restart your package, as long as you followed the Configure Build Tools... steps above.
- The NAMESPACE file is also generated automatically by roxygen2 when you Install and Restart. It's not something to cloud your mind with as a beginner; more information is available here.
- The R/ directory contains all of your functions. By default it contains the hello.R file for the hello() function. This directory should only contain function files, plus a data.R file that we will discuss later. Current best practice is to put each function in its own file named after the function, but you may also place multiple functions in a single file.
- The final default file is the RStudio project file (myresearch.Rproj). You can open this file from anywhere to start an RStudio session for your package project.
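To make the .Rbuildignore rules concrete, here is a minimal base R sketch of how its patterns select files. It assumes, following Writing R Extensions, that each line is a Perl-like regular expression matched case-insensitively against names relative to the package root. The escape_path() helper, the notes and raw-data entries, and the file list are hypothetical; escape_path() only mimics in spirit the escaping that usethis::use_build_ignore() performs.

```r
# Hypothetical helper: turn a literal file or folder name into an anchored
# regular expression with dots escaped, similar in spirit to what
# usethis::use_build_ignore() writes
escape_path <- function(path) {
  paste0("^", gsub("\\.", "\\\\.", path), "$")
}

# Patterns as they might appear in .Rbuildignore, one per line
build_ignore <- c(
  "^.*\\.Rproj$",          # RStudio project file (RStudio default)
  "^\\.Rproj\\.user$",     # RStudio session folder (RStudio default)
  escape_path("notes"),    # hypothetical development notes folder
  escape_path("raw-data")  # hypothetical pre-processing scripts folder
)

# Hypothetical top-level entries of a package source directory
files <- c("DESCRIPTION", "NAMESPACE", "R", "man",
           "myresearch.Rproj", ".Rproj.user", "notes", "raw-data")

# TRUE for each entry matched by at least one pattern
# (Perl-style regex, case-insensitive)
is_ignored <- function(files, patterns) {
  vapply(files, function(f) {
    any(vapply(patterns, grepl, logical(1),
               x = f, perl = TRUE, ignore.case = TRUE))
  }, logical(1))
}

# Entries that would still be bundled into the package
bundled <- files[!is_ignored(files, build_ignore)]
print(bundled)
```

In a real package you would let usethis::use_build_ignore("notes") append the escaped pattern rather than writing it by hand.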
The DESCRIPTION
The DESCRIPTION file merits additional discussion as one of the primary package files you edit directly. Below we address the important fields in more detail.
Default Fields
Title: A slightly more explanatory title for your project than the package name alone.
Version: Not terribly important in this context; I usually leave it at the default. You can read more about R package versioning here.
Author: Self-explanatory, and may be written in plain text. However, it's strongly suggested that you replace it with the Authors@R: field, which declares the authors programmatically and records their emails and roles (author "aut", creator "cre", contributor "ctb", copyright holder "cph"):
Authors@R: person("Joshua", "Brinks", email = "jbrinks@isciences.com",
    role = c("aut", "cre"))
Maintainer: The package maintainer, typically the same as the author, written in plain text followed by the email address: Joshua Brinks <jbrinks@isciences.com>.
Description: A comprehensive description of your package's functions. I usually include a few sentences covering context and functionality.
License: The license under which your package is released; it determines who may use your package and how. Because this is an article on open science, we strongly recommend using a Free or Open Source Software (FOSS) license when possible; however, there are several contexts where this simply doesn't work. Software licenses are widely compared and discussed online, and I suggest you build a working understanding of the options before choosing one. When possible, I apply a GPL-3 open source license with usethis:
usethis::use_gpl3_license()
Encoding: Determines your package's character encoding. It is usually a good idea to leave this as UTF-8.
LazyData: Determines how the data you embed in your package is made available. It's best to leave this set to true: embedded datasets are then available as soon as your package is attached, but each is only read into memory when you first use it. Without it, users must load each dataset explicitly with data().
RoxygenNote: Specifies the version of roxygen2 used to manage your package documentation. It is updated automatically.
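The Authors@R field is ordinary R code: its value is a call to utils::person(). The sketch below builds the same object in a console session and, relying on the convention that the person with the "cre" role is the maintainer, derives the plain-text Maintainer string from it. The is_cre selection logic is illustrative only, not how R's build tools implement it internally.

```r
# The same call used in the Authors@R field of DESCRIPTION
authors <- utils::person("Joshua", "Brinks",
                         email = "jbrinks@isciences.com",
                         role = c("aut", "cre"))

# Prints in the "Name <email> [roles]" form
print(authors)

# A person object is a list of individual entries; flag those with the
# "cre" (creator) role, the role R uses to identify the maintainer
is_cre <- vapply(unclass(authors), function(p) "cre" %in% p$role, logical(1))

# Plain-text Maintainer string in the "Name <email>" form
maintainer <- format(authors[is_cre], include = c("given", "family", "email"))
print(maintainer)
```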
Additional Fields
These are other common fields.
URL: Any appropriate package or personal website. I usually list the package's pkgdown website here.
Imports: A list of packages that your package depends on to carry out the core functions found in the R/ directory. If a function in the R/ directory uses data.table::merge(), dplyr::filter(), and ggplot2::geom_point(), these packages must be listed in Imports:. This ensures that when your package is installed, its dependencies are installed as well. The syntax for Imports: and Suggests: is:
Imports: data.table,
    dplyr,
    ggplot2
Suggests: Similar to Imports:, but for packages that are not part of your core Imports:. These are typically packages used in your vignettes (rmarkdown, leaflet), but you may also have a package used by a rarely needed function in the R/ directory that, as a courtesy to your users, you don't want installed and loaded automatically.
Remotes: Specifies dependencies that are not released on CRAN but are available on GitHub or GitLab; this field is read by installers such as remotes and devtools rather than by base R. The syntax is gitsite::owner/repository:
Remotes: gitlab::dante-sttr/commonCodes,
    gitlab::dante-sttr/untools
The simplest way to add a package dependency is with usethis, although I typically edit the DESCRIPTION file directly:
usethis::use_package("ggplot2")
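Because every package called with pkg::fun() inside R/ belongs in Imports:, a quick way to catch omissions is to compare the two. The sketch below is a hypothetical, regex-based approximation, not a usethis or devtools function: it assumes the R/ code has already been read into a character vector and that packages are always referenced with the pkg:: prefix. R CMD check performs the authoritative version of this check.

```r
# Imports field as read from DESCRIPTION (continuation lines included)
imports_field <- "data.table,\n    dplyr,\n    ggplot2"

# Split on commas, trim whitespace, and drop any version requirement
# such as "(>= 1.0.0)"
declared <- sub("\\s*\\(.*\\)$", "",
                trimws(strsplit(imports_field, ",")[[1]]))

# Hypothetical lines of code from files in R/
code <- c(
  "dt <- data.table::fread('input.csv')",
  "out <- dplyr::filter(df, year > 2000)",
  "p <- ggplot2::ggplot(out) + ggplot2::geom_point(ggplot2::aes(x, y))",
  "shp <- sf::st_read('france.shp')"
)

# A package name (letters, digits, dots) immediately followed by '::'
pattern <- "\\b[A-Za-z][A-Za-z0-9.]*(?=::)"
used <- unique(unlist(regmatches(code, gregexpr(pattern, code, perl = TRUE))))

# Packages referenced in code but missing from Imports
missing_imports <- setdiff(used, declared)
print(missing_imports)
```

Here the sketch flags sf, which the hypothetical code uses but the Imports field does not declare.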
Here is an example of a completed DESCRIPTION from the duplicator package.
Importing data.table and tidyverse Packages
When importing either the data.table or tidyverse packages, you must accommodate special operators and naming conventions that are not part of base R programming (for example, data.table lets you refer to columns without quoting their names). For tidyverse, this refers to the %>% (pipe) operator that comes from the magrittr package. data.table implements several additional operators, including .N, .I, and :=. If these operators are not addressed, your package will produce notes, warnings, or errors when you run build checks. usethis has functions to assist with setting these up:
usethis::use_data_table()
usethis::use_pipe()
These functions will adjust the Imports section of your DESCRIPTION. Additionally, they each create a non-function file in your R/ directory (utils-data-table.R and utils-pipe.R). The utils-data-table.R file needs an addendum to handle the special operators. The base file created is:
# data.table is generally careful to minimize the scope for namespace
# conflicts (i.e., functions with the same name as in other packages);
# a more conservative approach using @importFrom should be careful to
# import any needed data.table special symbols as well, e.g., if you
# run DT[ , .N, by='grp'] in your package, you'll need to add
# @importFrom data.table .N to prevent the NOTE from R CMD check.
# See ?data.table::`special-symbols` for the list of such symbols
# data.table defines; see the 'Importing data.table' vignette for more
# advice (vignette('datatable-importing', 'data.table')).
#
#' @import data.table
NULL
As the template's comment notes, the special symbols need to be imported as well. I add an @importFrom line for the most common ones:
# data.table is generally careful to minimize the scope for namespace
# conflicts (i.e., functions with the same name as in other packages);
# a more conservative approach using @importFrom should be careful to
# import any needed data.table special symbols as well, e.g., if you
# run DT[ , .N, by='grp'] in your package, you'll need to add
# @importFrom data.table .N to prevent the NOTE from R CMD check.
# See ?data.table::`special-symbols` for the list of such symbols
# data.table defines; see the 'Importing data.table' vignette for more
# advice (vignette('datatable-importing', 'data.table')).
#
#' @import data.table
#' @importFrom data.table .N .I :=
NULL
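Why do these symbols need special handling at all? R CMD check statically analyses each function with the codetools package, and a name it cannot find a definition for is reported as a "no visible binding for global variable" NOTE. The minimal sketch below uses a hypothetical count_by_group() function that is only analysed, never called (so data.table need not be installed), to show that codetools treats .N, and also the unquoted column name grp, as undefined globals.

```r
# Hypothetical package function written in data.table syntax; it is only
# analysed here, never run
count_by_group <- function(dt) {
  dt[, .N, by = grp]
}

# codetools ships with R and underlies the "no visible binding" checks
globals <- codetools::findGlobals(count_by_group, merge = FALSE)

# Variables with no visible definition: .N and the column name grp
print(globals$variables)
```

Inside a package, the @import data.table directive (or the @importFrom line above) makes .N and := visible, which resolves the NOTE for the special symbols; unquoted column names such as grp are commonly declared with utils::globalVariables().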
Additional Directories
There are additional directories that are both common constructs in the R community and helpful for research-specific workflows. These include raw-data/, raw-scripts/, and inst/. Click on New Folder in the RStudio Files pane to create these directories.
- The raw-data/ folder is where you place scripts used to import, pre-process, and embed datasets into your package. This will be explained in greater detail later.
- The raw-scripts/ directory is where you keep standard scripts with notes as you work out your workflow, along with code you will eventually wrap up and document in a function. This directory is less common and its name is not a widely accepted convention; however, it's good practice to keep rough drafts of your code before wrapping it up into a function.
- The inst/ folder contains additional files vital to your package that are not scripts or vignettes and cannot be directly embedded as .RData files. These may include complex copyright or licensing agreements that cannot be captured by the DESCRIPTION, external and unprocessed data, the package citation, and code from other languages. These files are installed along with the package when someone else installs it, so give some thought before including massive amounts of data or potentially harmful or sensitive scripts and data. When your package is installed, everything in the inst/ folder is moved up to the package's root level. This is somewhat confusing at first. For example, while working directly on your package you may have:
myresearch/inst/COPYRIGHT.TXT
myresearch/inst/extdata/france.shp
When your package is installed locally or on another computer, these files are accessible at:
myresearch/COPYRIGHT.TXT
myresearch/extdata/france.shp
We will discuss how to programmatically access inst/ data in the embedded data section.
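As a brief preview, the sketch below shows the inst/ mapping in base R. The file paths and the myresearch package are the hypothetical examples from above; system.file() looks up files relative to an installed package's root and returns an empty string when the package or file cannot be found.

```r
# Source-tree paths from the example above (hypothetical files)
source_paths <- c("inst/COPYRIGHT.TXT", "inst/extdata/france.shp")

# On installation the leading inst/ is dropped
installed_paths <- sub("^inst/", "", source_paths)
print(installed_paths)

# system.file() resolves paths inside the installed package; it returns ""
# rather than an error if myresearch (or the file) is not installed
shp <- system.file("extdata", "france.shp", package = "myresearch")
if (!nzchar(shp)) {
  message("extdata/france.shp not found: is myresearch installed?")
}
```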
At this time you may also create the data/ and vignettes/ folders, but usethis will create them automatically with functions specifically designed to embed data and create vignettes.