Create A Package

A package is a collection of functions that work together in a cohesive manner to achieve a goal. It (should) include detailed documentation through man pages and vignettes and (should) be tested for accuracy and efficiency. Writing R Extensions is a very detailed manual provided by CRAN that explains how to write packages and what the structure of a package should look like. This vignette walks you through making your first package in RStudio and shows how to submit a package to Bioconductor.

Using devtools to create a package

The devtools package provides many functions and utilities that help with constructing a new package. Once devtools is attached, you can list all of its available functions with ls("package:devtools").

Some useful references for using devtools to build packages are the RStudio Devtools Cheatsheet and Jennifer Bryan's class.

Package shell

library(devtools)
create("myFirstPackage")

create() will create all the files and sub-directories that R requires for a valid package: DESCRIPTION, NAMESPACE, and the R directory. The DESCRIPTION file contains the basic information about the package, and you will have to edit it so that the information is pertinent to your package. The NAMESPACE file describes which functions are imported into and exported from the namespace. The R directory will contain the R code files for your package.

After running create(), the package has a valid package structure, which means it can be installed and loaded.
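As a sketch of that install-and-load step, assuming the working directory is the one in which create("myFirstPackage") was run and that devtools is installed:

```r
library(devtools)

# Assumption: "myFirstPackage" is a sub-directory of the current working
# directory, as produced by create("myFirstPackage") above.
install("myFirstPackage")   # build and install the package from its source directory
library(myFirstPackage)     # attach the freshly installed (still empty) package
```

If the package directory lives elsewhere, pass its path to install() instead.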

Version control

It is an excellent idea to use version control whenever you create a package, and especially when collaborating on a project where multiple users are allowed to make changes. Version control keeps a continuous record of changes, which can be advanced or reverted if necessary.

In RStudio, only a project can be version controlled. To make a directory a project, go to File -> New Project. Because we have already created the directory, we will follow the prompts for the option Use Existing Directory. Now that it is a project, we can go to Tools -> Version Control -> Project Setup and change the Version Control System to Git, then follow the prompts. Notice that the RStudio pane for environments/history/build now has a new tab named ‘Git’.

The package can now start using Git version control by making commits. To make a commit, go to the Git tab, select the check box next to any modified, added, or deleted files that you would like to track, and select commit. Enter a commit message in the window that pops up and select commit.

It is important to tell Git who we are. In RStudio, select Tools -> Shell and type the following, making sure to substitute your own user.email and user.name. If you have a GitHub account, we recommend using the email and user name associated with it here.

git config --global user.email "<someemail@gmail.com>"
git config --global user.name "<githubUserName>"

We could stop here, but we would also like to put the package on GitHub. These next steps assume you have a GitHub account. First, in RStudio go to Tools -> Global Options and select Git/SVN. Ensure the paths are correct. If you have not linked an RStudio project to GitHub before, select Create RSA key and close the window. Click on View public key and copy the displayed public key. Now, in a web browser, open your GitHub account. Go to Settings and then SSH and GPG keys. Click on the option for New SSH key and paste the public key that was copied. Also on GitHub, create a new repository with the same name as the package you created with create() in RStudio. Back in RStudio, select Tools -> Shell and type the following, making sure to substitute your GitHub user.name and the new package name.

git remote add origin https://github.com/<github user.name>/<package repo name>.git
git remote set-url origin git@github.com:<github user.name>/<package repo name>.git
git pull origin master
git push -u origin master

For instance, this is what the commands would look like for me:

git remote add origin https://github.com/Kayla-Morrell/myFirstPackage.git
git remote set-url origin git@github.com:Kayla-Morrell/myFirstPackage.git
git pull origin master
git push -u origin master

The git remote add command creates a new connection to the remote repository URL and assigns it the short name ‘origin’ for easy reference moving forward. The git remote set-url command then switches the remote URL from HTTPS to SSH, so that Git authenticates with the SSH key registered on GitHub above and you, as a developer, have read and write access to the repository. git pull fetches content from the remote repository and updates the local repository to match. git push does the opposite: it uploads the local repository changes to the remote repository.

Now if you look in the RStudio tab for Git, the push and pull options are available. You can now push to and pull from the GitHub repository version of your package.

DESCRIPTION file

devtools provides built-in functions for building, checking, and installing a package. The package we created earlier using create() has a valid package structure, but if we run check() we will find that the DESCRIPTION file needs to be updated. The information in the DESCRIPTION file needs to reflect your package. These are the fields that should be changed:

  • Title: Should be brief, but descriptive.
  • Version: Should be 0.99.0 for pre-release packages.
  • Authors@R: Only one person should be listed as maintainer (cre). We also accept Author/Maintainer in place of this field; either form can be used, but not both.
  • Description: Should be relatively short but a detailed overview of what the package functionality entails.
  • License: Preferably a standard license; Bioconductor core packages typically use Artistic-2.0. If using a non-standard license, you must include a LICENSE file in your package and use file LICENSE in this field.
  • LazyData: Should be set to false; we find that setting this field to true can slow down the loading of a package, especially if it contains large data.

Throughout the development of your package you may have to update the Depends, Imports, and Suggests fields of the DESCRIPTION file as you incorporate more functionality from other packages. We also require a biocViews: field in the DESCRIPTION file. This field should contain at least two biocViews categories that reflect the nature of the package.
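To make the fields above concrete, here is a minimal sketch that writes an example DESCRIPTION to a temporary file and reads it back with base R's read.dcf(). Every field value (names, email address, dependencies, and biocViews terms) is a hypothetical placeholder, not a recommendation for your package.

```r
# Hypothetical field values for illustration only; replace them with your own.
desc_lines <- c(
  "Package: myFirstPackage",
  "Title: An Example Package for Learning Package Development",
  "Version: 0.99.0",
  'Authors@R: person("First", "Last", email = "first.last@example.com",',
  '    role = c("aut", "cre"))',
  "Description: Provides example functions that demonstrate the basic",
  "    structure of an R package intended for Bioconductor.",
  "License: Artistic-2.0",
  "LazyData: false",
  "Imports: methods",
  "Suggests: testthat, BiocStyle",
  "biocViews: Software, Infrastructure"
)

# Write to a temporary location so that no real package file is overwritten
desc_file <- file.path(tempdir(), "DESCRIPTION")
writeLines(desc_lines, desc_file)

# read.dcf() parses the "Field: value" format used by DESCRIPTION files,
# which makes it a quick check that the file is well formed
desc <- read.dcf(desc_file)
desc[, c("Package", "Version", "License", "biocViews")]
```

In a real package you would edit the DESCRIPTION file directly; the round trip through read.dcf() is only a convenient way to confirm that the fields parse.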

R functions

Now we want to start writing R functions. In RStudio you can open an empty file through File -> New File and selecting R Script. Save the file in the R directory, then write and document your functions. You can either document functions manually or, if you use roxygen, generate the documentation with the devtools function document(). See Writing R Extensions for manual creation of Rd files, which belong in the man directory; roxygen, however, is increasingly popular. Some helpful links for roxygen tags can be found at RStudio Devtools Cheatsheet and Roxygen Help.
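As an illustration, a roxygen-documented function might look like the sketch below. The function name, its argument, and its behavior are hypothetical and chosen only to show the common tags; the file would be saved in the R directory, for example as R/fahrenheitToCelsius.R.

```r
#' Convert temperatures from Fahrenheit to Celsius
#'
#' A hypothetical example function used to illustrate roxygen tags.
#'
#' @param tempF A numeric vector of temperatures in degrees Fahrenheit.
#' @return A numeric vector of the same length, in degrees Celsius.
#' @examples
#' fahrenheitToCelsius(c(32, 212))
#' @export
fahrenheitToCelsius <- function(tempF) {
  # Fail early on non-numeric input
  stopifnot(is.numeric(tempF))
  (tempF - 32) * 5 / 9
}

fahrenheitToCelsius(c(32, 212))
```

With roxygen set up for the package, running document() should turn the #' comments into an Rd file in the man directory and record the export in NAMESPACE.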

Some useful devtools commands while creating functions are:

  • load_all() which loads all package functions into the environment for testing,
  • check() which checks the package (R CMD check),
  • document() which generates or updates any documentation files.

Using the RStudio options Build -> Build and Reload and Build -> Clean and Rebuild will also help with function creation.

It is also recommended to have a man page for your package. devtools provides a framework for this: the function use_package_doc() creates the file that you then need to modify.

If your code uses functions from other packages, don’t forget to update the Depends, Imports, or Suggests fields of the DESCRIPTION file. A package that provides essential functionality for users of your package belongs in Depends; it is unusual for more than three packages to be listed there. Packages that provide functions, methods, or classes used inside your package namespace belong in Imports; most packages will be listed here. Packages that are used only in vignettes, examples, or conditional code should be listed in Suggests. This includes examples that use annotation and/or experiment packages.

Testing

It is highly recommended to add unit tests to your package. Unit tests ensure that the package is working as expected. The two main ways to test are using RUnit or testthat. testthat support is set up through devtools with use_testthat(). This function creates the needed directory structure and adds testthat to the Suggests field of the DESCRIPTION file. Some commonly used testthat expectations are:

  • expect_identical(),
  • expect_true(),
  • expect_error().

There are other options as well that are discussed in testthat Wickham and testthat.
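The three expectations listed above can be combined in a single test, as in the sketch below. It assumes testthat is installed; the function under test is a hypothetical stand-in that is defined inline only so the example is self-contained. In a real package the function would live in the R directory and the test in a file such as tests/testthat/test-fahrenheitToCelsius.R.

```r
library(testthat)

# Hypothetical function under test, defined here only to keep the
# example self-contained
fahrenheitToCelsius <- function(tempF) {
  stopifnot(is.numeric(tempF))
  (tempF - 32) * 5 / 9
}

test_that("fahrenheitToCelsius behaves as expected", {
  # Exact match on a known value
  expect_identical(fahrenheitToCelsius(32), 0)
  # A logical condition that must hold
  expect_true(all(fahrenheitToCelsius(c(50, 212)) > 0))
  # Invalid input must raise an error
  expect_error(fahrenheitToCelsius("hot"))
})
```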

Vignettes

Vignettes are another major documentation piece of a package. More and more repository systems (CRAN, Bioconductor, rOpenSci) are making vignettes a standard requirement. Vignettes contain a more in-depth description and examples of the package usage. devtools also provides the function use_vignette() to set up the directory structure and the initial file for a vignette. For Bioconductor submissions we recommend changing the output: section of the vignette header to the following, which requires adding BiocStyle to the Suggests field in the DESCRIPTION file:

output:
  BiocStyle::html_document:
    toc: true
    toc_depth: 2

Alternatively, if BiocStyle is already installed on your system, you can use RStudio to set up the vignette through File -> New File -> R Markdown -> From Template -> Bioconductor HTML/PDF Vignette.

rmarkdown is commonly used for vignette creation; a helpful reference is the rmarkdown cheatsheet.

Bioconductor Standards

Now that we have gone over the basics of how to create a package, we will review what we look for (generally) in Bioconductor packages. Being mindful of these guidelines while developing your package will help the whole submission process.

  1. Proper coding and efficient coding
  2. Bioconductor interconnectivity and S4 classes
    • S4 over S3 classes
    • Reuse existing infrastructure first!
      • DNA/RNA - Biostrings DNAStringSet
      • Gene sets - GSEABase GeneSet
      • Genomic intervals - GenomicRanges GRanges
      • Rectangular feature x sample data (RNAseq count matrix, microarray) - SummarizedExperiment/MultiAssayExperiment
      • Single cell - SingleCellExperiment
      • Mass spec - MSnbase
      • import/loading data
        • GTF, GFF, BED, BigWig, … rtracklayer::import()
        • FASTA Biostrings::readDNAStringSet()
        • SAM/BAM Rsamtools::scanBam(), GenomicAlignments::readGAlignment*()
        • VCF VariantAnnotation::readVcf()
        • FASTQ ShortRead::readFastq()
    • biocViews
  3. Tests
  4. Complete and detailed vignette(s) and man pages, with executable examples
  5. Check time < 5 minutes
  6. Package size < 5 MB
  7. All package guidelines can be found here
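As one illustration of reusing existing infrastructure, the sketch below stores genomic intervals in a GRanges object instead of a custom data structure. It assumes the GenomicRanges package is installed; the coordinates and scores are made up for the example.

```r
library(GenomicRanges)

# Hypothetical intervals for illustration only
peaks <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(100, 500, 200), end = c(200, 650, 300)),
  strand   = c("+", "-", "*"),
  score    = c(5.2, 3.1, 8.7)   # extra columns become metadata
)

width(peaks)                       # interval widths
peaks[seqnames(peaks) == "chr1"]   # subset like an ordinary vector
```

Returning a standard class such as GRanges means that users can pass your results directly to other Bioconductor packages that already understand it.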

IMPORTANT: A clean build, check, and BiocCheck does not guarantee acceptance. The package will still go through a formal review process.

Submitting to Bioconductor

Be sure to read the Contributions Page, and when you are ready to submit, open a New Issue. The Title: should be the name of your package. Once the package is approved for building, don’t forget to set up the remotes. Some details about the review process can be found here.