vignettes/CreateAPackage.Rmd
A package is a collection of functions that work together in a cohesive manner to achieve a goal. It should include detailed documentation through man pages and vignettes, and it should be tested for accuracy and efficiency. Writing R Extensions is a very detailed manual provided by CRAN that explains how to write packages and what the structure of a package should look like. This vignette walks you through making your first package in RStudio and submitting a package to Bioconductor.
Using devtools to create a package

The devtools package provides many utilities that help with constructing a new package. Once devtools is attached, you can get a list of all available devtools functions with ls("package:devtools").
Some useful references for using devtools to build packages are the RStudio Devtools Cheatsheet and Jennifer Bryan's class.
create() will create all the files and sub-directories that R requires for a valid package: the DESCRIPTION file, the NAMESPACE file, and the R directory. The DESCRIPTION file contains the basic information about the package, and you will have to edit it so that the information is pertinent to your package. The NAMESPACE file declares which functions the package imports from other packages and which functions it exports to users. The R directory will contain the R code files for your package.

After running create(), the package has a valid package structure, which means it can be installed and loaded:
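A minimal sketch of these steps, assuming devtools is installed. The package name myFirstPackage and the temporary location are placeholders chosen for illustration; in practice you would pass the directory where you want the package to live.

```r
# Hypothetical location and name for the new package.
pkg_path <- file.path(tempdir(), "myFirstPackage")

if (requireNamespace("devtools", quietly = TRUE)) {
    # Creates DESCRIPTION, NAMESPACE, and the R directory.
    devtools::create(pkg_path)

    # Install the (still empty) package, then load it.
    devtools::install(pkg_path)
    library(myFirstPackage)
}
```

The requireNamespace() guard is only there so the sketch does nothing when devtools is missing; in an interactive session you would normally call library(devtools) and then create(), install(), and library() directly.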
It is an excellent idea to use version control whenever you create a package, and especially when collaborating on a project where multiple users are allowed to make changes. Version control keeps a continuous record of changes, which can be rolled forward or reverted if necessary.
In RStudio, only a project can be placed under version control. To make a directory a project, go to File -> New Project. In this case we have already created the directory, so we will follow the prompts for the option Use Existing Directory. Now that it is a project, we can go to Tools -> Version Control -> Project Setup and change the Version Control System to Git, then follow the prompts. Notice that in the RStudio pane for environments/history/build there is a new tab named ‘Git’. The package can now use git version control by making commits. To make a commit, go to the Git tab, select the check box next to any modified, added, or deleted files that you would like to track, and select commit. Enter a commit message in the window that pops up and select commit.
It is important to tell Git who we are. In RStudio, select Tools -> Shell and type the following, making sure to substitute your own user.email and user.name. If you have a GitHub account, we recommend using the email and user name associated with it here.
git config --global user.email "<someemail@gmail.com>"
git config --global user.name "<githubUserName>"
We could stop here, but we would also like to put the package on GitHub. These next steps assume you have a GitHub account. First, in RStudio go to Tools -> Global Options and select Git/SVN. Ensure the paths are correct. If you have not linked an RStudio project to GitHub before, select Create RSA key. Close the window. Click on View public key and copy the displayed public key. Now in a web browser, open your GitHub account. Go to Settings and then SSH and GPG keys. Click on the option for New SSH key and paste the public key that was copied. Also on GitHub, create a new repository with the same name as the package you created with create() in RStudio. Back in RStudio, select Tools -> Shell and type the following, making sure to substitute your GitHub user.name and the new package name.
git remote add origin https://github.com/<github user.name>/<package repo name>.git
git remote set-url origin git@github.com:<github user.name>/<package repo name>.git
git pull origin master
git push -u origin master
For instance, this is what the commands would look like for me:
git remote add origin https://github.com/Kayla-Morrell/myFirstPackage.git
git remote set-url origin git@github.com:Kayla-Morrell/myFirstPackage.git
git pull origin master
git push -u origin master
The git remote add command creates a new connection to the remote repository URL and assigns it the short name ‘origin’ for easy reference moving forward. The git remote set-url command then switches the URL of the remote from HTTPS to SSH, so that you as a developer can authenticate with the SSH key added above and have read and write access to the repository. git pull fetches content from the remote repository and updates the local repository to match. git push does the opposite: it uploads the local repository changes to the remote repository.
Now if you look in the RStudio tab for Git, the push and pull options are available. You can now push and pull from/to the local and GitHub repository version of your package.
devtools provides built-in functions for building, checking, and installing a package. The package we created earlier using create() has a valid package structure, but if we run check() we will find that the DESCRIPTION file needs to be updated. The information in the DESCRIPTION file needs to reflect your package. These are the fields that should be changed:

Authors@R: the maintainer is the person given the creator role (cre). We do accept Author/Maintainer for this field; either form can be used, but not both.
License: a license that ships with the package as a file is referred to as file LICENSE in this field.

Throughout the development of your package, you may have to update the Depends, Imports, and Suggests fields of the DESCRIPTION file as you incorporate more functionality from other packages. We also require a biocViews: field in the DESCRIPTION file. This field should contain at least two biocViews categories that reflect the nature of the package.
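As a hedged illustration, the sketch below writes a DESCRIPTION file with base R's write.dcf() and reads it back with read.dcf(). Every field value (package name, title, version, author, email, license, biocViews terms) is a made-up placeholder rather than a recommendation; in practice most people simply edit the file by hand.

```r
# All values below are placeholders for illustration only.
fields <- c(
    Package     = "myFirstPackage",
    Title       = "What the Package Does in One Line",
    Version     = "0.99.0",
    `Authors@R` = 'person("First", "Last", email = "first.last@example.com", role = c("aut", "cre"))',
    Description = "A paragraph describing what the package does.",
    License     = "Artistic-2.0",
    biocViews   = "Software, Infrastructure"
)

# write.dcf() expects a data frame or matrix with one column per field.
desc_file <- file.path(tempdir(), "DESCRIPTION")
write.dcf(matrix(fields, nrow = 1, dimnames = list(NULL, names(fields))),
          file = desc_file)

# Read the file back to confirm the fields were written.
desc <- read.dcf(desc_file)
desc[1, c("Package", "Version", "biocViews")]
```

Here a single person carries both the "aut" and "cre" roles; the "cre" role is what marks that person as the maintainer.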
Now we want to start writing R functions. In RStudio you can open an empty file by doing File -> New File and selecting R Script. Save the file in the R directory. Write your functions and document them. You can either document functions manually or, if you use roxygen, with the devtools function document(). See Writing R Extensions for manual creation of Rd files, which belong in the man directory; roxygen, however, is growing increasingly popular. Some helpful links for roxygen tags can be found at RStudio Devtools Cheatsheet and Roxygen Help.
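For illustration, here is a small function documented with roxygen comments, as it might appear in a file in the R directory. The function gc_content() is a hypothetical example and not part of any package discussed here; running document() would turn the #' comments into an Rd file in the man directory.

```r
#' Compute the GC content of a DNA sequence
#'
#' @param seq A single character string of DNA bases.
#' @return The fraction of bases that are G or C, between 0 and 1.
#' @examples
#' gc_content("GGCCAATT")
#' @export
gc_content <- function(seq) {
    stopifnot(is.character(seq), length(seq) == 1L)
    # Split the string into single upper-case characters.
    bases <- strsplit(toupper(seq), "")[[1]]
    # Proportion of characters that are G or C.
    mean(bases %in% c("G", "C"))
}

gc_content("GGCCAATT")
```

The @export tag asks roxygen to add the function to the NAMESPACE file, so it does not need to be edited by hand.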
Some useful devtools commands while creating functions are:

load_all(), which loads all package functions into the environment for testing,
check(), which checks the package (R CMD check),
document(), which generates or updates any documentation files.

Using the RStudio options Build -> Build and Reload and Build -> Clean and Rebuild will also help with function creation.
It is also recommended to have a man page for your package. devtools provides a framework for this. To create the file that needs to be modified, use the function use_package_doc().
If you import any functions in your code, don’t forget to update the DESCRIPTION file for Depends, Imports, or Suggests. If the function provides essential functionality for users of your package, it belongs in Depends. It is unusual for more than three packages to be listed as Depends. For packages that provide functions, methods, or classes that are used inside your package namespace, they belong in Imports. Most packages will be listed here. For packages that are used in vignettes, examples, or conditional code, they should be listed as Suggests. This includes examples that may use annotation and/or experiment packages.
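One way to write the conditional code mentioned above is to check for a suggested package with requireNamespace() and fall back to base R when it is missing. In this sketch, col_medians() is a hypothetical helper, and matrixStats merely stands in for any package you might list under Suggests.

```r
# Hypothetical helper: uses matrixStats (a Suggests-style dependency)
# when it is available, and base R otherwise.
col_medians <- function(x) {
    stopifnot(is.matrix(x), is.numeric(x))
    if (requireNamespace("matrixStats", quietly = TRUE)) {
        matrixStats::colMedians(x)
    } else {
        apply(x, 2, median)
    }
}

m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)
col_medians(m)
```

Because the package is only called through the :: operator inside the guarded branch, the function still works for users who have not installed it.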
It is highly recommended to add unit tests to your package. Unit tests ensure that the package is working as expected. The two main ways to test are using RUnit or testthat. testthat support is included in devtools through use_testthat(). This function will set up the needed directory structure and add testthat to the Suggests field of the DESCRIPTION file. Here are some examples of expectations used in testthat tests:

expect_identical(),
expect_true(),
expect_error().

There are other options as well that are discussed in testthat Wickham and testthat.
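A sketch of what a test file under tests/testthat might contain, assuming testthat is installed. The function add_one() is a hypothetical stand-in for a function from your package.

```r
library(testthat)

# Hypothetical function under test; in a real package it would live in R/.
add_one <- function(x) {
    stopifnot(is.numeric(x))
    x + 1
}

test_that("add_one() increments numeric input", {
    # Exact equality of value and type.
    expect_identical(add_one(1), 2)
    # A condition that must evaluate to TRUE.
    expect_true(is.numeric(add_one(1L)))
    # Invalid input must raise an error.
    expect_error(add_one("a"))
})
```

Each expect_*() call checks one property, so a failing test points at the specific behavior that broke.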
Vignettes are another major documentation piece of a package. More and more repository systems (CRAN, Bioconductor, rOpenSci) are making vignettes a standard requirement. Vignettes contain a more in-depth description and examples of the package usage. devtools also provides the function use_vignette() to set up the directory structure and the initial file for a vignette. For Bioconductor submissions we recommend changing the output: section of the vignette header to the following, which requires adding BiocStyle to the Suggests field in the DESCRIPTION file:
output:
  BiocStyle::html_document:
    toc: true
    toc_depth: 2
Or, if BiocStyle is already installed on your system, you can also use RStudio to set up the vignette by doing New File -> Rmarkdown -> From Template -> Bioconductor HTML/PDF Vignette.
rmarkdown is commonly used for vignette creation; a helpful reference is the rmarkdown cheatsheet.
Now that we have gone over the basics of how to create a package, we will review what we look for (generally) in Bioconductor packages. Being mindful of these guidelines while developing your package will help the whole submission process.
Re-use of common Bioconductor classes:
Biostrings DNAStringSet
GSEABase GeneSet
GenomicRanges GRanges
SummarizedExperiment / MultiAssayExperiment
SingleCellExperiment
MSnbase

Re-use of existing import functions:
rtracklayer::import()
Biostrings::readDNAStringSet()
Rsamtools::scanBam(), GenomicAlignments::readGAlignment*()
VariantAnnotation::readVcf()
ShortRead::readFastq()

Complete and detailed vignette(s) and man pages, with executable examples
Check time < 5 minutes
Package size < 5 MB
All package guidelines can be found here
IMPORTANT: A clean build, check, and BiocCheck does not guarantee acceptance. The package will still go through a formal review process.
Be sure to read the Contributions Page and, when you are ready to submit, open a New Issue. The Title: should be the name of your package. Once the package is approved for building, don’t forget to set up the remotes. Some details about the review process can be found here.