- Conquer and permanently tame confusing folder soups of data and R scripts
- Make code ultra-compact, well-documented, highly replicable
- Dramatically shorten time to get back on track
ultimately:
- (truly) publish tools and methods
2022-04-13
(this is a small snippet of a data processing nightmare)
gps.dir <- "data/SonoranPronghorn/Locations_GPSCollarTelemetry/"
pronghorn <- read.csv(paste0(gps.dir,f.v1[i])) %>%
  processRaw_v1(id = id.v1[i], filename = f.v1[i])
pronghorn.sf <- st_as_sf(df.raw, coords = c("ECEF_X..m.","ECEF_Y..m.","ECEF_Z..m.")) %>%
  st_set_crs(4978) %>%
  st_transform(4326) %>%
  st_coordinates
with(df.raw, data.frame(
  File = filename,
  ID = CollarID,
  DateTime = mdy_hms(paste(UTC_Date, UTC_Time)),
  Latitude = ll[,"Y"],
  Longitude = ll[,"X"],
  Elevation = ll[,"Z"])) %>%
  subset(!is.na(DateTime))

require(pronghorn)
data("pronghorn_gps")
str(pronghorn_gps)

## 'data.frame': 25184 obs. of 5 variables:
##  $ ID       : Factor w/ 43 levels "451","F_61_8251",..: 25 25 25 25 25 25 25 25 25 25 ...
##  $ DateTime : POSIXct, format: "2017-12-07 02:00:12" "2017-12-07 13:00:38" ...
##  $ Latitude : num 32.4 32.4 32 32 32 ...
##  $ Longitude: num -113 -113 -113 -113 -113 ...
##  $ Elevation: num 458 451 583 530 538 ...
- R folder contains code
- data folder contains data - as .rda
- man folder contains documentation
- DESCRIPTION - file contains essential info
- NAMESPACE - complicated file (mainly automated)

DESCRIPTION file

Package: pronghorn
Type: Package
Title: Sonoran pronghorn analysis project
Version: 0.1.0
Author: Elie, Nicky, others
Maintainer: The package maintainer <yourself@somewhere.net>
Description: The pronghorn package is a PRIVATE collaborative package
    containing processed data, code and results for analysis of Sonoran
    pronghorn.
License: PRIVATE
Encoding: UTF-8
LazyData: false
Depends: lubridate, magrittr, plyr, dplyr, ggplot2, ggpubr, sp, sf, stringr
Suggests: mapview
RoxygenNote: 7.1.1
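Because DESCRIPTION is a plain "Field: value" text file, it can be read back into R. The sketch below is only an illustration: it writes an abbreviated copy of the fields above to a temporary file (the shortened Depends list is an assumption made for brevity) and parses it with base R's read.dcf().

```r
# Abbreviated copy of the DESCRIPTION fields shown above (shortened for illustration)
desc_text <- c(
  "Package: pronghorn",
  "Type: Package",
  "Title: Sonoran pronghorn analysis project",
  "Version: 0.1.0",
  "Depends: lubridate, magrittr, dplyr"
)

# Write to a temporary file so nothing in the working directory is touched
desc_file <- tempfile("DESCRIPTION")
writeLines(desc_text, desc_file)

# read.dcf() returns a character matrix with one column per field
desc <- read.dcf(desc_file)
print(desc[1, "Version"])

# Split the comma-separated dependency list into a character vector
deps <- strsplit(desc[1, "Depends"], ",\\s*")[[1]]
print(deps)
```

The same call works on the DESCRIPTION file of any installed or source package.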
R help document
R man file
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/datadocumentation.R
\docType{data}
\name{pronghorn_gps}
\alias{pronghorn_gps}
\title{GPS data of Sonoran pronghorn}
\format{
Contains only five columns:
\describe{
  \item{File}{Original file name}
  \item{ID}{ID of animal}
  \item{DateTime}{Date and time in POSIXct}
  \item{Longitude,Latitude}{}
}
}
\source{
Unclear. Arizona DFG? Anyways - via Andy Goodwin.
}
\usage{
data(pronghorn_gps)
}
\description{
43 GPS collared pronghorn collared between 2008 and 2020
}
Streamlines documentation by turning “comments” into help files. You need to install the roxygen2 package and fiddle with some “build” settings.

#' GPS data of Sonoran pronghorn
#'
#' 43 GPS collared pronghorn collared between 2008 and 2020
#'
#' @usage
#' data(pronghorn_gps)
#'
#' @format Contains only five columns:
#' \describe{
#'   \item{File}{Original file name}
#'   \item{ID}{ID of animal}
#'   \item{DateTime}{Date and time in POSIXct}
#'   \item{Longitude,Latitude}{}
#' }
#' @example
#' examples/pronghorn_gps_examples.R
#' @source Unclear. Arizona DFG? Anyways - via Andy Goodwin.
#' @keywords data
- By hand
- utils::package.skeleton()
- usethis::create_package()
- Build directly off of an existing GitHub project
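To make the "by hand" option concrete, here is a minimal sketch that lays out a bare package skeleton using only base R. It builds the skeleton in a temporary directory, and the title string and the catch-all NAMESPACE line are placeholder assumptions, not the output of any particular tool.

```r
# Build the skeleton in a temporary location (assumption: swap in your own path)
pkg_dir <- file.path(tempdir(), "combinator")

# The three standard folders: code, data and documentation
for (d in c("R", "data", "man")) {
  dir.create(file.path(pkg_dir, d), recursive = TRUE, showWarnings = FALSE)
}

# Minimal DESCRIPTION, written in the "Field: value" format
write.dcf(
  data.frame(
    Package = "combinator",
    Title = "Placeholder Title",   # hypothetical placeholder text
    Version = "0.0.0.9000",
    stringsAsFactors = FALSE
  ),
  file = file.path(pkg_dir, "DESCRIPTION")
)

# Catch-all NAMESPACE that exports every object whose name starts with a letter
writeLines('exportPattern("^[[:alpha:]]+")', file.path(pkg_dir, "NAMESPACE"))

# Inspect what was created
print(list.files(pkg_dir, include.dirs = TRUE))
```

The helper functions listed above do the same job with more sensible defaults, which is why the walkthrough uses one of them.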
We’re going to take some of the scripts provided in fittingfunctions.R and the datasets single.csv and mixture.csv and systematically build a package called combinator (or any other name you choose).
It’s not too important what the package actually does, but basically it allows you to fit a logistic model and experiment with a competition model of two species of paramecium from a famous 1930s experiment by Georgy Gause.
First, know your working directory! And where you’d like your package to reside.
getwd()
## [1] "/home/elie/teaching/EFB_Rprogramming"
Also - if you’re on a Windows machine and you haven’t done so already, you will need to install the Rtools bundle of compilation tools. The easiest way to do that is with:
installr::install.Rtools()
You’re ready to build your package skeleton:
require(usethis)
create_package("combinator")
You’ll see output that looks like this:
Package: combinator
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.9000
Authors@R (parsed):
    * First Last <first.last@example.com> [aut, cre] (YOUR-ORCID-ID)
Description: What the package does (one paragraph).
License: `use_mit_license()`, `use_gpl3_license()` or friends to pick a license
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.0

Most importantly, it created a folder called combinator with all the correct structure, and opened a new RStudio project. You need to be inside of that project - it is a special package-building project.
Open the DESCRIPTION file and edit it with all sorts of useful information, including (e.g.) your license, name, and contact details.
There is a handy function:
use_gpl3_license("combinator")
That will edit the DESCRIPTION file for you, and generate a crazy long description of the GNU General Public License with the following message:

✓ Setting active project to '/home/elie/teaching/EFB_Rprogramming/combinator'
✓ Setting License field in DESCRIPTION to 'GPL-3'
✓ Writing 'LICENSE.md'
✓ Adding '^LICENSE\\.md$' to '.Rbuildignore'

None of that is important at all for a personal use package.
Read the two csv files from Blackboard.
single <- read.csv("content/single.csv")
mixture <- read.csv("content/mixture.csv")
Create a data directory in your package directory and SAVE the two data files with (importantly) an .rda extension:
save(single, file = "data/single.rda")
save(mixture, file = "data/mixture.rda")
They are now saved in an R-specific format which, independent of package building, you can always load via:
load("data/single.rda")
load("data/mixture.rda")
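One detail worth seeing in action: load() restores each object under the name it had when it was saved, rather than returning the data. The round trip below is a small sketch that uses a made-up stand-in data frame (the values are hypothetical, not the Gause data) and a temporary file.

```r
# Hypothetical stand-in for the real dataset
single <- data.frame(Day = 0:3, aurelia = c(2, 5, 12, 30))

# Save to a temporary .rda file
rda_file <- file.path(tempdir(), "single.rda")
save(single, file = rda_file)

# Keep a copy for comparison, then remove the object from the workspace
original <- single
rm(single)

# load() recreates `single` and (invisibly) returns the names it restored
loaded_names <- load(rda_file)
print(loaded_names)

# The restored object matches what was saved
print(identical(single, original))
```

This is why a single .rda file can hold several objects, and why the file name and the object name do not have to agree (although in a package it is clearer when they do).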
Take the following three functions from fittingfunctions.R or just copy paste from these slides:

logistic <- function(x, N0, K, r0){
  K/(1 + ((K - N0)/N0)*exp(-r0*x))
}

fitLogistic <- function(data, y = "N", time = "Day", N0 = 1, K = 200, r0 = 0.75){
  Y <- with(data, get(y))
  X <- with(data, get(time))
  myfit <- nls(Y ~ logistic(X, N0, K, r0),
               start = list(N0 = N0, K = K, r0 = r0))
  summary(myfit)
}

linesLogistic <- function(au.fit, ...){
  curve(logistic(x,
                 N0 = au.fit$coefficients[1,1],
                 K = au.fit$coefficients[2,1],
                 r0 = au.fit$coefficients[3,1]),
        add = TRUE, ...)
}

and copy (for now) EACH of these into three separate files in the R directory called: logistic.R, fitLogistic.R and linesLogistic.R. Note - you could place all of these functions into a single file, but this is cleaner / better practice.
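Before packaging anything, it can help to confirm that the fitting step behaves sensibly. The sketch below is not the course data: it simulates a noisy logistic curve with assumed true values (N0 = 2, K = 200, r0 = 0.75, 5% multiplicative noise) and fits it with nls() in the same way fitLogistic does internally.

```r
# Logistic growth curve, as defined in the slides
logistic <- function(x, N0, K, r0) {
  K / (1 + ((K - N0) / N0) * exp(-r0 * x))
}

# Simulated stand-in data (all values here are assumptions for illustration)
set.seed(42)
sim <- data.frame(Day = 0:20)
sim$N <- logistic(sim$Day, N0 = 2, K = 200, r0 = 0.75) *
  exp(rnorm(nrow(sim), sd = 0.05))

# Nonlinear least squares fit, starting from the default guesses used in fitLogistic
fit <- nls(N ~ logistic(Day, N0, K, r0),
           data = sim,
           start = list(N0 = 1, K = 200, r0 = 0.75))

# Named vector of estimates: N0, K, r0
est <- coef(fit)
print(round(est, 2))
```

If the estimates land close to the values used in the simulation, the function definitions survived the copy and paste intact.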
roxygen
Roxygen is a tool that lets you create documentation directly by, essentially, adding comments to your script files. This is very important and a great simplification over the olden days, when help files were made by hand and you had to match a whole bunch of braces.
First, install the package
install.packages("roxygen2")
And then - (a subtle step) - go to Build > Configure Build Tools in the menu, click on Configure, and click on the little empty square next to Build and Restart.
Now - modify the logistic.R script file as follows:

#' Logistic function
#'
#' This is the logistic function
#' which grows to a carrying capacity
#'
#' @export
logistic <- function(x, N0, K, r0){
  K/(1 + ((K - N0)/N0)*exp(-r0*x))
}
This will generate a fairly barebones help file, with a title and a brief description. Note, every comment line leads off with a #'. Also - of huge importance - the #' @export line tells the R package that you want to “export”, i.e. make available, this function.
Press Ctrl-Shift-B or choose Build > Clean and Rebuild.
All sorts of exciting things will happen (and should work). Eventually you’ll get a message that says:
Restarting R session...
> library(combinator)
Which means your package is built and ready to go!
Now, type in ?logistic and bask in the glory of your first help file. Click on index at the bottom of the help file.
Exercise: Modify the other two functions with a title and a description and rebuild the package.
Here are some additions to the help file that provide more context:
#' Logistic function
#'
#' This is the logistic function
#' which grows to a carrying capacity
#'
#' @param x time
#' @param N0 initial population size
#' @param K carrying capacity
#' @param r0 intrinsic growth rate
#' @examples curve(logistic(x, .01, 1, 10))
#'
#' @export
logistic <- function(x, N0, K, r0){
  K/(1 + ((K - N0)/N0)*exp(-r0*x))
}
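The parameter descriptions can be checked numerically. The sketch below reuses the values from the documented example (N0 = 0.01, K = 1, r0 = 10) and confirms three properties that follow from the formula: the curve starts at N0, levels off at K, and passes through K/2 at time log((K - N0)/N0)/r0.

```r
# Logistic growth curve, as defined in the slides
logistic <- function(x, N0, K, r0) {
  K / (1 + ((K - N0) / N0) * exp(-r0 * x))
}

N0 <- 0.01
K  <- 1
r0 <- 10

# At time zero the curve equals the initial population size
start_value <- logistic(0, N0, K, r0)

# Far along the time axis it approaches the carrying capacity
late_value <- logistic(100, N0, K, r0)

# The midpoint K/2 is reached where the exponential term equals 1
t_half <- log((K - N0) / N0) / r0
half_value <- logistic(t_half, N0, K, r0)

print(c(start = start_value, late = late_value, half = half_value))
```

Checks like these are also good candidates for the @examples section, since they show readers what each argument controls.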
This is a slightly odd thing. But if you want to both access and document the data (and you should), then you need to make a new R script file, e.g. datadocumentation.R, and fill out a documentation block just as for a function.
With that, rebuild the package and revisit the dataset.
#' Single separate paramecium growth
#'
#' Growth of two species of paramecium,
#' P aurelia and P caudatum.
#'
#' @usage data(single)
#'
#' @format Three columns:
#' \describe{
#'   \item{Day}{Day of experiment}
#'   \item{caudatum}{volume of P caudatum}
#'   \item{aurelia}{volume of P aurelia}
#' }
#'
#' @examples
#' data(single)
#' plot(aurelia ~ Day, data = single)
#' @source Gause (1934) The Struggle for Existence
#' @keywords data
"single"
You can write compact example code and tuck it into separate files. For example, save the following script into a file called fitLogisticExample.R in an examples folder at the top level of the package:

require(combinator)
data(single)
plot(aurelia ~ Day, data = single, col = 1)
points(caudatum ~ Day, data = single, col = 2)
fit1 <- fitLogistic(single, y = "aurelia", time = "Day", 1, 200, .75)
fit2 <- fitLogistic(single, y = "caudatum", time = "Day", 1, 200, .75)
linesLogistic(fit1, lwd = 3)
linesLogistic(fit2, col = 2, lwd = 3)
And now add the following line to the fitLogistic.R file:
#' @example examples/fitLogisticExample.R
And the complete example will show up at the bottom of the help file. I often do this, actually, for complex data files, so that I can tuck away portions of code that allow me to quickly and easily visualize the data frames.
Note - weirdly - @example takes the path to a separate script, whereas @examples takes example code written inline.
is gravy.
Other resources (varying techniques and amount of detail):