Credits: 2 | Meets: MW, 10:35-11:30am | Location: Baker 309
| Hyatt Green | Elie Gurarie |
|---|---|
| Email: hgreen@esf.edu | Email: egurarie@esf.edu |
| Phone: 315-470-4814 | Phone: 315-470-3817 |
| Office: 201 Illick Hall | Office: 206 Illick Hall |
| Office hours: W 1-2pm | Office hours: M 2-3pm |
Although technological advances and automation, from next-generation sequencing at the molecular scale to remote sensing of global phenomena at the macro-scale, have revolutionized research in many fields, they have also dramatically increased the size and number of typical datasets. R allows us to wield such large datasets and, thus, draw conclusions from potentially useful data. At the same time, scientists can also do a better job of ensuring validity of their analysis and communicating their results. The purpose of this course is to familiarize students with R and associated tools to a) allow handling and analysis of large or complex data and b) promote good practices for reproducible research.
R, a statistical programming language and environment, offers
capabilities to analyze data sets, big or small. RMarkdown, an authoring
format, allows you to embed your R work (data summaries, stats, plots,
etc.) within plain text. The two are linked together by
knitr which evaluates and translates any R code or results
embedded in an RMarkdown document so it’s readable by a translator
(e.g. pdfLaTeX, pandoc). Separately, these programs are extremely
useful, but using them together will, in the end, help you accomplish
many of the following research objectives.
We’ll start learning both R and RMarkdown simultaneously. Dr. Green will cover foundational R, then Dr. Gurarie will introduce a few intermediate topics building on that foundation.
Today’s world is flooded with data. We don’t feel right sending students, especially those who might stay in the sciences, out of ESF without the tools to analyze larger datasets. Given the current pressing global issues, environmental scientists in particular will fall under intense scrutiny. They need the tools to manage and analyze large, complex data sets and to communicate the results of their analyses in an effective and responsible way.
There are no course prerequisites. Basic stats may be helpful at some points. Programming experience (Perl, Python, Matlab, C++, etc.) would be extremely helpful, but certainly not required since this course is designed to start at square one.
You will need a laptop. Please come to the first lecture with the latest versions of R and RStudio installed, along with the required R packages. Before the first class, confirm that everything installed correctly and runs.
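One way to confirm a working install from the R console is sketched below. The package names are an assumption: they are simply tools named in this syllabus (knitr, RMarkdown), and the official required list may differ.

```r
# Packages named in this syllabus; the official required list may differ (assumption).
pkgs <- c("knitr", "rmarkdown")

# R.version.string is a built-in character string describing the installed R version.
print(R.version.string)

# requireNamespace() returns TRUE if a package can be loaded and FALSE otherwise.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Names of any packages that still need install.packages().
missing_pkgs <- names(installed)[!installed]
print(missing_pkgs)
```

If `missing_pkgs` is not empty, running `install.packages(missing_pkgs)` and then repeating the check should resolve it.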
LaTeX is not required. LaTeX is a typesetting language that enables you to produce good-looking documents (PDFs) with R code and output embedded. I highly recommend you at least become familiar with LaTeX; however, its installation takes up a relatively large amount of disk space (maybe 4-5 GB or so). You will need LaTeX if you choose to make your final presentation using beamer.
Use all the AI you need. It’s great at generating code that’s error free, good at explaining how that code works, but not great at experimental design, bringing meaning to results, and a few other important steps where humans are useful. SU and ESF students have free access to Claude and Gemini.
Ten homework assignments will be assigned throughout the semester, each worth 6% of your final grade. Assignments will reinforce concepts and techniques covered in class that week. You may drop your lowest homework score.
Submit your source document (.Rmd) and knitted output as either a Word document (.docx) or PDF (converted from HTML).
Your final project demonstrates your ability to apply R skills to a real problem. Choose one of three formats:
Option A: R Package. Build an R package that includes at least three custom functions. Your package should include proper documentation (roxygen2 comments), a working NAMESPACE, and a README explaining the package’s purpose and usage.
Option B: Shiny Application. Build an interactive Shiny app that includes at least three custom functions. Your app should solve a meaningful problem or visualize data in a useful way, with a clean interface and clear documentation of its functionality.
Option C: Report. A classic approach to the final project that aligns more closely with a thesis or dissertation and has the typical sections: Introduction, Methods, Results, Discussion, and Conclusions. Your report should a) provide enough background about your study/data, b) describe the methods of data collection, treatment, and analysis, c) report the results, and d) discuss the results within a broader context. There is no page minimum or maximum, and the report must be generated using the reproducible methods used in class (i.e., RMarkdown, knitr).
Schedule an initial project meeting with Dr. Green or Dr. Gurarie by Feb 28 to discuss your project idea.
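For Option A, a custom function documented with roxygen2 comments might look like the sketch below. The function name and its purpose are hypothetical and chosen only for illustration; they are not a course requirement.

```r
#' Convert temperatures from Celsius to Fahrenheit
#'
#' Hypothetical example function, used only to illustrate roxygen2 comments.
#'
#' @param celsius Numeric vector of temperatures in degrees Celsius.
#' @return Numeric vector of temperatures in degrees Fahrenheit.
#' @examples
#' celsius_to_fahrenheit(c(0, 100))
#' @export
celsius_to_fahrenheit <- function(celsius) {
  # Fail early on non-numeric input rather than returning a confusing error later.
  stopifnot(is.numeric(celsius))
  celsius * 9 / 5 + 32
}
```

In a package, the `#'` lines are what roxygen2 reads to generate the help page, and the `@export` tag is what adds the function to the NAMESPACE.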
| Date | Instructor | Topic | HW |
|---|---|---|---|
| Mon Jan 12 | Green | Orientation, R, RStudio, CRAN, getting help | |
| Wed Jan 14 | Green | R operators, built-in functions, saving work | HW1 assigned |
| Mon Jan 19 | — | MLK Day (no class) | |
| Wed Jan 21 | Green | Data classes | HW1 due, HW2 assigned |
| Mon Jan 26 | Green | Data structures | |
| Wed Jan 28 | Green/Gurarie | Review ‘Quantum Progress’ exercise; data structures; knitr options | HW2 due, HW3 assigned |
| Mon Feb 2 | Green | Indexing matrices, data frames, lists | |
| Wed Feb 4 | Green | Source doc formatting; sorting/subsetting data | HW3 due, HW4 assigned |
| Mon Feb 9 | Green | Importing/exporting data | |
| Wed Feb 11 | Green | Plotting basics I | HW4 due |
| Mon Feb 16 | — | Project Workday | |
| Wed Feb 18 | Gurarie | More Plotting | HW5 assigned |
| Mon Feb 23 | Gurarie | More Plotting (i.e., exporting plots, ggplot2) | |
| Wed Feb 25 | Green | Summarizing and Aggregating data | HW5 due, HW6 assigned |
| Mon Mar 2 | Green | Merging data | |
| Wed Mar 4 | Green | Writing/debugging functions | HW6 due, HW7 assigned |
| Mar 8-15 | — | Spring Break (no class) | |
| Mon Mar 16 | Gurarie | More on functions | |
| Wed Mar 18 | Green | if, else, and ifelse() | |
| Mon Mar 23 | Gurarie | apply and becoming a list ninja | |
| Wed Mar 25 | Green | ddply() and Date classes | HW7 due, HW8 assigned |
| Mon Mar 30 | | Review + Project work | |
| Wed Apr 1 | Gurarie | Shiny Apps | |
| Mon Apr 6 | Green | Regular Expressions | |
| Wed Apr 8 | Gurarie | R packages I | |
| Mon Apr 13 | Gurarie | R packages II: building and testing | |
| Wed Apr 15 | — | Project work day / office hours | HW8 due |
| Mon Apr 20 | — | Project work day / office hours | |
| Wed Apr 22 | — | Final presentations (5 min + 2 min Q&A) | |
| Mon Apr 27 | — | Final presentations | |
| Final exam slot | — | Final presentations | |
SUNY-ESF works with the Center for Disability Resources (CDR) at Syracuse University, which is responsible for coordinating disability-related accommodations. Students can contact CDR at 804 University Avenue, Room 309, 315-443-4498 to schedule an appointment and discuss their needs and the process for requesting accommodations. Students may also contact the ESF Office of Student Affairs, 110 Bray Hall, 315-470-6660 for assistance with the process. To learn more about CDR, visit https://disabilityresources.syr.edu. Authorized accommodation forms must be in the instructor’s possession one week prior to any anticipated accommodation. Since accommodations may require early planning and generally are not provided retroactively, please contact CDR as soon as possible.
Academic dishonesty in any form will not be tolerated. This includes, but is not limited to, turning in work that is not your own in its entirety, using unauthorized materials during tests or quizzes, aiding others in dishonest behavior, or any attempts to deceive the instructor about anything. Any participation in academic dishonesty will result in a failing grade.
Checks for plagiarism from your peers and internet resources will be performed routinely. I’m actually pretty proud of my plagiarism checker R scripts.
Academic dishonesty is a sign of disrespect and a breach of trust between a student, one’s fellow students, and the instructor(s). By registering for courses at ESF you acknowledge your awareness of the ESF Code of Student Conduct (https://www.esf.edu/student-affairs/handbook/current/code-of-conduct.php). In particular, academic dishonesty includes, but is not limited to, plagiarism, cheating, and other forms of academic misconduct. The Academic Integrity Handbook contains further information and guidance (https://www.esf.edu/student-affairs/handbook/current/appendix-b.php).
This course uses Claude AI (via ESF’s enterprise agreement with Anthropic) to analyze homework submissions and identify class-wide learning patterns. This approach aligns with SUNY’s Faculty Advisory Council on Teaching and Technology (FACT2) guidelines for responsible AI integration in higher education, which emphasize transparent use, human oversight, and student agency. We will use Sonnet 4.5 or greater rather than more advanced, computationally intensive models such as Opus 4.5.
Key Protections:
Above all, we will follow Stanford’s AI Golden Rule: “Use or share AI outputs as you would have others use or share output with you”.
For questions about AI use in this course, contact the instructors.
To download R visit: http://www.r-project.org/
To download RStudio visit: https://posit.co/download/rstudio-desktop/
On the Windows download page, click on base, then download the installer for the latest stable version of R for Windows. It will take a few minutes to download, and I recommend you save it to your desktop (instead of running it directly from the web). Once the download is complete, double-click the icon to start the installation. Simple questions are posed during the installation. Accept the default settings.
The R Project page provides some helpful advice depending on your version of macOS. Follow the instructions, and if there is a problem we can help you figure it out.
To download LaTeX visit: https://www.latex-project.org/get/
Do not contact me for help downloading and installing R or RStudio. I say this not because I’m an unhelpful SOB, but because you need to be able to figure this out on your own. But really, if you’ve tried all you can think of and then tried three more times and you’re still SOL, contact me.
I’ll be practically useless when it comes to troubleshooting LaTeX installs. Although I’ve found LaTeX extremely useful in the past 11 years or so, it is not my forte. Fortunately, most of the time LaTeX downloads and installs integrate seamlessly with RStudio. We’ll have to work together on solving any issues with LaTeX.
If you really want to get into it:
The Structure and Interpretation of Computer Programs by Harold Abelson and Gerald Jay Sussman
Advanced R by Hadley Wickham
See: http://www.r-project.org/doc/bib/R-books.html
R for Data Science by Hadley Wickham
See: https://r4ds.had.co.nz/index.html
Handling and Processing Strings in R by Gaston Sanchez
See: http://gastonsanchez.com/Handling_and_Processing_Strings_in_R.pdf
CRAN: http://r-project.org
TaskViews: http://cran.r-project.org/web/views
Advanced R: http://adv-r.had.co.nz
Quick-R: http://www.statmethods.net
R-bloggers: http://www.r-bloggers.com
RTutor: http://rtutor.ai
spatial cheatsheet: http://www.maths.lancs.ac.uk/~rowlings/Teaching/UseR2012/cheatsheet.html
spatial tutorial: https://github.com/Robinlovelace/Creating-maps-in-R
Another R course for Python Programmers: https://ramnathv.github.io/pycon2014-r/
Debugging in R: http://www.stats.uwo.ca/faculty/murdoch/software/debuggingR/
ggplot2 quick reference: http://zevross.com/blog/2014/08/04/beautiful-plotting-in-r-a-ggplot2-cheatsheet-3/
help on reshaping data: http://seananderson.ca/2013/10/19/reshape.html
Ordination for Ecologists: http://ordination.okstate.edu
Writing formulae: http://ww2.coastal.edu/kingw/statistics/R-tutorials/formulae.html
R popularity: http://r4stats.com/2015/10/13/rexer-analytics-survey-results/
Networks in R: http://kateto.net/network-visualization
rOpenSci: Packages to get data: https://ropensci.org/packages/