General Information

Credits: 2 | Meets: MW, 10:35-11:30am | Location: Baker 309

Instructors

Hyatt Green
Email:
Phone: 315-470-4814
Office: 201 Illick Hall
Office hours: W 1-2pm

Elie Gurarie
Email:
Phone: 315-470-3817
Office: 206 Illick Hall
Office hours: M 2-3pm

Course Description

Although technological advances and automation, from next-generation sequencing at the molecular scale to remote sensing of global phenomena at the macro-scale, have revolutionized research in many fields, they have also dramatically increased the size and number of typical datasets. R allows us to wield such large datasets and, thus, draw conclusions from potentially useful data. At the same time, scientists can also do a better job of ensuring validity of their analysis and communicating their results. The purpose of this course is to familiarize students with R and associated tools to a) allow handling and analysis of large or complex data and b) promote good practices for reproducible research.

R, a statistical programming language and environment, offers capabilities to analyze data sets, big or small. RMarkdown, an authoring format, allows you to embed your R work (data summaries, stats, plots, etc.) within plain text. The two are linked by knitr, which evaluates any R code embedded in an RMarkdown document and translates the code and its results into a form that a document converter (e.g., pdfLaTeX, pandoc) can read. Separately, these programs are extremely useful, but using them together will, in the end, help you accomplish many of the following research objectives.

  • Drastically improve reproducibility of your analyses
  • Save hours, days, or weeks of your time by automating your analysis
  • Improve your document’s appearance (which really does impress people)
  • Organize your work and ideas so that they are more accessible to you or others
  • Embed your analysis methods alongside your raw data and final results so they can be evaluated together as a single body of work
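
To make the knitr step described above concrete, here is a minimal sketch, assuming the knitr package is installed. The toy document, the variable names (fence, rmd_source, md_output), and the sample data are illustrative only and are not course material. It passes a small RMarkdown source to knit() as text, so R runs the embedded code and returns markdown with the results filled in.

```r
library(knitr)

# Three backticks, built programmatically so this example is easy to copy
fence <- strrep("`", 3)

# A tiny RMarkdown source held in memory (hypothetical content):
# one code chunk plus one inline R expression
rmd_source <- c(
  "Summary of a toy sample",
  "",
  paste0(fence, "{r}"),
  "x <- 1:10",
  "mean(x)",
  fence,
  "",
  "The mean of the sample is `r mean(x)`."
)

# knit() evaluates the R code and returns markdown text in which the
# chunk output and the inline result have been inserted
md_output <- knit(text = rmd_source, quiet = TRUE, envir = new.env())
cat(md_output)
```

In everyday use you would more likely click Knit in RStudio on a saved .Rmd file; the in-memory version above is only meant to show what knitr does before a converter such as pandoc takes over.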

Course Organization

We’ll learn R and RMarkdown side by side from the first week. Dr. Green will cover foundational R, then Dr. Gurarie will introduce a few intermediate topics building on that foundation.

Background and Motivation

Today’s world is flooded with data. We don’t feel right sending students, especially those who might stay in the sciences, out of ESF without the tools to analyze larger datasets. Given today’s pressing global issues, environmental scientists in particular will fall under intense scrutiny and will need the tools to manage and analyze large, complex data sets and to communicate the results of their analyses in an effective and responsible way.

Skill Outcomes

R

  • Get data into and out of R
  • Obtain data summaries
  • Create basic data summary plots
  • Use R’s basic built-in statistical functions
  • Write custom R functions efficiently
  • Demonstrate literate programming
  • Develop custom Shiny apps
  • Practice working with spatial data in R
  • Build packages in R
  • Demonstrate effective use of version control (git and GitHub)
  • Be familiar with ways to optimize compute speed using parallelized code
  • Reach beyond course content to individualize R to your needs

Prerequisites

There are no course prerequisites. Basic stats may be helpful at some points. Programming experience (Perl, Python, Matlab, C++, etc.) would be extremely helpful, but certainly not required since this course is designed to start at square one.

Required Materials

A laptop. Please attend the first lecture with the latest versions of R and RStudio installed. Also, install the required packages for R. Confirm that everything is installed and working properly before the first class.
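
One way to confirm your setup is sketched below. Note the assumption: this syllabus does not list the required packages, so the vector pkgs simply uses packages named elsewhere in this document (knitr, rmarkdown, ggplot2, shiny, plyr) as a stand-in; the official list may differ.

```r
# ASSUMPTION: stand-in list built from packages mentioned in this syllabus;
# replace it with the official required-package list when you have it
pkgs <- c("knitr", "rmarkdown", "ggplot2", "shiny", "plyr")

# TRUE for each package that is already installed and loadable
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report the R version and anything still missing
cat("R version:", as.character(getRversion()), "\n")
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  cat("Not yet installed:", paste(missing_pkgs, collapse = ", "), "\n")
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  cat("All listed packages are installed.\n")
}
```

The install line is left commented out so that running the check never changes your system by surprise.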

Is LaTeX required?

No. LaTeX is a typesetting language that enables you to produce good-looking documents (PDFs) with R code and output embedded. I highly recommend you at least become familiar with LaTeX; however, its installation takes up a relatively large amount of disk space (maybe 4-5 GB or so). You will need LaTeX if you choose to make your final presentation using beamer.

Use of AI

Use all the AI you need. It’s great at generating code that’s error free, good at explaining how that code works, but not great at experimental design, bringing meaning to results, and a few other important steps where humans are useful. SU and ESF students have free access to Claude and Gemini.

Evaluation

Homework (60%)

Ten homework assignments will be assigned throughout the semester, each worth 6% of your final grade. Assignments will reinforce concepts and techniques covered in class that week. You may drop your lowest homework score.

Submit your source document (.Rmd) and knitted output as either a Word document (.docx) or PDF (converted from HTML).

Final Project (40%)

Your final project demonstrates your ability to apply R skills to a real problem. Choose one of three formats:

Option A: R Package
Build an R package that includes at least three custom functions. Your package should include proper documentation (roxygen2 comments), a working NAMESPACE, and a README explaining the package’s purpose and usage.

Option B: Shiny Application
Build an interactive Shiny app that includes at least three custom functions. Your app should solve a meaningful problem or visualize data in a useful way, with a clean interface and clear documentation of its functionality.

Option C: Report
A classic approach to the final project that aligns more closely with a thesis or dissertation and has the typical sections: Introduction, Methods, Results, Discussion, and Conclusions. Your report should a) provide enough background about your study/data, b) describe the methods of data collection, treatment, and analysis, c) report the results, and d) discuss the results within a broader context. There is no page minimum or maximum, and the report must be generated using the reproducible methods used in class (i.e., RMarkdown, knitr).

Schedule an initial project meeting with Dr. Green or Dr. Gurarie by Feb 28 to discuss your project idea.

Schedule

Date | Instructor | Topic | HW
Mon Jan 12 | Green | Orientation, R, RStudio, CRAN, getting help |
Wed Jan 14 | Green | R operators, built-in functions, saving work | HW1 assigned
Mon Jan 19 | | MLK Day (no class) |
Wed Jan 21 | Green | Data classes | HW1 due, HW2 assigned
Mon Jan 26 | Green | Data structures |
Wed Jan 28 | Green/Gurarie | Review ‘Quantum Progress’ exercise; data structures; knitr options | HW2 due, HW3 assigned
Mon Feb 2 | Green | Indexing matrices, data frames, lists |
Wed Feb 4 | Green | Source doc formatting; sorting/subsetting data | HW3 due, HW4 assigned
Mon Feb 9 | Green | Importing/exporting data |
Wed Feb 11 | Green | Plotting basics I | HW4 due
Mon Feb 16 | | Project workday |
Wed Feb 18 | Gurarie | More plotting | HW5 assigned
Mon Feb 23 | Gurarie | More plotting (i.e., exporting plots, ggplot2) |
Wed Feb 25 | Green | Summarizing and aggregating data | HW5 due, HW6 assigned
Mon Mar 2 | Green | Merging data |
Wed Mar 4 | Green | Writing/debugging functions | HW6 due, HW7 assigned
Mar 8-15 | | Spring Break (no class) |
Mon Mar 16 | Gurarie | More on functions |
Wed Mar 18 | Green | if, else, and ifelse() |
Mon Mar 23 | Gurarie | apply and becoming a list ninja |
Wed Mar 25 | Green | ddply() and Date classes | HW7 due, HW8 assigned
Mon Mar 30 | | Review + project work |
Wed Apr 1 | Gurarie | Shiny apps |
Mon Apr 6 | Green | Regular expressions |
Wed Apr 8 | Gurarie | R packages I |
Mon Apr 13 | Gurarie | R packages II: building and testing |
Wed Apr 15 | | Project work day / office hours | HW8 due
Mon Apr 20 | | Project work day / office hours |
Wed Apr 22 | | Final presentations (5 min + 2 min Q&A) |
Mon Apr 27 | | Final presentations |
Final exam slot | | Final presentations |

Students with Learning and Physical Disabilities

SUNY-ESF works with the Center for Disability Resources (CDR) at Syracuse University, which is responsible for coordinating disability-related accommodations. Students can contact CDR at 804 University Avenue, Room 309, 315-443-4498, to schedule an appointment and discuss their needs and the process for requesting accommodations. Students may also contact the ESF Office of Student Affairs, 110 Bray Hall, 315-470-6660, for assistance with the process. To learn more about CDR, visit https://disabilityresources.syr.edu. Authorized accommodation forms must be in the instructor’s possession one week prior to any anticipated accommodation. Since accommodations may require early planning and generally are not provided retroactively, please contact CDR as soon as possible.

Academic Dishonesty Statement

Academic dishonesty in any form will not be tolerated. This includes, but is not limited to, turning in work that is not your own in its entirety, using unauthorized materials during tests or quizzes, aiding others in dishonest behavior, or any attempts to deceive the instructor about anything. Any participation in academic dishonesty will result in a failing grade.

Checks for plagiarism from your peers and internet resources will be performed routinely. I’m actually pretty proud of my plagiarism checker R scripts.

Academic dishonesty is a sign of disrespect and a breach of trust between a student and their fellow students or instructor(s). By registering for courses at ESF you acknowledge your awareness of the ESF Code of Student Conduct (https://www.esf.edu/student-affairs/handbook/current/code-of-conduct.php). In particular, academic dishonesty includes, but is not limited to, plagiarism, cheating, and other forms of academic misconduct. The Academic Integrity Handbook contains further information and guidance (https://www.esf.edu/student-affairs/handbook/current/appendix-b.php).

AI-Assisted Learning Analytics

This course uses Claude AI (via ESF’s enterprise agreement with Anthropic) to analyze homework submissions and identify class-wide learning patterns. This approach aligns with the SUNY Faculty Advisory Council on Teaching and Technology (FACT2) guidelines for responsible AI integration in higher education, which emphasize transparent use, human oversight, and student agency. We will use Sonnet 4.5 or later rather than more advanced and computationally intensive models such as Opus 4.5.

Key Protections:

  • Your data is protected under FERPA and is never used to train AI models
  • AI analysis informs instruction but does not determine grades
  • All grading decisions are made by the instructor through independent review
  • You may opt out or request human-only assessment at any time
  • You may appeal any grade within 5 business days if you believe AI analysis influenced it unfairly

Above all, we will follow Stanford’s AI Golden Rule: “Use or share AI outputs as you would have others use or share output with you”.

For questions about AI use in this course, contact the instructors.

Downloading R and RStudio

To download R visit: http://www.r-project.org/
To download RStudio visit: https://posit.co/download/rstudio-desktop/

Getting Started with R and RStudio

Windows

On the Windows download page, click on base, then download the installer for the latest stable version of R for Windows. It will take a few minutes to download, and I recommend you save it to your desktop (instead of running it directly from the web). Once the download is complete, double-click the icon to start the installation. The installer asks a few simple questions; accept the default settings.

Mac OS X

The R Project page provides some helpful advice depending on your version of the Mac OS. Follow the instructions, and if there is a problem we can help you figure it out.

Getting Started with LaTeX

Downloading LaTeX

To download LaTeX visit: https://www.latex-project.org/get/

Download/Install Help

Do not contact me for help downloading and installing R or RStudio. I say this not because I’m an unhelpful SOB, but because you need to be able to figure this out on your own. But really, if you’ve tried all you can think of and then tried three more times and you’re still SOL, contact me.

I’ll be practically useless when it comes to troubleshooting LaTeX installs. Although I’ve found LaTeX extremely useful in the past 11 years or so, it is not my forte. Fortunately, most of the time LaTeX downloads and installs integrate seamlessly with RStudio. We’ll have to work together on solving any issues with LaTeX.

Other Useful Resources

Books

If you really want to get into it:

The Structure and Interpretation of Computer Programs by Harold Abelson and Gerald Jay Sussman

Advanced R by Hadley Wickham
See: http://www.r-project.org/doc/bib/R-books.html

R for Data Science by Hadley Wickham
See: https://r4ds.had.co.nz/index.html

Handling and Processing Strings in R by Gaston Sanchez
See: http://gastonsanchez.com/Handling_and_Processing_Strings_in_R.pdf

Data

rOpenSci: Packages to get data: https://ropensci.org/packages/