Instructions

Complete all exercises below by filling in your answers in the designated spaces. To complete and submit the homework:

  1. Create your own .Rmd file
  2. Complete the homework
  3. Knit the document to HTML (click the “Knit” button or press Ctrl+Shift+K / Cmd+Shift+K)
  4. Submit both your .Rmd file and the knitted document, as a Word or PDF file, to Blackboard (HTML files cannot be uploaded, but they can be converted to PDF)

Due: Wednesday, January 28 by 11:59 PM


Part 1: Data Classes

Question 1 (1 point)

R has several fundamental data classes: logical, character, and numeric (which includes integer and double). You can check an object’s class with class() or test for a specific class with functions like is.logical(), is.character(), and is.numeric().
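As a quick illustration of these functions, here is a minimal sketch. The object names (x_lgl, x_chr, x_num) and their values are made up for this example and are not the exercise items below.

```r
# Illustrative values only; these are not the exercise items.
x_lgl <- FALSE
x_chr <- "hello"
x_num <- 2.5

# class() reports the class of an object
class(x_lgl)         # "logical"
class(x_chr)         # "character"
class(x_num)         # "numeric"

# is.*() functions test for one specific class and return TRUE or FALSE
is.logical(x_lgl)    # TRUE
is.character(x_num)  # FALSE
is.numeric(x_num)    # TRUE
```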

For each of the following, first predict the class, then use class() to verify:

  1. TRUE
  2. "TRUE"
  3. 3.14159
  4. 100L
  5. 1:5

In your own words, explain the difference between TRUE and "TRUE":

Delete this text and write your explanation here.


Question 2 (1 point)

There are several ways to create vectors in R:

  • c() combines values into a vector
  • : creates a sequence of integers
  • seq() creates a sequence with more control over the output
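
The three approaches above can be sketched as follows. The values and object names (a, b, s, e) are arbitrary choices for illustration and differ from the vectors requested in this question.

```r
# Illustrative values only; these are not the requested vectors.
a <- c(2, 4, 8)                  # c(): combine arbitrary values
b <- 3:6                         # ":" operator: consecutive integers 3, 4, 5, 6
s <- seq(10, 20, by = 5)         # seq() with a step size: 10, 15, 20
e <- seq(0, 1, length.out = 3)   # seq() with a fixed number of values: 0, 0.5, 1

a
b
s
e
```

The by argument fixes the step between values, while length.out fixes how many values are returned and lets R work out the step.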

Create the following vectors using the most appropriate method:

  1. A vector containing the values 5, 10, 15, 20, 25
  2. A vector of integers from 1 to 100
  3. A vector from 0 to 1 in increments of 0.1
  4. A vector of 7 evenly spaced values between 0 and 100 (hint: use length.out)

Question 3 (0.5 points)

By default, R treats whole numbers as doubles (floating-point numbers), not integers. To explicitly create an integer, you append L to the number.
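
A small sketch of the same idea using typeof(), which reports how a value is stored. The number 42 is an arbitrary choice for illustration.

```r
# Illustrative value only; 42 is an arbitrary choice.
typeof(42)           # "double": no suffix, so stored as floating point
typeof(42L)          # "integer": the L suffix requests integer storage

42 == 42L            # TRUE: the two values compare as equal
identical(42, 42L)   # FALSE: the storage types differ
```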

Run the following code and explain the output:

is.integer(5)
is.integer(5L)
is.double(5)
is.double(5L)

Why does is.integer(5) return FALSE even though 5 is clearly a whole number?

Delete this text and write your explanation here.


Part 2: Working with Vectors

Question 4 (0.5 points)

You can assign names to vector elements using the names() function. This makes it easier to understand what each value represents.
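
For instance, a minimal sketch with a made-up vector. The name heights, its values, and its labels are assumptions for illustration and are not part of the exercise.

```r
# Illustrative vector only; not the one requested below.
heights <- c(150, 165, 180)

# Assign one name per element, in the same order as the values
names(heights) <- c("short", "average", "tall")

heights            # prints the values with their names above them
names(heights)     # retrieves the names
heights["tall"]    # elements can now be selected by name
```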

Create a vector called temps containing the values 32, 72, 98, and 212. Then assign the names “freezing”, “room”, “body”, and “boiling” to the elements. Finally, print the named vector.


Part 3: Factors

Factors are used to represent categorical data in R. They look like character vectors but are stored as integers with associated levels. This distinction matters for statistical modeling and plotting.
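
The integer storage can be seen in a short sketch. The vector of colors is invented for this example.

```r
# Illustrative data only.
colors_chr <- c("red", "blue", "red", "green")
colors_fct <- factor(colors_chr)

class(colors_fct)        # "factor"
typeof(colors_fct)       # "integer": the underlying storage
levels(colors_fct)       # "blue" "green" "red"
as.integer(colors_fct)   # 3 1 3 2: each code is a position in levels()
```

Each element is stored as the position of its label in the levels, so "red" becomes 3 because "red" is the third level.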

Question 5 (1.5 points)

The built-in mtcars dataset contains a column cyl representing the number of cylinders in each car’s engine. Although stored as numeric, this is really categorical data (cars come with 4, 6, or 8 cylinders - not 5.5).

  1. Load mtcars and examine the cyl column. What class is it currently?

  2. Create a new variable cyl_factor by converting mtcars$cyl to a factor.

  3. Use levels() to see the factor levels. Use table() to count how many cars have each number of cylinders.

  4. Use str() to examine the structure of your factor. Notice how it’s stored as integers (1, 2, 3) that map to the levels.

In your own words, why might it be useful to treat number of cylinders as a factor rather than a numeric variable?

Delete this text and write your explanation here.


Question 6 (1.5 points)

Factor levels have an order, which affects how they appear in tables, plots, and model output. By default, R orders levels alphabetically (for characters) or numerically.
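
A brief sketch of the default ordering and of one way to override it. The weather and doses vectors are made up for illustration and are separate from the tree-size data used in this question.

```r
# Illustrative data only; separate from the exercise data.
weather <- c("warm", "cold", "hot", "cold")

# Default: character values are sorted alphabetically
levels(factor(weather))                 # "cold" "hot" "warm"

# Default: numeric values are sorted numerically, not as text
doses <- c(10, 2, 33)
levels(factor(doses))                   # "2" "10" "33"

# Supplying levels explicitly sets the order
weather_fct <- factor(weather, levels = c("cold", "warm", "hot"))
levels(weather_fct)                     # "cold" "warm" "hot"
table(weather_fct)                      # counts appear in the specified order
```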

Consider tree size categories: “Large”, “Medium”, “Small”. Alphabetical order would display them as Large, Medium, Small - which happens to be backwards from a logical small-to-large progression.

  1. Create a character vector called sizes with these values: “Medium”, “Small”, “Large”, “Small”, “Medium”, “Large”, “Large”, “Small”, “Medium”, “Large”

  2. Convert sizes to a factor and use levels() to check the order. Notice they’re alphabetical.

  3. Use table() to count how many trees are in each size category. Notice the table follows alphabetical order.

  4. Recreate the factor with levels in a logical order: “Small”, “Medium”, “Large”. Hint: use the levels argument inside factor().

  5. Run table() again and confirm the output now displays in your specified order.

Why might the order of factor levels matter when creating plots or running statistical models?

Delete this text and write your explanation here.


Submission Checklist

Before submitting, make sure you have:


This homework is worth 6 points total.