Instructions

Complete all exercises below by filling in your answers in the designated spaces. When finished:

  1. Create your own .Rmd file
  2. Complete the homework, creating code chunks where needed
  3. Knit the document to HTML (click the “Knit” button or press Ctrl+Shift+K / Cmd+Shift+K)
  4. Submit both your .Rmd file and the knitted output as a Word or PDF file to Blackboard (Blackboard does not accept HTML uploads, so convert a knitted HTML file to PDF before submitting)

Due: Monday, February 4 by 11:59 PM


Part 1: Understanding Data Frame Structure

Data frames are the workhorse data structure in R. They’re rectangular structures where each column can have a different data class, but all columns must have the same length. Understanding their structure is essential for effective data analysis.
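
As a small illustration of the "different classes, same length" rule, here is a minimal sketch using a made-up data frame. The name pets and all of its values are hypothetical and are not part of the assignment data.

```r
# Hypothetical toy data frame; names and values are invented for illustration.
pets <- data.frame(
  name   = c("Rex", "Tom", "Nemo"),
  weight = c(30.5, 4.2, 0.1),
  indoor = c(FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE   # keep text columns as character on older R versions
)

# Each column keeps its own class.
col_classes <- sapply(pets, class)
print(col_classes)

# Every column has the same length, and that length equals the number of rows.
col_lengths <- sapply(pets, length)
print(col_lengths)
print(nrow(pets))
```

The same inspection functions work on any data frame, including the built-in datasets used in the questions below.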

Question 1 (0.75 points)

Load the built-in mtcars dataset and examine its structure.

  1. Use str() to examine the structure of mtcars. How many observations (rows) and variables (columns) does it have?

  2. Use dim(), nrow(), and ncol() to verify the dimensions. Show that these three functions give you consistent information.

  3. Use names() or colnames() to display the column names. Use rownames() to display the first 10 row names.

  4. Use class() to check what type of object mtcars is. Then use typeof() on the same object. Explain why these two functions return different answers.

In your own words, explain why typeof() returns “list” even though mtcars is clearly a data frame:

Delete this text and write your explanation here.


Question 2 (0.75 points)

The head() and tail() functions are useful for quickly inspecting data frames without printing the entire object.

  1. Use head() to display the first 3 rows of mtcars.

  2. Use tail() to display the last 4 rows of mtcars.

  3. The summary() function provides statistical summaries for each column. Run summary(mtcars) and examine the output.

Delete this text and write your explanation here.


Part 2: Indexing Vectors

Before we index data frames, we need to master vector indexing. Vectors can be indexed by position (integer), name (character), or logical values.
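
The three indexing styles can be sketched on a small made-up vector. The name scores and its values are hypothetical, chosen only to show the syntax.

```r
# Hypothetical named vector of quiz scores (values invented for illustration).
scores <- c(ana = 88, ben = 72, cy = 95, dee = 60)

by_position <- scores[c(1, 3)]   # integer indexing: 1st and 3rd elements
by_name     <- scores["ben"]     # character indexing: the element named "ben"
passed      <- scores >= 70      # logical vector with one TRUE/FALSE per element
by_logical  <- scores[passed]    # keeps only the elements where passed is TRUE

print(by_position)
print(by_name)
print(by_logical)
```

Note that all three styles preserve the element names in the result.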

Question 3 (0.75 points)

Create a vector called temps containing the values 32, 68, 72, 98.6, 100, and 212. Assign names to these elements: “freezing”, “cool”, “room”, “body”, “hot”, and “boiling”.

  1. Display the entire named vector.

  2. Extract the element named “body” using name indexing.

  3. Extract the 2nd and 5th elements using integer indexing.

  4. Extract all temperatures above 70 degrees using logical indexing. Hint: create a logical vector first using temps > 70, then use it to index.


Question 4 (0.75 points)

R’s indexing system allows powerful data manipulation. Understanding negative indexing and logical operations is particularly useful.
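
A minimal sketch of negative indexing, combined logical conditions, and which() on a made-up vector follows. The name depths and its values are hypothetical, so your own answers should still use temps.

```r
# Hypothetical named vector of water depths in metres (invented values).
depths <- c(shore = 0.5, reef = 12, shelf = 140, trench = 8000)

# Negative integers drop the listed positions: here the first and the last.
no_ends <- depths[-c(1, length(depths))]

# & keeps elements where both conditions are TRUE (an inclusive range).
mid_range <- depths[depths >= 10 & depths <= 150]

# | keeps elements where at least one condition is TRUE.
extremes <- depths[depths < 1 | depths > 1000]

# which() returns the positions (not the values) where the condition is TRUE.
deep_positions <- which(depths > 100)

print(no_ends)
print(mid_range)
print(extremes)
print(deep_positions)
```

Negative indexing works with positions only; a negative sign cannot be applied to element names.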

  1. Using the temps vector from Question 3, extract all elements except “freezing” and “boiling” using negative integer indexing.

  2. Create a logical vector that identifies which temperatures are between 60 and 100 degrees (inclusive). Use this to extract those temperature values.

  3. Extract all temperature values that are either less than 35 or greater than 200.

  4. The which() function returns the positions where a logical condition is TRUE. Use which() to find the positions of temperatures above 95 degrees.


Part 3: Indexing Data Frames

Data frames use the [row, column] notation for indexing. You can use integers, names, or logical vectors for both rows and columns.
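
The [row, column] notation can be sketched on a small made-up data frame. The name animals, its row names, and its values are hypothetical and only illustrate the syntax.

```r
# Hypothetical toy data frame with row names (invented values).
animals <- data.frame(
  weight = c(30.5, 4.2, 0.1),
  legs   = c(4, 4, 0),
  row.names = c("Rex", "Tom", "Nemo")
)

one_cell  <- animals[2, 1]                    # row 2, column 1
one_row   <- animals["Nemo", ]                # whole row, selected by row name
one_col   <- animals$weight                   # one column as a plain vector
some_cols <- animals[, c("weight", "legs")]   # columns by name, all rows
heavy     <- animals[animals$weight > 1, ]    # rows where the condition is TRUE

print(one_cell)
print(one_row)
print(heavy)
```

Leaving the row or column position empty, as in animals["Nemo", ], means "keep everything" along that dimension.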

Question 5 (1.5 points)

Using the mtcars dataset:

  1. Extract the value in the 5th row and 3rd column using integer indexing.

  2. Extract the entire row for the “Mazda RX4” using row name indexing.

  3. Extract the mpg column using the $ operator and display the first 6 values.

  4. Extract the mpg and cyl columns for all cars using column name indexing within [ , ]. Display only the first 4 rows.

  5. Find all cars with mpg greater than 25. Hint: first create a logical vector, then use it to index rows.

How many cars have mpg greater than 25?

Delete this text and write your answer here.


Question 6 (1 point)

More complex indexing often requires combining multiple conditions.
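
As a sketch of how conditions combine, the example below uses a made-up data frame. The name plants and its values are hypothetical, and the thresholds are arbitrary.

```r
# Hypothetical toy data frame (invented values).
plants <- data.frame(
  height = c(12, 45, 30, 8),
  leaves = c(4, 10, 6, 4),
  row.names = c("fern", "fig", "ivy", "moss")
)

# & requires both conditions to be TRUE for a row to be kept.
both <- plants[plants$height > 10 & plants$leaves > 5, ]

# | keeps a row when at least one condition is TRUE.
either <- plants[plants$height > 40 | plants$leaves == 4, ]

# Subset a column first, then summarise the values that remain.
mean_height_4_leaves <- mean(plants$height[plants$leaves == 4])

print(both)
print(nrow(either))
print(mean_height_4_leaves)
```

Use == (not =) to test for equality inside a condition.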

  1. Extract all columns for cars that have exactly 4 cylinders. Display only the first 5 rows of the result.

  2. Extract the mpg, cyl, and hp columns for cars that have more than 100 horsepower AND get more than 20 mpg. Use the & operator to combine conditions.

  3. Create a new data frame called efficient_cars containing only cars with mpg > 25 or cyl == 4. How many rows does this new data frame have?

  4. Calculate the mean mpg for all cars with 6 cylinders. You’ll need to first subset the data, then use mean().

Delete this text and write your explanation here.


Part 4: Working with Multiple Data Structures


Question 7 (0.5 points)

Working with the iris dataset:

  1. Load the dataset with data(iris) and display the column names.

  2. Create a logical vector that identifies all observations where Petal.Length is greater than twice the Petal.Width. How many observations meet this criterion?

In your own words, explain why you might want to index a data frame by row names rather than row numbers:

Delete this text and write your explanation here.


Submission Checklist

Before submitting, make sure you have:


This homework is worth 6 points total.