Complete all exercises below by filling in your answers in the designated spaces. When finished:
Due: Monday, February 4 by 11:59 PM
Data frames are the workhorse data structure in R. They are rectangular: each column can hold a different data type, but all columns must have the same length. Understanding this structure is essential for effective data analysis.
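As a minimal sketch of those two rules, the toy data frame below (the name pets and its values are hypothetical, not part of the assignment) mixes three column types, and a second call shows that columns whose lengths cannot be reconciled are rejected.

```r
# Hypothetical toy data for illustration only
pets <- data.frame(
  name   = c("Rex", "Tom", "Polly"),
  weight = c(30.5, 4.2, 0.4),
  indoor = c(FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE  # keep text as character on older R versions
)

# Each column keeps its own class
col_classes <- sapply(pets, class)
print(col_classes)

# Columns of length 3 and 2 cannot be combined, so data.frame() stops
# with an error; try() captures it instead of halting the script
bad <- try(data.frame(x = 1:3, y = 1:2), silent = TRUE)
print(inherits(bad, "try-error"))
```

Note that data.frame() recycles a shorter column when its length divides the longer one evenly, so the lengths 3 and 2 were chosen here to force the error.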
Load the built-in mtcars dataset and examine its
structure.
Use str() to examine the structure of
mtcars. How many observations (rows) and variables
(columns) does it have?
Use dim(), nrow(), and
ncol() to verify the dimensions. Show that these three
functions give you consistent information.
Use names() or colnames() to display
the column names. Use rownames() to display the first 10
row names.
Use class() to check what type of object
mtcars is. Then use typeof() on the same
object. Explain why these two functions return different
answers.
In your own words, explain why typeof() returns “list”
even though mtcars is clearly a data frame:
Delete this text and write your explanation here.
The head() and tail() functions are useful
for quickly inspecting data frames without printing the entire
object.
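The sketch below shows how the n argument controls how many rows are returned. It uses a hypothetical data frame called scores rather than the assignment data; when n is omitted, both functions default to 6 rows.

```r
# Hypothetical data: ten rows with an id and a score
scores <- data.frame(id = 1:10, score = seq(50, 95, by = 5))

first_two  <- head(scores, n = 2)  # rows 1 and 2
last_three <- tail(scores, n = 3)  # rows 8, 9, and 10

print(first_two)
print(last_three)
```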
Use head() to display the first 3 rows of
mtcars.
Use tail() to display the last 4 rows of
mtcars.
The summary() function provides statistical
summaries for each column. Run summary(mtcars) and examine
the output.
Delete this text and write your explanation here.
Before we index data frames, we need to master vector indexing. Vectors can be indexed by position (integer), by name (character), or by condition (logical).
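The three indexing styles can be sketched on a small named vector. The vector heights and its values are made up for illustration.

```r
# Hypothetical named vector
heights <- c(ann = 160, bob = 175, cat = 182)

by_position <- heights[2]              # integer index: second element
by_name     <- heights["cat"]          # character index: element named "cat"
by_logical  <- heights[heights > 170]  # logical index: keep where TRUE

print(by_position)
print(by_name)
print(by_logical)
```

In each case the result is still a named vector, so the names travel with the extracted values.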
Create a vector called temps containing the values 32,
68, 72, 98.6, 100, and 212. Assign names to these elements: “freezing”,
“cool”, “room”, “body”, “hot”, and “boiling”.
Display the entire named vector.
Extract the element named “body” using name indexing.
Extract the 2nd and 5th elements using integer indexing.
Extract all temperatures above 70 degrees using logical indexing.
Hint: create a logical vector first using temps > 70,
then use it to index.
R’s indexing system supports more than simple selection. Negative integer indices drop elements, and the logical operators & (and), | (or), and ! (not) combine conditions into a single logical vector.
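A short sketch of negative indexing and combined logical conditions follows, using a hypothetical vector x so that the exercises below remain yours to solve.

```r
# Hypothetical named vector
x <- c(a = 5, b = 12, c = 20, d = 31)

# Negative integers drop positions: remove the first and last elements
without_ends <- x[-c(1, length(x))]

# & keeps elements where both conditions hold
in_range <- x[x >= 10 & x <= 20]

# | keeps elements where at least one condition holds
extremes <- x[x < 10 | x > 30]

# ! inverts a logical vector
not_big <- x[!(x > 15)]

print(without_ends)
print(in_range)
print(extremes)
print(not_big)
```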
Using the temps vector from Question 3, extract all
elements except “freezing” and “boiling” using negative integer
indexing.
Create a logical vector that identifies which temperatures are between 60 and 100 degrees (inclusive). Use this to extract those temperature values.
Extract all temperature values that are either less than 35 or greater than 200.
The which() function returns the positions
where a logical condition is TRUE. Use which() to find the
positions of temperatures above 95 degrees.
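To illustrate the difference between a logical vector and the positions that which() returns, here is a small sketch on a hypothetical unnamed vector v.

```r
# Hypothetical numeric vector
v <- c(3, 8, 1, 9, 4)

is_big <- v > 3          # logical vector, one value per element
pos    <- which(is_big)  # integer positions where is_big is TRUE

print(is_big)
print(pos)
print(v[pos])            # same values as v[is_big]
```

Indexing with the positions gives the same values as indexing with the logical vector directly; which() is mainly useful when the positions themselves are needed.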
Data frames use the [row, column] notation for indexing.
You can use integers, names, or logical vectors for both rows and
columns.
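The [row, column] forms can be sketched on a small data frame with row names. The data frame df and its contents are hypothetical and chosen only to keep the output easy to read.

```r
# Hypothetical data frame with row names
df <- data.frame(
  price = c(1.5, 0.5, 3.0),
  qty   = c(10L, 25L, 4L),
  row.names = c("apple", "banana", "cherry")
)

cell    <- df[2, 1]          # single value: row 2, column 1
one_row <- df["cherry", ]    # empty column slot means "all columns"
one_col <- df$qty            # $ returns the column as a plain vector
sub     <- df[df$qty > 5, c("price", "qty")]  # logical rows, named columns

print(cell)
print(one_row)
print(one_col)
print(sub)
```

Leaving the row or column slot empty selects everything along that dimension, which is why the comma is still required in df["cherry", ].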
Using the mtcars dataset:
Extract the value in the 5th row and 3rd column using integer indexing.
Extract the entire row for the “Mazda RX4” using row name indexing.
Extract the mpg column using the $
operator and display the first 6 values.
Extract the mpg and cyl columns for all
cars using column name indexing within [ , ]. Display only
the first 4 rows.
Find all cars with mpg greater than 25. Hint: first create a logical vector, then use it to index rows.
How many cars have mpg greater than 25?
Delete this text and write your answer here.
More complex indexing often requires combining multiple conditions.
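As a rough sketch of the pattern, the example below filters rows with & and |, then summarizes a subset with mean(). The orders data frame is hypothetical and unrelated to the datasets used in the exercises.

```r
# Hypothetical data frame
orders <- data.frame(
  item     = c("pen", "book", "lamp", "desk"),
  price    = c(2, 15, 40, 120),
  in_stock = c(TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Rows where both conditions are TRUE
both <- orders[orders$price > 10 & orders$in_stock, ]

# Rows where at least one condition is TRUE
either <- orders[orders$price > 100 | !orders$in_stock, ]

# Subset a column first, then summarize it
mean_in_stock <- mean(orders$price[orders$in_stock])

print(both)
print(either)
print(mean_in_stock)
```

Each condition produces a logical vector with one value per row, so combining them with & or | still yields exactly one TRUE or FALSE per row.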
Extract all columns for cars that have exactly 4 cylinders. Display only the first 5 rows of the result.
Extract the mpg, cyl, and
hp columns for cars that have more than 100 horsepower AND
get more than 20 mpg. Use the & operator to combine
conditions.
Create a new data frame called efficient_cars
containing only cars with mpg > 25 or cyl == 4. How many rows does
this new data frame have?
Calculate the mean mpg for all cars with 6 cylinders. You’ll need
to first subset the data, then use mean().
Delete this text and write your explanation here.
Working with the iris dataset:
Load the dataset with data(iris) and display the
column names.
Create a logical vector that identifies all observations where
Petal.Length is greater than twice the
Petal.Width. How many observations meet this
criterion?
In your own words, explain why you might want to index a data frame by row names rather than row numbers:
Delete this text and write your explanation here.
Before submitting, make sure you have:
This homework is worth 6 points total.