Starting with Data


To download this RMarkdown file, go to:

Click “Download” and save the RMarkdown (.Rmd) file in the same folder as your RStudio project.

Workshop Overview

Teaching: 20 minutes
Exercises: 10 minutes

Guiding Questions:

What is a data.frame?
How can I read a complete CSV file into R?
How can I get basic summary information about my dataset?
Why would I want strings to be treated differently?

Lesson Objectives:

Describe what a data frame is.
Load external data from a .csv file into a data frame.
Summarize the contents of a data frame.
Describe the difference between a factor and a string.

What are data frames and tibbles?

Data frames are the de facto data structure for tabular data in R, and what we use for data processing, statistics, and plotting.

A data frame represents data as a table whose columns are vectors that all have the same length. Data frames are analogous to the more familiar spreadsheet in programs such as Excel, with one key difference: because columns are vectors, each column must contain a single type of data (e.g., characters, integers, factors). For example, here is a figure depicting a data frame made up of a numeric, a character, and a logical vector.
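The following is a minimal R sketch of that structure. It builds a data frame from a numeric, a character, and a logical vector of the same length, then checks that each column holds one type. The column names and values are hypothetical, chosen only for illustration. `stringsAsFactors = FALSE` is passed explicitly so the character column stays a string even on R versions older than 4.0.0.

```r
# Three equal-length vectors become the three columns (hypothetical values)
df <- data.frame(
  number = c(1.5, 2, 3.25),      # numeric column
  letter = c("a", "b", "c"),     # character column
  flag   = c(TRUE, FALSE, TRUE), # logical column
  stringsAsFactors = FALSE       # keep strings as character, not factor
)

str(df)              # one line per column: its type and first values
print(sapply(df, class))

# A single vector cannot mix types: R coerces everything to a common type
mixed <- c(1, "a", TRUE)
print(class(mixed))  # all three elements become character strings

# Columns must have the same length; lengths that cannot be recycled fail
msg <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) conditionMessage(e)
)
print(msg)
```

The last step shows why a data frame behaves differently from a spreadsheet. R refuses to build a table whose columns have incompatible lengths instead of leaving empty cells.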