Review

  • Which data structures have we learned so far?
    • Vector:
      • Atomic Vector
      • List
    • Matrix
    • Array

Data Frames

  • Data frames are the two-dimensional version of a list.
  • Store data in a table format, like an Excel spreadsheet.

Data Frames

faithful

Data Frames

  • Data frames group vectors together into a two-dimensional table.
  • Each vector becomes a column in the table.
    • Hence, every column must be of the same length.
  • Each column of a data frame can contain a different type of data.
  • But within a column, every cell must be of the same type of data.
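  • A minimal sketch of these rules, using a made-up data frame (the names df, name, score, and passed are illustrative only):

```r
# Each column can hold a different type of data
df <- data.frame(name = c("a", "b", "c"),
                 score = c(90, 85.5, 70),
                 passed = c(TRUE, TRUE, FALSE))

# score is "numeric" and passed is "logical";
# name is "character" in R >= 4.0.0 (a factor in older versions)
sapply(df, class)

# Columns whose lengths do not fit together raise an error.
# tryCatch() turns that error into a message so the script keeps running.
result <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) "error: columns have differing numbers of rows"
)
result
```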

Data Frames

A data frame is a list of vectors of the same length.
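  • One way to check this claim yourself, sketched with the same three cards used on the next slides:

```r
cards <- data.frame(face = c("ace", "two", "six"),
                    suit = c("clubs", "clubs", "clubs"),
                    value = c(1, 2, 3))

is.list(cards)         # TRUE: a data frame is a list underneath
typeof(cards)          # "list"
length(cards)          # 3: one list element per column
sapply(cards, length)  # every column has length 3
dim(cards)             # 3 3: rows, then columns
```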

Creating a Data Frame

  • Use the data.frame() function to create a data frame by hand:
cards <- data.frame(face = c("ace", "two", "six"),
                    suit = c("clubs", "clubs", "clubs"),
                    value = c(1, 2, 3))
cards

Creating a Data Frame

  • The str() function shows what types of objects are grouped together in the data frame.
str(cards)
## 'data.frame':    3 obs. of  3 variables:
##  $ face : Factor w/ 3 levels "ace","six","two": 1 3 2
##  $ suit : Factor w/ 1 level "clubs": 1 1 1
##  $ value: num  1 2 3
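  • The output above shows face and suit stored as factors, which is the default in R versions before 4.0.0. Since R 4.0.0, data.frame() keeps strings as character vectors unless you ask otherwise. A small sketch of the stringsAsFactors argument (cards_chr and cards_fct are illustrative names):

```r
# Keep strings as plain character vectors (the default since R 4.0.0)
cards_chr <- data.frame(face = c("ace", "two", "six"),
                        value = c(1, 2, 3),
                        stringsAsFactors = FALSE)

# Convert strings to factors (the default before R 4.0.0)
cards_fct <- data.frame(face = c("ace", "two", "six"),
                        value = c(1, 2, 3),
                        stringsAsFactors = TRUE)

class(cards_chr$face)   # "character"
class(cards_fct$face)   # "factor"
levels(cards_fct$face)  # "ace" "six" "two": levels are sorted alphabetically
```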

Loading Data Using RStudio Interface

  • Download the file deck.csv to your computer.
  • Let’s switch over to RStudio!

Take a Quick Look at the Data

  • head() returns the first six rows of the data set by default.
head(deck)

Take a Quick Look at the Data

  • tail() returns the last six rows of the data set by default.
tail(deck)
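  • Both head() and tail() take an optional second argument for the number of rows. A short sketch using the built-in faithful data set, so it runs without deck.csv:

```r
# Ask for a specific number of rows instead of the default six
first_three <- head(faithful, 3)
last_three  <- tail(faithful, 3)

first_three
last_three

nrow(head(faithful))  # 6: the default
nrow(faithful)        # total number of rows, for comparison
```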

Loading Data Using RStudio Interface

  • Find a dataset on Kaggle that you’re interested in.
  • Download it to your computer and load it into RStudio.

Using read.csv() Function

  • read.csv() reads a .csv file into a data frame.
  • Use ?read.csv to look at the documentation of the function.
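  • To experiment without a file on disk, read.csv() also accepts the CSV contents as a string through its text argument. A minimal sketch with two made-up rows (csv_text and deck_small are illustrative names):

```r
# CSV contents as a string: the first line is the header row
csv_text <- "face,suit,value
king,spades,13
queen,spades,12"

# text = ... reads from the string instead of a file path
deck_small <- read.csv(text = csv_text)

str(deck_small)    # 2 obs. of 3 variables
names(deck_small)  # "face" "suit" "value"
```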

Using read.csv() Function

deck <- read.csv("~/Desktop/deck.csv")
deck

Using read.csv() Function

  • We can also read a .csv file directly from the web.
gpa <- read.csv("https://raw.githubusercontent.com/wadefagen/datasets/master/gpa/uiuc-gpa-dataset.csv")
head(gpa)

Using read.csv() Function

  • Let’s try using the read.csv() function to read the dataset you previously downloaded.

Saving Data

  • write.csv() saves a data frame to a new .csv file.
write.csv(deck, file = "cards.csv", row.names = FALSE)
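  • A sketch of a full save-and-reload round trip, which also shows why row.names = FALSE is useful. It writes to a temporary file (via tempfile()) and uses a small hand-made data frame as a stand-in for deck:

```r
cards <- data.frame(face = c("ace", "two", "six"),
                    suit = c("clubs", "clubs", "clubs"),
                    value = c(1, 2, 3))

# A temporary file path, so nothing in your working directory is overwritten
path <- tempfile(fileext = ".csv")

# Without row names, the file reloads with the original columns
write.csv(cards, file = path, row.names = FALSE)
cards_back <- read.csv(path)
names(cards_back)      # "face" "suit" "value"

# With the default row.names = TRUE, the row names are saved as an
# extra unnamed first column, which read.csv() names "X"
write.csv(cards, file = path)
cards_with_rn <- read.csv(path)
names(cards_with_rn)   # "X" "face" "suit" "value"
```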

Comparing Data Frame

Selecting Values with Positive Numbers

  • To select a value at a specific location in a data frame, use [ , ]:
deck[1, 2]
## [1] spades
## Levels: clubs diamonds hearts spades
deck[1, c(1, 2, 3)]

Selecting Values with Positive Numbers

deck[10:14, 1:3]

Selecting Values with Positive Numbers

  • If you select two or more columns from a data frame, R will return a new data frame.
deck[1:2, c(1, 2, 3)]

Selecting Values with Positive Numbers

  • If you select a single column, R will return a vector.
deck[1:2, 1]
## [1] king  queen
## Levels: ace eight five four jack king nine queen seven six ten three two

Selecting Values with Positive Numbers

  • If you prefer a data frame instead, you can add the optional argument drop = FALSE between the brackets.
deck[1:2, 1, drop = FALSE]
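  • A quick way to see the difference, sketched on a small hand-made data frame rather than deck:

```r
cards <- data.frame(face = c("ace", "two", "six"),
                    suit = c("clubs", "clubs", "clubs"),
                    value = c(1, 2, 3))

single <- cards[1:2, 3]                # one column: simplified to a vector
kept   <- cards[1:2, 3, drop = FALSE]  # one column, but still a data frame
multi  <- cards[1:2, c(1, 3)]          # two columns: always a data frame

is.data.frame(single)  # FALSE
is.data.frame(kept)    # TRUE
is.data.frame(multi)   # TRUE
```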

Selecting Values with Negative Numbers

  • When you use negative numbers as indexes, R returns every element except those at the negated positions.
deck[-1, 1:3]

Selecting Values with Negative Numbers

deck[-(2:52), 1:3]
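  • The parentheses in -(2:52) matter, because the minus sign binds more tightly than the colon. A small sketch on a three-row stand-in for deck:

```r
cards <- data.frame(face = c("ace", "two", "six"),
                    suit = c("clubs", "clubs", "clubs"),
                    value = c(1, 2, 3))

-(2:3)  # -2 -3: negate the whole sequence
-2:3    # -2 -1 0 1 2 3: a sequence from -2 up to 3, not what we want

all_but_first <- cards[-1, ]      # drops row 1, keeps rows 2 and 3
only_first    <- cards[-(2:3), ]  # drops rows 2 and 3, keeps row 1

all_but_first
only_first
```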

Selecting Values with Blank Space

  • Use blank space to tell R to extract every value in a dimension.
deck[1, ]
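  • The blank works in either position. A short sketch on a hand-made data frame:

```r
cards <- data.frame(face = c("ace", "two", "six"),
                    suit = c("clubs", "clubs", "clubs"),
                    value = c(1, 2, 3))

second_row <- cards[2, ]   # row 2, every column (a one-row data frame)
values     <- cards[ , 3]  # every row of column 3 (a vector): 1 2 3

second_row
values
dim(cards[ , ])            # 3 3: both blank selects everything
```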

Selecting Values with Logical Value

  • If you supply a vector of TRUE and FALSE values as your index, R will match each TRUE and FALSE to a row (or column) in your data frame.
  • R will then return the rows (or columns) that correspond to a TRUE.

Selecting Values with Logical Value

deck[1, c(TRUE, TRUE, FALSE)]

Selecting Values with Logical Value

deck[c(TRUE, FALSE), ]
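  • In the row example above, the index c(TRUE, FALSE) is shorter than the number of rows, so R recycles it and keeps every other row. A sketch on a four-row stand-in for deck, which also builds the logical index from a comparison:

```r
cards <- data.frame(face = c("king", "queen", "jack", "ten"),
                    value = c(13, 12, 11, 10))

# c(TRUE, FALSE) is recycled to TRUE FALSE TRUE FALSE: rows 1 and 3
every_other <- cards[c(TRUE, FALSE), ]
every_other

# A comparison produces a logical vector with one entry per row
is_high <- cards[ , 2] > 11   # TRUE TRUE FALSE FALSE
high_cards <- cards[is_high, ]
high_cards
```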

Selecting Values with Names

deck[1, c("face", "suit")]
deck[ , "value"]
##  [1] 13 12 11 10  9  8  7  6  5  4  3  2  1 13 12 11 10  9  8  7  6  5  4  3  2
## [26]  1 13 12 11 10  9  8  7  6  5  4  3  2  1 13 12 11 10  9  8  7  6  5  4  3
## [51]  2  1
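  • Selecting by name keeps working even if the column order changes, which positions do not guarantee. A small sketch (cards and reordered are illustrative names):

```r
cards <- data.frame(face = c("ace", "two", "six"),
                    suit = c("clubs", "clubs", "clubs"),
                    value = c(1, 2, 3))

# Names can also be used to reorder columns
reordered <- cards[ , c("value", "face", "suit")]
names(reordered)       # "value" "face" "suit"

# The name finds the same column in both data frames
cards[1, "value"]      # 1
reordered[1, "value"]  # 1

# The position does not: column 3 is value in cards but suit in reordered
names(cards)[3]        # "value"
names(reordered)[3]    # "suit"
```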

Before We End

What is a data frame?
