## Review

• Which data structures have we learned so far?
• Vector:
  • Atomic Vector
  • List
• Matrix
• Array

## Data Frames

• Data frames are the two-dimensional version of a list.
• Store data in a table format, like an Excel spreadsheet.

## Data Frames

faithful

## Data Frames

• Data frames group vectors together into a two-dimensional table.
• Each vector becomes a column in the table.
• Hence, every column must be of the same length.
• Each column of a data frame can contain a different type of data.
• But within a column, every cell must be of the same type of data.

## Data Frames

A data frame is a list of vectors of the same length.
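
The "list of vectors" description can be checked directly in R. This is a minimal sketch; the data frame `df` and its columns `x` and `y` are made up for illustration and are not part of the course data.

```r
# Hypothetical example data; df, x and y are illustrative names only.
df <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))

is.list(df)        # a data frame is stored as a list ...
length(df)         # ... whose length is the number of columns
lengths(df)        # every column (list element) has the same length
sapply(df, class)  # but the columns may hold different types
```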

## Creating a Data Frame

• Use the data.frame() function to create a data frame by hand:
cards <- data.frame(face = c("ace", "two", "six"),
suit = c("clubs", "clubs", "clubs"),
value = c(1, 2, 3))
cards

## Creating a Data Frame

• The str() function shows what types of objects are grouped together in the data frame.
str(cards)
## 'data.frame':    3 obs. of  3 variables:
##  $ face : Factor w/ 3 levels "ace","six","two": 1 3 2
##  $ suit : Factor w/ 1 level "clubs": 1 1 1
##  $ value: num  1 2 3
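
The Factor columns in this output come from R versions before 4.0.0, where data.frame() converted strings to factors by default. From R 4.0.0 on, the default is stringsAsFactors = FALSE, so the same call shows chr columns instead. A small sketch that sets the argument explicitly to compare both behaviours (the names `cards_chr` and `cards_fct` are illustrative):

```r
# Same data as the cards example, with the conversion made explicit.
cards_chr <- data.frame(face = c("ace", "two", "six"),
                        suit = c("clubs", "clubs", "clubs"),
                        value = c(1, 2, 3),
                        stringsAsFactors = FALSE)  # strings stay character

cards_fct <- data.frame(face = c("ace", "two", "six"),
                        suit = c("clubs", "clubs", "clubs"),
                        value = c(1, 2, 3),
                        stringsAsFactors = TRUE)   # strings become factors

str(cards_chr)
str(cards_fct)
```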

• Let's switch over to RStudio!

## Take a Quick Look at the Data

• head() returns the first six rows of the data set by default.
head(deck)

## Take a Quick Look at the Data

• tail() returns the last six rows of the data set by default.
tail(deck)
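
Both functions take an n argument to change how many rows are shown. A short sketch using the built-in faithful data set from earlier, so it runs without the deck file:

```r
# faithful ships with R, so no file needs to be loaded first.
first_three <- head(faithful, n = 3)  # first 3 rows instead of 6
last_three  <- tail(faithful, n = 3)  # last 3 rows instead of 6

first_three
last_three
```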

• Find a dataset on Kaggle that you're interested in.

## Using read.csv() Function

• read.csv() reads a .csv file into a data frame.
• Use ?read.csv to look at the documentation of the function.

## Using read.csv() Function

deck <- read.csv("~/Desktop/deck.txt")
deck

## Using read.csv() Function

• We can also read a .csv file directly from the web.
gpa <- read.csv("https://raw.githubusercontent.com/wadefagen/datasets/master/gpa/uiuc-gpa-dataset.csv")
head(gpa)

## Using read.csv() Function

• Let's use the read.csv() function to read the dataset you previously downloaded.

## Saving Data

• write.csv() saves a data frame to a new .csv file.
write.csv(deck, file = "cards.csv", row.names = FALSE)
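
One way to check a saved file is to read it straight back in. The sketch below uses the small cards data frame and a temporary file (`path` and `cards_back` are illustrative names) rather than the deck file, so it can be run anywhere. With row.names = FALSE, the row names are not written out as an extra first column.

```r
# Small stand-in data frame, same shape as the cards example.
cards <- data.frame(face = c("ace", "two", "six"),
                    suit = c("clubs", "clubs", "clubs"),
                    value = c(1, 2, 3))

path <- tempfile(fileext = ".csv")                 # throwaway file location
write.csv(cards, file = path, row.names = FALSE)   # save without row names

cards_back <- read.csv(path)                       # read the file back in
names(cards_back)
```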

## Selecting Values with Positive Numbers

• To select a value at a specific location in a data frame, use [ , ]:
deck[1, 2]
## [1] spades
## Levels: clubs diamonds hearts spades
deck[1, c(1, 2, 3)]

## Selecting Values with Positive Numbers

deck[10:14, 1:3]

## Selecting Values with Positive Numbers

• If you select two or more columns from a data frame, R will return a new data frame.
deck[1:2, c(1, 2, 3)]

## Selecting Values with Positive Numbers

• If you select a single column, R will return a vector.
deck[1:2, 1]
## [1] king  queen
## Levels: ace eight five four jack king nine queen seven six ten three two

## Selecting Values with Positive Numbers

• If you prefer a data frame instead, you can add the optional argument drop = FALSE between the brackets.
deck[1:2, 1, drop = FALSE]
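
The difference between the three cases is easiest to see by checking what kind of object comes back. This sketch uses a three-row stand-in for deck (made-up rows, for illustration only), so it runs without the file:

```r
# Small stand-in for deck; the real deck has 52 rows.
deck <- data.frame(face = c("king", "queen", "jack"),
                   suit = c("spades", "spades", "spades"),
                   value = c(13, 12, 11))

two_cols  <- deck[1:2, 1:2]              # two columns: a data frame
one_col   <- deck[1:2, 3]                # one column: a plain vector
one_col_df <- deck[1:2, 3, drop = FALSE] # one column, kept as a data frame

is.data.frame(two_cols)
is.data.frame(one_col)
is.data.frame(one_col_df)
```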

## Selecting Values with Negative Numbers

• When you use negative numbers as indexes, R returns every element except those at the negated positions.
deck[-1, 1:3]

## Selecting Values with Negative Numbers

deck[-(2:52), 1:3]
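
The parentheses in -(2:52) matter, because R applies the minus sign before the colon. A sketch on a three-row stand-in for deck (made-up rows, for illustration only):

```r
# Small stand-in for deck; the real deck has 52 rows.
deck <- data.frame(face = c("king", "queen", "jack"),
                   suit = c("spades", "spades", "spades"),
                   value = c(13, 12, 11))

rest  <- deck[-1, ]      # everything except row 1
first <- deck[-(2:3), ]  # everything except rows 2 and 3, i.e. only row 1

# Without parentheses, -2:3 means (-2):3, which mixes negative and
# positive positions; R does not allow that mix as an index.
mixed <- -2:3
mixed
```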

## Selecting Values with Blank Space

• Use blank space to tell R to extract every value in a dimension.
deck[1, ]

## Selecting Values with Logical Value

• If you supply a vector of TRUE and FALSE values as your index, R will match each TRUE and FALSE to a row (or column) in your data frame.
• R will then return the rows (or columns) that correspond to a TRUE.

## Selecting Values with Logical Value

deck[1, c(TRUE, TRUE, FALSE)]

## Selecting Values with Logical Value

deck[c(TRUE, FALSE), ]
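
When the logical vector is shorter than the number of rows, R recycles it, so c(TRUE, FALSE) keeps every other row. A sketch on a four-row stand-in for deck (the name `deck4` and its rows are made up for illustration):

```r
# Small stand-in for deck; the real deck has 52 rows.
deck4 <- data.frame(face = c("king", "queen", "jack", "ten"),
                    suit = c("spades", "spades", "spades", "spades"),
                    value = c(13, 12, 11, 10))

# c(TRUE, FALSE) is recycled to TRUE, FALSE, TRUE, FALSE: rows 1 and 3.
every_other <- deck4[c(TRUE, FALSE), ]

# One TRUE/FALSE per column: keep face and suit, drop value.
first_two_cols <- deck4[1, c(TRUE, TRUE, FALSE)]

every_other
first_two_cols
```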

## Selecting Values with Names

deck[1, c("face", "suit")]
deck[ , "value"]
##  [1] 13 12 11 10  9  8  7  6  5  4  3  2  1 13 12 11 10  9  8  7  6  5  4  3  2
## [26]  1 13 12 11 10  9  8  7  6  5  4  3  2  1 13 12 11 10  9  8  7  6  5  4  3
## [51]  2  1
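
Selecting by name follows the same return rules as selecting by position, and the columns come back in the order the names are given. A sketch on a three-row stand-in for deck (made-up rows, for illustration only):

```r
# Small stand-in for deck; the real deck has 52 rows.
deck <- data.frame(face = c("king", "queen", "jack"),
                   suit = c("spades", "spades", "spades"),
                   value = c(13, 12, 11))

names(deck)                             # the names available for selection

swapped <- deck[1, c("suit", "face")]   # columns in the requested order
values  <- deck[ , "value"]             # single column: a plain vector

swapped
values
```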

## Before We End

What is a data frame?