# STAT 385 Lab 02

## Data Frames and Modifying Values

We will continue working with the deck of cards example from lecture. First, load the dataset into RStudio.
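
If the lecture's deck file is not at hand, a deck with the same layout can be rebuilt directly in R. This is only a minimal sketch, not the course's official loading code. It assumes the row order (spades, clubs, diamonds, hearts, each running from king down to ace) and starting values of 13 for the king down to 1 for the ace, which is consistent with the expected output later in this lab.

```r
# Assumed layout: 4 suits x 13 faces, values 13 (king) down to 1 (ace).
faces <- c("king", "queen", "jack", "ten", "nine", "eight", "seven",
           "six", "five", "four", "three", "two", "ace")
suits <- c("spades", "clubs", "diamonds", "hearts")

deck <- data.frame(
  face  = rep(faces, times = length(suits)),  # faces repeat within every suit
  suit  = rep(suits, each = length(faces)),   # each suit covers 13 consecutive rows
  value = rep(13:1, times = length(suits)),   # king = 13, ..., ace = 1
  stringsAsFactors = FALSE                    # keep face and suit as character
)

head(deck, 3)  # first few rows
nrow(deck)     # number of cards
```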

## Exercise 1: King of Hearts

### Modifying the deck

• We will modify the deck for a game called “King of Hearts”, where:
  • All kings get the highest value in the deck, 14.
  • All other cards in the hearts suit get a value of 10 (the king of hearts is the exception; its value should be 14).
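
Updates like these rely on logical subsetting with assignment. Here is a small sketch on a made-up data frame (the names `toy`, `name`, `group`, and `score` are hypothetical and not part of the lab). It shows that when two rules overlap, the assignment made last wins, so the order of the updates matters.

```r
# Hypothetical toy data, unrelated to the deck.
toy <- data.frame(
  name  = c("a", "b", "c", "d"),
  group = c("x", "y", "x", "y"),
  score = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Rule 1: every row in group "x" gets a score of 10.
toy$score[toy$group == "x"] <- 10

# Rule 2: row "a" gets a score of 99. Row "a" is also in group "x",
# so this later assignment overrides rule 1 for that row.
toy$score[toy$name == "a"] <- 99

toy$score  # 99 2 10 4
```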

Use the following code to check your new deck:

deck[deck$suit == "hearts", ]
##     face   suit value
## 40  king hearts    14
## 41 queen hearts    10
## 42  jack hearts    10
## 43   ten hearts    10
## 44  nine hearts    10
## 45 eight hearts    10
## 46 seven hearts    10
## 47   six hearts    10
## 48  five hearts    10
## 49  four hearts    10
## 50 three hearts    10
## 51   two hearts    10
## 52   ace hearts    10
deck[deck$face == "king", ]
##    face     suit value
## 1  king   spades    14
## 14 king    clubs    14
## 27 king diamonds    14
## 40 king   hearts    14

### Adding the jackpot column

• Now, create a new column called jackpot in the data frame deck.
• jackpot takes the value TRUE or FALSE.
• For the king of hearts, the value of jackpot is TRUE.
• For all other rows, the value of jackpot is FALSE.
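
A logical column can be created in one step by assigning the result of a comparison. The sketch below uses a made-up data frame (the names `toy`, `name`, `group`, and `special` are hypothetical) and combines two conditions with `&`, so only rows meeting both are TRUE.

```r
# Hypothetical toy data, unrelated to the deck.
toy <- data.frame(
  name  = c("a", "b", "c", "d"),
  group = c("x", "y", "x", "y"),
  stringsAsFactors = FALSE
)

# The comparison yields one TRUE/FALSE per row; assigning it to a new
# column name adds that column to the data frame.
toy$special <- toy$name == "c" & toy$group == "x"

toy$special  # FALSE FALSE TRUE FALSE
```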

After adding the column, your deck should look like this:

deck
##     face     suit value jackpot
## 1   king   spades    14   FALSE
## 2  queen   spades    12   FALSE
## 3   jack   spades    11   FALSE
## 4    ten   spades    10   FALSE
## 5   nine   spades     9   FALSE
## 6  eight   spades     8   FALSE
## 7  seven   spades     7   FALSE
## 8    six   spades     6   FALSE
## 9   five   spades     5   FALSE
## 10  four   spades     4   FALSE
## 11 three   spades     3   FALSE
## 12   two   spades     2   FALSE
## 13   ace   spades     1   FALSE
## 14  king    clubs    14   FALSE
## 15 queen    clubs    12   FALSE
## 16  jack    clubs    11   FALSE
## 17   ten    clubs    10   FALSE
## 18  nine    clubs     9   FALSE
## 19 eight    clubs     8   FALSE
## 20 seven    clubs     7   FALSE
## 21   six    clubs     6   FALSE
## 22  five    clubs     5   FALSE
## 23  four    clubs     4   FALSE
## 24 three    clubs     3   FALSE
## 25   two    clubs     2   FALSE
## 26   ace    clubs     1   FALSE
## 27  king diamonds    14   FALSE
## 28 queen diamonds    12   FALSE
## 29  jack diamonds    11   FALSE
## 30   ten diamonds    10   FALSE
## 31  nine diamonds     9   FALSE
## 32 eight diamonds     8   FALSE
## 33 seven diamonds     7   FALSE
## 34   six diamonds     6   FALSE
## 35  five diamonds     5   FALSE
## 36  four diamonds     4   FALSE
## 37 three diamonds     3   FALSE
## 38   two diamonds     2   FALSE
## 39   ace diamonds     1   FALSE
## 40  king   hearts    14    TRUE
## 41 queen   hearts    10   FALSE
## 42  jack   hearts    10   FALSE
## 43   ten   hearts    10   FALSE
## 44  nine   hearts    10   FALSE
## 45 eight   hearts    10   FALSE
## 46 seven   hearts    10   FALSE
## 47   six   hearts    10   FALSE
## 48  five   hearts    10   FALSE
## 49  four   hearts    10   FALSE
## 50 three   hearts    10   FALSE
## 51   two   hearts    10   FALSE
## 52   ace   hearts    10   FALSE

### Function draw()

• Write a function draw() that randomly draws 4 cards from the deck WITHOUT replacement and computes the sum of the values of the 4 cards drawn.
• Input: no input provided.
• Output: a number representing the sum of values of the 4 cards drawn.
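
As a hedged illustration of the pattern (not the lab solution), the sketch below draws rows from a made-up data frame without replacement and sums one column. The names `scores`, `points`, and `draw_three()` are hypothetical. No expected value is shown because the result depends on the seed and the R version.

```r
# Hypothetical toy data, unrelated to the deck.
scores <- data.frame(
  id     = 1:10,
  points = c(5, 3, 8, 1, 9, 2, 7, 4, 6, 10)
)

draw_three <- function() {
  # sample() draws WITHOUT replacement by default (replace = FALSE),
  # so the same row index cannot be picked twice.
  rows <- sample(nrow(scores), size = 3)
  picked <- scores[rows, ]
  sum(picked$points)
}

set.seed(1)  # makes the random draw repeatable
total <- draw_three()
total
```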

Use the following code to test your function:

set.seed(385)
draw()
## [1] 22
set.seed(420)
draw()
## [1] 36
set.seed(400)
draw()
## [1] 29
set.seed(2020)
draw()
## [1] 31

### Function win_jackpot()

• This is a strange game where a person can win a jackpot in 3 possible ways:
  • Getting the king of hearts among the 4 cards drawn.
  • OR getting 4 cards from the same suit.
  • OR getting 4 cards with the same face.
• Write a function win_jackpot() that:
  • randomly draws 4 cards from the deck WITHOUT replacement,
  • prints the 4 cards using the print() function,
  • and returns TRUE if the 4 cards drawn satisfy at least one of the above conditions, FALSE otherwise.
• Hint: You might want to take a look at the unique() function to test the 2nd and 3rd conditions.
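
To illustrate the hint, here is a small sketch of how unique() can test whether every element of a vector is the same, and how any() tests whether at least one logical value is TRUE. The helper name `all_same()` and the example vectors are made up for illustration.

```r
# Hypothetical helper: TRUE when every element of x is identical.
all_same <- function(x) {
  # unique() drops repeated values; one remaining value means
  # all elements were equal.
  length(unique(x)) == 1
}

all_same(c("clubs", "clubs", "clubs"))   # TRUE
all_same(c("clubs", "spades", "clubs"))  # FALSE

# any() is TRUE as soon as one element of a logical vector is TRUE.
flags <- c(FALSE, TRUE, FALSE)
any(flags)                               # TRUE
```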
set.seed(385)
win_jackpot()
##     face   suit value jackpot
## 6  eight spades     8   FALSE
## 26   ace  clubs     1   FALSE
## 23  four  clubs     4   FALSE
## 18  nine  clubs     9   FALSE
## [1] FALSE
set.seed(35)
win_jackpot()
##     face   suit value jackpot
## 10  four spades     4   FALSE
## 6  eight spades     8   FALSE
## 8    six spades     6   FALSE
## 1   king spades    14   FALSE
## [1] TRUE
set.seed(174)
win_jackpot()
##    face   suit value jackpot
## 22 five  clubs     5   FALSE
## 1  king spades    14   FALSE
## 48 five hearts    10   FALSE
## 40 king hearts    14    TRUE
## [1] TRUE
set.seed(2260)
win_jackpot()
##    face     suit value jackpot
## 13  ace   spades     1   FALSE
## 52  ace   hearts    10   FALSE
## 39  ace diamonds     1   FALSE
## 26  ace    clubs     1   FALSE
## [1] TRUE

### Running simulation on win_jackpot()

Now we want to estimate the probability of winning the jackpot in “King of Hearts”. One way to do that is a simulation study: run the win_jackpot() function many times and record the proportion of runs in which we win. In this example, the number of simulations is 5000.

<!!!>BEWARE<!!!>: You need to change your win_jackpot() function before progressing to the next part.

• First, write a new function called win_jackpot2():
• Copy the code from your original win_jackpot() function.
• Remove the print() statement from the code.
• Test your win_jackpot2() function:
set.seed(174)
win_jackpot2()
## [1] TRUE
• Use the following code to run the simulation studies:
• Make sure you copy and paste! If you retype the code, remember win_jackpot2()!!! Missing the 2 will cost you heavily!
results <- replicate(win_jackpot2(), n = 5000)
• Now, use results to compute the estimated probability of winning a jackpot. Your answer should be 0.0886.
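
The same simulation idea can be sketched on a simpler, unrelated event: estimating the probability that a fair die shows a six. The variable names `rolls` and `estimate` are made up for illustration. The key point is that replicate() returns a logical vector here, and the mean of a logical vector is the proportion of TRUE values.

```r
set.seed(385)  # makes the simulation repeatable

# Each replicate is one simulated roll: TRUE if the die shows a six.
rolls <- replicate(n = 10000, sample(6, size = 1) == 6)

# TRUE counts as 1 and FALSE as 0, so mean() gives the share of sixes.
estimate <- mean(rolls)
estimate  # should land close to 1/6
```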

## Exercise 2: 2015 Flight Delays and Cancellations Data

Next, we take a quick look at the 2015 Flight Delays and Cancellations data provided by the U.S. Department of Transportation. This is a huge dataset available on Kaggle. We will only look at flights departing from O’Hare International Airport (ORD) in January 2015.

### Load data

• I have extracted the data specific to O’Hare and stored it in ohare_jan.csv. You can TRY to load the data from the following URL: https://nkha149.github.io/stat385-sp2020/files/data/ohare_jan.csv.
• You will probably get an error!
• What do you do?
• Try downloading the file and load the data through the RStudio interface instead!
• Save the data into a data frame named flights.
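
Besides the RStudio import dialog, a downloaded CSV can also be read from code with read.csv(). The sketch below is self-contained, so it writes a tiny stand-in CSV to a temporary file first. The column names and values are made up and are not taken from ohare_jan.csv. For the real data, the path to your saved copy of ohare_jan.csv would take the place of `csv_path`.

```r
# Stand-in for the downloaded file: a tiny CSV in a temporary location.
# The column names and values below are made up for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("AIRLINE,FLIGHT_NUMBER", "AA,101", "UA,202", "AA,303"), csv_path)

# Read the local file into a data frame, keeping text as character.
flights_demo <- read.csv(csv_path, stringsAsFactors = FALSE)

dim(flights_demo)  # number of rows and columns
str(flights_demo)  # column names and types
```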

### Counting

• Count how many flights departed from ORD in January 2015.
  • Your answer should be 23484.
• Count how many of those flights were American Airlines (AA) flights.
  • Your answer should be 3899.
• Look at the table() function. Use it to summarize the number of flights departing from ORD for each airline in January 2015.
  • Your result should look as follows:
##
##   AA   AS   B6   DL   EV   F9   MQ   NK   OO   UA   US   VX
## 3899  100  170  569 3767  283 5655  767 3181 4383  634   76
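
The three counting tools used above can be sketched on a tiny made-up data frame. The column name `AIRLINE` is an assumption about the file's layout, and the six rows are invented; check names(flights) for the actual column names in your data.

```r
# Hypothetical stand-in for the flights data (six invented rows).
flights_demo <- data.frame(
  AIRLINE = c("AA", "UA", "AA", "DL", "UA", "AA"),
  stringsAsFactors = FALSE
)

# Total number of rows (one row per flight).
n_flights <- nrow(flights_demo)

# A comparison gives TRUE/FALSE per row; sum() counts the TRUEs.
n_aa <- sum(flights_demo$AIRLINE == "AA")

# table() counts how often each distinct value occurs.
by_airline <- table(flights_demo$AIRLINE)

n_flights   # 6
n_aa        # 3
by_airline
```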