STAT 385 Lab 02

Data Frames and Modifying Values

Ha Khanh Nguyen

We will continue working with the deck of cards example from lecture. First, let’s load the dataset into RStudio.

Exercise 1: King of Hearts

Load data

Modifying the deck

  • We will modify the deck for a game called “King of Hearts” where:
    • All kings get the highest value in the deck, 14.
    • All cards in the hearts suit get a value of 10 (except the king of hearts, whose value should be 14).
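
One way to make both changes is logical subsetting with assignment. The sketch below is not the lab's official solution. It rebuilds a 52-card deck by hand (an assumption, since the lab loads deck from a file; kings are assumed to start at 13) and applies the hearts rule before the kings rule, so the king of hearts ends at 14.

```r
# Hand-built stand-in for the lab's deck (assumed layout: king = 13 ... ace = 1)
faces <- c("king", "queen", "jack", "ten", "nine", "eight", "seven",
           "six", "five", "four", "three", "two", "ace")
suits <- c("spades", "clubs", "diamonds", "hearts")
deck <- data.frame(
  face  = rep(faces, times = 4),
  suit  = rep(suits, each = 13),
  value = rep(13:1, times = 4),
  stringsAsFactors = FALSE
)

# Hearts first, then kings: the second assignment overwrites the king of hearts
deck$value[deck$suit == "hearts"] <- 10
deck$value[deck$face == "king"]   <- 14
```

Subsetting rows with deck[deck$suit == "hearts", ] or deck[deck$face == "king", ] is one way to inspect the result.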

Use the following code to check your new deck:

##     face   suit value
## 40  king hearts    14
## 41 queen hearts    10
## 42  jack hearts    10
## 43   ten hearts    10
## 44  nine hearts    10
## 45 eight hearts    10
## 46 seven hearts    10
## 47   six hearts    10
## 48  five hearts    10
## 49  four hearts    10
## 50 three hearts    10
## 51   two hearts    10
## 52   ace hearts    10
##    face     suit value
## 1  king   spades    14
## 14 king    clubs    14
## 27 king diamonds    14
## 40 king   hearts    14

Adding the jackpot column

  • Now, create a new column called jackpot in the data frame deck.
    • jackpot takes the value TRUE or FALSE.
    • For the king of hearts, the value of jackpot is TRUE.
    • For all other rows, the value of jackpot is FALSE.
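
Because a comparison in R already returns TRUE or FALSE for every row, the new column can be built from a single logical expression. A minimal sketch, again using a hand-built stand-in for deck (an assumption; card values are left unmodified here for brevity):

```r
# Hand-built stand-in for the lab's deck (assumed layout)
faces <- c("king", "queen", "jack", "ten", "nine", "eight", "seven",
           "six", "five", "four", "three", "two", "ace")
suits <- c("spades", "clubs", "diamonds", "hearts")
deck <- data.frame(
  face  = rep(faces, times = 4),
  suit  = rep(suits, each = 13),
  value = rep(13:1, times = 4),
  stringsAsFactors = FALSE
)

# TRUE only where both conditions hold, FALSE everywhere else
deck$jackpot <- deck$face == "king" & deck$suit == "hearts"
```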

Your deck should look like this after adding the column:

##     face     suit value jackpot
## 1   king   spades    14   FALSE
## 2  queen   spades    12   FALSE
## 3   jack   spades    11   FALSE
## 4    ten   spades    10   FALSE
## 5   nine   spades     9   FALSE
## 6  eight   spades     8   FALSE
## 7  seven   spades     7   FALSE
## 8    six   spades     6   FALSE
## 9   five   spades     5   FALSE
## 10  four   spades     4   FALSE
## 11 three   spades     3   FALSE
## 12   two   spades     2   FALSE
## 13   ace   spades     1   FALSE
## 14  king    clubs    14   FALSE
## 15 queen    clubs    12   FALSE
## 16  jack    clubs    11   FALSE
## 17   ten    clubs    10   FALSE
## 18  nine    clubs     9   FALSE
## 19 eight    clubs     8   FALSE
## 20 seven    clubs     7   FALSE
## 21   six    clubs     6   FALSE
## 22  five    clubs     5   FALSE
## 23  four    clubs     4   FALSE
## 24 three    clubs     3   FALSE
## 25   two    clubs     2   FALSE
## 26   ace    clubs     1   FALSE
## 27  king diamonds    14   FALSE
## 28 queen diamonds    12   FALSE
## 29  jack diamonds    11   FALSE
## 30   ten diamonds    10   FALSE
## 31  nine diamonds     9   FALSE
## 32 eight diamonds     8   FALSE
## 33 seven diamonds     7   FALSE
## 34   six diamonds     6   FALSE
## 35  five diamonds     5   FALSE
## 36  four diamonds     4   FALSE
## 37 three diamonds     3   FALSE
## 38   two diamonds     2   FALSE
## 39   ace diamonds     1   FALSE
## 40  king   hearts    14    TRUE
## 41 queen   hearts    10   FALSE
## 42  jack   hearts    10   FALSE
## 43   ten   hearts    10   FALSE
## 44  nine   hearts    10   FALSE
## 45 eight   hearts    10   FALSE
## 46 seven   hearts    10   FALSE
## 47   six   hearts    10   FALSE
## 48  five   hearts    10   FALSE
## 49  four   hearts    10   FALSE
## 50 three   hearts    10   FALSE
## 51   two   hearts    10   FALSE
## 52   ace   hearts    10   FALSE

Function draw()

  • Write a function draw() that randomly draws 4 cards from the deck WITHOUT replacement and computes the sum of the values of the 4 cards drawn.
    • Input: no input provided.
    • Output: a number representing the sum of values of the 4 cards drawn.
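
A possible shape for draw(), offered as a sketch rather than the expected answer: sample() draws without replacement by default, so sampling 4 row indices and summing their values is enough. The deck below is a hand-built stand-in (an assumption), and the seed is arbitrary.

```r
# Hand-built stand-in for the modified deck (assumed layout)
faces <- c("king", "queen", "jack", "ten", "nine", "eight", "seven",
           "six", "five", "four", "three", "two", "ace")
suits <- c("spades", "clubs", "diamonds", "hearts")
deck <- data.frame(
  face  = rep(faces, times = 4),
  suit  = rep(suits, each = 13),
  value = rep(13:1, times = 4),
  stringsAsFactors = FALSE
)
deck$value[deck$suit == "hearts"] <- 10
deck$value[deck$face == "king"]   <- 14

draw <- function() {
  # sample() uses replace = FALSE by default, so no card is drawn twice
  rows <- sample(nrow(deck), size = 4)
  sum(deck$value[rows])
}

set.seed(385)  # arbitrary seed, only to make this run repeatable
draw()
```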

Use the following code to test your function:

## [1] 22
## [1] 36
## [1] 29
## [1] 31

Function win_jackpot()

  • This is a strange game where a person can win a jackpot in 3 possible ways:
    • Getting a king of hearts in the 4 cards drawn.
    • OR getting 4 cards from the same suit.
    • OR getting 4 cards with the same face.
  • Write a function win_jackpot() that:
    • randomly draws 4 cards from the deck WITHOUT replacement
    • prints out the 4 cards using the print() function
    • and returns TRUE if the 4 cards drawn satisfy at least one of the above conditions, FALSE otherwise.
  • Hint: You might want to take a look at the unique() function to test the 2nd and 3rd conditions.
##     face   suit value jackpot
## 6  eight spades     8   FALSE
## 26   ace  clubs     1   FALSE
## 23  four  clubs     4   FALSE
## 18  nine  clubs     9   FALSE
## [1] FALSE
##     face   suit value jackpot
## 10  four spades     4   FALSE
## 6  eight spades     8   FALSE
## 8    six spades     6   FALSE
## 1   king spades    14   FALSE
## [1] TRUE
##    face   suit value jackpot
## 22 five  clubs     5   FALSE
## 1  king spades    14   FALSE
## 48 five hearts    10   FALSE
## 40 king hearts    14    TRUE
## [1] TRUE
##    face     suit value jackpot
## 13  ace   spades     1   FALSE
## 52  ace   hearts    10   FALSE
## 39  ace diamonds     1   FALSE
## 26  ace    clubs     1   FALSE
## [1] TRUE
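
The three winning conditions can be sketched as follows. This is one possible implementation under the same hand-built deck assumption as before, not the lab's reference solution. Following the hint, length(unique(x)) == 1 checks whether all 4 cards share a suit or a face.

```r
# Hand-built stand-in for the modified deck with the jackpot column (assumed layout)
faces <- c("king", "queen", "jack", "ten", "nine", "eight", "seven",
           "six", "five", "four", "three", "two", "ace")
suits <- c("spades", "clubs", "diamonds", "hearts")
deck <- data.frame(
  face  = rep(faces, times = 4),
  suit  = rep(suits, each = 13),
  value = rep(13:1, times = 4),
  stringsAsFactors = FALSE
)
deck$value[deck$suit == "hearts"] <- 10
deck$value[deck$face == "king"]   <- 14
deck$jackpot <- deck$face == "king" & deck$suit == "hearts"

win_jackpot <- function() {
  hand <- deck[sample(nrow(deck), size = 4), ]  # 4 rows, without replacement
  print(hand)
  any(hand$jackpot) ||                  # king of hearts is in the hand
    length(unique(hand$suit)) == 1 ||   # all 4 cards share one suit
    length(unique(hand$face)) == 1      # all 4 cards share one face
}
```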

Running simulation on win_jackpot()

Now, we want to estimate the probability of winning the jackpot in “King of Hearts”. How do we do that? We can use a simulation study: run the win_jackpot() function many times and record the number of times we win. In this example, the number of simulations is 10000.

BEWARE: You need to change your win_jackpot() function before progressing to the next part.

  • First, write a new function called win_jackpot2():
    • Copy the code from your original win_jackpot() function.
    • Remove the print() statement from the code.
    • Test your win_jackpot2() function:
## [1] TRUE
  • Use the following code to run the simulation studies:
    • Make sure you copy and paste! If you retype the code, remember to call win_jackpot2(), not win_jackpot(). Missing the 2 will cost you heavily, since the original function prints every hand it draws!
  • Now, use results to compute the estimated probability of winning a jackpot. Your answer should be 0.0886.
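
A sketch of how such a simulation might look, assuming replicate() is used to collect the outcomes in results (the lab's own simulation code and seed are not shown here, so the estimate from this sketch will not match 0.0886 exactly). The mean of a logical vector is the proportion of TRUE values, which gives the estimate. For comparison, the last lines compute the exact probability by inclusion-exclusion over the three winning conditions.

```r
# Hand-built stand-in for the deck (assumed layout); only face and suit are needed
faces <- c("king", "queen", "jack", "ten", "nine", "eight", "seven",
           "six", "five", "four", "three", "two", "ace")
suits <- c("spades", "clubs", "diamonds", "hearts")
deck <- data.frame(
  face = rep(faces, times = 4),
  suit = rep(suits, each = 13),
  stringsAsFactors = FALSE
)

# Same logic as win_jackpot(), without the print() call
win_jackpot2 <- function() {
  hand <- deck[sample(nrow(deck), size = 4), ]
  any(hand$face == "king" & hand$suit == "hearts") ||
    length(unique(hand$suit)) == 1 ||
    length(unique(hand$face)) == 1
}

set.seed(385)  # arbitrary seed; the lab's seed is unknown
results <- replicate(10000, win_jackpot2())
mean(results)  # estimated probability of winning

# Exact value: P(king of hearts) + P(same suit) + P(same face) - overlaps
exact <- (choose(51, 3) + 4 * choose(13, 4) + 13 - choose(12, 3) - 1) /
  choose(52, 4)
exact  # about 0.0867
```

The lab's 0.0886 is itself a simulated estimate, so a small gap from the exact value is expected.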

Exercise 2: 2015 Flight Delays and Cancellations Data

Next, we take a quick look at the 2015 Flight Delays and Cancellations Data provided by the U.S. Department of Transportation. This is a huge dataset available on Kaggle. For this lab, we will only look at flights departing from O’Hare International Airport (ORD) in January 2015.

Load data

  • I have filtered out the data specific to O’Hare and stored it in ohare_jan.csv. You can TRY to load the data from the following URL: https://nkha149.github.io/stat385-sp2020/files/data/ohare_jan.csv.
    • You probably will get an error!
    • What do you do?
    • Try downloading the file and loading the data through the RStudio interface instead!
  • Save the data into a data frame named flights.
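
Once the file has been downloaded, base R's read.csv() is a scripted alternative to the RStudio import dialog. A minimal sketch, using a tiny temporary CSV as a stand-in for ohare_jan.csv (the column names and rows here are made up for illustration):

```r
# Stand-in for the downloaded file: a tiny CSV written to a temporary path.
# With the real data, `path` would point to wherever ohare_jan.csv was saved.
path <- tempfile(fileext = ".csv")
writeLines(c("AIRLINE,ORIGIN_AIRPORT", "AA,ORD", "UA,ORD", "AA,ORD"), path)

flights <- read.csv(path, stringsAsFactors = FALSE)
str(flights)  # quick look at the columns and their types
```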

Counting

  • Count how many flights departed from ORD in January, 2015.
    • Your answer should be 23484.
  • Count how many flights departed from ORD that were American Airlines flights (AA) in January, 2015.
    • Your answer should be 3899.
  • Look at function table(). Use it to get a summary of the number of flights flying out from ORD for each airline in January, 2015.
    • Your result should look as follows:
## 
##   AA   AS   B6   DL   EV   F9   MQ   NK   OO   UA   US   VX 
## 3899  100  170  569 3767  283 5655  767 3181 4383  634   76
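
The three counts above map onto nrow(), sum() over a logical comparison, and table(). The sketch below uses a small made-up data frame in place of flights, and the column name AIRLINE is an assumption about the file's layout, so check names(flights) on the real data first.

```r
# Toy stand-in for flights; the AIRLINE column name is an assumption
flights <- data.frame(
  AIRLINE = c("AA", "UA", "AA", "MQ", "UA", "AA"),
  stringsAsFactors = FALSE
)

nrow(flights)                 # total number of flights (one row per flight)
sum(flights$AIRLINE == "AA")  # TRUE counts as 1, so this counts AA flights
table(flights$AIRLINE)        # number of flights for each airline
```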