Lab 1 - working with Databases

Author

your name goes here

Published

January 8, 2024

Today’s lab will provide practice working with the different tools we can use to implement SQL code. We haven’t covered much SQL syntax, so the focus will be more on the tools than on writing code. You should, however, be trying to understand the SQL code as you go along.

The goals for lab 1 include:

Advice for turning in the assignment

  • render early and often. In fact, go ahead and render your .qmd file right now. Maybe set a timer so that you render every 5 minutes. Do not wait until you are done with the assignment to render

  • save the .Rproj file somewhere you can find it. Don’t keep everything in your downloads folder. Maybe make a folder called SDS261 or something. That folder could live on your Desktop. Or maybe in your Dropbox.

Assignment

Preliminary: You’ll need to create a quarto document. To do so click on: New file -> quarto document.

  • The document should be saved in the R Project as lab1-sds261-yourlastname-yourfirstname.qmd.
  • The yaml at the top of the file should look like this (change the author and date fields):
---
title: 'Lab 1 - working with Databases'
author: 'your name goes here'
date: 'due 10am Tuesday, January 9, 2024'
execute:
  echo: true
  warning: false
  message: false
---

For much of the assignment, Consider the task of figuring out how many flights came into or flew out of Bradley International Airport (BDL) last year.

  1. Set up a connection within an R chunk. See the README file.
  1. Pull in the flights table as a tbl (do not use collect()!). Use dplyr commands to calculate the number of flights that flew into or out of BDL in 2012. Then use dplyr::show_query() to output the SQL query. (All done inside an R chunk.)
  1. Again, inside an R chunk, use DBI::dbGetQuery() to run the SQL code calculated in the previous question.
  1. Now, inside a SQL chunk, use the SQL commands directly (same task: how many flights in and out of BDL in 2012).
  1. Rerun all three scenarios, using the tictoc R package to see which method calculates the desired number most quickly. Below is an example of how to use tictoc. For the SQL chunk, you’ll need to place tic() in an R chunk before the SQL chunk and toc() in an R chunk after the SQL chunk. Write 1-2 sentences describing the results.
library(tictoc)

tic()
rand_numbers <- rnorm(1000000)
quantile(rand_numbers)
           0%           25%           50%           75%          100% 
-4.7441954424 -0.6750814407  0.0006767307  0.6734974364  5.6445201338 
toc()
0.097 sec elapsed
  • Using R on the tbl.
  • Using dbGetQuery().
  • Using SQL chunk.
  1. Now create a tibble (not a tbl) that includes only flights in 2012 either to or from BDL (use collect()). After you have created the tibble, summarize the observations to get the count. Use tictoc to evaluate how long the process takes. Write 1-2 sentences describing the results.
  1. DBeaver: open DBeaver from your applications. Following the instructions in the DBeaver online notes, set up a connection to the mdsr SQL server. Create a file called lab1-sds261-yourlastname-yourfirstname.sql and SAVE THE FILE in the lab1 R Project folder. Run the same SQL code as above. Did you get the same result? Were there any adjustments to the SQL code from above? Explain in a few sentences.
  1. Reflect on at least 3 things that you learned during this lab. It could be about technical aspects. It could be about understanding databases. It could be about the difference between R and SQL. Anything you learned while doing the lab.
  1. Always a good idea to terminate the SQL connection when you are done with it.
dbDisconnect(con_air, shutdown = TRUE)
  1. render - commit - push to GitHub. You should push three files, those with extensions .qmd, .html, and .sql. In a browser, look at your GitHub website to make sure that all three files were successfully pushed to the correct repo (yours for lab1).