library(tidyverse)
library(DBI)
library(RMariaDB)
library(dbplyr)
library(mdsr)
Lab 1 - working with Databases
Today’s lab will provide practice working with the different tools we can use to implement SQL code. We haven’t covered much SQL syntax, so the focus will be more on the tools than on writing code. You should, however, be trying to understand the SQL code as you go along.
The goals for lab 1 include:
- connecting to GitHub / turning in the lab via GitHub.
- comparing / contrasting three different ways to implement SQL queries in RStudio.
- writing SQL queries in DBeaver.
Advice for turning in the assignment
render early and often. In fact, go ahead and render your .qmd file right now. Maybe set a timer so that you render every 5 minutes. Do not wait until you are done with the assignment to render
save the .Rproj file somewhere you can find it. Don’t keep everything in your downloads folder. Maybe make a folder called
SDS261
or something. That folder could live on your Desktop. Or maybe in your Dropbox.
Assignment
Preliminary: You’ll need to create a quarto document. To do so click on: New file -> quarto document.
- The document should be saved in the R Project as
lab1-sds261-yourlastname-yourfirstname.qmd
. - The yaml at the top of the file should look like this (change the author and date fields):
---
title: 'Lab 1 - working with Databases'
author: 'your name goes here'
date: 'due 10am Tuesday, January 9, 2024'
execute:
echo: true
warning: false
message: false
---
For much of the assignment, Consider the task of figuring out how many flights came into or flew out of Bradley International Airport (BDL) last year.
- Set up a connection within an R chunk. See the README file.
- Pull in the
flights
table as atbl
(do not usecollect()
!). Use dplyr commands to calculate the number of flights that flew into or out of BDL in 2012. Then usedplyr::show_query()
to output the SQL query. (All done inside an R chunk.)
- Again, inside an R chunk, use
DBI::dbGetQuery()
to run the SQL code calculated in the previous question.
- Now, inside a SQL chunk, use the SQL commands directly (same task: how many flights in and out of BDL in 2012).
- Rerun all three scenarios, using the tictoc R package to see which method calculates the desired number most quickly. Below is an example of how to use tictoc. For the SQL chunk, you’ll need to place
tic()
in an R chunk before the SQL chunk andtoc()
in an R chunk after the SQL chunk. Write 1-2 sentences describing the results.
library(tictoc)
tic()
<- rnorm(1000000)
rand_numbers quantile(rand_numbers)
0% 25% 50% 75% 100%
-4.7441954424 -0.6750814407 0.0006767307 0.6734974364 5.6445201338
toc()
0.097 sec elapsed
- Using R on the
tbl
.
- Using
dbGetQuery()
.
- Using SQL chunk.
- Now create a
tibble
(not atbl
) that includes only flights in 2012 either to or from BDL (usecollect()
). After you have created thetibble
, summarize the observations to get the count. Use tictoc to evaluate how long the process takes. Write 1-2 sentences describing the results.
- DBeaver: open DBeaver from your applications. Following the instructions in the DBeaver online notes, set up a connection to the mdsr SQL server. Create a file called
lab1-sds261-yourlastname-yourfirstname.sql
and SAVE THE FILE in the lab1 R Project folder. Run the same SQL code as above. Did you get the same result? Were there any adjustments to the SQL code from above? Explain in a few sentences.
- Reflect on at least 3 things that you learned during this lab. It could be about technical aspects. It could be about understanding databases. It could be about the difference between R and SQL. Anything you learned while doing the lab.
- Always a good idea to terminate the SQL connection when you are done with it.
dbDisconnect(con_air, shutdown = TRUE)
- render - commit - push to GitHub. You should push three files, those with extensions .qmd, .html, and .sql. In a browser, look at your GitHub website to make sure that all three files were successfully pushed to the correct repo (yours for lab1).