Lab 6 - editing databases

Author

your name goes here

Published

Invalid Date

library(tidyverse)
library(DBI)
library(RMariaDB)
library(dbplyr)
library(mdsr)

Today’s lab will provide practice working with editing SQL databases using DuckDB.

The goals for lab 6 include:

Advice for turning in the assignment

  • Be sure to indicate (in the .qmd file) which problem is being answered with which code. A sentence or two with each response goes a long way toward your understanding!

  • save the .Rproj file somewhere you can find it. Don’t keep everything in your downloads folder. Maybe make a folder called SDS261 or something. That folder could live on your Desktop. Or maybe in your Dropbox.

  • The .qmd document should be saved in the R Project as lab6-sds261-yourlastname-yourfirstname.qmd.

  • Set up a connection to a database of your naming using DuckDB.

con_college <- DBI::dbConnect(duckdb::duckdb(),
                             dbdir = "samenameaslab5")

Assignment

The data we will work with today is on college tuition, pay, and diversity. The original data source is the US Department of Education, but it has been compiled as part of TidyTuesday on March 10, 2020. Tuition and fees are for 2018-19. Diversity is for 2014.

  1. Use the same database from Lab 5. If the database somehow got deleted or corrupted, go back and re-run your code from Lab 5.
con_college <- DBI::dbConnect(duckdb::duckdb(),
                             dbdir = "collegeDB")
  1. Use UPDATE to change the NA values to NULL (for room_and_board in the tuition table and make_world_better_percent in the salary table). (Originally I had a follow up task to convert the data type from VARCHAR to DOUBLE as well. However, DuckDB is particularly finicky with those conversions, and so we aren’t going to do it. If/when you work on a SQL server, you might adjust the data type using something like: ALTER TABLE tuition MODIFY room_and_board DOUBLE;.)1
SELECT COUNT(*) AS number_null
FROM tuition
WHERE room_and_board IS NULL;
1 records
number_null
1094
UPDATE tuition
SET room_and_board = NULL
WHERE room_and_board = 'NA'
SELECT COUNT(*) AS number_null
FROM tuition
WHERE room_and_board IS NULL;
1 records
number_null
1094
SELECT * FROM tuition LIMIT 5;
5 records
name state state_code type degree_length room_and_board in_state_tuition in_state_total out_of_state_tuition out_of_state_total
Aaniiih Nakoda College Montana MT Public 2 Year NA 2380 2380 2380 2380
Abilene Christian University Texas TX Private 4 Year 10350 34850 45200 34850 45200
Abraham Baldwin Agricultural College Georgia GA Public 2 Year 8474 4128 12602 12550 21024
Academy College Minnesota MN For Profit 2 Year NA 17661 17661 17661 17661
Academy of Art University California CA For Profit 4 Year 16648 27810 44458 27810 44458
SELECT COUNT(*) AS number_null
FROM salary
WHERE make_world_better_percent IS NULL;
1 records
number_null
33
UPDATE salary
SET make_world_better_percent = NULL
WHERE make_world_better_percent = 'NA'
SELECT COUNT(*) AS number_null
FROM salary
WHERE make_world_better_percent IS NULL;
1 records
number_null
33
SELECT * FROM salary LIMIT 5;
5 records
rank name state_name early_career_pay mid_career_pay make_world_better_percent stem_percent
1 Auburn University Alabama 54400 104500 51 31
2 University of Alabama in Huntsville Alabama 57500 103900 59 45
3 The University of Alabama Alabama 52300 97400 50 15
4 Tuskegee University Alabama 54500 93500 61 30
5 Samford University Alabama 48400 90500 52 3
  1. Using INSERT INTO add records to the diversity table. Add ‘Barden University’ in ‘Louisiana’ that has a total enrollment of 10,000 students, half of whom are female. Add an additional value to indicate, if the Bellas are representative of the student population, that Barden University has 1667 Black students.
INSERT INTO diversity (name, total_enrollment, state, category, enrollment)
   VALUES('Barden University', 10000, 'Louisiana', 'Women', 5000),
         ('Barden University', 10000, 'Louisiana', 'Black', 1667);
  1. Check to see that the values inserted into the diversity table are correct. In order to tell, use a SELECT clause where you only look at schools from ‘Louisiana’ and sort by the name of the school.
SELECT * FROM diversity
WHERE state = 'Louisiana'
ORDER BY name
LIMIT 50;
Displaying records 1 - 10
name total_enrollment state category enrollment
Barden University 10000 Louisiana Black 1667
Barden University 10000 Louisiana Women 5000
Barden University 10000 Louisiana Black 1667
Barden University 10000 Louisiana Black 1667
Barden University 10000 Louisiana Black 1667
Baton Rouge Community College 7740 Louisiana Unknown 633
Baton Rouge Community College 7740 Louisiana American Indian / Alaska Native 24
Baton Rouge Community College 7740 Louisiana Asian 149
Baton Rouge Community College 7740 Louisiana Black 3263
Baton Rouge Community College 7740 Louisiana Hispanic 241
  1. Using the diversty table, DELETE all the records that don’t record race or ethnicity. That is, delete the records that have a category of ‘Total Minority’, ‘Women’, or ‘Non-Resident Foreign’. Again, use a SELECT command to confirm that your code worked.
DELETE FROM diversity
WHERE category = 'Total Minority' OR category = 'Women' OR category = 'Non-Resident Foreign'
SELECT * FROM diversity
WHERE state = 'Louisiana'
ORDER BY name
LIMIT 50;
Displaying records 1 - 10
name total_enrollment state category enrollment
Barden University 10000 Louisiana Black 1667
Barden University 10000 Louisiana Black 1667
Barden University 10000 Louisiana Black 1667
Barden University 10000 Louisiana Black 1667
Baton Rouge Community College 7740 Louisiana Unknown 633
Baton Rouge Community College 7740 Louisiana American Indian / Alaska Native 24
Baton Rouge Community College 7740 Louisiana Asian 149
Baton Rouge Community College 7740 Louisiana Black 3263
Baton Rouge Community College 7740 Louisiana Hispanic 241
Baton Rouge Community College 7740 Louisiana Native Hawaiian / Pacific Islander 12
  1. Use ALTER TABLE to change the variable type of make_world_better_percent in the salary table from VARCHAR to DOUBLE. Follow up by running DESCRIBE on the salary table to make sure your change worked. (Note: for some reason, the same code doesn’t seem to work on the tuition table with room_and_board. It says that tuition depends on something. You are welcome to try to make it work on your own computers!)
ALTER TABLE salary ALTER make_world_better_percent TYPE DOUBLE;
DESCRIBE salary;
7 records
column_name column_type null key default extra
rank DOUBLE YES NA NA NA
name VARCHAR YES NA NA NA
state_name VARCHAR YES NA NA NA
early_career_pay DOUBLE YES NA NA NA
mid_career_pay DOUBLE YES NA NA NA
make_world_better_percent DOUBLE YES NA NA NA
stem_percent DOUBLE YES NA NA NA
  1. Create a temporary table (e.g., to use in a follow up SELECT query) that contains only the ‘Private’ universities from the tuition table.
CREATE TEMP TABLE tuition_private AS
    SELECT * FROM tuition WHERE type = 'Private';
SELECT * FROM tuition_private LIMIT 10;
Displaying records 1 - 10
name state state_code type degree_length room_and_board in_state_tuition in_state_total out_of_state_tuition out_of_state_total
Abilene Christian University Texas TX Private 4 Year 10350 34850 45200 34850 45200
Adelphi University New York NY Private 4 Year 16030 38660 54690 38660 54690
Adrian College Michigan MI Private 4 Year 11318 37087 48405 37087 48405
Adventist University of Health Sciences Florida FL Private 4 Year 4200 15150 19350 15150 19350
Agnes Scott College Georgia GA Private 4 Year 12330 41160 53490 41160 53490
Alaska Bible College Alaska AK Private 4 Year 5700 9300 15000 9300 15000
Alaska Pacific University Alaska AK Private 4 Year 7300 20830 28130 20830 28130
Albany College of Pharmacy and Health Sciences New York NY Private 4 Year 10920 35105 46025 35105 46025
Albertus Magnus College Connecticut CT Private 4 Year 13200 32060 45260 32060 45260
Albion College Michigan MI Private 4 Year 12380 45775 58155 45775 58155
  1. Create a temporary table consisting of the JOIN of all three tables but including only the Smith College data.
CREATE TEMP TABLE smith_data AS
SELECT tuition.name, category, enrollment, type, 
       degree_length, room_and_board,
       in_state_tuition, out_of_state_tuition, in_state_total,
       out_of_state_total, early_career_pay, mid_career_pay,
       make_world_better_percent, stem_percent,
       total_enrollment
FROM tuition
JOIN salary ON tuition.name = salary.name
JOIN diversity ON tuition.name = diversity.name
WHERE tuition.name = 'Smith College'
LIMIT 20;
SELECT * FROM smith_data;
8 records
name category enrollment type degree_length room_and_board in_state_tuition out_of_state_tuition in_state_total out_of_state_total early_career_pay mid_career_pay make_world_better_percent stem_percent total_enrollment
Smith College American Indian / Alaska Native 5 Private 4 Year 17520 52404 52404 69924 69924 52700 98200 51 26 2989
Smith College Asian 348 Private 4 Year 17520 52404 52404 69924 69924 52700 98200 51 26 2989
Smith College Black 149 Private 4 Year 17520 52404 52404 69924 69924 52700 98200 51 26 2989
Smith College Hispanic 276 Private 4 Year 17520 52404 52404 69924 69924 52700 98200 51 26 2989
Smith College Native Hawaiian / Pacific Islander 2 Private 4 Year 17520 52404 52404 69924 69924 52700 98200 51 26 2989
Smith College White 1447 Private 4 Year 17520 52404 52404 69924 69924 52700 98200 51 26 2989
Smith College Two Or More Races 128 Private 4 Year 17520 52404 52404 69924 69924 52700 98200 51 26 2989
Smith College Unknown 246 Private 4 Year 17520 52404 52404 69924 69924 52700 98200 51 26 2989

Disconnect from the server

dbDisconnect(con_college, shutdown = TRUE)

Footnotes

  1. To change the data type only during a query, use SELECT CAST(room_and_board AS DOUBLE) FROM tuition.↩︎

Reuse