Lab 6 - editing databases

Author

your name goes here

Published

Invalid Date

Today’s lab will provide practice working with editing SQL databases using DuckDB.

The goals for lab 6 include:

Advice for turning in the assignment

  • Be sure to indicate (in the .qmd file) which problem is being answered with which code. A sentence or two with each response goes a long way toward your understanding!

  • save the .Rproj file somewhere you can find it. Don’t keep everything in your downloads folder. Maybe make a folder called SDS261 or something. That folder could live on your Desktop. Or maybe in your Dropbox.

  • The .qmd document should be saved in the R Project as lab6-sds261-yourlastname-yourfirstname.qmd.

  • Set up a connection to a database of your naming using DuckDB.

con_college <- DBI::dbConnect(duckdb::duckdb(),
                             dbdir = "samenameaslab5")

Assignment

The data we will work with today is on college tuition, pay, and diversity. The original data source is the US Department of Education, but it has been compiled as part of TidyTuesday on March 10, 2020. Tuition and fees are for 2018-19. Diversity is for 2014.

  1. Use the same database from Lab 5. If the database somehow got deleted or corrupted, go back and re-run your code from Lab 5.
  1. Use UPDATE to change the NA values to NULL (for room_and_board in the tuition table and make_world_better_percent in the salary table). (Originally I had a follow up task to convert the data type from VARCHAR to DOUBLE as well. However, DuckDB is particularly finicky with those conversions, and so we aren’t going to do it. If/when you work on a SQL server, you might adjust the data type using something like: ALTER TABLE tuition MODIFY room_and_board DOUBLE;.)1
  1. Using INSERT INTO add records to the diversity table. Add ‘Barden University’ in ‘Louisiana’ that has a total enrollment of 10,000 students, half of whom are female. Add an additional value to indicate, if the Bellas are representative of the student population, that Barden University has 1667 Black students.
  1. Check to see that the values inserted into the diversity table are correct. In order to tell, use a SELECT clause where you only look at schools from ‘Louisiana’ and sort by the name of the school.
  1. Using the diversty table, DELETE all the records that don’t record race or ethnicity. That is, delete the records that have a category of ‘Total Minority’, ‘Women’, or ‘Non-Resident Foreign’. Again, use a SELECT command to confirm that your code worked.
  1. Use ALTER TABLE to change the variable type of make_world_better_percent in the salary table from VARCHAR to DOUBLE. Follow up by running DESCRIBE on the salary table to make sure your change worked. (Note: for some reason, the same code doesn’t seem to work on the tuition table with room_and_board. It says that tuition depends on something. You are welcome to try to make it work on your own computers!)
  1. Create a temporary table (e.g., to use in a follow up SELECT query) that contains only the ‘Private’ universities from the tuition table.
  1. Create a temporary table consisting of the JOIN of all three tables but including only the Smith College data.

Disconnect from the server

dbDisconnect(con_college, shutdown = TRUE)

Footnotes

  1. To change the data type only during a query, use SELECT CAST(room_and_board AS DOUBLE) FROM tuition.↩︎

Reuse