Today’s lab will provide practice working with editing SQL databases using DuckDB.
The goals for lab 6 include:
using UPDATE to change the data.
using INSERT INTO to add records to a table.
using DELETE to delete records.
using ALTER TABLE to change the table structure.
creating temporary tables to use in a follow up SELECT query.
Advice for turning in the assignment
Be sure to indicate (in the .qmd file) which problem is being answered with which code. A sentence or two with each response goes a long way toward your understanding!
save the .Rproj file somewhere you can find it. Don’t keep everything in your downloads folder. Maybe make a folder called SDS261 or something. That folder could live on your Desktop. Or maybe in your Dropbox.
The .qmd document should be saved in the R Project as lab6-sds261-yourlastname-yourfirstname.qmd.
Set up a connection to a database of your naming using DuckDB.
The data we will work with today is on college tuition, pay, and diversity. The original data source is the US Department of Education, but it has been compiled as part of TidyTuesday on March 10, 2020. Tuition and fees are for 2018-19. Diversity is for 2014.
Use the same database from Lab 5. If the database somehow got deleted or corrupted, go back and re-run your code from Lab 5.
Use UPDATE to change the NA values to NULL (for room_and_board in the tuition table and make_world_better_percent in the salary table). (Originally I had a follow up task to convert the data type from VARCHAR to DOUBLE as well. However, DuckDB is particularly finicky with those conversions, and so we aren’t going to do it. If/when you work on a SQL server, you might adjust the data type using something like: ALTER TABLE tuition MODIFY room_and_board DOUBLE;.)1
Solution
SELECTCOUNT(*) AS number_nullFROM tuitionWHERE room_and_board ISNULL;
SELECTCOUNT(*) AS number_nullFROM salaryWHERE make_world_better_percent ISNULL;
1 records
number_null
33
Solution
SELECT*FROM salary LIMIT5;
5 records
rank
name
state_name
early_career_pay
mid_career_pay
make_world_better_percent
stem_percent
1
Auburn University
Alabama
54400
104500
51
31
2
University of Alabama in Huntsville
Alabama
57500
103900
59
45
3
The University of Alabama
Alabama
52300
97400
50
15
4
Tuskegee University
Alabama
54500
93500
61
30
5
Samford University
Alabama
48400
90500
52
3
Using INSERT INTO add records to the diversity table. Add ‘Barden University’ in ‘Louisiana’ that has a total enrollment of 10,000 students, half of whom are female. Add an additional value to indicate, if the Bellas are representative of the student population, that Barden University has 1667 Black students.
Check to see that the values inserted into the diversity table are correct. In order to tell, use a SELECT clause where you only look at schools from ‘Louisiana’ and sort by the name of the school.
Solution
SELECT*FROM diversityWHERE state ='Louisiana'ORDERBY nameLIMIT50;
Displaying records 1 - 10
name
total_enrollment
state
category
enrollment
Barden University
10000
Louisiana
Black
1667
Barden University
10000
Louisiana
Women
5000
Barden University
10000
Louisiana
Black
1667
Barden University
10000
Louisiana
Black
1667
Barden University
10000
Louisiana
Black
1667
Baton Rouge Community College
7740
Louisiana
Unknown
633
Baton Rouge Community College
7740
Louisiana
American Indian / Alaska Native
24
Baton Rouge Community College
7740
Louisiana
Asian
149
Baton Rouge Community College
7740
Louisiana
Black
3263
Baton Rouge Community College
7740
Louisiana
Hispanic
241
Using the diversty table, DELETE all the records that don’t record race or ethnicity. That is, delete the records that have a category of ‘Total Minority’, ‘Women’, or ‘Non-Resident Foreign’. Again, use a SELECT command to confirm that your code worked.
SELECT*FROM diversityWHERE state ='Louisiana'ORDERBY nameLIMIT50;
Displaying records 1 - 10
name
total_enrollment
state
category
enrollment
Barden University
10000
Louisiana
Black
1667
Barden University
10000
Louisiana
Black
1667
Barden University
10000
Louisiana
Black
1667
Barden University
10000
Louisiana
Black
1667
Baton Rouge Community College
7740
Louisiana
Unknown
633
Baton Rouge Community College
7740
Louisiana
American Indian / Alaska Native
24
Baton Rouge Community College
7740
Louisiana
Asian
149
Baton Rouge Community College
7740
Louisiana
Black
3263
Baton Rouge Community College
7740
Louisiana
Hispanic
241
Baton Rouge Community College
7740
Louisiana
Native Hawaiian / Pacific Islander
12
Use ALTER TABLE to change the variable type of make_world_better_percent in the salary table from VARCHAR to DOUBLE. Follow up by running DESCRIBE on the salary table to make sure your change worked. (Note: for some reason, the same code doesn’t seem to work on the tuition table with room_and_board. It says that tuition depends on something. You are welcome to try to make it work on your own computers!)
Solution
ALTERTABLE salary ALTER make_world_better_percent TYPEDOUBLE;
Solution
DESCRIBE salary;
7 records
column_name
column_type
null
key
default
extra
rank
DOUBLE
YES
NA
NA
NA
name
VARCHAR
YES
NA
NA
NA
state_name
VARCHAR
YES
NA
NA
NA
early_career_pay
DOUBLE
YES
NA
NA
NA
mid_career_pay
DOUBLE
YES
NA
NA
NA
make_world_better_percent
DOUBLE
YES
NA
NA
NA
stem_percent
DOUBLE
YES
NA
NA
NA
Create a temporary table (e.g., to use in a follow up SELECT query) that contains only the ‘Private’ universities from the tuition table.