Today’s lab will provide practice working with SQL clauses in DBeaver. Don’t forget the importance of the order of the SQL clauses.
The goals for lab 3 include:
additional working with clauses in SQL.
understanding each of SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, and LIMIT.
practicing running SQL queries in a SQL client, DBeaver.
JOINing tables
Advice for turning in the assignment
Be sure to indicate (in the .sql file) which problem is being answered with which SQL code. Use the following syntax to comment within a .sql file: /* here is where comments go */. Indeed, feel free to copy the question into the .sql file so that you have it for your own records.
save the .Rproj file somewhere you can find it. Don’t keep everything in your downloads folder. Maybe make a folder called SDS261 or something. That folder could live on your Desktop. Or maybe in your Dropbox.
The SQL document should be saved in the R Project as lab3-sds261-yourlastname-yourfirstname.sql. You will have to navigate to the R Project to save the DBeaver file in the correct place.
Connect to the yelp database, which contains the businesses, reviews, and users tables. See README file for connection details.
SHOW TABLES;
3 records
Tables_in_yelp
businesses
reviews
users
Assignment
For each of the three tables in the yelp database, identify the number of records and the variables. (That means, write at least a sentence for each table.)
Solution
SELECT*FROM businesses LIMIT0, 5;
5 records
id
name
stars
open
review_count
city
longitude
latitude
rncjoVoEFUJGCUoC1JgnUA
Peoria Income Tax Service
5
TRUE
3
Peoria
-112.2416
33.58187
0FNFSzCFP_rGUoJx8W7tJg
Bike Doctor
5
TRUE
5
Phoenix
-112.1059
33.60405
3f_lyB6vFK48ukH6ScvLHg
Valley Permaculture Alliance
5
TRUE
4
Phoenix
-112.0739
33.46053
usAsSV36QmUej8–yvN-dg
Food City
4
TRUE
5
Phoenix
-112.0854
33.39221
PzOqRohWw7F7YEPBz6AubA
Hot Bagels & Deli
4
TRUE
14
Glendale Az
-112.2003
33.71280
Solution
SELECT*FROM reviews LIMIT0, 5;
5 records
business_id
user_id
stars
9yKzy9PApeiPPOUJEtnvkg
rLtl8ZkDX5vH5nAx9C3q5Q
5
ZRJwVLyzEJq1VAihDhYiow
0a2KyEL0d3Yb1V6aivbIuQ
5
6oRAC4uyJCsJl1X0WZpVSA
0hT2KtfLiobPvh6cDC8JQg
4
_1QQZuf4zZOyFCvXc0o6Vg
uZetl9T0NcROGOyFfughhg
5
6ozycU1RpktNG2-1BroVtw
vYmM4KTsC8ZfQBg-j5MWkw
5
Solution
SELECT*FROM users LIMIT0, 5;
5 records
user_id
name
average_stars
review_count
CR2y7yEm4X035ZMzrTtN9Q
Jim
5
6
_9GXoHhdxc30ujPaQwh6Ew
Kelle
1
2
8mM-nqxjg6pT04kwcjMbsw
Stephanie
5
2
Ch6CdTR2IVaVANr-RglMOg
T
5
2
NZrLmHRyiHmyT1JrfzkCOA
Beth
1
1
Solution
SELECTCOUNT(*) FROM businesses;
1 records
COUNT(*)
11537
Solution
SELECTCOUNT(*) FROM reviews;
1 records
COUNT(*)
229907
Solution
SELECTCOUNT(*) FROM users;
1 records
COUNT(*)
43873
Solution
businesses: 11,537 records with variables of id, name, stars, open, review_count, city, longitude, latitude
reviews: 229,907 records with variables of business_id, user_id, stars
users: 43,873 records with variables of user_id, name, average_stars, review_count
Find the user with the most reviews. What is the person’s name, and how many reviews did they make?
Kim has the most reviews (5,807 reviews!) with an average rating of 3.77 stars.
What cities are represented in the businesses table? Find out by querying the number of businesses per city. Use LIMIT to start, just in case there are hundreds of cities. Write down a few of the city names. Do you notice anything interesting?
There appear to be cities in Phoenix as well as in Pheonix (??). And a few of the city names include the AZ to indicate that they are in Arizona. Before really working with the data, we would want to clean up the city names.
Find the businesses in “Carefree” (a city in Arizona). Write down the names of a few of the businesses.
Solution
SELECTid, name, stars, open, review_count, cityFROM businessesWHERE city ="Carefree"LIMIT0, 50;
Displaying records 1 - 10
id
name
stars
open
review_count
city
K0v87iFgB3sXSzw6e-BJ5w
AZ Wine Company
5
TRUE
8
Carefree
nZ0mUQEdez-CHrykD2nNLA
Carefree Resort & Conference Center
3
TRUE
30
Carefree
c4FLMLP7hMnAbSiW2o_o3A
Saguaro Grille
3
FALSE
5
Carefree
QygMZmPO8A6OaJYO2a0pnw
Studio C Hair Salon
5
TRUE
3
Carefree
Z85BGr-jnEWvoI4xAnoh3g
Lowe’s Home Improvement Warehouse of Phoenix
4
TRUE
6
Carefree
SahvCnFp3OvUVZIAYCyckw
34 Easy St
4
FALSE
3
Carefree
jdStuaC_1leN_DNGcQ0yEw
Latilla
5
FALSE
5
Carefree
AwUMl0PT3mdkPS7huzEUYQ
Balloon Festival
1
TRUE
4
Carefree
MHPXDXVE_cUYodfPhkadeQ
Black Mountain Coffee Shop
4
TRUE
11
Carefree
OvlAlAkiyyCILkq-TylBWg
The Sundial Cafe
3
TRUE
7
Carefree
Count the number of records resulting when the query connects the users with the reviews using a JOIN. How many records are there? What does that tell you about the two tables?
Solution
SELECTCOUNT(*)FROM usersJOIN reviews ON users.user_id = reviews.user_idLIMIT0, 100;
1 records
COUNT(*)
215879
Solution
Because there are fewer records in the join, we know that some reviews were done by users that are not in the users table.
How many users wrote reviews in the reviews table but do not exist in the users table. Hint: use a RIGHT JOIN to keep all the reviews, and then look for the rows where the user_id from the users table IS NULL.
Solution
SELECTCOUNT(DISTINCT reviews.user_id)FROM usersRIGHTJOIN reviews ON users.user_id = reviews.user_idWHERE users.user_id ISNULL;
1 records
COUNT(DISTINCT reviews.user_id)
2108
Solution
There are 2,108 users who wrote 14,028 reviews and who are not in the users table.
Write a query to ask a question that you think is interesting and uses at least one join. Provide both the question (in words) as well as the SQL code and results.