Lab 2 - SQL clauses

Author

your name goes here

Published

Invalid Date

Today’s lab will provide practice working with SQL clauses in DBeaver. Don’t forget the importance of the order of the SQL clauses.

The goals for lab 2 include:

Advice for turning in the assignment

  • Be sure to indicate (in the .sql file) which problem is being answered with which SQL code. Use the following syntax to comment within a .sql file: /* here is where comments go */. Indeed, feel free to copy the question into the .sql file so that you have it for your own records.

  • save the .Rproj file somewhere you can find it. Don’t keep everything in your downloads folder. Maybe make a folder called SDS261 or something. That folder could live on your Desktop. Or maybe in your Dropbox.

  • The SQL document should be saved in the R Project as lab2-sds261-yourlastname-yourfirstname.sql. You will have to navigate to the R Project to save the DBeaver file in the correct place.

  • Connect to the nyctaxi database, which contains the yellow_old table. See README file (or lab 1) for connection details.

Assignment

  1. Select the trip_distance and total_amount columns from the yellow_old table. For safety (i.e., not crashing your computer), only return the first 5 rows.
  1. Using the AVG() function, find the average total_amount paid in the yellow_old table.
  1. Which type of payment_type had the highest total_amount? How much was it?
  1. How many taxi trips happened on each day of the week? Sort your results so that the day of the week with the fewest rides is at the top of the output.
  1. Which day of the week had the longest trip_distance? Sort the results to have the day of the week with the longest rides at the top of the output.
  1. How many different rate_codes are given in the dataset?
  1. Start with the following lines of query:
SELECT AVG(trip_distance) AS avg_trip, 
       DAYNAME(STR_TO_DATE(pickup_datetime, "%Y-%m-%d %T")) AS wday
  • Can you use WHERE to subset only Saturday and Sunday to find the average trip distance across the weekend? Why or why not?
  • Can you use HAVING to subset only Saturday and Sunday to find the average trip distance across the weekend? Why or why not?