2024-01-09
Consider a database of taxi rides from the Yellow Cab company in NYC in March of 2014.
There is only one table in the nyctaxi
database. It is called yellow_old
.
Still using a {sql}
chunk. The DESCRIBE
command shows the 18 field names (variables) in the yellow_old
table. Some of the variables are characters (text
) and some are numeric (either double
or bigint
)
Field | Type | Null | Key | Default | Extra |
---|---|---|---|---|---|
vendor_id | text | YES | |||
pickup_datetime | text | YES | |||
dropoff_datetime | text | YES | |||
passenger_count | bigint(20) | YES | |||
trip_distance | double | YES | |||
pickup_longitude | double | YES | |||
pickup_latitude | double | YES | |||
rate_code | bigint(20) | YES | |||
store_and_fwd_flag | text | YES | |||
dropoff_longitude | double | YES |
Queries in SQL start with the SELECT
keyword and consist of several clauses, which must be written in the following order:
SELECT
FROM
JOIN
WHERE
GROUP BY
HAVING
ORDER BY
LIMIT
The clauses are similar to data wrangling verbs in R.
SELECT
, include a corresponding FROM
*
returns every columnThe shortest SQL query is the following SELECT
command. Do not run this command!!! The yellow_old
table has 15 million rows, and we do not want to look at them simultaneously.
Much better for big tables:
vendor_id | pickup_datetime | dropoff_datetime | passenger_count | trip_distance | pickup_longitude | pickup_latitude | rate_code | store_and_fwd_flag | dropoff_longitude | dropoff_latitude | payment_type | fare_amount | surcharge | mta_tax | tip_amount | tolls_amount | total_amount |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CMT | 2014-03-01 01:07:38 | 2014-03-01 01:16:26 | 1 | 2.0 | -74.0 | 40.7 | 1 | N | -73.9 | 40.7 | CRD | 9.0 | 0.5 | 0.5 | 2.0 | 0 | 12.0 |
CMT | 2014-03-01 01:08:03 | 2014-03-01 01:12:51 | 2 | 1.2 | -74.0 | 40.7 | 1 | N | -74.0 | 40.7 | CRD | 6.0 | 0.5 | 0.5 | 1.0 | 0 | 8.0 |
CMT | 2014-03-01 01:08:51 | 2014-03-01 01:13:18 | 3 | 0.5 | -73.9 | 40.7 | 1 | N | -74.0 | 40.7 | CRD | 5.0 | 0.5 | 0.5 | 1.2 | 0 | 7.2 |
CMT | 2014-03-01 01:09:20 | 2014-03-01 01:24:18 | 3 | 3.5 | -74.0 | 40.7 | 1 | N | -74.0 | 40.8 | CRD | 14.0 | 0.5 | 0.5 | 3.0 | 0 | 18.0 |
CMT | 2014-03-01 01:09:46 | 2014-03-01 01:22:34 | 1 | 1.8 | -74.0 | 40.7 | 1 | N | -74.0 | 40.7 | CRD | 10.5 | 0.5 | 0.5 | 1.0 | 0 | 12.5 |
CMT | 2014-03-01 01:12:41 | 2014-03-01 01:15:38 | 1 | 0.5 | -74.0 | 40.7 | 1 | N | -74.0 | 40.7 | CRD | 4.0 | 0.5 | 0.5 | 0.5 | 0 | 5.5 |
CMT | 2014-03-01 01:12:11 | 2014-03-01 01:27:38 | 2 | 3.7 | -74.0 | 40.8 | 1 | N | -74.0 | 40.7 | CRD | 14.5 | 0.5 | 0.5 | 3.1 | 0 | 18.6 |
CMT | 2014-03-01 01:13:55 | 2014-03-01 01:34:54 | 1 | 5.4 | -74.0 | 40.8 | 1 | N | -74.0 | 40.7 | CRD | 20.0 | 0.5 | 0.5 | 3.0 | 0 | 24.0 |
CMT | 2014-03-01 01:14:06 | 2014-03-01 01:28:25 | 1 | 3.5 | -74.0 | 40.7 | 1 | N | -74.0 | 40.8 | CRD | 13.5 | 0.5 | 0.5 | 2.9 | 0 | 17.4 |
How do we know how many taxi rides are recorded? Two different ways of counting the rows:
Yikes, more than 15 million taxi rides!
What is the length of a taxi ride (in time)? … unfortunately, pickup_datetime
and dropoff_datetime
are saved as character strings instead of in DateTime format.
SELECT
pickup_datetime, dropoff_datetime,
STR_TO_DATE(pickup_datetime, "%Y-%m-%d %T") AS pickup,
STR_TO_DATE(dropoff_datetime, "%Y-%m-%d %T") AS dropoff
FROM yellow_old
LIMIT 0, 10;
pickup_datetime | dropoff_datetime | pickup | dropoff |
---|---|---|---|
2014-03-01 01:07:38 | 2014-03-01 01:16:26 | 2014-03-01 01:07:38 | 2014-03-01 01:16:26 |
2014-03-01 01:08:03 | 2014-03-01 01:12:51 | 2014-03-01 01:08:03 | 2014-03-01 01:12:51 |
2014-03-01 01:08:51 | 2014-03-01 01:13:18 | 2014-03-01 01:08:51 | 2014-03-01 01:13:18 |
2014-03-01 01:09:20 | 2014-03-01 01:24:18 | 2014-03-01 01:09:20 | 2014-03-01 01:24:18 |
2014-03-01 01:09:46 | 2014-03-01 01:22:34 | 2014-03-01 01:09:46 | 2014-03-01 01:22:34 |
2014-03-01 01:12:41 | 2014-03-01 01:15:38 | 2014-03-01 01:12:41 | 2014-03-01 01:15:38 |
2014-03-01 01:12:11 | 2014-03-01 01:27:38 | 2014-03-01 01:12:11 | 2014-03-01 01:27:38 |
2014-03-01 01:13:55 | 2014-03-01 01:34:54 | 2014-03-01 01:13:55 | 2014-03-01 01:34:54 |
2014-03-01 01:14:06 | 2014-03-01 01:28:25 | 2014-03-01 01:14:06 | 2014-03-01 01:28:25 |
Why can’t we find the difference between the new two time variables?
Solution 1: two layers of SELECT
SELECT
(i.e., inside) layer creates the new variablesSELECT
(i.e., outside) layer subtracts the two timesSELECT
pickup,
dropoff,
TIMEDIFF(pickup, dropoff) AS length_time
FROM (
SELECT
STR_TO_DATE(pickup_datetime, "%Y-%m-%d %T") AS pickup,
STR_TO_DATE(dropoff_datetime, "%Y-%m-%d %T") AS dropoff
FROM yellow_old)
AS subquery_table
LIMIT 0, 20;
pickup | dropoff | length_time |
---|---|---|
2014-03-01 01:07:38 | 2014-03-01 01:16:26 | 00:08:48 |
2014-03-01 01:08:03 | 2014-03-01 01:12:51 | 00:04:48 |
2014-03-01 01:08:51 | 2014-03-01 01:13:18 | 00:04:27 |
2014-03-01 01:09:20 | 2014-03-01 01:24:18 | 00:14:58 |
2014-03-01 01:09:46 | 2014-03-01 01:22:34 | 00:12:48 |
2014-03-01 01:12:41 | 2014-03-01 01:15:38 | 00:02:57 |
2014-03-01 01:12:11 | 2014-03-01 01:27:38 | 00:15:27 |
2014-03-01 01:13:55 | 2014-03-01 01:34:54 | 00:20:59 |
2014-03-01 01:14:06 | 2014-03-01 01:28:25 | 00:14:19 |
Solution 2: apply the STR_TO_DATE()
function inside the TIMEDIFF()
function
SELECT
pickup_datetime,
dropoff_datetime,
TIMEDIFF(STR_TO_DATE(pickup_datetime, "%Y-%m-%d %T"),
STR_TO_DATE(dropoff_datetime, "%Y-%m-%d %T")) AS length_time
FROM yellow_old
LIMIT 0, 20;
pickup_datetime | dropoff_datetime | length_time |
---|---|---|
2014-03-01 01:07:38 | 2014-03-01 01:16:26 | 00:08:48 |
2014-03-01 01:08:03 | 2014-03-01 01:12:51 | 00:04:48 |
2014-03-01 01:08:51 | 2014-03-01 01:13:18 | 00:04:27 |
2014-03-01 01:09:20 | 2014-03-01 01:24:18 | 00:14:58 |
2014-03-01 01:09:46 | 2014-03-01 01:22:34 | 00:12:48 |
2014-03-01 01:12:41 | 2014-03-01 01:15:38 | 00:02:57 |
2014-03-01 01:12:11 | 2014-03-01 01:27:38 | 00:15:27 |
2014-03-01 01:13:55 | 2014-03-01 01:34:54 | 00:20:59 |
2014-03-01 01:14:06 | 2014-03-01 01:28:25 | 00:14:19 |
There is a distinction between clauses that operate on the variables of the original table versus those that operate on the variables of the results set.
pickup_datetime
and dropoff_datetime
are columns in the original table - they are written to disk on the SQL server.
pickup
, dropoff
, and length_time
exist only in the results set, which is passed from the server (SQL server) to the client (e.g., RStudio or DBeaver) and is not written to disk.
Returns only unique rows.
payment_type |
---|
CRD |
CSH |
NOC |
DIS |
UNK |
vendor_id | payment_type |
---|---|
CMT | CRD |
CMT | CSH |
CMT | NOC |
CMT | DIS |
VTS | CRD |
VTS | CSH |
VTS | UNK |
In case you are curious:
VTS
is Verifone Transportation Systems and CMT
is Mobile Knowledge Systems Inc.CRD
is credit card; CSH
is cash; NOC
is no charge; DIS
is disputeThe WHERE
clause is analogous to the filter()
function in dplyr. However, keep in mind that there are two SQL commands that resemble the dplyr filter()
function. WHERE
operates on the original data in the table and HAVING
operates on the result set.
What was the fare for those taxi rides where the tip_amount
was more than $10 and the person used cash?
Note that in SQL the equality logical is =
and in R the equality logical is ==
.
SELECT payment_type, fare_amount, tip_amount, total_amount
FROM yellow_old
WHERE tip_amount > 10
AND payment_type = "CSH"
LIMIT 0, 10;
payment_type | fare_amount | tip_amount | total_amount |
---|---|---|---|
CSH | 65.5 | 15.3 | 91.8 |
CSH | 52.0 | 11.6 | 69.4 |
CSH | 52.0 | 11.6 | 69.4 |
CSH | 55.0 | 16.2 | 81.2 |
CSH | 71.5 | 20.0 | 103.5 |
CSH | 70.0 | 16.2 | 97.1 |
CSH | 95.0 | 21.9 | 131.2 |
CSH | 62.5 | 15.5 | 93.0 |
CSH | 66.0 | 15.0 | 90.0 |
CSH | 65.0 | 13.2 | 79.2 |
BETWEEN
can be used to specify a range of values for a numeric value. BETWEEN
is inclusive.
SELECT payment_type, fare_amount, tip_amount, total_amount
FROM yellow_old
WHERE tip_amount BETWEEN 10 and 12
AND payment_type = "CSH"
LIMIT 0, 10;
payment_type | fare_amount | tip_amount | total_amount |
---|---|---|---|
CSH | 52.0 | 11.6 | 69.4 |
CSH | 52.0 | 11.6 | 69.4 |
CSH | 88.0 | 10.0 | 107.0 |
CSH | 72.0 | 10.0 | 94.0 |
CSH | 64.5 | 10.0 | 85.5 |
CSH | 66.0 | 12.0 | 93.0 |
CSH | 52.0 | 11.6 | 69.4 |
CSH | 69.0 | 10.0 | 88.0 |
CSH | 90.0 | 10.0 | 100.0 |
CSH | 52.0 | 11.6 | 69.4 |
IN
is similar to the dplyr %in%
function which specifies distinct values for the variable.
SELECT payment_type, fare_amount, tip_amount, total_amount
FROM yellow_old
WHERE tip_amount IN (10, 12)
AND payment_type = "CSH"
LIMIT 0, 10;
payment_type | fare_amount | tip_amount | total_amount |
---|---|---|---|
CSH | 88.0 | 10 | 107.0 |
CSH | 72.0 | 10 | 94.0 |
CSH | 64.5 | 10 | 85.5 |
CSH | 66.0 | 12 | 93.0 |
CSH | 69.0 | 10 | 88.0 |
CSH | 90.0 | 10 | 100.0 |
CSH | 74.5 | 10 | 90.3 |
CSH | 89.0 | 10 | 118.1 |
CSH | 52.0 | 10 | 67.8 |
CSH | 66.0 | 12 | 90.0 |
AND
takes precedent over OR
in the order of operations, when there are no parentheses.
SELECT payment_type, fare_amount, tip_amount, total_amount
FROM yellow_old
WHERE tip_amount BETWEEN 10 and 12 OR
total_amount BETWEEN 100 and 112 AND
payment_type = "CSH"
LIMIT 0, 10;
payment_type | fare_amount | tip_amount | total_amount |
---|---|---|---|
CRD | 52.0 | 10.5 | 63.0 |
CRD | 35.0 | 10.2 | 51.0 |
CRD | 52.0 | 11.6 | 69.4 |
CRD | 30.5 | 10.8 | 47.2 |
CRD | 52.0 | 10.5 | 63.0 |
CRD | 52.0 | 10.5 | 63.0 |
CRD | 52.0 | 11.6 | 69.4 |
CRD | 52.0 | 11.6 | 69.4 |
CRD | 52.0 | 11.6 | 69.4 |
CRD | 52.0 | 11.6 | 69.4 |
Parentheses take precedent over AND
and OR
.
SELECT payment_type, fare_amount, tip_amount, total_amount
FROM yellow_old
WHERE (tip_amount BETWEEN 10 and 12 OR
total_amount BETWEEN 100 and 112 ) AND
payment_type = "CSH"
LIMIT 0, 10;
payment_type | fare_amount | tip_amount | total_amount |
---|---|---|---|
CSH | 107.0 | 0.0 | 108.0 |
CSH | 92.5 | 0.0 | 103.5 |
CSH | 99.5 | 0.0 | 105.3 |
CSH | 92.0 | 0.0 | 106.3 |
CSH | 103.0 | 0.0 | 109.3 |
CSH | 107.0 | 0.0 | 107.0 |
CSH | 104.5 | 0.0 | 105.5 |
CSH | 112.0 | 0.0 | 112.0 |
CSH | 52.0 | 11.6 | 69.4 |
CSH | 52.0 | 11.6 | 69.4 |
IS NULL
not = NULL
(because NULL
indicates unknown)
NULL
exampleThe logic of NULL
:1
NULL
, you’ll just get NULL
. For instance if \(x\) is NULL
, then \(x > 3\), \(1 = x\), and \(x + 4\) all evaluate to NULL
. Even \(x =\) NULL
evaluates to NULL
! if you want to check whether \(x\) is NULL
, use x IS NULL
or x IS NOT NULL
.NULL
short-circuits with boolean operators. That means a boolean expression involving NULL
will evaluate to:
NULL
value is really TRUE or FALSE.NULL
value is really TRUE or FALSE.NULL
, if it depends on the NULL
value.NULL
exampleConsider the following table and SQL query to evaluate WHERE age <= 20 OR num_dogs = 3
:
SELECT * FROM (
SELECT 'Ace' AS name, 20 AS age, 4 as num_dogs
UNION
SELECT 'Ada' AS name, NULL AS age, 3 as num_dogs
UNION
SELECT 'Ben' AS name, NULL AS age, NULL as num_dogs
UNION
SELECT 'Cho' AS name, 27 AS age, NULL as num_dogs
) AS temptable;
name | age | num_dogs |
---|---|---|
Ace | 20 | 4 |
Ada | 3 | |
Ben | ||
Cho | 27 |
NULL
exampleWhere does the WHERE
clause do? It tells us that we only want to keep the rows satisfying the age <= 20
OR num_dogs = 3
. Let’s consider each row one at a time:
age <= 20
evaluates to TRUE so the claim is satisfied.age <= 20
evaluates to NULL
but num_dogs = 3
evaluates to TRUE so the claim is satisfied.age <= 20
evaluates to NULL
and num_dogs = 3
evaluates to NULL
so the overall expression is NULL
which has a FALSE value.age <= 20
evaluates to FALSE and num_dogs = 3
evaluates to NULL
so the overall expression evaluates to NULL
(because it depends on the value of the NULL
).Thus we keep only Ace and Ada.
NULL
exampleSELECT * FROM (
SELECT 'Ace' AS name, 20 AS age, 4 as num_dogs
UNION
SELECT 'Ada' AS name, NULL AS age, 3 as num_dogs
UNION
SELECT 'Ben' AS name, NULL AS age, NULL as num_dogs
UNION
SELECT 'Cho' AS name, 27 AS age, NULL as num_dogs
) AS temptable
WHERE age <= 20 OR num_dogs = 3;
name | age | num_dogs |
---|---|---|
Ace | 20 | 4 |
Ada | 3 |
The GROUP BY
clause will direct SQL to carry out the query separately for each category in the grouped variable.
COUNT()
, SUM()
, MAX()
, MIN()
, and AVG()
.SELECT COUNT(*) AS num_transactions,
SUM(1) AS num_transactions_also,
SUM(2) AS double_transactions,
payment_type
FROM yellow_old
WHERE tip_amount BETWEEN 10 and 20
GROUP BY payment_type;
num_transactions | num_transactions_also | double_transactions | payment_type |
---|---|---|---|
213872 | 213872 | 427744 | CRD |
78 | 78 | 156 | CSH |
3 | 3 | 6 | DIS |
7 | 7 | 14 | NOC |
609 | 609 | 1218 | UNK |
For those people who tipped between $10 and $20, what was the lowest and highest fare for each of the types of payments?
SELECT COUNT(*) AS num_transactions,
MIN(fare_amount) AS lowest_fare,
MAX(fare_amount) AS highest_fare,
payment_type
FROM yellow_old
WHERE tip_amount BETWEEN 10 and 20
GROUP BY payment_type;
num_transactions | lowest_fare | highest_fare | payment_type |
---|---|---|---|
213872 | 0.0 | 370.0 | CRD |
78 | 52.0 | 102.0 | CSH |
3 | 52.0 | 79.5 | DIS |
7 | 58.0 | 94.0 | NOC |
609 | 4.5 | 147.0 | UNK |
GROUP BY
will work applied to multiple columns.
What is wday
?
SELECT COUNT(*) AS num_transactions,
MIN(fare_amount) AS lowest_fare,
MAX(fare_amount) AS highest_fare,
payment_type,
DAYNAME(STR_TO_DATE(pickup_datetime, "%Y-%m-%d %T")) AS wday
FROM yellow_old
GROUP BY payment_type, wday;
num_transactions | lowest_fare | highest_fare | payment_type | wday |
---|---|---|---|---|
1 | ||||
1247397 | 2.5 | 477 | CRD | Friday |
1278362 | -612.4 | 500 | CRD | Monday |
1533796 | 2.5 | 420 | CRD | Saturday |
1324394 | 2.5 | 480 | CRD | Sunday |
1258098 | 2.5 | 500 | CRD | Thursday |
1121081 | 2.5 | 500 | CRD | Tuesday |
1192892 | 2.5 | 400 | CRD | Wednesday |
860920 | 2.5 | 444 | CSH | Friday |
918653 | 0.0 | 873 | CSH | Monday |
ORDER BY
allows us to look at interesting aspects of the data by sorting the data.
SELECT COUNT(*) AS num_transactions,
MIN(fare_amount) AS lowest_fare,
MAX(fare_amount) AS highest_fare,
payment_type,
DAYNAME(STR_TO_DATE(pickup_datetime, "%Y-%m-%d %T")) AS wday
FROM yellow_old
GROUP BY payment_type, wday
ORDER BY lowest_fare ASC;
num_transactions | lowest_fare | highest_fare | payment_type | wday |
---|---|---|---|---|
1 | ||||
1278362 | -612.4 | 500 | CRD | Monday |
918653 | 0.0 | 873 | CSH | Monday |
5440 | 0.0 | 950 | NOC | Monday |
1537 | 0.0 | 102 | DIS | Monday |
1533796 | 2.5 | 420 | CRD | Saturday |
1121081 | 2.5 | 500 | CRD | Tuesday |
1192892 | 2.5 | 400 | CRD | Wednesday |
1258098 | 2.5 | 500 | CRD | Thursday |
1247397 | 2.5 | 477 | CRD | Friday |
WHAT?!?!! How in the world was one of the fares -$612.40?
SELECT COUNT(*) AS num_transactions,
MIN(fare_amount) AS lowest_fare,
MAX(fare_amount) AS highest_fare,
payment_type,
DAYNAME(STR_TO_DATE(pickup_datetime, "%Y-%m-%d %T")) AS wday
FROM yellow_old
GROUP BY payment_type, wday
ORDER BY highest_fare DESC;
num_transactions | lowest_fare | highest_fare | payment_type | wday |
---|---|---|---|---|
5440 | 0.0 | 950 | NOC | Monday |
918653 | 0.0 | 873 | CSH | Monday |
1278362 | -612.4 | 500 | CRD | Monday |
1121081 | 2.5 | 500 | CRD | Tuesday |
1258098 | 2.5 | 500 | CRD | Thursday |
1324394 | 2.5 | 480 | CRD | Sunday |
1247397 | 2.5 | 477 | CRD | Friday |
1222 | 2.5 | 475 | DIS | Tuesday |
813813 | 2.5 | 475 | CSH | Thursday |
860920 | 2.5 | 444 | CSH | Friday |
$950 is a lot to pay for a cab ride! But in NYC, I’d believe it.
SELECT COUNT(*) AS num_transactions,
MIN(fare_amount) AS lowest_fare,
MAX(fare_amount) AS highest_fare,
payment_type,
DAYNAME(STR_TO_DATE(pickup_datetime, "%Y-%m-%d %T")) AS wday
FROM yellow_old
GROUP BY payment_type, wday
ORDER BY wday, payment_type;
num_transactions | lowest_fare | highest_fare | payment_type | wday |
---|---|---|---|---|
1 | ||||
1247397 | 2.5 | 477 | CRD | Friday |
860920 | 2.5 | 444 | CSH | Friday |
1592 | 2.5 | 255 | DIS | Friday |
5252 | 2.5 | 229 | NOC | Friday |
10131 | 2.5 | 130 | UNK | Friday |
1278362 | -612.4 | 500 | CRD | Monday |
918653 | 0.0 | 873 | CSH | Monday |
1537 | 0.0 | 102 | DIS | Monday |
5440 | 0.0 | 950 | NOC | Monday |
Note that both GROUP BY
and ORDER BY
evaluate the data after it has been retrieved. Therefore, the functions operate on the results set, not the original rows of the data.
We are able to GROUP BY
and ORDER BY
on the new variables we had created, wday
.
Recall that WHERE
acts only on the original data. If we are interested in rides that took place on Friday
, we need to use the derived variable wday
instead of the raw variable pickup_datetime
. Fortunately, HAVING
works on the results set.
SELECT COUNT(*) AS num_transactions,
MIN(fare_amount) AS lowest_fare,
MAX(fare_amount) AS highest_fare,
payment_type,
DAYNAME(STR_TO_DATE(pickup_datetime, "%Y-%m-%d %T")) AS wday
FROM yellow_old
GROUP BY payment_type, wday
HAVING wday = 'Friday';
num_transactions | lowest_fare | highest_fare | payment_type | wday |
---|---|---|---|---|
1247397 | 2.5 | 477 | CRD | Friday |
860920 | 2.5 | 444 | CSH | Friday |
1592 | 2.5 | 255 | DIS | Friday |
5252 | 2.5 | 229 | NOC | Friday |
10131 | 2.5 | 130 | UNK | Friday |
While it worked out quite well for us that HAVING
was able to filter the data based on the results set, the use of HAVING
was quite onerous because the entire data set was considered before the filter was applied. That is, if the filter can be done on the original data using WHERE
, the query will be much faster and more efficient.
Note: HAVING
requires a GROUP BY
clause. And the variable(s) used in HAVING
must also be part of the GROUP BY
clause.
LIMIT
truncates the query to specified rows. The first number is the offset (i.e., the number of rows to skip), the second number is the (maximum) number of rows to return. Here, we return rows 15428119 through 15428128.
The first number is optional.
vendor_id | pickup_datetime | dropoff_datetime | passenger_count | trip_distance | pickup_longitude | pickup_latitude | rate_code | store_and_fwd_flag | dropoff_longitude | dropoff_latitude | payment_type | fare_amount | surcharge | mta_tax | tip_amount | tolls_amount | total_amount |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CMT | 2014-03-18 14:35:21 | 2014-03-18 14:52:01 | 1 | 2.7 | 0 | 0.0 | 1 | N | 0 | 0.0 | CRD | 13.0 | 0 | 0.5 | 2.50 | 0.00 | 16.0 |
CMT | 2014-03-18 14:08:23 | 2014-03-18 14:19:29 | 2 | 1.3 | 0 | 0.0 | 1 | N | 0 | 0.0 | CRD | 9.0 | 0 | 0.5 | 1.90 | 0.00 | 11.4 |
CMT | 2014-03-18 09:18:38 | 2014-03-18 09:19:41 | 1 | 0.2 | -74 | 40.8 | 1 | N | -74 | 40.8 | CRD | 3.0 | 0 | 0.5 | 1.00 | 0.00 | 4.5 |
CMT | 2014-03-18 06:28:12 | 2014-03-18 06:49:49 | 1 | 9.9 | 0 | 0.0 | 1 | N | 0 | 0.0 | CRD | 30.0 | 0 | 0.5 | 7.16 | 5.33 | 43.0 |
CMT | 2014-03-18 17:39:28 | 2014-03-18 17:53:01 | 1 | 4.9 | -74 | 40.8 | 1 | N | -74 | 40.7 | CRD | 16.5 | 1 | 0.5 | 3.00 | 0.00 | 21.0 |
CMT | 2014-03-18 18:14:19 | 2014-03-18 18:27:22 | 1 | 0.3 | -74 | 40.7 | 1 | N | -74 | 40.7 | CRD | 3.5 | 1 | 0.5 | 6.00 | 0.00 | 11.0 |
CMT | 2014-03-18 10:12:33 | 2014-03-18 10:28:09 | 1 | 3.3 | -74 | 40.8 | 1 | N | -74 | 40.8 | CRD | 13.5 | 0 | 0.5 | 4.20 | 0.00 | 18.2 |
CMT | 2014-03-18 09:02:37 | 2014-03-18 09:16:29 | 1 | 6.2 | -74 | 40.8 | 1 | N | -74 | 40.7 | CRD | 19.5 | 0 | 0.5 | 1.00 | 0.00 | 21.0 |
CMT | 2014-03-18 10:10:19 | 2014-03-18 10:19:25 | 1 | 1.7 | -74 | 40.8 | 1 | N | -74 | 40.8 | CRD | 8.5 | 0 | 0.5 | 1.80 | 0.00 | 10.8 |
CMT | 2014-03-18 15:24:53 | 2014-03-18 15:42:42 | 1 | 1.7 | -74 | 40.8 | 1 | N | -74 | 40.7 | CRD | 12.5 | 0 | 0.5 | 2.60 | 0.00 | 15.6 |
If you are working in R to run SQL commands, you may want to use the query output for further analysis or visualizations.
#|output.var: "name_of_variable"
inside the {sql}
chunk.name_of_variable
will then be available to be used in the R environment.Always a good idea to terminate the SQL connection when you are done with it.