Tables_in_imdb |
---|
aka_name |
aka_title |
cast_info |
char_name |
comp_cast_type |
company_name |
company_type |
complete_cast |
info_type |
keyword |
2024-01-10
Consider a database of information from IMDb.
21 tables in the imdb
database! Lots of details on movies, TV series, video games, and more. Today interest is in movies (and details within, like actors and ratings).
JOIN
sJOIN
(aka INNER JOIN
): include all of the rows that exist in both tablesLEFT JOIN
: include all of the rows in the first table.RIGHT JOIN
: include all of the rows in the second table.FULL OUTER JOIN
: include all rows in either table. (The functionality doesn’t exist in MySQL but can be created using joins and UNION
.)CROSS JOIN
: match each row of the first table with each row in the second table.JOIN
sVenn diagrams of the different types of joins.
JOIN
sMini data tables of the different types of JOIN
. (In SQL the missing values will be labeled as NULL
(not NA
).)
JOIN
sEach JOIN
clause needs four specific pieces of information:
JOIN
.JOIN
being used.JOIN
.Rock bands from the 60s.
# A tibble: 3 × 2
name band
<chr> <chr>
1 Mick Stones
2 John Beatles
3 Paul Beatles
# A tibble: 3 × 2
name plays
<chr> <chr>
1 John guitar
2 Paul bass
3 Keith guitar
The function sqldf()
in the sqldf R package allows for SQL commands on R objects.
INNER JOIN
An inner join combines two datasets returning only the observations that exist in both of the original datasets.
FULL JOIN
A full join combines two datasets returning every observation that exists in either one of the original datasets.
# A tibble: 4 × 3
name band plays
<chr> <chr> <chr>
1 Mick Stones <NA>
2 John Beatles guitar
3 Paul Beatles bass
4 Keith <NA> guitar
The full_join()
function does not have an equivalent in MySQL. See notes for using JOIN
s and UNION
s to produce a full join.
LEFT JOIN
A left join combines two datasets returning every observation that exists in the left (or first) original dataset.
RIGHT JOIN
A right join combines two datasets returning every observation that exists in the right (or second) original dataset.
CROSS JOIN
A right join combines two datasets returning every observation that exists in the right (or second) original dataset.
name band name plays
1 Mick Stones John guitar
2 Mick Stones Paul bass
3 Mick Stones Keith guitar
4 John Beatles John guitar
5 John Beatles Paul bass
6 John Beatles Keith guitar
7 Paul Beatles John guitar
8 Paul Beatles Paul bass
9 Paul Beatles Keith guitar
JOIN
IMDb title + votesIn the imdb
database, the title
table includes information about the 4,626,322 titles in the database, including the id
, title
, kind_id
(indicator for the kind of ID it is), and production_year
. It does not, however, include the review of the title.
id | title | imdb_index | kind_id | production_year | imdb_id | phonetic_code | episode_of_id | season_nr | episode_nr | series_years | md5sum |
---|---|---|---|---|---|---|---|---|---|---|---|
78460 | Adults Recat to the Simpsons (30th Anniversary) | 7 | 2017 | A3432 | 78406 | 2ae09eed7d576cc2c24774fed5b18168 | |||||
70273 | (2016-05-18) | 7 | 2016 | 68058 | 511dfc14cfff7589d29a95abb30cd66a | ||||||
60105 | (2014-04-11) | 7 | 2014 | 59138 | c6cdce7e667e07713e431805c407feed | ||||||
32120 | (2008-05-01) | 7 | 2008 | 32060 | 100df65742caf5afd092b2e0ead67d8e | ||||||
97554 | Schmölders Traum | 7 | 2001 | S2543 | 97302 | 10 | 1 | 46862a2f96f9fb2d59e8c9a11ecfdd28 | |||
57966 | (#1.1) | 7 | 2013 | 57965 | 1 | 1 | 409c37703766c4b24f8a86162fd9cf85 | ||||
76391 | Anniversary | 7 | 1971 | A5162 | 76385 | 4 | 9 | 5e12ce73fac1d1dcf94136b6e9acd8f8 | |||
11952 | Angus Black/Lester Barrie/DC Curry | 7 | 2009 | A5214 | 11937 | 4 | 7 | 9c38b9e5601dc154444b73b518034aa1 | |||
1554 | New Orleans | 7 | 2003 | N6452 | 1508 | 2 | 11 | 621bea735740a547e862e4a3226f35d2 | |||
58442 | Kiss Me Kate | 7 | 2011 | K2523 | 58436 | 1 | 10 | 293e8c75c7f35a4035abf617962be5a9 |
JOIN
IMDb title + votesThe movie_info_idx
table does not contain much information about each particular film. It does, however, have an indicator for the movie ID (given by movie_id
) as well as the number of votes (given by info
where type_id = 100
).
JOIN
IMDb title + votesINNER JOIN
)SELECT title.id,
title.title,
movie_info_idx.info
FROM title
JOIN movie_info_idx ON title.id = movie_info_idx.movie_id
WHERE title.production_year = 2015
AND title.kind_id = 1 # movies only
AND movie_info_idx.info_type_id = 100 # info_type is votes
AND movie_info_idx.info > 150000 # at least 150,000 votes
ORDER BY movie_info_idx.info DESC
LIMIT 0, 20;
id | title | info |
---|---|---|
4260166 | Star Wars: Episode VII - The Force Awakens | 691691 |
3915213 | Mad Max: Fury Road | 666484 |
4389619 | The Martian | 583987 |
3313672 | Avengers: Age of Ultron | 540606 |
4414139 | The Revenant | 526189 |
3787790 | Jurassic World | 471237 |
3752999 | Inside Out | 443051 |
3292159 | Ant-Man | 390965 |
4364483 | The Hateful Eight | 363199 |
4251736 | Spectre | 319875 |
Some aspects of the query are worth pointing out:
* The variables in the output are given in the SELECT
clause. The id
and title
(both from the title
table) and the info
from the movie_info_idx
which represents the number of IMDb votes. * The variables are preceded by the table from which they came. While not always necessary, it is good practice so as to avoid confusion. * The JOIN
happens by linking the id
variable in the title
table with the movie_id
variable in the movie_info_idx
table. * The LIMIT
wasn’t necessary (there are only 12 observations), but it’s good practice so that we don’t end up with unwieldy query results. * The WHERE
clause happens before the JOIN
action, despite being written after. * In the WHERE
clause, we keep only movies, only 2015 production year, and only at least 150,000 votes.
JOIN
IMDb title + actresscast_info
contains the person_id
and the movie_id
id | person_id | movie_id | person_role_id | note | nr_order | role_id |
---|---|---|---|---|---|---|
1 | 1 | 3432997 | 1 | 31 | 1 | |
2 | 2 | 1901690 | 2 | 1 | ||
3 | 3 | 4027567 | 2 | 25 | 1 | |
4 | 3 | 4282876 | 3 | 22 | 1 | |
5 | 4 | 3542672 | 12 | 1 | ||
6 | 5 | 3331520 | 4 | (as $hutter Boy) | 10 | 1 |
7 | 5 | 4027191 | 2 | (as $hutter Boy) | 1 | 1 |
8 | 5 | 4195731 | 5 | (uncredited) | 1 | |
9 | 5 | 4263956 | 6 | (uncredited) | 1 | |
10 | 5 | 4267787 | 7 | (uncredited) | 1 |
person_role_id
is 1
if actor and 2
if actress
JOIN
IMDb title + actressaka_name
contains person_id
and name
(of actor)
id | person_id | name | imdb_index | name_pcode_cf | name_pcode_nf | surname_pcode | md5sum |
---|---|---|---|---|---|---|---|
1 | 6188450 | Smith, Jessica Noel | S5325 | J2542 | S53 | 25c9d464e3ff2957533546aa92b397ed | |
2 | 5125059 | Pain, L. $ham | P545 | L515 | P5 | 569b1e885ccb51211c01753f0dad9b2c | |
3 | 5 | Boy, $hutter | B36 | H361 | B | 35092b5604ce378fc48c8a6fc0038a49 | |
4 | 4152053 | Dollasign, Ty | D4253 | T3425 | D425 | 0f565a2d8027cfb8ed6c5f4bba719fcd | |
5 | 4152053 | Sign, Ty Dolla | S2534 | T3425 | S25 | 2eded1b021b96333b4b74e0fec959650 | |
6 | 6 | Moore, Brandon | M6165 | B6535 | M6 | 193a6f5adf4756320f622162d2475608 | |
7 | 8 | $torm, Country | T6525 | C5363 | T65 | 1654400b707d34323ea392b87060e6cc | |
8 | 19 | ‘Hooper’, Simon P.J. Kelly | H1625 | S5124 | H16 | 3fd8885372c23f8c74e583da91d1fd05 | |
9 | 19 | Hooper | H16 | 24ddc68ab605ee95857ad45b65ffa2d8 | |||
10 | 19 | Kelly, Simon P.J. | K4251 | S5124 | K4 | 33d976f22e276b73c61513bc5f6e72a6 |
JOIN
IMDb title + actresstitle
contains id
and title
(of the movie)
id | title | imdb_index | kind_id | production_year | imdb_id | phonetic_code | episode_of_id | season_nr | episode_nr | series_years | md5sum |
---|---|---|---|---|---|---|---|---|---|---|---|
78460 | Adults Recat to the Simpsons (30th Anniversary) | 7 | 2017 | A3432 | 78406 | 2ae09eed7d576cc2c24774fed5b18168 | |||||
70273 | (2016-05-18) | 7 | 2016 | 68058 | 511dfc14cfff7589d29a95abb30cd66a | ||||||
60105 | (2014-04-11) | 7 | 2014 | 59138 | c6cdce7e667e07713e431805c407feed | ||||||
32120 | (2008-05-01) | 7 | 2008 | 32060 | 100df65742caf5afd092b2e0ead67d8e | ||||||
97554 | Schmölders Traum | 7 | 2001 | S2543 | 97302 | 10 | 1 | 46862a2f96f9fb2d59e8c9a11ecfdd28 | |||
57966 | (#1.1) | 7 | 2013 | 57965 | 1 | 1 | 409c37703766c4b24f8a86162fd9cf85 | ||||
76391 | Anniversary | 7 | 1971 | A5162 | 76385 | 4 | 9 | 5e12ce73fac1d1dcf94136b6e9acd8f8 | |||
11952 | Angus Black/Lester Barrie/DC Curry | 7 | 2009 | A5214 | 11937 | 4 | 7 | 9c38b9e5601dc154444b73b518034aa1 | |||
1554 | New Orleans | 7 | 2003 | N6452 | 1508 | 2 | 11 | 621bea735740a547e862e4a3226f35d2 | |||
58442 | Kiss Me Kate | 7 | 2011 | K2523 | 58436 | 1 | 10 | 293e8c75c7f35a4035abf617962be5a9 |
JOIN
IMDb title + actressGoal: identify the actresses in those movies with the highest number of votes.
SELECT t.title,
idx.info,
a.person_id,
n.name
FROM title AS t
JOIN movie_info_idx AS idx ON t.id = idx.movie_id
JOIN cast_info AS a ON idx.movie_id = a.movie_id
JOIN aka_name AS n ON a.person_id = n.person_id
WHERE t.production_year = 2015
AND t.kind_id = 1 # movies only
AND idx.info_type_id = 100 # info_type is votes
AND idx.info > 150000 # at least 150,000 votes
AND a.role_id = 2 # actresses only
ORDER BY a.person_id
LIMIT 0, 50;
title | info | person_id | name |
---|---|---|---|
Mission: Impossible - Rogue Nation | 266759 | 2674172 | Sofz |
Mission: Impossible - Rogue Nation | 266759 | 2674172 | Sof |
Mission: Impossible - Rogue Nation | 266759 | 2674172 | Sofz |
Mission: Impossible - Rogue Nation | 266759 | 2674172 | Sof |
Focus | 172680 | 2678594 | Labouisse, Kate |
Mad Max: Fury Road | 666484 | 2681098 | Michelle, Debra |
The Hunger Games: Mockingjay - Part 2 | 214569 | 2685496 | Harris, Ahnie |
The Hunger Games: Mockingjay - Part 2 | 214569 | 2685496 | Harris, Jasmine |
Creed | 183904 | 2686218 | Kareema, Kiana A. |
Ant-Man | 390965 | 2687271 | Akana, Anna Kay |
Connecting the most popular movies of 2015 with the actresses in those movies requires a series of JOIN
s. Note that to make the code less onerous, the title
table has been aliased by t
, the movie_info_idx
table has been aliased by idx
, the cast_info
table has been aliased by a
, and the aka_name
table has been aliased by n
.
There is a lot of data cleaning to do as some of the person_id
values are one to many!! That is, the person_id
matches multiple names in the aka_name
database.
JOIN
sLet’s look at RIGHT JOIN
and LEFT JOIN
using two new smaller tables.
The first has seven movies in it (from 2015 with at least 400,000 IMDb votes).
The second consists of almost 3 million actresses (person_role_id = 2
). In order to find a subset of actresses, the person_id > 3900000
was set arbitrarily (in order to have a smaller group with which to work).
Using subqueries, we can JOIN
the two datasets using different JOIN
techniques.
SELECT t.id,
t.title,
idx.info,
(SELECT COUNT(*)
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2015
AND t.kind_id = 1
AND idx.info_type_id = 100
AND idx.info > 400000) AS row_count
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2015
AND t.kind_id = 1 # movies only
AND idx.info_type_id = 100 # info_type is votes
AND idx.info > 400000 # at least 400,000 votes
ORDER BY idx.info DESC;
id | title | info | row_count |
---|---|---|---|
4260166 | Star Wars: Episode VII - The Force Awakens | 691691 | 7 |
3915213 | Mad Max: Fury Road | 666484 | 7 |
4389619 | The Martian | 583987 | 7 |
3313672 | Avengers: Age of Ultron | 540606 | 7 |
4414139 | The Revenant | 526189 | 7 |
3787790 | Jurassic World | 471237 | 7 |
3752999 | Inside Out | 443051 | 7 |
SELECT a.person_id,
a.movie_id,
n.name,
(SELECT COUNT(*)
FROM cast_info AS a
JOIN aka_name AS n ON a.person_id = n.person_id
WHERE a.person_role_id = 2
AND a.person_id > 390000) AS row_count
FROM cast_info AS a
JOIN aka_name AS n ON a.person_id = n.person_id
WHERE a.person_role_id = 2
AND a.person_id > 3900000
LIMIT 0, 20;
person_id | movie_id | name | row_count |
---|---|---|---|
3900141 | 759802 | Simons, Rita Joanne | 2904759 |
3902258 | 4365829 | Singer, Rabbi Tovia | 2904759 |
3902699 | 3109788 | Singh, Sabine Erika | 2904759 |
3903035 | 3215866 | Val | 2904759 |
3904831 | 2468067 | Masha | 2904759 |
3904928 | 3654347 | Fei, Siu Yin | 2904759 |
3904928 | 3654347 | Hsiao, Yen-fei | 2904759 |
3904928 | 3654347 | Siu, Yinfei | 2904759 |
3904928 | 3654347 | Xiao, Yanfei | 2904759 |
3904928 | 3654347 | Yin-Fai, Siu | 2904759 |
JOIN
With an inner JOIN
, there are 32 rows corresponding to all the actresses in the seven 2015 films with the most votes. Because the JOIN
is an intersection of the two tables, only the actresses with person_id
above 3900000 are included.
SELECT * FROM
(SELECT t.id,
t.title
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2015
AND t.kind_id = 1 # movies only
AND idx.info_type_id = 100 # info_type is votes
AND idx.info > 400000) AS movs # at least 400,000 votes
INNER JOIN (SELECT a.person_id,
a.movie_id,
n.name
FROM cast_info AS a
JOIN aka_name AS n ON a.person_id = n.person_id
WHERE a.role_id = 2 # acresses only
AND a.person_id > 3900000) AS acts ON acts.movie_id = movs.id
LIMIT 0, 300;
id | title | person_id | movie_id | name |
---|---|---|---|---|
3313672 | Avengers: Age of Ultron | 3916648 | 3313672 | Smulders, Jacoba Francisca Maria |
3752999 | Inside Out | 4122876 | 3752999 | Kuzniar, Lennon Wynn |
3787790 | Jurassic World | 3938423 | 3787790 | Norvell, Ingrid |
3787790 | Jurassic World | 3938423 | 3787790 | Norvell, Ingrid |
3787790 | Jurassic World | 3950111 | 3787790 | Sallaway, Seannon Jane |
3787790 | Jurassic World | 4079047 | 3787790 | Washington, Kelly Lynn |
3787790 | Jurassic World | 4079047 | 3787790 | Washington, Kelly Lynn |
3787790 | Jurassic World | 4084626 | 3787790 | Weeks, Jency |
3787790 | Jurassic World | 4084626 | 3787790 | Jenc |
3787790 | Jurassic World | 4084626 | 3787790 | Little J |
RIGHT JOIN
With a RIGHT JOIN
, there are more than 300 rows (the LIMIT
clause keeps us from knowing how many rows, but there are a LOT!) corresponding to all the actresses whose person_id
above 3900000 are included. Those actresses who acted in one of the seven top 2015 films are also included in the full results table, but they don’t happen to be in the truncated output here.
SELECT * FROM
(SELECT t.id,
t.title
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2015
AND t.kind_id = 1 # movies only
AND idx.info_type_id = 100 # info_type is votes
AND idx.info > 400000) AS movs # at least 400,000 votes
RIGHT JOIN (SELECT a.person_id,
a.movie_id,
n.name
FROM cast_info AS a
JOIN aka_name AS n ON a.person_id = n.person_id
WHERE a.role_id = 2 # acresses only
AND a.person_id > 3900000) AS acts ON acts.movie_id = movs.id
LIMIT 0, 300;
id | title | person_id | movie_id | name |
---|---|---|---|---|
3900001 | 3355298 | Simonis, Heidi | ||
3900001 | 3509490 | Simonis, Heidi | ||
3900001 | 3739780 | Simonis, Heidi | ||
3900001 | 3943329 | Simonis, Heidi | ||
3900001 | 165963 | Simonis, Heidi | ||
3900001 | 268249 | Simonis, Heidi | ||
3900001 | 268370 | Simonis, Heidi | ||
3900001 | 268406 | Simonis, Heidi | ||
3900001 | 268475 | Simonis, Heidi | ||
3900001 | 279082 | Simonis, Heidi |
LEFT JOIN
With a LEFT JOIN
, there are 33 rows corresponding to the actresses in the seven top 2015 movies. Only The Revenant did not have any actresses whose person_id
is greater than 3900000.
SELECT * FROM
(SELECT t.id,
t.title
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2015
AND t.kind_id = 1 # movies only
AND idx.info_type_id = 100 # info_type is votes
AND idx.info > 400000) AS movs # at least 400,000 votes
LEFT JOIN (SELECT a.person_id,
a.movie_id,
n.name
FROM cast_info AS a
JOIN aka_name AS n ON a.person_id = n.person_id
WHERE a.role_id = 2 # acresses only
AND a.person_id > 3900000) AS acts ON acts.movie_id = movs.id
LIMIT 0, 300;
id | title | person_id | movie_id | name |
---|---|---|---|---|
3313672 | Avengers: Age of Ultron | 3916648 | 3313672 | Smulders, Jacoba Francisca Maria |
3752999 | Inside Out | 4122876 | 3752999 | Kuzniar, Lennon Wynn |
3787790 | Jurassic World | 3938423 | 3787790 | Norvell, Ingrid |
3787790 | Jurassic World | 3938423 | 3787790 | Norvell, Ingrid |
3787790 | Jurassic World | 3950111 | 3787790 | Sallaway, Seannon Jane |
3787790 | Jurassic World | 4079047 | 3787790 | Washington, Kelly Lynn |
3787790 | Jurassic World | 4079047 | 3787790 | Washington, Kelly Lynn |
3787790 | Jurassic World | 4084626 | 3787790 | Weeks, Jency |
3787790 | Jurassic World | 4084626 | 3787790 | Jenc |
3787790 | Jurassic World | 4084626 | 3787790 | Little J |
We might, for example, want to know how many names / spellings of a name with a specific person_id
(above 3900000) exist for each person_id
in each of the top voted seven films of 2015.
SELECT acts.person_id,
COUNT(*) AS num_repeat_names
FROM (SELECT t.id,
t.title
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2015
AND t.kind_id = 1 # movies only
AND idx.info_type_id = 100 # info_type is votes
AND idx.info > 400000) AS movs # at least 400,000 votes
JOIN (SELECT a.person_id,
a.movie_id,
n.name
FROM cast_info AS a
JOIN aka_name AS n ON a.person_id = n.person_id
WHERE a.role_id = 2 # acresses only
AND a.person_id > 3900000) AS acts ON acts.movie_id = movs.id
GROUP BY acts.person_id;
person_id | num_repeat_names |
---|---|
3916648 | 1 |
4122876 | 1 |
3938423 | 2 |
3950111 | 1 |
4079047 | 2 |
4084626 | 3 |
4099458 | 1 |
3958614 | 1 |
3990819 | 2 |
4081131 | 2 |
We might, for example, want to know how many actresses with a specific person_id
(above 3900000) are in each of the top voted seven films of 2015.
SELECT movs.id,
movs.title,
COUNT(*) AS num_actress
FROM (SELECT t.id,
t.title
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2015
AND t.kind_id = 1 # movies only
AND idx.info_type_id = 100 # info_type is votes
AND idx.info > 400000) AS movs # at least 400,000 votes
JOIN (SELECT a.person_id,
a.movie_id,
n.name
FROM cast_info AS a
JOIN aka_name AS n ON a.person_id = n.person_id
WHERE a.role_id = 2 # acresses only
AND a.person_id > 3900000) AS acts ON acts.movie_id = movs.id
GROUP BY movs.id;
id | title | num_actress |
---|---|---|
3313672 | Avengers: Age of Ultron | 1 |
3752999 | Inside Out | 1 |
3787790 | Jurassic World | 9 |
3915213 | Mad Max: Fury Road | 5 |
4260166 | Star Wars: Episode VII - The Force Awakens | 15 |
4389619 | The Martian | 1 |
UNION
ingIn SQL a UNION
clause combines two different tables by their rows (whereas JOIN
combines two tables by columns). Think about UNION
similarly to the bind_rows()
command in R.
UNION
sSilly example where the column names are ignored.
UNION
UNION
is specifically designed to bind rows from two different SELECT
queries where the variables have been selected in the same order.
Combine the top voted movies from 2015 with the top voted movies from 2019.
However, to account for time, we require the movies from 2015 to have more votes (400,000) than the movies from 2017 (200,000).
That is, the WHERE
clause is different for the two subqueries.
UNION
(SELECT t.title,
t.production_year,
idx.info AS num_votes
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2015
AND t.kind_id = 1
AND idx.info_type_id = 100
AND idx.info > 400000)
UNION
(SELECT t.title,
t.production_year,
idx.info AS num_votes
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2017
AND t.kind_id = 1
AND idx.info_type_id = 100
AND idx.info > 200000)
LIMIT 0, 100;
title | production_year | num_votes |
---|---|---|
Avengers: Age of Ultron | 2015 | 540606 |
Inside Out | 2015 | 443051 |
Jurassic World | 2015 | 471237 |
Mad Max: Fury Road | 2015 | 666484 |
Star Wars: Episode VII - The Force Awakens | 2015 | 691691 |
The Martian | 2015 | 583987 |
The Revenant | 2015 | 526189 |
Dunkirk | 2017 | 229089 |
Guardians of the Galaxy Vol. 2 | 2017 | 281845 |
Logan | 2017 | 397056 |
UNION ALL
If the goal is to include duplicates across two tables, use UNION ALL
instead of UNION
.
Let’s say that the first table is all movies with production year after 2012 and number of votes greater than 500,000. The second table is movies with production year equal to 2015 and number of votes greater than 400,000.
The Martian would be in both tables.
UNION ALL
With just UNION
(SELECT t.title,
t.production_year,
idx.info AS num_votes
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year > 2012
AND t.kind_id = 1
AND idx.info_type_id = 100
AND idx.info > 500000)
UNION
(SELECT t.title,
t.production_year,
idx.info AS num_votes
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2015
AND t.kind_id = 1
AND idx.info_type_id = 100
AND idx.info > 400000)
ORDER BY production_year DESC, num_votes;
title | production_year | num_votes |
---|---|---|
Batman v Superman: Dawn of Justice | 2016 | 500037 |
Deadpool | 2016 | 673887 |
Inside Out | 2015 | 443051 |
Jurassic World | 2015 | 471237 |
The Revenant | 2015 | 526189 |
Avengers: Age of Ultron | 2015 | 540606 |
The Martian | 2015 | 583987 |
Mad Max: Fury Road | 2015 | 666484 |
Star Wars: Episode VII - The Force Awakens | 2015 | 691691 |
Interstellar | 2014 | 1102826 |
UNION ALL
When UNION ALL
is applied, The Martian is listed twice in the results table.
(SELECT t.title,
t.production_year,
idx.info AS num_votes
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year > 2012
AND t.kind_id = 1
AND idx.info_type_id = 100
AND idx.info > 500000)
UNION ALL
(SELECT t.title,
t.production_year,
idx.info AS num_votes
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2015
AND t.kind_id = 1
AND idx.info_type_id = 100
AND idx.info > 400000)
ORDER BY production_year DESC, num_votes;
title | production_year | num_votes |
---|---|---|
Batman v Superman: Dawn of Justice | 2016 | 500037 |
Deadpool | 2016 | 673887 |
Inside Out | 2015 | 443051 |
Jurassic World | 2015 | 471237 |
The Revenant | 2015 | 526189 |
The Revenant | 2015 | 526189 |
Avengers: Age of Ultron | 2015 | 540606 |
Avengers: Age of Ultron | 2015 | 540606 |
The Martian | 2015 | 583987 |
The Martian | 2015 | 583987 |
It is always a good idea to terminate the SQL connection when you are done with it.