| Tables_in_imdb |
|---|
| aka_name |
| aka_title |
| cast_info |
| char_name |
| comp_cast_type |
| company_name |
| company_type |
| complete_cast |
| info_type |
| keyword |
2024-01-10
Consider a database of information from IMDb.

21 tables in the imdb database! Lots of details on movies, TV series, video games, and more. Today interest is in movies (and details within, like actors and ratings).
JOINsJOIN (aka INNER JOIN): include all of the rows that exist in both tablesLEFT JOIN: include all of the rows in the first table.RIGHT JOIN: include all of the rows in the second table.FULL OUTER JOIN: include all rows in either table. (The functionality doesn’t exist in MySQL but can be created using joins and UNION.)CROSS JOIN: match each row of the first table with each row in the second table.JOINsVenn diagrams of the different types of joins.

Figure 1: Venn diagrams describing different JOINs, image credit: phoenixNAP https://phoenixnap.com/kb/mysql-join
JOINsMini data tables of the different types of JOIN. (In SQL the missing values will be labeled as NULL (not NA).)

Figure 2: Mini data tables describing different JOINs, image credit: Statistics Globe blog, https://statisticsglobe.com/r-dplyr-join-inner-left-right-full-semi-anti
JOINsEach JOIN clause needs four specific pieces of information:
JOIN.JOIN being used.JOIN.Rock bands from the 60s.
# A tibble: 3 × 2
name band
<chr> <chr>
1 Mick Stones
2 John Beatles
3 Paul Beatles
# A tibble: 3 × 2
name plays
<chr> <chr>
1 John guitar
2 Paul bass
3 Keith guitar
The function sqldf() in the sqldf R package allows for SQL commands on R objects.
INNER JOINAn inner join combines two datasets returning only the observations that exist in both of the original datasets.
FULL JOINA full join combines two datasets returning every observation that exists in either one of the original datasets.
# A tibble: 4 × 3
name band plays
<chr> <chr> <chr>
1 Mick Stones <NA>
2 John Beatles guitar
3 Paul Beatles bass
4 Keith <NA> guitar
The full_join() function does not have an equivalent in MySQL. See notes for using JOINs and UNIONs to produce a full join.
LEFT JOINA left join combines two datasets returning every observation that exists in the left (or first) original dataset.
RIGHT JOINA right join combines two datasets returning every observation that exists in the right (or second) original dataset.
CROSS JOINA right join combines two datasets returning every observation that exists in the right (or second) original dataset.
name band name plays
1 Mick Stones John guitar
2 Mick Stones Paul bass
3 Mick Stones Keith guitar
4 John Beatles John guitar
5 John Beatles Paul bass
6 John Beatles Keith guitar
7 Paul Beatles John guitar
8 Paul Beatles Paul bass
9 Paul Beatles Keith guitar
JOIN IMDb title + votesIn the imdb database, the title table includes information about the 4,626,322 titles in the database, including the id, title, kind_id (indicator for the kind of ID it is), and production_year. It does not, however, include the review of the title.
| id | title | imdb_index | kind_id | production_year | imdb_id | phonetic_code | episode_of_id | season_nr | episode_nr | series_years | md5sum |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 78460 | Adults Recat to the Simpsons (30th Anniversary) | 7 | 2017 | A3432 | 78406 | 2ae09eed7d576cc2c24774fed5b18168 | |||||
| 70273 | (2016-05-18) | 7 | 2016 | 68058 | 511dfc14cfff7589d29a95abb30cd66a | ||||||
| 60105 | (2014-04-11) | 7 | 2014 | 59138 | c6cdce7e667e07713e431805c407feed | ||||||
| 32120 | (2008-05-01) | 7 | 2008 | 32060 | 100df65742caf5afd092b2e0ead67d8e | ||||||
| 97554 | Schmölders Traum | 7 | 2001 | S2543 | 97302 | 10 | 1 | 46862a2f96f9fb2d59e8c9a11ecfdd28 | |||
| 57966 | (#1.1) | 7 | 2013 | 57965 | 1 | 1 | 409c37703766c4b24f8a86162fd9cf85 | ||||
| 76391 | Anniversary | 7 | 1971 | A5162 | 76385 | 4 | 9 | 5e12ce73fac1d1dcf94136b6e9acd8f8 | |||
| 11952 | Angus Black/Lester Barrie/DC Curry | 7 | 2009 | A5214 | 11937 | 4 | 7 | 9c38b9e5601dc154444b73b518034aa1 | |||
| 1554 | New Orleans | 7 | 2003 | N6452 | 1508 | 2 | 11 | 621bea735740a547e862e4a3226f35d2 | |||
| 58442 | Kiss Me Kate | 7 | 2011 | K2523 | 58436 | 1 | 10 | 293e8c75c7f35a4035abf617962be5a9 |
JOIN IMDb title + votesThe movie_info_idx table does not contain much information about each particular film. It does, however, have an indicator for the movie ID (given by movie_id) as well as the number of votes (given by info where type_id = 100).
JOIN IMDb title + votesINNER JOIN)SELECT title.id,
title.title,
movie_info_idx.info
FROM title
JOIN movie_info_idx ON title.id = movie_info_idx.movie_id
WHERE title.production_year = 2015
AND title.kind_id = 1 # movies only
AND movie_info_idx.info_type_id = 100 # info_type is votes
AND movie_info_idx.info > 150000 # at least 150,000 votes
ORDER BY movie_info_idx.info DESC
LIMIT 0, 20;| id | title | info |
|---|---|---|
| 4260166 | Star Wars: Episode VII - The Force Awakens | 691691 |
| 3915213 | Mad Max: Fury Road | 666484 |
| 4389619 | The Martian | 583987 |
| 3313672 | Avengers: Age of Ultron | 540606 |
| 4414139 | The Revenant | 526189 |
| 3787790 | Jurassic World | 471237 |
| 3752999 | Inside Out | 443051 |
| 3292159 | Ant-Man | 390965 |
| 4364483 | The Hateful Eight | 363199 |
| 4251736 | Spectre | 319875 |
Some aspects of the query are worth pointing out:
* The variables in the output are given in the SELECT clause. The id and title (both from the title table) and the info from the movie_info_idx which represents the number of IMDb votes. * The variables are preceded by the table from which they came. While not always necessary, it is good practice so as to avoid confusion. * The JOIN happens by linking the id variable in the title table with the movie_id variable in the movie_info_idx table. * The LIMIT wasn’t necessary (there are only 12 observations), but it’s good practice so that we don’t end up with unwieldy query results. * The WHERE clause happens before the JOIN action, despite being written after. * In the WHERE clause, we keep only movies, only 2015 production year, and only at least 150,000 votes.
JOIN IMDb title + actresscast_info contains the person_id and the movie_id
| id | person_id | movie_id | person_role_id | note | nr_order | role_id |
|---|---|---|---|---|---|---|
| 1 | 1 | 3432997 | 1 | 31 | 1 | |
| 2 | 2 | 1901690 | 2 | 1 | ||
| 3 | 3 | 4027567 | 2 | 25 | 1 | |
| 4 | 3 | 4282876 | 3 | 22 | 1 | |
| 5 | 4 | 3542672 | 12 | 1 | ||
| 6 | 5 | 3331520 | 4 | (as $hutter Boy) | 10 | 1 |
| 7 | 5 | 4027191 | 2 | (as $hutter Boy) | 1 | 1 |
| 8 | 5 | 4195731 | 5 | (uncredited) | 1 | |
| 9 | 5 | 4263956 | 6 | (uncredited) | 1 | |
| 10 | 5 | 4267787 | 7 | (uncredited) | 1 |
person_role_id is 1 if actor and 2 if actress
JOIN IMDb title + actressaka_name contains person_id and name (of actor)
| id | person_id | name | imdb_index | name_pcode_cf | name_pcode_nf | surname_pcode | md5sum |
|---|---|---|---|---|---|---|---|
| 1 | 6188450 | Smith, Jessica Noel | S5325 | J2542 | S53 | 25c9d464e3ff2957533546aa92b397ed | |
| 2 | 5125059 | Pain, L. $ham | P545 | L515 | P5 | 569b1e885ccb51211c01753f0dad9b2c | |
| 3 | 5 | Boy, $hutter | B36 | H361 | B | 35092b5604ce378fc48c8a6fc0038a49 | |
| 4 | 4152053 | Dollasign, Ty | D4253 | T3425 | D425 | 0f565a2d8027cfb8ed6c5f4bba719fcd | |
| 5 | 4152053 | Sign, Ty Dolla | S2534 | T3425 | S25 | 2eded1b021b96333b4b74e0fec959650 | |
| 6 | 6 | Moore, Brandon | M6165 | B6535 | M6 | 193a6f5adf4756320f622162d2475608 | |
| 7 | 8 | $torm, Country | T6525 | C5363 | T65 | 1654400b707d34323ea392b87060e6cc | |
| 8 | 19 | ‘Hooper’, Simon P.J. Kelly | H1625 | S5124 | H16 | 3fd8885372c23f8c74e583da91d1fd05 | |
| 9 | 19 | Hooper | H16 | 24ddc68ab605ee95857ad45b65ffa2d8 | |||
| 10 | 19 | Kelly, Simon P.J. | K4251 | S5124 | K4 | 33d976f22e276b73c61513bc5f6e72a6 |
JOIN IMDb title + actresstitle contains id and title (of the movie)
| id | title | imdb_index | kind_id | production_year | imdb_id | phonetic_code | episode_of_id | season_nr | episode_nr | series_years | md5sum |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 78460 | Adults Recat to the Simpsons (30th Anniversary) | 7 | 2017 | A3432 | 78406 | 2ae09eed7d576cc2c24774fed5b18168 | |||||
| 70273 | (2016-05-18) | 7 | 2016 | 68058 | 511dfc14cfff7589d29a95abb30cd66a | ||||||
| 60105 | (2014-04-11) | 7 | 2014 | 59138 | c6cdce7e667e07713e431805c407feed | ||||||
| 32120 | (2008-05-01) | 7 | 2008 | 32060 | 100df65742caf5afd092b2e0ead67d8e | ||||||
| 97554 | Schmölders Traum | 7 | 2001 | S2543 | 97302 | 10 | 1 | 46862a2f96f9fb2d59e8c9a11ecfdd28 | |||
| 57966 | (#1.1) | 7 | 2013 | 57965 | 1 | 1 | 409c37703766c4b24f8a86162fd9cf85 | ||||
| 76391 | Anniversary | 7 | 1971 | A5162 | 76385 | 4 | 9 | 5e12ce73fac1d1dcf94136b6e9acd8f8 | |||
| 11952 | Angus Black/Lester Barrie/DC Curry | 7 | 2009 | A5214 | 11937 | 4 | 7 | 9c38b9e5601dc154444b73b518034aa1 | |||
| 1554 | New Orleans | 7 | 2003 | N6452 | 1508 | 2 | 11 | 621bea735740a547e862e4a3226f35d2 | |||
| 58442 | Kiss Me Kate | 7 | 2011 | K2523 | 58436 | 1 | 10 | 293e8c75c7f35a4035abf617962be5a9 |
JOIN IMDb title + actressGoal: identify the actresses in those movies with the highest number of votes.
SELECT t.title,
idx.info,
a.person_id,
n.name
FROM title AS t
JOIN movie_info_idx AS idx ON t.id = idx.movie_id
JOIN cast_info AS a ON idx.movie_id = a.movie_id
JOIN aka_name AS n ON a.person_id = n.person_id
WHERE t.production_year = 2015
AND t.kind_id = 1 # movies only
AND idx.info_type_id = 100 # info_type is votes
AND idx.info > 150000 # at least 150,000 votes
AND a.role_id = 2 # actresses only
ORDER BY a.person_id
LIMIT 0, 50;| title | info | person_id | name |
|---|---|---|---|
| Mission: Impossible - Rogue Nation | 266759 | 2674172 | Sofz |
| Mission: Impossible - Rogue Nation | 266759 | 2674172 | Sof |
| Mission: Impossible - Rogue Nation | 266759 | 2674172 | Sofz |
| Mission: Impossible - Rogue Nation | 266759 | 2674172 | Sof |
| Focus | 172680 | 2678594 | Labouisse, Kate |
| Mad Max: Fury Road | 666484 | 2681098 | Michelle, Debra |
| The Hunger Games: Mockingjay - Part 2 | 214569 | 2685496 | Harris, Ahnie |
| The Hunger Games: Mockingjay - Part 2 | 214569 | 2685496 | Harris, Jasmine |
| Creed | 183904 | 2686218 | Kareema, Kiana A. |
| Ant-Man | 390965 | 2687271 | Akana, Anna Kay |
Connecting the most popular movies of 2015 with the actresses in those movies requires a series of JOINs. Note that to make the code less onerous, the title table has been aliased by t, the movie_info_idx table has been aliased by idx, the cast_info table has been aliased by a, and the aka_name table has been aliased by n.
There is a lot of data cleaning to do as some of the person_id values are one to many!! That is, the person_id matches multiple names in the aka_name database.
JOINsLet’s look at RIGHT JOIN and LEFT JOIN using two new smaller tables.
The first has seven movies in it (from 2015 with at least 400,000 IMDb votes).
The second consists of almost 3 million actresses (person_role_id = 2). In order to find a subset of actresses, the person_id > 3900000 was set arbitrarily (in order to have a smaller group with which to work).
Using subqueries, we can JOIN the two datasets using different JOIN techniques.
SELECT t.id,
t.title,
idx.info,
(SELECT COUNT(*)
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2015
AND t.kind_id = 1
AND idx.info_type_id = 100
AND idx.info > 400000) AS row_count
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2015
AND t.kind_id = 1 # movies only
AND idx.info_type_id = 100 # info_type is votes
AND idx.info > 400000 # at least 400,000 votes
ORDER BY idx.info DESC;| id | title | info | row_count |
|---|---|---|---|
| 4260166 | Star Wars: Episode VII - The Force Awakens | 691691 | 7 |
| 3915213 | Mad Max: Fury Road | 666484 | 7 |
| 4389619 | The Martian | 583987 | 7 |
| 3313672 | Avengers: Age of Ultron | 540606 | 7 |
| 4414139 | The Revenant | 526189 | 7 |
| 3787790 | Jurassic World | 471237 | 7 |
| 3752999 | Inside Out | 443051 | 7 |
SELECT a.person_id,
a.movie_id,
n.name,
(SELECT COUNT(*)
FROM cast_info AS a
JOIN aka_name AS n ON a.person_id = n.person_id
WHERE a.person_role_id = 2
AND a.person_id > 390000) AS row_count
FROM cast_info AS a
JOIN aka_name AS n ON a.person_id = n.person_id
WHERE a.person_role_id = 2
AND a.person_id > 3900000
LIMIT 0, 20;| person_id | movie_id | name | row_count |
|---|---|---|---|
| 3900141 | 759802 | Simons, Rita Joanne | 2904759 |
| 3902258 | 4365829 | Singer, Rabbi Tovia | 2904759 |
| 3902699 | 3109788 | Singh, Sabine Erika | 2904759 |
| 3903035 | 3215866 | Val | 2904759 |
| 3904831 | 2468067 | Masha | 2904759 |
| 3904928 | 3654347 | Fei, Siu Yin | 2904759 |
| 3904928 | 3654347 | Hsiao, Yen-fei | 2904759 |
| 3904928 | 3654347 | Siu, Yinfei | 2904759 |
| 3904928 | 3654347 | Xiao, Yanfei | 2904759 |
| 3904928 | 3654347 | Yin-Fai, Siu | 2904759 |
JOINWith an inner JOIN, there are 32 rows corresponding to all the actresses in the seven 2015 films with the most votes. Because the JOIN is an intersection of the two tables, only the actresses with person_id above 3900000 are included.
SELECT * FROM
(SELECT t.id,
t.title
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2015
AND t.kind_id = 1 # movies only
AND idx.info_type_id = 100 # info_type is votes
AND idx.info > 400000) AS movs # at least 400,000 votes
INNER JOIN (SELECT a.person_id,
a.movie_id,
n.name
FROM cast_info AS a
JOIN aka_name AS n ON a.person_id = n.person_id
WHERE a.role_id = 2 # acresses only
AND a.person_id > 3900000) AS acts ON acts.movie_id = movs.id
LIMIT 0, 300;| id | title | person_id | movie_id | name |
|---|---|---|---|---|
| 3313672 | Avengers: Age of Ultron | 3916648 | 3313672 | Smulders, Jacoba Francisca Maria |
| 3752999 | Inside Out | 4122876 | 3752999 | Kuzniar, Lennon Wynn |
| 3787790 | Jurassic World | 3938423 | 3787790 | Norvell, Ingrid |
| 3787790 | Jurassic World | 3938423 | 3787790 | Norvell, Ingrid |
| 3787790 | Jurassic World | 3950111 | 3787790 | Sallaway, Seannon Jane |
| 3787790 | Jurassic World | 4079047 | 3787790 | Washington, Kelly Lynn |
| 3787790 | Jurassic World | 4079047 | 3787790 | Washington, Kelly Lynn |
| 3787790 | Jurassic World | 4084626 | 3787790 | Weeks, Jency |
| 3787790 | Jurassic World | 4084626 | 3787790 | Jenc |
| 3787790 | Jurassic World | 4084626 | 3787790 | Little J |
RIGHT JOINWith a RIGHT JOIN, there are more than 300 rows (the LIMIT clause keeps us from knowing how many rows, but there are a LOT!) corresponding to all the actresses whose person_id above 3900000 are included. Those actresses who acted in one of the seven top 2015 films are also included in the full results table, but they don’t happen to be in the truncated output here.
SELECT * FROM
(SELECT t.id,
t.title
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2015
AND t.kind_id = 1 # movies only
AND idx.info_type_id = 100 # info_type is votes
AND idx.info > 400000) AS movs # at least 400,000 votes
RIGHT JOIN (SELECT a.person_id,
a.movie_id,
n.name
FROM cast_info AS a
JOIN aka_name AS n ON a.person_id = n.person_id
WHERE a.role_id = 2 # acresses only
AND a.person_id > 3900000) AS acts ON acts.movie_id = movs.id
LIMIT 0, 300;| id | title | person_id | movie_id | name |
|---|---|---|---|---|
| 3900001 | 3355298 | Simonis, Heidi | ||
| 3900001 | 3509490 | Simonis, Heidi | ||
| 3900001 | 3739780 | Simonis, Heidi | ||
| 3900001 | 3943329 | Simonis, Heidi | ||
| 3900001 | 165963 | Simonis, Heidi | ||
| 3900001 | 268249 | Simonis, Heidi | ||
| 3900001 | 268370 | Simonis, Heidi | ||
| 3900001 | 268406 | Simonis, Heidi | ||
| 3900001 | 268475 | Simonis, Heidi | ||
| 3900001 | 279082 | Simonis, Heidi |
LEFT JOINWith a LEFT JOIN, there are 33 rows corresponding to the actresses in the seven top 2015 movies. Only The Revenant did not have any actresses whose person_id is greater than 3900000.
SELECT * FROM
(SELECT t.id,
t.title
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2015
AND t.kind_id = 1 # movies only
AND idx.info_type_id = 100 # info_type is votes
AND idx.info > 400000) AS movs # at least 400,000 votes
LEFT JOIN (SELECT a.person_id,
a.movie_id,
n.name
FROM cast_info AS a
JOIN aka_name AS n ON a.person_id = n.person_id
WHERE a.role_id = 2 # acresses only
AND a.person_id > 3900000) AS acts ON acts.movie_id = movs.id
LIMIT 0, 300;| id | title | person_id | movie_id | name |
|---|---|---|---|---|
| 3313672 | Avengers: Age of Ultron | 3916648 | 3313672 | Smulders, Jacoba Francisca Maria |
| 3752999 | Inside Out | 4122876 | 3752999 | Kuzniar, Lennon Wynn |
| 3787790 | Jurassic World | 3938423 | 3787790 | Norvell, Ingrid |
| 3787790 | Jurassic World | 3938423 | 3787790 | Norvell, Ingrid |
| 3787790 | Jurassic World | 3950111 | 3787790 | Sallaway, Seannon Jane |
| 3787790 | Jurassic World | 4079047 | 3787790 | Washington, Kelly Lynn |
| 3787790 | Jurassic World | 4079047 | 3787790 | Washington, Kelly Lynn |
| 3787790 | Jurassic World | 4084626 | 3787790 | Weeks, Jency |
| 3787790 | Jurassic World | 4084626 | 3787790 | Jenc |
| 3787790 | Jurassic World | 4084626 | 3787790 | Little J |
We might, for example, want to know how many names / spellings of a name with a specific person_id (above 3900000) exist for each person_id in each of the top voted seven films of 2015.
SELECT acts.person_id,
COUNT(*) AS num_repeat_names
FROM (SELECT t.id,
t.title
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2015
AND t.kind_id = 1 # movies only
AND idx.info_type_id = 100 # info_type is votes
AND idx.info > 400000) AS movs # at least 400,000 votes
JOIN (SELECT a.person_id,
a.movie_id,
n.name
FROM cast_info AS a
JOIN aka_name AS n ON a.person_id = n.person_id
WHERE a.role_id = 2 # acresses only
AND a.person_id > 3900000) AS acts ON acts.movie_id = movs.id
GROUP BY acts.person_id;| person_id | num_repeat_names |
|---|---|
| 3916648 | 1 |
| 4122876 | 1 |
| 3938423 | 2 |
| 3950111 | 1 |
| 4079047 | 2 |
| 4084626 | 3 |
| 4099458 | 1 |
| 3958614 | 1 |
| 3990819 | 2 |
| 4081131 | 2 |
We might, for example, want to know how many actresses with a specific person_id (above 3900000) are in each of the top voted seven films of 2015.
SELECT movs.id,
movs.title,
COUNT(*) AS num_actress
FROM (SELECT t.id,
t.title
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2015
AND t.kind_id = 1 # movies only
AND idx.info_type_id = 100 # info_type is votes
AND idx.info > 400000) AS movs # at least 400,000 votes
JOIN (SELECT a.person_id,
a.movie_id,
n.name
FROM cast_info AS a
JOIN aka_name AS n ON a.person_id = n.person_id
WHERE a.role_id = 2 # acresses only
AND a.person_id > 3900000) AS acts ON acts.movie_id = movs.id
GROUP BY movs.id;| id | title | num_actress |
|---|---|---|
| 3313672 | Avengers: Age of Ultron | 1 |
| 3752999 | Inside Out | 1 |
| 3787790 | Jurassic World | 9 |
| 3915213 | Mad Max: Fury Road | 5 |
| 4260166 | Star Wars: Episode VII - The Force Awakens | 15 |
| 4389619 | The Martian | 1 |
UNIONingIn SQL a UNION clause combines two different tables by their rows (whereas JOIN combines two tables by columns). Think about UNION similarly to the bind_rows() command in R.

Figure 3: UNION binds rows while JOIN appends columns, image credit: Jane Williams https://blog.devart.com/mysql-union-tutorial-html.html
UNIONsSilly example where the column names are ignored.
UNIONUNION is specifically designed to bind rows from two different SELECT queries where the variables have been selected in the same order.
Combine the top voted movies from 2015 with the top voted movies from 2019.
However, to account for time, we require the movies from 2015 to have more votes (400,000) than the movies from 2017 (200,000).
That is, the WHERE clause is different for the two subqueries.
UNION(SELECT t.title,
t.production_year,
idx.info AS num_votes
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2015
AND t.kind_id = 1
AND idx.info_type_id = 100
AND idx.info > 400000)
UNION
(SELECT t.title,
t.production_year,
idx.info AS num_votes
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2017
AND t.kind_id = 1
AND idx.info_type_id = 100
AND idx.info > 200000)
LIMIT 0, 100;| title | production_year | num_votes |
|---|---|---|
| Avengers: Age of Ultron | 2015 | 540606 |
| Inside Out | 2015 | 443051 |
| Jurassic World | 2015 | 471237 |
| Mad Max: Fury Road | 2015 | 666484 |
| Star Wars: Episode VII - The Force Awakens | 2015 | 691691 |
| The Martian | 2015 | 583987 |
| The Revenant | 2015 | 526189 |
| Dunkirk | 2017 | 229089 |
| Guardians of the Galaxy Vol. 2 | 2017 | 281845 |
| Logan | 2017 | 397056 |
UNION ALLIf the goal is to include duplicates across two tables, use UNION ALL instead of UNION.
Let’s say that the first table is all movies with production year after 2012 and number of votes greater than 500,000. The second table is movies with production year equal to 2015 and number of votes greater than 400,000.
The Martian would be in both tables.
UNION ALLWith just UNION
(SELECT t.title,
t.production_year,
idx.info AS num_votes
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year > 2012
AND t.kind_id = 1
AND idx.info_type_id = 100
AND idx.info > 500000)
UNION
(SELECT t.title,
t.production_year,
idx.info AS num_votes
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2015
AND t.kind_id = 1
AND idx.info_type_id = 100
AND idx.info > 400000)
ORDER BY production_year DESC, num_votes;| title | production_year | num_votes |
|---|---|---|
| Batman v Superman: Dawn of Justice | 2016 | 500037 |
| Deadpool | 2016 | 673887 |
| Inside Out | 2015 | 443051 |
| Jurassic World | 2015 | 471237 |
| The Revenant | 2015 | 526189 |
| Avengers: Age of Ultron | 2015 | 540606 |
| The Martian | 2015 | 583987 |
| Mad Max: Fury Road | 2015 | 666484 |
| Star Wars: Episode VII - The Force Awakens | 2015 | 691691 |
| Interstellar | 2014 | 1102826 |
UNION ALLWhen UNION ALL is applied, The Martian is listed twice in the results table.
(SELECT t.title,
t.production_year,
idx.info AS num_votes
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year > 2012
AND t.kind_id = 1
AND idx.info_type_id = 100
AND idx.info > 500000)
UNION ALL
(SELECT t.title,
t.production_year,
idx.info AS num_votes
FROM title AS t
JOIN movie_info_idx AS idx ON idx.movie_id = t.id
WHERE t.production_year = 2015
AND t.kind_id = 1
AND idx.info_type_id = 100
AND idx.info > 400000)
ORDER BY production_year DESC, num_votes;| title | production_year | num_votes |
|---|---|---|
| Batman v Superman: Dawn of Justice | 2016 | 500037 |
| Deadpool | 2016 | 673887 |
| Inside Out | 2015 | 443051 |
| Jurassic World | 2015 | 471237 |
| The Revenant | 2015 | 526189 |
| The Revenant | 2015 | 526189 |
| Avengers: Age of Ultron | 2015 | 540606 |
| Avengers: Age of Ultron | 2015 | 540606 |
| The Martian | 2015 | 583987 |
| The Martian | 2015 | 583987 |
It is always a good idea to terminate the SQL connection when you are done with it.