Step: Join

Overview

A Join step will combine data from two tables based on a join condition so it can be queried together.

Table A:                     Table B:
/*---------+----------*      /*---------+-------*
 | student | absences |       | student | score |
 +---------+----------+       +---------+-------+
 | jane    | 0        |       | jane    | 85    |
 | sam     | 6        |       | sam     | 64    |
 | pat     | 1        |       | pat     | 88    |
*----------+----------*/      *---------+-------*/

Example output data:

Inner join where student = student

/*---------+----------+-------*
 | student | absences | score |
 +---------+----------+-------+
 | jane    | 0        | 85    |
 | sam     | 6        | 64    |
 | pat     | 1        | 88    |
 *---------+----------*-------*/

Step structure

  • There will be at least one join block where you will define a table to join and a join condition for that table.

  • When multiple blocks exist, the tables will be joined in sequence.

Field descriptions

Field
Description

Join table

The table containing the data you wish to combine with this transform's data. Once selected, this table's variable will be marked t1 (or t2 etc. if other joins exist).

Join type

The way that the tables will be combined. (More information below.)

Source variable

The variable in your source table that you want to join on. All records in this variable will be matched against all records in the Joined table variable.

Joined table variable

The variable in your selected Join table that you want to join on. All records in this variable will be matched against all records in the Source variable.

Join types

Inner join

If a row in either the source or join table doesn’t have a match, it will be dropped. If a row is matched multiple times, it will be multiplied in the output.

Table A:                     Table B:
/*---------+----------*      /*---------+-------*
 | student | absences |       | student | score |
 +---------+----------+       +---------+-------+
 | jane    | 0        |       | jane    | 85    |
 | sam     | 6        |       | sam     | 64    |
 | pat     | 1        |       | pat     | 88    |
*----------+----------*/      *---------+-------*/

INNER JOIN on student

Output:
/*---------+----------+-------*
 | student | absences | score |
 +---------+----------+-------+
 | jane    | 0        | 85    |
 | sam     | 6        | 64    |
 | pat     | 1        | 88    |
 *---------+----------*-------*/

Left join

If a row in the source doesn’t have a match, all joined variables will be null for that row. If a row in the join table doesn’t have a match, it will be dropped. If a row is matched multiple times, it will be multiplied in the output.

Table A:                     Table B:
/*---------+----------*      /*---------+-------*
 | student | absences |       | student | score |
 +---------+----------+       +---------+-------+
 | jane    | 0        |       | jane    | 85    |
 | sam     | 6        |       | sam     | 64    |
 | pat     | 1        |       | pat     | 88    |
 | zay     | 2        |       | toni    | 30    |
*----------+----------*/      *---------+-------*/

LEFT JOIN on student

Output:
/*---------+----------+-------*
 | student | absences | score |
 +---------+----------+-------+
 | jane    | 0        | 85    |
 | sam     | 6        | 64    |
 | pat     | 1        | 88    |
 | zay     | 2        | NULL  |
 *---------+----------*-------*/

Right join

If a row in the source doesn’t have a match, it will be dropped. If a row in the join table doesn’t have a match, all source variables will be null for that row. If a row is matched multiple times, it will be multiplied in the output. This is the same as a left join, but reversed.

Table A:                     Table B:
/*---------+----------*      /*---------+-------*
 | student | absences |       | student | score |
 +---------+----------+       +---------+-------+
 | jane    | 0        |       | jane    | 85    |
 | sam     | 6        |       | sam     | 64    |
 | pat     | 1        |       | pat     | 88    |
 | zay     | 2        |       | toni    | 30    |
*----------+----------*/      *---------+-------*/

RIGHT JOIN on student

Output:
/*---------+----------+-------*
 | student | absences | score |
 +---------+----------+-------+
 | jane    | 0        | 85    |
 | sam     | 6        | 64    |
 | pat     | 1        | 88    |
 | toni    | NULL     | 30    |
 *---------+----------*-------*/

Full join

If a row in either the source or join table doesn’t have a match, all source variables will be null for that row, including for the column being joined upon. If a row is matched multiple times, it will be multiplied in the output.

Table A:                     Table B:
/*---------+----------*      /*---------+-------*
 | student | absences |       | student | score |
 +---------+----------+       +---------+-------+
 | jane    | 0        |       | jane    | 85    |
 | sam     | 6        |       | sam     | 64    |
 | pat     | 1        |       | pat     | 88    |
 | zay     | 2        |       | toni    | 30    |
*----------+----------*/      *---------+-------*/

FULL JOIN on student

Output:
/*-----------+-----------+----------+--------*
 | student_A | student_B | absences | score  |
 +-----------+-----------+----------+--------+
 | jane      | jane      | 0        | 85     |
 | sam       | sam       | 6        | 64     |
 | pat       | pat       | 1        | 88     |
 | zay       | NULL      | 2        | NULL   |
 | NULL      | toni      | NULL     | 30     |
 *-----------+-----------+----------+--------*/

Cross join

Every row in the source table will be combined with every row in the joined table. This might be used to perform a join on a new variable that will be created downstream (such as in a geospatial join). You will almost always need to use a filter after this join for the query to successfully execute.

Table A:                     Table B:
/*---------+----------*      /*---------+-------*
 | student | absences |       | student | score |
 +---------+----------+       +---------+-------+
 | jane    | 0        |       | jane    | 85    |
 | sam     | 6        |       | sam     | 64    |
*----------+----------*/      *---------+-------*/

CROSS JOIN

Output:
/*---------+----------+---------+---------*
 | student | absences | student | score   |
 +---------+----------+---------+---------+
 | jane    | 0        | jane    | 85      |
 | jane    | 0        | sam     | 64      |
 | sam     | 6        | jane    | 85      |
 | sam     | 6        | sam     | 64      |
 *---------+----------*---------+---------*/

Join structures

Depending on the structure of your data, joins might multiply in rows.

1-to-1 joins

If the variable you are joining on has no duplicate values in either the source table or the joined table, then your output table will always have the same or fewer rows than the sum of both table's rows added together.

Table A:                     Table B:
/*---------+----------*      /*---------+-------*
 | student | absences |       | student | score |
 +---------+----------+       +---------+-------+
 | jane    | 0        |       | jane    | 85    |
 | sam     | 6        |       | sam     | 64    |
 | pat     | 1        |       | pat     | 88    |
*----------+----------*/      *---------+-------*/

INNER JOIN on student

Output:
/*---------+----------+-------*
 | student | absences | score |
 +---------+----------+-------+
 | jane    | 0        | 85    |
 | sam     | 6        | 64    |
 | pat     | 1        | 88    |
 *---------+----------*-------*/

1-many joins

If the variable you are joining on has duplicates in either table, any time there is is a duplicate row it will be matched for each duplicate.

Table A:                     Table B:
/*---------+----------*      /*---------+-------*
 | student | absences |       | student | score |
 +---------+----------+       +---------+-------+
 | jane    | 0        |       | jane    | 85    |
 | sam     | 6        |       | sam     | 64    |
 | pat     | 1        |       | pat     | 88    |
*----------+----------*/      | pat     | 86    |
                              *---------+-------*/

INNER JOIN on student

Output:
/*---------+----------+-------*
 | student | absences | score |
 +---------+----------+-------+
 | jane    | 0        | 85    |
 | sam     | 6        | 64    |
 | pat     | 1        | 88    |
 | pat     | 1        | 86    |
 *---------+----------*-------*/

Many-to-many joins

If the variable you are joining on has duplicates in both tables, any time there is is a duplicate row it will be matched for each duplicate.

Table A:                     Table B:
/*---------+----------*      /*---------+-------*
 | student | absences |       | student | score |
 +---------+----------+       +---------+-------+
 | jane    | 0        |       | jane    | 85    |
 | sam     | 6        |       | sam     | 64    |
 | pat     | 1        |       | pat     | 88    |
 | pat     | 3        |       | pat     | 68    |
*----------+----------*/      *---------+-------*/

INNER JOIN on student

Output:
/*---------+----------+-------*
 | student | absences | score |
 +---------+----------+-------+
 | jane    | 0        | 85    |
 | sam     | 6        | 64    |
 | pat     | 1        | 88    |
 | pat     | 1        | 86    |
 | pat     | 3        |     |
 | pat     | 3        | 86    |
 *---------+----------*-------*/

Examples

Example 1: Simple join condition

Let's say our information about student absenses and test scores are in separate tables and we want to join them together.

Starting data:

Source table (t0)             Table 1 (t1)    
/*---------+----------*      /*---------+-------*
 | student | absences |       | student | score |
 +---------+----------+       +---------+-------+
 | jane    | 0        |       | jane    | 85    |
 | sam     | 6        |       | sam     | 64    |
 | pat     | 1        |       | pat     | 88    |
*----------+----------*/      *---------+-------*/

Input fields:

  • Join table: The data we want to join is in Table 1 so we select it here.

  • Join type: Since we only care about students that have both absence and score information we choose inner join which will drop any rows without a match in both tables.

  • Source variable / Joined table variable: We want to match on the variable student which is present in both tables. So we select our source table (t0)'s student on the left, and our joined table (t1)'s student variable on the right.

    • Since this variable has the same name in both tables, they will be combined into one variable in the output table.

Output data:

/*---------+----------+-------*
 | student | absences | score |
 +---------+----------+-------+
 | jane    | 0        | 85    |
 | sam     | 6        | 64    |
 | pat     | 1        | 88    |
 *---------+----------*-------*/

Example 2: More specific join condition

Let's say instead of a table with aggregated absences we have a daily timeliness chart. We want to join the scores for the corresponding attendance information for both the student and date in question.

Starting data:

Source table (t0)                        Table 1 (t1)    
/*---------+------------+---------*      /*---------+------------+-------*
 | student | date       | on_time |       | student | date       | score |
 +---------+------------+---------+       +---------+------------+-------+
 | jane    | 2020-01-01 | TRUE    |       | jane    | 2020-01-01 | 85    |
 | sam     | 2020-01-01 | FALSE   |       | sam     | 2020-01-01 | 65    |
 | jane    | 2020-02-01 | FALSE   |       *---------+------------+-------*/
*----------+------------+--------*/     

Input fields:

Join type: Since we want to keep our absence data whether a test was taken or not, we do a Left join.

Join condition: We join on fields where BOTH the values in the student and date variables are a match by selecting both in our join condition.

Output data:

/*---------+------------+---------+-------*
 | student | date       | on_time | score |
 +---------+------------+---------+-------+
 | jane    | 2020-01-01 | TRUE    | 85    |
 | sam     | 2020-01-01 | FALSE   | 65    |
 | jane    | 2020-02-01 | FALSE   | NULL  |
*----------+------------+---------+-------*/     

Example 3: Complex join conditions with cross joins

Starting data:

Source table (t0)                Table 1 (t1)    
/*------------+----------*      /*------------+-------*
 | student_id | absences |       | student_id | score |
 +------------+----------+       +------------+-------+
 | 01         | 0        |       | 01         | 85    |
 | 02         | 6        |       | 02         | 64    |
 | 03         | 1        |       | 03         | 88    |
*-------------+----------*/      *------------+-------*/

Input fields:

Output data:

/*------------+----------+-------*
 | student_id | absences | score |
 +------------+----------+-------+
 | jane       | 0        | 85    |
 | sam        | 6        | 64    |
 | pat        | 1        | 88    |
 *------------+----------*-------*/

Last updated

Was this helpful?