Step: Join
Overview
A Join step will combine data from two tables based on a join condition so it can be queried together.
Example output data:
Inner join where student
= student
Step structure
There will be at least one join block where you will define a table to join and a join condition for that table.
When multiple blocks exist, the tables will be joined in sequence.
Field descriptions
Join table
The table containing the data you wish to combine with this transform's data. Once selected, this table's variable will be marked t1 (or t2 etc. if other joins exist).
Join type
The way that the tables will be combined. (More information below.)
Source variable
The variable in your source table that you want to join on. All records in this variable will be matched against all records in the Joined table variable.
Joined table variable
The variable in your selected Join table that you want to join on. All records in this variable will be matched against all records in the Source variable.
Join types
Inner join
If a row in either the source or join table doesn’t have a match, it will be dropped. If a row is matched multiple times, it will be multiplied in the output.
Left join
If a row in the source doesn’t have a match, all joined variables will be null for that row. If a row in the join table doesn’t have a match, it will be dropped. If a row is matched multiple times, it will be multiplied in the output.
Right join
If a row in the source doesn’t have a match, it will be dropped. If a row in the join table doesn’t have a match, all source variables will be null for that row. If a row is matched multiple times, it will be multiplied in the output. This is the same as a left join, but reversed.
Full join
If a row in either the source or join table doesn’t have a match, all source variables will be null for that row. If a row is matched multiple times, it will be multiplied in the output.
Cross join
Every row in the source table will be combined with every row in the joined table. This might be used to perform a join on a new variable that will be created downstream (such as in a geospatial join). You will almost always need to use a filter after this join for the query to successfully execute.
It is strongly recommended to use a Filter step with a cross join to avoid a massively expanded table.
Join structures
Depending on the structure of your data, joins might multiply in rows.
1-to-1 joins
If the variable you are joining on has no duplicate values in either the source table or the joined table, then your output table will always have the same or fewer rows than the sum of both table's rows added together.
1-many joins
If the variable you are joining on has duplicates in either table, any time there is is a duplicate row it will be matched for each duplicate.
Many-to-many joins
If the variable you are joining on has duplicates in both tables, any time there is is a duplicate row it will be matched for each duplicate.
Examples
Example 1: Simple join condition
Let's say our information about student absenses and test scores are in separate tables and we want to join them together.
Starting data:
Input fields:
Join table: The data we want to join is in
Table 1
so we select it here.Join type: Since we only care about students that have both absence and score information we choose
inner join
which will drop any rows without a match in both tables.Source variable / Joined table variable: We want to match on the variable
student
which is present in both tables. So we select our source table (t0)'sstudent
on the left, and our joined table (t1)'sstudent
variable on the right.Since this variable has the same name in both tables, they will be combined into one variable in the output table.
Output data:
Example 2: More specific join condition
Let's say instead of a table with aggregated absences we have a daily timeliness chart. We want to join the scores for the corresponding attendance information for both the student and date in question.
Starting data:
Input fields:
Join type: Since we want to keep our absence data whether a test was taken or not, we do a Left
join.
Join condition: We join on fields where BOTH the values in the student
and date
variables are a match by selecting both in our join condition.
Output data:
You can use a SQL query step to build more complex join conditions
Example 3: Complex join conditions with cross joins
In some cases you might need to do a more complex join, such as using an operator other than =
, or joining on a condition that uses a variable from your joined table that needs to be retyped or created first. This is common in geospatial joins but might come up in any situation.
To use a more complex join condition you can use a cross join, followed by filter. The cross join will make all possible joins between these two tables and then the filter will limit it to rows matching your condition. Effectively this will execute the same as if you had done your initial join with a more complex condition.
Let's say in the example below we want to do an inner join where T0 student_id
= T1 student_id
but in the source table student_id
is an integer type variable and in Table 1 it is a string. We want to retype the variable in Table 1 before we can join on it, but we can't retype it until we have joined in Table 1. So we will do a cross join, a retype, then a filter to narrow down to only records meeting our condition.
Starting data:
Input fields:
Output data:
Last updated