Step: Stack

Overview

The Stack step will add or remove rows from another table that align with your existing variable structure.

Example starting data:

Source (t0)             table1 (t1)
/*---------+--------*   /*---------+--------+
 | student | score  |    | student | score  |
 +---------+--------+    +---------+--------+
 | jane    | 83     |    | tom     | 83     |
 | kim     | 35     |    | sherri  | 92     |
 | sam     | 74     |    | pat     | 48     |
 | zay     | 62     |    | jade    | 87     |
 *---------+--------*/   *---------+--------*/

Example output data:

Union rows from Table B.

/*---------+--------*
 | student | score  |
 +---------+--------+
 | jane    | 83     |
 | kim     | 35     |
 | sam     | 74     |
 | pat     | 62     |
 | tom     | 83     |
 | sherri  | 92     |
 | pat     | 48     |
 | jade    | 87     |
 *---------+--------*/

Step structure

Field descriptions

FieldDescription

Stacked table

The table containing the rows you wish to add.

Stack type

How the rows will be added. (More information below.)

Variable alignment

How you will define the variable structure between the source and the Stacked table. All matching variables: Will automatically match variables and only include variables with a matching name, the rest will be dropped.

All variables: Will automatically match variables and keep all variables in the output. Values will be marked null where there is no match for an existing variable Manually match variables: Will require you to align variables between two tables.

Retype variables

Variables will be retyped in order to be aligned.

Only keep distinct rows

If any added rows are an exact duplicate of an existing row it will be dropped from the output.

Create variable for table name (Union only)

In order to record in the data where new rows came from you can opt to create a new variable with table information.

Stack types

The most common stack type is a Union, which can be particularly useful when combining data which has been broken up into multiple tables with the same structure.

Union

Appends all of the stacked table’s rows to the source table.

Except

Keep only those distinct rows in the source that match rows in the stacked table.

Intersect

Keep only those distinct rows that are in the source and stacked table.

Examples

Example 1: Basic union

Let's say I have data broken out into two tables with the same structure. I want to add all rows together as they are.

Starting data:

Source (t0)             table1 (t1)
/*---------+--------*   /*---------+--------+
 | student | score  |    | student | score  |
 +---------+--------+    +---------+--------+
 | jane    | 83     |    | tom     | 83     |
 | kim     | 35     |    | sherri  | 92     |
 | sam     | 74     |    | pat     | 48     |
 | zay     | 62     |    | jade    | 87     |
 *---------+--------*/   *---------+--------*/

Input fields:

  • Stacked table: My data is in table1 so we select that here

  • Stack type: We want to add all rows so we select Union here.

  • Variable alignment: Somehow our variable names shifted between tables so we can't automatically align tables. We choose Manually match variables here and then fill in the variable names under Source variable and Stacked table variable.

    • Where the variable names are different, the output will keep the name of the Source variable when it exists.

Output data:

/*---------+--------*
 | student | score  |
 +---------+--------+
 | jane    | 83     |
 | kim     | 35     |
 | sam     | 74     |
 | pat     | 62     |
 | tom     | 83     |
 | sherri  | 92     |
 | pat     | 48     |
 | jade    | 87     |
 *---------+--------*/   

Example 2: Bulk union and variable misalignment

To continue the previous example, let's say we have a third table of data, and it has additional information that was not present in the first two about state. We want to keep this information and also keep track of which rows came from which table.

Starting data:

Source (t0)             table1 (t1)             table2 (t2)
/*---------+--------*   /*---------+--------*   /*---------+--------+------------*
 | student | score  |    | student | score  |    | student | score  | date       |
 +---------+--------+    +---------+--------+    +---------+--------+------------+
 | jane    | 83     |    | tom     | 83     |    | barb    | 46     | 2020-01-01 |
 | kim     | 35     |    | sherri  | 92     |    | mitch   | 79     | 2020-01-01 |
 | sam     | 74     |    | pat     | 48     |    | oleg    | 68     | 2020-01-01 |
 | zay     | 62     |    | jade    | 87     |    | maria   | 85     | 2020-01-01 |
 *---------+--------*/   *---------+--------*/   *---------+--------+------------*/

Input fields:

  • We create one block for the table1 union, and a second block for the table2 union

  • Variable alignment: Since we want to keep all variables no matter if they have matches, we select All variables.

  • Create variable for table name: We check this box since we want to keep track of which table each row came from.

    • Note that Source table name is automatically populated with the name of this transform's source table.

Output data:

/*---------+--------+------------+---------------*
 | student | score  | date       | source_table  |
 +---------+--------+------------+---------------+
 | jane    | 83     | null       | cohort        |
 | kim     | 35     | null       | cohort        |
 | sam     | 74     | null       | cohort        |
 | zay     | 62     | null       | cohort        |
 | tom     | 83     | null       | table1        |
 | sherri  | 92     | null       | table1        |
 | pat     | 48     | null       | table1        |
 | jade    | 87     | null       | table1        |
 | barb    | 46     | 2020-01-01 | table2        |
 | mitch   | 79     | 2020-01-01 | table2        |
 | oleg    | 68     | 2020-01-01 | table2        |
 | maria   | 85     | 2020-01-01 | table2        |
 *---------+--------+------------+---------------*/  

Example 3: Except

Let's say we have a table with information about students and their test scores, and we have identified some students that have dropped the course and that we no longer need in the original table. We have gathered those students in table1.

Starting data:

Source (t0)             table1 (t1)
/*--------+-------*      /*------+-------*
 | name   | score |       | name | score |
 +--------+-------+       +------+-------+
 | jane   | 83    |       | kim  | 35    |
 | kim    | 35    |       | pat  | 48    |
 | sam    | 74    |       *------+-------*/
 | pat    | 62    | 
 | tom    | 83    |
 | sherri | 92    |
 *--------+-------*/       

Input fields:

Output data:

/*--------+-------*
 | name   | score |
 +--------+-------+
 | jane   | 83    |
 | sam    | 74    |
 | tom    | 83    |
 | sherri | 92    |
 *--------+-------*/       

The rows that were in both tables are now removed.

Example 4: Intersect

Let's say we have two tables full of students, one from who took our fall semester class, and one from the spring semester class. We want to only keep student rows for people who took both classes, so those who are present in both tables.

Source (t0)             table1 (t1)
/*--------+-------*      /*------+-------*
 | name   | id    |       | name | id    |
 +--------+-------+       +------+-------+
 | jane   | 101   |       | kim  | 104   |
 | kim    | 104   |       | pat  | 108   |
 | sam    | 105   |       | mae  | 109   |       
 | pat    | 108   |       | zay  | 110   | 
 | tom    | 112   |       | jade | 111   |
 | sherri | 117   |       | tom  | 112   |
 *--------+-------*/      *------+-------*/      

Input fields

Output data:

/*------+-------*
 | name | id    |
 +------+-------+
 | kim  | 104   |
 | pat  | 108   |
 | tom  | 112   |
 *------+-------*/     

Only rows that were in both tables now remain.

Last updated