Step: Aggregate
Overview
The Aggregate step collapses rows that are identical across a set of variables, optionally creating new aggregate variables in the process.
Example starting data:
Example output data
Collapsing on variables test
and date
, and creating new variable average_score
to aggregate data from the score
variable.
Step structure
There is one collapse block where we will define how the data will be reshaped. On execution:
Your data will be cut to only include the variables chosen in this block.
Duplicates records across the collapsed variables will be dropped.
There can be one or more aggregation blocks where we can capture aggregate information in a newly created variable.
You can successfully collapse your table without creating any new variables in aggregation blocks.
Each aggregation block in the step represents one new variable in your output data.
Aggregation blocks how you can capture information about records dropped in your collapse block.
Field definitions
Collapse block:
Field | Description |
---|---|
Variables to collapse on | All variables you want to include in your output. |
Aggregation block(s):
Field | Description |
---|---|
Name | The name of the new variable being created. |
Aggregation method | How the new variable will be summarized (e.g. |
[Method fields] | After you select your aggregation method, you will be prompted to input the information your chosen method needs to execute. You can see more specifics in the Variable creation methods section. |
Examples
Example 1: Basic collapse
We have test score data recorded per test, student, and date. However we want to know the average score on each test overall.
Starting data:
Input fields:
Variables to collapse on: Of the variables in our table, the only ones we want in our final output are
test
anddate
, so we select these here. We leave outstudent
because that information doesn't matter to us anymore, and we leave outscore
because we are creating a new variable to replace it with aggregated information.Name: We give our new variable a descriptive name
average_score
.Aggregation method: We want to average all values in the
grades
variable per test, so we chooseAverage
.Variable to aggregate: Here is where we choose
score
as the variable containing the data we want toAverage
. If we had chosen a different Aggregation method, we might have different input fields here to answer.
Execution:
All variables are removed that aren't collapsed on or used in aggregation.
The Average
is created of the score
where test
and date
(our collapsed on variables) have the exact same values.
The score
variable is removed, since it was not collapsed on.
Then all exact duplicate records are dropped, to create the output.
Output data:
Example 2: Multiple aggregation variables
We can build on our previous example, but say we want to know additionally how many students took each test.
Starting data:
Input fields:
2nd aggregation block
Name: We want to give the new variable a descriptive name unique from our other block (and other variables in our table), in this case
test_count
.Aggregation method: We want to count all values in the
test
variable per date, so we chooseCount
.Variable to count: We choose
test
as the variable containing the data we want toCount
. Since none of our variables have null entries, we could choose any variable here and get the same result. If we did have nulls, they would not be included in the Count. Conceptually we do not want to only include distinct values in our count so we leave that off
Output data:
Example 3: Drop duplicates
This step can be used to drop duplicated records even in cases where no aggregation happens.
Let's say we have data that we know had duplicate records that we don't need.
Starting data:
Input fields:
Variables to collapse on: To drop all records that are an exact duplicate across all variables, we just need the collapse block with no aggregation blocks. We need to select all variables here and can do so by typing all of them out, or inputting *
.
Output data:
Last updated