Step: Pivot
Overview
The Pivot step rotates data values into new variables using aggregation.
Example starting data:
Example output data:
Pivot on the test
variable.
Step structure
There is one pivot block where we will define the pivot variable and values.
There are one or more aggregation blocks where we will define how the table will reshape and what data populates the newly created variables.
There must always be at least one aggregate block.
The total number of new variables in your output table will be the number of Pivot values defined * The number of aggregation blocks.
Input field definitions
Pivot block:
Pivot variable
The variable containing the values that will become new variables.
Variables to collapse on
All variables you want to include in your output, except for variables defined elsewhere in the pivot operation (pivot variable
, variable to aggregate on
).
Pivot value
A value from your pivot variable
which will become a new variable in your output table. You can add multiple values. You do not need to include every value of this variable.
New variable name
An option to rename any of the new variables created in the pivot value
field. Leaving this field blank will persist the value name into the variable name.
Aggregation block:
Alias
A value that will be prefixed to your new variable names
. If you are aggregating one variable this field is optional, but if you are aggregating multiple then you'll need to specify a value for each one.
Aggregation method
How the variable to aggregate
will be summarized (e.g. SUM
, COUNT
).
Variable to aggregate
Which variable's data will populate the new variables you are creating in the pivot value
field.
Examples
Example 1: Basic pivot
Let's say we've recorded data about tests given and student scores and we want to pivot in order to see information sorted by student.
Starting data:
Input fields:
Pivot variable: Since we want to create new variables based on values in the
student
variable, we select that as our Pivot variable.Variables to collapse on: We want to include all variables here that will be the final shape of our pivoted table that are not already defined as our
pivot variable
orvariable to aggregate
. We includetest
anddate
in this example.Pivot value: We need to choose one or more values from our selected pivot variable (
student
) to become new variables in our output table. This does not have to include every value in this variable, even though we use all of them in this example.New variable name: We don't want to rename any of these new variables, so we can leave this blank.
Alias (optional): Since we only have one variable to aggregate, we can leave this blank.
Aggregation method: We need to choose how our variable to aggregate (
score
) will be aggregated. In this example data we don't have duplicated values so what we choose here won't matter. So we chooseSUM
to validate the process.Variable to aggregate: This is the variable (
score
) that will be removed from this table, and its values will be redistributed to our newly created variables (jane
,pat
, andsam
) with the aggregation method we selected handling any duplicate values (in this example,SUM
ming them).
Output data:
Example 2: Pivot with collapse
Continuing this example, let's say we want to average scores from the two quizzes. We can collapse the table by not including the date
variable. Without that variable there would be two identical values in test
(quiz
) which will be collapsed into one.
Starting table:
Input fields:
In this case it does matter what we select for Aggregation method
, since the duplicate values for quiz will get aggregated. We select Average
here.
Pivoted output table:
Example 3: Multiple aggregation variables
We can add another variable aggregation block to gather more information about the variables being aggregated. Continuing this example, let's say we not only want to average the quiz scores, but count the number of averaged quizzes.
Starting table:
Input fields:
Note that the Alias
field is no longer optional since we have two aggregation blocks.
We make sure to select score for our Variable to count
since we specifically want to know how many scores were included in our average. Since we have some null scores, selecting a different variable here might give us a different result.
Pivoted output table:
The jane_scoreavg
variable contains data from the scoreavg
aggregation block's method of Average
for the values of student Jane
, while the jane_scorecount
variable contains data from the scorecount
aggregation block.
Last updated