Transform concepts
Last updated
Was this helpful?
Last updated
Was this helpful?
A transform is a series of steps detailing a data transformation that are executed in sequential order when the transform is run. These steps and a final variable selection create an output table that can be inspected and further transformed.
When running the transform, steps will be executed in order, building off of what directly came before them.
Steps have additional options in the ⋮
menu to help in the building process:
Collapse / expand
Annotate
Disable / enable
Reorder (this might change the outcome)
Transforms are iterative and are designed to easily check your progress as you work. As long as a transform is valid, you can press the Run
button to generate your output table. Click on this output table node in the workflow tree to see your results once the run is complete.
The transform interface will prevent you from running a transform that is invalid. This will be displayed with an invalid icon on the step or the part of a step where the issue is. Usually when you have an invalid alert it means some crucial information is missing and you can follow the invalid icons to find the locations you'll need to update.
Every time you run the transform, a snapshot of the transform at that point in time is saved. You can click on History to view the log of these runs, and click on any snapshot to revert the transform to that point in time.
Your transform will then be in the Edited state and no changes will be made to the output table. Once you run it, this configuration will become a new snapshot at the top of the transform history list.
Common things to check might be total number of rows and variables in your output table, distinct and null counts of a key variable, or even running a query on the Query tab of the table to filter by a particular value.
If you find the table isn't what you want, you can go back to the transform to make changes and rerun it. Transforms were created to be iterative!
By default transforms are named with a number based on the order they were created. We suggest renaming your transforms with descriptive names so you and your collaborators can quickly find them later on. You can rename a transform by clicking on the title of the node, or on the Rename option in the transform ⋮ More
menu.
If you have not specifically renamed the connected output table, renaming a transform will also rename this output table.
All transform nodes have one source table. You can join in multiple tables, but the source table designation will affect how the joins execute. You can change the source table from the ⋮ More
menu in the transform. Note that if the new source table has different variables your transform might become invalid until you update it.
All transforms can be split at the step level into two different transforms by clicking Split in any step's menu. Additionally, two transforms can be combined into one by right clicking on a table to Remove it.
You might want to split a transform above a tricky step to see what the output table would look like at that point in the process. This can be a key tool in troubleshooting any issues and understanding what might be going wrong.
After splitting a transform to check an output table, the next logical step might be to combine these two transforms back into one again. Or perhaps you have a string of transforms which you no longer need the output tables for and want to reduce the size of your workflow.
As you go, you might find yourself wanting to take certain data cleaning actions multiple times or move them around in your process. You can right click on any transform in the workflow tree to see options for copying it or deleting it. Right click on any table node to paste the copied transform. You can also insert transforms above other transforms.
Steps and parts of steps can also be copied and pasted within the same transform or across multiple transforms.
All transforms generate SQL, which will executed the transform is run. To view the code for a particular step, click on View SQL in the step menu. You can also see the SQL for the entire transform from within the ⋮ More
menu of the transform.
You can convert this code into a SQL query step and run it to achieve the same outcome, or edit the code directly.
Redivis uses SQL as the basis for data transformation for many reasons. It is designed for working quickly and efficiently with large tables, and is easy to learn concepts and basic executions, and it is highly reproducible.
While Python, R, Stata, and SAS are commonly used in data analysis, they are not suited to working with large tables. Executing queries on millions of records can take hours and in many cases fail. We have provided a point and click interface for using transforms to make using a perhaps unfamiliar coding language easier.
Tables are data structures containing rows and variables
Variables represent fields in the table that all rows contain values for
Rows represent an individual observation in table with values in across all variables
A SQL query will always operate on data from one or more tables and output a single data table.
When you are using the interface or SQL code, it is best to start the process by figuring out what you would like your output table to look like. What are the variables? What do you want each row of the table to represent?
Once you figure out what you want your output to look like, you can work backwards and formulate the steps of your query to build the shape you'd like.
The majority of building a transform is choosing and completing your data transformation steps. Steps represent one specific action (e.g. , ) and are completed in the Redivis transform interface. Every step is a fundamentally a piece of SQL code that can be examined and could be reproduced by running in any environment with the same data.
The final task in constructing a transform is to select which variables you want to have in your output table. You will use the in the bottom pane of the transform to choose which to keep.
Output tables don't just exist to store data, you can use the variable statistics generated in the to make sure your transform did what you expected it to do.