Workflow concepts
You can use workflows on Redivis to analyze any type of data, at any scale. Workflows allow you to organize your work into discrete steps, where you can easily validate your results and develop well-documented, reproducible analysis artifacts.
To create a workflow, navigate to a dataset that you're interested in working with and press the Analyze data in a workflow button. If you do not have data access to this dataset you may need to Apply for access first.
You can also create a workflow from the "Workflows" tab of your workspace. Once you've created your workflow, you'll be able to add any dataset that you have access to.
In a Redivis workflow, the steps you've taken are represented visually on the left side of the screen in a tree layout, where you can build nodes and easily see their relationships. Each node type can take different actions. The workflow tree automatically expands and creates a layout of all the nodes in your workflow, keeping everything organized as it grows.
Clicking on a node will allow you to see its information and/or execute changes. Your selected node will have a purple border in the tree and the right panel will update to show that node's contents. You can expand or collapse this right panel by dragging the center pane side to side.
Most nodes in a workflow will be connected to others, and these connections are drawn as lines on the workflow tree. Thicker lines usually indicate a source node or a harder connection, while thin lines will show a join or more casual linkage.
When you click on a node to select it, the lines connecting upstream and downstream nodes will turn purple. For larger workflows this can make it easier to trace where the data in your selected node came from, or see what its eventual outputs are.
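As a rough illustration of what tracing upstream and downstream means, the workflow tree can be thought of as a directed graph. The sketch below is not Redivis code: the node names and the `EDGES` list are hypothetical, and it only shows how the set of highlighted nodes could be derived for a selected node.

```python
from collections import defaultdict

# Hypothetical workflow: each edge points from an upstream node to a downstream node.
EDGES = [
    ("dataset", "table_a"),
    ("table_a", "transform_1"),
    ("transform_1", "output_1"),
    ("output_1", "notebook_1"),
    ("dataset", "table_b"),
    ("table_b", "transform_1"),  # table_b is joined into transform_1
]


def build_maps(edges):
    """Return (children, parents) adjacency maps for a list of edges."""
    children, parents = defaultdict(set), defaultdict(set)
    for up, down in edges:
        children[up].add(down)
        parents[down].add(up)
    return children, parents


def reachable(start, neighbors):
    """Collect every node reachable from start by following neighbors."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


children, parents = build_maps(EDGES)
upstream = reachable("transform_1", parents)     # where the data came from
downstream = reachable("transform_1", children)  # the eventual outputs
```

Here `upstream` contains the dataset and both tables feeding `transform_1`, while `downstream` contains its output table and the notebook that reads it.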
Nodes that have a direct source will have a large solid line connecting them.
Data source nodes represent datasets or workflows that have been added to your workflow. Each one is a copy of a data source elsewhere on Redivis.
Table nodes are either tables within a dataset, or the resulting output table of an upstream computation.
Transform nodes are queries that are used to reshape data by creating new tables.
Notebook nodes are code blocks (and their outputs) which are used to analyze data.
The main way to build your workflow is to add and edit nodes. You will start by adding data to your workflow, then add ways to cut, reshape, and analyze it.
You can click the Add data button in the top left corner of the workflow toolbar to select a dataset or another workflow you want to add to this workflow. This will add a copy of the selected data source to the workflow and allow you to reference its tables.
Each data source can only be added to a workflow one time. You can add multiple data sources in bulk from this modal. By default all datasets are added at the current version but you can right click on the dataset in this modal to select a different version to add.
Add data editing nodes
All data cutting, reshaping, and analysis on Redivis happens in either a transform or a notebook node. These nodes must be attached to a source table, so they can only be created after you've added a dataset.
To create a work node, click on a table and select either the transform or notebook button that appears beneath it. If the table already has a downstream node you can press the plus icon beneath it instead.
There are two mechanisms for working with data in workflows: transforms and notebooks. Understanding when to use each tool is key to taking full advantage of the capabilities of Redivis, particularly when working with big datasets.
Transforms are better for:
Reshaping + combining tabular and geospatial data
Working with large tables, especially at the many GB to TB scale
Preference for a no-code interface, or preference for programming in SQL
Declarative, easily documented data operations
Notebooks are better for:
Interactive exploration of any data type, including unstructured data files
Working with smaller tables (though working with bigger data is possible)
Preference for Python, R, Stata, or SAS
Interactive visualizations and figure generation
You can right click on any transform or notebook node in the workflow tree to copy it. Right click on any table node to paste the copied transform or notebook. The table you pasted it onto will become this node's source table. You may need to update the node's configuration to match its new location for it to work properly.
If you would like to place a node between other nodes you can click on a transform or notebook and select Insert transform.
If you have a transform copied to the clipboard you can insert it between other nodes by right clicking on a transform or notebook and selecting Paste copied transform above. This will insert both the transform and its output table into the branch of the workflow you've selected.
All transforms can be split at the step level into two different transforms by clicking Split in any step's menu. Additionally, two transforms can be combined into one by right clicking on a table to Remove it.
You might want to split a transform above a tricky step to see what the output table would look like at that point in the process. This can be a key tool in troubleshooting any issues and understanding what might be going wrong.
After splitting a transform to check an output table, the next logical step might be to combine these two transforms back into one again. Or perhaps you have a string of transforms which you no longer need the output tables for and want to reduce the size of your workflow.
Only nodes that you have intentionally created can be deleted or removed from the workflow. Output tables are created by running code and can't be removed.
To delete a node, right click on the node and select Delete.
When deleting a transform or notebook:
The transform or notebook and its output table will be deleted.
If the workflow tree has additional nodes downstream, the transform or notebook and its output table will be 'spliced' out, i.e. the upstream node nearest the deleted transform will be connected to the downstream node nearest the deleted output table. The next downstream transform will then receive new input variables from the node that is now directly upstream. This will probably make it stale, and possibly invalid, since variables it previously referenced might no longer exist.
When deleting a dataset:
The dataset and all downstream nodes will be deleted. If additional branches are joined into the branch downstream of the deleted dataset, those branches will be retained up to but not including the transform located in the deleted branch.
Since you can't undo a deletion, you'll receive a warning message before proceeding.
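The 'splicing' behavior described above for deleted transforms and notebooks can be sketched as a small graph operation. This is a minimal illustration, not Redivis's implementation: the `splice_out` function and node names are hypothetical, and it assumes, for simplicity, that every source of the deleted node is reconnected to every node that read its output table.

```python
def splice_out(edges, work_node, output_table):
    """Remove a transform or notebook and its output table from an edge list.

    edges is a list of (upstream, downstream) pairs. The nodes upstream of
    work_node are reconnected to the nodes downstream of output_table.
    """
    removed = {work_node, output_table}
    sources = [up for up, down in edges if down == work_node]
    targets = [down for up, down in edges if up == output_table]
    # Keep every edge that does not touch a removed node.
    kept = [(up, down) for up, down in edges
            if up not in removed and down not in removed]
    # Simplifying assumption: connect each source to each downstream target.
    bridged = [(src, dst) for src in sources for dst in targets]
    return kept + bridged


# Hypothetical linear branch: table_a -> transform_1 -> output_1 -> transform_2 -> output_2
edges = [
    ("table_a", "transform_1"),
    ("transform_1", "output_1"),
    ("output_1", "transform_2"),
    ("transform_2", "output_2"),
]
spliced = splice_out(edges, "transform_1", "output_1")
```

After the splice, `transform_2` reads directly from `table_a`, which is why it may reference variables that no longer exist and need to be updated before it can run.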
As you make changes in a workflow you will change the status of different nodes connected to it. These changes in status are shown in the left panel of the workflow to help you keep track of any changes.
To view and update additional information about your workflow, click its name in the middle of the black menu bar or click on the white space in the background of the workflow tree. This will bring up the workflow overview where you can edit this workflow.
Rename
Rename your workflow by clicking the header at the top of the workflow information view.
Abstract
You can add a short description of your workflow here that will be visible in workflow previews across Redivis.
Methodology
Document the details of your research aim or data manipulation strategy.
Visibility / Access
You can give other users access to view or edit your workflow, or transfer ownership to another user in the Sharing section. You can also set this workflow's visibility and discoverability. Anyone viewing your workflow will need to have gained data access to any restricted datasets to view the relevant node contents.
Study
You can add your workflow to a study in order to facilitate collaboration with others. For certain restricted datasets, your workflow will need to be part of an approved study in order to run queries.
Tags
You can add up to 25 tags to your workflow, which will help researchers discover and understand it.
Usage
You can see the date of the last workflow activity and how often the workflow was forked or viewed (if it is a public workflow).
Data sources / Transforms / Notebooks
The workflow overview has tabs that list all the nodes in the workflow. You can click on any node in one of these lists to navigate directly to it. These lists can be filtered by their relative location in the workflow, the variables they contain, and other criteria.
You can select the Map button on the left side of the workflow toolbar to begin a run of all stale nodes in the workflow. This will execute all transform and notebook nodes in a logical sequence to update the workflow completely.
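One way to picture a "logical sequence" is a topological ordering, where every node runs only after the nodes it depends on. The sketch below is an assumption about how such an ordering could be computed, not a description of Redivis internals. The node names, `DEPENDS_ON`, and `STALE` are hypothetical, and it uses `graphlib` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each work node maps to the work nodes it depends on.
DEPENDS_ON = {
    "transform_1": set(),
    "transform_2": {"transform_1"},
    "notebook_1": {"transform_2"},
    "transform_3": {"transform_1"},
}

# Hypothetical set of nodes that are currently stale.
STALE = {"transform_2", "notebook_1"}


def run_order(depends_on, stale):
    """Return the stale nodes in an order that respects their dependencies."""
    order = TopologicalSorter(depends_on).static_order()
    return [node for node in order if node in stale]


plan = run_order(DEPENDS_ON, STALE)
```

In this example `transform_2` is scheduled before `notebook_1`, because the notebook depends on the transform's output. Nodes that are not stale are skipped.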
To shift a node, select the node and click the arrow that appears next to most nodes when selected. Shifting nodes is purely an organizational tool and it has no effect on the data produced in the workflow.
Along with clicking a node to select it, all nodes have a Source and Output navigation button on the right side of the workflow toolbar. You can click this button to jump directly to the immediate upstream or downstream node.
You have a couple of options for how this workflow can be reused in other analyses. Click the Fork button in the toolbar to get started.
Add to another workflow
Select this option to choose a workflow you'd like to add this one to. This will be a linked copy that will instantly update as the original workflow is updated. This can be a very powerful tool in building out complex workflows that all reference the same source analysis or cohort.
Clone this workflow
This will create a duplicate copy of the workflow that will have a link back to the original workflow on the overview page. It can be helpful to work from an exact duplicate of a workflow if you're viewing a colleague's workflow and would like to test a similar data manipulation, or if you'll rely on similar data manipulations in a new research effort.
Workflows are continuously saved as you work, and any analyses will continue to run in the background if you close your browser window. You can always navigate back to this workflow later from the "Workflows" tab of your workspace.
A complete version history is available for all transforms and notebooks in your workflow, allowing you to access historic code and revert your work back to a previous point in time.
Workflows are made to be a space for collaborative analysis. You can easily share access to your workflow with colleagues.
Any node in a workflow can be annotated with a comment by any workflow editor. Comments are intended to be a space for conversation grounded in a specific area of a workflow. They can be replied to in threads by multiple collaborators and resolved when the conversation is complete.
Multiple users with edit access can be working on a workflow at the same time. When this is the case you will see their picture in the top menu bar alongside your own and a colored dot on the workflow tree to the right of the node they currently have selected. When a notebook is started you will see any collaborators' code edits in real time.
As you work in a workflow, node colors and symbols will change on the tree view to help you keep track of your work progress. More detailed information about each of these states can be found on each node page.
These are the most important workflow-wide concepts to be aware of. Workflows are designed to be iterative and allow you to see outputs and go back to make changes and rerun for new results.
In order to keep track of any edits, transforms that have unrun edits will be shown differently on the map than other transforms, with grey diagonal marks.
Any time data is changed, nodes that referenced that data will become stale (yellow background) until they are rerun on the new inputs.
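The way staleness spreads through a workflow can be sketched as a walk over downstream nodes. This is only an illustrative model under stated assumptions: the `DOWNSTREAM` map, the status strings, and the `mark_stale` function are hypothetical, and it assumes that every node downstream of a change (including output tables) is marked stale.

```python
from collections import deque

# Hypothetical workflow: each node maps to the nodes directly downstream of it.
DOWNSTREAM = {
    "table_a": ["transform_1"],
    "transform_1": ["output_1"],
    "output_1": ["notebook_1"],
    "notebook_1": [],
    "table_b": ["transform_2"],
    "transform_2": [],
}


def mark_stale(changed, downstream, status):
    """Return a copy of status with every node downstream of changed marked stale."""
    updated = dict(status)
    seen = set()
    queue = deque(downstream.get(changed, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        updated[node] = "stale"
        queue.extend(downstream.get(node, []))
    return updated


status = {name: "up to date" for name in DOWNSTREAM}
status = mark_stale("table_a", DOWNSTREAM, status)
```

After the change to `table_a`, everything on its branch is stale, while `table_b` and `transform_2` on the unrelated branch keep their previous status.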