Workflow concepts

Overview

You can use workflows on Redivis to analyze any type of data, at any scale. Workflows allow you to organize your work into discrete steps, where you can easily validate your results and develop well-documented, reproducible analysis artifacts.

Creating a workflow

To create a workflow, navigate to a dataset that you're interested in working with and press the Analyze data in a workflow button. If you do not have data access to this dataset you may need to Apply for access first.

You can also create a workflow from the "Workflows" tab of your workspace. Once you've created your workflow, you'll be able to add any dataset that you have access to.

Workflow tree layout and operation

In a Redivis workflow, the steps you've taken are represented visually on the left side of the screen in a tree layout, where you can build nodes and easily see their relationships. Each node type can take different actions. The workflow tree automatically expands and lays out all the nodes in your workflow, keeping everything organized as it grows.

Node selection

Clicking on a node will allow you to see its information and/or execute changes. Your selected node will have a purple border in the tree and the right panel will update to show that node's contents. You can expand or collapse this right panel by dragging the center pane side to side.

Node connections

Most nodes in a workflow will be connected to others, and these connections are drawn as lines on the workflow tree. Thicker lines usually indicate a source node or a harder connection, while thin lines will show a join or more casual linkage.

When you click on a node to select it, the lines connecting upstream and downstream nodes will turn purple. For larger workflows this can make it easier to trace where the data in your selected node came from, or see what its eventual outputs are.
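
The tracing described above can be sketched as a graph traversal. This is only an illustration of the idea, not how Redivis implements it: the node names, the `EDGES` mapping, and the helper functions are all hypothetical.

```python
from collections import deque

# Hypothetical workflow: each key is a node, each value lists its direct downstream nodes.
EDGES = {
    "dataset": ["patients_table"],
    "patients_table": ["filter_transform"],
    "filter_transform": ["filtered_table"],
    "filtered_table": ["analysis_notebook"],
    "analysis_notebook": [],
}

def reachable(start, neighbors):
    """Return every node reachable from `start` by following `neighbors` (breadth-first)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def invert(edges):
    """Flip every edge so the mapping points from a node to its direct upstream nodes."""
    parents = {node: [] for node in edges}
    for node, children in edges.items():
        for child in children:
            parents.setdefault(child, []).append(node)
    return parents

selected = "filter_transform"
upstream = reachable(selected, invert(EDGES))   # where the selected node's data came from
downstream = reachable(selected, EDGES)         # the selected node's eventual outputs
```

In this sketch, `upstream` and `downstream` together correspond to the connections that would be highlighted in purple when `filter_transform` is selected.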

Nodes that have a direct source will have a large solid line connecting them.

Node types

Data source nodes represent datasets or workflows that have been added to your workflow. Each one is a copy of a data source that lives elsewhere on Redivis.

Table nodes are either tables within a dataset, or the resulting output table of an upstream computation.

Transform nodes are queries that are used to reshape data by creating new tables.

Notebook nodes are code blocks (and their outputs) which are used to analyze data.

Building a workflow

The main way to build your workflow is to add and edit nodes. You will start by adding data to your workflow, then add ways to cut, reshape, and analyze it.

Add data to a workflow

You can click the Add data button in the top left corner of the workflow toolbar to select a dataset or another workflow you want to add to this workflow. This will add a copy of the selected data source to the workflow and allow you to reference its tables.

Each data source can only be added to a workflow one time. You can add multiple data sources in bulk from this modal. By default all datasets are added at the current version but you can right click on the dataset in this modal to select a different version to add.

Add data editing nodes

All data cutting, reshaping, and analysis on Redivis happens in either a transform or a notebook node. These nodes must be attached to a source table, so they can only be created after you've added a dataset.

To create a transform or notebook node, click on a table and select either the transform or notebook button that appears beneath it. If the table already has a downstream node, you can press the plus icon beneath it instead.

Transforms vs. notebooks?

There are two mechanisms for working with data in workflows: transforms and notebooks. Understanding when to use each tool is key to taking full advantage of the capabilities of Redivis, particularly when working with big datasets.

Transforms are better for:

Reshaping + combining tabular and geospatial data
Working with large tables, especially at the many GB to TB scale
Preference for a no-code interface, or preference for programming in SQL
Declarative, easily documented data operations

Notebooks are better for:

Interactive exploration of any data type, including unstructured data files
Working with smaller tables (though working with bigger data is possible)
Preference for Python, R, Stata, or SAS
Interactive visualizations and figure generation

Copy and paste nodes

You can right click on any transform or notebook node in the workflow tree to copy it. Right click on any table node to paste the copied transform or notebook. The table you pasted it onto will become this node's source table. You may need to update the node to reflect its new location for it to work properly.

Insert nodes

If you would like to place a node between other nodes you can click on a transform or notebook and select Insert transform.

If you have a transform copied to the clipboard you can insert it between other nodes by right clicking on a transform or notebook and selecting Paste copied transform above. This will insert both the transform and its output table into the branch of the workflow you've selected.

Split and combine transforms

All transforms can be split at the step level into two different transforms by clicking Split in any step's menu. Additionally, two transforms can be combined into one by right clicking on the table between them and selecting Remove.

You might want to split a transform above a tricky step to see what the output table would look like at that point in the process. This can be a key tool in troubleshooting any issues and understanding what might be going wrong.

After splitting a transform to check an output table, the next logical step might be to combine these two transforms back into one again. Or perhaps you have a string of transforms which you no longer need the output tables for and want to reduce the size of your workflow.

Delete nodes

Only nodes that you have intentionally created can be deleted or removed from the workflow. Output tables are created by running code and can't be removed.

To delete a node, right click on the node and select Delete.

When deleting a transform or notebook:

The transform or notebook and its output table will be deleted.
If the workflow tree has additional nodes downstream, the transform or notebook and its output table will be 'spliced' out: the upstream node nearest the deleted transform will be connected to the downstream node nearest the deleted output table. The next downstream transform will then receive its input variables from the node that is now directly upstream, which will probably make it stale and possibly invalid, since variables it previously referenced might no longer exist.
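
As a rough illustration of the splicing behavior, the sketch below removes a transform and its output table from a small graph and reconnects the neighbors. The node names and the `splice_out` helper are hypothetical and only model the description above, not Redivis internals.

```python
def splice_out(edges, transform, output_table):
    """Remove a transform and its output table, reconnecting their neighbors.

    `edges` maps each node to a list of its direct downstream nodes.
    Returns a new mapping; the input is left unchanged.
    """
    downstream = edges[output_table]
    new_edges = {}
    for node, children in edges.items():
        if node in (transform, output_table):
            continue  # drop the deleted nodes themselves
        rewired = []
        for child in children:
            if child == transform:
                # Point the upstream node at whatever followed the deleted output table.
                rewired.extend(downstream)
            else:
                rewired.append(child)
        new_edges[node] = rewired
    return new_edges

# Hypothetical branch: source_table -> transform_a -> table_a -> transform_b -> table_b
workflow = {
    "source_table": ["transform_a"],
    "transform_a": ["table_a"],
    "table_a": ["transform_b"],
    "transform_b": ["table_b"],
    "table_b": [],
}
after = splice_out(workflow, "transform_a", "table_a")
```

After the splice, `transform_b` reads directly from `source_table`, which is why it may reference variables that no longer exist.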

When deleting a dataset:

The dataset and all downstream nodes will be deleted. If additional branches are joined into the branch downstream of the deleted dataset, those branches will be retained up to but not including the transform located in the deleted branch.

Since you can't undo a deletion, you'll receive a warning message before proceeding.

As you make changes in a workflow you will change the status of different nodes connected to it. These changes in status are shown in the left panel of the workflow to help you keep track of any changes.

Workflow-level actions

Run all

You can select the Map button on the left side of the workflow toolbar to begin a run of all stale nodes in the workflow. This will execute all transform and notebook nodes in a logical sequence to update the workflow completely.
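
One way to picture a "logical sequence" is a topological ordering, in which every node runs only after the nodes it depends on. The sketch below assumes this interpretation; the node names, `DEPENDS_ON`, `STALE`, and `run_order` are hypothetical, and the actual scheduling on Redivis may differ.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each node maps to the set of nodes it depends on.
DEPENDS_ON = {
    "clean_transform": set(),
    "merge_transform": {"clean_transform"},
    "summary_notebook": {"merge_transform"},
    "plots_notebook": {"merge_transform"},
}

# Assumed for illustration: these nodes are stale and need to be rerun.
STALE = {"merge_transform", "summary_notebook", "plots_notebook"}

def run_order(depends_on, stale):
    """Return the stale nodes in an order where upstream nodes always come first."""
    order = TopologicalSorter(depends_on).static_order()
    return [node for node in order if node in stale]

order = run_order(DEPENDS_ON, STALE)
```

Here `merge_transform` comes first, followed by the two notebooks in either order, while the up-to-date `clean_transform` is skipped.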

Shift nodes

To shift a node, select the node and click the arrow that appears next to most nodes when selected. Shifting nodes is purely an organizational tool and it has no effect on the data produced in the workflow.

Navigate nodes

Along with clicking a node to select it, all nodes have Source and Output navigation buttons on the right side of the workflow toolbar. You can click these buttons to jump directly to the immediate upstream or downstream node.

Fork the workflow

You have a couple of options for how this workflow can be reused in other analyses. Click the Fork button in the toolbar to get started.

Add to another workflow
- Select this option to choose a workflow you'd like to add this one to. This will be a linked copy that will instantly update as the original workflow is updated. This can be a very powerful tool in building out complex workflows that all reference the same source analysis or cohort.
Clone this workflow
- This will create a duplicate copy of the workflow, with a link back to the original workflow on the overview page. It can be helpful to work from an exact duplicate of a workflow if you're viewing a colleague's workflow and would like to test a similar data manipulation, or if you'll rely on similar data manipulations in a new research effort.

Saving

Workflows are continuously saved as you work, and any analyses will continue to run in the background if you close your browser window. You can always navigate back to this workflow later from the "Workflows" tab of your workspace.

A complete version history is available for all transforms and notebooks in your workflow, allowing you to access historic code and revert your work back to a previous point in time.

Workflow states

As you work in a workflow, node colors and symbols will change on the tree view to help you keep track of your progress. More detailed information about each of these states can be found on each node page.

Stale and edited

These are the most important workflow-wide concepts to be aware of. Workflows are designed to be iterative: you can view outputs, go back to make changes, and rerun to get new results.

To help you keep track of edits, transforms that have unrun edits are shown differently on the workflow tree than other transforms, with grey diagonal marks.

Any time data is changed, nodes that referenced that data will become stale (yellow background) until they are rerun on the new inputs.
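
The way staleness spreads can be sketched as marking everything downstream of a changed node. This is a simplified model under the assumption that every downstream node, tables included, becomes stale; the node names, status strings, and `mark_stale` helper are hypothetical.

```python
def mark_stale(edges, status, changed):
    """Return a copy of `status` with every node downstream of `changed` marked stale.

    `edges` maps each node to a list of its direct downstream nodes.
    """
    updated = dict(status)
    seen = set()
    stack = list(edges.get(changed, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        updated[node] = "stale"
        stack.extend(edges.get(node, []))
    return updated

# Hypothetical workflow with one branch and one unrelated table.
EDGES = {
    "source_table": ["transform_a"],
    "transform_a": ["table_a"],
    "table_a": ["notebook_a"],
    "notebook_a": [],
    "other_table": [],
}
status = {node: "up to date" for node in EDGES}

# Simulate a change to the data in source_table.
status = mark_stale(EDGES, status, "source_table")
```

Only nodes that depend on the changed data are affected; `other_table` keeps its previous status.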

Collaboration

Workflows are made to be a space for collaborative analysis. You can easily share access to your workflow with colleagues.

Comments

Any node in a workflow can be annotated with a comment by any workflow editor. Comments are intended to be a space for conversation grounded in a specific area of a workflow. They can be replied to in threads by multiple collaborators and resolved when the conversation is complete.

Simultaneous editors

Multiple users with edit access can work on a workflow at the same time. When this is the case, you will see their pictures in the top menu bar alongside your own, and a colored dot on the workflow tree to the right of the node each of them currently has selected. When a notebook is started, you will see any collaborators' code edits in real time.

Last updated 6 months ago
