Workflow concepts
Overview
Workflows are used to analyze any type of data on Redivis, at any scale. They allow you to organize your analysis into discrete steps, where you can easily validate your results and develop well-documented, reproducible analyses.
Workflows are owned by either a user or an organization, and can be shared with other users and organizations.

Creating a workflow
To create a workflow, navigate to a dataset that you're interested in working with and press the Analyze data in a workflow button. If you do not have data access to this dataset you may need to apply for access first.
You can also create a workflow from the "Workflows" tab of your workspace, or from the administrator panel of an organization (in this latter case, the workflow will be "owned" by the organization and its administrators).
Once you've created your workflow, you'll be able to add any dataset or workflow that you have access to as a data source.
The workflow page
The workflow page consists of a top title bar, a left panel, and a right panel.
The left panel displays the workflow tree, allowing you to visualize how data is moving through the workflow and its nodes.
The right panel shows the contents of the currently selected node in the tree. If no node is selected, this panel will display the workflow's documentation. You can click on the workflow title, or empty space in the workflow tree, to return to the workflow documentation at any time.
The title bar provides an entry point to common actions, broken into two sections: the left section contains actions that are global to the workflow, while the right section contains actions relevant to the currently selected node (e.g., running a transform).
If no node is selected, information about the workflow will be populated in the right panel:
Overview
The workflow overview contains various metadata, provenance information, and narrative about the workflow.
Abstract
The abstract is limited to 256 characters and will show up in previews and search results for the workflow. It should be a concise, high-level summary of the workflow.
Provenance
View and update information about this workflow's creators and contributors, citation, and any related identifiers detected by Redivis automatically or added by a workflow editor. This is also where you can issue a DOI for this workflow.
Provenance
→ Creators
This section automatically defaults to displaying the owner of the workflow. Workflow editors can add or remove anyone from this list. Anyone included here will be added to the citation generated for this workflow.
Provenance
→ Contributors
This section automatically includes anyone who edited this workflow. Workflow editors can add or remove anyone from this list.
Provenance
→ Citation
This section shows the automatically generated citation for this workflow in your chosen format. It can be copied or downloaded for use elsewhere.
Changes made to the "Creators" field will be reflected in this citation. Any DOI issued for this workflow will automatically be included in this citation.
Provenance
→ Related identifiers
This section automatically includes any datasets or workflows referenced by this workflow, including its data sources, associated studies, and the workflow it was forked from (if applicable). Workflow editors can add related identifiers from outside of Redivis through links or DOIs, including DMPs, referenced papers, and more.
You can launch a bibliography which displays the citation of this workflow and all of its related identifiers.
Methodology
Document the details of your research aim or data analysis strategy. You can also embed links or images.
Sharing
You can give other users access to view or edit your workflow, or transfer ownership to another user in the Sharing section. You can also set this workflow's visibility and discoverability. Anyone viewing your workflow will need to have gained data access to any restricted datasets to view the relevant node contents.
Study
You can add your workflow to a study in order to facilitate collaboration with others. For certain restricted datasets, your workflow will need to be part of an approved study in order to run queries.
Tags
You can add up to 25 tags to your workflow, which will help researchers discover and understand it.
Usage
You can see the date of the last workflow activity, as well as how often the workflow has been forked or viewed (if it is a public workflow).
Data sources
A filterable list of all Data sources within the workflow. Clicking on an item will navigate to the corresponding data source in the workflow tree.
Tables
A filterable list of all Tables within the workflow. Clicking on an item will navigate to the corresponding table in the workflow tree.
Transforms
A filterable list of all Transforms within the workflow. Clicking on an item will navigate to the corresponding transform in the workflow tree.
Notebooks
A filterable list of all Notebooks within the workflow. Clicking on an item will navigate to the corresponding notebook in the workflow tree.
The workflow tree
The workflow "tree" is represented visually in the left pane of the workflow. This tree is made up of a collection of nodes, with each node having various inputs and outputs, such that the output (result) of one node can serve as the input of another.
Data in the tree flows from top to bottom, and circular relationships are not allowed. Formally, this is known as a "Directed Acyclic Graph" (DAG).
Clicking on a node within the tree will display that node's contents within the right pane of the workflow, while highlighting the ancestors and descendants of that node on the tree.
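To make the DAG idea concrete, here is a minimal sketch in plain Python (this is not the Redivis API, and the node names are hypothetical). It shows that a valid workflow always has a top-to-bottom execution order, and that a circular relationship makes such an order impossible:

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical workflow tree: each node maps to its upstream inputs.
# Data flows from the data source at the top down to derived nodes.
workflow = {
    "demo_dataset:table_a": [],                    # data source table
    "filter_transform": ["demo_dataset:table_a"],  # transform reading the source
    "filtered_table": ["filter_transform"],        # the transform's output table
    "analysis_notebook": ["filtered_table"],       # notebook reading the table
}

# Because the tree is a DAG, a top-to-bottom ordering always exists.
order = list(TopologicalSorter(workflow).static_order())
print(order)  # data source first, notebook last

# Circular relationships are rejected: making the data source depend
# on the notebook introduces a cycle, so no valid ordering exists.
workflow["demo_dataset:table_a"] = ["analysis_notebook"]
try:
    list(TopologicalSorter(workflow).static_order())
except CycleError:
    print("cycle detected")
```

The same ordering logic is what lets downstream nodes always see the latest output of their upstream inputs.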
You can right-click on any node for a list of other options, or if preferred, click on the node and then click the three-dot "More" menu at the top-right.
Workflow nodes
The workflow tree is made up of the following node types:

Data sources represent datasets or workflows that have been added to your workflow, and are the mechanism for bringing data into your workflow.
Tables are either tables associated with a data source, or the resulting output table of a transform or notebook.
Transforms are queries that are used to reshape and combine data, always creating a single table as an output.
Notebooks are flexible, interactive programming environments, which can optionally produce a table as an output.
Building a workflow
The main way to build your workflow is to add and edit nodes. You will start by adding data to your workflow, and then create a series of additional nodes that reshape and analyze the data.
Add data to a workflow
You can click the Add data button in the top left corner of the workflow toolbar to select a dataset or another workflow you want to add to this workflow. This will add a copy of the selected data source to the workflow and allow you to reference its tables.
Each data source can only be added to a workflow one time. By default, all datasets are added at their current version but you can right click on the dataset in this modal to select a different version to add.
Reshape and analyze data
All data cutting, reshaping, and analysis on Redivis happens in either a transform or a notebook. These nodes must be attached to a source table, and so can only be created after you've added a data source.
To create a transform or notebook, click on a table and select either the transform or notebook button that appears beneath it. If the table already has a downstream node you can press the plus icon beneath it instead.
Transforms vs. notebooks?
There are two mechanisms for working with data in workflows: transforms and notebooks. Understanding when to use each tool is key to taking full advantage of the capabilities of Redivis, particularly when working with big datasets.
Transforms are better for:
Reshaping + combining tabular and geospatial data
Working with large tables, especially at the many GB to TB scale
Preference for a no-code interface, or preference for programming in SQL
Declarative, easily documented data operations
Notebooks are better for:
Interactive exploration of any data type, including unstructured data files
Working with smaller tables (though working with bigger data is possible)
Preference for Python, R, Stata, or SAS
Interactive visualizations and figure generation
Copy and paste nodes
You can right click on any transform or notebook in the workflow tree to copy it. Once you've copied a node, you can right click on any table to paste the copied transform or notebook.
Insert nodes
If you would like to place a new transform between existing nodes, you can click on a transform or notebook and select Insert transform.
If you have a transform copied to the clipboard you can insert it between other nodes by right clicking on a transform or notebook and selecting Paste copied transform above. This will insert both the transform and its output table into the branch of the workflow you've selected.
Split and combine transforms
All transforms can be split at the step level into two different transforms by clicking Split in any step's menu. Additionally, two transforms can be combined into one by right-clicking the table between them and selecting Remove.
You might want to split a transform above a tricky step to see what the output table would look like at that point in the process. This can be a key tool in troubleshooting any issues and understanding what might be going wrong.
After splitting a transform to check an output table, the next logical step might be to combine these two transforms back into one again. Or perhaps you have a string of transforms which you no longer need the output tables for and want to reduce the size of your workflow.
Delete nodes
To delete a node, right click on the node and select Delete. Tables cannot be deleted directly, but are rather deleted when their parent node is deleted.
When deleting a transform or notebook:
The transform or notebook and its output table will be deleted.
If the workflow tree has additional nodes downstream, the transform or notebook and its output table will be 'spliced' out, i.e. the upstream node nearest the deleted transform will be connected to the downstream node nearest to the deleted output table.
When deleting a data source:
The data source and all directly downstream nodes will be deleted. If additional branches are joined into the branch downstream of the deleted dataset, those branches will be retained up to but not including the transform located in the deleted branch.
Since you can't undo a deletion, you'll receive a warning message before proceeding.
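The "splice" behavior described above can be sketched in plain Python (hypothetical node names, not the Redivis API). Deleting a transform removes it and its output table, then reconnects the nearest upstream node to the nearest downstream node:

```python
def splice_out(edges, transform, output_table):
    """Delete a transform and its output table from a linear branch,
    reconnecting the nearest upstream node to the nearest downstream
    node. `edges` maps each node to its direct downstream node."""
    # Node that feeds the deleted transform
    upstream = next(node for node, child in edges.items() if child == transform)
    # Node fed by the deleted output table
    downstream = edges.pop(output_table)
    del edges[transform]
    edges[upstream] = downstream  # reconnect across the gap
    return edges

# Hypothetical branch: source_table -> transform_1 -> table_1 -> transform_2 -> table_2
edges = {
    "source_table": "transform_1",
    "transform_1": "table_1",
    "table_1": "transform_2",
    "transform_2": "table_2",
}
print(splice_out(edges, "transform_1", "table_1"))
# source_table now feeds transform_2 directly
```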
Node states
As you build out a workflow, node colors and symbols will change on the tree to help you keep track of your work progress.
Detailed information about each of these states can be found in the documentation for each node, though some common states are outlined here.

Stale nodes
Stale nodes are indicated with a yellow background. If a node is stale, it means that its upstream content has changed since the node was last run, and likely that the node should be re-run to reflect these upstream changes.
Edited nodes
If a node has been edited since it was last run, it will be indicated with hashed vertical lines.
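Staleness propagates downstream: when a node's output changes, every node below it in the tree becomes stale. A plain-Python sketch of this propagation (hypothetical node names, not the Redivis API):

```python
from collections import deque

def mark_stale(children, changed_node):
    """Return every node downstream of `changed_node`, i.e. the nodes
    that become stale when its output changes. `children` maps each
    node to its direct downstream nodes."""
    stale, queue = set(), deque(children.get(changed_node, []))
    while queue:
        node = queue.popleft()
        if node not in stale:
            stale.add(node)
            queue.extend(children.get(node, []))  # keep walking downstream
    return stale

# Hypothetical tree with a branch below table_1
children = {
    "source_table": ["transform_1"],
    "transform_1": ["table_1"],
    "table_1": ["transform_2", "notebook_1"],
    "transform_2": ["table_2"],
}
print(sorted(mark_stale(children, "transform_1")))
# every node below transform_1 is now stale
```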
Tree-level actions
Run all
You can select the Run all button on the left side of the workflow toolbar to begin a run of all stale nodes in the workflow. This will execute all transform and notebook nodes in a logical sequence to update the workflow completely.
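The "logical sequence" here is a dependency-respecting order: each stale node runs only after its upstream inputs are up to date. A minimal sketch in plain Python (hypothetical node names, not the Redivis API):

```python
from graphlib import TopologicalSorter

def run_all(predecessors, stale):
    """Sketch of 'Run all': visit nodes in dependency order and
    execute only the stale ones. `predecessors` maps each node to
    its upstream inputs; returns the nodes executed, in order."""
    executed = []
    for node in TopologicalSorter(predecessors).static_order():
        if node in stale:
            executed.append(node)  # a real run would execute the node here
    return executed

predecessors = {
    "transform_1": [],
    "table_1": ["transform_1"],
    "transform_2": ["table_1"],
    "table_2": ["transform_2"],
}
# transform_1 runs before transform_2, since transform_2 depends on its output
print(run_all(predecessors, stale={"transform_2", "transform_1"}))
```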
Shift nodes
To shift a node, select the node and click the arrow that appears next to most nodes when selected. Shifting nodes is purely an organizational tool and it has no effect on the data produced in the workflow.
Navigate nodes
Along with clicking a node to select it, all nodes have Source and Output navigation buttons on the right side of the workflow toolbar. You can click these to jump directly to the immediately upstream or downstream node.
Saving
Workflows are continuously saved as you work, and any analyses will continue to run in the background if you close your browser window. You can always navigate back to this workflow later from the "Workflows" tab of your workspace.
A complete version history is available for all transforms and notebooks in your workflow, allowing you to access historic code and revert your work back to a previous point in time.
Workflow ownership and sharing
All workflows are owned by either a user or an organization, and can then be shared with other users and organizations. When a workflow is owned by or shared with an organization, all administrators of that organization will have corresponding access.
Workflows can also be associated with a study, which may be necessary if access to certain datasets in the workflow was granted to that study. In this case, you can specify a level of access to the workflow for other collaborators on the study.
Forking workflows
You have a couple options for how this workflow can be reused in other analyses. Click the Fork button in the toolbar to get started.
Add to another workflow
Select this option to choose a workflow you'd like to add this workflow to, as a data source. This will be a linked copy that will update as the original workflow is updated. This can be a very powerful tool in building out complex workflows that all reference the same source analysis or cohort.
Clone this workflow
This will create a duplicate copy of the workflow, with a link back to the original workflow encoded in its provenance information.
Collaboration
Workflows are made to be a space for collaborative analysis. You can easily share access to your workflow with colleagues.
Comments
Any node in a workflow can be annotated with a comment by workflow collaborators. Comments are intended to be a space for conversation grounded in a specific area of a workflow. They can be replied to in threads by multiple collaborators and resolved when the conversation is complete.
Simultaneous editors
Multiple users with edit access can be working on a workflow at the same time. When this is the case, you will see their pictures in the top menu bar alongside your own, and a colored dot on the workflow tree to the right of the node they currently have selected. When a notebook is started, you will see any collaborators' code edits in real time.
Workflow DOIs
Any workflow editor can issue a DOI (Digital Object Identifier) for a workflow. A DOI is a persistent identifier that can be used to reference this workflow in citations. DOIs are issued through DataCite and do not require any configuration with your own or your organization's DataCite account.
Open the Provenance section and click Issue DOI. Once created, you will be able to see the DOI and view the record on DataCite.

Draft status
When DOIs are issued they enter a "Draft" status, where the identifier is assigned but not yet permanently registered. All DOIs issued for workflows remain in this draft status for seven days to allow for removal of the DOI if needed.
You can start referencing the DOI immediately while it is still in draft status since the final DOI will not change once it becomes permanent. After the seven day draft period the DOI will automatically become permanent if your workflow is set to be publicly visible.
Since DOIs are intended for public reference, they will not be issued for workflows that remain fully private.
Reproducibility and change management
Every time a transform or notebook is run in your workflow, a snapshot of the code in that node is permanently saved. On any transform or notebook, you will see a "History" button that will bring up all of the previous executions of that node, with the ability to view its historic contents and revert to a previous version of the code. This historic code will also be associated with the corresponding log entry in your workspace.
While the tables within a workflow should be considered "live," in that their data can regularly change as upstream nodes are modified, the ability to permanently persist code (alongside the built-in version control for datasets) ensures that any historic output can be reproduced by simply re-running the historic code that produced it.