# Create and edit workflows

## Create a new workflow

To create a workflow, navigate to a dataset that you're interested in working with and press the **Analyze data in a workflow** button. If you do not have [data access](https://docs.redivis.com/data-access/access-levels#access-levels) to this dataset, you may need to apply for access first.

You can also create a workflow from the "Workflows" tab of your [workspace](https://docs.redivis.com/reference/your-account/workspace), or from the administrator panel of an organization (in this latter case, the workflow will be "owned" by the organization and its administrators).

Once you've created your workflow, you'll be able to add any dataset or workflow that you have access to as a [data source](https://docs.redivis.com/reference/workflows/data-sources).

## Build a workflow

The main way to build your workflow is to add and edit nodes. You will start by adding data to your workflow, and then create a series of additional nodes that reshape and analyze the data.

#### Add data to a workflow

You can click the **Add data** button in the top left corner of the workflow toolbar to select a dataset or another workflow you want to add to this workflow. This will add a copy of the selected [data source](https://docs.redivis.com/reference/workflows/data-sources) to the workflow and allow you to reference its tables.

Each data source can only be added to a workflow once. By default, all datasets are added at their current [version](https://docs.redivis.com/reference/datasets/versions), but you can right click on the dataset in this modal to select a different version to add.

#### Reshape and analyze data

All data cutting, reshaping, and analysis on Redivis happens in either a [transform](https://docs.redivis.com/reference/workflows/transforms) or a [notebook](https://docs.redivis.com/reference/workflows/notebooks). These nodes must be attached to a source table, so they can only be created after you've added a data source.

To create a transform or notebook, click on a table and select either the transform or notebook button that appears beneath it. If the table already has a downstream node you can press the plus icon beneath it instead.

{% hint style="info" %}

#### Transforms vs. notebooks?

There are two mechanisms for working with data in workflows: [transforms](https://docs.redivis.com/reference/workflows/transforms) and [notebooks](https://docs.redivis.com/reference/workflows/notebooks). Understanding when to use each tool is key to taking full advantage of the capabilities of Redivis, particularly when working with big datasets.

Transforms are better for:

* Reshaping + combining tabular and geospatial data
* Working with large tables, especially at the many GB to TB scale
* Preference for a no-code interface, *or* preference for programming in SQL
* Declarative, easily documented data operations

Notebooks are better for:

* Interactive exploration of any data type, including unstructured data files
* Working with smaller tables (though working with bigger data is possible)
* Preference for Python, R, Stata, or SAS
* Interactive visualizations and figure generation
  {% endhint %}

#### Copy and paste nodes

You can right click on any transform or notebook in the workflow tree to copy it. Once you've copied a node, you can right click on any table to paste the copied transform or notebook.

#### Insert nodes

To place a transform between other nodes, click on a transform or notebook and select **Insert transform**.

If you have a transform copied to the clipboard you can insert it between other nodes by right clicking on a transform or notebook and selecting **Paste copied transform above**. This will insert both the transform and its output table into the branch of the workflow you've selected.

#### Split and combine transforms

All transforms can be split at the step level into two different transforms by clicking **Split** in any step's menu. Additionally, two transforms can be combined into one by right clicking on the table between them and selecting **Remove**.

You might want to split a transform above a tricky step to see what the output table would look like at that point in the process. This can be a key tool in troubleshooting any issues and understanding what might be going wrong.

After splitting a transform to check an output table, the next logical step might be to combine the two transforms back into one. Or perhaps you have a string of transforms whose output tables you no longer need, and you want to reduce the size of your workflow.
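
Conceptually, if a transform is thought of as an ordered list of steps, splitting and combining are inverse operations. The sketch below is a plain-Python illustration under that assumption; the `split_transform` and `combine_transforms` helpers and the step names are hypothetical and are not part of any Redivis API.

```python
def split_transform(steps, index):
    """Split an ordered list of steps into two transforms at the given step index."""
    return steps[:index], steps[index:]


def combine_transforms(first, second):
    """Combine two consecutive transforms back into one list of steps."""
    return first + second


# Hypothetical steps in a single transform
steps = ["filter rows", "join lookup table", "aggregate by year"]

# Split above the tricky join to inspect the intermediate output table
upstream, downstream = split_transform(steps, 1)

# Combining the two halves restores the original sequence of steps
recombined = combine_transforms(upstream, downstream)
```

Because recombining yields the same steps in the same order, splitting to inspect an intermediate table and then merging again should not change the final output.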

#### Shift nodes

To shift a node, select the node and click the arrow that appears next to most nodes when selected. Shifting nodes is purely an organizational tool and it has no effect on the data produced in the workflow.

#### Delete nodes

To delete a node, right click on the node and select **Delete**. Tables cannot be deleted directly, but are rather deleted when their parent node is deleted.

When deleting a **transform or notebook:**

* The transform or notebook *and* its output table will be deleted.
* If the workflow tree has additional nodes downstream, the transform or notebook and its output table will be 'spliced' out, i.e. the upstream node nearest the deleted transform will be connected to the downstream node nearest to the deleted output table.
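
The splicing behavior described above can be pictured with a small graph model. This is only a conceptual sketch, assuming the workflow tree is stored as a mapping from each node to its direct downstream nodes; the `splice_out` helper and the node names are hypothetical and do not reflect how Redivis stores workflows internally.

```python
def splice_out(children, transform, output_table):
    """Delete a transform and its output table, reconnecting the neighbors."""
    # Nodes that were attached beneath the deleted output table
    downstream = children.pop(output_table, [])
    children.pop(transform, None)
    # Re-point every parent of the deleted transform at those downstream nodes
    for kids in children.values():
        if transform in kids:
            position = kids.index(transform)
            kids[position:position + 1] = downstream
    return children


# Hypothetical branch: source -> t1 -> out1 -> t2 -> out2
workflow = {
    "source": ["t1"],
    "t1": ["out1"],
    "out1": ["t2"],
    "t2": ["out2"],
    "out2": [],
}

# Deleting t1 also removes out1, and "source" now feeds "t2" directly
spliced = splice_out(workflow, "t1", "out1")
```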

When deleting a **data source**:

* The data source ***and all directly downstream nodes*** will be deleted. If additional branches are joined into the branch downstream of the deleted dataset, those branches will be retained up to but not including the transform located in the deleted branch.

Since you can't undo a deletion, you'll receive a warning message before proceeding.

## Node states

As you build out a workflow, node colors and symbols will change on the tree to help you keep track of your work progress.

Detailed information about each of these states can be found in the documentation for each node, though some common states are outlined here.

<figure><img src="https://1672950126-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LVodLwUXgJUGcm5Cvso%2Fuploads%2FnyWdSYnYtpaGQWCNZvQg%2FScreenshot%202024-12-03%20at%2012.04.46%E2%80%AFPM_out.png?alt=media&#x26;token=a1c9fb35-b8e9-42d9-916a-ae2544315b9b" alt=""><figcaption></figcaption></figure>

#### Stale nodes

Stale nodes are indicated with a yellow background. If a node is stale, it means that its upstream content has changed since the node was last run, and likely that the node should be re-run to reflect these upstream changes.
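
One way to reason about staleness is as a downstream traversal of the workflow tree: when a node changes, everything reachable beneath it may no longer reflect its upstream content. The following is a minimal sketch under that assumption; the `downstream_of` helper and the node names are hypothetical, and the exact rules Redivis applies may differ.

```python
from collections import deque


def downstream_of(children, changed):
    """Return every node downstream of `changed`, i.e. the nodes that would become stale."""
    stale = set()
    queue = deque(children.get(changed, []))
    while queue:
        node = queue.popleft()
        if node not in stale:
            stale.add(node)
            queue.extend(children.get(node, []))
    return stale


# Hypothetical workflow: out1 feeds both a second transform and a notebook
workflow = {
    "source": ["t1"],
    "t1": ["out1"],
    "out1": ["t2", "notebook"],
    "t2": ["out2"],
}

# A change to t1 affects everything beneath it, but not "source" or t1 itself
stale = downstream_of(workflow, "t1")
```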

#### Edited nodes

If a node has been edited since it was last run, it will be indicated with hashed vertical lines.

## Run all

Frequently you will make changes that affect many downstream nodes.

You can select the **Map** button on the left side of the workflow toolbar to begin a run of all stale nodes in the workflow. This will execute all transform and notebook nodes in a logical sequence to update the workflow completely.
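
A "logical sequence" here can be thought of as a topological order, in which every node runs only after the nodes it depends on. The sketch below illustrates that idea with Python's standard-library `graphlib`; the `run_all_order` helper, the node names, and the `parents` mapping are hypothetical, and this is not a description of Redivis's actual scheduler.

```python
from graphlib import TopologicalSorter


def run_all_order(parents, stale, runnable):
    """List stale transforms and notebooks so that upstream nodes come first."""
    # static_order() yields each node only after all of its upstream nodes
    ordered = TopologicalSorter(parents).static_order()
    return [node for node in ordered if node in stale and node in runnable]


# Hypothetical branch, expressed as node -> set of direct upstream nodes
parents = {
    "t1": {"source"},
    "out1": {"t1"},
    "t2": {"out1"},
    "out2": {"t2"},
}

# Only transforms and notebooks are executed; tables are produced by them
order = run_all_order(
    parents,
    stale={"t1", "out1", "t2", "out2"},
    runnable={"t1", "t2"},
)
```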

## Reproducibility and change management

Workflow edits are continuously saved as you work, and any analyses will continue to run in the background if you close your browser window. You can always navigate back to this workflow later from the "Workflows" tab of your [workspace](https://docs.redivis.com/reference/your-account/workspace).

#### Version control

Every time a transform or notebook is run, a snapshot of the code in that node is permanently saved. On any transform or notebook, you will see a "History" button that will bring up all of the previous executions of that node, with the ability to view its historic contents and revert to a previous version of the code. This historic code will also be associated with the corresponding [log entry](https://docs.redivis.com/your-account/workspace#logs) in your workspace.

While the tables within a workflow should be considered "live" in that their data can regularly change as upstream nodes are modified, the ability to permanently persist code (alongside the built-in [version-control](https://docs.redivis.com/reference/datasets/versions) for datasets) ensures that any historic output can be reproduced by simply re-running the historic code that produced a given output.
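
The snapshot-per-run model can be illustrated with a small data structure. This is a conceptual sketch only; the `NodeHistory` class and its methods are hypothetical and are not part of the Redivis interface or client libraries.

```python
from dataclasses import dataclass, field


@dataclass
class NodeHistory:
    code: str
    snapshots: list = field(default_factory=list)

    def run(self):
        """Save a snapshot of the current code, as happens on each run."""
        self.snapshots.append(self.code)

    def revert(self, run_index):
        """Restore the code from an earlier run."""
        self.code = self.snapshots[run_index]


node = NodeHistory(code="SELECT * FROM source")
node.run()                       # first run saves the original query
node.code = "SELECT id FROM source"
node.run()                       # second run saves the edited query
node.revert(0)                   # restore the code from the first run
```

Reverting changes only the current code; the saved snapshots remain, so either run can still be reproduced later.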

## Forking workflows

You can easily reference a workflow for other analyses. Click the **Fork** button in the toolbar to get started.

* **Add to another workflow**
  * Select this option to choose a workflow you'd like to add this workflow to, as a [data source](https://docs.redivis.com/reference/data-sources#workflows-as-a-data-source). This will be a linked copy that will update as the original workflow is updated. This can be a very powerful tool in building out complex workflows that all reference the same source analysis or cohort.
* **Clone this workflow**
  * This will create a duplicate copy of the workflow, with a link back to the original workflow encoded in its provenance information.
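
The difference between the two options is similar to the difference between a reference and a copy. The sketch below uses plain Python dictionaries to illustrate this under that analogy; the `forked_from` key and the workflow contents are hypothetical and do not correspond to real Redivis fields.

```python
import copy

# Hypothetical stand-in for a workflow
original = {"name": "cohort", "tables": ["patients"]}

# "Add to another workflow": a linked reference that follows the original
linked = original

# "Clone this workflow": an independent copy that records where it came from
clone = copy.deepcopy(original)
clone["forked_from"] = original["name"]

# A later update to the original shows up in the link but not in the clone
original["tables"].append("visits")
```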
