Core concepts

Overview

It might be useful to think of a project as a large folder. It contains datasets along with queries and their output tables. In a Redivis project, these entities are visually arranged to make it easy to see how your queries and tables are related to each other, and to make it easier to make changes that affect your whole project.

The left side of the screen shows all entities that currently exist in your project. By default they are laid out as a tree of connected nodes, which makes the connections between tables easier to see. You can also switch this view to a list. If you created this project from a dataset, you'll start with a single rectangle with the dataset name next to it.

Node types

Each shape, or node, on the project tree represents a different entity in your project.

Dataset nodes are direct copies of a dataset page.

Transform nodes are queries that don't contain any data themselves.

Table nodes are either dataset tables, or the resulting output of a transform that has been run.

If you ever get lost, you can click the search icon in the top of the left panel and input the name of a node to jump to it.

Datasets

A dataset node is a copy of a dataset in a project.

Dataset nodes display a list of the tables they contain. You can click on any table to view its contents, or click "Query" to build a transform on it.

Samples

Some large datasets have 1% samples which are useful for quickly testing querying strategies before running transforms against the full dataset.

If a 1% sample is available for a dataset, it is added to your project by default instead of the full sample. Samples are indicated by the dark circle icon at the top left of a dataset node in the left panel and in the list of the dataset's tables.

All sampled tables in the same dataset will be sampled on the same variable with the same group of values (so joining two tables in the same dataset with 1% samples will still result in a 1% sample).
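To see why sampling every table on the same variable keeps joins consistent, here is a minimal sketch in Python. It assumes a deterministic, hash-based rule for picking the sampled values; that rule, the `in_sample` and `join` helpers, and the `patient_id` variable are all hypothetical and are not a description of how Redivis builds its samples.

```python
import hashlib

def in_sample(value, percent=1):
    """Deterministically decide whether a value belongs to the sample (hypothetical rule)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent

def join(left, right, key):
    """Inner join two lists of row dicts on a shared key."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

# Two hypothetical tables in the same dataset, both carrying "patient_id".
patients = [{"patient_id": i, "age": 20 + i % 60} for i in range(10_000)]
visits = [{"patient_id": n % 10_000, "cost": n} for n in range(20_000)]

# Sample each table on the same variable with the same rule.
patients_sample = [r for r in patients if in_sample(r["patient_id"])]
visits_sample = [r for r in visits if in_sample(r["patient_id"])]

# Joining the two samples gives the same rows as sampling the full join.
joined_samples = join(patients_sample, visits_sample, "patient_id")
sample_of_join = [
    r for r in join(patients, visits, "patient_id") if in_sample(r["patient_id"])
]
```

Because both tables keep exactly the same set of `patient_id` values, no matching rows are lost in the join. If each table were sampled independently, most matches would disappear.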

To switch to the full sample, click the "Sample" button in the top right of the menu bar when you have a dataset selected.

Your downstream transforms and tables will become stale, since an upstream change has been made. You can run these nodes individually to update their contents, or use the run all functionality by clicking on the project's name in the top menu bar.

Versions

When a new version of a dataset is released by an administrator, the corresponding dataset node on your project minimap will become purple. To upgrade the dataset's version, click the "Version" button in the top right of the menu bar when you have a dataset selected.

You can select whichever version you want to use here, or view the full version history on the dataset page.

After updating, your downstream transforms and tables will become stale. You can run these nodes individually to update their contents, or use the run all functionality by clicking on the project's name in the top menu bar.

Tables

A dataset table is shown linked to the dataset when you begin working with it.

An output table is automatically created when you create a transform node. Running the transform generates the data in this output table.

All table nodes have one upstream parent. You can view the table's data and metadata as you would for other tables on Redivis. You cannot edit or update the metadata here.

Many transforms can operate on a table, allowing you to create various branches within your project. To do this you can select the table or dataset and click the small plus icon that appears under the bottom right corner of the node.

Now you have two side-by-side transforms, each with its own output table. These will run independently of each other.

Sanity check output

After you run a transform, you can investigate the downstream output table to get feedback on the success and validity of your querying operation – both the filtering criteria you've applied and the new features you've created.

Looking at the contents of an output table allows you to perform important sanity checks at each step of your research process, answering questions like:

  • Did my filtering criteria remove the rows I expected?

  • Do my new variables contain the information I expect?

  • Does the distribution of values in a given variable make sense?

  • Have I dropped unnecessary variables?

The best ways to sanity check a table node are to inspect the general table characteristics, check the summary statistics of different variables, or look at the cells.
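The questions above can also be expressed as small programmatic checks. The following sketch uses plain Python on a hypothetical output table held as a list of row dictionaries; the variable names (`age`, `age_group`), the filter (adults only), and the `summarize` helper are assumptions made for illustration and are not part of the Redivis interface.

```python
from collections import Counter

# Hypothetical output table: filtered to adults, with a new "age_group" variable.
rows = [
    {"id": 1, "age": 34, "age_group": "18-39"},
    {"id": 2, "age": 52, "age_group": "40-64"},
    {"id": 3, "age": 71, "age_group": "65+"},
    {"id": 4, "age": 19, "age_group": "18-39"},
]

def summarize(rows, variable):
    """Basic summary statistics for one variable, ignoring missing values."""
    values = [r[variable] for r in rows if r[variable] is not None]
    return {
        "count": len(values),
        "distinct": len(set(values)),
        "min": min(values),
        "max": max(values),
    }

# Did my filtering criteria remove the rows I expected?
filter_ok = all(r["age"] >= 18 for r in rows)

# Do my new variables contain the information I expect?
group_counts = Counter(r["age_group"] for r in rows)

# Does the distribution of values in a given variable make sense?
age_summary = summarize(rows, "age")

# Have I dropped unnecessary variables?
variables = set(rows[0])
```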

Transforms

Transforms are at the core of every project, allowing for comprehensive data merges and transformation. Learn more about building transforms in the Transform documentation.

To copy a transform, right click the transform and select Copy transform.

This will copy the transform, including all parameters specified in the detail view, and allows you to insert it somewhere else in your project tree, to re-use querying logic. Note that tables cannot be copied alone; copying a transform node will copy the transform and its downstream table.

You can also insert a transform between two tables by right clicking on another transform.

Node layout

The project tree automatically creates a grid layout of all the nodes in your project, helping to keep it organized as your project grows.

Sometimes, you may wish to reorganize certain nodes in your project. To shift a dataset node, hover and click the arrow to the side of the node.

This will move the dataset to the left (or right), and reorganize your tree according to the new horizontal order of datasets at the top of your project tree. Note that shifting nodes is purely an organizational tool; it has no effect on the data produced in the project.

Node states

Empty

Display: White background

A table or transform node will be empty if it has never been run.

Executed

Display: Grey background

A table or transform will be grey when it has previously been run and has not since been edited or had anything change upstream.

Invalid

Display: Black exclamation icon

A transform will be invalid when it is unable to be run. This might be because you haven't finished building the steps, or because something changed upstream which made its current configuration impossible to execute again.

Errored

Display: Red exclamation icon

A transform will be errored when you run it and the run can't be completed. This might be due to an incorrect input you've set that our validator can't catch, or something might have gone wrong during execution, in which case you'll just need to rerun it.

Edited

Display: Yellow background with diagonal hash lines

A transform will be edited when you revisit a successfully run transform and change a parameter. You can either Run this transform or Revert to its previously run state to resolve it. Editing a transform makes any downstream nodes stale.

Stale

Display: Yellow background

A transform or table will be stale when an upstream change has been made. For tables immediately downstream from an edited node, this means that the data contents might no longer be the results of the previous transform.

To resolve a stale node, you'll need to revert the upstream edited node or run it again.
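The way staleness spreads through a project can be pictured as a walk over the project tree. The sketch below is only an illustration: the node names, the `downstream` mapping, and the `mark_stale` helper are hypothetical and do not correspond to any Redivis API.

```python
from collections import deque

# Hypothetical project tree: each node maps to its direct downstream nodes.
downstream = {
    "dataset": ["transform_a", "transform_b"],
    "transform_a": ["table_a"],
    "table_a": ["transform_c"],
    "transform_c": ["table_c"],
    "table_c": [],
    "transform_b": ["table_b"],
    "table_b": [],
}

def mark_stale(edited_node, downstream):
    """Return every node downstream of edited_node (breadth-first walk)."""
    stale = set()
    queue = deque(downstream.get(edited_node, []))
    while queue:
        node = queue.popleft()
        if node not in stale:
            stale.add(node)
            queue.extend(downstream.get(node, []))
    return stale

# Start with everything executed, then edit one transform.
states = {node: "executed" for node in downstream}
states["transform_a"] = "edited"
for node in mark_stale("transform_a", downstream):
    states[node] = "stale"
```

In this example, `table_a`, `transform_c`, and `table_c` become stale, while the independent branch through `transform_b` is unaffected.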

Running and queued

Display: Double arrows rotating

Transforms have this icon when the node is currently being run (if the icon is spinning) or it is queued to run after upstream nodes have finished running (icon isn't moving).

You can cancel queued and running nodes individually, or cancel them all at once by clicking the Run menu in the top bar and selecting Cancel all. If a node is currently running, it might not be possible to cancel it, depending on what point in the process it's at.

Limited access

Display: All black, or partially black background

For all nodes, this means that you don't have full access to the node. Click on these nodes and then the Incomplete access button to begin applying for access to the relevant datasets.

Sampled

Display: Black circle with 1% icon

For datasets this means that you are using a 1% sample of the data. When a dataset has a sample, it will automatically default to it when added to a project. You can change this to the full sample and back at any time in the dataset node.

Outdated version

Display: Purple background

For datasets this means that you are not using the latest version. This might mean that you have switched to using an older version intentionally, or it might mean that this dataset's administrator has released a new version that you can upgrade to.

Working in bulk

At any point you might realize that you need to change a parameter of a query that will affect many downstream tables. This will make these tables stale and you'll see their color turn to yellow on the map.

After finishing your updates you can run each transform individually to propagate changes or you can use the Run button in the top menu to run many nodes in sequence. This menu gives you the option to run all stale nodes, or all downstream or upstream nodes (from the node you have selected).
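Running many nodes "in sequence" means each transform runs only after everything upstream of it is up to date. A minimal sketch of that ordering, using Python's standard `graphlib` module (Python 3.9+), is shown below. The node names, the `upstream` mapping, the `run_order` helper, and the choice to treat both edited and stale transforms as needing a run are assumptions for illustration, not a description of how Redivis schedules work internally.

```python
from graphlib import TopologicalSorter

# Hypothetical project: each node maps to the set of nodes directly upstream.
upstream = {
    "transform_a": {"dataset"},
    "table_a": {"transform_a"},
    "transform_b": {"dataset"},
    "table_b": {"transform_b"},
    "transform_c": {"table_a", "table_b"},
    "table_c": {"transform_c"},
}

# Hypothetical node states after editing transform_a.
states = {
    "dataset": "executed",
    "transform_a": "edited",
    "table_a": "stale",
    "transform_b": "executed",
    "table_b": "executed",
    "transform_c": "stale",
    "table_c": "stale",
}

def run_order(upstream, states):
    """List transforms that need a run, ordered so upstream ones come first."""
    order = TopologicalSorter(upstream).static_order()
    return [
        node
        for node in order
        if node.startswith("transform") and states[node] in ("edited", "stale")
    ]

order = run_order(upstream, states)
```

Here `transform_a` is placed before `transform_c`, and `transform_b` is skipped because nothing upstream of it has changed.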

Deleting nodes

To delete a node, right click on a dataset or transform node and select Delete.

When deleting a transform, the transform and its output table will both be deleted, because every transform must have an output table to record its results. If the project tree has additional nodes downstream, the transform and output table will be 'spliced' out, i.e. the upstream node nearest the deleted transform will be connected to the downstream node nearest to the deleted output table. Note that this deletion will cause the next downstream transform to receive new input variables from the node that's directly upstream. (In the above example, deleting the selected transform will result in the 'Optum SES Inpatient Confinement' dataset being connected directly to the remaining transform, which will change the variables available to work with in that transform.)
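The splicing behavior can be sketched on a simplified tree in which every node has a single upstream parent. The `parent` mapping, the node names, and the `delete_transform` helper below are hypothetical; real projects can also contain transforms with several inputs, which this sketch does not handle.

```python
def delete_transform(parent, transform):
    """Remove a transform and its output table, reconnecting downstream nodes.

    `parent` maps each node to its single upstream node (None for a dataset).
    Returns a new mapping and leaves the original untouched.
    """
    parent = dict(parent)  # work on a copy
    # The output table is the node whose upstream parent is the transform.
    output_table = next(n for n, p in parent.items() if p == transform)
    upstream_node = parent[transform]
    # Reconnect anything that read from the output table to the upstream node.
    for node, p in parent.items():
        if p == output_table:
            parent[node] = upstream_node
    del parent[transform]
    del parent[output_table]
    return parent

# Hypothetical chain: dataset -> transform_1 -> table_1 -> transform_2 -> table_2
project = {
    "dataset": None,
    "transform_1": "dataset",
    "table_1": "transform_1",
    "transform_2": "table_1",
    "table_2": "transform_2",
}

after = delete_transform(project, "transform_1")
```

After the deletion, `transform_2` reads directly from `dataset`, which is why the variables available to it can change.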

When deleting a dataset or dataset table, the dataset and all downstream nodes will be deleted. If additional branches are joined into the branch downstream of the deleted dataset, those branches will be retained up to but not including the transform located in the deleted branch.

Since you can't undo a deletion, you'll receive a warning message before proceeding.

As you make changes in a project you will change the status of different nodes connected to it. These changes in status are shown on the minimap to help you keep track of your project.