It might be useful to think of a project as a large folder. It contains datasets, alongside transforms – which query those dataset tables – and their output tables, and any notebooks used to analyze your outputs. In a Redivis project, these entities are visually arranged to make it easy to see how your transforms, tables, and notebooks are related to each other, and to make it easier to make changes that affect your whole project.
The left half of the screen shows all entities that currently exist in your project. By default these are laid out as a tree of connected nodes, which makes the connections between tables easier to understand. You can also switch this view to a list. If you created this project from a dataset, the tree will start with a single rectangle with the dataset name next to it.
Each shape, or node, on the project tree represents a different entity in your project.
Transform nodes are queries used to shape data by creating new tables; they don't contain any data themselves.
If you ever get lost, you can use the Search button on the left of the black menu bar and input the name of a node to jump to it.
You can right click on any node to open its action menu.
Dataset nodes display a list of the tables they contain. You can click on any table to view its contents, or click "Query" to build a transform on it.
Some large datasets have 1% samples, which are useful for quickly testing querying strategies before running transforms against the full dataset.
If a 1% sample is available for a dataset, it will automatically be added to your project by default instead of the full sample. Samples are indicated by the dark circle icon to the top left of a dataset node in the left panel and in the list of the dataset's tables.
All sampled tables in the same dataset will be sampled on the same variable with the same group of values (so joining two tables in the same dataset with 1% samples will still result in a 1% sample).
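To see why sampling every table on the same variable keeps joins consistent, consider the minimal sketch below. It is an illustration only, not a description of how Redivis builds its samples: the hash-based rule, the `in_sample` helper, and the `patients` and `visits` tables are all hypothetical.

```python
import hashlib

def in_sample(value, percent=1):
    """Hypothetical rule: keep a value if its hash falls in the lowest `percent` buckets."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent

# Two made-up tables in the same dataset that share the variable patient_id.
patients = [{"patient_id": i, "age": 20 + i % 60} for i in range(10_000)]
visits = [{"patient_id": i % 10_000, "visit_id": i} for i in range(30_000)]

# Both tables are sampled on the same variable with the same rule.
sampled_patients = [row for row in patients if in_sample(row["patient_id"])]
sampled_visits = [row for row in visits if in_sample(row["patient_id"])]

# Joining the two samples keeps exactly the sampled patients: no sampled
# patient loses its visits, and no visit refers to an unsampled patient.
sampled_ids = {row["patient_id"] for row in sampled_patients}
joined_ids = {row["patient_id"] for row in sampled_visits if row["patient_id"] in sampled_ids}
```

Because the keep-or-drop decision depends only on the value of the shared variable, the same group of values survives in every table, which is what allows the joined result to remain a sample of the same entities.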
To switch to the full sample, click the "Sample" button in the top right of the menu bar when you have a dataset selected.
Your downstream transforms and tables will become stale, since an upstream change has been made. You can run these nodes individually to update their contents, or use the run all functionality by clicking on the project's name in the top menu bar.
When a new version of a dataset is released by an administrator, the corresponding dataset node on your project minimap will become purple. To upgrade the dataset's version, click the "Version" button in the top right of the menu bar when you have a dataset selected.
You can view version diffs and select whichever version you want to use here.
After updating, your downstream transforms and tables will become stale. You can run these nodes individually to update their contents, or use the run all functionality by clicking on the project's name in the top menu bar.
A dataset table refers to a single table (a unique set of rows and columns) that was uploaded to the dataset by the owner. These tables are shown directly underneath the dataset when you create a transform or notebook from that table.
An output table is automatically created when you create a transform node. Running the transform generates the data in this output table.
All table nodes have one upstream parent. You can view the table's data and metadata similarly to other tables on Redivis. You cannot edit or update the metadata here.
You can create multiple transforms to operate on a single table, which allows you to create various branches within your project. To create a new transform, select the table or dataset and click the small + icon that appears under the bottom right corner of the node.
After you run a transform, you can investigate the downstream output table to get feedback on the success and validity of your querying operation – both the filtering criteria you've applied and the new features you've created.
Understanding the content of an output table allows you to perform important sanity checks at each step of your research process, answering questions like:
- Did my filtering criteria remove the rows I expected?
- Do my new variables contain the information I expect?
- Does the distribution of values in a given variable make sense?
- Have I dropped unnecessary variables?
To sanity check the contents of a table node, you can inspect the general table characteristics, check the summary statistics of different variables, look at the table's cells, or create a notebook for more in-depth analysis.
Transforms are at the core of every project, allowing for comprehensive data merges and transformations. Learn more about building transforms in the Transform documentation.
Create a new transform by clicking the + button beneath any table, or the Transform button at the top right of a table's detail view. Transforms can only reference tables that are present in this project.
To copy a transform, right click the transform and select Copy transform. This will copy the transform, including all parameters specified in the detail view, and allows you to insert it somewhere else in your project tree, to re-use querying logic. Note that tables cannot be copied alone; copying a transform node will copy the transform and its downstream table.
You can also insert a transform between two tables by right-clicking on another transform.
Notebook nodes allow you to work with data in a Jupyter notebook interface, taking advantage of the open-source community and scientific computing toolkit available in Python, R, Stata, or SAS. Learn more about using notebooks in the Notebooks documentation.
Create a notebook by clicking the + button beneath any table. Notebooks can only reference tables that are present in this project.
To copy a notebook, right click and select Copy notebook. You can paste the copied notebook by right-clicking on the background of the project's tree view to the left, and selecting Paste copied notebook.
The project tree automatically creates a grid layout of all the nodes in your project, helping to keep it organized as your project grows.
Sometimes, you may wish to reorganize certain nodes in your project. To shift a dataset or notebook node, hover and click the arrow to the side of the node.
This will move the node to the right or left, and reorganize your tree according to the new horizontal order of datasets at the top of your project tree (or notebooks at the bottom of your project tree). Note that shifting nodes is purely an organizational tool; it has no effect on the data produced in the project.
Display: White background
A transform node will be white if it has never been run.
A notebook or table node will be white if it contains no data.
Display: Grey background
A transform will be grey when it has previously been run and has not since been edited or had anything change upstream.
A notebook or table node will be grey if it contains data and no upstream transforms have been edited (if there was an upstream change, everything downstream would be stale).
Display: Black exclamation icon
A transform will be invalid when it is unable to be run. This might be because you haven't finished building the steps, or because something changed upstream which made its current configuration impossible to execute again.
Display: Red exclamation icon
A transform will be errored when you run it and the run can't be completed. This might be due to an incorrect input you've set that our validator can't catch, or something might have gone wrong while executing, in which case you'll just need to rerun it.
Display: Yellow background with diagonal hash lines
A transform will be edited when you revisit a successfully run transform and change a parameter. You can either Run this transform or Revert to its previously run state to resolve it. Editing a transform makes any downstream nodes stale.
Display: Yellow background
A transform, table, or notebook will be stale when an upstream change has been made. For tables and notebooks immediately downstream from an edited node, this means that the data contents might no longer be the results of the previous transform.
You'll need to re-run any edited upstream transforms to propagate new data into downstream tables and nodes, or revert an upstream edited node to return to the previously executed state.
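The way staleness spreads through a project can be pictured as a walk over the tree. The sketch below is an illustration under assumed names, not Redivis's implementation: the `downstream` mapping, the node names, and the `mark_stale` helper are hypothetical.

```python
from collections import deque

# Hypothetical project tree: each node maps to its direct downstream nodes.
downstream = {
    "dataset": ["transform_a"],
    "transform_a": ["table_a"],
    "table_a": ["transform_b", "notebook"],
    "transform_b": ["table_b"],
    "table_b": [],
    "notebook": [],
}

def mark_stale(edited, downstream):
    """Return every node downstream of the edited node (breadth-first walk)."""
    stale = set()
    queue = deque(downstream[edited])
    while queue:
        node = queue.popleft()
        if node not in stale:
            stale.add(node)
            queue.extend(downstream[node])
    return stale

# Editing transform_a makes everything below it stale, but not the dataset above it.
stale_nodes = mark_stale("transform_a", downstream)
```

In this model the edited transform itself is not in the returned set; it would show the edited state, while the nodes below it show the stale state.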
Display: Double arrows rotating
Transforms have this icon when the node is currently being run (if the icon is spinning) or it is queued to run after upstream nodes have finished running (icon isn't moving).
You can cancel queued and running transforms individually, or cancel them all by clicking the Run menu in the top bar and selecting Cancel all. If a node is currently running, it might not be possible to cancel it, depending on what point in the process it has reached.
Display: All black background, or dashed borders
For all nodes, this means that you don't have full access to the node. Click on these nodes and then the Incomplete access button to begin applying for access to the relevant datasets.
Display: Black circle with 1% icon
For datasets this means that you are using a 1% sample of the data. When a dataset has a sample, it will automatically default to it when added to a project. You can change this to the full sample and back at any time in the dataset node.
Display: Purple background
For datasets, this means that a new version of the dataset has been released. You can upgrade by clicking the "Version" button in the top right of the menu bar when you have the dataset selected.
At any point you might realize that you need to change a parameter of a query that will affect many downstream tables. This will make these tables stale and you'll see their color turn to yellow on the map.
After finishing your updates you can run each transform individually to propagate changes or you can use the Run button in the top menu to run many nodes in sequence. This menu gives you the option to run all stale nodes, or all downstream or upstream nodes (from the node you have selected).
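Running many nodes in sequence implies that each transform runs only after the transforms upstream of it. One way to picture such an ordering is a topological sort, sketched below with Python's standard `graphlib` module. This is an assumption about the general idea, not Redivis's scheduler: the `upstream` mapping, the transform names, and the `run_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical project: each transform maps to the transforms it depends on.
upstream = {
    "transform_a": [],
    "transform_b": ["transform_a"],
    "transform_c": ["transform_a"],
    "transform_d": ["transform_b", "transform_c"],
}

# Suppose transform_a is up to date and the other three are stale.
stale = {"transform_b", "transform_c", "transform_d"}

def run_order(upstream, stale):
    """List the stale transforms so that each comes after its upstream transforms."""
    ordered = TopologicalSorter(upstream).static_order()
    return [node for node in ordered if node in stale]

order = run_order(upstream, stale)
```

Here `transform_d` always comes after both `transform_b` and `transform_c`, which mirrors a queued node waiting for its upstream nodes to finish.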
To delete a node, right click on a dataset or transform node and select Delete.
When deleting a transform, the transform and output table will be deleted; every transform must have an output table to record the results of that transform. If the project tree has additional nodes downstream, the transform and output table will be 'spliced' out, i.e. the upstream node nearest the deleted transform will be connected to the downstream node nearest to the deleted output table. Note that this deletion will cause the next downstream transform to receive new input variables from the node that's directly upstream. (In the above example, deleting the selected transform will result in the 'Optum SES Inpatient Confinement' dataset being connected directly to the remaining transform, which will change the variables available to work with in that transform.)
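The splicing behavior can be sketched on a simple linear branch. The code below is an illustration only: the `parent` mapping, the node names, and the `splice_out` helper are hypothetical, and it assumes each node has a single upstream parent, which ignores transforms that join several tables.

```python
# Hypothetical linear branch: each node maps to its single upstream parent.
parent = {
    "dataset": None,
    "transform_1": "dataset",
    "table_1": "transform_1",
    "transform_2": "table_1",
    "table_2": "transform_2",
}

def splice_out(parent, transform):
    """Remove a transform and its output table, reconnecting the nodes around them."""
    # The output table is the node whose parent is the deleted transform.
    output_table = next(node for node, up in parent.items() if up == transform)
    new_parent = parent[transform]
    spliced = {}
    for node, up in parent.items():
        if node in (transform, output_table):
            continue  # both nodes are deleted together
        # Anything that read from the deleted output table now reads from
        # the node that was upstream of the deleted transform.
        spliced[node] = new_parent if up == output_table else up
    return spliced

after = splice_out(parent, "transform_1")
```

After the splice, `transform_2` reads directly from `dataset`, so the variables available to it are those of the dataset rather than those of the deleted output table.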
When deleting a dataset or dataset table, the dataset and all downstream nodes will be deleted. If additional branches are joined into the branch downstream of the deleted dataset, those branches will be retained up to but not including the transform located in the deleted branch.
Since you can't undo a deletion, you'll receive a warning message before proceeding.
As you make changes in a project you will change the status of different nodes connected to it. These changes in status are shown in the left panel of the project to help you keep track of any changes.