Project concepts
You can use projects on Redivis to analyze any type of data, at any scale. Projects allow you to organize your work into discrete steps, where you can easily validate your results and develop well-documented, reproducible workflows.
To create a project, navigate to a dataset that you're interested in working with and press the Analyze data in a project button. If you do not have data access to this dataset you may need to Apply for access first.
You can also create a project from the "Projects" tab of your workspace. Once you've created your project, you'll be able to add any dataset that you have access to.
In a Redivis project, work is represented visually on the left side of the screen in a tree layout, where you can build nodes and easily see their relationships. Each node type supports different actions. The project tree automatically expands and creates a layout of all the nodes in your project, keeping everything organized as it grows.
Clicking on a node will allow you to see its information and/or execute changes. Your selected node will have a purple border in the tree and the right panel will update to show that node's contents. You can expand or collapse this right panel by dragging the center pane side to side.
Most nodes in a project will be connected to others, and these connections are drawn as lines on the project tree. Thicker lines usually indicate a source node or a harder connection, while thin lines will show a join or more casual linkage.
When you click on a node to select it, the lines connecting upstream and downstream nodes will turn purple. For larger projects this can make it easier to trace where the data in your selected node came from, or see what its eventual outputs are.
Nodes that have a direct source will have a large solid line connecting them.
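The upstream and downstream highlighting described above can be thought of as a graph traversal from the selected node. The following is a minimal sketch of that idea only, not Redivis's implementation; the node names and the `EDGES` mapping are hypothetical and exist purely for illustration.

```python
from collections import deque

# Hypothetical project tree: each node maps to its direct downstream nodes.
EDGES = {
    "dataset": ["table_a", "table_b"],
    "table_a": ["transform_1"],
    "table_b": ["transform_1"],
    "transform_1": ["output_1"],
    "output_1": ["notebook_1"],
    "notebook_1": [],
}


def reachable(start, edges):
    """Return every node reachable from `start` by following `edges` (breadth-first)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(edges):
    """Flip every edge so that downstream links become upstream links."""
    flipped = {node: [] for node in edges}
    for src, targets in edges.items():
        for dst in targets:
            flipped.setdefault(dst, []).append(src)
    return flipped


def trace(selected, edges):
    """Collect the nodes whose connecting lines would be highlighted for `selected`."""
    return {
        "upstream": reachable(selected, invert(edges)),
        "downstream": reachable(selected, edges),
    }
```

Calling `trace("transform_1", EDGES)` in this toy tree returns the dataset and both tables as upstream nodes, and the output table and notebook as downstream nodes.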
Dataset nodes represent datasets that have been added to your project.
Table nodes are either tables within a dataset, or the resulting output table of an upstream computation.
Transform nodes are queries that are used to reshape data by creating new tables.
Notebook nodes are code blocks (and their outputs) which are used to analyze data.
The main way to build your project is to add and customize nodes. You will start by adding data to your project, then add ways to cut, reshape, and analyze it.
You can click the Add dataset button in the top left corner of the project toolbar to select a dataset you want to add to this project. This will add a copy of this dataset to the project as a dataset node and allow you to reference its tables.
Each dataset can only be added to a project one time. You may not be able to add a dataset to the project if it has no tables or it already exists in the project.
Add data editing nodes
All data cutting, reshaping, and analysis on Redivis happens in either a transform or a notebook node. These nodes must be attached to a source table, so they can only be created after you've added a dataset.
To create a work node, click on a table and select either the transform or notebook button that appears beneath it. If the table already has a downstream node you can press the plus icon beneath it instead.
There are two mechanisms for working with data in projects: transforms and notebooks. Understanding when to use each tool is key to taking full advantage of the capabilities of Redivis, particularly when working with big datasets.
Transforms are better for:
Reshaping + combining tabular and geospatial data
Working with large tables, especially at the many GB to TB scale
Preference for a no-code interface, or preference for programming in SQL
Declarative, easily documented data operations
Notebooks are better for:
Interactive exploration of any data type, including unstructured data files
Working with smaller tables (though working with bigger data is possible)
Preference for Python, R, Stata, or SAS
Interactive visualizations and figure generation
You can right click on any transform or notebook node in the project tree to copy it. Right click on any table node to paste the copied transform or notebook. The table you pasted it onto will become this node's source table. You may need to update the node's configuration to match its new location for it to work properly.
If you would like to place a node between other nodes you can click on a transform or notebook and select Insert transform.
If you have a transform copied to the clipboard you can insert it between other nodes by right clicking on a transform or notebook and selecting Paste copied transform above. This will insert both the transform and its output table into the branch of the project you've selected.
All transforms can be split at the step level into two different transforms by clicking Split in any step's menu. Additionally, two transforms can be combined into one by right clicking on a table to Remove it.
You might want to split a transform above a tricky step to see what the output table would look like at that point in the process. This can be a key tool in troubleshooting any issues and understanding what might be going wrong.
After splitting a transform to check an output table, the next logical step might be to combine these two transforms back into one again. Or perhaps you have a string of transforms which you no longer need the output tables for and want to reduce the size of your project.
Only nodes that you have intentionally created can be deleted or removed from the project. Output tables are created by running code and can't be removed.
To delete a node, right click on the node and select Delete.
When deleting a transform or notebook:
The transform or notebook and its output table will be deleted.
If the project tree has additional nodes downstream, the transform or notebook and its output table will be 'spliced' out, i.e. the upstream node nearest the deleted transform will be connected to the downstream node nearest to the deleted output table. After this deletion, the next downstream transform will receive new input variables from the node that is now directly upstream. This will probably cause it to become stale, and possibly invalid, since variables it previously referenced might no longer exist.
When deleting a dataset:
The dataset and all downstream nodes will be deleted. If additional branches are joined into the branch downstream of the deleted dataset, those branches will be retained up to but not including the transform located in the deleted branch.
Since you can't undo a deletion, you'll receive a warning message before proceeding.
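The 'splicing' behavior described above for deleting a transform or notebook can be illustrated with a small graph operation. This is only a sketch under stated assumptions: the project tree is modeled as a plain dictionary of downstream links, and `splice_out` and all node names are hypothetical, not part of any Redivis API.

```python
def splice_out(edges, transform, output_table):
    """Remove `transform` and its `output_table`, reconnecting their neighbours.

    `edges` maps each node to the list of its direct downstream nodes.
    Returns a new mapping; the input mapping is left unchanged.
    """
    downstream_of_output = edges.get(output_table, [])
    new_edges = {}
    for node, targets in edges.items():
        if node in (transform, output_table):
            continue  # drop the deleted pair from the tree
        rewired = []
        for target in targets:
            if target == transform:
                # The upstream node now feeds whatever the deleted output table fed.
                rewired.extend(t for t in downstream_of_output if t not in rewired)
            elif target not in rewired:
                rewired.append(target)
        new_edges[node] = rewired
    return new_edges


# Hypothetical branch: source_table -> transform_1 -> output_1 -> transform_2 -> output_2
tree = {
    "source_table": ["transform_1"],
    "transform_1": ["output_1"],
    "output_1": ["transform_2"],
    "transform_2": ["output_2"],
    "output_2": [],
}
spliced = splice_out(tree, "transform_1", "output_1")
```

In this toy branch, `source_table` ends up connected directly to `transform_2`, which is why the downstream transform may see different input variables than before.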
As you make changes in a project you will change the status of different nodes connected to it. These changes in status are shown in the left panel of the project to help you keep track of any changes.
As you work in a project, node colors and symbols will change on the tree view to help you keep track of your work progress. More detailed information about each of these states can be found on each node page.
These are the most important project-wide concepts to be aware of. Projects are designed to be iterative and allow you to see outputs and go back to make changes and rerun for new results.
To help you keep track of edits, transforms that have unrun edits are shown differently on the map than other transforms, with grey diagonal marks.
Any time data is changed, nodes that referenced that data will become stale (yellow background) until they are rerun on the new inputs.
You can select the Map button on the left side of the project toolbar to begin a run of all stale nodes in the project. This will execute all transform and notebook nodes in a logical sequence to update the project completely.
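The idea of nodes becoming stale and then being rerun "in a logical sequence" resembles dependency tracking followed by a topological sort. The sketch below illustrates that general technique with Python's standard `graphlib` module (Python 3.9+). It is an assumption about the concept, not a description of how Redivis schedules runs, and `DEPENDS_ON`, `needs_rerun`, and `run_order` are hypothetical names.

```python
from graphlib import TopologicalSorter

# Hypothetical project: each runnable node maps to the runnable nodes it reads from.
DEPENDS_ON = {
    "transform_1": [],
    "transform_2": ["transform_1"],
    "notebook_1": ["transform_2"],
    "transform_3": [],
}


def needs_rerun(changed, depends_on):
    """Return `changed` plus every node that directly or indirectly reads from it."""
    pending = {changed}
    grew = True
    while grew:
        grew = False
        for node, sources in depends_on.items():
            if node not in pending and any(s in pending for s in sources):
                pending.add(node)
                grew = True
    return pending


def run_order(pending, depends_on):
    """Order the pending nodes so each one runs after the nodes it depends on."""
    full_order = TopologicalSorter(depends_on).static_order()
    return [node for node in full_order if node in pending]


pending = needs_rerun("transform_1", DEPENDS_ON)
order = run_order(pending, DEPENDS_ON)
```

Here an edit to `transform_1` leaves `transform_3` untouched, while `transform_2` and `notebook_1` are queued to run after it, in that order.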
To shift a node, select the node and click the arrow that appears next to most nodes when selected. Shifting nodes is purely an organizational tool and it has no effect on the data produced in the project.
Along with clicking a node to select it, all nodes have Source and Output navigation buttons on the right side of the project toolbar. You can click these buttons to jump directly to the immediate upstream or downstream node.
You can duplicate any project that you have view access to by selecting Fork on the right side of the black menu bar with the project overview selected. This duplicate copy will have a link back to the original project on the overview page.
It can be helpful to work from an exact duplicate of a project if you're viewing a colleague's project and would like to test a similar data manipulation, or if you'll rely on similar data manipulations in a new research effort.
Projects are continuously saved as you work, and any analyses will continue to run in the background if you close your browser window. You can always navigate back to this project later from the "Projects" tab of your workspace.
A complete version history is available for all transforms and notebooks in your project, allowing you to access historic code and revert your work back to a previous point in time.
To view and update additional information about your project, click its name in the middle of the black menu bar or click on the white space in the background of the project tree. This will bring up the project overview where you can edit this project.
Rename: Rename your project by clicking the header at the top of the project information view.
Study: You can add your project to a study in order to facilitate collaboration with others. For certain restricted datasets, your project will need to be part of an approved study in order to run queries.
Sharing: You can give other users access to view or edit your project, or transfer ownership to another user in the Sharing section. You can also set this project's visibility and discoverability. Anyone viewing your project will need to have gained data access to any restricted datasets to view the relevant node contents.
Project description: You can add a description of your project to document the details of your research aim or data manipulation strategy.
Metrics: You can see the date of last project activity, where your project was forked from, and how many others have forked your project in the description section.