Table nodes
Overview
Table nodes in your project are either generated by running a transform, produced by notebook code that outputs a table, or included as part of a dataset you've added. Their main purpose is to store data that you can sanity check and query further.
Derivative tables in a project behave quite similarly to dataset tables throughout Redivis, where you can preview cells, view summary statistics, and run quick queries.
All table nodes also have an associated export interface that shows the different ways you can use this data outside the Redivis platform.
Usage
All table nodes have one upstream parent. You can view the table's data and metadata similarly to other tables on Redivis. You cannot edit or update the metadata here.
You can create multiple transforms to operate on a single table, which allows you to create various branches within your project. To create a new transform, select the table or dataset and click the small + icon that appears under the bottom right corner of the node.
Sanity check output
After you run a transform, you can investigate the downstream output table to get feedback on the success and validity of your querying operation – both the filtering criteria you've applied and the new features you've created.
Understanding the content of an output table allows you to perform important sanity checks at each step of your research process, answering questions like:
Did my filtering criteria remove the rows I expected?
Do my new variables contain the information I expect?
Does the distribution of values in a given variable make sense?
Have I dropped unnecessary variables?
To sanity check the contents of a table node, you can inspect the general table characteristics, check the summary statistics of different variables, look at the table's cells, or create a notebook for more in-depth analysis.
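The questions above can also be checked programmatically in a notebook. The following is a minimal sketch rather than a Redivis API example: it uses plain Python lists of dictionaries as stand-ins for an upstream table and its output table, and the names `upstream_rows`, `output_rows`, and `sanity_check` are hypothetical.

```python
from collections import Counter

# Hypothetical stand-ins for an upstream table and the output of a transform
# that kept rows with age >= 18, dropped "notes", and added "age_group".
upstream_rows = [
    {"id": 1, "age": 17, "notes": "a"},
    {"id": 2, "age": 34, "notes": "b"},
    {"id": 3, "age": 71, "notes": "c"},
]
output_rows = [
    {"id": 2, "age": 34, "age_group": "adult"},
    {"id": 3, "age": 71, "age_group": "senior"},
]


def sanity_check(upstream, output, dropped, added):
    """Compare an output table to its parent with a few simple checks."""
    # Collect every variable name that appears in the output rows.
    out_vars = set().union(*(row.keys() for row in output))
    return {
        # Did the filter remove the number of rows I expected?
        "rows_removed": len(upstream) - len(output),
        # Have I dropped the unnecessary variables?
        "dropped_vars_absent": all(v not in out_vars for v in dropped),
        # Do my new variables contain a value in every row?
        "added_vars_populated": all(
            row.get(v) is not None for row in output for v in added
        ),
        # Does the distribution of values in each new variable make sense?
        "value_counts": {v: Counter(row.get(v) for row in output) for v in added},
    }


report = sanity_check(
    upstream_rows, output_rows, dropped=["notes"], added=["age_group"]
)
```

Each key in `report` maps to one of the questions listed above, so an unexpected value points to the step of the transform that needs another look.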
Archival
If you haven't interacted with tables in your project for a while, these tables may become archived, which will temporarily limit your ability to view cells and query data in that table. This is done to prevent runaway storage costs, while leveraging the built-in reproducibility of projects to allow you to unarchive the table and pick up where you last left off.
The archival algorithm prioritizes tables that are large, quick to regenerate, and intermediary (not at the bottom of the tree). It currently does not archive tables smaller than 1 GB, so in many cases you may never interact with archived tables.
If a table is archived, you can still see the name, row count, and variable names/types. To access variable summary statistics, view cells, or run transforms downstream of an archived table, you'll have to reconstitute the table by re-running upstream transforms.
Note that if the transform immediately upstream (or any additional upstream transform, if multiple sequential tables are archived) is invalid, you'll have to resolve the invalid state before un-archiving the table.
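To illustrate why several transforms may need to be re-run when sequential tables are archived, here is a small sketch of the reasoning. It is an assumed toy model, not how Redivis represents projects: the `branch` list, its tuple layout, and the `transforms_to_rerun` helper are all hypothetical.

```python
# Hypothetical model of one project branch, listed from the source downward.
# Each entry is (table_name, is_archived, upstream_transform_name).
branch = [
    ("source", False, None),
    ("filtered", True, "filter_transform"),
    ("joined", True, "join_transform"),
    ("final", False, "aggregate_transform"),
]


def transforms_to_rerun(branch, target):
    """List the transforms to re-run, in execution order, to unarchive `target`."""
    names = [name for name, _, _ in branch]
    idx = names.index(target)
    to_run = []
    # Walk upstream from the target for as long as the tables are archived.
    while idx >= 0 and branch[idx][1]:
        to_run.append(branch[idx][2])
        idx -= 1
    # Re-run starting from the transform closest to available data.
    return list(reversed(to_run))


plan = transforms_to_rerun(branch, "joined")
```

In this model, unarchiving `joined` means re-running `filter_transform` first and then `join_transform`, because the table between them is archived as well.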
File index tables
If you've added a dataset to a project that contains files (storage for unstructured data), you will see a table with a File index label in that dataset's list of tables. This is an automatically generated table, where every row represents one file. You can work with this table just like any other in the project tool, but the file_id variable will remain linked to the files in that dataset for use in a notebook.
Exporting
Click the Export table button in the table's right panel to see your exporting options and manage any access restrictions associated with data this table is derived from.
Table node states
As you work in a project, node colors and symbols will change in the tree view to help you keep track of your progress.
State | Display | Details |
---|---|---|
Empty | White background | A table node will be empty when it contains no data because the upstream node either has not been executed or its execution resulted in no data. |
Executed | Grey background | A table node will be grey when it has data that aligns with the contents of its upstream node. |
Stale | Yellow background | A node will be stale when an upstream change has been made. This means the content of the node does not match the content of the node above it. |
Sampled | Black circle with 1% icon | This means that you are using a 1% sample of the data. When a dataset has a sample, it will automatically default to it when added to a project. You can change this to the full sample and back at any time in the dataset node. |
Incomplete access | All black background, or dashed borders | You don't have full access to the node. Click on the Incomplete access button in the top bar to begin applying for access to the relevant datasets. |