Redivis Documentation
Tables

Last updated 5 months ago


Overview

Table nodes in your workflow can be generated by running a transform, by running code in a notebook that outputs a table, or they can be part of a dataset you've added. The main purpose of table nodes in workflows is to store data that you can sanity check and query further.

Derivative tables in a workflow behave much like tables elsewhere on Redivis: you can preview cells, view summary statistics, and run quick queries.

All table nodes also have an associated export interface showing the different ways you can use this data off of the Redivis platform.

Usage

You can create multiple transforms that operate on a single table, which allows you to create branches within your workflow. To create a new transform, select the table or dataset node and click the small + icon that appears under its bottom right corner.

Sanity check output

After you run a transform, you can investigate the downstream output table to get feedback on the success and validity of your querying operation: both the filtering criteria you've applied and the new features you've created.

Understanding the content of an output table allows you to perform important sanity checks at each step of your research process, answering questions like:

  • Did my filtering criteria remove the rows I expected?

  • Do my new variables contain the information I expect?

  • Does the distribution of values in a given variable make sense?

  • Have I dropped unnecessary variables?

To sanity check the contents of a table node, you can inspect its general characteristics, check the summary statistics of different variables, look at its cells, or create a notebook for more in-depth analysis.
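For instance, in a Python notebook you might run checks like the following on a table you have loaded as a DataFrame. This is only a sketch: the table, variable names, and values below are invented for illustration, and in a real Redivis notebook you would load the table node's actual data instead.

```python
import pandas as pd

# Hypothetical output table from a transform, loaded as a DataFrame.
df = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "age": [34, 51, 29, 67],
    "enrolled": [True, True, False, True],
})

# Did my filtering criteria remove the rows I expected?
filtered = df[df["enrolled"]]
print(len(df) - len(filtered), "rows removed by the enrollment filter")

# Does the distribution of values in a given variable make sense?
print(filtered["age"].describe())

# Have I dropped unnecessary variables?
print(list(filtered.columns))
```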

Archival

If you haven't interacted with tables in your workflow for a while, they may become archived, which temporarily limits your ability to view cells and query data in those tables. This is done to prevent runaway storage costs, while leveraging the built-in reproducibility of workflows to let you unarchive a table and pick up where you left off.

The archival algorithm prioritizes tables that are large, quick to regenerate, and intermediary (not at the bottom of the tree). It currently does not archive tables smaller than 1GB; in many cases you may never encounter an archived table.

If a table is archived, you can still see its name, row count, and variable names/types. To access its cells or summary statistics, or to run transforms or notebooks downstream of it, you'll have to reconstitute the table by re-running the upstream transforms. Note that if the transform immediately upstream (or any additional upstream transforms, if multiple sequential tables are archived) is invalid, you'll have to resolve the invalid state before un-archiving the table.
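The stated prioritization criteria can be illustrated with a toy scoring function. This is purely an illustration of the criteria above (large, quick to regenerate, intermediary, never under 1GB); the names, weights, and formula are invented and are not Redivis's actual algorithm.

```python
def archival_priority(size_gb, regen_minutes, is_intermediary):
    """Toy score: higher means a better archival candidate.

    Mirrors the stated criteria: large tables, tables that are quick
    to regenerate, and intermediary tables (not leaves of the tree)
    are preferred. Tables under 1 GB are never archived.
    """
    if size_gb < 1:
        return 0.0  # below the stated 1 GB floor, never archived
    score = size_gb                    # larger tables free more storage
    score += 10 / (1 + regen_minutes)  # cheap-to-regenerate tables rank higher
    if is_intermediary:
        score *= 2                     # intermediary tables are preferred
    return score

# Invented example tables for comparison.
scores = {
    "small_leaf": archival_priority(0.5, 1, False),
    "big_intermediary": archival_priority(50, 2, True),
    "big_slow_leaf": archival_priority(50, 120, False),
}
```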

File index tables

If you've added a dataset to a workflow that contains files (storage for unstructured data), you will see a table with a File index label in that dataset's list of tables. This is an automatically generated table in which every row represents one file. You can work with this table just like any other table in the workflow, and its file_id variable will remain linked to the files in that dataset for use in a notebook.
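As a sketch, a file index table can be treated like any other tabular data: one row per file, filtered and joined as needed while file_id stays linked to the underlying files. The rows and columns below (other than file_id) are invented for illustration; the actual file index schema may differ.

```python
# Toy stand-in for a file index table: one row per file.
file_index = [
    {"file_id": "f-001", "name": "scan_01.png", "size_kb": 512},
    {"file_id": "f-002", "name": "scan_02.png", "size_kb": 498},
    {"file_id": "f-003", "name": "notes.txt", "size_kb": 4},
]

# Filter the index like any other table, e.g. to just the images,
# keeping file_id for later use against the dataset's files.
image_ids = [row["file_id"] for row in file_index if row["name"].endswith(".png")]
```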

Exporting

Click the Export table button in the table's right panel to see your exporting options and to manage any access restrictions associated with the data this table is derived from.

Table node states

As you work in a workflow, node colors and symbols in the tree view will change to help you keep track of your progress.

State: Empty
Display: White background
Details: A table node will be empty when it contains no data, because the upstream node either has not been executed or its execution produced no data.

State: Executed
Display: Grey background
Details: A table node will be grey when it has data that aligns with the contents of its upstream node.

State: Stale
Display: Yellow background
Details: A node will be stale when an upstream change has been made, meaning the content of the node no longer matches the content of the node above it.

State: Sampled
Display: Black circle with 1% icon
Details: You are working with a 1% sample of the data. When a dataset has a sample, the sample is used by default when the dataset is added to a workflow. You can switch between the sample and the full dataset at any time in the dataset node.

State: Incomplete access
Display: All black background, or dashed borders
Details: You don't have full access to the node. Click the Incomplete access button in the top bar to begin applying for access to the relevant datasets.
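The Empty/Executed/Stale rules above can be sketched as a toy function (invented for illustration; Redivis computes these states internally, and Sampled and Incomplete access depend on dataset configuration and your permissions rather than on the node's data):

```python
def table_state(has_data, matches_upstream):
    """Toy derivation of a table node's display state from the rules above."""
    if not has_data:
        return "empty"     # upstream not executed, or its execution produced no data
    if not matches_upstream:
        return "stale"     # an upstream change was made after this table was generated
    return "executed"      # data aligns with the upstream node
```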

All table nodes have one upstream parent. You can view the table's data and metadata similarly to other tables on Redivis, though you cannot edit or update the metadata here.
