Redivis Documentation

Data sources

Last updated 4 months ago

Overview

A data source node contains data on Redivis that you want to work with and have access to. Data sources are usually datasets but can also be other workflows. Each node displays overview information about the dataset or workflow it represents, along with a list of the tables it contains.

You can click on any table to view its contents, or click "Transform" to build a transform on it.

Adding data source nodes to a workflow

Adding a data source to a workflow makes a copy of that dataset or workflow, shown as a circular node at the top of your workflow tree. You can add data to a workflow in any of the following ways:

  • Click the + Add data button in the top left of the toolbar within a workflow.

  • Click the Analyze in workflow button on a dataset page.

  • Click the Fork button in the toolbar of any workflow.

Restrictions

A workflow cannot contain two copies of the same data source or of the same version of the same dataset. To add a different version of a dataset to your workflow, right-click the dataset name in the + Add data modal.
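The restriction above can be read as a uniqueness rule keyed on the source and, for datasets, its version. The following is a minimal Python sketch of that reading, not Redivis code: the `DataSource` and `Workflow` classes, their fields, and the example dataset name are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class DataSource:
    """Hypothetical stand-in for a data source node (not a Redivis type)."""
    kind: str                      # "dataset" or "workflow"
    identifier: str                # e.g. a dataset or workflow name
    version: Optional[str] = None  # only meaningful for datasets


@dataclass
class Workflow:
    """Hypothetical workflow that enforces the restriction described above."""
    sources: set = field(default_factory=set)

    def add_source(self, source: DataSource) -> bool:
        # A frozen dataclass hashes on (kind, identifier, version), so the
        # same dataset at a different version counts as a different source.
        if source in self.sources:
            return False  # duplicate: rejected
        self.sources.add(source)
        return True


workflow = Workflow()
first = workflow.add_source(DataSource("dataset", "census", "v1.0"))
duplicate = workflow.add_source(DataSource("dataset", "census", "v1.0"))
other_version = workflow.add_source(DataSource("dataset", "census", "v2.0"))
print(first, duplicate, other_version)  # True False True
```

In this sketch the second attempt to add the same version is rejected, while a different version of the same dataset is accepted, which matches the behavior the text describes.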

Datasets as a data source

Datasets are the most common data source you will add to your workflow. They contain the data you want to work with, in the original state curated by the data owner. Datasets usually contain one or more tables that you can transform or analyze in a notebook.

Dataset samples

Some large datasets have 1% samples, which are useful for quickly testing querying strategies before running transforms against the full dataset.

If a 1% sample is available for a dataset, it will be added to your workflow by default instead of the full sample. Samples are indicated by the dark circle icon at the top left of a dataset node in the left panel and in the list of the dataset's tables.

All sampled tables in the same dataset are sampled on the same variable with the same group of values, so joining two tables in the same dataset with 1% samples will still result in a 1% sample.

To switch to the full sample, click the "Sample" button in the top right of the menu bar when you have a dataset selected.

Your downstream transforms and tables will become stale, since an upstream change has been made. You can run these nodes individually to update their contents, or use the run all functionality by clicking on the workflow's name in the top menu bar.

Dataset versions

When a new version of a dataset is released by an administrator, the corresponding dataset node on your workflow tree will become purple. To upgrade the dataset's version, click the "Version" button in the top right of the menu bar when you have a dataset selected.

You can view version diffs and select whichever version you want to use here.

After updating, your downstream transforms and tables will become stale. You can run these nodes individually to update their contents, or use the run all functionality by clicking on the workflow's name in the top menu bar.

Workflows as a data source

Workflows can be added to another workflow in order to build off existing analysis. You might want to continue an analytical pipeline that you've built elsewhere, or elaborate on someone else's analysis. You will have access to all tables this workflow contains.

All workflow data sources are linked to their original workflow and will automatically update with any changes made to the original.

Data source node states

As you work in a workflow, node colors and symbols will change in the tree view to help you keep track of your progress.

| State | Display | Details |
| --- | --- | --- |
| Dataset type | Dataset icon in the middle of the circle | This data source is a copy of a dataset on Redivis. |
| Workflow type | Workflow icon in the middle of the circle | This data source is a copy of another workflow on Redivis. |
| Sampled | Black circle with 1% icon | Only possible for dataset source nodes. This means that you are using a 1% sample of the data. When a dataset has a sample, it will automatically default to it when added to a workflow. You can change this to the full sample and back at any time in the dataset node. |
| Outdated version | Purple background on version number | Only possible for dataset source nodes. This means that you are not using the latest version. Either you have intentionally switched to an older version, or this dataset's administrator has released a new version that you can upgrade to. |
| Incomplete access | All black background, or dashed borders | You don't have full access to the node. Click on the Incomplete access button in the top bar to begin applying for access to the relevant datasets. |
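One way to read the node states is as independent flags derived from a few properties of a node, with the sampled and outdated-version states applying only to dataset nodes. The sketch below models that reading in Python. It is an illustration under stated assumptions, not a Redivis API: the `SourceNode` fields and the `node_states` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourceNode:
    """Hypothetical description of a data source node (illustration only)."""
    kind: str                             # "dataset" or "workflow"
    is_sampled: bool = False              # using the 1% sample
    version: Optional[str] = None         # version in use (datasets only)
    latest_version: Optional[str] = None  # newest released version
    has_full_access: bool = True


def node_states(node: SourceNode) -> List[str]:
    """Return the state labels that apply to a node."""
    states = [f"{node.kind.capitalize()} type"]
    if node.kind == "dataset":
        # Sampled and Outdated version are only possible for dataset nodes.
        if node.is_sampled:
            states.append("Sampled")
        if node.version != node.latest_version:
            states.append("Outdated version")
    if not node.has_full_access:
        states.append("Incomplete access")
    return states


node = SourceNode("dataset", is_sampled=True, version="v1.0", latest_version="v2.0")
print(node_states(node))
# ['Dataset type', 'Sampled', 'Outdated version']
```

A node can carry several states at once in this model, for example a sampled dataset that is also on an older version.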