Redivis Documentation

Version control

Last updated 5 months ago

Overview

All datasets on Redivis are automatically versioned. Any change to a dataset's data in tables or files will require a new version. A version is a locked copy of the data, supporting future reproducibility and confidence in the persistence of researchers' data workflows.

Changes to documentation and metadata do not create a new version, though different versions do have independent documentation and metadata. For example, if a new version contains a new table, or new data, you will likely want to document this information separately from the previous version. However, if you only want to enrich the existing metadata on the current version, you can do so without creating a new version.

If you are a dataset editor and have new data, find a mistake to correct, or would otherwise like to modify the existing content of a dataset, you can create and release a new version.

Semantic version tags

To help researchers better understand the differences across versions, Redivis uses semantic versioning, of the form v[major].[minor]. The first version of every dataset is v1.0. For subsequent versions, the tag will increment automatically depending on the changes being released.

  • Major update: Existing code may not run.

    • Triggered when variables in the new version are renamed, deleted, or retyped. Also occurs if any tables from the previous version were deleted.

  • Minor update: Existing code will generally run.

    • Changes are limited to adding / removing records, recoding variables, adding variables, and adding tables.
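The tagging rule above can be sketched in a few lines of Python. This is only an illustration of the logic, not Redivis's implementation: the change labels and the function name next_version_tag are hypothetical, and resetting the minor number to 0 on a major update is an assumption borrowed from common semantic versioning practice.

```python
import re

# Hypothetical change labels, taken from the two bullet lists above.
MAJOR_CHANGES = {"rename_variable", "delete_variable", "retype_variable", "delete_table"}
MINOR_CHANGES = {"add_records", "remove_records", "recode_variable", "add_variable", "add_table"}


def next_version_tag(current_tag, changes):
    """Return the tag that would follow current_tag for a set of change labels."""
    match = re.fullmatch(r"v(\d+)\.(\d+)", current_tag)
    if match is None:
        raise ValueError(f"not a v[major].[minor] tag: {current_tag!r}")
    major, minor = int(match.group(1)), int(match.group(2))

    changes = set(changes)
    if not changes:
        raise ValueError("a new version requires at least one data change")
    unknown = changes - MAJOR_CHANGES - MINOR_CHANGES
    if unknown:
        raise ValueError(f"unknown change labels: {sorted(unknown)}")

    # Any single breaking change makes the whole release a major update.
    if changes & MAJOR_CHANGES:
        return f"v{major + 1}.0"  # assumption: minor resets to 0
    return f"v{major}.{minor + 1}"


print(next_version_tag("v1.0", {"add_records"}))                  # v1.1
print(next_version_tag("v1.3", {"add_table", "delete_variable"}))  # v2.0
```

Note that a release mixing minor and major changes is treated as major, since existing code may no longer run.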

Version history

On any dataset page, you can view the current version tag next to the dataset title. Click on this tag to view the full version history and switch to a different version of the dataset.

Creating a new version

When it's time to update a dataset's data, you'll want to create a new version. To do this, navigate to the dataset editor and click Create next version.

Before this version is released, it will be tagged as next. Only dataset editors will be able to see the next version on the dataset page and use it in their workflows.

A dataset can have up to 1,000 versions. If your use case exceeds this limit, consider creating a new dataset that imports the previous dataset's tables once this limit has been reached.

Version storage

The total size of a dataset across all of its versions is displayed in the dataset editor, alongside the size of the current version. For datasets with one version, this total size may be slightly larger than the current version, as Redivis stores certain metadata behind the scenes to support future versioning.

As new versions are created, Redivis will efficiently compute a row-level difference between the versions. Only additions and updates to existing data will contribute to the dataset's total storage size, preventing data that is consistent across versions from being double-counted.

Adding or removing columns (or changing column types) won’t affect row uniqueness, as the underlying storage represents all values as strings. Only the storage size of the new column would be added.
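As a rough illustration of why unchanged rows are not double-counted, the following toy model counts each distinct row once across all versions. It is a sketch under simplifying assumptions, not a description of Redivis's storage engine: the function total_storage and the sample data are hypothetical, each row is modeled as a tuple of strings, size is measured as UTF-8 bytes, and the column-level behavior described above is not modeled.

```python
def total_storage(versions):
    """Sum the byte size of each distinct row once, across all versions.

    versions maps a version tag to a list of rows; each row is a tuple of strings.
    """
    distinct_rows = set()
    for rows in versions.values():
        distinct_rows.update(rows)
    return sum(len(value.encode("utf-8")) for row in distinct_rows for value in row)


versions = {
    "v1.0": [("1", "alice"), ("2", "bob")],
    # v1.1 keeps row 1 unchanged, updates row 2, and adds row 3.
    "v1.1": [("1", "alice"), ("2", "bobby"), ("3", "carol")],
}

naive = sum(len(v.encode("utf-8")) for rows in versions.values() for row in rows for v in row)
print(naive)                    # 28: every version counted in full
print(total_storage(versions))  # 22: the unchanged row is counted once
```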

Version unrelease

If the most recent version of a dataset was released in the last 7 days, and no next version has been created, you'll have the option to unrelease it.

This will revert the dataset to the exact state it was in before the version was released. If anyone who is not a dataset editor has this version in a workflow, they will lose access to the data, though they can revert to a previous version if one exists.
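The two conditions for unreleasing can be expressed as a small check. This is a minimal sketch: the function can_unrelease and its parameters are hypothetical, and treating the 7-day boundary as inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone

UNRELEASE_WINDOW = timedelta(days=7)


def can_unrelease(released_at, has_next_version, now=None):
    """Return True if the most recent version is still eligible for unrelease.

    released_at: timezone-aware release time of the most recent version.
    has_next_version: whether a next version has already been created.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Assumption: a release exactly 7 days old still counts as "in the last 7 days".
    return (not has_next_version) and (now - released_at) <= UNRELEASE_WINDOW


now = datetime(2024, 6, 10, tzinfo=timezone.utc)
print(can_unrelease(datetime(2024, 6, 5, tzinfo=timezone.utc), False, now))  # True
print(can_unrelease(datetime(2024, 6, 5, tzinfo=timezone.utc), True, now))   # False
print(can_unrelease(datetime(2024, 6, 1, tzinfo=timezone.utc), False, now))  # False
```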

Version deletion

Any version of a dataset can be deleted as long as it is not the most recent released version. Deleting a version will permanently delete all metadata and data associated with it.

A deleted version will no longer be available in any workflows, and researchers will lose access to any data from that version referenced in their workflows. To continue working with the dataset, researchers will need to switch their analyses to a non-deleted version.

Deleting a version is permanent and can't be undone.

If you are deleting versions to reduce storage costs, be aware that Redivis stores data efficiently across versions – the storage used by a particular record will be deleted only if it is unique to the deleted version (or, if deleting a series of versions, if that record doesn't exist in any non-deleted version).
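To see why deleting a version may free less storage than expected, consider this toy model in which each version is a set of rows. It is an illustrative sketch only: the function rows_freed_by_deletion and the sample data are hypothetical and do not reflect how Redivis stores data internally.

```python
def rows_freed_by_deletion(versions, deleted_tags):
    """Return the rows that exist only in the deleted versions.

    versions maps a version tag to a set of rows (tuples of strings).
    """
    deleted = set(deleted_tags)
    kept_rows = set()
    removed_rows = set()
    for tag, rows in versions.items():
        if tag in deleted:
            removed_rows.update(rows)
        else:
            kept_rows.update(rows)
    # A row is freed only if no remaining version still contains it.
    return removed_rows - kept_rows


versions = {
    "v1.0": {("1", "alice"), ("2", "bob")},
    "v1.1": {("1", "alice"), ("2", "bobby")},
    "v1.2": {("1", "alice"), ("2", "bobby"), ("3", "carol")},
}

print(rows_freed_by_deletion(versions, ["v1.0"]))          # {('2', 'bob')}
print(rows_freed_by_deletion(versions, ["v1.0", "v1.1"]))  # {('2', 'bob')}
```

In this example, deleting v1.1 in addition to v1.0 frees nothing extra, because every row in v1.1 also exists in v1.2.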

Unreleased and published datasets

When a dataset is first created it will be marked as unreleased.

As you work on the initial version of the dataset, you can see the changes you make to the data in the version history modal. In this unreleased state you can add the dataset to a workflow and analyze the data it contains. Note that only dataset editors can see it in the workflow. If you make changes to the data or delete it, your workflow will change instantly to reflect those changes.

To add an unreleased or unpublished dataset to a workflow, click the View dataset tab from the edit interface, and click the Analyze in workflow button. Since this dataset is not published it will not appear in the Add dataset interface in workflows.

Once you are ready to make your data available to non-editors, you will need to publish it. This will simultaneously release your version as well as make your dataset discoverable. As soon as it is published it will be accessible to all who meet its access rules.

If this is an organization dataset, it can be unpublished at any time. This will "unlist" the dataset so that it will not be listed on your organization home page and even users who meet the access requirements will not be able to see it. Anywhere it is used in a workflow it will instantly be made unavailable.

Unpublishing might be used if you need to temporarily halt usage of the dataset but don't want to disrupt all of the access rules.

Working with previous versions

When analyzing data in a workflow, you can change any dataset to a previous version to instantly change every table to the corresponding table in that version. To do this, click on the dataset and click the Version button in the toolbar. Select the version you'd like to analyze and confirm. You can switch versions at any point.

Normally you can only have one copy of a dataset in a workflow, but it's possible to add a second version if you'd like to compare them. In any workflow, click the + Add data button and locate the dataset. Right click on the dataset and choose to add another version to the workflow.

Within a workflow, you can change the version of a dataset by selecting the dataset node and clicking the Version button at top right. If there is a new version available, the dataset node will be highlighted to indicate that you might want to upgrade.

All versions of a dataset contribute to that dataset's total size, which in turn will count towards your usage quotas or organization billing (depending on whether the dataset is owned by you or an organization).
