Versions

Overview

All datasets on Redivis are versioned, with any change to the data in their tables prompting a new version. A version is a locked copy of the data that can't be edited or deleted, supporting future reproducibility and confidence in the persistence of researchers' data workflows.

Changes to documentation and metadata do not create a new version, though different versions do have independent documentation and metadata. For example, if a new version contains a new table, or new data, you will likely want to document this information separately from the previous version. However, if you only want to enrich the existing metadata on the current version, you can do so without creating a new version.

If you are a dataset editor and have updated data, have found a mistake to correct, or would otherwise like to modify the existing content of the dataset's tables, you can create and release a new version.

Semantic version tags

To help researchers understand the differences across versions, Redivis uses semantic version tags of the form v[major].[minor]. The first version of every dataset is v1.0. For subsequent versions, the tag is incremented automatically depending on the changes being released.

  • Major update: Your existing code may not run.

    • Triggered when a new version renames, deletes, or retypes variables in the dataset's tables. Also occurs if any tables from the previous version were deleted.

  • Minor update: Your existing code will generally run.

    • Changes are limited to adding / removing records, recoding variables, adding variables, and adding tables.
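The rules above can be sketched in a few lines of Python. This is only an illustration of the classification logic, not Redivis's implementation: the function names, the representation of a version as a dictionary of tables mapping variable names to types, and the choice to reset the minor number to 0 on a major update are all assumptions made for the example.

```python
# A version is modeled (hypothetically) as {table_name: {variable_name: type}}.
Version = dict[str, dict[str, str]]


def is_major_update(previous: Version, new: Version) -> bool:
    """Return True if existing code written against `previous` may break."""
    for table_name, prev_variables in previous.items():
        if table_name not in new:
            return True  # a table from the previous version was deleted
        new_variables = new[table_name]
        for var_name, var_type in prev_variables.items():
            if var_name not in new_variables:
                return True  # variable was deleted or renamed
            if new_variables[var_name] != var_type:
                return True  # variable was retyped
    # Only additions (tables, variables, records) or recodes remain.
    return False


def next_version_tag(current_tag: str, previous: Version, new: Version) -> str:
    """Compute the next v[major].[minor] tag from the current one."""
    major, minor = (int(part) for part in current_tag.lstrip("v").split("."))
    if is_major_update(previous, new):
        # Assumption: the minor number resets to 0 on a major update.
        return f"v{major + 1}.0"
    return f"v{major}.{minor + 1}"


v1 = {"patients": {"id": "integer", "dob": "date"}}
added_variable = {"patients": {"id": "integer", "dob": "date", "sex": "string"}}
retyped_variable = {"patients": {"id": "string", "dob": "date"}}

print(next_version_tag("v1.0", v1, added_variable))    # v1.1
print(next_version_tag("v1.0", v1, retyped_variable))  # v2.0
```

Note that a rename is indistinguishable from a deletion plus an addition in this sketch, which is why both are treated as major.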

Version history

On any dataset page, you can view the current version tag in the dropdown at top-right, and use this to toggle between different versions of the dataset. You can also view a full version history by navigating to the Versions tab of the dataset.

Within a project, you can change the version of a dataset by selecting the dataset node and clicking the Version button at top right. If there is a new version available, the dataset node will be highlighted to indicate that you might want to upgrade.

Comparing versions

On the dataset page, you can compare any two versions using the versions dropdown at top right. Any changes between the two versions are highlighted on the page in red (removed) and green (added).

If a table exists in both versions, clicking on that table will automatically open it in comparison mode. You can generally use table comparisons to look at the changes to a table across versions, both on the dataset page and within projects.
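As a rough mental model of what the comparison view reports at the table level, the sketch below splits the tables of two versions into removed, added, and shared groups. The function name and the use of plain sets of table names are assumptions for illustration; this is not how Redivis computes or exposes the comparison.

```python
def compare_table_lists(old_tables: set[str], new_tables: set[str]) -> dict[str, list[str]]:
    """Classify tables by whether they exist in the old version, the new one, or both."""
    return {
        "removed": sorted(old_tables - new_tables),  # would be highlighted in red
        "added": sorted(new_tables - old_tables),    # would be highlighted in green
        "in_both": sorted(old_tables & new_tables),  # candidates for table comparison
    }


result = compare_table_lists({"patients", "visits"}, {"patients", "labs"})
print(result)
# {'removed': ['visits'], 'added': ['labs'], 'in_both': ['patients']}
```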

Creating a new version

When it's time to update a dataset's data, you'll want to create a new version. To do this, navigate to the dataset editor and click Create next version.

Before this version is released, it will be tagged as next. Only dataset editors will be able to see the next version on the dataset page and use it in their projects.

Version storage

All versions of a dataset contribute to that dataset's total size, which in turn will count towards your usage quotas or organization billing (depending on whether the dataset is owned by you or an organization).

This total size is displayed in the dataset editor, alongside the size of the current version. For datasets with one version, the total size may be slightly larger than the size of the current version, as Redivis stores certain metadata behind the scenes to support future versioning.

As new versions are created, Redivis efficiently computes a row-level difference between the versions. Only additions and updates to existing data contribute to the dataset's total storage size, so data that is unchanged across versions is not double-counted.
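To make the effect of row-level differencing concrete, here is a minimal sketch that counts stored rows across versions. It assumes each version can be modeled as a set of row tuples and uses row counts as a stand-in for bytes; the function name is hypothetical, and the actual storage mechanism Redivis uses is not described here.

```python
def total_stored_rows(versions: list[set[tuple]]) -> int:
    """Count rows stored when each version only adds its new or updated rows."""
    stored = 0
    previous: set[tuple] = set()
    for rows in versions:
        # Rows already present in the previous version are not counted again.
        # An updated row differs from its old form, so it counts as new.
        stored += len(rows - previous)
        previous = rows
    return stored


v1 = {(1, "a"), (2, "b"), (3, "c")}
v2 = {(1, "a"), (2, "B"), (4, "d")}  # row 2 updated, row 3 removed, row 4 added

print(total_stored_rows([v1, v2]))  # 5, versus 6 if both versions were stored in full
```

In this example the removed row adds nothing to the total, and the unchanged row `(1, "a")` is counted once.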