Version control

Overview

All datasets on Redivis are automatically versioned. Any change to a dataset's data in tables or files will require a new version. A version is a locked copy of the data, supporting future reproducibility and confidence in the persistence of researchers' data workflows.

Changes to documentation and metadata do not create a new version, though different versions do have independent documentation and metadata. For example, if a new version contains a new table, or new data, you will likely want to document this information separately from the previous version. However, if you only want to enrich the existing metadata on the current version, you can do so without creating a new version.

If you are a dataset editor and have new data, find a mistake to correct, or would otherwise like to modify the existing content of a dataset, you can create and release a new version.

Semantic version tags

To help researchers understand the differences across versions, Redivis uses semantic versioning of the form v[major].[minor]. The first version of every dataset is v1.0. For subsequent versions, the tag is incremented automatically depending on the changes being released.

Major update: Existing code may not run.
- Triggered when variables in the new version are renamed, deleted, or retyped. Also occurs if any tables from the previous version were deleted.
Minor update: Existing code will generally run.
- Changes are limited to adding / removing records, recoding variables, adding variables, and adding tables.
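The rules above can be sketched as a small function. This is an illustration only, not Redivis code: the change labels and the function name are hypothetical, and resetting the minor number to 0 on a major update is an assumption borrowed from common semantic-versioning practice rather than something stated here.

```python
# Hypothetical labels for the kinds of change described above.
# They are illustrative names, not identifiers used by Redivis.
MAJOR_CHANGES = {
    "variable_renamed",
    "variable_deleted",
    "variable_retyped",
    "table_deleted",
}
MINOR_CHANGES = {
    "records_added",
    "records_removed",
    "variable_recoded",
    "variable_added",
    "table_added",
}


def next_version_tag(current_tag, changes):
    """Return the tag that would follow current_tag for the given change labels."""
    changes = set(changes)
    if not changes:
        raise ValueError("At least one change is needed for a new version")
    unknown = changes - MAJOR_CHANGES - MINOR_CHANGES
    if unknown:
        raise ValueError(f"Unrecognized change labels: {sorted(unknown)}")

    # Parse a tag such as "v1.3" into its major and minor numbers.
    major, minor = (int(part) for part in current_tag.lstrip("v").split("."))

    # Any single major change makes the whole release a major update.
    if changes & MAJOR_CHANGES:
        return f"v{major + 1}.0"  # assumption: minor resets to 0
    return f"v{major}.{minor + 1}"


print(next_version_tag("v1.0", {"records_added"}))
print(next_version_tag("v1.3", {"variable_renamed", "table_added"}))
```

Under these assumptions, a release that mixes major and minor changes is treated as a major update, since a single renamed or deleted variable is enough to break existing code.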

Version history

On any dataset page, you can view the current version tag next to the dataset title. Click this tag to view the full version history and switch to a different version of the dataset.

Within a workflow, you can change the version of a dataset by selecting the dataset node and clicking the Version button at top right. If there is a new version available, the dataset node will be highlighted to indicate that you might want to upgrade.

Creating a new version

When it's time to update a dataset's data, you'll want to create a new version. To do this, navigate to the dataset editor and click Create next version.

Before this version is released, it will be tagged as next. Only dataset editors will be able to see the next version on the dataset page and use it in their workflows.

A dataset can have up to 1,000 versions. If your use case exceeds this limit, consider creating a new dataset that imports the previous dataset's tables once this limit has been reached.

Version storage

All versions of a dataset contribute to that dataset's total size, which in turn will count towards your usage quotas or organization billing (depending on whether the dataset is owned by you or an organization).

This total size will be displayed in the dataset editor, alongside the size for the current version. For datasets with one version, this total size may be slightly larger than the current version, as Redivis stores certain metadata behind the scenes to support future versioning.

As new versions are created, Redivis will efficiently compute a row-level difference between the versions — only additions and updates to existing data will contribute to the dataset's total storage size, preventing data that is consistent across versions from being double-counted.

Adding or removing columns (or changing column types) won’t affect row uniqueness, as the underlying storage represents all values as strings. Only the storage size of the new column would be added.
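As a rough mental model of the row-level accounting described above, each version can be pictured as a set of rows, with the dataset's storage counting each distinct row once. This is a simplified sketch, not the actual Redivis storage engine; the function name and sample rows are made up, and it counts rows rather than bytes.

```python
def total_stored_rows(versions):
    """Count distinct rows across all versions.

    `versions` maps a version tag to a set of rows. Each row is a tuple
    of strings, mirroring the idea that values are stored as strings.
    A row present in several versions is counted only once.
    """
    distinct = set()
    for rows in versions.values():
        distinct.update(rows)
    return len(distinct)


v1_0 = {("1", "alice"), ("2", "bob")}
# v1.1 keeps row 1 unchanged, updates row 2, and adds row 3.
v1_1 = {("1", "alice"), ("2", "bobby"), ("3", "carol")}

print(total_stored_rows({"v1.0": v1_0}))                # 2
print(total_stored_rows({"v1.0": v1_0, "v1.1": v1_1}))  # 4
```

The two versions hold five rows between them, but only four are counted: the unchanged row is shared, while the updated row and the added row each contribute new storage.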

Version unrelease

If the most recent version of a dataset was released within the last 7 days, and no next version has been created yet, you'll have the option to unrelease it.

This will revert the dataset to the exact state it was in before the version was released. If anyone who is not a dataset editor has this version in a workflow, they will lose access to the data, though they can revert to a previous version if one exists.
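The two conditions for unreleasing can be written as a simple check. This is a sketch with hypothetical names, not part of any Redivis API, and it assumes that a release exactly 7 days old still qualifies, which this page does not specify.

```python
from datetime import datetime, timedelta, timezone

UNRELEASE_WINDOW = timedelta(days=7)


def can_unrelease(released_at, has_next_version, now=None):
    """Return True if the most recent version could still be unreleased.

    released_at: timezone-aware datetime of the most recent release.
    has_next_version: whether a next version has already been created.
    now: optional timezone-aware datetime, useful for testing.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    within_window = now - released_at <= UNRELEASE_WINDOW  # assumption: inclusive
    return within_window and not has_next_version


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
released_recently = datetime(2024, 1, 5, tzinfo=timezone.utc)
released_long_ago = datetime(2024, 1, 1, tzinfo=timezone.utc)

print(can_unrelease(released_recently, has_next_version=False, now=now))  # True
print(can_unrelease(released_long_ago, has_next_version=False, now=now))  # False
print(can_unrelease(released_recently, has_next_version=True, now=now))   # False
```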

Version deletion

Any version of a dataset can be deleted as long as it is not the most recent released version. Deleting a version will permanently delete all data and metadata associated with it.

The deleted version will no longer be available in any workflows, and researchers will lose access to any data in their workflows that references it. To continue working with the dataset, researchers will need to switch their analyses to a non-deleted version.

Deleting a version is permanent and can't be undone.

If you are deleting versions to reduce storage costs, be aware that Redivis stores data efficiently across versions – the storage used by a particular record will be deleted only if it is unique to the deleted version (or, if deleting a series of versions, if that record doesn't exist in any non-deleted version).
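To see why deleting a version may free less storage than expected, the rule above can be modeled with sets: a row is freed only if no remaining version still contains it. This is an illustrative sketch with hypothetical names and made-up rows, counting rows rather than bytes; it is not how Redivis reports storage.

```python
def rows_freed_by_deletion(versions, to_delete):
    """Count the rows that exist only in the versions being deleted.

    `versions` maps a version tag to a set of rows (tuples of strings).
    `to_delete` is a collection of version tags to remove.
    """
    to_delete = set(to_delete)
    kept_rows = set()
    deleted_rows = set()
    for tag, rows in versions.items():
        if tag in to_delete:
            deleted_rows.update(rows)
        else:
            kept_rows.update(rows)
    # Rows still present in any non-deleted version are not freed.
    return len(deleted_rows - kept_rows)


versions = {
    "v1.0": {("1", "a"), ("2", "b")},
    "v1.1": {("1", "a"), ("3", "c")},
    "v1.2": {("1", "a"), ("3", "c"), ("4", "d")},
}

print(rows_freed_by_deletion(versions, ["v1.0"]))          # 1
print(rows_freed_by_deletion(versions, ["v1.1"]))          # 0
print(rows_freed_by_deletion(versions, ["v1.0", "v1.1"]))  # 1
```

In this example, deleting v1.1 frees nothing because every one of its rows also exists in v1.2, and deleting both older versions frees only the single row that was unique to v1.0.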

Unreleased and published datasets

When a dataset is first created it will be marked as unreleased.

As you work on the initial version of the dataset, you can see the changes you make to the data in the version history modal. In this unreleased state, you can add the dataset to a workflow and analyze the data it contains. Note that only dataset editors can see it in the workflow. If you change or delete the data, your workflow will instantly reflect those changes.

To add an unreleased or unpublished dataset to a workflow, click the View dataset tab from the edit interface, then click the Analyze in workflow button. Because the dataset is not published, it will not appear in the Add dataset interface in workflows.

Once you are ready to make your data available to non-editors, you will need to publish it. This will simultaneously release your version as well as make your dataset discoverable. As soon as it is published it will be accessible to all who meet its access rules.

If this is an organization dataset, it can be unpublished at any time. This will "unlist" the dataset so that it will not be listed on your organization home page and even users who meet the access requirements will not be able to see it. Anywhere it is used in a workflow it will instantly be made unavailable.

Unpublishing might be used if you need to temporarily halt usage of the dataset but don't want to disrupt all of the access rules.

Working with previous versions

When analyzing data in a workflow, you can change any dataset to a previous version, which instantly switches every table to the corresponding table in that version. To do this, click on the dataset and click the Version button in the toolbar. Select the version you'd like to analyze and confirm. You can switch versions again at any point.

Normally you can only have one copy of a dataset in a workflow, but it's possible to add a second version if you'd like to compare them. In any workflow, click the + Add data button and locate the dataset. Right-click on the dataset and choose to add another version to the workflow.

