# Dataset lifecycle

## Create a new version

When it's time to update a dataset's data, you'll want to create a new version. To do this, navigate to the dataset editor and click **Create next version**.

Before this version is released, it will be tagged as `next`. Only dataset editors will be able to see the `next` version on the dataset page and use it in their workflows.

{% hint style="info" %}
A dataset can have up to 1,000 versions. If your use case requires more, consider creating a new dataset that imports the previous dataset's tables once the limit is reached.
{% endhint %}

![](https://1672950126-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LVodLwUXgJUGcm5Cvso%2Fuploads%2Fr3XXFvOfyvTHOUuzMmPO%2FScreenshot%202024-12-09%20at%206.29.12%E2%80%AFPM_out.png?alt=media\&token=250bd479-279f-49b1-a8a3-8fe256a67fb5)

## Update existing tables

In your new version, you may have a combination of:

* Table additions (new in this version)
* Table deletions (no longer exist in this version)
* Table modifications (the table still exists, but its data has changed)

If you have table modifications, it is strongly recommended to *not* delete and recreate the table. This is inefficient from a storage perspective, and breaks the table's lineage across versions. Instead, you should [import your new data](https://docs.redivis.com/reference/datasets/create-and-edit-datasets/import-sources) directly to the existing table.

When importing data to an existing table, you can choose whether the upload should be appended to the existing table or fully replace its contents. Replacing functionally achieves the same outcome as deleting and recreating the table, but Redivis can store the data more efficiently (by computing a diff for each version), and the table remains linked across its versions.
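To make the two import modes concrete, here is a conceptual sketch in Python. It is not part of any Redivis API: tables are modeled as plain lists of rows, and the function names and `mode` values are hypothetical. The second function illustrates, under the simplifying assumption that storage is tracked per distinct row, why a full replace stays cheap when most rows are unchanged.

```python
from typing import List, Set, Tuple

# Hypothetical model: a row is a tuple of string values.
Row = Tuple[str, ...]


def import_to_existing_table(existing: List[Row], upload: List[Row], mode: str) -> List[Row]:
    """Return the table's rows in the new version after importing `upload`.

    mode="append":  existing rows are kept and the uploaded rows are added.
    mode="replace": existing rows are dropped; the upload becomes the full contents.
    """
    if mode == "append":
        return existing + upload
    if mode == "replace":
        return list(upload)
    raise ValueError(f"unknown mode: {mode!r}")


def rows_newly_stored(previous: List[Row], current: List[Row]) -> Set[Row]:
    # Simplified diff: only rows absent from the previous version need new storage.
    return set(current) - set(previous)


v1 = [("a", "1"), ("b", "2")]
v2 = import_to_existing_table(v1, [("b", "2"), ("c", "3")], mode="replace")
print(v2)                         # [('b', '2'), ('c', '3')]
print(rows_newly_stored(v1, v2))  # {('c', '3')}
```

Even though the replace rewrote the whole table, only the one genuinely new row adds storage in this model, which is the benefit that deleting and recreating the table would give up.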

## Release

Once you are ready to make your updates available to non-editors, you will need to release the version.

As soon as a version is released, it will be accessible to everyone who meets the dataset's access rules.
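The visibility rules for `next` and released versions can be summarized as a small function. This is an illustrative model only, with hypothetical names; it collapses the dataset's access rules into a single `meets_access_rules` flag, which is a simplification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Version:
    number: int
    released: bool  # False means the version is still tagged `next`


def visible_versions(versions: List[Version], is_editor: bool, meets_access_rules: bool) -> List[int]:
    """Return the version numbers a user can see (hypothetical, simplified model)."""
    if is_editor:
        # Editors see every version, including the unreleased `next` version.
        return [v.number for v in versions]
    if not meets_access_rules:
        return []
    # Non-editors who meet the access rules see released versions only.
    return [v.number for v in versions if v.released]


history = [Version(1, True), Version(2, True), Version(3, False)]
print(visible_versions(history, is_editor=True, meets_access_rules=True))   # [1, 2, 3]
print(visible_versions(history, is_editor=False, meets_access_rules=True))  # [1, 2]
```

In this model, releasing version 3 corresponds to setting its `released` flag to `True`, after which it also appears for non-editors.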

## Unrelease

If the most recent version of a dataset was released within the last 7 days, and no `next` version has been created yet, you'll have the option to unrelease it.

This will revert the dataset to the exact state it was in before the version was released. If anyone who is not a dataset editor has this version in a workflow, they will lose access to the data, though they can revert to a previous version if one exists.
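The eligibility check for unreleasing can be sketched as follows. The function name and parameters are hypothetical, and treating exactly 7 days as still inside the window is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: "released in the last 7 days" includes the 7-day mark itself.
UNRELEASE_WINDOW = timedelta(days=7)


def can_unrelease(latest_released_at: Optional[datetime], has_next_version: bool, now: datetime) -> bool:
    """Hypothetical check for whether the most recent version can be unreleased."""
    if latest_released_at is None:
        return False  # nothing has been released yet
    if has_next_version:
        return False  # a `next` version already exists
    return now - latest_released_at <= UNRELEASE_WINDOW


now = datetime(2024, 12, 10)
print(can_unrelease(datetime(2024, 12, 5), has_next_version=False, now=now))  # True
print(can_unrelease(datetime(2024, 12, 5), has_next_version=True, now=now))   # False
print(can_unrelease(datetime(2024, 12, 1), has_next_version=False, now=now))  # False
```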

## Version storage

All versions of a dataset contribute to that dataset's total size, which in turn will count towards your [usage quotas](https://docs.redivis.com/reference/your-account/compute-credits-and-billing) or [organization billing](https://docs.redivis.com/reference/billing) (depending on whether the dataset is owned by you or an organization).

This total size will be displayed in the dataset editor, alongside the size for the current version. For datasets with one version, this total size may be slightly larger than the current version, as Redivis stores certain metadata behind the scenes to support future versioning.

As new versions are created, Redivis will efficiently compute a row-level difference between the versions — only additions and updates to existing data will contribute to the dataset's total storage size, preventing data that is consistent across versions from being double-counted.

Adding or removing columns (or changing column types) won't affect row uniqueness, since the underlying storage represents all values as strings. When a column is added, only the storage size of the new column is added to the total.
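As a rough illustration of this accounting, the sketch below counts each distinct row once across all versions. It is a simplified model, not Redivis's storage engine: the function names are hypothetical, and a row's size is approximated by the number of characters in its string values. Because every value is converted to a string, the integer `1` and the string `"1"` produce the same stored row, which mirrors why changing a column's type doesn't make existing rows count as new.

```python
from typing import Iterable, Sequence, Set, Tuple

StoredRow = Tuple[str, ...]


def to_stored_row(values: Sequence[object]) -> StoredRow:
    # Simplified model: every value is stored as a string, so a type change
    # (e.g. integer 1 -> string "1") yields the same stored row.
    return tuple(str(v) for v in values)


def total_size(versions: Iterable[Iterable[Sequence[object]]]) -> int:
    # Each distinct stored row is counted once, however many versions contain it.
    # Size is approximated as the character count of the row's values (assumption).
    unique: Set[StoredRow] = set()
    for rows in versions:
        unique.update(to_stored_row(r) for r in rows)
    return sum(sum(len(v) for v in row) for row in unique)


v1 = [(1, "alice"), (2, "bob")]
v2 = [("1", "alice"), (2, "bob"), (3, "carol")]  # first column changed type; one row added
print(total_size([v1]))      # 10
print(total_size([v1, v2]))  # 16: only ("3", "carol") adds storage
```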

## Archival and deletion

Datasets on Redivis are intended as a persistent store of information, and this persistence is critical to ensure the reproducibility of analyses.

In some cases, it may be necessary to remove a dataset or version. There are two removal options:

* **Archival** stops all data usage. Data remains stored and can be put into use again if unarchived. No other aspects of metadata, access, etc. are changed.
* **Deletion** permanently removes data from Redivis, leaving only a reference for citation purposes.

{% hint style="info" %}
When a dataset is deleted or archived, all derivative tables in researchers' workflows that correspond to that dataset will be archived, meaning that the underlying data in these derivative tables is deleted. Note that researchers will no longer be able to read from or query these derivative tables.

This is often helpful to ensure compliance with data deletion requirements, in that all derivative data is expunged at the time of deletion or archival. However, you should be aware of the potential impact this may cause to your researchers, and communicate with them appropriately.
{% endhint %}

#### Dataset archival

Archival is always reversible. It can be useful for stopping usage of a dataset whose information is out of date, or for reducing costs.

When a dataset is archived, its data can no longer be queried or modified (though its metadata can still be edited). All derivative tables in [workflows](https://docs.redivis.com/reference/workflows) will be marked as archived, which prevents reading or querying the data. This can help ensure that a dataset remains inactive and limit storage costs, though the dataset may be unarchived at any time.

To archive a dataset, either right-click on the dataset when viewing the list of your datasets, or click the **Archive** button on the [dataset settings](https://docs.redivis.com/reference/overview#dataset-settings).

#### Version deletion

Any version of a dataset can be deleted, as long as it is not the currently released version. Deleting a version will delete all metadata and data associated with it.

A deleted version will no longer be available for analysis, and any derivative workflow tables that reference the version will be marked as archived, meaning they can no longer be read or analyzed. To continue working with the dataset, researchers will need to update their workflows to use a non-deleted version of the dataset.

If you delete an unreleased `next` version, this deletion is permanent and cannot be undone. If you delete a released version, there is a 7-day window within which the version can be undeleted, after which the deletion becomes permanent.
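The deletion and undeletion rules for versions can be summarized in a short sketch. The function names are hypothetical, and whether the 7-day boundary is inclusive is an assumption.

```python
from datetime import datetime, timedelta

# Assumption: undeletion is allowed up to and including the 7-day mark.
UNDELETE_WINDOW = timedelta(days=7)


def can_delete_version(version_number: int, current_released_number: int) -> bool:
    # Any version except the currently released one can be deleted.
    return version_number != current_released_number


def can_undelete_version(was_released: bool, deleted_at: datetime, now: datetime) -> bool:
    if not was_released:
        return False  # deleting an unreleased `next` version is permanent
    return now - deleted_at <= UNDELETE_WINDOW


print(can_delete_version(2, current_released_number=3))                      # True
print(can_delete_version(3, current_released_number=3))                      # False
print(can_undelete_version(True, datetime(2024, 12, 1), datetime(2024, 12, 5)))   # True
print(can_undelete_version(False, datetime(2024, 12, 1), datetime(2024, 12, 5)))  # False
```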

If you are deleting versions to reduce storage costs, be aware that Redivis stores data efficiently across versions: the storage used by a particular record will be deleted *only* if it is unique to the deleted version (or, if deleting a series of versions, if that record doesn't exist in any non-deleted version). The total storage savings of deleting a version will be shown to you prior to deletion confirmation.
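The rule for which records are freed can be expressed as a set computation. In this hypothetical sketch, each version is modeled as a set of rows and savings are measured in characters; the real figure is the one Redivis displays before you confirm a deletion.

```python
from typing import Dict, Iterable, Set, Tuple

Row = Tuple[str, ...]


def storage_savings(versions: Dict[int, Set[Row]], to_delete: Iterable[int]) -> int:
    """Hypothetical model: size freed by deleting the given version numbers."""
    deleting = set(to_delete)
    # Rows still referenced by any surviving version must be kept.
    kept_rows: Set[Row] = set()
    for number, rows in versions.items():
        if number not in deleting:
            kept_rows |= rows
    # A row is freed only if no surviving version contains it.
    freed: Set[Row] = set()
    for number in deleting:
        freed |= versions[number] - kept_rows
    return sum(sum(len(v) for v in row) for row in freed)


versions = {1: {("a",), ("b",)}, 2: {("b",), ("c",)}, 3: {("c",), ("dd",)}}
print(storage_savings(versions, [2]))     # 0
print(storage_savings(versions, [1]))     # 1
print(storage_savings(versions, [1, 2]))  # 2
```

Deleting version 2 alone frees nothing in this example, because each of its rows also appears in version 1 or version 3.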

#### Dataset deletion

In some situations it may be necessary to fully delete a dataset. Once deleted, the dataset will no longer be discoverable, though it will still show up in users' workflows that reference the dataset, and bookmarked URLs and DOIs will still resolve to the dataset's landing page. You can also view a list of deleted datasets by navigating to your workspace or organization administrator panel, and filtering datasets by `status: deleted`.

To ensure future reproducibility, dataset metadata and documentation are preserved upon deletion. However, all data will be fully expunged, and the dataset will no longer be queryable.

The dataset's public [access level](https://docs.redivis.com/reference/data-access/access-levels) will be persisted in its deleted state, meaning that if the dataset was previously visible, it will still be visible (but not discoverable) once deleted. Additionally, any users who explicitly had access to the dataset prior to deletion will keep that access. Note, however, that the dataset will be emptied of all data, and authorized researchers will only be able to view metadata. These default access rules can be modified by navigating to the deleted dataset and reconfiguring its access accordingly.

Datasets can be undeleted for 7 days after deletion by navigating to the dataset settings and clicking the **Undelete** button. After 7 days, deletion becomes permanent and the dataset's data will be unrecoverable.

{% hint style="warning" %}
If you are looking to delete a dataset to save on storage costs, consider deleting one or more [versions](https://docs.redivis.com/reference/versions#deletion) instead.
{% endhint %}
