# Dataset lifecycle

## Create a new version

When it's time to update a dataset's data, you'll want to create a new version. To do this, navigate to the dataset editor and click **Create next version**.

Before this version is released, it will be tagged as `next`. Only dataset editors will be able to see the `next` version on the dataset page and use it in their workflows.

{% hint style="info" %}
A dataset can have up to 1,000 versions. If your use case exceeds this limit, consider creating a new dataset that imports the previous dataset's tables once this limit has been reached.
{% endhint %}

![](/files/IWwZZNyehNaH2KVGWw4D)

## Update existing tables

In your new version, you may have a combination of:

* Table additions (new in this version)
* Table deletions (no longer exist in this version)
* Table modifications (the table still exists, but its data has changed)

If you have table modifications, we strongly recommend that you do *not* delete and recreate the table. Doing so is inefficient from a storage perspective and breaks the table's lineage across versions. Instead, you should [import your new data](/reference/datasets/create-and-edit-datasets/import-sources.md) directly to the existing table.

When importing data to an existing table, you can choose whether the uploads should be appended to, or fully replace, the contents of the existing table. Replacing the contents produces the same table as deleting and recreating it, but Redivis can store the data more efficiently (computing a diff for each version), and the table remains linked across its versions.
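
The difference between the two import modes can be sketched with a small in-memory model. This is only an illustration of the semantics described above, not the Redivis API: the function `import_rows` and its `mode` argument are hypothetical names.

```python
from typing import List, Tuple

Row = Tuple[str, ...]  # one table row, with every value held as a string


def import_rows(existing: List[Row], uploaded: List[Row], mode: str) -> List[Row]:
    """Return the table contents after an import (hypothetical, illustrative model)."""
    if mode == "append":
        # Uploaded rows are added after the rows already in the table.
        return existing + uploaded
    if mode == "replace":
        # Uploaded rows become the full contents of the table.
        return list(uploaded)
    raise ValueError(f"unknown mode: {mode!r}")


previous = [("1", "alice"), ("2", "bob")]
upload = [("2", "bob"), ("3", "carol")]

appended = import_rows(previous, upload, "append")
replaced = import_rows(previous, upload, "replace")

print(len(appended))       # 4
print(replaced == upload)  # True

# Rows shared by the previous and replaced contents are the ones a
# version-to-version diff would not need to store a second time.
unchanged = set(previous) & set(replaced)
print(unchanged)           # {('2', 'bob')}
```

In this model, replacing keeps the same table identity while only the rows that differ are new, which is why it is preferable to deleting and recreating the table.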

## Release

Once you are ready to make your updates available to non-editors, you will need to release the version.

As soon as the version is released, it will be accessible to everyone who meets its access rules.

## Unrelease

If the most recent version of a dataset was released within the last 7 days, and no `next` version has been created, you'll have the option to unrelease it.

This will revert the dataset to the exact state it was in before the version was released. If anyone who is not a dataset editor has this version in a workflow, they will lose access to the data, though they can revert to a previous version if one exists.
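
The eligibility rule above can be expressed as a small predicate. This is a minimal sketch under stated assumptions: `can_unrelease` and its parameters are hypothetical names, and treating exactly 7 days as still eligible is an assumption, since the text does not specify the boundary.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

UNRELEASE_WINDOW = timedelta(days=7)  # the 7-day window stated in the text


def can_unrelease(released_at: datetime, has_next_version: bool,
                  now: Optional[datetime] = None) -> bool:
    """Return True if the most recent released version may still be unreleased."""
    now = now or datetime.now(timezone.utc)
    # Assumption: a release exactly 7 days old still counts as inside the window.
    within_window = now - released_at <= UNRELEASE_WINDOW
    return within_window and not has_next_version


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
print(can_unrelease(datetime(2024, 1, 5, tzinfo=timezone.utc), False, now))  # True
print(can_unrelease(datetime(2024, 1, 1, tzinfo=timezone.utc), False, now))  # False
print(can_unrelease(datetime(2024, 1, 5, tzinfo=timezone.utc), True, now))   # False
```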

## Version storage

All versions of a dataset contribute to that dataset's total size, which in turn will count towards your [usage quotas](/reference/your-account/compute-credits-and-billing.md) or [organization billing](/reference/billing.md) (depending on whether the dataset is owned by you or an organization).

This total size will be displayed in the dataset editor, alongside the size for the current version. For datasets with one version, this total size may be slightly larger than the current version, as Redivis stores certain metadata behind the scenes to support future versioning.

As new versions are created, Redivis will efficiently compute a row-level difference between the versions — only additions and updates to existing data will contribute to the dataset's total storage size, preventing data that is consistent across versions from being double-counted.

Adding or removing columns (or changing column types) won’t affect row uniqueness, as the underlying storage represents all values as strings. Only the storage size of the new column would be added.
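
To illustrate why unchanged rows are not double-counted, here is a simplified model of row-level deduplication across versions. It is a sketch only and does not reflect how Redivis actually stores data: `row_size` and `total_storage` are hypothetical helpers, rows are modeled as tuples of strings, and duplicate rows within a single version are counted once for simplicity.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, ...]  # every value is modeled as a string


def row_size(row: Row) -> int:
    """Size of a row as the UTF-8 byte length of its values."""
    return sum(len(value.encode("utf-8")) for value in row)


def total_storage(versions: Dict[str, List[Row]]) -> int:
    """Total size when each distinct row is stored once across all versions."""
    unique_rows = set()
    for rows in versions.values():
        unique_rows.update(rows)
    return sum(row_size(row) for row in unique_rows)


versions = {
    "v1.0": [("1", "alice"), ("2", "bob")],
    "v1.1": [("1", "alice"), ("2", "bob"), ("3", "carol")],
}

# Storing every version in full would count the shared rows twice.
naive = sum(row_size(row) for rows in versions.values() for row in rows)

print(naive)                    # 26
print(total_storage(versions))  # 16
```

In this toy example, the second version adds only the 6 bytes of its new row to the total, rather than its full 16 bytes.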

## Archival

Datasets on Redivis are intended as a persistent store of information, and this persistence is critical to ensure the reproducibility of analyses. However, in some cases it may be necessary to stop or pause usage of a dataset (or specific versions) due to incorrect information or to reduce costs.

**Archival** stops all usage, including reading, querying, and exporting of data. However, no data is removed from Redivis, and full functionality can be restored at any time. No other aspects of the dataset's metadata or access are changed when it is archived.

### Version archival

Specific versions within a dataset can be archived as long as they are not the `current` or `next` version. Archiving a version preserves all of its contents, while immediately stopping usage of its data.

Note that when a version is archived, all derivative tables in researchers' workflows are archived as well. These tables can be reconstituted when the data is unarchived, or the user can update their workflow to use a different version and rerun their analysis.

To archive a version, navigate to the dataset's version history and open the action menu in the top right corner of any version. You can also shift-click to select multiple versions to archive or unarchive simultaneously.

### Dataset archival

An entire dataset can also be archived. Archiving a dataset preserves all of its contents, while immediately stopping usage of its data.

Note that when a dataset is archived, all derivative tables in researchers' workflows are archived as well. These tables can be reconstituted when the dataset is unarchived, or the user can update their workflow to use a different dataset and rerun their analysis.

To archive a dataset, navigate to the dataset's settings in the editor, and click the **Archive** button. You can also bulk archive datasets by selecting multiple datasets and choosing the archive action. Note that datasets with an unreleased (`next`) version cannot be archived.

### Storage classes

When archiving data, you can select one of two storage classes:

* **Standard storage** will keep the data stored exactly as it is, and storage costs will remain unchanged. Archived data in standard storage can be moved to cold storage at any time.
* **Cold storage** will move the data to cheaper, long-term storage, and will compress any tabular data prior to archival. Data in cold storage will incur a significantly lower [monthly storage cost](/reference/billing.md), but there will be a [fee](/reference/billing.md#archival) to unarchive it.

Accounts must have billing configured in order to utilize cold archival.

{% hint style="info" %}
If you are cold archiving a version to reduce storage costs, be aware that Redivis stores data efficiently across versions. The storage used by a particular record will be archived *only* if it is unique to the archived version (or, if archiving a series of versions, if that record doesn't exist in any non-archived versions). This is shown as the number of "discrete bytes" when selecting version(s) for archival.
{% endhint %}
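
The idea of "discrete bytes" can be illustrated with a short sketch. It assumes a simplified model in which each version is a list of rows and a row's storage is freed only when no unselected version still contains it. The helper names `row_size` and `discrete_bytes` are hypothetical, and the calculation is an approximation of the concept rather than Redivis's actual accounting.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[str, ...]  # every value is modeled as a string


def row_size(row: Row) -> int:
    """Size of a row as the UTF-8 byte length of its values."""
    return sum(len(value.encode("utf-8")) for value in row)


def discrete_bytes(versions: Dict[str, List[Row]], selected: Set[str]) -> int:
    """Bytes held only by the selected versions and by no other version."""
    kept_rows = set()
    selected_rows = set()
    for name, rows in versions.items():
        if name in selected:
            selected_rows.update(rows)
        else:
            kept_rows.update(rows)
    return sum(row_size(row) for row in selected_rows - kept_rows)


versions = {
    "v1": [("1", "alice"), ("2", "bob")],
    "v2": [("1", "alice"), ("2", "bob"), ("3", "carol")],
    "v3": [("1", "alice"), ("3", "carol"), ("4", "dave")],
}

print(discrete_bytes(versions, {"v1"}))        # 0
print(discrete_bytes(versions, {"v2"}))        # 0
print(discrete_bytes(versions, {"v1", "v2"}))  # 4
```

Here neither `v1` nor `v2` holds any storage of its own, but selecting both together covers the `("2", "bob")` row, which no remaining version contains.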

### Unarchival

Unarchiving a version or dataset makes the data usable again. You can unarchive any dataset or version by navigating to the same location where you archived it, and selecting the "Unarchive" option.

Unarchival of data in standard storage is free and instantaneous.

Unarchival of data in cold storage will incur a one-time [unarchival fee](/reference/billing.md) and can take up to several minutes for larger datasets.

## Deletion

Datasets on Redivis are intended as a persistent store of information, and this persistence is critical to ensure the reproducibility of analyses. However, in some cases it may be necessary to fully delete a dataset (or specific versions) due to license requirements, administrative error, or to reduce costs.

**Deletion** permanently removes data, while preserving a reference to the deleted dataset or version for citation purposes.

### Version deletion

Any version of a dataset can be deleted, as long as it is not the currently released version. Deleting a version will delete all metadata and data associated with it.

A deleted version will no longer be available for analysis, and any derivative workflow tables that reference the version will be marked as archived, meaning they can no longer be read or analyzed. In order to continue working with the dataset, researchers will need to update their workflows to use a non-deleted version of the dataset.

To delete a version, navigate to the version modal, where you can open the action menu in the top right corner of any version. You can also shift-click to select multiple versions to delete simultaneously.

{% hint style="info" %}
If you are deleting versions to reduce storage costs, be aware that Redivis stores data efficiently across versions. The storage used by a particular record will be deleted *only* if it is unique to the deleted version (or, if deleting a series of versions, if that record doesn't exist in any non-deleted version). This is shown as the number of "discrete bytes" when selecting version(s) for deletion.
{% endhint %}

### Dataset deletion

In some situations it may be necessary to fully delete a dataset. Once deleted, the dataset will no longer be discoverable, though it will still show up in users' workflows that reference the dataset, and bookmarked URLs and DOIs will still resolve to the dataset's landing page. You can also view a list of deleted datasets by navigating to your workspace or organization administrator panel, and filtering datasets by `status: deleted`.

To ensure future reproducibility, dataset metadata and documentation are preserved upon deletion. However, all data will be fully expunged, and the dataset will no longer be usable.

The dataset's public [access level](/reference/data-access/access-levels.md) will be persisted in its deleted state, meaning that if the dataset was previously visible, it will still be visible (but not discoverable) once deleted. Additionally, any users who explicitly had access to the dataset prior to deletion will have their access persisted, though the dataset will be emptied of all data, and authorized researchers will only be able to view metadata. These default access rules can be modified by navigating to the deleted dataset and reconfiguring access accordingly.

To delete a dataset, click the **Delete** button on the dataset settings tab of the dataset editor. You can also right-click on the dataset when viewing the list of your datasets.

### Undeletion

If you delete an unreleased version or dataset, the deletion is permanent and cannot be undone. Otherwise, there is a 7-day window within which the dataset or version can be undeleted. After 7 days, deletion becomes permanent and the data will be unrecoverable.
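
The undeletion rule can be summarized as a small predicate. This is a sketch under stated assumptions: `can_undelete` and its parameters are hypothetical names, and treating exactly 7 days as still recoverable is an assumption, since the text does not specify the boundary.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

UNDELETE_WINDOW = timedelta(days=7)  # the 7-day window stated in the text


def can_undelete(deleted_at: datetime, was_released: bool,
                 now: Optional[datetime] = None) -> bool:
    """Return True if a deleted dataset or version can still be undeleted."""
    if not was_released:
        # Deleting an unreleased version or dataset is permanent.
        return False
    now = now or datetime.now(timezone.utc)
    # Assumption: a deletion exactly 7 days old is still inside the window.
    return now - deleted_at <= UNDELETE_WINDOW


now = datetime(2024, 3, 10, tzinfo=timezone.utc)
print(can_undelete(datetime(2024, 3, 8, tzinfo=timezone.utc), True, now))   # True
print(can_undelete(datetime(2024, 3, 1, tzinfo=timezone.utc), True, now))   # False
print(can_undelete(datetime(2024, 3, 8, tzinfo=timezone.utc), False, now))  # False
```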


