Archival & deletion

Overview

Datasets on Redivis are intended as a persistent store of information, and this persistence is critical to ensure the reproducibility of analyses.

In some cases, it may be necessary to archive a dataset, so as to prevent future usage and save on cloud costs. In other scenarios, you may need to delete a specific version (e.g., to reduce storage costs, or permanently retract inappropriately released data). Finally, in some situations it may be necessary to fully delete a dataset, for example, when a license to that dataset has expired.

Archival operations are always reversible, whereas deletion of versions and datasets becomes permanent after 7 days.

When a version or dataset is deleted or archived, all derivative tables in researchers' workflows that correspond to that version (or dataset) will be archived, meaning that the underlying data in these derivative tables is deleted. Note that researchers will no longer be able to read from or query these derivative tables.

This is often helpful for complying with data deletion requirements, because all derivative data is expunged at the time of deletion or archival. However, you should be aware of the impact this may have on your researchers, and communicate with them appropriately.

Dataset archival

When a dataset is marked as archived, its data can no longer be queried or modified (though its metadata can still be edited). All derivative tables in workflows will be marked as archived, which prevents reading or querying the data. This can help ensure that a dataset remains inactive and limit storage costs, though the dataset may be unarchived at any time.

To archive a dataset, either right-click on the dataset when viewing the list of your datasets, or click the Archive button on the dataset settings.

Version deletion

Any version of a dataset can be deleted, as long as it is not the currently released version. Deleting a version deletes all metadata and data associated with it.

A deleted version will no longer be available for analysis, and any derivative workflow tables that reference the version will be marked as archived, meaning they can no longer be read or analyzed. To continue working with the dataset, researchers will need to update their workflows to use a non-deleted version of the dataset.

If you delete an unreleased next version, this deletion is permanent and cannot be undone. If you delete a released version, there is a 7-day window within which the version can be undeleted, after which the deletion becomes permanent.
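The rule above can be summarized in a small sketch. This is only an illustrative model, not part of any Redivis API: the function name `can_undelete` and its parameters are hypothetical, and how the exact 7-day boundary is treated is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The 7-day window stated above for released versions.
UNDELETE_WINDOW = timedelta(days=7)


def can_undelete(
    was_released: bool,
    deleted_at: datetime,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if a deleted version could still be undeleted.

    Hypothetical helper that models the rule described in the text:
    - an unreleased "next" version is deleted permanently, right away;
    - a released version can be undeleted within 7 days of deletion.
    """
    if not was_released:
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    # Assumption: the window is treated as closed once exactly 7 days have elapsed.
    return now - deleted_at < UNDELETE_WINDOW


deleted_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(can_undelete(True, deleted_at, deleted_at + timedelta(days=3)))    # True
print(can_undelete(True, deleted_at, deleted_at + timedelta(days=8)))    # False
print(can_undelete(False, deleted_at, deleted_at + timedelta(hours=1)))  # False
```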

If you are deleting versions to reduce storage costs, be aware that Redivis stores data efficiently across versions: the storage used by a particular record will be freed only if that record is unique to the deleted version (or, if deleting a series of versions, if that record doesn't exist in any non-deleted version).

The total storage savings of deleting a version will be shown to you prior to deletion confirmation.
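To make the cross-version storage rule concrete, here is a minimal sketch of how the savings could be reasoned about. It assumes a simplified model in which each version is a set of record identifiers and each record has a fixed size; the names `storage_freed`, `version_records`, and `record_sizes` are hypothetical and do not reflect how Redivis stores or accounts for data internally.

```python
from typing import Dict, Iterable, Set


def storage_freed(
    version_records: Dict[str, Set[str]],
    record_sizes: Dict[str, int],
    to_delete: Iterable[str],
) -> int:
    """Estimate bytes freed by deleting the given versions.

    A record's storage is freed only if the record does not appear in
    any version that remains after the deletion.
    """
    deleted = set(to_delete)

    # Records still referenced by at least one non-deleted version.
    retained: Set[str] = set()
    for version, records in version_records.items():
        if version not in deleted:
            retained |= records

    # Records referenced by any of the versions being deleted.
    removed: Set[str] = set()
    for version in deleted:
        removed |= version_records[version]

    return sum(record_sizes[record] for record in removed - retained)


versions = {"v1": {"a", "b"}, "v2": {"b", "c"}, "v3": {"c", "d"}}
sizes = {"a": 10, "b": 20, "c": 30, "d": 40}

print(storage_freed(versions, sizes, ["v1"]))        # 10: only "a" is unique to v1
print(storage_freed(versions, sizes, ["v1", "v2"]))  # 30: "a" and "b"; "c" survives in v3
```

In this model, deleting v1 alone frees little because most of its records are shared with v2, which is why the savings shown before confirmation may be smaller than the nominal size of the version.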

Dataset deletion

Once deleted, the dataset will no longer be discoverable, though it will still show up in users' workflows that reference the dataset, and bookmarked URLs and DOIs will still resolve to the dataset's landing page. You can also view a list of deleted datasets by navigating to your workspace or organization administrator panel, and filtering datasets by status: deleted.

To ensure future reproducibility, dataset metadata and documentation are preserved upon deletion. However, all data will be fully expunged, and the dataset will no longer be queryable.

The dataset's public access level will be persisted in its deleted state, meaning that if the dataset was previously visible, it will still be visible (but not discoverable) once deleted. Additionally, any users who explicitly had access to the dataset prior to deletion will retain that access, though the dataset will be emptied of all data, and authorized researchers will only be able to view metadata. These default access rules can be modified by navigating to the deleted dataset and reconfiguring access accordingly.

Datasets can be undeleted for 7 days after deletion by navigating to the dataset settings and clicking the Undelete button. After 7 days, deletion becomes permanent and the dataset's data will be unrecoverable.


