Data repository characteristics

Overview

Redivis is designed to support data-driven research throughout the research lifecycle, and can serve as a permanent repository for your data, analytical workflows, and data derivatives.

In May 2022, the Subcommittee on Open Science (SOS) of the United States Office of Science and Technology Policy (OSTP) released a document outlining the "desirable characteristics of data repositories".

These characteristics are intended to help agencies direct Federally funded researchers toward repositories that enable management and sharing of research data consistent with the principles of FAIR data practices. Various agencies have adopted these guidelines, including the NIH.

Redivis is specifically designed to meet these desirable characteristics of a data repository, outlined below:

Desirable Characteristics for All Data Repositories

Unique Persistent Identifiers

Assigns a persistent identifier (PID), such as a DOI

Identifier points to a persistent landing page

Long-Term Sustainability

Plan for long-term data management

Maintain integrity, authenticity, and availability of datasets

Stable technical infrastructure

Stable funding plans

Contingency plans to ensure data are available and maintained

Metadata

Datasets accompanied by metadata

Aids in the easy discovery, reuse, and citation of datasets

Schema appropriate to relevant data communities

Curation and Quality Assurance

Provide or allow others to provide expert curation

Quality assurance for accuracy and integrity of datasets and metadata

Free and Easy Access

Broad, equitable, and maximally open access to datasets and metadata

Access is free of charge and timely, consistent with privacy requirements

Broad and Measured Reuse

Makes datasets and metadata available to reuse

Provides ability to measure attribution, citation, and reuse of data

Clear Use Guidance

Provides documentation for access and use

Security and Integrity

Prevents unauthorized access, modification, and release of data

Confidentiality

Ensures administrative, technical, and physical safeguards

Continuous monitoring of requirements

Common Format

Download, access, and export available in non-proprietary formats

Provenance

Ability to record the origin, chain of custody, and modification of data or metadata

Retention Policy

Provides policy for data retention

Additional Considerations for Human Data

Fidelity to Consent

Restricts access and use consistently with participant consent

Restricted Use Compliant

Enforces data use restrictions

Privacy

Implements measures to protect data from inappropriate access

Plan for Breach

Has a response plan for detected data breaches

Download Control

Controls and audits access to and download of datasets

Violations

Has procedures for addressing violations and data mismanagement

Request Review

Process for reviewing data access requests


Detailed information

Unique Persistent Identifiers

Assigns datasets a citable, unique persistent identifier, such as a digital object identifier (DOI) or accession number, to support data discovery, reporting, and research assessment. The identifier points to a persistent landing page that remains accessible even if the dataset is de-accessioned or no longer available.

When data is uploaded to a dataset owned by an organization, every version of that dataset is assigned a DOI using the organization's DOI-issuing credentials. Each DOI always resolves to the URL of the dataset's page. If a dataset is later restricted or deleted, its base metadata remains available.
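A DOI resolves through the doi.org proxy, which redirects to the landing page currently registered for that identifier; this is what lets the identifier stay stable even if the page's URL changes. The minimal Python sketch below is illustrative and not part of any Redivis tooling: it normalizes a few common ways a DOI is written (bare, `doi:`-prefixed, or as a resolver link) and returns the corresponding resolver URL. The function names and the DOI in the example are hypothetical, and the format check is deliberately simplified.

```python
import re
from urllib.parse import quote

# Simplified DOI shape: "10." + registrant code + "/" + non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes a DOI often carries when copied from a citation or link (compared case-insensitively).
PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Return the bare DOI (e.g. '10.1234/example.v1') or raise ValueError."""
    doi = raw.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def resolver_url(raw: str) -> str:
    """Build the doi.org URL that redirects to the DOI's registered landing page."""
    # Percent-encode characters such as '#' or '?' that would otherwise break the URL.
    return "https://doi.org/" + quote(normalize_doi(raw), safe="/:;()")


# Hypothetical DOI used for illustration only.
print(resolver_url("doi:10.1234/example.v1"))  # https://doi.org/10.1234/example.v1
```

Normalizing first means the same identifier yields the same link regardless of how a citation happened to write it.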

Long-Term Sustainability

Has a plan for long-term management of data, including maintaining integrity, authenticity, and availability of datasets; building on a stable technical infrastructure and funding plans; and having contingency plans to ensure data are available and maintained during and after unforeseen events.

Redivis uses highly-available and redundant Google Cloud infrastructure to ensure data is stored securely and to the highest technical standards. Redivis maintains a formal disaster recovery and business continuity plan that is regularly exercised to ensure our ability to maintain availability and data durability during unforeseen events.

Redivis undergoes annual security audits and penetration testing by an external firm, and utilizes a formalized software development and review process to maintain the soundness of its technical infrastructure.

Funding for Redivis is provided by recurring annual subscriptions from its member academic institutions. This model provides consistent annual revenue to support the ongoing maintenance of the platform. Redivis is an employee-owned company with no external investors holding an equity stake, which allows us to focus solely on the needs of our customers, our employees, and our mission of improving accessibility in research data science.

Metadata

Ensures datasets are accompanied by metadata to enable discovery, reuse, and citation of datasets, using schema that are appropriate to, and ideally widely used across, the community(ies) the repository serves. Domain-specific repositories would generally have more detailed metadata than generalist repositories.

All Redivis datasets contain extensive metadata and documentation. Some metadata fields (including summary statistics for all variables) are generated automatically, while others are optional and completed at the editor's discretion. Every dataset has space for short- and long-form text, supporting files, and links, alongside variable labels and descriptions.

Metadata is available in various machine-readable formats, such as schema.org and DataCite JSON, and can be viewed through the interface or downloaded via the API.
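As one illustration of how machine-readable metadata supports citation, the sketch below turns a DataCite JSON record into a simple citation string. It assumes the DOI was registered with DataCite, in which case DataCite's content negotiation lets a client request DataCite JSON from https://doi.org by sending the `application/vnd.datacite.datacite+json` Accept header. The function names, the citation layout, and the example record are hypothetical; this is not a Redivis API.

```python
import json
import urllib.request

# Media type for DataCite JSON in DataCite's DOI content negotiation.
DATACITE_JSON = "application/vnd.datacite.datacite+json"


def fetch_datacite_json(doi: str, timeout: float = 30.0) -> dict:
    """Ask doi.org for DataCite JSON instead of the landing page (requires network access)."""
    request = urllib.request.Request(
        f"https://doi.org/{doi}", headers={"Accept": DATACITE_JSON}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def format_citation(record: dict) -> str:
    """Format 'Creators (Year). Title. Publisher. https://doi.org/DOI' from a DataCite JSON record."""
    creators = "; ".join(c["name"] for c in record.get("creators", []))
    titles = record.get("titles") or [{"title": "Untitled"}]
    publisher = record.get("publisher", "")
    # Some DataCite JSON outputs give publisher as an object rather than a string.
    if isinstance(publisher, dict):
        publisher = publisher.get("name", "")
    year = record.get("publicationYear", "n.d.")
    return f"{creators} ({year}). {titles[0]['title']}. {publisher}. https://doi.org/{record['doi']}"


# Hypothetical record shaped like DataCite JSON; all values are made up.
example = {
    "doi": "10.1234/example.v1",
    "creators": [{"name": "Doe, Jane"}, {"name": "Roe, Richard"}],
    "titles": [{"title": "Example Survey Data"}],
    "publisher": "Example University",
    "publicationYear": 2023,
}
print(format_citation(example))
```

With network access, `format_citation(fetch_datacite_json(doi))` would apply the same formatting to a live record, provided the registration agency supports that media type.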

As a generalist repository, Redivis uses an intentionally broad and flexible metadata schema, but specific groups on Redivis can choose to enforce more specific metadata standards within their datasets.

Curation and Quality Assurance

Provides, or has a mechanism for others to provide, expert curation and quality assurance to improve the accuracy and integrity of datasets and metadata.

All datasets on Redivis are owned either by an Organization (and curated by any of its administrators) or by an individual User. Additional users can be added as editors to a dataset to provide further curation and quality assurance.

It is ultimately up to the editors of a dataset to provide curation, though Redivis is designed to support this process as much as possible. Redivis automatically computes checksums and runs fixity checks on all uploaded files, and computes univariate summary statistics of all variables to aid in the quality assurance process. Metadata completeness is also reported to editors, encouraging them to provide as much information as possible.
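A fixity check recomputes a file's checksum and compares it with the value recorded when the file was first received; any mismatch indicates that the bytes have changed. This page does not say which hash algorithm Redivis uses, so the minimal sketch below assumes SHA-256 from Python's standard `hashlib`; the function names are hypothetical.

```python
import hashlib


def file_checksum(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files never need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_check(path: str, recorded_hex: str, algorithm: str = "sha256") -> bool:
    """Return True if the file still matches the checksum recorded at upload time."""
    return file_checksum(path, algorithm) == recorded_hex.lower()
```

In this pattern, `file_checksum(path)` is recorded once when a file is uploaded and stored alongside it; re-running `fixity_check` later returns False if the file has been silently corrupted or unexpectedly modified.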

Free and Easy Access

Provides broad, equitable, and maximally open access to datasets and their metadata free of charge in a timely manner after submission, consistent with legal and ethical limits required to maintain privacy and confidentiality, Tribal sovereignty, and protection of other sensitive data.

Datasets and projects can be explored on Redivis without an account or login. An account is required only to apply for access to restricted data or to analyze and download data. Individual accounts are completely free and require no specific affiliation. Data access restrictions are set and maintained by the data owner.

Broad and Measured Reuse

Makes datasets and their metadata available with broadest possible terms of reuse; and provides the ability to measure attribution, citation, and reuse of data (i.e., through assignment of adequate metadata and unique PIDs).

The data owner can choose to publish any dataset publicly or set appropriate access restrictions based on the sensitivity of the data. Redivis imposes no additional limits on the availability and reuse of data. Any dataset that has a DOI can be cited and tracked in publications, and any reuse on Redivis is displayed on the dataset's usage tab.

Clear Use Guidance

Provides accompanying documentation describing terms of dataset access and use (e.g., particular licenses, need for approval by a data use committee).

Dataset owners can describe any usage agreements in their access requirements and have space to document any additional usage rules. Redivis imposes no additional limits on the availability and use of data.

Security and Integrity

Has documented measures in place to meet generally accepted criteria for preventing unauthorized access to, modification of, or release of data, with levels of security that are appropriate to the sensitivity of data.

Redivis is SOC 2 certified and prioritizes technical security. Multiple layers of administrative controls make clear what actions data owners are taking, and all actions, whether by administrators or researchers, are automatically logged for review.

Redivis is well-designed to handle workflows around sharing of sensitive and high-risk data. It provides technical mechanisms to support the reuse of sensitive data when allowed, while enabling the enforcement of appropriate guardrails and access controls defined by data administrators.

Confidentiality

Has documented capabilities for ensuring that administrative, technical, and physical safeguards are employed to comply with applicable confidentiality, risk management, and continuous monitoring requirements for sensitive data.

Redivis is SOC 2 certified and prioritizes technical security. All data is encrypted in transit and at rest, and is stored on Google Cloud infrastructure that maintains robust technical and physical access controls.

In addition to an annual security audit, Redivis also undergoes annual penetration testing by an outside firm to further ensure the soundness of its security posture.

Common Format

Allows datasets and metadata downloaded, accessed, or exported from the repository to be in widely used, preferably non-proprietary, formats consistent with those used in the community(ies) the repository serves.

Data and metadata on Redivis can be imported and exported in multiple common formats. Data analysis is performed in common, generally open-source programming languages, with SAS and Stata available as proprietary exceptions. Redivis does not introduce any proprietary formats or programming languages of its own.

Provenance

Has mechanisms in place to record the origin, chain of custody, and any modifications to submitted datasets and metadata.

All datasets contain provenance documentation, which is automatically populated based on administrator actions. This information can be further edited or supplemented with additional related identifiers.

All modifications to a dataset are tracked in the administrative logs.

Retention Policy

Provides documentation on policies for data retention within the repository.

Redivis publishes a data retention policy. Data is always owned by the user or organization who uploads the dataset, and they have control over a dataset's presence and availability. The dataset owner may apply additional policies towards data retention.

Fidelity to Consent

Uses documented procedures to restrict dataset access and use to those that are consistent with participant consent and changes in consent.

Redivis provides extensive options to limit access on sensitive or restricted data, including allowing access on different levels (e.g. metadata, sample, data). Access is granted and revoked on Redivis instantly, allowing for immediate changes in access based on changing circumstances.

Access rules on Redivis are defined and enforced by the dataset owner.

Restricted Use Compliant

Uses documented procedures to communicate and enforce data use restrictions, such as preventing reidentification or redistribution to unauthorized users.

Redivis has built-in data restrictions available to administrators. Data download or export can be disabled entirely or permitted only with administrator approval, to prevent redistribution.

Data administrators can also communicate and collect formal acknowledgement of other use restrictions through access requirements. Moreover, data administrators can easily audit the use of restricted data in the audit logs to further check for and limit any non-compliance or other misuse.

Privacy

Implements and provides documentation of measures (for example, tiered access, credentialing of data users, security safeguards against potential breaches) to protect human subjects’ data from inappropriate access.

Redivis has a built-in tiered access system that allows administrators to restrict access to sensitive data. These controls apply to data derivatives as well, where any data output inherits the access rules of the source dataset(s) used to create that output. These controls on derivative data allow researchers to share analyses with colleagues without worrying about accidentally leaking sensitive information, since all collaborators will need to comply with the access rules in order to view those outputs.
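The inheritance rule described above can be modeled in a few lines. The sketch below is a hypothetical model, not Redivis's implementation: access tiers are an ordered enum (borrowing the example levels of metadata, sample, and data access given earlier), a derived table carries every requirement of its sources with the strictest level kept per dataset, and a user can view it only by satisfying all of them. All names are illustrative.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class AccessLevel(IntEnum):
    """Hypothetical ordered tiers: a higher level includes everything below it."""
    METADATA = 1
    SAMPLE = 2
    DATA = 3


@dataclass
class Table:
    name: str
    # Source dataset name -> minimum access level needed on it to view this table's rows.
    requirements: dict = field(default_factory=dict)


def derive(name: str, *sources: Table) -> Table:
    """A derived table inherits all source requirements, keeping the strictest level per dataset."""
    merged: dict = {}
    for source in sources:
        for dataset, level in source.requirements.items():
            merged[dataset] = max(merged.get(dataset, level), level)
    return Table(name, merged)


def can_view(grants: dict, table: Table) -> bool:
    """grants maps dataset name -> AccessLevel held by the user; every requirement must be met."""
    return all(grants.get(dataset, 0) >= level for dataset, level in table.requirements.items())


# Hypothetical example: a join across a sample table, a full table, and a second dataset.
claims_sample = Table("claims_sample", {"claims": AccessLevel.SAMPLE})
claims_full = Table("claims_full", {"claims": AccessLevel.DATA})
survey = Table("survey", {"survey": AccessLevel.DATA})
joined = derive("joined", claims_sample, claims_full, survey)

print(can_view({"claims": AccessLevel.DATA}, joined))  # False: no access to "survey"
print(can_view({"claims": AccessLevel.DATA, "survey": AccessLevel.DATA}, joined))  # True
```

Because requirements only accumulate as outputs are derived, sharing `joined` with a collaborator cannot expose rows from a dataset they have not been approved for.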

Additional technical safeguards ensure data privacy. Data users can establish their identity through their institutional identity provider, and Redivis undergoes regular security audits to ensure the soundness of its security posture.

Plan for Breach

Has security measures that include a response plan for detected data breaches.

Redivis has detailed internal security protocols and documented security breach plans which are regularly exercised by technical personnel via tabletop exercises. All systems are continuously monitored for potential breaches, with immediate alert pathways and clear escalation protocols to respond to any breach.

Download Control

Controls and audits access to and download of datasets (if download is permitted).

Redivis has built-in data restrictions available to administrators. Data download or export can be disabled entirely or limited to specific external systems or to administrator approval. All data downloads are logged for subsequent audit and review, and downloads are available only to authenticated users who have access to the underlying data.
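Reviewing download activity typically means filtering the audit log for one dataset and checking each download against the set of approved users. The log entry fields below (`action`, `dataset`, `user`, `timestamp`) and the function names are hypothetical and do not reflect Redivis's actual log schema; the sketch only illustrates the kind of review that logged downloads make possible.

```python
from collections import Counter
from datetime import datetime
from typing import Optional


def downloads_for(dataset: str, log: list, since: Optional[datetime] = None) -> list:
    """Select download events for one dataset, optionally only those at or after `since`."""
    return [
        entry
        for entry in log
        if entry["action"] == "download"
        and entry["dataset"] == dataset
        and (since is None or entry["timestamp"] >= since)
    ]


def review_downloads(events: list, approved_users: set) -> tuple:
    """Count downloads per user and list any events by users not on the approved list."""
    per_user = Counter(entry["user"] for entry in events)
    unexpected = [entry for entry in events if entry["user"] not in approved_users]
    return per_user, unexpected


# Hypothetical log entries; the field names are illustrative, not Redivis's log schema.
log = [
    {"action": "download", "dataset": "claims", "user": "alice", "timestamp": datetime(2024, 3, 1)},
    {"action": "query", "dataset": "claims", "user": "bob", "timestamp": datetime(2024, 3, 2)},
    {"action": "download", "dataset": "claims", "user": "carol", "timestamp": datetime(2024, 3, 5)},
    {"action": "download", "dataset": "survey", "user": "alice", "timestamp": datetime(2024, 3, 6)},
]
events = downloads_for("claims", log, since=datetime(2024, 3, 1))
counts, flagged = review_downloads(events, approved_users={"alice"})
print(dict(counts))  # {'alice': 1, 'carol': 1}
print([e["user"] for e in flagged])  # ['carol']
```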

Violations

Has procedures for addressing violations of terms-of-use by users and data mismanagement by the repository.

Violations of the terms-of-use may lead to account suspension or revocation, as outlined in the terms. Additional technical controls are in place to prevent abuse or misuse of Redivis's systems.

As a matter of policy, Redivis aims to be as permissive as possible, recognizing that misuse is often the result of accidental behavior or misunderstanding. These controls are designed to protect the system for all users and are never intended to be punitive toward good-faith actors.

Request Review

Makes use of an established and transparent process for reviewing data access requests.

Administrators can define access requirements on any restricted dataset. These access requirements are transparent to all users, and researchers must apply and be approved for a given set of requirements in order to gain access. Requirements also have space for both administrators and applicants to leave comments specifically in the context of the data application.
