Data repository characteristics

Last updated 4 months ago


Overview

Redivis is designed to support data-driven research throughout the research lifecycle, and can serve as a permanent repository for your data, analytical workflows, and data derivatives.

In May 2022, the Subcommittee on Open Science (SOS) of the United States Office of Science and Technology Policy (OSTP) released a document outlining the "desirable characteristics of data repositories".

These characteristics are intended to help agencies direct Federally funded researchers toward repositories that enable management and sharing of research data consistent with the principles of FAIR data practices. Various agencies have adopted these guidelines, including the NIH.

Redivis is specifically designed to meet these desirable characteristics of a data repository, outlined below:

Desirable Characteristics for All Data Repositories

Unique Persistent Identifiers
  • Assigns a persistent identifier (PID), such as a DOI ✅
  • Identifier points to a persistent landing page ✅

Long-Term Sustainability
  • Plan for long-term data management ✅
  • Maintain integrity, authenticity, and availability of datasets ✅
  • Stable technical infrastructure ✅
  • Stable funding plans ✅
  • Contingency plans to ensure data are available and maintained ✅

Metadata
  • Datasets accompanied by metadata ✅
  • Aids in the easy discovery, reuse, and citation of datasets ✅
  • Schema appropriate to relevant data communities ✅

Curation and Quality Assurance
  • Provide or allow others to provide expert curation ✅
  • Quality assurance for accuracy and integrity of datasets and metadata ✅

Free and Easy Access
  • Broad, equitable, and maximally open access to datasets and metadata ✅
  • Access is free of charge in a timely manner consistent with privacy ✅

Broad and Measured Reuse
  • Makes datasets and metadata available to reuse ✅
  • Provides ability to measure attribution, citation, and reuse of data ✅

Clear Use Guidance
  • Provides documentation for access and use ✅

Security and Integrity
  • Prevents unauthorized access, modification, and release of data ✅

Confidentiality
  • Ensures administrative, technical, and physical safeguards ✅
  • Continuous monitoring of requirements ✅

Common Format
  • Download, access, and export available in non-proprietary formats ✅

Provenance
  • Ability to record the origin, chain of custody, and modification of data or metadata ✅

Retention Policy
  • Provides policy for data retention ✅

Additional Considerations for Human Data

Fidelity to Consent
  • Utilizes consistent consent ✅

Restricted Use Compliant
  • Enforces data use restrictions ✅

Privacy
  • Implements measures to protect data from inappropriate access ✅

Plan for Breach
  • Has a response plan for detected data breaches ✅

Download Control
  • Controls and audits access to and download of datasets ✅

Violations
  • Has procedures for addressing violations and data mismanagement ✅

Request Review
  • Process for reviewing data access requests ✅

Detailed information

Unique Persistent Identifiers

Assigns datasets a citable, unique persistent identifier, such as a digital object identifier (DOI) or accession number, to support data discovery, reporting, and research assessment. The identifier points to a persistent landing page that remains accessible even if the dataset is de-accessioned or no longer available.

Data can be uploaded to datasets within an organization, where every version of that dataset is assigned a DOI through the organization's DOI-issuing credentials. DOIs will always resolve to the URL of the dataset page. If a dataset is restricted or deleted, base metadata will remain available.

Long-Term Sustainability

Has a plan for long-term management of data, including maintaining integrity, authenticity, and availability of datasets; building on a stable technical infrastructure and funding plans; and having contingency plans to ensure data are available and maintained during and after unforeseen events.

Redivis uses highly available and redundant Google Cloud infrastructure to ensure data is stored securely and to the highest technical standards. Redivis maintains a formal disaster recovery and business continuity plan that is regularly exercised to ensure our ability to maintain availability and data durability during unforeseen events.

Redivis undergoes annual security audits and penetration testing by an external firm, and utilizes a formalized software development and review process to maintain the soundness of its technical infrastructure.

Funding for Redivis is provided by recurring annual subscriptions from its member academic institutions. This model provides consistent annual revenue to support the ongoing maintenance of the platform. Redivis is an employee-owned company without any external investors with an equity stake, allowing us to focus solely on the needs of our customers, our employees, and our mission of improving accessibility in research data science.

Metadata

Ensures datasets are accompanied by metadata to enable discovery, reuse, and citation of datasets, using schema that are appropriate to, and ideally widely used across, the community(ies) the repository serves. Domain-specific repositories would generally have more detailed metadata than generalist repositories.

All Redivis datasets contain extensive metadata and documentation. Some metadata fields (including summary statistics for all variables) are automatically generated. Some fields are optional depending on the editor's insight and preference. Every dataset has space for short and long-form text, supporting files, and links, alongside variable labels and descriptions.

Metadata is available in various machine-readable formats, such as schema.org and DataCite JSON, and can be viewed through the interface or downloaded via the API.

Because Redivis is a generalist repository, its metadata schema is intentionally broad and flexible, but specific groups on Redivis can choose to enforce more specific metadata standards within their datasets.

Curation and Quality Assurance

Provides, or has a mechanism for others to provide, expert curation and quality assurance to improve the accuracy and integrity of datasets and metadata.

All datasets on Redivis are owned either by an Organization (curated by any administrator) or by an individual User. Additional users may be added as editors to a dataset to provide further curation and quality assurance.

It is ultimately up to the editors of a dataset to provide curation, though Redivis is designed to support this process as much as possible. Redivis automatically computes checksums and runs fixity checks on all uploaded files, and computes univariate summary statistics of all variables to aid in the quality assurance process. Metadata completeness is also reported to editors, encouraging them to provide as much information as possible.

Free and Easy Access

Provides broad, equitable, and maximally open access to datasets and their metadata free of charge in a timely manner after submission, consistent with legal and ethical limits required to maintain privacy and confidentiality, Tribal sovereignty, and protection of other sensitive data.

Datasets and workflows can be explored on Redivis without any requirement to have an account or log in. Anyone who needs to apply to access restricted data, or to analyze and download data, will need to create an account to do so. All individual accounts are completely free and require no specific affiliation. Data access restrictions are set and maintained by the data owner.

Broad and Measured Reuse

Makes datasets and their metadata available with broadest possible terms of reuse; and provides the ability to measure attribution, citation, and reuse of data (i.e., through assignment of adequate metadata and unique PIDs).

The data owner can choose to publish any dataset publicly or set appropriate access restrictions based on the sensitivity of the data. Redivis imposes no additional limits on the availability and reuse of data. Any dataset that has a DOI can be cited and tracked in publications, and any reuse on Redivis is displayed on the dataset's usage tab.

Clear Use Guidance

Provides accompanying documentation describing terms of dataset access and use (e.g., particular licenses, need for approval by a data use committee).

Dataset owners can describe any usage agreements in their access requirements and have space to document any additional usage rules. Redivis imposes no additional limits on the availability and use of data.

Security and Integrity

Has documented measures in place to meet generally accepted criteria for preventing unauthorized access to, modification of, or release of data, with levels of security that are appropriate to the sensitivity of data.

Redivis is SOC2 certified and prioritizes technical security. There are multiple layers of administrative controls to make it clear what actions data owners are taking, and all actions, whether by an administrator or a researcher, are automatically logged for review.

Redivis is designed to handle workflows around the sharing of sensitive and high-risk data. It provides technical mechanisms to support the reuse of sensitive data when allowed, while enabling the enforcement of appropriate guardrails and access controls defined by data administrators.

Confidentiality

Has documented capabilities for ensuring that administrative, technical, and physical safeguards are employed to comply with applicable confidentiality, risk management, and continuous monitoring requirements for sensitive data.

Redivis is SOC2 certified and prioritizes technical security. All data is encrypted in transit and at rest, and stored on Google Cloud infrastructure that maintains robust technical and physical access controls.

In addition to an annual security audit, Redivis also undergoes annual penetration testing by an outside firm to further ensure the soundness of its security posture.

Common Format

Allows datasets and metadata downloaded, accessed, or exported from the repository to be in widely used, preferably non-proprietary, formats consistent with those used in the community(ies) the repository serves.

Data and metadata on Redivis can be imported and exported in multiple common formats. Data analysis is performed in common, generally open-source programming languages (SAS and Stata being available exceptions). Redivis does not introduce any of its own proprietary formats or programming languages.

Provenance

Has mechanisms in place to record the origin, chain of custody, and any modifications to submitted datasets and metadata.

All datasets contain Provenance documentation, which is automatically populated based on administrator actions. This information can be further edited or supplemented with additional related identifiers.

All modifications to a dataset are tracked in the administrative logs.

Retention Policy

Provides documentation on policies for data retention within the repository.

Redivis publishes a data retention policy. Data is always owned by the user or organization who uploads the dataset, and they have control over a dataset's presence and availability. The dataset owner may apply additional policies towards data retention.

Fidelity to Consent

Uses documented procedures to restrict dataset access and use to those that are consistent with participant consent and changes in consent.

Redivis provides extensive options to limit access on sensitive or restricted data, including allowing access on different levels (e.g. metadata, sample, data). Access is granted and revoked on Redivis instantly, allowing for immediate changes in access based on changing circumstances.

Access rules on Redivis are defined and enforced by the dataset owner.

Restricted Use Compliant

Uses documented procedures to communicate and enforce data use restrictions, such as preventing reidentification or redistribution to unauthorized users.

Redivis has built-in data restrictions available to administrators. Data download or export can be restricted completely or limited to administrator approval in order to prevent redistribution.

Data administrators can also communicate and collect formal acknowledgement of other use restrictions through requirements. Moreover, data administrators can easily audit the use of restricted data in the audit logs to further check for and limit any non-compliance or other misuse.

Privacy

Implements and provides documentation of measures (for example, tiered access, credentialing of data users, security safeguards against potential breaches) to protect human subjects’ data from inappropriate access.

Redivis has a built-in tiered access system that allows administrators to restrict access to sensitive data. These controls apply to data derivatives as well: any data output inherits the access rules of the source dataset(s) used to create that output. These controls on derivative data allow researchers to share analyses with colleagues without worrying about accidentally leaking sensitive information, since all collaborators will need to comply with the access rules in order to view those outputs.

Additional technical safeguards ensure data privacy. Data users can establish their identity through their institutional identity provider, and Redivis undergoes regular security audits to ensure the soundness of its security posture.

Plan for Breach

Has security measures that include a response plan for detected data breaches.

Redivis has detailed internal security protocols and documented security breach plans which are regularly exercised by technical personnel via tabletop exercises. All systems are continuously monitored for potential breaches, with immediate alert pathways and clear escalation protocols to respond to any breach.

Download Control

Controls and audits access to and download of datasets (if download is permitted).

Redivis has built-in data restrictions available to administrators. Data download or export can be restricted completely or limited to specific external systems and/or to administrator approval. All data downloads are logged for subsequent audit and review. Downloads are only available to authenticated users who have access to the underlying data.

Violations

Has procedures for addressing violations of terms-of-use by users and data mismanagement by the repository.

Violations of the terms-of-use may lead to account suspension or revocation, as outlined in the terms. Additional technical controls are in place to prevent abuse or misuse of Redivis's systems.

As a matter of policy, Redivis aims to be as permissive as possible, recognizing that misuse is often the result of accidental behavior or misunderstanding. These controls are designed to protect the system for all users, and are never intended to be punitive towards good-faith actors.

Request Review

Makes use of an established and transparent process for reviewing data access requests.

Administrators can define access requirements on any restricted dataset. These access requirements are transparent to all users, and researchers must apply and be approved for a given set of requirements in order to gain access. Requirements also have space for both administrators and applicants to leave comments specifically in the context of the data application.