Redivis Documentation

Create & manage datasets

Last updated 5 months ago

Overview

Datasets are a core component of Redivis. Each dataset is a versioned collection of tables containing data, alongside rich documentation and metadata.

Datasets can be hosted by organizations or individual users, and every dataset has its own Dataset page. Datasets can be shared with other users on Redivis according to their access configuration.

1. Create a dataset

Administrators can create datasets for their organization from the Datasets tab of their organization's Administrator panel. These datasets can be seen and managed by any administrator in the organization. Once released, they will be visible on the organization’s home page to anyone who has overview access to the dataset.

Alternatively, anyone with a Redivis account can create a dataset on the Datasets tab of their Workspace. These datasets are by default only visible to their owner, and have simplified options to support sharing with your collaborators.

When you first create a dataset, it will be unpublished and only visible to other editors. This means you can edit the dataset, validate its contents, and review or reconfigure its access rules before releasing it.

2. Import data

At the core of every dataset is the data it contains, so we recommend starting here.

All data in a dataset is stored in tables. You can create a new table on the Tables tab of your dataset and start importing data. Redivis can upload data from your computer or another location you’ve linked, such as Box, Google Drive, AWS, and Google Cloud.

Once your data has finished importing, you can validate that the table looks as you expect.

You can create more tables here if this dataset has multiple separate tables.

However, if your data is split across multiple files that all follow the same structure (such as one file per state or per year, with generally the same variables), import all of these files into the same table, where they will be automatically appended together.
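To picture what appending files into one table produces, here is a minimal Python sketch using only the standard library. It is not Redivis code: Redivis does the appending for you during import. The helper name `append_files`, the sample file names, and the choice to leave a variable empty when a file lacks it are all assumptions made for illustration.

```python
import csv
import io


def append_files(files):
    """Stack CSV files that share (roughly) the same variables into one table.

    `files` maps a file name to its CSV text. Hypothetical helper for
    illustration only; Redivis appends files for you on import.
    """
    columns = []  # union of variable names, in first-seen order
    rows = []
    for name, text in files.items():
        reader = csv.DictReader(io.StringIO(text))
        for col in reader.fieldnames or []:
            if col not in columns:
                columns.append(col)
        rows.extend(reader)
    # Assumption: a variable missing from one file is left empty (None)
    # for the rows that came from that file.
    return columns, [{col: row.get(col) for col in columns} for row in rows]


# Hypothetical per-year files with generally the same variables.
files = {
    "enrollment_2019.csv": "state,year,enrolled\nCA,2019,120\nNY,2019,95\n",
    "enrollment_2020.csv": "state,year,enrolled,dropped\nCA,2020,130,4\n",
}
columns, table = append_files(files)
print(columns)
print(len(table))
```

In this sketch the combined table has one row per input row and the union of the variables, which is why files that describe unrelated entities belong in separate tables instead.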

3. Populate metadata

Metadata is essential to helping your researchers find and utilize your dataset. While some metadata will be generated automatically, such as variable summary statistics and counts, other metadata will require additional input.

Dataset metadata

On the Overview tab of the dataset there are multiple suggested sections to help break down information. You can fill out any of these that apply, such as the methodology, tags, contact information, etc.

Redivis will automatically generate citation and provenance information based on what we know about the dataset, but you can update this information with anything more specific.

If you have additional information that you want to include that doesn't fit one of these headers, you can create a custom section. Custom sections can also be set to be visible only to certain access levels if you have sensitive information.

Table metadata

You should also populate the metadata on each table. Tables can have a description, as well as an entity field that defines what each row in the table represents. You can also define the temporal and geographic range on the table, when relevant.

Variable metadata

Each variable within a table has its own metadata. The variable name and type will be pre-determined from your data, but you should add a short label and longer description to each variable to help researchers understand what that variable measures.

Additionally, some variables will contain coded values, in which case you should provide value labels that represent the human-readable term for each code.
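As a rough illustration of what value labels do, the Python sketch below maps codes to human-readable terms and lists any codes that still lack a label. The variable, the codes, and the helper names are hypothetical and do not come from Redivis; a check like this can be handy before you enter labels in the dataset editor.

```python
# Hypothetical coded variable and value labels, for illustration only.
value_labels = {"1": "Employed", "2": "Unemployed", "9": "Not reported"}
codes = ["1", "9", "2", "1", "3"]


def label_values(codes, value_labels):
    """Return the label for each code, keeping the raw code when no label is defined."""
    return [value_labels.get(code, code) for code in codes]


def unlabeled_codes(codes, value_labels):
    """List codes that appear in the data but have no value label yet."""
    return sorted(set(codes) - set(value_labels))


labeled = label_values(codes, value_labels)
missing = unlabeled_codes(codes, value_labels)
print(labeled)
print(missing)
```

Here the code "3" appears in the data without a label, so it is reported as missing and shown unchanged in the labeled output.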

4. Release

Once you are ready to make your dataset available to others, you'll need to release it. You can click the Review and publish or Review and release button in the top right of the page.

You'll want to double check all of your data before moving forward. While you can continue to edit the documentation and metadata after the version is released, the data in a version cannot be changed.

You can unrelease a version for up to 7 days after release, though this should generally be avoided. When updating your data to correct for mistakes, you’ll need to release a new version.
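The 7-day window can be expressed as a simple date check. This Python sketch is an illustration, not Redivis behavior: the function name is hypothetical, and treating the boundary as inclusive is an assumption, since the guide does not state the exact cutoff.

```python
from datetime import datetime, timedelta, timezone

# From the guide: a version can be unreleased for up to 7 days after release.
UNRELEASE_WINDOW = timedelta(days=7)


def can_unrelease(released_at, now):
    """Return True while `now` falls inside the unrelease window.

    Assumption: the boundary is inclusive; the exact cutoff is not documented here.
    """
    elapsed = now - released_at
    return timedelta(0) <= elapsed <= UNRELEASE_WINDOW


released_at = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)  # hypothetical release time
print(can_unrelease(released_at, datetime(2024, 3, 5, 12, 0, tzinfo=timezone.utc)))
print(can_unrelease(released_at, datetime(2024, 3, 9, 12, 0, tzinfo=timezone.utc)))
```

Once the window has passed, the only way to correct the data is to release a new version.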

You should also confirm the access settings you set up when creating this dataset.

Once this dataset is released, it will become visible and available to anyone who would qualify for access.

Next steps

Edit data

You can use tools right on Redivis to create new versions of your dataset. Learn more in the Edit data in a dataset guide.

Administer your organization

Organizations allow for groups, centers, and institutions to more easily work with data by providing administrators with tools to effectively version and distribute data from a central location. Contact us to set up a new organization, and learn more in the Administer an organization guide.

Learn more in the Upload tabular data as tables guide.

Learn more in the Create and populate a dataset guide.