Cleaning tabular data

Last updated 5 months ago

Overview

Redivis datasets are a great place to host data for analysis in a workflow, but you can also edit the underlying data.

Perhaps you found an issue with the source, or want to restructure it, before making it available to others. Redivis has all the tools you'll need to do this in a versioned, step-by-step process with transparency into the changes you've made.

1. Upload the raw data

If you haven't yet, upload the data you want to work with to a Redivis dataset and release the version. A personal dataset or one belonging to an organization works just as well.

2. Add this dataset to a workflow

Create a new workflow and add this dataset to it.

If you want to share the data transformation process with others for transparency, you can make this workflow public in the share modal. (Making the workflow public does not expose the data itself: only people with access to the underlying data will be able to see the data in the workflow.)

For this guide, you can follow along in this workflow: Demo tables edits.

3. Transform data

Select the table you want to make changes to and create a new transform. Use this transform to edit the table. Some common actions include:

Example edit 1: Rename and retype a variable

Create new steps to rename and retype any variables you'd like to update.
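For readers who prefer to think in SQL, a rename plus retype amounts to a cast with an alias. The sketch below runs that logic locally in SQLite through Python's standard library. It is only an illustration: the table and variable names (trips, passenger_count, n_passengers) are hypothetical, and the SQL dialect and type names available in a Redivis transform may differ from SQLite's.

```python
import sqlite3

# Hypothetical source table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (trip_id INTEGER, passenger_count TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [(1, "2"), (2, "1"), (3, "4")],
)

# Retype passenger_count from string to integer and rename it to n_passengers.
rows = conn.execute(
    """
    SELECT trip_id,
           CAST(passenger_count AS INTEGER) AS n_passengers
    FROM trips
    ORDER BY trip_id
    """
).fetchall()

print(rows)
```

The original passenger_count column is simply not selected, which plays the same role as leaving the old variable out of the output table.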

Example edit 2: Recoding the values of a variable

Create a new variable with the same name as the original. "Keep" the new variable and "Discard" the original. Set the method to "Case (if/else)" and define the conditions that map each original value to its recoded value.
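The recoding logic can be sketched as a SQL CASE expression, here run in SQLite from Python. This is an assumption-laden illustration rather than the transform Redivis generates: the recoded labels ("stored", "not stored") are hypothetical, and the table is a tiny stand-in for the real data.

```python
import sqlite3

# Tiny stand-in table; values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (trip_id INTEGER, store_and_fwd_flag TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [(1, "Y"), (2, "N"), (3, None)],
)

# Recode each original value; anything unmatched (including NULL) becomes NULL.
# The new column reuses the original name, so it replaces the old one in the output.
recoded = conn.execute(
    """
    SELECT trip_id,
           CASE
               WHEN store_and_fwd_flag = 'Y' THEN 'stored'
               WHEN store_and_fwd_flag = 'N' THEN 'not stored'
               ELSE NULL
           END AS store_and_fwd_flag
    FROM trips
    ORDER BY trip_id
    """
).fetchall()

print(recoded)
```

Deciding explicitly what happens to unmatched values is worth doing for any recode, since a forgotten category would otherwise turn into missing data without warning.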

Example edit 3: Add a new variable

This example shows a new variable with today's date as the "date_uploaded", but you could create any variable you want from the data in the table. Perhaps an aggregation that would be helpful to see with this data? Or the sum of multiple other variables?
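As a rough illustration of both ideas, the sketch below adds a date_uploaded column holding today's date and a second column that sums two existing variables. It uses SQLite through Python's standard library. The fare, tip, and total_paid names are hypothetical, and a Redivis transform would use its own date functions instead of a value passed in from Python.

```python
import sqlite3
from datetime import date

# Hypothetical source table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (trip_id INTEGER, fare REAL, tip REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [(1, 10.0, 2.5), (2, 7.5, 0.0)],
)

# Today's date as an ISO string, passed in as a query parameter.
today = date.today().isoformat()

# Keep every original column and append the two new variables.
rows = conn.execute(
    """
    SELECT trip_id,
           fare,
           tip,
           fare + tip AS total_paid,
           ? AS date_uploaded
    FROM trips
    ORDER BY trip_id
    """,
    (today,),
).fetchall()

print(rows)
```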

You can transform this table however you want, using the graphical interface or SQL code. Make sure to move all variables in this table from the "Discard" section to the "Keep" section, except for any original variables that you have replaced (in this case, store_and_fwd_flag from the source table).

Validate that this new table looks correct by looking at the output table below this transform!
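If you export a sample of the source and output tables, a few of these checks can also be scripted. The sketch below is a hypothetical helper, not part of Redivis. It assumes the edits were limited to renaming, retyping, recoding, or adding variables, so the row count should be unchanged, and it treats each row as a plain Python dictionary.

```python
def validate_output(source_rows, output_rows, expected_columns, allowed_values=None):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    allowed_values = allowed_values or {}

    # Renames, retypes, recodes, and new variables should not add or drop rows.
    if len(output_rows) != len(source_rows):
        problems.append(f"row count changed: {len(source_rows)} -> {len(output_rows)}")

    for index, row in enumerate(output_rows):
        # Every output row should carry exactly the variables that were kept.
        if set(row) != set(expected_columns):
            problems.append(f"row {index}: unexpected columns {sorted(row)}")
        # Recoded variables should only contain the values they were mapped to.
        for column, allowed in allowed_values.items():
            if row.get(column) not in allowed:
                problems.append(f"row {index}: {column}={row.get(column)!r} not allowed")

    return problems


# Illustrative data only; the recoded labels are hypothetical.
source = [
    {"trip_id": 1, "store_and_fwd_flag": "Y"},
    {"trip_id": 2, "store_and_fwd_flag": "N"},
]
output = [
    {"trip_id": 1, "store_and_fwd_flag": "stored"},
    {"trip_id": 2, "store_and_fwd_flag": "not stored"},
]

problems = validate_output(
    source,
    output,
    expected_columns=["trip_id", "store_and_fwd_flag"],
    allowed_values={"store_and_fwd_flag": {"stored", "not stored"}},
)
print(problems)
```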

Make changes to any other tables in this dataset in this same workflow.

4. Re-upload the finished tables

Go back to the original dataset, and create a new version.

Open a table that you changed in your workflow, and click "Import data."

Set the merge strategy to "Replace", since you want to replace the existing table with the new one you created.

Select the data source as "Redivis" and type in the reference to the table in the workflow where you made your changes. The reference has three parts, in this order:

  1. Your user name

  2. The name of the workflow (underscores replace spaces)

  3. The table name (underscores replace spaces)

For example: username.testworkflow.table_name
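The three-part reference can be assembled mechanically. The helper below is a hypothetical convenience function, not a Redivis API. It applies only the rule stated above (underscores replace spaces), so any other normalization Redivis may apply to names is not covered.

```python
def redivis_table_reference(username, workflow_name, table_name):
    """Join user name, workflow name, and table name into a dotted reference."""
    parts = [
        username,
        workflow_name.replace(" ", "_"),  # underscores replace spaces
        table_name.replace(" ", "_"),     # underscores replace spaces
    ]
    return ".".join(parts)


reference = redivis_table_reference("username", "testworkflow", "table name")
print(reference)
```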

Once the import finishes, validate that the data looks as expected.

Now you can close this table and repeat this process for any other tables in this dataset you've edited.

Once this is finished, you can add the link to the workflow to this dataset's documentation or release notes, along with a note on the changes. Remember that if you made the workflow public, anyone who has data access to this dataset can view the changes you made.

Redivis stores data as compactly as possible, so if you are concerned about storage costs, note that only newly created records add to your storage use. If you would like to delete the first version of the data, you can do so by opening the version modal in the dataset editor and clicking "Delete version".

Now release this version! This new version of the dataset contains the edited data. Anyone using this data in a workflow will see that the dataset has a new version the next time they open their workflow. You can see the release notes and the updated table of our Demo tables updates live.

Next steps

Start working with your data

Once your dataset is released, bring it into a workflow to transform and analyze it with fast tools right from your browser.

Learn more in the Analyze data in a workflow guide.