Cleaning tabular data
Redivis datasets are a great place to host data that you interrogate in a project, but you can also edit the underlying data itself.
Perhaps you found an issue with the source, or want to restructure it, before making it available to others. Redivis has all the tools you'll need to do this in a versioned, step-by-step process with transparency into the changes you've made.
If you haven't yet, upload the data you want to work with to a Redivis dataset and release the version. A personal dataset or one belonging to an organization works just as well.
For this guide we will use the publicly available Demo tables dataset.
Create a new project and add this dataset to it.
If you want to share the data transformation process with others for transparency, you can make this project public in the share modal. (Because of how Redivis handles access, only people with access to the underlying data will be able to see the data in the project, even if the project itself is public.)
To follow along with this guide, see the Demo tables edits project.
Select the table you want to make changes to and create a new transform.
Use this transform to edit this table. Some common actions include:
Create new steps to rename and retype any variables you'd like to update.
Create a new variable with the same name as the one you want to recode. "Keep" this new variable and "Discard" the original. Select "Case (if/else)" as the method and create the conditions that define the recoded values.
This example shows a new variable, "date_uploaded", that holds today's date, but you could create any variable you want from the data in the table. Perhaps an aggregation would be helpful to see alongside this data, or the sum of several other variables.
You can transform this table however you want, using the graphical interface or SQL code. Make sure to move every variable you want in the output table from the "Discard" section to the "Keep" section.
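If you choose SQL over the graphical interface, a recode combined with a new date variable could look roughly like the sketch below. It runs standard SQL through Python's built-in sqlite3 module purely so that it can be tried locally. The table name source_table, the columns id and sex, and the recoded values are all hypothetical, and the SQL dialect used in a Redivis transform may differ in its details.

```python
import sqlite3
from datetime import date

# Hypothetical sample rows standing in for a table in your project.
rows = [
    (1, "M"),
    (2, "F"),
    (3, "f"),
    (4, None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (id INTEGER, sex TEXT)")
conn.executemany("INSERT INTO source_table VALUES (?, ?)", rows)

# CASE is the SQL counterpart of a "Case (if/else)" step: the recoded column
# reuses the original name, and only the selected columns are kept.
query = """
SELECT
    id,
    CASE
        WHEN UPPER(sex) = 'M' THEN 'male'
        WHEN UPPER(sex) = 'F' THEN 'female'
        ELSE 'unknown'
    END AS sex,
    ? AS date_uploaded
FROM source_table
ORDER BY id
"""
today = date.today().isoformat()
cleaned = conn.execute(query, (today,)).fetchall()
conn.close()

for row in cleaned:
    print(row)
```

The ELSE branch catches missing or unexpected values, so every row in the output has a recoded value rather than silently dropping out.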
Validate that this new table looks correct by looking at the output table below this transform!
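On Redivis this check is visual, but it can help to write down what "looks correct" means before you inspect the output. The sketch below expresses a few such checks in plain Python over rows held in memory. The function validate_output, the column names, and the assumption that cleaning should not change the row count are all illustrative and are not part of Redivis.

```python
def validate_output(source_rows, output_rows, expected_columns, allowed_values):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []

    # Assumption: a cleaning transform should neither add nor drop rows.
    if len(source_rows) != len(output_rows):
        problems.append(
            f"row count changed: {len(source_rows)} -> {len(output_rows)}"
        )

    for i, row in enumerate(output_rows):
        # Every variable moved to "Keep" should be present in the output.
        missing = [c for c in expected_columns if c not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")

        # Recoded variables should only contain the values you recoded to.
        for column, allowed in allowed_values.items():
            if column in row and row[column] not in allowed:
                problems.append(f"row {i}: unexpected {column}={row[column]!r}")

    return problems


# Hypothetical rows: the second output row was not recoded.
source = [{"id": 1, "sex": "M"}, {"id": 2, "sex": "F"}]
output = [
    {"id": 1, "sex": "male", "date_uploaded": "2024-01-01"},
    {"id": 2, "sex": "F", "date_uploaded": "2024-01-01"},
]
problems = validate_output(
    source,
    output,
    expected_columns=["id", "sex", "date_uploaded"],
    allowed_values={"sex": {"male", "female", "unknown"}},
)
print(problems)
```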
Repeat this for any other tables in the dataset that you want to change, all within this same project.
Go back to the original dataset, and create a new version.
Open a table that you made changes to in your project, and click "Import data."
Choose "Replace" as your merge strategy, since you want to replace the existing table with the new one you created.
Select the data source as "Redivis" and type in the reference to the output table in the project where you made your changes. The sequence here will be:
Your username
The name of the project (underscores replace spaces)
The table name (underscores replace spaces)
For example: username.testproject.table_name
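The naming rule above can be sketched as a small helper. The function names are hypothetical, and the only normalization applied is the one the steps describe (spaces become underscores in the project and table names). Redivis may apply further rules, such as handling of case or special characters, that are not covered here.

```python
def to_reference_part(name: str) -> str:
    """Replace spaces with underscores, as described for project and table names."""
    return name.strip().replace(" ", "_")


def build_table_reference(username: str, project_name: str, table_name: str) -> str:
    """Join the three parts with dots: username.project_name.table_name."""
    return ".".join(
        [
            username.strip(),
            to_reference_part(project_name),
            to_reference_part(table_name),
        ]
    )


print(build_table_reference("username", "testproject", "table name"))
# username.testproject.table_name
```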
Once the import finishes, validate that the data looks as expected.
Now you can close this table and repeat this process for any other tables in this dataset you've edited.
Once this is finished, you can add the link to the project into this dataset's documentation or release notes along with a note on the changes. Remember that if you made the project public, anyone who has data access to this dataset can view the changes you made.
Now release this version! The new version of the dataset contains the edited data. Anyone using this data in a project will see that the dataset has a new version the next time they open their project. You can see the release notes and the updated table live in our Demo tables dataset.
Redivis stores data as compactly as possible, so if you are concerned about storage costs, only newly created records add to your storage needs. If you would like to delete the first version of the data, open the version modal in the dataset editor and click "Delete version."
Once your dataset is released, bring it into a project to transform and analyze it with fast tools that run from your browser.
Learn more in the Work with data in a project guide.