The project tool in Redivis allows you to clean, manipulate, and combine different datasets to produce reshaped and reduced tables for further analysis. The project query engine handles massive tables (tens of billions of records) with ease, and allows you to focus on rapid iteration as you explore your data and validate your data processing workflow.
The project for this walkthrough is publicly available here.
Navigate to the Projects tab of your workspace and click the New project button to create and open your project. You can also click the Add to project button on any dataset page to create a new project with that dataset in it.
A pop-up window will appear where you can name and describe your project, assign it to a study, and control user access.
Once in the project tool, click one of the +Add dataset buttons to search for and select your dataset of choice. You can work with an existing dataset on Redivis, such as the BLS Local Area Unemployment Statistics dataset, or create your own. Once you've added data to the project, you can start manipulating and querying any of the tables.
Select any of the tables within the dataset and click the +Transform button to begin querying the data. Transforms allow us to bring in new tables, filter and collapse rows, create variables, and combine several tables into one output table.
Within a transform, you can combine tables within and across datasets from different organizations. Let's add the US Gini Coefficient dataset, which is publicly available on Redivis, to this project and join it with our unemployment data.
You can perform various types of joins in Redivis, but today we will run an inner join on the county identifier variables. Before we run any transform, we must select which variables to propagate to our output table. Let's keep the county/state name, unemployment rate, population, and Gini index.
Clicking the Run button in the upper-right corner would normally run this transform, but here the transform is invalid because the county codes are of different types (one is a string while the other is an integer).
In Redivis, we can easily and dynamically retype and rename variables. We will control-click on the fipscounty variable in the Discard section and cast it as an integer.
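For readers who think in SQL, the join-plus-retype step can be sketched locally with Python's built-in sqlite3 module. This is only an analogy, not the Redivis interface or its query engine: the table names, every column other than fipscounty, and all of the values are invented for illustration. SQLite is also more lenient about comparing mixed types than a strict engine, so the explicit CAST is there to mirror the fix rather than because SQLite requires it.

```python
import sqlite3

# In-memory database with two tiny, made-up tables (values are placeholders).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE unemployment (county_code INTEGER, county_name TEXT, unemployment_rate REAL);
    CREATE TABLE gini (fipscounty TEXT, gini_index REAL);
    INSERT INTO unemployment VALUES
        (1001, 'County A', 4.2), (1003, 'County B', 5.1), (1005, 'County C', 3.8);
    INSERT INTO gini VALUES
        ('1001', 0.41), ('1003', 0.45), ('1007', 0.39);
""")

# Inner join on the county identifier, casting the string code to an integer
# so both sides of the join condition have the same type.
joined = conn.execute("""
    SELECT u.county_name, u.unemployment_rate, g.gini_index
    FROM unemployment AS u
    INNER JOIN gini AS g
        ON u.county_code = CAST(g.fipscounty AS INTEGER)
    ORDER BY u.county_code
""").fetchall()

for row in joined:
    print(row)
```

Only counties present in both tables survive the inner join, which is why the hypothetical County C (no Gini record) and code 1007 (no unemployment record) are absent from the result.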
We can now click Run and view the resulting output table in a Variables or Cells view. Here we can also click through some of the variable metadata and top-level summary statistics.
In addition to combining tables, transforms are a place where we can create new variables using a variety of different methods. For example, we can use the Case method to categorize counties as having either a small or large population (large being greater than 100,000).
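The logic of the Case method above resembles a SQL CASE expression. Here is a minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical counties table with an invented population column and placeholder values. It illustrates the categorization rule only, not how Redivis implements it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE counties (county_name TEXT, population INTEGER);
    INSERT INTO counties VALUES
        ('County A', 55000), ('County B', 100000), ('County C', 250000);
""")

# A county is 'large' only when its population is strictly greater than 100,000.
sized = conn.execute("""
    SELECT county_name,
           CASE WHEN population > 100000 THEN 'large' ELSE 'small' END AS population_size
    FROM counties
    ORDER BY county_name
""").fetchall()

for row in sized:
    print(row)
```

Note the boundary: a county with exactly 100,000 residents falls into the small category because the condition is "greater than", not "greater than or equal to".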
One of the most common operations in a transform is to reduce our source table to a subset of records that match our criteria. Using the filter interface, we can easily look at data for counties on the West Coast (California, Oregon, Washington, and Alaska).
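In SQL terms, this kind of filter corresponds to a WHERE clause with a list of accepted values. The sketch below uses Python's built-in sqlite3 module with a hypothetical counties table; the column names and rows are assumptions made for illustration, and only the four state names come from the walkthrough.

```python
import sqlite3

# The states named in the walkthrough's filter.
WEST_COAST = ("California", "Oregon", "Washington", "Alaska")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE counties (county_name TEXT, state TEXT);
    INSERT INTO counties VALUES
        ('County A', 'California'), ('County B', 'Texas'), ('County C', 'Oregon'),
        ('County D', 'Alaska'), ('County E', 'New York');
""")

# Build one '?' placeholder per state so the values are passed as parameters.
placeholders = ", ".join("?" for _ in WEST_COAST)
filtered = conn.execute(
    f"SELECT county_name, state FROM counties "
    f"WHERE state IN ({placeholders}) ORDER BY county_name",
    WEST_COAST,
).fetchall()

for row in filtered:
    print(row)
```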
On any table in the project, you can click the Download button in the upper-right corner to export the data in your preferred format. You can also reference the API information to integrate the data directly into your own software and visualization tools, such as JupyterLab, Observable, and Google Data Studio.
Learn more about exporting data in the export and integrations documentation.
Leveraging value lists to codify common sets of filter conditions
Writing custom SQL to perform complex and highly customized data transformations
Creating partition variables to analyze and collapse across common keys (e.g., average hospital length of stay by disease)
Sharing and collaborating with your peers in real time
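To make the partition idea from the list above concrete, here is a rough plain-Python sketch of "average hospital length of stay by disease": compute a value across all rows that share a key, then attach it to each row. The records and field names are invented for illustration, and this is not how Redivis computes partition variables internally.

```python
from collections import defaultdict

# Made-up hospital stays; all field names and values are placeholders.
stays = [
    {"patient": "p1", "disease": "flu", "length_of_stay": 2},
    {"patient": "p2", "disease": "flu", "length_of_stay": 4},
    {"patient": "p3", "disease": "asthma", "length_of_stay": 5},
]

# Group the lengths of stay by the partition key (disease).
by_disease = defaultdict(list)
for row in stays:
    by_disease[row["disease"]].append(row["length_of_stay"])

# One average per partition.
avg_stay = {disease: sum(days) / len(days) for disease, days in by_disease.items()}

# Attach the partition-level average to every row without collapsing the table.
with_partition = [
    {**row, "avg_stay_for_disease": avg_stay[row["disease"]]} for row in stays
]

for row in with_partition:
    print(row)
```

Keeping the per-partition average on each row lets you compare an individual record against its group, while the avg_stay dictionary on its own is the collapsed, one-row-per-key view.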