The project tool in Redivis allows you to clean, manipulate, and combine different datasets to produce reshaped and reduced tables for further analysis. The project query engine handles massive tables (tens of billions of records) with ease, and allows you to focus on rapid iteration as you explore your data and validate your data processing workflow.
Navigate to the Projects tab of your workspace and click the New project button to create and open your project. You can also click the Add to project button on any dataset page to create a new project with that dataset in it.
A pop-up window will appear where you can name and describe your project, assign it to a study, and control user access.
Once in the project tool, click one of the +Add dataset buttons to search for and select the dataset you want. You can work with an existing dataset on Redivis, such as the BLS Local Area Unemployment Statistics dataset, or create your own. Once you've added data to the project, you can start manipulating and querying any of its tables.
Select any table within the dataset and click the +Transform button to begin querying the data. Transforms let you filter and collapse rows, create new variables, bring in additional tables, and combine several tables into a single output table.
Within a transform, you can combine tables from the same dataset or from different datasets, including datasets owned by different organizations. Let's add the US Gini Coefficient dataset, which is publicly available on Redivis, to this project and join it with our unemployment data.
Redivis supports several types of joins; here we will run an inner join on the county identifier variables. Before running a transform, we must select which variables to keep in the output table. Let's keep the county and state names, unemployment rate, population, and Gini index.
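The inner join described above can be sketched in SQL. This is a minimal illustration, not how Redivis builds its queries: it uses Python's built-in sqlite3 module as a stand-in engine, and the table names (unemployment, gini), the county_fips column, and all values are hypothetical and made up. Only the fipscounty name comes from the walkthrough.

```python
import sqlite3

# In-memory database standing in for two Redivis tables (made-up values).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE unemployment (county_fips INTEGER, county_name TEXT, state TEXT,
                           unemployment_rate REAL, population INTEGER);
CREATE TABLE gini (fipscounty INTEGER, gini_index REAL);

INSERT INTO unemployment VALUES
  (1, 'Alpha County', 'CA', 4.1, 250000),
  (2, 'Beta County',  'OR', 5.3,  40000),
  (3, 'Gamma County', 'TX', 3.8, 120000);
INSERT INTO gini VALUES
  (1, 0.47),
  (2, 0.43);  -- county 3 has no Gini record
""")

# Inner join: only counties present in BOTH tables survive.
# The SELECT list plays the role of choosing which variables to keep.
rows = conn.execute("""
SELECT u.county_name, u.state, u.unemployment_rate, u.population, g.gini_index
FROM unemployment AS u
INNER JOIN gini AS g ON u.county_fips = g.fipscounty
ORDER BY u.county_fips
""").fetchall()

for row in rows:
    print(row)
```

Gamma County is dropped because it has no match in the Gini table; a left join would keep it, with a NULL Gini index.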
Clicking the Run button in the upper-right corner would normally run the transform, but here the transform is flagged as invalid because the two county codes have different types: one is a string and the other is an integer.
In Redivis, we can easily retype and rename variables on the fly. We'll Control-click the fipscounty variable in the Discard section and cast it as an integer.
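The cast can be sketched the same way, again with sqlite3 standing in for the Redivis engine. The zero-padded codes and Gini values below are made up. The sketch also hints at why the join was invalid: a string such as "00101" is not the same value as the integer 101 (in Python, `"00101" == 101` is `False`), so both keys need a common type before they can be matched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Gini table whose county code is stored as a zero-padded string.
conn.execute("CREATE TABLE gini (fipscounty TEXT, gini_index REAL)")
conn.executemany(
    "INSERT INTO gini VALUES (?, ?)",
    [("00101", 0.47), ("00202", 0.43)],  # made-up codes and values
)

# Retype fipscounty as an integer; the leading zeros disappear in the process.
casted = conn.execute(
    "SELECT CAST(fipscounty AS INTEGER) AS fipscounty, gini_index "
    "FROM gini ORDER BY fipscounty"
).fetchall()
print(casted)  # [(101, 0.47), (202, 0.43)]
```

Because casting to an integer drops leading zeros, the other table's county code must also be an integer for the keys to line up; the alternative would be to keep both as zero-padded strings.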
We can now click Run and view the resulting output table in the Variables or Cells view, where we can also browse variable metadata and top-level summary statistics.
In addition to combining tables, transforms let us create new variables using a variety of methods. For example, we can use the Case method to categorize each county as having a small or large population, where large means greater than 100,000.
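The Case method behaves like a SQL CASE expression. Here is a minimal sketch, with sqlite3 as a stand-in engine, a hypothetical counties table with made-up populations, and size_category as an illustrative name for the new variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counties (county_name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO counties VALUES (?, ?)",
    [("Alpha County", 250000), ("Beta County", 40000), ("Delta County", 100000)],
)

# CASE expression: 'large' only when population is strictly greater than 100,000.
sized = conn.execute("""
SELECT county_name,
       CASE WHEN population > 100000 THEN 'large' ELSE 'small' END AS size_category
FROM counties
ORDER BY county_name
""").fetchall()
print(sized)
```

Delta County, at exactly 100,000, is labeled small because the condition is strictly greater than. A county with a missing (NULL) population would also fall through to 'small'; add an explicit `WHEN population IS NULL` branch if that distinction matters.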
One of the most common operations in a transform is reducing the source table to the subset of records that match our criteria. Using the filter interface, we can easily restrict the data to counties on the West Coast (California, Oregon, Washington, and Alaska).
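The West Coast filter amounts to an IN condition on the state variable. This sketch assumes states are stored as two-letter postal abbreviations in a column named state (the walkthrough does not name it), uses made-up counties, and passes the state list as query parameters instead of pasting it into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counties (county_name TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO counties VALUES (?, ?)",
    [("Alpha County", "CA"), ("Beta County", "OR"), ("Gamma County", "TX"),
     ("Epsilon Borough", "AK"), ("Zeta County", "WA")],
)

# Keep only rows whose state is in the West Coast list.
west_coast = ("CA", "OR", "WA", "AK")
placeholders = ", ".join("?" for _ in west_coast)  # "?, ?, ?, ?"
kept = conn.execute(
    f"SELECT county_name, state FROM counties "
    f"WHERE state IN ({placeholders}) ORDER BY county_name",
    west_coast,
).fetchall()
print(kept)
```

Only the Texas county is removed. Keeping the state list in one named collection makes it easy to reuse the same condition across several filters.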
Once you have a table containing all the information you need, select it and click the +Notebook button to create a notebook that references this table. You can programmatically reference any other table in the notebook, as long as that table is within this project.
Notebooks are available in Python, R, Stata, and SAS (Stata and SAS require a license). Import any libraries you'd like to work with and write code that references this table.
You can share your in-progress work or finished results with collaborators by clicking your user picture in the top left of the project. You can then work on the project together in real time. Note that if any of the data in your project is restricted, your collaborators must also have access to that data to view the project's contents, and they may need to apply for access themselves.
If you'd like to use your Redivis work elsewhere, click the Download button in the top right of a table or notebook to export it in your preferred format.
You can export any non-restricted data either to your computer or to one of several integrations, including Google Data Studio.
You can also use the Redivis API to access data programmatically from external software and visualization tools, such as JupyterLab, Observable, or any other environment.
Learn more about exporting data in the export and integrations documentation.
Leverage value lists to codify common sets of filter conditions
Write custom SQL to perform complex and highly customized data transformations
Create partition variables to analyze and collapse across common keys (e.g., average hospital length of stay by disease)
Share and collaborate with your peers in real time
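Assuming partition variables behave like SQL window functions (an assumption on our part; the text above does not say how they are computed), the hospital example can be sketched with sqlite3 as a stand-in engine. The stays table, its columns, and the values are hypothetical and made up, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stays (patient_id INTEGER, disease TEXT, length_of_stay REAL)")
conn.executemany(
    "INSERT INTO stays VALUES (?, ?, ?)",
    [(1, "flu", 2), (2, "flu", 4),
     (3, "pneumonia", 5), (4, "pneumonia", 9), (5, "pneumonia", 7)],
)

# Window function: every row keeps its own values and also gets the
# average length of stay computed over all rows sharing its disease.
rows = conn.execute("""
SELECT patient_id, disease, length_of_stay,
       AVG(length_of_stay) OVER (PARTITION BY disease) AS avg_stay_for_disease
FROM stays
ORDER BY patient_id
""").fetchall()

for row in rows:
    print(row)
```

The window version keeps one row per stay; to collapse the table to one row per disease instead, use `GROUP BY disease` with `AVG(length_of_stay)`.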