No-code visualization

This guide demonstrates using a Redivis workflow to gather key variables from different tables, clean them, and and consolidate them into a single table for analysis in Google Looker, all using point and click interfaces.

Workflow objective

We'll use Gini coefficient data (a common measure of income distribution) from the American Community Survey (ACS) to build a map that shows the range of income inequality by US county in 2018.

This workflow is on Redivis! We also suggest you recreate this workflow as we go to best learn the process.

1. Explore data and start a workflow

We have already uploaded all relevant data for this workflow into a single dataset in the Redivis Demo organization called Example Workflow Dataset which contains contains three tables:

ACS Gini coefficient table
ACS table that measures population by county
National Weather Service dataset that maps a US county to lat/long coordinates.

Add this dataset to your workflow and click on the dataset node to see all of the available tables. Select Gini Index by County and can look at the data format by clicking Cells.

2. Clean data

We want to map the Gini estimate and population by US county in our final data visualization. To map data points by county, we will need a variable that represents the latitude, longitude coordinates of that county. We also want the state, county name, and five-digit county code.

Therefore, we want our final dataset to include six distinct variables:‌

Gini Estimate
State
County Name
County Code
Latitude, Longitude
Population

We will get Lat/Long and Population by joining our table with other tables later on in the workflow. For now, we can create a transform to start reshaping our data.

Rename: Gini Estimate

We already have this variable in our dataset: B19083_001E. We will create a Rename step to rename this to gini_estimate so we can keep better track of it.

New variables: County and State Names

We want to break up our NAME variable into two separate columns, county and state. So we will add the Create variables step and use the Regexp extract method.

This method allows users to create a new variable with characters that match a specific pattern of the old variable. For example, creating a new variable for county, we can select all characters in NAME preceding the comma. We can add a new block and do a similar process on the NAME variable to get the State.

New variable: County Code

The last five digits in the GEO_ID variable represent the county code. We create another new variable block and name our new variable County_code. Select the Substring method and select all values starting at index 10 up to a max length of five.

Variable selection

Finally, we choose which variables to keep or discard in our transform. We’ll keep all variables except for GEO_ID, NAME (county and state combined), and B19083_001M (the margin of error for the Gini estimate).

Click the run button in the upper right-hand corner to run this transform.

3. Sanity check the output table

Click on the newly generated output table to investigate the output of the transform. We can see that this table has the same number of records as our initial table and has the four variables we selected. If we click on the County and State variables we can see in the frequency tables that they look like we wanted them to.

Since this table looks like we expect we can move on to the next step! Otherwise we'd need to go back to the initial transform to change our inputs.

4. ‌Join geographic information

We now have a table with the Gini estimate, state, county name, and county code, but we need the latitude and longitude information in order to map each county and we also want to account for the size of each county with a population variable. To do this, we'll perform two joins.

We could continue to work in our initial transform but we are choosing to create new transforms for each join to keep steps separated conceptually and provide output tables to reference along the way.

Matching county to latitude, longitude data

First, we will match each county code to its lat/long coordinates by joining with the "County to Lat/Long Coordinates" table.

Create a new transform and add a Join step.

A Left Join returns the complete source table (Gini Index by County) and all matching records from the right table (County to Lat/Long). We set the County_code variable in the source table equal to the County_code variable in the right table, which matches a set of lat/long coordinates to each county code (if there is no coordinate, it will return Null).

When you set two variables equal to each other in a Join, they must be of the same type. If, for example, you set a string type variable equal to an integer type variable, you will have to retype the string as an int (or vice versa).

Some county codes, however, map to more than one set of latitude, longitude coordinates so we must create two new variables in this transform: unique_lat and unique_long.

For unique_lat, we will partition on all kept variables except LAT and take the average of the LAT variable. This says that for that each county, compute the average of all possible latitudes and store that average in the variable unique_lat.

We will select all variables from the source table as well as unique_lat and unique_long in our output table. Finally, we select "distinct" in our variable selector so that we will drop all records that are unique.

Running this transform outputs the following table, where each county now corresponds to a unique set of latitude and longitude variables.

Matching county to population data

We now have all the desired variables in our dataset except for population per county, which we will need when we create the final visualization.‌

Create a new transform and add a Join step.

Using the same steps from the previous join, we will perform a second join to match each county to its total population by incorporating the Population by County table (also in the original dataset). We select a Left Join and set the County_code variable in the source table (Gini Index by County) equal to the County_code variable in the right table (Population by County).

Selecting all variables to remain in our output table and running this transform outputs the following table.

5. Final data cleaning

We will be using Google Looker Studio to visualize this data so we will need to update our data to make it work smoothly with their specified format for geographic data. Using their reference sheet, we can see that we will need our latitude and longitude information in the format:

Comma separated latitude and longitude decimal values (e.g., "51.5074,-0.1278" specifies to London, England)

We will need to combine our two separate values and add a comma between them.

We can do that using the Concat method for creating a new variable, but that method only accepts string inputs so we'll first need to retyping the unique_lat and unique_long variables from float to string using the Retype step.

Then we use the Create new variables step to make a new variable named lattitude_longitude and use the Concat method to concatenate the unique_lat string, a string with a comma, and the unique_long string.

We can now discard unique_lat and unique_long and keep the combined variable, latitude_longitude.

Running this transform yields the final table. From here, we can edit variable metadata and/or download our new dataset in the format that we'd like.

6. Export data to Looker Studio

The next step would normally be to create a notebook in this workflow and use Python, R, or Stata to analyze this table. However if we want to create a quick visualization and aren't familiar with any of those coding languages we have easy options to export data to other systems such as Google Looker Studio.

For this example we will link this table directly to Google Looker Studio by clicking Export table on this table.

We could download our final table and the re-upload it into Looker Studio, but we choose to link the table through the Redivis connector so that if we come back to this original workflow and make changes they will be reflected in the visual that we're about to make there.

Follow the prompts of the connector to log in and authorize access, then we'll need to indicate the table we'd like to use. For that we'll need:

The owner of the workflow's ID (shown in the top menu bar of the workflow)
The name of the workflow (shown in the middle of the black workflow toolbar)
The name of the table (shown on the title of the table node)

Note that the connector won't recognize some symbols (such as :) so you might need to update one of them if there is an issue connecting.

When the table is being imported we will have the option to change the type for variables. We'll need to change the type of our lattitude_longitude variable from string to the Lattitude, Longitude geography type.

Then click Create report to get started!

7. Build a visualization

For this example we are going to build a bubble map that shows the size of population against the intensity of gini disparity in certain regions.

Get started by adding a chart and selecting the bubble map type. We will need to select Redivis as the data source for this map.

Then we will need to define which of our variables map to which parts of the visual. We will also want to change how they are aggregated, from SUM to AVG.

Location: lattitude_longitude
Tooltip: County
Size: (AVG) population
Color metric: (SUM) gini_estimate

The resulting map is automatically interactive. Users can hover over each bubble and view the county name, population, and Gini coefficient. The bubble size is determined by the county population and the bubble color is determined by the Gini value. View the interactive report:

Gini Index by US County 2018 (Example Project)Google Data Studio

You can continue to create a variety of visualizations in this report, including scatter plots, Google maps, and stacked bar charts.

Refer to the Looker Studio Help pages for additional specific guidance!

Next steps

Perhaps we see something in this workflow we want to tweak, or we want to go back and change some of our data decisions. Workflows are iterative and at any point you can go back and change our source data, our transform configuration or notebooks and rerun them.

Notebooks can also create output tables which allow you to sanity check the work we did in the notebook or perhaps create a table to use in another notebook or transform. You can also fork this workflow to work on a similar analysis, or export any table in this workflow for work elsewhere.

Last updated 6 months ago

Was this helpful?