Dataset nodes

Overview

A dataset node is a copy of a dataset in a project.

Dataset nodes display a list of the tables they contain. You can click on any table to view its contents, or click "Transform" to build a transform on it.

Adding dataset nodes to a project

Adding a dataset to a project creates a copy of that dataset as a dataset node at the top of your project tree. You can add a dataset to a project in either of two ways:

  • Click the Analyze in project button on a dataset page.

  • Click the Add dataset button in the top left of the project toolbar within a project.

Restrictions

No project can have two copies of the same version of the same dataset.

Samples

Some large datasets have 1% samples, which are useful for quickly testing query strategies before running transforms against the full dataset.

If a 1% sample is available for a dataset, it will be added to your project by default instead of the full sample. Samples are indicated by the dark circle icon at the top left of a dataset node in the left panel and in the list of the dataset's tables.

All sampled tables in the same dataset will be sampled on the same variable with the same group of values (so joining two tables in the same dataset with 1% samples will still result in a 1% sample).
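To illustrate why sampling every table on the same variable keeps joins consistent, here is a minimal sketch. It assumes a hash-based sampling rule and two made-up tables keyed by a hypothetical `patient_id` variable; the platform's actual sampling mechanism is not described here, so treat `in_sample` and the table contents as illustrative only.

```python
import hashlib


def in_sample(value, percent=1):
    """Hypothetical rule: keep a value if its hash lands in the lowest `percent` of 100 buckets."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


# Two made-up tables in the same dataset, both containing patient_id.
patient_ids = range(10_000)
patients = [{"patient_id": pid, "age": 20 + pid % 60} for pid in patient_ids]
visits = [{"patient_id": pid, "visit": n} for pid in patient_ids for n in range(2)]

# Sample both tables on the same variable, using the same rule.
patients_sample = [row for row in patients if in_sample(row["patient_id"])]
visits_sample = [row for row in visits if in_sample(row["patient_id"])]

# Inner join of the two sampled tables on patient_id.
patients_by_id = {row["patient_id"]: row for row in patients_sample}
joined = [
    {**visit, **patients_by_id[visit["patient_id"]]}
    for visit in visits_sample
    if visit["patient_id"] in patients_by_id
]

# Every sampled visit finds its patient, so the join covers the same sampled group.
joined_ids = {row["patient_id"] for row in joined}
```

Because both tables keep exactly the same group of `patient_id` values, no sampled row loses its match in the join. If each table were instead sampled independently at 1%, only about 0.01% of matching pairs would be expected to survive.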

To switch to the full sample, click the "Sample" button in the top right of the menu bar when you have a dataset selected.

Your downstream transforms and tables will become stale, since an upstream change has been made. You can run these nodes individually to update their contents, or use the run all functionality by clicking on the project's name in the top menu bar.

Versions

When a new version of a dataset is released by an administrator, the corresponding dataset node on your project tree will become purple. To upgrade the dataset's version, click the "Version" button in the top right of the menu bar when you have a dataset selected.

From there, you can view version diffs and select whichever version you want to use.

After updating, your downstream transforms and tables will become stale. You can run these nodes individually to update their contents, or use the run all functionality by clicking on the project's name in the top menu bar.
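The way a change to a dataset node (switching its sample or its version) makes everything downstream stale can be pictured as a walk over the project tree. The sketch below is only an illustration under assumed names: the `downstream` mapping, the node names, and `mark_stale` are hypothetical and do not reflect how the platform tracks state internally.

```python
from collections import deque

# Hypothetical project tree: each node maps to the nodes directly downstream of it.
downstream = {
    "dataset": ["transform_a"],
    "transform_a": ["table_a"],
    "table_a": ["transform_b"],
    "transform_b": ["table_b"],
    "table_b": [],
    "other_dataset": ["transform_c"],
    "transform_c": [],
}


def mark_stale(changed_node, downstream):
    """Return every node reachable downstream of changed_node, found breadth-first."""
    stale = set()
    queue = deque(downstream.get(changed_node, []))
    while queue:
        node = queue.popleft()
        if node in stale:
            continue  # already visited through another path
        stale.add(node)
        queue.extend(downstream.get(node, []))
    return stale


# Changing "dataset" affects only the nodes that descend from it.
stale_nodes = mark_stale("dataset", downstream)
```

In this sketch, nodes that descend from `other_dataset` are left alone, which mirrors the idea that only nodes downstream of the changed dataset need to be re-run.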

Dataset node states

As you work in a project, node colors and symbols will change in the tree view to help you keep track of your progress.

State | Display | Details
Sampled | Black circle with 1% icon | You are using a 1% sample of the data. When a dataset has a sample, it will default to that sample when added to a project. You can change this to the full sample and back at any time in the dataset node.
Outdated version | Purple background | You are not using the latest version of the dataset. Either you have intentionally switched to an older version, or the dataset's administrator has released a new version that you can upgrade to.
Incomplete access | All black background, or dashed borders | You don't have full access to the node. Click the Incomplete access button in the top bar to begin applying for access to the relevant datasets.
