Dataset nodes

A dataset node is a copy of a dataset in a project.

Add a dataset to the project

To work with additional datasets within a project, click + Dataset, either in the minimap (when the project is empty) or on the left side of the top bar.

In the modal, select a dataset to add it to your project. A project can hold only one copy of a given dataset at a time, so datasets that are already in your project may not be selectable. You can, however, have different versions of the same dataset in a project simultaneously.

Dataset nodes display a list of the tables they contain. You can click on any table to view its contents, or click "Query" to build a transform on it.

Samples

Some large datasets provide 1% samples, which are useful for quickly testing query strategies before running transforms against the full dataset.

If a 1% sample is available for a dataset, it is added to your project by default instead of the full dataset. Samples are indicated by a dark circle icon at the top left of the dataset node in the left panel, and in the dataset's list of tables.

All sampled tables in a dataset are sampled on the same variable, using the same set of values. As a result, joining two sampled tables from the same dataset still yields a 1% sample.
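The reason joins stay consistent can be shown with a small Python sketch. It assumes a hash-based selection rule applied to a shared key column (the `patient_id` field and the `in_sample`, `sample`, and `join` helpers are hypothetical, for illustration only); this is not necessarily how the platform draws its samples. Because every table keeps exactly the rows whose key value falls in the same sampled set, joining two sampled tables gives the same rows as sampling the joined result.

```python
import hashlib

def in_sample(key: str, percent: int = 1) -> bool:
    """Hypothetical rule: hash the key value into one of 100 buckets and
    keep it if the bucket falls below `percent`. The rule is deterministic,
    so the same value is kept (or dropped) in every table."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent

def sample(rows, key_field, percent=1):
    """Keep only rows whose sampling-variable value is in the sample."""
    return [r for r in rows if in_sample(r[key_field], percent)]

def join(left, right, key_field):
    """Simple inner join of two lists of dicts on `key_field`."""
    index = {}
    for r in right:
        index.setdefault(r[key_field], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key_field], [])]
```

For example, with a `patients` table and a `visits` table that share `patient_id`, `join(sample(patients, "patient_id"), sample(visits, "patient_id"), "patient_id")` contains the same rows as `sample(join(patients, visits, "patient_id"), "patient_id")`.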

To switch to the full dataset, click the "Sample" button in the top right of the menu bar while the dataset is selected.

Because an upstream change has been made, your downstream transforms and tables will become stale. You can run these nodes individually to update their contents, or use the run all option by clicking the project's name in the top menu bar.
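Staleness follows the project graph: every node reachable from the changed dataset is affected, and each stale node has to be re-run after the nodes upstream of it. The Python sketch below (3.9+, for `graphlib`) illustrates this under the assumption that a project can be modeled as a hypothetical adjacency map of node names; `downstream_of` and `run_order` are illustrative helpers, not an API the platform exposes, and the ordering is only one plausible order a run-all pass could follow.

```python
from collections import deque
from graphlib import TopologicalSorter

def downstream_of(edges, changed):
    """Return every node reachable from `changed`.

    `edges` is a hypothetical map of node -> nodes that consume its output.
    All reachable nodes become stale when `changed` is modified."""
    stale, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, ()):
            if child not in stale:
                stale.add(child)
                queue.append(child)
    return stale

def run_order(edges, stale):
    """Order the stale nodes so each runs after everything upstream of it."""
    # TopologicalSorter expects node -> predecessors, so invert the edges.
    preds = {}
    for parent, children in edges.items():
        for child in children:
            preds.setdefault(child, set()).add(parent)
    order = TopologicalSorter(preds).static_order()
    return [n for n in order if n in stale]
```

Nodes that do not depend on the changed dataset are never marked stale, which is why running only the affected nodes individually is also a valid way to bring a project up to date.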

Versions

When an administrator releases a new version of a dataset, the corresponding dataset node in your project minimap turns purple. To upgrade the dataset's version, click the "Version" button in the top right of the menu bar while the dataset is selected.

From there, you can select the version you want to use, or view the full version history on the dataset page.

After updating, your downstream transforms and tables will become stale. You can run these nodes individually to update their contents, or use the run all option by clicking the project's name in the top menu bar.