Table nodes in your project are either generated by running a transform or are part of a dataset you've added. The main purpose of table nodes in projects is to store data that you can sanity check and further query.
You can create multiple transforms to operate on a single table, which allows you to create various branches within your project. To create a new transform, select the table or dataset and click the small + icon that appears under the bottom right corner of the node.
After you run a transform, you can investigate the downstream output table to get feedback on the success and validity of your querying operation – both the filtering criteria you've applied and the new features you've created.
Understanding the content of an output table allows you to perform important sanity checks at each step of your research process, answering questions like:
- Did my filtering criteria remove the rows I expected?
- Do my new variables contain the information I expect?
- Does the distribution of values in a given variable make sense?
- Have I dropped unnecessary variables?
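The questions above can also be checked programmatically. The following is a minimal sketch in Python using pandas, assuming the table before and after a transform are both available as DataFrames. The data, the column names (patient_id, age, bmi, notes), and the sanity_report helper are all hypothetical and are not part of the project tool.

```python
import pandas as pd

# Hypothetical table before the transform; every column name here is made up.
source = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5],
    "age": [34, 17, 52, 15, 71],
    "height_cm": [170.0, 160.0, 180.0, 150.0, 165.0],
    "weight_kg": [70.0, 50.0, 90.0, 45.0, 60.0],
    "notes": ["a", "b", "c", "d", "e"],
})

# Stand-in for a transform: keep adults, derive a BMI variable,
# and drop a variable that is no longer needed.
output = source[source["age"] >= 18].copy()
output["bmi"] = output["weight_kg"] / (output["height_cm"] / 100) ** 2
output = output.drop(columns=["notes"])


def sanity_report(before, after, new_columns, dropped_columns):
    """Summarize how a transform changed a table (hypothetical helper)."""
    return {
        # Did my filtering criteria remove the rows I expected?
        "rows_removed": len(before) - len(after),
        # Do my new variables exist, and how many values are missing?
        "new_columns_present": all(c in after.columns for c in new_columns),
        "new_columns_null_counts": {
            c: int(after[c].isna().sum()) for c in new_columns
        },
        # Have I dropped unnecessary variables?
        "dropped_columns_absent": all(
            c not in after.columns for c in dropped_columns
        ),
    }


report = sanity_report(source, output, ["bmi"], ["notes"])
print(report)

# Does the distribution of values in a given variable make sense?
print(output["bmi"].describe())
```

Comparing the row count before and after the filter, and scanning the summary statistics of a new variable, covers the same ground as inspecting the output table by eye.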
If you haven't interacted with tables in your project for a while, these tables may become archived, which will temporarily limit your ability to view cells and query data in that table. This is done to prevent runaway storage costs, while leveraging the built-in reproducibility of projects to allow you to unarchive the table and pick up where you last left off.
The archival algorithm prioritizes tables that are large, quick to regenerate, and intermediary (not at the bottom of the tree). It currently does not archive tables smaller than 1 GB, so in many cases you may never interact with archived tables.
Note that if the transform immediately upstream (or any additional upstream transform, if multiple sequential tables are archived) is invalid, you'll have to resolve the invalid state before unarchiving the table.
If you've added a dataset that contains files (storage for non-tabular data) to a project, you will see a table with a File index label in that dataset's list of tables. This is an automatically generated table in which every row represents one file. You can work with this table just like any other in the project tool, but the file_id variable will remain linked to the files in that dataset for use in a notebook.