Table concepts

Overview

Tables are the "container" for all data on Redivis. They are made of rows and columns, where each row represents an individual entry or observation, and each column represents a variable.

Tables belong to either a dataset or a workflow. In datasets, tables are created by uploading data. In workflows, tables are created as the output of a transform or notebook.

When exploring a table, you will be able to switch between the following tabs:

Tables can be used for analysis within a Workflow. Alternatively, they can be exported for analysis in other environments.

Embedded view of a table with 2.7B records.

Table characteristics

| Field | Notes |
| --- | --- |
| Name | The table's name. If in a dataset, must be unique across all tables for that version of the dataset. If in a workflow, must be unique across all tables currently in the workflow. |
| Description | Optional. A free-form description of the table's contents. May not exceed 5000 characters. |
| Bibliography | This table's citation, and any recorded related identifiers. |
| Variable count | Total number of variables in the table. |
| Row count | Total number of rows, or records, in the table. |
| Size | Total size of the table, in bytes. |
| Entity | Optional. The concept that one record in this table represents. For example, the table's entity might represent a unique patient, or a specific hospitalization, or a prescription. |
| Temporal range | Optional. The range of time that this table covers. This can either be set manually, or calculated from the min/max of a particular variable. If calculated from a variable, that variable must have type date, dateTime, or integer. If the variable is an integer, its values will be assumed to represent a year and must be in the range [0, 9999]. |
| Sample | If this table is sampled, you will see a marker for whether you are looking at the full dataset or the 1% sample. To interact with sampled tables, add the dataset to a workflow. |
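The temporal range rule above (min/max of a date, dateTime, or integer variable, with integers read as years in [0, 9999]) can be sketched in plain Python. This is an illustration only, not Redivis's implementation: the function name `temporal_range` is hypothetical, and skipping null values is an assumption.

```python
from datetime import date, datetime


def temporal_range(values):
    """Return (min, max) of a variable's values, or None if every value is null."""
    # Assumption: null values are ignored when computing the range.
    present = [v for v in values if v is not None]
    if not present:
        return None

    kinds = {type(v) for v in present}
    if kinds == {int}:
        # Integers are interpreted as years and must fall within [0, 9999].
        for year in present:
            if not 0 <= year <= 9999:
                raise ValueError(f"year out of range [0, 9999]: {year}")
    elif kinds not in ({date}, {datetime}):
        raise TypeError("values must all be date, all datetime, or all int")

    return min(present), max(present)
```

For example, `temporal_range([1999, 2005, None, 1987])` yields the year range 1987 to 2005, while a value such as 10000 is rejected.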

Table types

All data on Redivis is stored within a table, including geospatial data and unstructured files:

Tabular

Tabular data is, unsurprisingly, stored within a table on Redivis, with its rows and columns mapped to the table's rows and variables. Table contents can be viewed in the cells viewer, queried, downloaded, and read into various programming interfaces as a tabular data frame.

Geospatial

Geospatial data is also stored within tables on Redivis. Each row maps to a geospatial feature, with various feature metadata encoded as variables, alongside a geometry variable that encodes the actual feature (which could be a point, line, polygon, multi-polygon, etc.).

When viewing the cells, you can preview a given geographic feature by hovering (or clicking) on the value in the cells view. Geospatial tables can also be queried (taking advantage of various geography methods), exported, or read as a geospatial data frame in various programming interfaces.

Embedded view of a table with geospatial data
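To make the row-to-feature mapping concrete, here is a small sketch that turns one table row into a GeoJSON-style feature. It assumes the geometry variable holds well-known text (WKT) and handles only points; the row contents, the variable names, and both helper functions are hypothetical and are not part of any Redivis interface.

```python
# Hypothetical rows: each row is one feature, and "geometry" holds WKT text.
rows = [
    {"name": "Station A", "elevation_m": 12, "geometry": "POINT(-122.17 37.43)"},
    {"name": "Station B", "elevation_m": 48, "geometry": "POINT(-121.89 37.34)"},
]


def point_wkt_to_geojson(wkt):
    """Parse a 'POINT(x y)' string; other geometry types are out of scope here."""
    text = wkt.strip()
    if not text.upper().startswith("POINT"):
        raise ValueError(f"only POINT geometries are handled in this sketch: {wkt}")
    x, y = (float(c) for c in text[text.index("(") + 1 : text.rindex(")")].split())
    return {"type": "Point", "coordinates": [x, y]}


def row_to_feature(row, geometry_variable="geometry"):
    """Split a row into the feature's geometry and its metadata (all other variables)."""
    properties = {k: v for k, v in row.items() if k != geometry_variable}
    return {
        "type": "Feature",
        "geometry": point_wkt_to_geojson(row[geometry_variable]),
        "properties": properties,
    }


features = [row_to_feature(row) for row in rows]
```

In practice a geospatial data frame library would do this parsing for you; the sketch only shows how one row carries both the feature and its metadata.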

Files

When data files aren't inherently tabular or geospatial (e.g., a collection of 1 million images), they can still be uploaded to a dataset. In this case, these individual data files are also represented within a table, often referred to as a "file index table". Each record in the table represents a single file, with a globally unique file_id variable, as well as other variables containing metadata about the file.

When viewing the cells, you can preview a given file by hovering (or clicking) on the file_id value. The metadata in file index tables can also be queried, potentially allowing you to join on file names, or extract a subset of files based on certain characteristics. Finally, these files can be read and downloaded via various programming interfaces, or exported via the interface.

Embedded view of a file index table
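As a rough illustration of querying a file index table, the sketch below models the table as a list of records and shows the two operations mentioned above: selecting a subset of files by their metadata, and joining on file names. Only `file_id` comes from the text; the `file_name` and `size` variables, the `labels` table, and the helper functions are assumptions made for the example.

```python
# Hypothetical file index table: one record per file.
file_index = [
    {"file_id": "f-001", "file_name": "scan_001.png", "size": 20480},
    {"file_id": "f-002", "file_name": "scan_002.png", "size": 512000},
    {"file_id": "f-003", "file_name": "notes.txt", "size": 900},
]

# Hypothetical second table keyed by file name.
labels = [
    {"file_name": "scan_001.png", "label": "cat"},
    {"file_name": "scan_002.png", "label": "dog"},
]


def select_files(index, suffix, max_size):
    """Keep records whose name ends with `suffix` and whose size is at most `max_size` bytes."""
    return [r for r in index if r["file_name"].endswith(suffix) and r["size"] <= max_size]


def join_on_file_name(index, other):
    """Inner join of the file index with another table on the file_name variable."""
    by_name = {r["file_name"]: r for r in other}
    return [{**r, **by_name[r["file_name"]]} for r in index if r["file_name"] in by_name]


small_images = select_files(file_index, ".png", 100_000)
labelled = join_on_file_name(file_index, labels)
```

The resulting `file_id` values are what you would then use to read or download the matching files.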

Access

Access to a particular table will always be governed by the owner(s) associated with the table's contents.

For tables within a dataset, your access to a table is the same as your access to that dataset. You must have metadata access in order to view variable names and summary statistics, and data access in order to view cells, run queries, and export data.

For tables within a workflow, you must have both view access to the workflow and corresponding access to all datasets whose data is present in that table. For example, if a particular workflow combines content from two datasets into a new output table, you'll need access to both datasets to view the table.
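The access rules above can be summarized as a small model. This is a simplified sketch, not how Redivis evaluates permissions: it covers only the two access levels named in this section, assumes data access includes metadata access, and reads "corresponding access to all datasets" as the lowest level held across the source datasets. All names here are hypothetical.

```python
from enum import IntEnum


class Access(IntEnum):
    NONE = 0
    METADATA = 1
    DATA = 2  # Assumption: data access implies metadata access.


# Minimum access level each action requires, per the rules described above.
REQUIRED = {
    "view_variables": Access.METADATA,
    "view_summary_statistics": Access.METADATA,
    "view_cells": Access.DATA,
    "run_query": Access.DATA,
    "export": Access.DATA,
}


def can_perform(action, access):
    """True if the given access level is sufficient for the action."""
    return access >= REQUIRED[action]


def workflow_table_access(has_workflow_view, source_dataset_access):
    """Effective access to a workflow table.

    Without view access to the workflow there is no access at all; otherwise the
    lowest level held across the source datasets applies (an interpretation).
    """
    if not has_workflow_view:
        return Access.NONE
    # Assumption: a table with no source datasets is limited only by workflow access.
    return min(source_dataset_access.values(), default=Access.DATA)
```

For a table built from two datasets where you hold data access to one and only metadata access to the other, this model gives metadata access, so viewing cells would not be allowed.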

Bibliography

All tables automatically encode information about their lineage on Redivis. For example, if a table is created within a dataset, then transformed in a workflow, which is then forked into another workflow and joined with a new dataset, all of the information about the source datasets and workflows that created the table will be present in its bibliography.

This allows you to authoritatively cite any table on Redivis, making sure to credit everyone whose work contributed to a final output!
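One way to picture this lineage is as a graph that is walked upstream from the final table. The sketch below mirrors the example above (a dataset table transformed in a workflow, which is forked and joined with a new dataset). The graph structure, node names, and `collect_sources` function are hypothetical and do not reflect how Redivis stores lineage internally.

```python
# Hypothetical lineage: each node maps to the sources it was directly derived from.
lineage = {
    "final_table": ["workflow_b"],
    "workflow_b": ["workflow_a", "dataset_2"],  # forked from workflow_a, joined with dataset_2
    "workflow_a": ["dataset_1"],
    "dataset_1": [],
    "dataset_2": [],
}


def collect_sources(node, parents):
    """Return every upstream dataset and workflow that contributed to `node`."""
    seen = []
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen


sources = sorted(collect_sources("final_table", lineage))
```

Here `sources` contains both datasets and both workflows, which is the set of contributors a citation for `final_table` would need to credit.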
