Glossary

Redivis defines a tiered access level system on datasets and tables, allowing data administrators to make as much information as widely available as possible without compromising data security. Valid access levels on Redivis are:

  • None

  • Overview

  • Metadata

  • Sample (datasets only)

  • Data

  • Edit
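
One way to picture the tiering is as an ordered scale. The sketch below is illustrative only: it assumes the levels are cumulative in the order listed above, so that each level implies the ones before it. `AccessLevel` and `can_access` are hypothetical names, not part of any Redivis API.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Hypothetical ordering of the listed access levels (assumed cumulative)."""
    NONE = 0
    OVERVIEW = 1
    METADATA = 2
    SAMPLE = 3  # datasets only
    DATA = 4
    EDIT = 5


def can_access(granted: AccessLevel, required: AccessLevel) -> bool:
    """Return True if the granted level meets or exceeds the required level."""
    return granted >= required


if __name__ == "__main__":
    print(can_access(AccessLevel.METADATA, AccessLevel.OVERVIEW))  # True
    print(can_access(AccessLevel.METADATA, AccessLevel.DATA))      # False
```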

Datasets are the fundamental entity on Redivis. They are a version-controlled resource made up of documentation and any number of tables, unstructured data files, and corresponding metadata.

An entity is a table-level characteristic that defines what an individual record, or row, in the table represents. For example, a row may represent an individual person, or a hospitalization, or a hospital. By defining the entity on tables, researchers can quickly develop a clear picture of what the information in the table represents.

Dataset files are used for storing unstructured data within datasets. Redivis supports any file type, at scale (millions of files, multiple terabytes per file). Every file is assigned a globally unique identifier, and is indexed in a file index table, allowing you to query and analyze these files within a workflow.

Logins define pathways for you to sign in to your Redivis account. For example, you may use your academic institution login as well as a personal Google account, facilitating account recovery if you move between institutions. When you're a member of an organization, you must authenticate with the login specified through your membership.

Institutions serve as an umbrella across multiple organizations, providing a shared administrative context for certain actions (provisioning new organizations, managing storage costs) while also offering a unified discovery interface for searching across all datasets within an institution.

A member represents the intersection between a Redivis user and an organization. Members of organizations can apply for data access and interact with other organization resources. Specific members can also be made organization administrators.

An organization is an administrative context for hosting and managing data on Redivis. Each organization has its own landing page, or "data portal". Organizations are managed by administrators who curate the organization's datasets and control how they are made accessible to the research community.

Permission groups define standardized access controls that can be applied across an organization's datasets.

A workflow is a secure, high-performance, and collaborative environment for analyzing data. In a workflow, researchers can work with any dataset they have access to, and build out analytical workflows in a reproducible manner.

Requirements are created by organizations, and can be assigned to a specific access level on a dataset. Researchers must be approved for all requirements before they can gain access at that level. Some requirements are fulfilled directly by an organization member, others by a study that is working with that organization's datasets.

Restrictions define additional rules on datasets that apply even once a user has access. A common use case is to restrict data exports to specific environments, reducing the risk of data exfiltration. Restrictions can be configured such that users can request exceptions on a case-by-case basis.

Dataset owners can optionally create a 1% sample on all tables of their dataset. With this sample created, access to the sample can be controlled separately from the full dataset, and researchers can take advantage of faster computations on the data subset. When datasets are sampled on a common variable, a deterministic sample is generated such that all tables contain a sample from the same subset.
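
The idea of a deterministic sample on a common variable can be sketched with hashing. This is not a description of how Redivis computes its samples; it is a minimal illustration of one common technique, in which a row is kept when a hash of its key value falls below the sampling rate. The function names and the `patient_id` variable are hypothetical.

```python
import hashlib


def in_sample(key: str, rate: float = 0.01) -> bool:
    """Deterministically decide whether a key value falls in the sample.

    Hashing the key (rather than drawing random numbers) means the same key
    is always included or excluded, regardless of which table it appears in.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


def sample_rows(rows: list[dict], key_column: str, rate: float = 0.01) -> list[dict]:
    """Keep only rows whose key value falls in the sample."""
    return [row for row in rows if in_sample(str(row[key_column]), rate)]


if __name__ == "__main__":
    # Two hypothetical tables sharing a "patient_id" variable.
    patients = [{"patient_id": i} for i in range(10_000)]
    visits = [{"patient_id": i % 10_000, "visit": i} for i in range(30_000)]

    sampled_patients = {r["patient_id"] for r in sample_rows(patients, "patient_id")}
    sampled_visits = {r["patient_id"] for r in sample_rows(visits, "patient_id")}

    # Both tables end up with the same subset of patient IDs.
    print(sampled_patients == sampled_visits)
```

Because inclusion depends only on the key value, joining the sampled tables on that key yields the same result as sampling the joined table.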

A study represents a group of collaborators working on a common research initiative. Studies can interact with multiple datasets and create many workflows as they pursue their investigation. Certain requirements can also be fulfilled on behalf of a study.

Tables represent the fundamental "data" component on Redivis. They are made up of rows (a.k.a. records, observations) and variables (a.k.a. columns, indicators). Datasets can contain any number of tables, and these tables can then be queried, analyzed, and merged in workflows, where derivative tables are created as the output of a transform.

Tags can be assigned to datasets to aid in their discoverability and categorization.

A transform exists as a node within a workflow. It takes any number of tables as input and executes a series of steps to merge, filter, analyze, and collapse these inputs into one output table. Transforms are built on SQL and researchers can develop transforms using a graphical interface or by inputting SQL code directly.
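
As a rough illustration of the merge, filter, and collapse pattern, the sketch below uses Python's built-in `sqlite3` module as a stand-in SQL engine. It does not use Redivis or its SQL dialect, and the table and column names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE patients (patient_id INTEGER, state TEXT);
    CREATE TABLE visits (patient_id INTEGER, cost REAL);
    INSERT INTO patients VALUES (1, 'CA'), (2, 'CA'), (3, 'NY');
    INSERT INTO visits VALUES (1, 100.0), (1, 50.0), (2, 25.0), (3, 400.0);
    """
)

# Merge (JOIN), filter (WHERE), and collapse (GROUP BY) two input tables
# into a single output table.
conn.execute(
    """
    CREATE TABLE visits_by_state AS
    SELECT p.state, COUNT(*) AS n_visits, SUM(v.cost) AS total_cost
    FROM visits AS v
    JOIN patients AS p ON p.patient_id = v.patient_id
    WHERE v.cost >= 50
    GROUP BY p.state
    """
)

result = conn.execute(
    "SELECT state, n_visits, total_cost FROM visits_by_state ORDER BY state"
).fetchall()
print(result)  # [('CA', 2, 150.0), ('NY', 1, 400.0)]
```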

Variables represent the concept being measured by a particular column in a table. Variables can contain various metadata to help researchers better understand them, display univariate summary statistics aligned with the variable's type, and are referenced by transforms when manipulating data.

A value list is a saved set of literals stored within a workflow. Value lists can be referenced by transforms throughout the workflow, allowing researchers to easily (and cleanly) reference a consistent set of values as they manipulate data.
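
The benefit of a shared set of literals can be shown with a small sketch in plain Python. It is only an analogy for the concept, not Redivis functionality, and `VALUE_LISTS`, `filter_by_value_list`, and the sample rows are hypothetical.

```python
# A hypothetical value list: a named, reusable set of literals.
VALUE_LISTS = {
    "west_coast_states": ["CA", "OR", "WA"],
}


def filter_by_value_list(rows, column, list_name):
    """Keep rows whose value in `column` appears in the named value list."""
    allowed = set(VALUE_LISTS[list_name])
    return [row for row in rows if row[column] in allowed]


patients = [{"id": 1, "state": "CA"}, {"id": 2, "state": "NY"}]
hospitals = [{"name": "A", "state": "WA"}, {"name": "B", "state": "TX"}]

# Two separate steps reference the same list, so the values stay consistent.
print(filter_by_value_list(patients, "state", "west_coast_states"))
print(filter_by_value_list(hospitals, "state", "west_coast_states"))
```

Updating the list in one place changes every step that references it, which avoids copies of the same literals drifting apart.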

All datasets on Redivis efficiently store their tables' data through an immutable version history. Each version of a dataset stores different documentation, metadata, and tables, allowing datasets to evolve over time without compromising reproducibility. Any reference to a dataset's tables in a workflow (or through the API) will be tied to a specific version.
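
To see why pinning references to a version protects reproducibility, consider the toy model below. It is a conceptual sketch only and says nothing about how Redivis stores versions internally; the class, its methods, and the version tags such as `v1.0` are hypothetical.

```python
class VersionedDataset:
    """Toy model: each release adds a new, read-only snapshot."""

    def __init__(self):
        self._versions = {}  # tag -> {table_name: tuple_of_rows}

    def release(self, tag, tables):
        if tag in self._versions:
            raise ValueError(f"version {tag} already exists and cannot be changed")
        # Store tuples so a released snapshot cannot be modified in place.
        self._versions[tag] = {name: tuple(rows) for name, rows in tables.items()}

    def table(self, name, version):
        """Look up a table as it existed in one specific version."""
        return self._versions[version][name]


ds = VersionedDataset()
ds.release("v1.0", {"patients": [("alice",), ("bob",)]})
ds.release("v2.0", {"patients": [("alice",), ("bob",), ("carol",)]})

# A reference pinned to v1.0 keeps returning the same rows after v2.0 is released.
print(len(ds.table("patients", "v1.0")))  # 2
print(len(ds.table("patients", "v2.0")))  # 3
```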

Your workspace is your private home on Redivis, showing various resources related to your work. From your workspace, you can create datasets and workflows, find organizations, and manage your profile information.
