Redivis Documentation
API DocumentationRedivis Home
  • Introduction
  • Redivis for open science
    • FAIR data practices
    • Open access
    • Data repository characteristics
    • Data retention policy
    • Citations
  • Guides
    • Getting started
    • Discover & access data
      • Discover datasets
      • Apply to access restricted data
      • Create a study
    • Analyze data in a workflow
      • Reshape data in transforms
      • Work with data in notebooks
      • Running ML workloads
      • Example workflows
        • Analyzing large tabular data
        • Create an image classification model
        • Fine tuning a Large Language Model (LLM)
        • No-code visualization
        • Continuous enrollment
        • Select first/last encounter
    • Export & publish your work
      • Export to other environments
      • Build your own site with Observable
    • Create & manage datasets
      • Create and populate a dataset
      • Upload tabular data as tables
      • Upload unstructured data as files
      • Cleaning tabular data
    • Administer an organization
      • Configure access systems
      • Grant access to data
      • Generate a report
      • Example tasks
        • Emailing subsets of members
    • Video guides
  • Reference
    • Your account
      • Creating an account
      • Managing logins
      • Single Sign-On (SSO)
      • Workspace
      • Studies
      • Compute credits and billing
    • Datasets
      • Documentation
      • Tables
      • Variables
      • Files
      • Creating & editing datasets
      • Uploading data
        • Tabular data
        • Geospatial data
        • Unstructured data
        • Metadata
        • Data sources
        • Programmatic uploads
      • Version control
      • Sampling
      • Exporting data
        • Download
        • Programmatic
        • Google Data Studio
        • Google Cloud Storage
        • Google BigQuery
        • Embedding tables
    • Workflows
      • Workflow concepts
      • Documentation
      • Data sources
      • Tables
      • Transforms
        • Transform concepts
        • Step: Aggregate
        • Step: Create variables
        • Step: Filter
        • Step: Join
        • Step: Limit
        • Step: Stack
        • Step: Order
        • Step: Pivot
        • Step: Rename
        • Step: Retype
        • Step: SQL query
        • Variable selection
        • Value lists
        • Optimization and errors
        • Variable creation methods
          • Common elements
          • Aggregate
          • Case (if/else)
          • Date
          • DateTime
          • Geography
          • JSON
          • Math
          • Navigation
          • Numbering
          • Other
          • Statistical
          • String
          • Time
      • Notebooks
        • Notebook concepts
        • Compute resources
        • Python notebooks
        • R notebooks
        • Stata notebooks
        • SAS notebooks
        • Using the Jupyter interface
      • Access and privacy
    • Data access
      • Access levels
      • Configuring access
      • Requesting access
      • Approving access
      • Usage rules
      • Data access in workflows
    • Organizations
      • Administrator panel
      • Members
      • Studies
      • Workflows
      • Datasets
      • Permission groups
      • Requirements
      • Reports
      • Logs
      • Billing
      • Settings and branding
        • Account
        • Public profile
        • Membership
        • Export environments
        • Advanced: DOI configuration
        • Advanced: Stata & SAS setup
        • Advanced: Data storage locations
        • Advanced: Data egress configuration
    • Institutions
      • Administrator panel
      • Organizations
      • Members
      • Datasets
      • Reports
      • Settings and branding
    • Quotas and limits
    • Glossary
  • Additional Resources
    • Events and press
    • API documentation
    • Redivis Labs
    • Office hours
    • Contact us
    • More information
      • Product updates
      • Roadmap
      • System status
      • Security
      • Feature requests
      • Report a bug
Powered by GitBook
On this page
  • Overview
  • Step structure:
  • Field descriptions
  • Examples
  • Example 1: Basic filter
  • Example 2: Multiple conditions
  • Example 3: Nested conditions
  • Example 4: Comparing variables and value lists
  • Reference: Comparison statements
  • Left expression
  • Operator
  • Right expression

Was this helpful?

Export as PDF
  1. Reference
  2. Workflows
  3. Transforms

Step: Filter

Overview

The Filter step will select for rows that meet a certain set of conditions.

Example starting data:

/*---------+--------*
 | student | score  |
 +---------+--------+
 | jane    | 83     |
 | neal    | 35     |
 | sam     | 74     |
 | pat     | 62     |
 *---------+--------*/

Example output data:

Filter out rows with scores less than 70.

/*---------+--------*
 | student | score  |
 +---------+--------+
 | jane    | 83     |
 | sam     | 74     |
 *---------+--------*/

Step structure:

Basic state

  • A filter step will be made up of one or more filter blocks with a completed condition.

  • When there are multiple filter blocks in a step, conditions in all blocks must be met for rows to be kept.

  • If you have a more complex filter statement that is dependent on multiple nested conditions you can press the + button to expand the filter block.

Expanded state

  • When multiple conditions are needed in a block, you must specify how they relate to each other (AND vs OR)

  • Any nested conditions are resolved before higher level conditions.

Field descriptions

Variable(s)

The variable whose values will be evaluated.

Depending on the operator selected, it may contain multiple variables. Use wildcard characters (e.g. * for bulk selection.

[Operator]

How the two sides of the condition will be evaluated.

Value(s) or Variable(s)

Examples

Example 1: Basic filter

Lets say we only want to reduce our table to only contain information about results from the final test.

Starting data:

/*---------+-------+---------+------------*
 | test    | score | student | date       |
 +---------+-------+---------+------------+
 | quiz    | 83    | jane    | 2020-04-01 |
 | quiz    | 35    | pat     | 2020-04-01 |
 | quiz    | 89    | sam     | 2020-04-01 |
 | midterm | 74    | jane    | 2020-05-01 |
 | midterm | 62    | pat     | 2020-05-01 |
 | midterm | 93    | sam     | 2020-05-01 |
 | final   | 77    | jane    | 2020-06-01 |
 | final   | 59    | pat     | 2020-06-01 |
 | final   | 92    | sam     | 2020-06-01 |
 *---------+-------+---------+------------*/

Input fields:

  • Variable(s): The variable test is where the final value we want to evaluate on is located.

  • [Operator]: We want fields that exactly match, so we choose =.

  • Value(s) or variable(s): We want to only keep rows where final is present, so we put that here.

Output data:

/*---------+-------+---------+------------*
 | test    | score | student | date       |
 +---------+-------+---------+------------+
 | final   | 77    | jane    | 2020-06-01 |
 | final   | 59    | pat     | 2020-06-01 |
 | final   | 92    | sam     | 2020-06-01 |
 *---------+-------+---------+------------*/

Example 2: Multiple conditions

Let's say we don't just want values from the final but only those from the final with a score above 60.

Starting data:

/*---------+-------+---------+------------*
 | test    | score | student | date       |
 +---------+-------+---------+------------+
 | quiz    | 83    | jane    | 2020-04-01 |
 | quiz    | 35    | pat     | 2020-04-01 |
 | quiz    | 89    | sam     | 2020-04-01 |
 | midterm | 74    | jane    | 2020-05-01 |
 | midterm | 62    | pat     | 2020-05-01 |
 | midterm | 93    | sam     | 2020-05-01 |
 | final   | 77    | jane    | 2020-06-01 |
 | final   | 59    | pat     | 2020-06-01 |
 | final   | 92    | sam     | 2020-06-01 |
 *---------+-------+---------+------------*/

Input fields:

  • We input the data as in the above example, but since we now have two conditions, we have to decided how they relate to each other. In this case we want data that meets all the conditions so select All conditions must be satisfied (AND).

Output data:

/*---------+-------+---------+------------*
 | test    | score | student | date       |
 +---------+-------+---------+------------+
 | final   | 77    | jane    | 2020-06-01 |
 | final   | 92    | sam     | 2020-06-01 |
 *---------+-------+---------+------------*/

Example 3: Nested conditions

Let's say we want to keep all data from the final greater than 60, or any any scores above 85.

Starting data:

/*---------+-------+---------+------------*
 | test    | score | student | date       |
 +---------+-------+---------+------------+
 | quiz    | 83    | jane    | 2020-04-01 |
 | quiz    | 35    | pat     | 2020-04-01 |
 | quiz    | 89    | sam     | 2020-04-01 |
 | midterm | 74    | jane    | 2020-05-01 |
 | midterm | 62    | pat     | 2020-05-01 |
 | midterm | 93    | sam     | 2020-05-01 |
 | final   | 77    | jane    | 2020-06-01 |
 | final   | 59    | pat     | 2020-06-01 |
 | final   | 92    | sam     | 2020-06-01 |
 *---------+-------+---------+------------*/

Input fields:

  • We want to keep all rows where scores are over 85 OR are from the final and over 60. So we set up the final and over 60 conditions under an AND umbrella, and nest that under the OR umbrella alongside our condition about scores over 85.

  • When executing, the nested conditions (scores from the final over 60) will be evaluated first to be true or false. Then the higher level condition (scores over 85 OR (scores about the final and over 60) will be evaluated. Any rows that meet this higher level condition will be kept in the output, and any that do not will be discarded.

Output data:

/*---------+-------+---------+------------*
 | test    | score | student | date       |
 +---------+-------+---------+------------+
 | quiz    | 89    | sam     | 2020-04-01 |
 | midterm | 93    | sam     | 2020-05-01 |
 | final   | 77    | jane    | 2020-06-01 |
 | final   | 92    | sam     | 2020-06-01 |
 *---------+-------+---------+------------*/

Example 4: Comparing variables and value lists

Let's say we only want to keep rows where scores are greater than the average score, and that are for our selected students. Right now our selected students are Jane and Pat but we know that might change in the future.

Starting data:

/*---------+-------+---------+------------+---------------*
 | test    | score | student | date       | score_average |
 +---------+-------+---------+------------+---------------+
 | quiz    | 83    | jane    | 2020-04-01 | 69            |
 | quiz    | 35    | pat     | 2020-04-01 | 69            |
 | quiz    | 89    | sam     | 2020-04-01 | 69            |
 | midterm | 74    | jane    | 2020-05-01 | 76.3333       |
 | midterm | 62    | pat     | 2020-05-01 | 76.3333       |
 | midterm | 93    | sam     | 2020-05-01 | 76.3333       |
 | final   | 77    | jane    | 2020-06-01 | 76            |
 | final   | 59    | pat     | 2020-06-01 | 76            |
 | final   | 92    | sam     | 2020-06-01 | 76            |
 *---------+-------+---------+------------+---------------*/

Input fields:

    • With multiple inputs on either side of a condition, the condition is evaluated as an OR. So this condition will evaluate to true if the value in student equals any value on the list (jane or pat).

    • While we could have input jane and pat into the right side of this condition and gotten the same result, using a list makes it easy in the future to change our selected students centrally and update it everywhere the list is used.

  • For our second condition, we want to evaluate how two values in the same row compare to each other.

    • When executed, this condition will look at each row to check if the value in score is greater than the value in score_average. If so, then the condition is met.

Output data:

/*---------+-------+---------+------------+---------------*
 | test    | score | student | date       | score_average |
 +---------+-------+---------+------------+---------------+
 | quiz    | 83    | jane    | 2020-04-01 | 69            |
 | midterm | 74    | jane    | 2020-05-01 | 76.3333       |
 | final   | 77    | jane    | 2020-06-01 | 76            |
 *---------+-------+---------+------------+---------------*/

Reference: Comparison statements

A comparison always evaluates to either TRUE or FALSE.

Comparisons are made up of one or more comparison rows. A comparison row always evaluates to TRUE or FALSE; multiple rows can be nested together with a logical AND/OR.

A comparison row is made up of three components:

Left expression

The left expression can contain variable parameters from the source table, joined tables, as well as any newly created variables. All referenced new variables must be "upstream" from the current comparison, with the exception of joins, which may reference new variables that are constructed by any variables upstream of that join.

Depending on the operator selected, the left expression may contain multiple variables. In this case, each left expression will be evaluated against the right expression(s), and logically joined via an OR. If you want to take the logical AND of multiple variables, create a separate comparison row for each variable.

Multiple left hand values are only supported for = and like operators.

Where multiple variables can be entered in a comparison, you can use * in the interface to select all matching results.

For example, typing DIAG* and pressing enter will add all variables beginning with "DIAG" to this field.

Operator

Redivis supports the following operators:

=, !=

Checks if any the value(s) in the left expression are (not) equal to any values in the right expression. NULLs are treated as equivalent (NULL == NULL -> TRUE and NULL != NULL -> FALSE)

>, >=, <, <=,

Checks if the value in the left expression is less than, greater than, etc. the right expression. String comparisons are lexicographically ordered, other data types are based on the numeric / temporal order of that value. Comparisons between NULL values will always be false.

like / ! like

Checks if the string(s) in the left hand expression matches specified pattern(s) in the right hand expression. The pattern may contain the following characters:

  • A percent sign "%" matches any number of characters

  • An underscore "_" matches a single character

  • You can escape "\", "_", or "%" using one backslash. For example, "\%"

Right expression

The right expression can contain any variables allowed in the left expression, as well as literal values and lists. The comparison row will evaluate to TRUE when the left hand expression matches any of the right hand expressions, except for the != and !like comparators, where the comparison will evaluate to true if all values are not equal to / not like the left expression.

To match against a null datum (empty cell), you must specify the special literal value NULL here.

Last updated 6 months ago

Was this helpful?

The variable or value which the previously chosen Variable(s) will be evaluated against. Depending on the operator selected, it may contain multiple variables, values, or a . Use wildcard characters (e.g. * for bulk selection.

First step, we create a called selected_students with the values jane and pat on it. Then we can select it as compared to student.

value list
value list