Step: Filter

Overview

The Filter step will select for rows that meet a certain set of conditions.

Example starting data:

/*---------+--------*
 | student | score  |
 +---------+--------+
 | jane    | 83     |
 | neal    | 35     |
 | sam     | 74     |
 | pat     | 62     |
 *---------+--------*/

Example output data:

Filter out rows with scores less than 70.

/*---------+--------*
 | student | score  |
 +---------+--------+
 | jane    | 83     |
 | sam     | 74     |
 *---------+--------*/

Step structure:

Basic state

  • A filter step will be made up of one or more filter blocks with a completed condition.

  • When there are multiple filter blocks in a step, conditions in all blocks must be met for rows to be kept.

  • If you have a more complex filter statement that is dependent on multiple nested conditions you can press the + button to expand the filter block.

Expanded state

  • When multiple conditions are needed in a block, you must specify how they relate to each other (AND vs OR)

  • Any nested conditions are resolved before higher level conditions.

Field descriptions

Examples

Example 1: Basic filter

Lets say we only want to reduce our table to only contain information about results from the final test.

Starting data:

/*---------+-------+---------+------------*
 | test    | score | student | date       |
 +---------+-------+---------+------------+
 | quiz    | 83    | jane    | 2020-04-01 |
 | quiz    | 35    | pat     | 2020-04-01 |
 | quiz    | 89    | sam     | 2020-04-01 |
 | midterm | 74    | jane    | 2020-05-01 |
 | midterm | 62    | pat     | 2020-05-01 |
 | midterm | 93    | sam     | 2020-05-01 |
 | final   | 77    | jane    | 2020-06-01 |
 | final   | 59    | pat     | 2020-06-01 |
 | final   | 92    | sam     | 2020-06-01 |
 *---------+-------+---------+------------*/

Input fields:

  • Variable(s): The variable test is where the final value we want to evaluate on is located.

  • [Operator]: We want fields that exactly match, so we choose =.

  • Value(s) or variable(s): We want to only keep rows where final is present, so we put that here.

Output data:

/*---------+-------+---------+------------*
 | test    | score | student | date       |
 +---------+-------+---------+------------+
 | final   | 77    | jane    | 2020-06-01 |
 | final   | 59    | pat     | 2020-06-01 |
 | final   | 92    | sam     | 2020-06-01 |
 *---------+-------+---------+------------*/

Example 2: Multiple conditions

Let's say we don't just want values from the final but only those from the final with a score above 60.

Starting data:

/*---------+-------+---------+------------*
 | test    | score | student | date       |
 +---------+-------+---------+------------+
 | quiz    | 83    | jane    | 2020-04-01 |
 | quiz    | 35    | pat     | 2020-04-01 |
 | quiz    | 89    | sam     | 2020-04-01 |
 | midterm | 74    | jane    | 2020-05-01 |
 | midterm | 62    | pat     | 2020-05-01 |
 | midterm | 93    | sam     | 2020-05-01 |
 | final   | 77    | jane    | 2020-06-01 |
 | final   | 59    | pat     | 2020-06-01 |
 | final   | 92    | sam     | 2020-06-01 |
 *---------+-------+---------+------------*/

Input fields:

  • We input the data as in the above example, but since we now have two conditions, we have to decided how they relate to each other. In this case we want data that meets all the conditions so select All conditions must be satisfied (AND).

Output data:

/*---------+-------+---------+------------*
 | test    | score | student | date       |
 +---------+-------+---------+------------+
 | final   | 77    | jane    | 2020-06-01 |
 | final   | 92    | sam     | 2020-06-01 |
 *---------+-------+---------+------------*/

Example 3: Nested conditions

Let's say we want to keep all data from the final greater than 60, or any any scores above 85.

Starting data:

/*---------+-------+---------+------------*
 | test    | score | student | date       |
 +---------+-------+---------+------------+
 | quiz    | 83    | jane    | 2020-04-01 |
 | quiz    | 35    | pat     | 2020-04-01 |
 | quiz    | 89    | sam     | 2020-04-01 |
 | midterm | 74    | jane    | 2020-05-01 |
 | midterm | 62    | pat     | 2020-05-01 |
 | midterm | 93    | sam     | 2020-05-01 |
 | final   | 77    | jane    | 2020-06-01 |
 | final   | 59    | pat     | 2020-06-01 |
 | final   | 92    | sam     | 2020-06-01 |
 *---------+-------+---------+------------*/

Input fields:

  • We want to keep all rows where scores are over 85 OR are from the final and over 60. So we set up the final and over 60 conditions under an AND umbrella, and nest that under the OR umbrella alongside our condition about scores over 85.

  • When executing, the nested conditions (scores from the final over 60) will be evaluated first to be true or false. Then the higher level condition (scores over 85 OR (scores about the final and over 60) will be evaluated. Any rows that meet this higher level condition will be kept in the output, and any that do not will be discarded.

Output data:

/*---------+-------+---------+------------*
 | test    | score | student | date       |
 +---------+-------+---------+------------+
 | quiz    | 89    | sam     | 2020-04-01 |
 | midterm | 93    | sam     | 2020-05-01 |
 | final   | 77    | jane    | 2020-06-01 |
 | final   | 92    | sam     | 2020-06-01 |
 *---------+-------+---------+------------*/

Example 4: Comparing variables and value lists

Let's say we only want to keep rows where scores are greater than the average score, and that are for our selected students. Right now our selected students are Jane and Pat but we know that might change in the future.

Starting data:

/*---------+-------+---------+------------+---------------*
 | test    | score | student | date       | score_average |
 +---------+-------+---------+------------+---------------+
 | quiz    | 83    | jane    | 2020-04-01 | 69            |
 | quiz    | 35    | pat     | 2020-04-01 | 69            |
 | quiz    | 89    | sam     | 2020-04-01 | 69            |
 | midterm | 74    | jane    | 2020-05-01 | 76.3333       |
 | midterm | 62    | pat     | 2020-05-01 | 76.3333       |
 | midterm | 93    | sam     | 2020-05-01 | 76.3333       |
 | final   | 77    | jane    | 2020-06-01 | 76            |
 | final   | 59    | pat     | 2020-06-01 | 76            |
 | final   | 92    | sam     | 2020-06-01 | 76            |
 *---------+-------+---------+------------+---------------*/

Input fields:

  • First step, we create a value list called selected_students with the values jane and pat on it. Then we can select it as compared to student.

    • With multiple inputs on either side of a condition, the condition is evaluated as an OR. So this condition will evaluate to true if the value in student equals any value on the list (jane or pat).

    • While we could have input jane and pat into the right side of this condition and gotten the same result, using a list makes it easy in the future to change our selected students centrally and update it everywhere the list is used.

  • For our second condition, we want to evaluate how two values in the same row compare to each other.

    • When executed, this condition will look at each row to check if the value in score is greater than the value in score_average. If so, then the condition is met.

Output data:

/*---------+-------+---------+------------+---------------*
 | test    | score | student | date       | score_average |
 +---------+-------+---------+------------+---------------+
 | quiz    | 83    | jane    | 2020-04-01 | 69            |
 | midterm | 74    | jane    | 2020-05-01 | 76.3333       |
 | final   | 77    | jane    | 2020-06-01 | 76            |
 *---------+-------+---------+------------+---------------*/

Reference: Comparison statements

A comparison always evaluates to either TRUE or FALSE.

Comparisons are made up of one or more comparison rows. A comparison row always evaluates to TRUE or FALSE; multiple rows can be nested together with a logical AND/OR.

A comparison row is made up of three components:

Left expression

The left expression can contain variable parameters from the source table, joined tables, as well as any newly created variables. All referenced new variables must be "upstream" from the current comparison, with the exception of joins, which may reference new variables that are constructed by any variables upstream of that join.

Depending on the operator selected, the left expression may contain multiple variables. In this case, each left expression will be evaluated against the right expression(s), and logically joined via an OR. If you want to take the logical AND of multiple variables, create a separate comparison row for each variable.

Multiple left hand values are only supported for = and like operators.

Where multiple variables can be entered in a comparison, you can use * in the interface to select all matching results.

For example, typing DIAG* and pressing enter will add all variables beginning with "DIAG" to this field.

Operator

Redivis supports the following operators:

=, !=

Checks if any the value(s) in the left expression are (not) equal to any values in the right expression. NULLs are treated as equivalent (NULL == NULL -> TRUE and NULL != NULL -> FALSE)

>, >=, <, <=,

Checks if the value in the left expression is less than, greater than, etc. the right expression. String comparisons are lexicographically ordered, other data types are based on the numeric / temporal order of that value. Comparisons between NULL values will always be false.

like / ! like

Checks if the string(s) in the left hand expression matches specified pattern(s) in the right hand expression. The pattern may contain the following characters:

  • A percent sign "%" matches any number of characters

  • An underscore "_" matches a single character

  • You can escape "\", "_", or "%" using one backslash. For example, "\%"

Right expression

The right expression can contain any variables allowed in the left expression, as well as literal values and lists. The comparison row will evaluate to TRUE when the left hand expression matches any of the right hand expressions, except for the != and !like comparators, where the comparison will evaluate to true if all values are not equal to / not like the left expression.

To match against a null datum (empty cell), you must specify the special literal value NULL here.

Last updated