Redivis Documentation
API DocumentationRedivis Home
  • Introduction
  • Redivis for open science
    • FAIR data practices
    • Open access
    • Data repository characteristics
    • Data retention policy
    • Citations
  • Guides
    • Getting started
    • Discover & access data
      • Discover datasets
      • Apply to access restricted data
      • Create a study
    • Analyze data in a workflow
      • Reshape data in transforms
      • Work with data in notebooks
      • Running ML workloads
      • Example workflows
        • Analyzing large tabular data
        • Create an image classification model
        • Fine tuning a Large Language Model (LLM)
        • No-code visualization
        • Continuous enrollment
        • Select first/last encounter
    • Export & publish your work
      • Export to other environments
      • Build your own site with Observable
    • Create & manage datasets
      • Create and populate a dataset
      • Upload tabular data as tables
      • Upload unstructured data as files
      • Cleaning tabular data
    • Administer an organization
      • Configure access systems
      • Grant access to data
      • Generate a report
      • Example tasks
        • Emailing subsets of members
    • Video guides
  • Reference
    • Your account
      • Creating an account
      • Managing logins
      • Single Sign-On (SSO)
      • Workspace
      • Studies
      • Compute credits and billing
    • Datasets
      • Documentation
      • Tables
      • Variables
      • Files
      • Creating & editing datasets
      • Uploading data
        • Tabular data
        • Geospatial data
        • Unstructured data
        • Metadata
        • Data sources
        • Programmatic uploads
      • Version control
      • Sampling
      • Exporting data
        • Download
        • Programmatic
        • Google Data Studio
        • Google Cloud Storage
        • Google BigQuery
        • Embedding tables
    • Workflows
      • Workflow concepts
      • Documentation
      • Data sources
      • Tables
      • Transforms
        • Transform concepts
        • Step: Aggregate
        • Step: Create variables
        • Step: Filter
        • Step: Join
        • Step: Limit
        • Step: Stack
        • Step: Order
        • Step: Pivot
        • Step: Rename
        • Step: Retype
        • Step: SQL query
        • Variable selection
        • Value lists
        • Optimization and errors
        • Variable creation methods
          • Common elements
          • Aggregate
          • Case (if/else)
          • Date
          • DateTime
          • Geography
          • JSON
          • Math
          • Navigation
          • Numbering
          • Other
          • Statistical
          • String
          • Time
      • Notebooks
        • Notebook concepts
        • Compute resources
        • Python notebooks
        • R notebooks
        • Stata notebooks
        • SAS notebooks
        • Using the Jupyter interface
      • Access and privacy
    • Data access
      • Access levels
      • Configuring access
      • Requesting access
      • Approving access
      • Usage rules
      • Data access in workflows
    • Organizations
      • Administrator panel
      • Members
      • Studies
      • Workflows
      • Datasets
      • Permission groups
      • Requirements
      • Reports
      • Logs
      • Billing
      • Settings and branding
        • Account
        • Public profile
        • Membership
        • Export environments
        • Advanced: DOI configuration
        • Advanced: Stata & SAS setup
        • Advanced: Data storage locations
        • Advanced: Data egress configuration
    • Institutions
      • Administrator panel
      • Organizations
      • Members
      • Datasets
      • Reports
      • Settings and branding
    • Quotas and limits
    • Glossary
  • Additional Resources
    • Events and press
    • API documentation
    • Redivis Labs
    • Office hours
    • Contact us
    • More information
      • Product updates
      • Roadmap
      • System status
      • Security
      • Feature requests
      • Report a bug
Powered by GitBook
On this page
  • Variable concepts
  • Defining the new variable

Was this helpful?

Export as PDF
  1. Guides
  2. Analyze data in a workflow
  3. Example workflows

Select first/last encounter

Last updated 1 year ago

Was this helpful?

E.g.: I want the date of the first and last hospital admission for every patient in my table

Let's take an example table which contains all hospital visits in one facility over many years. Each record contains information about what happened in the visit, including a patient id to identify who was present. An individual patient might be represented in one or many records in this table since each record represents a separate hospital visit.

patient_id

encounter_date

1

2012-01-01

1

2011-01-01

1

2010-01-01

1

2013-01-01

2

2009-01-01

2

2008-01-01

2

2015-01-01

3

2014-02-01

Let's say that for our research we want to find the first and last encounter — or more formally, the min and max encounter date for each patient.

Computing a min or max value is an aggregate operation that scans all records and chooses the highest or lowest value. To compute the min or max value in a variable we will want to use an analytic method. We can do this either by our table (which would drop records and/or variables) or by using a partition to calculate a new variable without changing the rest of the data.

In this example we will use a partition to create a new variable since we want the rest of the data to remain unchanged.

Variable concepts

Conceptually we will define our aggregation variable as the date of the encounter since that is where we want to look for the min and max values.

We will define our partition as the patient id since we want to find the min or max for each patient. This will limit the min or max value scan to all values of our aggregation variable across each unique value of our partitioned variable (patient id).

You can see more examples of using partitions with analytic methods on the step page.

Defining the new variable

For this particular case, we want to find the MIN and MAX of the encounter_date of each patient; in other words, partitioned on patient_id . This will create two new variables, min_encounter_date and max_encounter_date for each patient.

When we run this query, note that no records will be dropped; rather, these new variables will be have consistent values for any given patient_id .

patient_id

encounter_date

min_encounter_date

max_encounter_date

1

2012-01-01

2010-01-01

2013-01-01

1

2011-01-01

2010-01-01

2013-01-01

1

2010-01-01

2010-01-01

2013-01-01

1

2013-01-01

2010-01-01

2013-01-01

2

2009-01-01

2008-01-01

2015-01-01

2

2008-01-01

2008-01-01

2015-01-01

2

2015-01-01

2008-01-01

2015-01-01

3

2014-02-01

2014-02-01

2014-02-01

Now, let's say we only want to keep records which contain each patient's first and last encounter. We can do this easily be creating a new row filter:

We can also generalize this approach to find the Nth encounter for each patient — take a look at the RANK, DENSE_RANK, and ROW_NUMBER methods.

And if we want to be more specific on our partition (e.g., for the first / last encounter in a given calendar year), we can always apply additional variables (year ) when defining our partition.

new variable
aggregating
Create variables