Redivis Documentation
API DocumentationRedivis Home
  • Introduction
  • Redivis for open science
    • FAIR data practices
    • Open access
    • Data repository characteristics
    • Data retention policy
    • Citations
  • Guides
    • Getting started
    • Discover & access data
      • Discover datasets
      • Apply to access restricted data
      • Create a study
    • Analyze data in a workflow
      • Reshape data in transforms
      • Work with data in notebooks
      • Running ML workloads
      • Example workflows
        • Analyzing large tabular data
        • Create an image classification model
        • Fine tuning a Large Language Model (LLM)
        • No-code visualization
        • Continuous enrollment
        • Select first/last encounter
    • Export & publish your work
      • Export to other environments
      • Build your own site with Observable
    • Create & manage datasets
      • Create and populate a dataset
      • Upload tabular data as tables
      • Upload unstructured data as files
      • Cleaning tabular data
    • Administer an organization
      • Configure access systems
      • Grant access to data
      • Generate a report
      • Example tasks
        • Emailing subsets of members
    • Video guides
  • Reference
    • Your account
      • Creating an account
      • Managing logins
      • Single Sign-On (SSO)
      • Workspace
      • Studies
      • Compute credits and billing
    • Datasets
      • Documentation
      • Tables
      • Variables
      • Files
      • Creating & editing datasets
      • Uploading data
        • Tabular data
        • Geospatial data
        • Unstructured data
        • Metadata
        • Data sources
        • Programmatic uploads
      • Version control
      • Sampling
      • Exporting data
        • Download
        • Programmatic
        • Google Data Studio
        • Google Cloud Storage
        • Google BigQuery
        • Embedding tables
    • Workflows
      • Workflow concepts
      • Documentation
      • Data sources
      • Tables
      • Transforms
        • Transform concepts
        • Step: Aggregate
        • Step: Create variables
        • Step: Filter
        • Step: Join
        • Step: Limit
        • Step: Stack
        • Step: Order
        • Step: Pivot
        • Step: Rename
        • Step: Retype
        • Step: SQL query
        • Variable selection
        • Value lists
        • Optimization and errors
        • Variable creation methods
          • Common elements
          • Aggregate
          • Case (if/else)
          • Date
          • DateTime
          • Geography
          • JSON
          • Math
          • Navigation
          • Numbering
          • Other
          • Statistical
          • String
          • Time
      • Notebooks
        • Notebook concepts
        • Compute resources
        • Python notebooks
        • R notebooks
        • Stata notebooks
        • SAS notebooks
        • Using the Jupyter interface
      • Access and privacy
    • Data access
      • Access levels
      • Configuring access
      • Requesting access
      • Approving access
      • Usage rules
      • Data access in workflows
    • Organizations
      • Administrator panel
      • Members
      • Studies
      • Workflows
      • Datasets
      • Permission groups
      • Requirements
      • Reports
      • Logs
      • Billing
      • Settings and branding
        • Account
        • Public profile
        • Membership
        • Export environments
        • Advanced: DOI configuration
        • Advanced: Stata & SAS setup
        • Advanced: Data storage locations
        • Advanced: Data egress configuration
    • Institutions
      • Administrator panel
      • Organizations
      • Members
      • Datasets
      • Reports
      • Settings and branding
    • Quotas and limits
    • Glossary
  • Additional Resources
    • Events and press
    • API documentation
    • Redivis Labs
    • Office hours
    • Contact us
    • More information
      • Product updates
      • Roadmap
      • System status
      • Security
      • Feature requests
      • Report a bug
Powered by GitBook
On this page
  • (1) Add start and end of adjacent periods to each row
  • (2) Select rows with start and end of continuous periods
  • (3) Add end date of following enrollment periods to each row and collapse

Was this helpful?

Export as PDF
  1. Guides
  2. Analyze data in a workflow
  3. Example workflows

Continuous enrollment

Last updated 6 months ago

Was this helpful?

The following example is illustrated on Redivis in the – you'll need access to data to view the details.‌

Many insurance claims datasets on Redivis contain information about enrollment, detailing the periods of time when an individual represented in a dataset was covered by an insurance plan. If you intend to characterize patients based on their insurance coverage (or lack thereof) during certain key events (procedures, diagnoses, etc), it's often important to identify periods of continuous enrollment for each individual – and capture each continuous enrollment period for each patient in a single row.‌

These claims datasets describe enrollment information in multiple discrete rows per patient, each corresponding to patient per month. However, an overall continuous enrollment period may be broken up across rows into months or other non-uniform chunks, so we'll employ the following process to combine multiple sequential rows into a single row with one start date and one end date, describing one continuous period.‌

In this example, we will process the table to create a table in which each row describes a single period of continuous enrollment. We show an artificial snapshot of the dataset below, where patient 1 has multiple periods of continuous enrollment due to some gaps in coverage, and patient 2 has a single period of continuous enrollment.

patient_id

enrollment_start_date

enrollment_end_date

1

2012-01-01

2012-01-31

1

2012-02-01

2012-02-28

1

2012-04-01

2012-04-30

1

2012-06-01

2012-06-30

1

2012-07-01

2012-07-31

1

2012-08-01

2012-08-31

2

2012-01-01

2012-01-31

We want to create a final table with 3 rows for patient 1 to account for gaps in enrollment in March and May of 2012, and 1 row for patient 2. Our desired output has a row for each continuous period per patient, shown below:

patient_id

enrollment_start_date_continuous

enrollment_end_date_continuous

1

2012-01-01

2012-02-28

1

2012-04-01

2012-04-30

1

2012-06-01

2012-08-31

2

2012-01-01

2012-01-31

The variable names in this example are not actual MarketScan variable names. But, with appropriate data access, you can see the real variables used in the first transform of the Redivis example workflow by hovering over the (renamed) variables patient_id, enrollment_start_date, and enrollment_end_date in the Keep section.‌

(1) Add start and end of adjacent periods to each row

Throughout this example, we'll create variables that partition the dataset by patient identifier (here, patient_id) to ensure that each patient is processed individually. But, to account for the fact that a single patient may have many periods of continuous enrollment, each spanning many months, we need to identify the correct start (from enrollment_start_date) and end (from enrollment_end_date) of a continuous period out of multiple rows and capture them in a single row.‌

First, we create a partition variable lag_end_date using the lag method, which will order each row by enrollment_start_date in a given patient_id partition and copy the previous row's enrollment_end_date value into the each row. We also create lead_start_date using lead, to copy the following row's enrollment_start_date value into each row.‌

These methods generate values which tell us how close the preceding and following enrollment periods are with respect the each row's enrollment period.

patient_id

enrollment_start_date

enrollment_end_date

lag_end_date

lead_start_date

1

2012-01-01

2012-01-31

NULL

2012-02-01

1

2012-02-01

2012-02-28

2012-01-31

2012-04-01

1

2012-04-01

2012-04-30

2012-02-28

2012-06-01

1

2012-06-01

2012-06-30

2012-04-30

2012-07-01

1

2012-07-01

2012-07-31

2012-06-30

2012-08-01

1

2012-08-01

2012-08-31

2012-07-31

NULL

2

2012-01-01

2012-01-31

NULL

NULL

(2) Select rows with start and end of continuous periods

In a second, downstream transform we'll create new variables, which will use the above lead and lag values to identify which rows correspond to the beginning and end of a continuous enrollment period.‌

First, we'll compare enrollment_start_date and lag_end_date to find the difference (in days) of the start of each period and the end of the previous period. and compare enrollment_end_date and lead_start_date to find the end of the period.‌

We see via diff_lag_end_enrollment_start (created using the date diff method) which rows describe an enrollment period directly following the previous period, and which rows describe an enrollment period with a larger gap since the previous period. We also create diff_enrollment_end_lead_start to identify gaps between an enrollment period and the next period.

patient_id

enrollment_start_date

enrollment_end_date

lag_end_date

lead_start_date

diff_lag_end_enrollment_start

diff_enrollment_end_lag_start

1

2012-01-01

2012-01-31

NULL

2012-02-01

NULL

1

1

2012-02-01

2012-02-28

2012-01-31

2012-04-01

1

31

1

2012-04-01

2012-04-30

2012-02-28

2012-06-01

31

31

1

2012-06-01

2012-06-30

2012-04-30

2012-07-01

31

1

1

2012-07-01

2012-07-31

2012-06-30

2012-08-01

1

1

1

2012-08-01

2012-08-31

2012-07-31

NULL

1

NULL

2

2012-01-01

2012-01-31

NULL

NULL

NULL

NULL

Next, we'll encode booleans from our difference variables to simplify further filtering. We'll identify rows corresponding to the start of a continuous period as those with a diff_lag_end_enrollment_start value of either NULL (the row is the first period in the partition) or greater than 1 (the row comes after a gap in enrollment). And we identify rows corresponding to the end of a continuous period as those with a diff_enrollment_end_lead_start value of either NULL (the row is the last period in the partition) or greater than 1 (the row comes before a gap in enrollment).‌

patient_id

enrollment_start_date

enrollment_end_date

lag_end_date

lead_start_date

diff_lag_end_enrollment_start

diff_enrollment_end_lag_start

is_start_continuous_period

is_end_continuous_period

1

2012-01-01

2012-01-31

NULL

2012-02-01

NULL

1

true

false

1

2012-02-01

2012-02-28

2012-01-31

2012-04-01

1

31

false

true

1

2012-04-01

2012-04-30

2012-02-28

2012-06-01

31

31

true

true

1

2012-06-01

2012-06-30

2012-04-30

2012-07-01

31

1

true

false

1

2012-07-01

2012-07-31

2012-06-30

2012-08-01

1

1

false

false

1

2012-08-01

2012-08-31

2012-07-31

NULL

1

NULL

false

true

2

2012-01-01

2012-01-31

NULL

NULL

NULL

NULL

true

true

This leaves us with either 1 or 2 rows corresponding to a continuous enrollment period. If a continuous period spans multiple rows (months, in this case), we'll have 2 rows (a start row and an end row). But if a period only spans one row, we'll have both start and end captured by that 1 row. In our example, the row containing the middle (neither start nor end) of the 2012-06-01 to 2012-08-31 enrollment period for patient 1 was dropped. We can also ignore our intermediate lead..., lag..., and diff... variables, since our final processing step will only consider a row's is_start_continuous_period and is_end_continuous_period values.

patient_id

enrollment_start_date

enrollment_end_date

is_start_continuous_period

is_end_continuous_period

1

2012-01-01

2012-01-31

true

false

1

2012-02-01

2012-02-28

false

true

1

2012-04-01

2012-04-30

true

true

1

2012-06-01

2012-06-30

true

false

1

2012-08-01

2012-08-31

false

true

2

2012-01-01

2012-01-31

true

true

(3) Add end date of following enrollment periods to each row and collapse

Finally, we want to collapse our table to ensure 1 row per continuous enrollment period per patient. Since we have only rows corresponding to start and end of continuous periods, we create another partition variable lead_end_date (in the same transform is fine, since this step will happen after the previous filter) which copies the enrollment_end_date value of the following row on to each row.‌

We can also use the partition row filter to keep only rows with is_start_continuous_period as true, since our lead_end_date has copied over the end date of the continuous enrollment period, contained in each row's following row.

We end up with a table where each and every row contains both the start date of the continuous enrollment period and the end date of that continuous enrollment period.

patient_id

enrollment_start_date

enrollment_end_date

lead_end_date

is_start_continuous_period

is_end_continuous_period

1

2012-01-01

2012-01-31

2012-02-28

true

false

1

2012-04-01

2012-04-30

2012-06-30

true

true

1

2012-06-01

2012-06-30

2012-08-31

true

false

2

2012-01-01

2012-01-31

NULL

true

true

A final processing step captures the correct end date of a continuous period. We now have only rows whose enrollment_start_date value contains the start of a continuous period, but these rows fall into two categories:‌

  • First, we have the rows that do not also correspond to the end of an enrollment period (where the is_end_continuous_period value is false). For these, we want to look at lead_end_date, the end date of the next row, which represents the final date of the continuous period, since we filtered out all the intermediate rows above.

  • Second, we have which also correspond to the end of an enrollment period (where the is_end_continuous_period value is true) – in this example, if the continuous period was only 1 month. For these, the row defines a period (in this example, 1 month) that also contains the end date, so we just get the enrollment_end_date from that same row. Note that the lead_end_date value is incorrect in this case, since the next row contains the start of the next continuous period, or NULL if the period falls at the end of a partition.

We capture the above logic in a new variable enrollment_end_date_continuous, created in an additional transform, since our previous operation involved a partition.

We end up with a final table below, containing the patient identifier, and the start date (renamed to enrollment_start_date_continuous for consistency, and end date of each continuous enrollment period.

patient_id

enrollment_start_date_continuous

enrollment_end_date_continuous

1

2012-01-01

2012-02-28

1

2012-04-01

2012-04-30

1

2012-06-01

2012-08-31

2

2012-01-01

2012-01-31

We see via boolean variables is_start_continuous_period and is_end_continuous_period (created the method) if a given row corresponds to the start of a continuous period, the end of a continuous period, or both.

Then, to capture only the start and end of a continuous period, we'll use a to keep only rows which are true for either is_start_continuous_period or is_end_continuous_period.‌

MarketScan Continuous Enrollment workflow
MarketScan
MarketScan Enrollment Detail
case
filter