# Continuous enrollment

{% hint style="success" %}
The following example is illustrated on Redivis in the [MarketScan Continuous Enrollment workflow](https://redivis.com/projects/2409) – you'll need access to [MarketScan](https://redivis.com/datasets/96hs-egqe74693) data to view the details.‌
{% endhint %}

Many insurance claims datasets on Redivis contain information about enrollment, detailing the periods of time when an individual represented in a dataset was covered by an insurance plan. If you intend to characterize patients based on their insurance coverage (or lack thereof) during certain key events (procedures, diagnoses, etc), it's often important to identify periods of *continuous* enrollment for each individual – and capture each continuous enrollment period for each patient in a single row.‌

These claims datasets describe enrollment information in multiple discrete rows per patient, each corresponding to patient per month. However, an overall continuous enrollment period may be broken up across rows into months or other non-uniform chunks, so we'll employ the following process to combine multiple sequential rows into a single row with one start date and one end date, describing one continuous period.‌

In this example, we will process the [MarketScan Enrollment Detail](https://redivis.com/datasets/96hs-egqe74693/tables/rscp-96kmeddcq) table to create a table in which each row describes a single period of continuous enrollment. We show an artificial snapshot of the dataset below, where patient 1 has multiple periods of continuous enrollment due to some gaps in coverage, and patient 2 has a single period of continuous enrollment.

| patient\_id | enrollment\_start\_date | enrollment\_end\_date |
| ----------- | ----------------------- | --------------------- |
| 1           | 2012-01-01              | 2012-01-31            |
| 1           | 2012-02-01              | 2012-02-28            |
| 1           | 2012-04-01              | 2012-04-30            |
| 1           | 2012-06-01              | 2012-06-30            |
| 1           | 2012-07-01              | 2012-07-31            |
| 1           | 2012-08-01              | 2012-08-31            |
| 2           | 2012-01-01              | 2012-01-31            |

We want to create a final table with 3 rows for patient 1 to account for gaps in enrollment in March and May of 2012, and 1 row for patient 2. Our desired output has a row for each continuous period per patient, shown below:

| patient\_id | enrollment\_start\_date\_continuous | enrollment\_end\_date\_continuous |
| ----------- | ----------------------------------- | --------------------------------- |
| 1           | 2012-01-01                          | 2012-02-28                        |
| 1           | 2012-04-01                          | 2012-04-30                        |
| 1           | 2012-06-01                          | 2012-08-31                        |
| 2           | 2012-01-01                          | 2012-01-31                        |

The variable names in this example are *not* actual MarketScan variable names. But, with appropriate data access, you can see the real variables used in the first transform of the Redivis example workflow by hovering over the (renamed) variables `patient_id`, `enrollment_start_date`, and `enrollment_end_date` in the **Keep** section.‌

### (1) Add start and end of adjacent periods to each row <a href="#id-1-add-start-and-end-of-adjacent-periods-to-each-row" id="id-1-add-start-and-end-of-adjacent-periods-to-each-row"></a>

Throughout this example, we'll create variables that partition the dataset by patient identifier (here, `patient_id`) to ensure that each patient is processed individually. But, to account for the fact that a single patient may have *many* periods of continuous enrollment, each spanning *many* months, we need to identify the correct start (from `enrollment_start_date`) and end (from `enrollment_end_date`) of a continuous period out of multiple rows and capture them in a single row.‌

First, we create a partition variable `lag_end_date` using the **lag** method, which will order each row by `enrollment_start_date` in a given `patient_id` partition and copy the previous row's `enrollment_end_date` value into the each row. We also create `lead_start_date` using **lead**, to copy the following row's `enrollment_start_date` value into each row.‌

![](https://1672950126-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LVodLwUXgJUGcm5Cvso%2Fuploads%2FAQCsLpJtAzLkxdDsgf7J%2F1a_out.png?alt=media\&token=be86c072-e32c-43fd-b1a3-35443a0a9ac1)

![](https://1672950126-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LVodLwUXgJUGcm5Cvso%2Fuploads%2FXcHD5OzhVg04wYD5oCSA%2F1b_out.png?alt=media\&token=9bce57eb-1330-4dca-8f9b-e8a4c985c3ef)

These methods generate values which tell us how close the preceding and following enrollment periods are with respect the each row's enrollment period.

| patient\_id | enrollment\_start\_date | enrollment\_end\_date | lag\_end\_date | lead\_start\_date |
| ----------- | ----------------------- | --------------------- | -------------- | ----------------- |
| 1           | 2012-01-01              | 2012-01-31            | *NULL*         | 2012-02-01        |
| 1           | 2012-02-01              | 2012-02-28            | 2012-01-31     | 2012-04-01        |
| 1           | 2012-04-01              | 2012-04-30            | 2012-02-28     | 2012-06-01        |
| 1           | 2012-06-01              | 2012-06-30            | 2012-04-30     | 2012-07-01        |
| 1           | 2012-07-01              | 2012-07-31            | 2012-06-30     | 2012-08-01        |
| 1           | 2012-08-01              | 2012-08-31            | 2012-07-31     | *NULL*            |
| 2           | 2012-01-01              | 2012-01-31            | *NULL*         | *NULL*            |

### (2) Select rows with start and end of continuous periods <a href="#id-2-select-rows-with-start-and-end-of-continuous-periods" id="id-2-select-rows-with-start-and-end-of-continuous-periods"></a>

In a second, downstream transform we'll create new variables, which will use the above lead and lag values to identify which rows correspond to the beginning and end of a continuous enrollment period.‌

First, we'll compare `enrollment_start_date` and `lag_end_date` to find the difference (in days) of the start of each period and the end of the previous period. and compare `enrollment_end_date` and `lead_start_date` to find the end of the period.‌

![](https://1672950126-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LVodLwUXgJUGcm5Cvso%2Fuploads%2FAvne8gLH1LPqg9kUKVPJ%2F2a_out.png?alt=media\&token=9119a069-dbea-417b-b9d8-643bc064c810)

<figure><img src="https://1672950126-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LVodLwUXgJUGcm5Cvso%2Fuploads%2FCAXy78HXM0O3sTyA6Dhz%2F2b_out.png?alt=media&#x26;token=4b85995a-b716-4878-a701-fa47ce3ddf09" alt=""><figcaption></figcaption></figure>

We see via `diff_lag_end_enrollment_start` (created using the **date diff** method) which rows describe an enrollment period directly following the previous period, and which rows describe an enrollment period with a larger gap since the previous period. We also create `diff_enrollment_end_lead_start` to identify gaps between an enrollment period and the next period.

| patient\_id | enrollment\_start\_date | enrollment\_end\_date | lag\_end\_date | lead\_start\_date | diff\_lag\_end\_enrollment\_start | diff\_enrollment\_end\_lag\_start |
| ----------- | ----------------------- | --------------------- | -------------- | ----------------- | --------------------------------- | --------------------------------- |
| 1           | 2012-01-01              | 2012-01-31            | *NULL*         | 2012-02-01        | *NULL*                            | 1                                 |
| 1           | 2012-02-01              | 2012-02-28            | 2012-01-31     | 2012-04-01        | 1                                 | 31                                |
| 1           | 2012-04-01              | 2012-04-30            | 2012-02-28     | 2012-06-01        | 31                                | 31                                |
| 1           | 2012-06-01              | 2012-06-30            | 2012-04-30     | 2012-07-01        | 31                                | 1                                 |
| 1           | 2012-07-01              | 2012-07-31            | 2012-06-30     | 2012-08-01        | 1                                 | 1                                 |
| 1           | 2012-08-01              | 2012-08-31            | 2012-07-31     | *NULL*            | 1                                 | *NULL*                            |
| 2           | 2012-01-01              | 2012-01-31            | *NULL*         | *NULL*            | *NULL*                            | *NULL*                            |

Next, we'll encode booleans from our difference variables to simplify further filtering. We'll identify rows corresponding to the start of a continuous period as those with a `diff_lag_end_enrollment_start` value of either *NULL* (the row is the first period in the partition) or *greater than 1* (the row comes after a gap in enrollment). And we identify rows corresponding to the end of a continuous period as those with a `diff_enrollment_end_lead_start` value of either *NULL* (the row is the last period in the partition) or *greater than 1* (the row comes before a gap in enrollment).‌

![](https://1672950126-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LVodLwUXgJUGcm5Cvso%2Fuploads%2F2xj7jAZe7MGgmsCm29AQ%2F2c_out.png?alt=media\&token=50a56cf3-eb43-496d-8c21-aac3d27086fe)

<figure><img src="https://1672950126-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LVodLwUXgJUGcm5Cvso%2Fuploads%2F49CgiSuMavKvC4A5rf0m%2F2d_out.png?alt=media&#x26;token=5d9837a5-26de-4ace-96fd-aa73f5b5fb4f" alt=""><figcaption></figcaption></figure>

We see via boolean variables `is_start_continuous_period` and `is_end_continuous_period` (created the [**case**](https://docs.redivis.com/reference/workflows/transforms/variable-creation-methods/case-if-else) method) if a given row corresponds to the start of a continuous period, the end of a continuous period, or both.

| patient\_id | enrollment\_start\_date | enrollment\_end\_date | lag\_end\_date | lead\_start\_date | diff\_lag\_end\_enrollment\_start | diff\_enrollment\_end\_lag\_start | is\_start\_continuous\_period | is\_end\_continuous\_period |
| ----------- | ----------------------- | --------------------- | -------------- | ----------------- | --------------------------------- | --------------------------------- | ----------------------------- | --------------------------- |
| 1           | 2012-01-01              | 2012-01-31            | *NULL*         | 2012-02-01        | *NULL*                            | 1                                 | true                          | false                       |
| 1           | 2012-02-01              | 2012-02-28            | 2012-01-31     | 2012-04-01        | 1                                 | 31                                | false                         | true                        |
| 1           | 2012-04-01              | 2012-04-30            | 2012-02-28     | 2012-06-01        | 31                                | 31                                | true                          | true                        |
| 1           | 2012-06-01              | 2012-06-30            | 2012-04-30     | 2012-07-01        | 31                                | 1                                 | true                          | false                       |
| 1           | 2012-07-01              | 2012-07-31            | 2012-06-30     | 2012-08-01        | 1                                 | 1                                 | false                         | false                       |
| 1           | 2012-08-01              | 2012-08-31            | 2012-07-31     | *NULL*            | 1                                 | *NULL*                            | false                         | true                        |
| 2           | 2012-01-01              | 2012-01-31            | *NULL*         | *NULL*            | *NULL*                            | *NULL*                            | true                          | true                        |

Then, to capture *only* the start and end of a continuous period, we'll use a [**filter**](https://docs.redivis.com/reference/workflows/transforms/step-filter) to keep only rows which are `true` for either `is_start_continuous_period` or `is_end_continuous_period`.‌

![](https://1672950126-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LVodLwUXgJUGcm5Cvso%2Fuploads%2Fkgx6uGLeXrXyJBSWvSsu%2F2e_out.png?alt=media\&token=dcd5e2b7-cc32-4a16-b9eb-6366c0b98541)

This leaves us with either 1 or 2 rows corresponding to a continuous enrollment period. If a continuous period spans multiple rows (months, in this case), we'll have 2 rows (a start row and an end row). But if a period only spans one row, we'll have both start and end captured by that 1 row. In our example, the row containing the middle (neither start nor end) of the 2012-06-01 to 2012-08-31 enrollment period for patient 1 was dropped. We can also ignore our intermediate `lead...`, `lag...`, and `diff...` variables, since our final processing step will only consider a row's `is_start_continuous_period` and `is_end_continuous_period` values.

| patient\_id | enrollment\_start\_date | enrollment\_end\_date | is\_start\_continuous\_period | is\_end\_continuous\_period |
| ----------- | ----------------------- | --------------------- | ----------------------------- | --------------------------- |
| 1           | 2012-01-01              | 2012-01-31            | true                          | false                       |
| 1           | 2012-02-01              | 2012-02-28            | false                         | true                        |
| 1           | 2012-04-01              | 2012-04-30            | true                          | true                        |
| 1           | 2012-06-01              | 2012-06-30            | true                          | false                       |
| 1           | 2012-08-01              | 2012-08-31            | false                         | true                        |
| 2           | 2012-01-01              | 2012-01-31            | true                          | true                        |

### (3) Add end date of following enrollment periods to each row and collapse <a href="#id-3-add-end-date-of-following-enrollment-periods-to-each-row-and-collapse" id="id-3-add-end-date-of-following-enrollment-periods-to-each-row-and-collapse"></a>

Finally, we want to collapse our table to ensure 1 row per continuous enrollment period per patient. Since we have only rows corresponding to start and end of continuous periods, we create another partition variable `lead_end_date` (in the same transform is fine, since this step will happen *after* the previous filter) which copies the `enrollment_end_date` value of the following row on to each row.‌

We can also use the partition row filter to keep only rows with `is_start_continuous_period` as `true`, since our `lead_end_date` has copied over the end date of the continuous enrollment period, contained in each row's following row.

![](https://1672950126-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LVodLwUXgJUGcm5Cvso%2Fuploads%2FLcv2UJ4h41ca1UTcUaQn%2F3a_out.png?alt=media\&token=160dc9db-4b57-4f49-a52c-ed2e6167ccca)

<figure><img src="https://1672950126-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LVodLwUXgJUGcm5Cvso%2Fuploads%2FaXpUmhhO0WgQPympFrow%2F3b_out.png?alt=media&#x26;token=8183e5f0-acd5-46ca-b35e-e2899d2e9d7b" alt=""><figcaption></figcaption></figure>

We end up with a table where each and every row contains both the start date of the continuous enrollment period *and* the end date of that continuous enrollment period.

| patient\_id | enrollment\_start\_date | enrollment\_end\_date | lead\_end\_date | is\_start\_continuous\_period | is\_end\_continuous\_period |
| ----------- | ----------------------- | --------------------- | --------------- | ----------------------------- | --------------------------- |
| 1           | 2012-01-01              | 2012-01-31            | 2012-02-28      | true                          | false                       |
| 1           | 2012-04-01              | 2012-04-30            | 2012-06-30      | true                          | true                        |
| 1           | 2012-06-01              | 2012-06-30            | 2012-08-31      | true                          | false                       |
| 2           | 2012-01-01              | 2012-01-31            | *NULL*          | true                          | true                        |

A final processing step captures the correct end date of a continuous period. We now have only rows whose `enrollment_start_date` value contains the start of a continuous period, but these rows fall into two categories:‌

* First, we have the rows that *do not* also correspond to the end of an enrollment period (where the `is_end_continuous_period` value is `false`). For these, we want to look at `lead_end_date`, the end date of the *next* row, which represents the *final* date of the continuous period, since we filtered out all the intermediate rows above.
* Second, we have which also correspond to the end of an enrollment period (where the `is_end_continuous_period` value is `true`) – in this example, if the continuous period was only 1 month. For these, the row defines a period (in this example, 1 month) that also contains the end date, so we just get the `enrollment_end_date` from that same row. Note that the `lead_end_date` value is incorrect in this case, since the next row contains the start of the *next* continuous period, or *NULL* if the period falls at the end of a partition.

We capture the above logic in a new variable `enrollment_end_date_continuous`, created in an additional transform, since our previous operation involved a partition.

![](https://1672950126-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LVodLwUXgJUGcm5Cvso%2Fuploads%2FlVmZVsEiQkWFk7oeF6AF%2F3c_out.png?alt=media\&token=a4b93674-713c-4ccf-9d26-33b771ad2985)

We end up with a final table below, containing the patient identifier, and the start date (renamed to `enrollment_start_date_continuous` for consistency, and end date of each continuous enrollment period.

| patient\_id | enrollment\_start\_date\_continuous | enrollment\_end\_date\_continuous |
| ----------- | ----------------------------------- | --------------------------------- |
| 1           | 2012-01-01                          | 2012-02-28                        |
| 1           | 2012-04-01                          | 2012-04-30                        |
| 1           | 2012-06-01                          | 2012-08-31                        |
| 2           | 2012-01-01                          | 2012-01-31                        |

[<br>](https://app.gitbook.com/@redivis/s/redivis-docs/~/diff/drafts/-MF7vtJ0H03tHCj02xh5/projects/examples/select-first-last-encounter)
