enrollment_end_datein the Keep section.
patient_id) to ensure that each patient is processed individually. But, to account for the fact that a single patient may have many periods of continuous enrollment, each spanning many months, we need to identify the correct start (from
enrollment_start_date) and end (from
enrollment_end_date) of a continuous period out of multiple rows and capture them in a single row.
lag_end_dateusing the lag method, which will order each row by
enrollment_start_datein a given
patient_idpartition and copy the previous row's
enrollment_end_datevalue into the each row. We also create
lead_start_dateusing lead, to copy the following row's
enrollment_start_datevalue into each row.
lag_end_dateto find the difference (in days) of the start of each period and the end of the previous period. and compare
lead_start_dateto find the end of the period.
diff_lag_end_enrollment_start(created using date diff) which rows describe an enrollment period directly following the previous period, and which rows describe an enrollment period with a larger gap since the previous period. We also create
diff_enrollment_end_lead_startto identify gaps between an enrollment period and the next period.
diff_lag_end_enrollment_startvalue of either NULL (the row is the first period in the partition) or greater than 1 (the row comes after a gap in enrollment). And we identify rows corresponding to the end of a continuous period as those with a
diff_enrollment_end_lead_startvalue of either NULL (the row is the last period in the partition) or greater than 1 (the row comes before a gap in enrollment).
is_end_continuous_period(created the case method) if a given row corresponds to the start of a continuous period, the end of a continuous period, or both.
diff...variables, since our final processing step will only consider a row's
lead_end_date(in the same transform is fine, since this step will happen after the previous filter) which copies the
enrollment_end_datevalue of the following row on to each row.
true, since our
lead_end_datehas copied over the end date of the continuous enrollment period, contained in each row's following row.
enrollment_start_datevalue contains the start of a continuous period, but these rows fall into two categories:
false). For these, we want to look at
lead_end_date, the end date of the next row, which represents the final date of the continuous period, since we filtered out all the intermediate rows above.
true) – in this example, if the continuous period was only 1 month. For these, the row defines a period (in this example, 1 month) that also contains the end date, so we just get the
enrollment_end_datefrom that same row. Note that the
lead_end_datevalue is incorrect in this case, since the next row contains the start of the next continuous period, or NULL if the period falls at the end of a partition.
enrollment_end_date_continuous, created in an additional transform, since our previous operation involved a partition.
enrollment_start_date_continuousfor consistency, and end date of each continuous enrollment period.