Select first/last encounter

Selecting first / last encounter

E.g.: I want the date of the first and last hospital admission for every patient in my table

Let's take an example where we have a cohort of hospital patients, where in addition to a unique identifier we have a timestamp / date variable for all of their encounters. For each of these patients, we want to find the first and last encounter — or more formally, the min and max encounter date for that patient

Any time we want to compute aggregate variables for a particular subset of our data, we have a pretty strong indicator to use partition clause(s) within a transform. Partitioned queries allow for us to compute aggregate values on any arbitrary partition that we define on our table, with a partition being a unique combination of one or more variables. For example, we could use the SUM partition method, partitioning on patient_id and year , to take the sum of a cost variable to create a new variable that represents the total cost for each patient over a given year.

For this particular case, we want to find the MIN and MAX of the encounter_date of each patient; in other words, partitioned on patient_id . This will create two new variables, min_encounter_date and max_encounter_date for each patient. When we run this query, note that no records will be dropped; rather, these new variables will be have consistent values for any given patient_id .

patient_id

encounter_date

min_encounter_date

max_encounter_date

1

2012-01-01

2010-01-01

2013-01-01

1

2011-01-01

2010-01-01

2013-01-01

1

2010-01-01

2010-01-01

2013-01-01

1

2013-01-01

2010-01-01

2013-01-01

2

2009-01-01

2008-01-01

2015-01-01

2

2008-01-01

2008-01-01

2015-01-01

2

2015-01-01

2008-01-01

2015-01-01

Now, let's say we only want each patient's first and last encounter. We can do this easily be creating a new row filter, choosing records where min_encounter_date = encounter_date OR max_encounter_date = encounter_date !

We can also generalize this approach to find the Nth encounter for each patient — take a look at the RANK, DENSE_RANK, and ROW_NUMBER methods. And if we want to be more specific on our partition (e.g., first / last encounter in a given calendar year), we can always apply additional variables (year ) when defining our partition.