Select first/last encounter
E.g.: I want the date of the first and last hospital admission for every patient in my table
Let's take an example table which contains all hospital visits in one facility over many years. Each record contains information about what happened in the visit, including a patient id to identify who was present. An individual patient might be represented in one or many records in this table since each record represents a separate hospital visit.
patient_id
encounter_date
1
2012-01-01
1
2011-01-01
1
2010-01-01
1
2013-01-01
2
2009-01-01
2
2008-01-01
2
2015-01-01
3
2014-02-01
Let's say that for our research we want to find the first and last encounter — or more formally, the min and max encounter date for each patient.
Computing a min or max value is an aggregate operation that scans all records and chooses the highest or lowest value. To compute the min or max value in a variable we will want to use an analytic new variable method. We can do this either by aggregating our table (which would drop records and/or variables) or by using a partition to calculate a new variable without changing the rest of the data.
In this example we will use a partition to create a new variable since we want the rest of the data to remain unchanged.
Variable concepts
Conceptually we will define our aggregation variable as the date of the encounter since that is where we want to look for the min and max values.
We will define our partition as the patient id since we want to find the min or max for each patient. This will limit the min or max value scan to all values of our aggregation variable across each unique value of our partitioned variable (patient id).
You can see more examples of using partitions with analytic methods on the Create variables step page.
Defining the new variable
For this particular case, we want to find the MIN and MAX of the encounter_date
of each patient; in other words, partitioned on patient_id
. This will create two new variables, min_encounter_date
and max_encounter_date
for each patient.
When we run this query, note that no records will be dropped; rather, these new variables will be have consistent values for any given patient_id
.
patient_id
encounter_date
min_encounter_date
max_encounter_date
1
2012-01-01
2010-01-01
2013-01-01
1
2011-01-01
2010-01-01
2013-01-01
1
2010-01-01
2010-01-01
2013-01-01
1
2013-01-01
2010-01-01
2013-01-01
2
2009-01-01
2008-01-01
2015-01-01
2
2008-01-01
2008-01-01
2015-01-01
2
2015-01-01
2008-01-01
2015-01-01
3
2014-02-01
2014-02-01
2014-02-01
Now, let's say we only want to keep records which contain each patient's first and last encounter. We can do this easily be creating a new row filter:
We can also generalize this approach to find the Nth encounter for each patient — take a look at the RANK, DENSE_RANK, and ROW_NUMBER methods.
And if we want to be more specific on our partition (e.g., for the first / last encounter in a given calendar year), we can always apply additional variables (year
) when defining our partition.
Last updated