How To Use First. and Last. Variable In SAS?

The FIRST. And LAST. functions can be used to identify first or last observations by group in the SAS dataset.

  • First.Variable : It assigns value 1 to the first observation and 0 to the rest of the observations within the group in a SAS dataset.
  • Last.Variable: It assigns value 1 to the last observation and 0 to the rest of the observations within the group in a SAS dataset.

These are also called indicators to identify first and last observation in the group. There is no separate column present in the dataset but you could add a new column by assigning this indicator value to it. We have covered this as well further in the article.

PLEASE NOTE: The SAS dataset must be sorted on the BY group before using the first. and last. functions. 

The following sample dataset will be used to demonstrate how First. And Last. variable works and its different use cases.

				/* create a dataset */
data sasDataset;
	input ID var;
	datalines;
1 10
5 60
1 20
4 50
3 40
2 50
3 70
3 60
4 30
5 20
;
run;

/* view dataset*/
proc print data=sasDataset;
run;
			
sample sas dataset - first. and last. variable in sas

PREPARATION: Sort Data On BY Variable

As mentioned earlier, before you use the first. and last. functions, your input dataset must be sorted on the BY variable.

				/* sort dataset by ID */
proc sort data=sasDataset;
	by ID;
run;

/* view sorted dataset by ID*/
proc print data=sasDataset;
run;
			
sorted sas dataset - first. and last. variable in sas

The above sorted dataset will be used for grouping data based on BY statement. In this example we are using an ID variable in BY statement hence grouping data done on ID values. 

Group 1: ID=1

There are two rows with ID=1 hence

The first row will be assigned as FIRST.ID=1 and the second row will be assigned as FIRST.ID=0

The first row will be assigned as LAST.ID=0 and the second that is also last row will be assigned as LAST.ID=1

  • First row     : FIRST.ID=1, LAST.ID=0
  • Second row: FIRST.ID=0, LAST.ID=1

Group 2: ID=2

There is only one row with ID=2

  • First row     : FIRST.ID=1, LAST.ID=1

Group 3: ID=3

There are THREE rows with ID=3 

  • First row     : FIRST.ID=1, LAST.ID=0
  • Second row: FIRST.ID=0, LAST.ID=0
  • Third row    : FIRST.ID=0, LAST.ID=1

Example 1: How FIRST. And LAST. Function Assigns Values

The following example creates a new dataset work.sasDataset_FirstLast with two new variables first_ID that holds first.ID value and last_ID variable that holds last.ID value.

				/* sort dataset by ID */
proc sort data=sasDataset;
	by ID;
run;


/* How first. and last. indicator (1 or 0) value gets assigned */
data sasDataset_FirstLast;
	set sasDataset;
	by ID;
	first_ID=first.ID;
	last_ID=last.ID;
run;

/* view dataset */
proc print data=sasDataset_FirstLast;
run;
			
first. and last. variable in sas

Example 2: How To Use FIRST. Variable In SAS

The FIRST. function assigns value 1 to the first observation and 0 for the rest of the observations within the group.

You can use the First. to extract the first observation and either store it in the separate dataset or update the existing dataset.

The following example creates a new sas dataset “work.first_ID” with unique data based on first observation grouped by ID values.

				/* sort dataset by ID */
proc sort data=sasDataset;
	by ID;
run;

/* how to use first. variable in sas */
data first_ID;
set sasDataset;
	by ID;
	if first.ID=1 then output;
run;

/* view dataset */
proc print data=first_ID; run;
			
How to use first. variable in SAS

Example 3: How To Use LAST. Variable In SAS

The LAST. function assigns value 1 to the last observation and 0 for the rest of the observations within the group.

You can use the Last. to extract the last observation and either store it in the separate dataset or update the existing dataset.

The following example creates a new sas dataset “work.last_ID” with unique data based on last observation grouped by ID values.

				/* sort dataset by ID */
proc sort data=sasDataset;
	by ID;
run;

/* How to use last. variable in sas */
data last_ID;
set sasDataset;
	by ID;
	if last.ID=1 then output;
run;

/* view dataset */
proc print data=last_ID; run;
			
How to use Last. variable in SAS

Example 4: How To Identify and Remove Duplicate Rows Using First. and Last. Variable In SAS

In this example we’ll delete duplicate values based on the BY variable that is ID variable here in our demo example and create a new dataset “work.no_dup_data” which will not have any duplicate rows. 

PS: Here we will DELETE all the duplicate rows and only keep unique values.

All the unique rows can be identified by filtering out data based on FIRST.ID=1 and LAST.ID=1.

				/* sort dataset by ID */
proc sort data=sasDataset;
	by ID;
run;

/* remove duplicate rows and keep purely unique values*/
data no_dup_data;
	set sasDataset;
	by ID;
	if first.Id=1 and last.ID=1 then output;
run;

/* view dataset */
proc print data=no_dup_data; run;
			
remove duplicates using first. and last. variable in sas

Example 5: How To Identify Duplicate Rows Using First. and Last. Variable In SAS

This example is exactly opposite of the previous example. In the above example we created a new dataset with unique rows. If you apply the reverse condition then you’ll be able to extract duplicate rows.

All the duplicate rows can be identified by filtering out data based on rows that are NOT marked as FIRST.ID=1 and LAST.ID=1.

Here’s how you can identify duplicate rows:

				/* sort dataset by ID */
proc sort data=sasDataset;
	by ID;
run;

/* identify duplicate rows */
data dup_data;
	set sasDataset;
	by ID;
	if NOT (first.Id=1 and last.ID=1) then output;
run;

/* view dataset */
proc print data=dup_data;
run;
			
Find duplicates using first. and last. variable in sas

FAQ

What are FIRST. and LAST. variables in SAS data steps?

FIRST. and LAST. are automatic variables in SAS data steps. FIRST. is set to 1 for the first observation in a BY-group and 0 for subsequent observations. LAST. is set to 1 for the last observation in a BY-group and 0 otherwise. These variables are often used to perform calculations and data manipulations based on the first and last occurrences of specific conditions within BY-groups.

What are FIRST. and LAST. variables in SAS, and how are they used in data processing?

FIRST. and LAST. variables are automatic variables in SAS used to identify the first and last occurrences of observations within BY-groups. They are commonly used in data processing to perform calculations and manipulations specific to the first and last observations in a group.

Are FIRST. and LAST. variables applicable only in sorted data in SAS?

FIRST. and LAST. variables are typically used in sorted data or data that is grouped using the BY statement in SAS. They are specifically designed to work with BY-groups. While sorting the data is a common use case, these variables can also be utilized in unsorted data within BY-group processing, ensuring accurate identification of the first and last observations within specific groups.

Can FIRST. and LAST. variables be used in SAS PROC SQL queries?

Yes, FIRST. and LAST. variables can be used in SAS PROC SQL queries. They can be employed within the SELECT statement to calculate values specifically for the first and last observations in the result set. By combining them with conditional statements, you can customize the output based on the position of observations within groups.