In this article you’ll learn 8 different ways with examples on how to select the first N rows in SAS. Sometimes you may want to extract the first 5 observations or just a specific row, let’s say 5th observation, or you may want to extract observations within the specified range.
Everything can be achieved using obs, FIRSTOBS and automatic variable (_N_) in SAS. Here are 8 different scenarios you’ll learn in this article to select the first N-rows in SAS.
- Select First N Rows With OBS= Option.
- Select First N Rows With _N_ automatic Variable
- Select N-th Observation Using _N_ Variable
- Select N-th Observation Using FIRSTOBS and OBS
- Select Last Observation Using End= Option In SAS
- Select First And Last Observations
- Select First N Observations Using PROC SQL
- Select N Observations Randomly
The following sample dataset can be used to learn more about the above scenarios explained with the examples.
Here we are creating TIME dimensional data which can be easily used to explain all the examples.
/* Create a dimension table */
data DIM;
format date date9.;
do dt='01jan2025'd to '31dec2025'd;
date=dt;
month=month(dt);
year=year(dt);
quarter=QTR(dt);
output;
end;
drop dt;
run;
/* remove unnecessary rows and keep first date of the month */
data DIM;
set DIM;
by month;
if first.month then output;
run;
/* view dataset */
proc print data=DIM; run;
1. Select First N Rows With OBS= Option.
You can select the first N rows using the data step with OBS= option. This option tells SAS when to stop processing observations. In a way it helps to exclude rows and only keeps the first N-rows which you want.
The OBS option can be applied when the dataset is being created or reading the base dataset. Applying OBS option on base dataset is always recommended as it only reads necessary observations and avoids reading the entire dataset. It’s very effective when you’re dealing with big datasets.
The following code select first 5 observations from DIM dataset and stores it in the new dataset.
/* select first N rows with OBS= option */
/* more efficient way to select first 5 rows */
data first_5_rows_DIM ;
set DIM (obs=5);
run;
/* view dataset */
proc print data=first_5_rows_DIM; run;
2. Select First N Rows With _N_ (Automatic Variable)
First let’s understand what an automatic variable is and how it works in SAS. And then we will see how you can select the first N rows using this variable. You can read an in depth article here on _N_ automatic variable in SAS.
_N_ (Automatic Variable):
The value for _N_ is initially set to 1. Each time the DATA step loops past the DATA statement, the variable _N_ increments by 1. The value of _N_ represents the number of times the DATA step has iterated.
It means if you want to extract the first 5 rows then you can simply apply a filter on _N_ as: _N_ <=5. This is what is done in the following example to select the first 5 observations.
/* select first N rows with _N_ automatic variable */
data first_5_rows_DIM ;
set DIM;
if _N_ <= 5;
run;
/* view dataset */
proc print data=first_5_rows_DIM; run;
3. Select N-th Observation Using _N_ Variable
In the previous example you learn about how to select first N observations. In this example you’ll learn how to select very specific observations using _N_ (automatic variable) in SAS.
Let’s say, you want to select a very specific 4th observation from the dataset then it can be done using _N_ variable. Check out this article to learn more about _N_ Automatic Variable In SAS.
/* select N-th Observation Using _N_ variable*/
data select_4th_row_DIM ;
set DIM;
if _N_ = 4 then output;
run;
/* view dataset */
proc print data=select_4th_row_DIM; run;
4. Select N-th Observation Using FIRSTOBS and OBS
In our first example you have already learned how to use the OBS option to select the first N-observation. When you use the OBS= option it starts extracting rows from the beginning and stops reading observations at the row number specified in the OBS option.
You can control the start position by using the FIRSTOBS option. The FIRSTOBS tells SAS from where to start reading observations and the OBS option tells SAS at what row it should stop reading observations.
- FIRSTOBS: First row number to start reading observations
- OBS : Last row number to stop reading observations
Trick: If you specify the same row number for start position and end position then it selects only that specific observation. That’s what we want to do here.
The following example demonstrates how to select a very specific 4th observation using firstobs and obs options.
/* select N-th Observation Using firstobs and obs*/
data select_4th_row_DIM ;
set DIM (firstobs=4 obs=4);
run;
/* view dataset */
proc print data=select_4th_row_DIM; run;
5. Select Last Observation Using END= Option In SAS
You can select the last observation using the data step with end= option. Let’s understand how end= option works in SAS. The END= option to tell SAS to create a temporary numeric value whose value is used to detect the last observation.
Here is the example explaining how this temp numeric variable is being created. Let’s create a new column and assign the temp variable value to it.
/* select last observation in SAS */
data last_obs_flg_DIM;
set DIM end=last_obs; /* temporary variable 'last_obs'*/
last_obs_flg=last_obs; /* new variable */
run;
/* view dataset */
proc print data=last_obs_flg_DIM; run;
Here is how you can use the end= temporary variable to select the last observation.
/* select last observation */
data select_last_obs_DIM;
set DIM end=last_obs;
if last_obs=1 then output;
run;
/* view dataset */
proc print data=select_last_obs_DIM; run;
6. Select First And Last Observations
In this example you’ll use automatic variable _N_ and end=last_obs option to select first and last observations.
- First observation can be selected using _N_
- Last observation can be selected using end= option
/* Select first and last Observations */
data select_first_last_obs_DIM;
set DIM end=last_obs;
if _N_=1 OR last_obs=1 then output;
run;
/* view dataset */
proc print data=select_first_last_obs_DIM; run;
7. Select First N Observations Using PROC SQL
Proc sql procedure has options to restrict the observations when it tries to create the sas dataset. You can use outobs= or inobs= options with proc sql procedure to select first N observations in SAS.
The INOBS= option is similar to the SAS system option OBS=. The OUTOBS= option restricts the number of rows that PROC SQL displays or writes to a table.
The difference is the INOBS statement limits the data before it even is processed by the query, whereas the OUTOBS limits the data as the query is completed.
The following queries produce the same results.
/* Select N Observations using PROC SQL
(outobs= and inobs= options) */
/* OUTOBS= */
proc sql outobs=5;
create table first_5_obs_DIM as select * from work.DIM;
quit;
/* INOBS= */
proc sql inobs=5;
create table first_5_obs_DIM as select * from work.DIM;
quit;
/* view dataset */
proc print data=first_5_obs_DIM; run;
8. Select N Observations Randomly
This last example demonstrates how to select N observations randomly. Here we will use the proc surveyselect procedure.
The SURVEYSELECT procedure provides a variety of methods for selecting probability-based random samples. The procedure can select a simple random sample or can sample according to a complex multistage design that includes stratification, clustering, and unequal probabilities of selection.
If you do not specify a selection method, PROC SURVEYSELECT uses simple random sampling (METHOD=SRS) by default unless you specify a SIZE statement or the PPS option in the SAMPLINGUNIT statement.
The following example shows how to select 6 random observations from the DIM dataset and store it in the new dataset work.random_obs_DIM.
/* Select N Observations Randomly */
proc surveyselect data=work.DIM
out=work.random_6_obs_DIM
method=srs
sampsize=6
seed=123456789;
run;
/* view dataset */
proc print data=random_6_obs_DIM; run;
FAQ
There are multiple ways to select the first n rows in SAS. You can use the PROC SQL procedure and specify the OUTOBS= option to limit the number of rows in the output. Alternatively, you can use the data step and the N automatic variable to output only the first n observations from the input dataset.
The _N_ automatic variable in SAS is used to count the number of iterations of the data step. It can be used to select the first row or the first n rows of a dataset by using an IF statement. For example, if N = 1 then output; will select only the first row of the dataset.
The FIRST.variable is a temporary variable that indicates whether the current row is the first row of a group defined by the BY statement. It takes the value 1 for the first row of a group and 0 for the others. It can be used to select the first row of a group or to perform calculations based on groups