How To Compare (matching/non-matching) Rows From SAS Data sets

You can compare and find the matching and non-matching rows from two datasets in SAS using data merge technique. In the data merge technique data step statement is used with merge statement.

In this method you can compare multiple datasets by listing them after the MERGE statement and create new datasets for matching and non-matching rows.

While comparing and creating matching and non matching rows from sas datasets, you can relate these operations with how you join the datasets. It works similar to SQL Joins.

How Does DATA MERGE Work?

Use the MERGE statement in the Data step along with the IN= parameter. The IN= option creates the temporary numeric variable and assigns the values 1 or 0 according to the BY variable value occurrences.

Syntax:

/* data merge in sas */

data output-data-set;
    merge dataset-1(in=parameter) dataset-2(in=parameter);
    by id-variables;
run;

To demonstrate further multiple scenarios and use cases to compare matching and non-matching rows from SAS datasets, the below sample datasets will be used. 

/* create two sample datasets */
data ds_one(label='First Data Set');
	input student_id $ marks major $;
	datalines;
10 87 Maths
20 92 AR/VR
30 92 History
80 95 Art
;
run;

data ds_two(label='Second Data Set');
	input student_id $ marks major $;
	datalines;
10 87 Maths
20 92 AR/VR
30 91 Science
40 94 Music
50 96 Robotics
60 98 AI/ML
;
run;

/* view datasets */
proc print data=ds_one;
	title 'First Data Set - work.ds_one';

proc print data=ds_two;
	title 'Second Data Set - work.ds_two';
run;
sample datasets for data merge in sas

DATA MERGE Simple Example

Before we see advanced examples, let’s start with the basic example where data merge is used and a new temporary created variable from IN=parameter stored in a new variable.

NOTE: To compare datasets using DATA MERGE technique, all the input datasets must be sorted on the BY variables. This is prep-mandatory step you have to execute.

/* sort data on BY variable */
proc sort data=ds_one;
	by student_id;
run;

proc sort data=ds_two;
	by student_id;
run;

/* data merge simple example */
data output_ds;
	merge ds_one(in=X) ds_two(in=Y);
	by student_id;
	new_varX=X; /* assigns value 1 when student_id present in ds_one otherwise 0 */
	new_varY=Y; /* assigns value 1 when student_id present in ds_two otherwise 0 */
run;

/* view dataset */
proc print data=output_ds; run;
data merge IN= parameter dataset in sas
  • new_varX=X; – It assigns value 1 when student_id present in ds_one otherwise 0
  • new_varY=Y; – It assigns value 1 when student_id present in ds_two otherwise 0

Compare Datasets And Find Matching Rows (Inner Join)

The matching rows from both the datasets can be done using data merge and IN= parameter. Now you know how In= parameter works and how it flags the rows. You can use that information to find matching rows.

In other words, you can join the two dataset using INNER JOIN and extract common records based on BY variables. It generates the same output as it generates when proc sql – inner join is used.

/* sort data on BY variable */
proc sort data=ds_one; by student_id;
run;
proc sort data=ds_two; by student_id;
run;

/* Find Matching Rows in SAS */
data matching_rows;
	merge ds_one(in=X) ds_two(in=Y);
	by student_id;
	if X=1 and Y=1; /* inner join */
run;

/* view dataset */
proc print data=matching_rows; run;
compare datasets and find matching rows in sas

Matching row selection conditions can be written in some different ways as well without directly referring to 1 or 0. Following condition produces the same result:

if X and Y;

Compare Datasets And Find Non-Matching Rows

This method is exactly opposite of the previous method. Here you’re only interested in finding non-matching values from both the datasets which will be extracted based on the BY variable.

In this example all the matching student_id’s will be ignored and non-matching, unique student_id’s from both the datasets will be extracted.

/* Find non-matching rows in SAS */
data nonmatching_rows;
	merge ds_one(in=X) ds_two(in=Y);
	by student_id;
	if (X=1 and Y=0) OR (X=0 and Y=1); /* extract non-matching rows */
run;

/* view dataset */
proc print data=nonmatching_rows; run;
compare datasets and find non matching rows in sas

Non-matching row selection conditions can be written in some different ways as well without directly referring to 1 or 0.

The below condition:

  • if (X=1 and Y=0) OR (X=0 and Y=1); 

Can be written in following ways that produces the same result:

  • if (X and NOT Y) OR (Y and NOT X); 
  • if NOT (X=1 and Y=1);
  • if NOT (X and Y);

Compare Datasets With The Left Merge

This method is similar to SQL – LEFT Join. The left merge compares datasets and checks data present in the left that is first dataset and only fetch matching rows from the right that is second dataset.

Comparison will be done based on BY variable values. Don’t forget to sort both the datasets on the BY variable before comparing it.

In the below example all the student_ids from the left dataset ds_one will be considered and extracted. 

For the variables which are common in both the datasets, for the matching student_ids data will be extracted from the second dataset ds_two – ALWAYS. For non-matching id’s data will be extracted from the first dataset ds_one.

/* LEFT MERGE in SAS */
data leftmerge_rows;
	merge ds_one(in=X) ds_two(in=Y);
	by student_id;
	if X; /* extract all ids from left (first) dataset and matching rows from right (second) dataset */
run;

/* view dataset */
proc print data=leftmerge_rows; run;
Compare datasets with left merge in SAS

In the above result, pay close attention to student_id 30 and check corresponding values.

The first two observations have exact match between the datasets but for the third observation (student_id: 30), data extracted from the second dataset ds_two.

Compare Datasets With The Right Merge

This method is similar to SQL – RIGHT Join. The right merge compares datasets and checks data present in the right that is the second dataset and only fetches matching rows from the left that is the first dataset.

Comparison will be done based on BY variable values. Don’t forget to sort both the datasets on the BY variable before comparing it.

In the below example all the student_id from the right dataset ds_two will be considered and extracted. 

For the variables which are common in both the datasets, for the matching student_ids data will be extracted from the second dataset ds_two – ALWAYS. For non-matching id’s data will be extracted from the right second dataset ds_two.

/* RIGHT MERGE in SAS */
data rightmerge_rows;
	merge ds_one(in=X) ds_two(in=Y);
	by student_id;
	if Y; /* extract all ids from right (second) dataset and matching rows from left (first) dataset */
run;

/* view dataset */
proc print data=rightmerge_rows; run;
Compare datasets with right merge in SAS

In the above result, pay close attention to student_id 30 and check corresponding values.

The first two observations have exact match between the datasets but for the third observation (student_id: 30), data extracted from the second dataset ds_two – similar to the previous example.

Compare Datasets With The Full Merge

This method is similar to SQL – FULL Join. The full merge compares datasets, and lists student_Id’s from both the datasets and extracts data from the right that is second dataset ds_two for the corresponding common variables. 

When you execute a full merge in SAS you don’t need to specify any condition. It automatically extracts values based on BY variables from both the dataset for the comparison.

/* FULL MERGE in SAS */
data fullmerge_rows;
	merge ds_one(in=X) ds_two(in=Y);
	by student_id;
run;

/* view dataset */
proc print data=fullmerge_rows; run;
Compare datasets with full merge in SAS

In case if you want to specify conditions for the full merge then you can use any of the following conditions or just don’t mention anything. All these conditions produce the same result. 

  • If X in (1,0) OR Y in (1,0)
  • If X OR Y

FAQ – DATA MERGE To Compare SAS Datasets

How can I compare two datasets in SAS? 

Use the PROC COMPARE procedure to compare two datasets. It compares the contents of two SAS datasets, selected variables in different datasets, or variables within the same dataset. It determines matching variables and matching observations.

How can I write matched and non-matching observations from merged datasets to different new datasets?

Use the MERGE statement to merge the datasets and then use the IF-THEN statement to write matched and non-matching observations to different new datasets. For more details check out this article. 

How can I compare two datasets irrespective of the order of records? 

Sort both datasets by the same variable and then use the PROC COMPARE procedure or Data merge method to compare the sorted datasets. This will ensure that the order of records is not considered while comparing the datasets.