There are multiple methods that can be used to combine or append two or more datasets in SAS. In this article you’ll learn 5 simple ways to combine and append SAS datasets as follows.
- Appending Or Concatenating Two Datasets Vertically – Stacking The Datasets
- Appending Or Concatenating Stacked Sorted Data Values
- PROC APPEND To Concatenate Datasets
- PROC DATASETS To Concatenate Datasets
- PROC SQL To Concatenate Datasets
We will explore each of these methods to combine and append datasets in SAS using the following sample data sets work.University1 and work.University2
/* create sample datasets */
data University1;
input student $ grade $;
datalines;
Jackobe A
Kjell B
Morten A
Steven C
Peter A
;
proc print;
title 'Sample Dataset: work.University1';
run;
data University2;
input student $ grade $;
datalines;
Anna B
Jan A
Nils A
Kristin C
Michael A
;
proc print;
title 'Sample Dataset: work.University2';
run;
5 Ways To Combine and Append SAS Datasets
1. Appending Or Concatenating Two Datasets Vertically – Stacking The Datasets
You can append data vertically using the SET statement in the data step. It reads an observation from one or more SAS data sets. You can also say, it stacks the multiple datasets one after another.
The best part is you can combine or append many datasets by listening after the SET statement.
In this example two sample datasets University1 and University2 will be stacked vertically and created a new dataset named work.CombinedUniversity
/* Appending Or Concatenating Two Datasets Vertically - Stacking the datasets */
data CombinedUniversity;
set University1 University2;
run;
/* view dataset */
proc print data=CombinedUniversity;
title 'CombinedUniversity Dataset';
run;
2. Appending Or Concatenating Stacked Sorted Data Values
This method is quite similar to the previous one. The only minor difference here is it stack the rows after sorting the dataset. It means rows from multiple datasets will append together but after sorting it out on a specified BY variable in the data step.
The BY statement controls the operation of a SET, MERGE, MODIFY, or UPDATE statement in the DATA step and sets up special grouping variables.
Prerequisite: Before you use BY statement to combine or append multiple datasets with SET statement, all of your input datasets must be sorted on BY variable.
In this example two input datasets University1 and University2 must be sorted on BY variable “student” using the PROC SORT procedure.
/* sorting datasets on BY variable */
proc sort data=University1;
by student;
run;
proc sort data=University2;
by student;
run;
/* Appending Or Concatenating Stacked Sorted Data Values */
data CombinedUniversity;
set University1 University2;
by student;
run;
/* view dataset */
proc print data=CombinedUniversity;
run;
If you compare the output of this query with the previous method, you can clearly see the sequence of the rows has changed due to the sorted stacked method used to combine and append datasets.
3. PROC APPEND To Concatenate Datasets
This is a very popular SAS procedure when it comes to appending the two datasets. The only drawback with this method is, it can only appends two datasets at a time. Whereas first methods can combine and append two or more dataset at the same time.
You could append more than two datasets using the proc append method but you have to write this procedure multiple times in order to cover all the datasets which you want to append.
In the proc append procedure base=dataset is the master dataset and data=dataset is the dataset which is to be appended to the master dataset.
If the master-dataset doesn’t exist then it creates a new dataset and appends the data. In this example work.CombinedUniversity dataset does not exist in the beginning so when the first proc append procedure runs it creates work.CombinedUniversity dataset and appends work.University1 dataset.
The second proc append procedure appends the work.University2 dataset to the CombinedUniversity dataset.
/* PROC APPEND To Concatenate Datasets */
proc append base=CombinedUniversity data=University1;
run;
proc append base=CombinedUniversity data=University2;
run;
/* view dataset */
proc print data=CombinedUniversity;
run;
Did you know? How to use the PROC APPEND in 10 different ways.
4. PROC DATASETS To Concatenate Datasets
This is a slightly different version of the proc append procedure. With the PROC DATASETS procedure you can use the APPEND statement to append multiple datasets.
Here also for the first time if base=dataset doesn’t exist then it creates a new dataset and appends data from data=dataset. You must write the multiple append statement to append more than one datasets.
/* PROC DATASETS To Concatenate Datasets */
proc datasets lib=work memtype=data;
append base=CombinedUniversity data=University1;
append base=CombinedUniversity data=University2;
run;
quit;
/* view dataset */
proc print data=CombinedUniversity;
title 'CombinedUniversity Dataset';
run;
Did you know? How to use the PROC DATASETS in 10 different ways.
5. PROC SQL To Concatenate Datasets
You can use the PROC SQL method to combine and append multiple datasets in SAS. There are two simple methods which can be used to combine and append SAS datasets using proc sql procedure.
- PROC SQL With UNION ALL option
- PROC SQL With OUTER UNION Corr option
1. PROC SQL With UNION ALL option
You can select all the observations from two or more datasets and combine them together using the UNION ALL option.
The UNION option processes unique rows from both the queries. But you can preserve duplicate rows as well using ALL keyword.
The optional ALL keyword preserves the duplicate rows, reduces the execution by one step, and thereby improves the query-expression’s performance.
/* PROC SQL To Concatenate Datasets: using union all */
proc sql;
create table combineduniversity as
select * from University1
union all
select * from University2;
quit;
proc print;
title 'A UNION ALL B';
run;
2. PROC SQL With OUTER UNION Corr Option
The OUTER UNION option concatenates the query results. The CORR which is also known as CORRESPONDING option causes PROC SQL to match the columns in table-expressions by name and not by ordinal position.
/* PROC SQL To Concatenate Datasets: using outer union corr */
proc sql;
create table CombinedUniversity as
select * from University1
outer union corR
select * from University2 ;
quit;
proc print;
title ' A OUTER UNION CORR B';
run;
FAQ
Yes, you can combine or append datasets with different structures in SAS, but you need to ensure that the variables you are combining or appending are compatible.
If there are conflicts when combining or appending datasets in SAS, such as variables with the same name but different types, SAS will give an error. You need to resolve these conflicts before combining or appending.
Combining and appending datasets in SAS involves bringing together data from multiple datasets into one. This can be done either by adding new observations (appending) or new variables (combining).