In SAS you can import data from flat files, and also extract data from various other databases such as sql, oracle, db2, and so on. Sometimes you extract what is not necessary for you.
The best example is blank rows coming from source. The empty rows are of no use hence it is important to identify and remove empty columns in SAS.
In this article you’ll find the best and most efficient way to identify and remove empty columns in SAS.
The following sample dataset is created to demonstrate the process with the example.
/* create a dataset */
data myData;
input num_var1 num_var2 num_var3 char_var4 $ 8-11 char_var5 $ 12-13;
datalines;
1 . 10 A
2 . 60 B
3 . . C
4 . 50 D
5 . 40 E
;
run;
/* view dataset*/
proc print data=myData;
run;
If you observe the above dataset, you have got three numeric variables (num_var1, num_var2, num_var3) and two character variables, char_var4 and char_var5
How To Identify Empty Columns
Now your task is to identify blank empty columns from this dataset and remove those columns. Identification of empty columns can be done by observing the dataset and browsing through all the observations.
It’s absolutely possible when you have smaller datasets but it’s difficult when it comes to large datasets with millions of observations.
When you see the above dataset, you can clearly see the empty columns.
The empty columns in this example:
- num_var2
- char_var4
PROC FREQ Procedure
The PROC FREQ procedure in BASE SAS is used primarily for counting, displaying and analyzing categorical type data. You can use proc freq procedure to identify empty columns very easily.
All you need is to apply proc freq procedure on all the columns present in the dataset using _all_ . You also need to use the NLEVELS option with proc freq procedure.
Number of Variable Levels Table
If you specify the NLEVELS option in the PROC FREQ statement, PROC FREQ displays the “Number of Variable Levels” table. This table provides the number of levels for all variables named in the TABLES statements.
PROC FREQ determines the variable levels from the formatted variable values. The “Number of Variable Levels” table contains the following information:
- Variable name
- Levels, which is the total number of levels of the variable
- Number of Nonmissing Levels, if there are missing levels for any of the variables
- Number of Missing Levels, if there are missing levels for any of the variables
The following code demonstrates how you can use the simple PROC FREQ procedure with NLEVELS option on the sample dataset work.myData.
/* proc freq with nlevels */
proc freq data=work.myData nlevels;
tables _all_ / missing noprint;
quit;
If you look at the above results generated by proc freq with nlevels option, it shows the indication of empty columns if you filter “Nonmissing Levels”=0 from the displayed output.
The problem with this method is it generates results but doesn’t create a dataset. You need to find a way to put the output result into a dataset.
To create a new dataset with the result output from proc freq procedure, you can use the following ODS statement with nlevels option.
ods output nlevels=work.<sas-data-set-name>;
The above statement can be added in the beginning of proc freq and it generates a new dataset with the desired output data in it.
After proc freq you can add the ODS CLOSE statement. It is not necessary but just for the clarity and readability of the code it’s better to add it.
In the following code, proc freq is applied on the columns and output results stored in the new dataset called work.myfreq_data which can be used further to remove empty columns in SAS.
/* ods statement */
ods output nlevels=work.myfreq_data;
/* proc freq with nlevels option*/
proc freq data=work.myData nlevels;
tables _all_ / missing noprint;
quit;
ods output close;
/* view dataset */
proc print data=work.myfreq_data ;
title 'dataset work.myfreq_data';
run;
How To Remove Empty Columns In SAS
Now you have a proc freq output dataset that holds the information about empty columns. You can use this dataset to create macro variable “allmiss” –that will hold list of empty column names. It can be done through simple proc sql code.
There are two different ways you can remove empty columns. Let’s learn more about both the methods which are explained below.
Method 1: Remove Empty Columns Using DATA Step
In this method we first create a macro variable that holds the list of empty column names and then use it in the data step with the drop statement. This method simply recreates the same dataset excluding the empty columns.
The following code creates a macro variable “allmiss” with empty column names separated by blank spaces and the same macro variable is used in the data step.
/* list empty column names */
proc sql noprint;
select tableVar into :allmiss separated by ' '
from work.myfreq_data where NNonMissLevels=0;
quit;
%put Variables with all missing values, ALLMISS=&allMiss;
/* data step: remove empty columns in sas*/
data myData_NEW;
set myData(drop=&allmiss); /* drop empty columns */
run;
As this method recreates the existing dataset the execution time can be higher compared to the next method-2 explained below.
Method 2: Remove Empty Columns Using PROC SQL – ALTER Statement.
In this method we first create a macro variable that holds the list of empty column names and then use it in the proc sql procedure with the ALTER & DROP statements. This method simply deletes the empty columns.
The following code creates a macro variable “allmiss” with empty column names separated by comma and the same macro variable is used in the proc sql.
/* list empty column names */
proc sql noprint;
select tableVar into :allmiss separated by ','
from work.myfreq_data where NNonMissLevels=0;
quit;
%put Variables with all missing values, ALLMISS=&allMiss;
/* proc sql: remove empty columns in sas*/
proc sql;
alter table myData drop &allmiss;
quit;
This method is much faster than the previous method as it only deletes the mentioned columns after the DROP statement.
FAQ
You can use the PROC FREQ procedure with the nlevels option to count the number of distinct values in each column. It works on both character and numeric variables.
You can identify empty columns using the PROC FREQ with NLEVELS option and remove them with a DROP statement in either data step or proc sql-alter statement.