The WHERE statement can be used to filter out or subsetting the data based on specific conditions. It specifies specific conditions to use to select observations from a SAS dataset.
Using the WHERE statement might improve the efficiency of your SAS programs because SAS is not required to read all observations from the input data set.
The WHERE statement cannot be executed conditionally. That is, you cannot use it as part of an IF-THEN statement.
WHERE statements can contain multiple WHERE expressions that are joined by logical operators such as OR, AND, GE, LE, LIKE, etc
Syntax:
/* syntax */
WHERE <Where-expression logical-operator..> ;
Syntax Description:
Where-expression : It is an arithmetic or logical expression that consists of a sequence of operators, operands, and SAS functions. An operand is a variable, a SAS function, or a constant.
An operator is a symbol that requests a comparison, logical operation, or arithmetic calculation. The expression must be enclosed in parentheses and the logical-operator can be AND, OR, NOT, etc
Comparison:
- = or EQ equal to
- ^=, ¬=, ~=, or NE not equal to
- > or GT greater than
- < or LT less than
- >= or GE greater than or equal to
- <= or LE less than or equal to
- IN equal to one of a list
Logical (Boolean)
- & or AND logical and
- | or OR logical or
- ~,^ , ¬, or NOT logical not
WHERE Expression Only
- BETWEEN–AND an inclusive range
- ? or CONTAINS a character string
- IS NULL or IS MISSING missing values
- LIKE match patterns
- =* sounds-like
Benefits Of Using WHERE Statement
You can use the WHERE command in SAS/FSP software to subset data for editing and browsing.
The WHERE statement selects observations before they are brought into the program data vector. The WHERE statement can produce a different data set from the sub-setting IF when a BY statement accompanies a SET, MERGE, or UPDATE statement.
A DATA or PROC step attempts to use an available index to optimize the selection of data when an indexed variable is used in the WHERE expression.
The following sample dataset will be created and used to demonstrate various use cases of how to use WHERE statement in SAS.
/* create sample dataset */
data cars;
keep Make Model Type Origin DriveTrain MSRP Invoice;
set sashelp.cars;
run;
/* view dataset */
proc print data=cars;
run;
Example 1: Use WHERE Statement In The Data Step
This example produces a subset of a dataset that contains only BMW Sports cars. Where statement is applied on make and type variables with “AND” operator.
/* where statement in sas data step */
data BMW_Sports_cars;
set cars;
where make='BMW' and type='Sports';
run;
/* view dataset */
proc print data=BMW_Sports_cars;
run;
Example 2: Use WHERE Statement With SET Statement
This is the exactly similar example of what we have seen earlier. The Where statement is shifted to the SET statement.
It functions the same and produces the same subset of dataset that contains only BMW Sport cars details.
/* where statement with SET statement in data step */
data BMW_Sports_cars;
set cars (where=(make='BMW' and type='Sports'));
run;
/* view dataset */
proc print data=BMW_Sports_cars;
run;
Example 3: Use WHERE Statement With IN condition
You can use IN condition to specify one or more values to filter out the input data to create a new SAS dataset.
When you’re completely not sure about the case sensitivity of the values it’s recommended to use UPCASE function and pass the input where expression text in upper case.
In this example we are not sure how “audi” text appears on the “make” variable so we have applied UPCASE on that variable and put the AUDI text in uppercase.
Similar thing has been done for the text value “FRONT” on the “drivetrain” variable.
/* where statement with IN condition */
data selected_cars;
set cars;
where upcase(make) in ('AUDI') and upcase(drivetrain) in ('FRONT');
proc print;
run;
Example 4: Use WHERE Statement With NOT IN Condition
This is the negative expression of the IN condition. When you use NOT IN it exclude those listed values then create a dataset with the rest of the data.
In the below example we have used NOT IN on “type” variable to exclude car type “SEDAN” but also have added “AND” operator to only select AUDI cars.
/* where statement with NOT IN condition */
data selected_cars;
set cars;
where upcase(type) not in ('SEDAN') and upcase(make) in ('AUDI') ;
proc print;
run;
Example 5: Use WHERE Statement With Value Range
This is usually applicable on numeric variables where you specify the value range using GT, LT, GE, LE, etc conditions.
- > or GT greater than
- < or LT less than
- >= or GE greater than or equal to
- <= or LE less than or equal to
In this example we are applying range values on the MSRP variable to select rows which have MSRP values greater than or equal to 85,000 and less than or equal to 100,000.
/* where statement with value range */
data selected_cars;
set cars;
where MSRP >=85000 and MSRP <=100000;
proc print;
run;
Example 6: Use WHERE Statement With BETWEEN-AND
The between-and condition functions similar to GE and LE conditions described in the previous example. It means if you use the same value range that is 85,000 and 100,000 it will produce the same output.
Here is the example where we have applied condition: “MSRP between 85000 and 100000” which functions same as “MSRP >=85000 and MSRP <=100000”
/* where statement with between-and condition*/
data selected_cars;
set cars;
where MSRP between 85000 and 100000;
proc print;
run;
Example 7: Use WHERE Statement With LIKE Operator
You can use the LIKE operator to search for specific characters or words in the character variable to select the rows. In this example we are looking for rows where we have an “auto” word in the “model” variable.
The upper function is used to avoid case sensitivity issues. We want to select all the rows where the “model” variable has any variant of this word such as “Auto”, AUTO”, “auto” , and so on.
The percentage sign (%) means the “auto” text may have some other text in the beginning or at the end of this word.
/* where statement with like operator */
data selected_cars;
set cars;
where upcase(model) like '%AUTO%';
proc print;
run;
You can use the STRIP function to suppress the blank spaces from the string data. The STRIP function returns the argument with all leading and trailing blanks removed.
It’s always good to use and make sure you don’t miss any rows when you try to filter based on words or character.
/* like operator with STRIP function*/
data selected_cars;
set cars;
where strip(upcase(model)) like '%AUTO%';
proc print;
run;
LIKE Operator
The LIKE operator selects observations by comparing the values of a character variable to a specified pattern, which is referred to as pattern matching. The LIKE operator is case sensitive. There are two special characters available for specifying a pattern:
- percent sign (%)
It specifies that any number of characters can occupy that position. We have already demonstrated how you can use the LIKE operator with the percent sign(%).
You can select the car type name which starts with character “H” by using the following code.
/* where clause with matching pattern */
data selected_cars;
set cars;
where strip(upcase(type)) like 'H%';
proc print;
run;
- underscore (_)
It matches just one character in the value for each underscore character. You can specify more than one consecutive underscore character in a pattern, and you can specify a percent sign and an underscore in the same pattern.
For example, you can use different forms of the LIKE operator to select character values from this list of first names:
- Diana
- Diane
- Dianna
- Dianthus
- Dyan
The following table shows which of these names is selected by using various forms of the LIKE operator:
Pattern → Name Selected
- like ‘D_an’ → Dyan
- like ‘D_an_’ → Diana, Diane
- like ‘D_an__’ → Dianna
- like ‘D_an%’ → all names from list
Example 7: Use WHERE Statement With CONTAINS Operator
The CONTAINS functions similar to the LIKE operator. The syntax of the contains operator is different. You don’t need to specify percent % sign.
In the below example rows will be selected based on the model variable where “auto” words are found.
/* where clause with contains operator */
data selected_cars;
set cars;
where upcase(model) contains 'AUTO';
proc print;
run;
Example 8: Use WHERE Statement With Sounds-like Operator
This sounds-like operator is very handy when you’re completely not sure about the exact pattern you are trying to compare.
Sometimes you can make spelling mistakes but it can be taken care of by a sounds-like operator.
In the below example we want to select TOYOTA cars. The following WHERE statement in which we have misspelt TOYOTA name but still it produces the correct results.
- where upcase(make) =* ‘TYOTA’;
- where upcase(make) =* ‘TYOTO’;
- where upcase(make) =* ‘TOYOTO’;
/* where statement with Sounds-like operator */
data selected_cars;
set cars;
where upcase(make) =* 'TYOTA';
proc print;
run;
Example 9: Use Where Statement With TRUE Dummy Expression
There would be some circumstances where you want to add a where clause but don’t want to specify any condition then you can use this dummy, always TRUE condition 1=1. It selects all the rows.
It can be used in the PROC SQL or SQL Joins with WHERE clause.
/* where statement with always true dummy expression */
data selected_cars;
set cars;
where 1=1; /* selects all the rows*/
run;
Example 10: Use Where Statement With IS MISS/ IS NOT Missing
You can use the IS MISSING/IS NOT MISSING condition on both numeric as well as character variables to compare and select the rows.
In our example cars dataset doesn’t have any missing values on any of the variables but you can run the following codes to observe the output.
The IS MISSING and IS NOT MISSING checks applied on the MSRP variable for the demonstration purpose.
/* is missing */
data selected_cars;
set cars;
where MSRP is missing;
run;
/* is not missing */
data selected_cars;
set cars;
where MSRP is not missing;
run;
Example 11: WHERE Clause In PROC SQL Procedure
The examples we have seen above with the data step can be used with the PROC SQL procedure as well.
Here is an example of how you can use proc sql to compare “make” and “type” variables for specified values and extract the rows to create a new dataset.
/* where clause in proc sql */
proc sql;
create table BMW_Sports_cars as select * from cars
where make='BMW' and type='Sports';
quit;
proc print data=BMW_Sports_cars;
run;
FAQ – How To Use WHERE Statement In SAS
A WHERE statement in SAS is used to subset data based on certain conditions. It allows you to select specific observations from a dataset.
You should use a WHERE statement in SAS when you want to analyze or manipulate a subset of your data based on specific criteria.
Yes, you can use multiple conditions in a WHERE statement in SAS. These conditions can be combined using logical operators like AND, OR, and NOT.
In a WHERE statement in SAS, you can use conditions that compare variables to constants, compare variables to other variables, or check if a variable is missing.