SUBSTR in SAS (The Ultimate Guide)

You can use the SUBSTR() function in SAS to read part of a string. It can also replace specific characters in a string.

You pass the input string to be processed (a literal or a character variable), then specify the starting position and the size, that is, how many characters to read.

SUBSTR in SAS is very useful when you’re dealing with unorganized input strings or need to extract very specific information from a string.

Syntax:

SUBSTR(input string, start position, number of characters to read);
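
As a quick illustration of the three arguments, here is a minimal sketch. The variable names middle and rest are made up for this example. It also relies on the third argument being optional in SAS: when it is omitted, SUBSTR reads from the start position to the end of the string.

```sas
/* Sketch: SUBSTR with and without the third argument */
data _null_;
    var_str = "This is my string";
    middle = substr(var_str, 6, 2);   /* 2 characters starting at position 6 */
    rest   = substr(var_str, 9);      /* length omitted: position 9 to the end */
    put middle=;                      /* writes middle=is to the log */
    put rest=;                        /* writes rest=my string to the log */
run;
```

The step uses data _null_ so that it only writes to the log and creates no data set.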

Two types of SUBSTR in SAS

  1. SUBSTR (right of =) Function
  2. SUBSTR (left of =) Function

1. SUBSTR (right of =) Function in SAS

This is the most common way to use the SUBSTR function: extracting characters from a string. As its name indicates, you use this form on the right-hand side of the assignment operator.

The following examples show how substrings work in SAS across different use cases.

How to read the first four characters from string

This is the classic example of using the SUBSTR function in SAS. You can read the first four characters by specifying 1 as the starting position and 4 as the number of characters to read.

/* FIRST FOUR CHARACTERS */
data mystring;
       var_str = "This is my string";
       first_four_chars = substr(var_str, 1, 4);
       put first_four_chars=;
run;
proc print data=mystring; run;

Output:

var_str = "This is my string";
first_four_chars = substr(var_str, 1, 4);
Result: This

How to read the last four characters from string

The SUBSTR function only lets you specify a starting position, so you first need to work out where the last four characters begin. To do that, calculate the length of your string. This is easily done with the length() function.

Length() Function:

length('input string');

Example:

length('This is my string');
Result: 17
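
The same call inside a DATA step can be sketched as follows. The variable names are illustrative only. The second call shows a detail that matters later when working with missing values: LENGTH() returns 1, not 0, for a blank string.

```sas
/* Sketch: LENGTH on a normal string and on a blank string */
data _null_;
    var_str = "This is my string";
    str_length   = length(var_str);   /* 17 */
    blank_length = length(" ");       /* 1: LENGTH returns 1 for a blank value */
    put str_length= blank_length=;    /* writes str_length=17 blank_length=1 */
run;
```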

Get The Last Four Characters

/* LAST FOUR CHARACTERS */
data mystring;
       var_str = "This is my string";
       str_length = length(var_str);
       last_four_chars  = substr(var_str, length(var_str)-3 ,4);
       put str_length=;
       put last_four_chars=;
run;
proc print data=mystring; run;

Output:

var_str = "This is my string";
last_four_chars = substr(var_str, length(var_str)-3, 4);
Result: ring

How to read the entire string except last two characters

To read a given string except its last two characters, start reading at position 1 and limit the number of characters to read.

The simplest way to set that limit is to calculate the total length of the string and subtract the number of characters to skip.

No of char to read = total length of the string - No of char to skip

In this scenario the equation looks like this:

  • No of char to read = length(string) - 2
  • For "This is my string": 17 - 2 = 15

Example to read a string except the last two characters.

/* All string except the last two chars */
data mystring;
       var_str = "This is my string";
       all_except_last_two_chars  = substr(var_str, 1, length(var_str)-2);
       put all_except_last_two_chars=;
run;
proc print data=mystring; run;

Output:

var_str = "This is my string";
all_except_last_two_chars = substr(var_str, 1, length(var_str)-2);
Result: This is my stri

How to handle missing values while extracting from string

SUBSTR is also used when creating data from datalines or raw input files. It works without any issue until you encounter missing values or an empty input string.

For example, in the addr data set below, the address is missing in the 3rd row (Kim Karlsen). Let’s see what happens when you pass the address variable to SUBSTR as the input string to extract the postal code.

/* Handle missing values while extracting from string */
data addr;
input Name $20. address $20.;
cards;
Jan Peter           NY PO-BOX USA 100023
Paul Andre          LA PO-BOX USA 220634
Kim Karlsen        
Laura Paul          DC PO-BOX USA 430692
;
run;

data addr_PostCode;
set addr;
PostCode=SUBSTR(address, length(address)-6, 7);
put PostCode=;
run;

Execute the above code and check the log. You will find a NOTE printed by SAS, caused by the combination of the length() and substr functions.

For a blank value, length() returns 1, so the starting position length(address)-6 becomes -5. SUBSTR cannot start reading at a position less than 1, hence the NOTE. Depending on which environment you use, your program might fail and stop executing.

In SAS Studio this code doesn’t fail, but it does print a NOTE in the log reporting an invalid second argument to function SUBSTR.
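
To see where the invalid argument comes from, the start position can be computed on its own. This is a minimal sketch with illustrative variable names (addr_length, start_pos) that reproduces the arithmetic for a blank address.

```sas
/* Sketch: why the start position is invalid for a blank address */
data _null_;
    address = " ";
    addr_length = length(address);   /* LENGTH returns 1 for a blank value */
    start_pos   = addr_length - 6;   /* 1 - 6 = -5, not a valid SUBSTR position */
    put addr_length= start_pos=;     /* writes addr_length=1 start_pos=-5 */
run;
```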

SOLUTION:

There is a solution to this problem. You can use the SUBSTRN function to handle this scenario.

SUBSTRN function: It returns a substring, allowing a result with a length of zero. Unlike SUBSTR, it accepts a starting position less than 1 without writing a NOTE to the log.

data addr_PostCode;
set addr;
PostCode=SUBSTRN(address, length(address)-6, 7);
put PostCode=;
run;
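
An alternative, if you prefer to keep SUBSTR, is to guard the call so it only runs when the address is long enough. This is only a sketch under assumptions: the data set name addr_demo and the two sample rows are made up for illustration, and the truncover option is added so the short second record is read safely.

```sas
/* Sketch: guard SUBSTR instead of switching to SUBSTRN */
/* addr_demo is a hypothetical two-row sample data set  */
data addr_demo;
    infile datalines truncover;
    input Name $20. address $20.;
    datalines;
Jan Peter           NY PO-BOX USA 100023
Kim Karlsen
;
run;

data addr_demo_postcode;
    set addr_demo;
    length PostCode $7;
    /* Only extract when the address is present and has at least 7 characters */
    if not missing(address) and length(address) >= 7 then
        PostCode = substr(address, length(address)-6, 7);
    put Name= PostCode=;
run;
```

With this guard, PostCode simply stays blank for rows with a missing address, and SUBSTR is never called with an invalid position.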

How to get last N digits from a numeric variable

The SUBSTR() / SUBSTRN() function works only with character variables. For a numeric variable, you first need to convert it into a character string, and then you can pass it to the substr function as the input string.

In the following example employeeID is a numeric variable. You can use the PUT function to convert numeric employee IDs into character employee IDs.

PUT(variable, format);

With this converted string you can extract the last four digits, but the result is also in character format. You might need to convert it back into numeric format. You can use the INPUT function to convert character into numeric.

INPUT(variable, informat);
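
The round trip between the two types can be sketched as below. The variable names and the sample value 411014 are illustrative. PUT takes a format (here 6.) and INPUT takes an informat (here also 6.).

```sas
/* Sketch: numeric -> character -> numeric */
data _null_;
    num_value   = 411014;
    char_value  = put(num_value, 6.);     /* numeric to character, using the format 6. */
    back_to_num = input(char_value, 6.);  /* character to numeric, using the informat 6. */
    put char_value= back_to_num=;         /* writes char_value=411014 back_to_num=411014 */
run;
```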

Example to Extract last 4 digits from a numeric variable

data employee;
input employeeID;
datalines;
1035423413
.
1205429432
1835413551
;
run;


data new_employee;
set employee;
	employeeID_char = put(employeeID, 10.); /* Convert numeric into char */
	last_four_digits = input(substrn(employeeID_char,length(employeeID_char)-3,4),8.);
drop employeeID_char;
run;
proc print data=new_employee;
	title 'Example: Get last 4 digits from a numeric variable';
run;

2. SUBSTR (left of =) Function in SAS

When used on the left-hand side of the assignment operator, the substr function lets you replace a given number of characters starting at a given position.

The simplest example is replacing “Code” with “Num:” in the sample string “Pin Code 411014”.

Output:

sample_str = "Pin Code 411014";
SUBSTR(sample_str, 5, 4) = "Num:";
Result: Pin Num: 411014

SAS Code:

data example;
sample_str = "Pin Code 411014";
SUBSTR(sample_str, 5, 4) = "Num:";
run;

proc print data = example;
run;
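
Another common use of this form is masking part of a value in place. The following is a minimal sketch; the variable card_no and its value are made up for illustration.

```sas
/* Sketch: mask the first six characters in place */
data _null_;
    card_no = "1234567890";
    substr(card_no, 1, 6) = "XXXXXX";   /* overwrite positions 1 to 6 */
    put card_no=;                       /* writes card_no=XXXXXX7890 */
run;
```

Only the targeted positions change; the rest of the value and the length of the variable stay the same.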

Example 2: This is a slightly more complex example. It uses substr to check the first character of the president variable and builds a message accordingly.

data example_2;
input name $20. president $4.;
datalines;
GEORGE WASHINGTON   YES
THOMAS JEFFERSON    YES
BENJAMIN FRANKLIN   NO
;
run;

Condition:

  • If the value of the president variable starts with “N”, print the name along with the message ‘Was not President of the USA’
  • If the value of the president variable starts with “Y”, print the name along with the message ‘Was President of the USA’

data example_new;
 set example_2;
  if upcase(substr(president, 1, 1))= 'N' then
  text_msg=name || 'Was not President of the USA';
  if upcase(substr(president, 1, 1))= 'Y' then
  text_msg=name || 'Was President of the USA';
run;

proc print data=example_new; 
title "SUBSTR in a condition (example_new data set)";
run;

You can also achieve the same result by using the “=:” operator instead of the substr function. It compares only the leading characters of the value and generates the same output as the previous example.

data example_new;
 set example_2;
  if upcase(president)=: 'N' then
  text_msg=name || 'Was not President of the USA';
  if upcase(president)=: 'Y' then
  text_msg=name || 'Was President of the USA';
run;

How to get the first character from the string

Use the first() function in SAS to extract the first character from the given string.

Syntax:

FIRST('string');

SAS Code:

/* First() function */

data mystring;
       var_str = "This is my string";
       first_char = first(var_str);
       put first_char=;
run;

Example:

first("This is my string");
Result: T
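
For comparison, the same character can be read with substr by asking for one character from position 1. This is a small sketch with illustrative variable names.

```sas
/* Sketch: FIRST() compared with SUBSTR(string, 1, 1) */
data _null_;
    var_str = "This is my string";
    first_char_a = first(var_str);          /* first character via FIRST */
    first_char_b = substr(var_str, 1, 1);   /* same character via SUBSTR */
    put first_char_a= first_char_b=;        /* writes first_char_a=T first_char_b=T */
run;
```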

That’s all about SUBSTR in SAS.


FAQ:

Can we pass a numeric value to the SUBSTR() function in SAS?

No, you can’t. The SUBSTR() function in SAS only works with character variables. However, you can first convert the numeric variable into a character variable using the PUT() function and then pass it to the substr function. It works after conversion.

How to extract last N digits from a numeric variable in SAS?

The SUBSTR() function only works with character variables. To extract the last N digits, first convert the numeric variable into a character variable using the PUT() function, then pass it to the substr function.

Here is the classic example of how to extract last 4 digits from a numeric variable in SAS.
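
A minimal sketch of that approach for a single value, reusing one of the employee IDs from the earlier example. The step is kept to one literal value for brevity, and the informat 4. is chosen here because exactly four digits are extracted.

```sas
/* Sketch: last 4 digits of a numeric value */
data _null_;
    employeeID = 1035423413;
    employeeID_char = put(employeeID, 10.);   /* numeric to character */
    last_four_digits = input(
        substr(employeeID_char, length(employeeID_char)-3, 4), 4.);   /* back to numeric */
    put last_four_digits=;                    /* writes last_four_digits=3413 */
run;
```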