The index function searches the specified string, from left to right in the given string and returns the position of the first character in the string.
If there are multiple occurrences of a specified string, the index function returns the first position of the first occurrence and ignores the rest of the occurrences.
The index function is mostly used where any specific character, keyword, or phrase you want to identify, and/or extract. It is very helpful when you read raw, scattered, and long text data. For example,
- Log files
- Unorganised text file
- Web scraped data
- Character variable on a SAS dataset
The above list is not limited. According to your use case and requirement you can use the Index function in multiple ways.
There are three variations of the Index function in SAS:
- INDEX
- INDEXC
- INDEXW
The syntax for all three functions are very similar but they produce somewhat different results. Here are there short descriptions:
INDEX v/s INDEXC v/s INDEXW in SAS
- INDEX – It searches a specified substring and returns the position of its first occurrence from the character string.
- INDEXC – It searches characters from a specified substring and returns the position of the first occurrence of any character from a given substring.
- INDEXW – It searches for the exact character, word, or substring mentioned in the specific string and returns the position of the first occurrence.
Let’s understand each one of them in more detail with multiple examples.
Index Function Syntax:
INDEX( source, excerpt)
- Source: This is your input string, or character variable that contains the substring which you want to search for.
- Excerpt: It specifies the character string, variable, or expression to search.
Index Function Examples
Let’s start with a simple example to understand how the Index function works in SAS. For the demonstration, let’s create a dataset named “INDEX_function”.
We’ll create a new numeric variable “position_nr” which will have the position number of a specified string. We will search for the string “free” in the given string “Learn SAS Code for free with me”
How does SAS calculate the character position?
It’s fairly simple and straightforward. It counts each character including the delimiter. Usually in the text string we have default blank as a delimiter.
/* INDEX function in SAS */
data INDEX_function;
position_nr=INDEX("Learn SAS Code for free with me", "free");
run;
proc print data=INDEX_function;
run;
Example 2:
Using the Index function directly on the char variable is even more easy and convenient if you want to search a specific string on the entire column in sas dataset.
We will use a dataset named “SHOES” available in the sashelp library. Observe the SHOES dataset: variable product. We will try to find and calculate the exact position where “Casual” is mentioned in the product name.
/* Calculate position */
data INDEX_function(keep=region product Casual_position_nr);
set sashelp.shoes;
Casual_position_nr=INDEX(product, "Casual");
run;
proc print data=INDEX_function;
run;
This dataset can be further analysed using multiple sas procedures to find meaningful information.
I’m using proc freq to demonstrate to you what you can do with the above output dataset “INDEX_function” generated using INDEX function.
The below query calculates the frequency on a given table INDEX_function using variable Casual_position_nr.
/* calculate frequency on Casual_position_nr */
proc freq data=INDEX_function;
tables Casual_position_nr;
run;
Here is the result and conclusion:
Conclusion:
Here are some conclusions I can draw from the output of the proc freq sas procedure ran on the dataset created using the INDEX function.
- 77.22% that is exactly 305 products which don’t have “Casual” in the product names.
- 11.39% that is 45 products in which “Casual” text appeared on product name from 7th position
- 11.39% that is 45 products in which “Casual” text appeared on product name from 9th position
INDEX v/s INDEXC v/s INDEXW
Along with the INDEX function you have its two more variations developed for their specific use cases. Here is the simple comparison in plain english:
- The INDEXC function searches for the first occurrence of any individual character that is present within the character string, whereas the INDEX function searches for the first occurrence of the character string as a substring.
- The INDEXW function searches for exact word in the string, whereas the INDEX function searches for patterns as separate words or as parts of other words. INDEXC searches for any characters that are present in the excerpts.
It’s easy-peasy to understand this concept with examples.
To show you the exact difference between these three INDEX functions in SAS, let’s go back and take the simplest example again.
We will use the demo string: “Learn SAS Code for free with me”
data INDEX_Overview;
demo_string="Learn sasCode for free with a sas Expert";
/* it searches first occurence of sas* match */
INDEX_fun=index(demo_string, "sas");
/* it searches first occurence of any characters from "s", "a", "s" */
INDEXC_fun=indexc(demo_string, "sas");
/* it searches first occurence of exact match for word "sas" */
INDEXW_fun=indexw(demo_string, "sas");
run;
PS: Whatever you pass as an excerpt which is “sas” in above example, it’s case sensitive. You can’t pass SAS or SaS and expect the same results.
You have to be very careful about passing a complete word when using INDEXW. A few characters from a word won’t work. By default it considers blank as a delimiter.
In case if you have a requirement to search multiple characters from a given string or character variable then you can use the INDEXC function by specifying all the characters you want to search for.
It will look for all the characters and return its first occurrence irrespective of the character order passed through the index function. In the below example indexc will search for characters “-” and “%”.
data INDEX_Overview;
demo_string="Learn %!sas-Code for free with a sas Expert";
INDEXC_fun=indexc(demo_string, "-", "%");
run;
USE CASE:
How to check if string or word or phrase is present in the character variable?
Here is the by far the best use case of the INDEX function. In fact you can use any variation of the INDEX functions according to your need.
Here I’m trying to search text “Dress” on the character variable “product” hence index function is the best option I can use it.
Below is the dataset created by filtering out the data which has text “Dress” in their product name using index function.
data INDEX_function(where=(index_nr >0));
set sashelp.shoes;
index_nr=index(product, "Dress");
run;
proc print data=index_function;
run;
You can play with the where clause to make it even more fun.
You can have fun playing with the WHERE clause such as to find any specific occurrence.
For example, search and filter out data where text “Dress” appears in 7th position on product variable.
FINDC and FINDW Functions:
Alternative for INDEX function is to use FINDC and FINDW functions in SAS that gives you more control and options to search in the string or char variable.
FAQ
To find the exact position of a given text or word, you can use INDEX, INDEXC, or INDEXW functions depending on your requirement.
There are many SAS functions you can use to calculate the position of a specified text, word, or character such as INDEX, INDEXW, INDEXC, FIND, FINDC, or FINDW, etc.