4 Ways to Create Index on SAS Dataset

An Index is the most important part of a sas dataset but it’s kind of optional. As a beginner you might ignore this but as soon as you learn more about SAS you’ll understand the importance of having Index on a SAS dataset.

Why to Create Index on SAS Dataset?

Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. 

Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.

In SAS, an index is an optional SAS file that enables you to directly locate and access specific observations in a dataset without processing every row sequentially.

It means if you try to extract data from a big dataset on an indexed column, your query executes faster and you get a quick result. 

A good index will allow your programs to quickly access the subset of SAS observations that you need from a large SAS data set. This will dramatically improve the speed and efficiency of your SAS programs.

What circumstances Indexes are being used?

Indexes are majorly used for three types of processing:

  • WHERE Statements in PROC Steps
  • Match Merging
  • BY Statement Processing in DATA/PROC Steps

When Indexes are created on the dataset, SAS automatically uses indexes to efficiently select cases if you use a WHERE clause in PROC Step.

In match merging SAS uses indexes to select the desired cases which are specified in one dataset that are also in the larger database.

SAS also automatically uses indexes in BY processing in DATA Steps and PROC Steps. It requires data to be sorted on input. BY processing is very effective when you want to use first. and last. In the program.

2 Types of Indexes in SAS

When you create an index, you mention which variable or variables data to be indexed within that dataset. An Indexed variable is called a key variable. There are two types of indexes you can create:

  1. Simple Index
  2. Composite Index

Simple Index

This is the most common index you create which consists of values of one variable. This variable can be a numeric or character and the index name should be the same as the variable name. 

For example, if you want to create a simple index on column EmployeeID then the simple index name must be the same “EmployeeID”.

Composite Index

A composite index is an index created on two or more variables with their values concatenated to form a single value. These variables can be numeric, or character, or in a combination. 

For example, you can create a composite index on columns FirstName, and LastName. You can give Index name anything you want.

When to Create Simple Index or Composite Index?

When you are deciding whether to create a simple index or a composite index, consider how you will access the data. If you often access data for a single variable, a simple index will do. But if you frequently access data for multiple variables, a composite index could be beneficial.

Unique Values

You can advance your index by specifying a unique option to guarantee that values on a simple index column or values on combination of columns in composite Index. This is very effective when you want SAS to reject all the duplicates while updating the dataset on the basis of index columns.

Missing Values

This is another option to advance your index by specifying the NOMISS option while creating an index. It will make sure missing values are not maintained by Index. Hence index requires less space.

How to Create an Index on SAS Dataset?

There are multiple ways you can create indexes on SAS dataset but here are top 4 methods to create indexes. It depends on your situation and scenario, you can decide which method to use to create indexes on your sas dataset.

  1. Create an Index on a New Dataset using the DATA Step
  2. Create an Index on a New Dataset using the PROC SQL
  3. Create an Index on an existing dataset using PROC SQL
  4. Create an Index on an existing dataset using PROC DATASET

Pretty simple. right?

Yes. It is. 

Let’s explore each of these methods with SAS code examples to understand SAS Indexes in an even better way.

1. Create an Index on a New Dataset using the DATA Step

There are majorly two ways to create index on SAS dataset. The first one is Data step and Proc sql. 

You create an index using SAS data step option index= . This option enables you to create a simple or composite index, or combination of simple & composite indexes on SAS dataset.

You can also use UNIQUE and NOMISS options to constraint the index.

Simple Index

A simple index is easy and commonly created index on SAS dataset. INDEX= option is used followed by variable name to create a simple index. A variable can be numeric or character and index name should be the same as variable name.

Let’s go through the index creation process with SAS code examples.

For demonstration purpose, we’re using sashelp.ZIPCODE dataset available in sashelp library. 

Here you create simple index “ZIP” on values on column “ZIP”. Since column name is ZIP, SAS gives the same name for index. If you’re explicitly specifying the index name then make sure it should match with column name.

data work.temp_zipcode (index=(ZIP));
set sashelp.ZIPCODE;
run;

proc contents data=work.temp_zipcode; quit;

The result of the proc contents procedure displays the SAS dataset index details.

Simple index

Composite Index

The composite index to be created based on values of multiple variables. It is also created using INDEX= option. You get the opportunity to name a composite index by your choice. 

A composite index starts with INDEX= followed by composite index name = followed by list of variables separated by space. 

Here is an example of how you can create a composite index in data step using index option. The Index StateCityZip created based on the values of columns State, City, and Zip.

data work.temp_zipcode (index=(StateCityZip=(State City ZIP)));
   set sashelp.ZIPCODE;
run;

proc contents data=work.temp_zipcode; quit;
Composite Index

Combination of Simple and Composite Index

SAS is flexible. It allows you to create multiple indexes in the combination of simple and composite indexes. One dataset can have multiple simple and composite indexes. 

It can be created as exactly as you create them individually, the only thing is you create them together. 

For example, a simple index named ZIP and a composite index named StateCityZip can be created as follows:

data work.temp_zipcode (index=(ZIP StateCityZip=(State City ZIP)));
   set sashelp.ZIPCODE;
run;

proc contents data=work.temp_zipcode; quit;

The result of the proc contents shows the details and confirms that you have successfully created indexes on a sas dataset.

Combination of simple and composite Index

Unique Index

If you want to constraint the values on specific columns then defining a unique index can be a good choice. It doesn’t matter what index you create, it can be a simple index or composite index, you can define it as a unique index. 

It means duplicate values are allowed on a unique index. In case if you have duplicate values on columns in which you want to create a unique index then you have to first remove duplicates

Let’s try to create a unique index on a simple index “ZIP” and a composite index “StateCityZip” by adding keyword UNIQUE after forward slash (/).

data work.temp_zipcode (index=(ZIP / unique StateCityZip=(State City ZIP) / unique ));
   set sashelp.ZIPCODE;
run;

proc contents data=work.temp_zipcode; quit;

Observe the output of proc content where all the indexes are marked as UNIQUE INDEX=YES.

Unique Index in SAS

No Missing Values Index

With the nomiss option you can exclude blank values appearing on the selected columns for index. You can just mention the NOMISS option after forward slash (/) as demonstrated in the below example.

NOMISS option: It does not mean the missing values cannot be added to the data set. It implies the missing values cannot be added to the index.

data work.temp_zipcode (index=(ZIP / unique / nomiss StateCityZip=(State City ZIP) / unique /nomiss));
   set sashelp.ZIPCODE;
run;

proc contents data=work.temp_zipcode; quit;

The output of proc contents reports the following information about the index:

  • Index names (ZIP, StateCityZip)
  • Columns on which indexes are built (ZIP, State City ZIP)
  • Options that are in effect (unique =YES, nomiss = YES)
Unique No Miss Index in SAS

2. Create an Index on a New Dataset using the PROC SQL

This is another way to create an index on a sas dataset using the proc sql procedure. It is also fairly easy to create a simple and composite index using INDEX= option.

You have to use INDEX= option in between create table and AS clause in proc sql. You can also mention unique and nomiss options while creating an index.

The following example demonstrates each category on how you create indexes on sas dataset using proc sql.

Simple Index

A simple index can be created on one column and it has to be named with the same column name. 

Here you create a simple index named “ZIP” on column ZIP.

proc sql;
   create table work.temp_zipcode (index=(ZIP)) as
select * from sashelp.ZIPCODE;
quit;

proc contents data=work.temp_zipcode; quit;

The proc contents reports that a simple index ZIP has been created on sas dataset.

Simple index using proc sql in SAS

Composite Index

As you know already, composite index is created on multiple columns. You have to mention all the columns on which you want to define a composite index. 

With the proc sql method, you need to mention index= option followed by composite index name= followed by list of column names separated by space.

In the below example, a composite index StateCityZip created on the columns State, City, and ZIP. 

proc sql;
   create table work.temp_zipcode (index=(StateCityZip=(State City ZIP))) as 
select * from sashelp.ZIPCODE;
quit;

proc contents data=work.temp_zipcode; quit;
Composite index using proc sql in SAS

Combination of Simple and Composite Index

SAS is flexible and gives you the opportunity to create multiple indexes in one go. It allows you to create multiple indexes in the combination of simple and composite indexes. One dataset can have multiple simple and composite indexes. 

It can be created as exactly as you create them individually, the only thing is you create them together using index= options followed by index name and column list.

For example, a simple index named ZIP and a composite index named StateCityZip can be created as follows.

proc sql;
   create table work.temp_zipcode (index=(ZIP StateCityZip=(State City ZIP))) as 
select * from sashelp.ZIPCODE;
quit;

proc contents data=work.temp_zipcode; quit;

The out of the proc contents confirms that you have successfully created ZIP and StateCityZip indexes on a sas dataset work.temp_zipcode

Simple index using proc sql

Unique Index

If you want to constraint the values on specific columns then defining a unique index can be a good choice. It doesn’t matter what index you create, it can be a simple index or composite index, you can define it as a unique index. 

It means duplicate values are allowed on a unique index. You have to make sure you have unique values on columns or combinations of columns on which unique index you want to create.

Let’s try to create a unique index on a simple index “ZIP” and a composite index “StateCityZip” by adding keyword UNIQUE after forward slash (/).

proc sql;
  create table work.temp_zipcode (index=(ZIP /unique StateCityZip=(State City ZIP) /unique )) as
select * from sashelp.ZIPCODE;
quit;

proc contents data=work.temp_zipcode; quit;

The output of proc content shows that the indexes are marked as UNIQUE INDEX=YES.

Unique index using proc sql in SAS

No Missing Values Index

You can easily exclude blank values appearing on the index columns by using the NOMISS keyword. You can just mention the NOMISS option after forward slash (/) as demonstrated in the below example.

NOMISS option: It does not mean the missing values cannot be added to the data set. It implies the missing values cannot be added to the index.

proc sql;
   create table work.temp_zipcode (index=(ZIP /unique /nomiss StateCityZip=(State City ZIP) /unique/nomiss )) as
select * from sashelp.ZIPCODE;

quit;

proc contents data=work.temp_zipcode; quit;

The output of proc contents reports the following information about the index:

  • Index names (ZIP, StateCityZip)
  • Columns on which indexes are built (ZIP, State City ZIP)
  • Options that are in effect (unique =YES, nomiss = YES)
Unique No MISS index using proc sql in SAS

3. Create an Index on an existing dataset using PROC SQL

In the previous method you saw how to create an index on a new dataset using proc sql. Now you’ll learn how to create an index on an existing dataset using the PROC SQL procedure.

It is fairly straightforward to create an index on an existing dataset. In case if you’re creating a unique index then make sure you do not have duplicate values on that specified column otherwise you first delete duplicate data and then create a unique index.

Simple Index

In the proc sql you can define an index using keyword “CREATE INDEX” followed by index name and “ON” keyword followed by dataset name.

Here is an example to create a simple index on existing dataset work.temp_zipcode. For demonstration, we first create a copy of the dataset in the work library.

data work.temp_zipcode;
set sashelp.ZIPCODE;
run;



proc sql;
create index ZIP on work.temp_zipcode;
quit;

proc contents data=work.temp_zipcode; quit;
Create Simple index on existing dataset using proc sql in SAS

Composite Index

A composite index on existing dataset can be created in proc sql using the INDEX option. The only difference here is, while listing the column names add commas between them.

Let’s create a composite index named “StateCityZip” on columns State, City, Zip on existing dataset work.temp_zipcode

proc sql;
   create index StateCityZip on work.temp_zipcode (State, City, ZIP);
quit;

proc contents data=work.temp_zipcode; quit;
Create composite index on existing dataset using proc sql in SAS

Combination of Simple and Composite Index

Similarly, you can create a simple index and a composite index together in one proc sql on an existing dataset. 

Create ZIP – a simple index and StateCityZip –a composite index on work.temp_zipcode dataset using below code.

proc sql;

create index ZIP on work.temp_zipcode;
    create index StateCityZip on work.temp_zipcode (State, City, ZIP);

quit;

proc contents data=work.temp_zipcode; quit;
Combination of simple and composite index on existing dataset using proc sql in SAS

Unique Index

Using exactly the same as above code just by adding a unique keyword you can create a unique index on existing sas dataset. 

PS: Before creating a unique index on the dataset, make sure you have unique values on the specified column or group of columns.

In case of any duplicate data you have to first delete the duplicate rows before attempting to create a unique index on the existing dataset.

In this example we have unique values on the ZIP column as well as unique values on combination of State, City, ZIP, hence we’re good to go to create an unique index.

proc sql;

create unique index ZIP on work.temp_zipcode;
create unique index StateCityZip on work.temp_zipcode (State, City, ZIP);

quit;

proc contents data=work.temp_zipcode; quit;
Create unique index on existing dataset using proc sql in SAS

4. Create an Index on an existing dataset using PROC DATASET

This is another way to create an index on an existing dataset using PROC DATASET. This method is widely being used especially in the SAS DI STUDIO jobs. You can observe the code generated SAS DI studio job where most probably you can see the proc dataset is being used to create indexes.

Simple Index

In the proc dataset method you first mention library reference. A MODIFY keyword followed by dataset name to be added which tells SAS that on which dataset to create index.

And then “CREATE INDEX” statement followed by column name. For demonstration, we first create a copy of the dataset in the work library and then create a simple index on ZIP column.

data work.temp_zipcode;
set sashelp.ZIPCODE;
run;
proc datasets lib=work;
   modify temp_zipcode;
   index create ZIP;
run;

proc contents data=work.temp_zipcode; quit;
Simple index

Composite Index

proc datasets lib=work;

    modify temp_zipcode;

    index create ZIP;

    index create StateCityZip=(State City ZIP);

run;

proc contents data=work.temp_zipcode; quit;
Combination of simple and composite Index

Unique Index and No missing value index

Creating a unique index on existing data is fairly easy using proc dataset. You just have to mention the UNIQUE keyword at the end of the index create statement. To exclude blank values from index you need to mention nomiss keyword at the end. 

Let’s create ZIP –a simple unique index and StateCityZip –a composite unique index by using proc datasets.

proc datasets lib=work;

    modify temp_zipcode;

    index create ZIP /unique nomiss;

    index create StateCityZip=(State City ZIP) /unique nomiss;

run;

proc contents data=work.temp_zipcode; quit;
Unique No Miss Index in SAS

How to Remove Index in SAS

So far we have seen how you can create indexes in SAS using multiple methods. Now it’s time to learn how to delete indexes created on a sas dataset.

Before deleting an index on a sas dataset you should now index details available on that particular dataset. 

Know Index Details:

Use proc contents procedure to get all the details about indexes available on sas dataset. 

proc contents data=work.temp_zipcode; quit;

There are two ways you can delete indexes from a sas dataset. The first one is using proc dataset and second one is using proc sql.

Two ways to delete Index from a SAS dataset:

  1. PROC DATASET
  2. PROC SQL

Delete Index using PROC DATASET

Get the index name by running above proc contents query and list them after “INDEX DELETE” statements. 

proc datasets lib=work;

modify temp_zipcode;

index delete ZIP;

index delete StateCityZip;

run;

proc contents data=work.temp_zipcode; quit;
SAS delete index using proc dataset

Do you want to delete all the indexes available on a sas dataset? 

It’s even better.

You don’t need to mention the index names individually. You just use the _ALL_ option. SAS will delete all the available indexes from a dataset.

proc datasets lib=work;

modify temp_zipcode;

index delete _all_;

run;

proc contents data=work.temp_zipcode; quit;

Delete Index using PROC SQL

You can also use proc sql to delete indexes from a dataset. The only thing here is you have to list all selected indexes which you want to delete and add them after the DROP INDEX statement.

proc sql;

drop index ZIP on work.temp_zipcode;
drop index StateCityZip on work.temp_zipcode;

quit;

proc contents data=work.temp_zipcode; quit;
SAS drop index on sas dataset

Practical Tips for Creating Indexes

  1. The name of the simple index must match with the name of the column on which you want to create the index.
  2. The name of the composite index cannot be the same as the name of one of the columns in the table.
  3. If you use two columns to access data regularly, such as a First_Name column and a Last_Name column from an EMPLOYEE database, then you should create a composite index for the columns.
  4. Keep the number of indexes to a minimum to reduce disk space and update costs.
  5. Use indexes for queries that retrieve a relatively small number of rows (less than 15%).
  6. In general, indexing a small table does not result in a performance gain.
  7. In general, indexing on a column with a small number (less than 6 or 7) of distinct values does not result in a performance gain.
  8. For tables that have a primary key integrity constraint, do not create more than one index that is based on the same column as the primary key.

Free SAS Course

Learn SAS Code — The only programming course available on the internet which you need to master Data Analytics, Business Intelligence (BI) and Cloud technology.

100% free.

Unlock Free SAS Tutorials

FAQ

Why to create indexes on SAS dataset?

Indexes are created on SAS dataset to make data extraction query as quicker as possible. Though you have to have indexes created on the same columns on which your query apply filter and extract sub-set of a big dataset.

Is it compulsory to create index on SAS dataset?

No. It is not compulsory to create index on a SAS dataset but it’s highly recommended specially when you’re accessing a huge dataset. Data retrieval becomes faster if you have indexes created on the same dataset. 

What are the major benefits of creating an index?

Faster data extraction: Indexes are used to retrieve data from the database more quickly. It’s used to quickly locate data without having to search every row in a dataset every time a SAS dataset is accessed.

 

Indexes can be created using one or more columns of a dataset, providing the basis for both rapid random lookups and efficient access of ordered records.