In this article you’ll learn “everything” about SAS data sets in less than 5 minutes: what a SAS data set is, its two parts (the descriptor portion and the data portion), SAS indexes, SAS views, special data sets, and built-in data sets.
What Is a SAS Data Set?
A SAS data set is a SAS file stored in a SAS library that SAS creates and processes. A SAS data set contains data values that are organised as a table of observations (rows) and variables (columns) that can be processed by SAS software.
A SAS data set also contains descriptor information such as the data types and lengths of the variables, as well as which engine was used to create the data.
A SAS data set can be either a SAS data file or a SAS view:
- SAS data file: It contains both the DATA and the DESCRIPTOR information. SAS data files have a member type of DATA.
- SAS view: It contains ONLY the DESCRIPTOR information that points to DATA stored elsewhere. SAS views have a member type of VIEW.
In the SAS world, rows are called observations and columns are called variables. Here’s what a SAS data set looks like in SAS:
Column (or variable)
Each column represents a variable in the table view of a SAS data set. In the image above, Name, Sex, Age, Height, and Weight are the columns (variables).
Observation (or row)
Each row represents an observation in the table view of a SAS data set. In the image above, “Name=Alfred, Sex=M, Age=14, Height=69, Weight=112.5” is the first observation.
Similarly, “Name=Alice, Sex=F, Age=13, Height=56.5, Weight=84” is referred to as the second observation, and so on.
Physically, SAS data files are stored in a SAS library.
With the default Base SAS engine on Windows and UNIX, a library is a directory, and each data set in it is stored as a file named after the data set with the .sas7bdat extension. For example, a data set named CLASS is stored as class.sas7bdat.
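To see this on your own system, the sketch below copies SASHELP.CLASS into the WORK library and then uses the PATHNAME and FILEEXIST functions to locate the physical file. It assumes a UNIX-style host (such as SAS Studio in SAS OnDemand), where data set file names are lowercase and the path separator is a forward slash.

```sas
/* make a copy of a built-in data set so WORK has a data set to look for */
data work.class;
  set sashelp.class;
run;

/* PATHNAME returns the physical directory behind a libref */
%let workdir = %sysfunc(pathname(work));
%put NOTE: The WORK library is stored in &workdir;

/* FILEEXIST returns 1 if the file is found; assumes a UNIX-style path */
%put NOTE: class.sas7bdat exists: %sysfunc(fileexist(&workdir/class.sas7bdat));
```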
SAS Index (Optional)
An index is a separate file that you can create for a SAS data file in order to provide direct access to specific observations.
The index file has the same name as its data file and a member type of INDEX. Indexes can provide faster access to specific observations, particularly when you have a large data set.
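A minimal sketch of creating an index: the INDEX= data set option below builds a simple index on NAME while WORK.CLASS (a copy of SASHELP.CLASS) is written, and the index file is stored next to the data file (class.sas7bndx on UNIX-style hosts). OPTIONS MSGLEVEL=I asks SAS to note index usage in the log. Whether SAS actually uses the index for the WHERE clause is its own decision, and for a table as small as this one it may read the rows sequentially instead.

```sas
options msglevel=i;   /* report index usage decisions in the log */

/* create WORK.CLASS with a simple index on NAME */
data work.class(index=(name));
  set sashelp.class;
run;

/* a WHERE clause on the indexed variable is a candidate for index lookup */
proc print data=work.class;
  where name = 'Alfred';
run;
```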
Parts of a SAS Data Set
A SAS data set has two parts: the descriptor portion and the data portion.
1. Descriptor Portion
The descriptor information for a SAS data set makes the file self-documenting. That is, each data set can supply the attributes of the data set and of its variables.
Once the data is in the form of a SAS data set, you do not have to specify the attributes of the data set or its variables in your program statements. SAS obtains the information directly from the data set.
Descriptor information includes the number of observations, the observation length, the date that the data set was last modified, and other facts. Descriptor information for individual variables includes attributes such as name, type, length, format, label, and whether the variable is indexed.
You can print the descriptor information with the CONTENTS procedure (PROC CONTENTS). Here is how to display the descriptor information of a demo data set, MYLIB.CLASS:
/* sas library */
libname mylib "/home/u61950255/data/mylib";
/* view descriptor information */
proc contents data=mylib.class;
run;
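Because the descriptor portion is self-documenting, a program can also read it directly. The sketch below uses the DATA step functions OPEN, ATTRN, VARNAME, VARTYPE, VARLEN, and CLOSE to print the number of observations and each variable's name, type, and length. It reads SASHELP.CLASS so that it runs without the MYLIB library; substitute mylib.class once that library is assigned.

```sas
/* read descriptor attributes without reading any data rows */
data _null_;
  dsid  = open('sashelp.class');     /* open the data set for attribute access */
  nobs  = attrn(dsid, 'NLOBS');      /* number of logical observations */
  nvars = attrn(dsid, 'NVARS');      /* number of variables */
  put 'Observations: ' nobs 'Variables: ' nvars;
  do i = 1 to nvars;
    name = varname(dsid, i);         /* variable name */
    type = vartype(dsid, i);         /* C = character, N = numeric */
    len  = varlen(dsid, i);          /* length in bytes */
    put name= type= len=;
  end;
  rc = close(dsid);                  /* release the data set */
run;
```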
2. Data Portion
The data portion simply consists of all the data values of the SAS data set. The SAS data values are arranged in the form of a table where rows are referred to as observations and columns are referred to as variables.
The following code prints the data portion of the demo data set MYLIB.CLASS:
/* view data portion */
proc print data=mylib.class;
run;
In the output above, values such as “Alfred”, “Alice”, “F”, “M”, 14, 69, 112.5, and 84 are data values of the SAS data set CLASS.
Name, Sex, Age, Height, and Weight are the variables, and each complete row of data values is an observation.
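Since the data portion is a table of rows, you can also read a single observation by its position. This sketch uses the POINT= option of the SET statement to copy only the second observation of SASHELP.CLASS into a new data set, WORK.SECOND_OBS (a name chosen for this example). POINT= reads by direct access and never reaches end-of-file, so the STOP statement is required to end the DATA step.

```sas
/* read only observation number 2 by direct access */
data work.second_obs;
  obsnum = 2;                        /* observation number to fetch */
  set sashelp.class point=obsnum;    /* the POINT= variable is not written out */
  output;                            /* write the single observation */
  stop;                              /* end the step explicitly */
run;

/* prints the row for Alice */
proc print data=work.second_obs;
run;
```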
SAS Views
A SAS view is a type of SAS data set that retrieves data values from other files. A SAS view contains only descriptor information such as the data types and lengths of the variables (columns).
A SAS view also contains information that is required for retrieving data values from other SAS data sets or from files that are stored in other software vendors’ file formats. SAS views are of member type VIEW.
In most cases, you can use a SAS view as if it were a SAS data file. When you create a view in a SAS library, it is stored as a file named after the view with the .sas7bvew extension. This is not a data file: it holds the view's descriptor information and retrieval instructions, not the data values.
SAS View: Descriptor and Data Portion
When you use a SAS view, it presents the same descriptor and data portions as a SAS data file. The difference is in how they are stored: a data file physically stores its data values, while a view stores only the descriptor information and the instructions for retrieving the values, and the two have different member types (DATA versus VIEW). In other words, a SAS view is one kind of SAS data set.
PROC CONTENTS also works on SAS views and prints their descriptor information. Here is a simple example:
/* create a demo SAS view from SAS data set */
proc sql;
create view mylib.class_v as
select * from mylib.class;
quit;
/* print descriptor information */
proc contents data=mylib.class_v;
run;
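Views can also be created with a DATA step by adding the VIEW= option to the DATA statement. The sketch below, using a small hypothetical WORK.SCORES table, shows the key property of a view: it stores the program rather than the rows, so a row added to the underlying table after the view was defined still shows up when the view is read.

```sas
/* hypothetical base table */
data work.scores;
  input name $ score;
  datalines;
Ann 90
Bob 75
;
run;

/* DATA step view: stores the program, not the selected rows */
data work.high_scores / view=work.high_scores;
  set work.scores;
  where score >= 80;
run;

/* add a row to the base table after the view was created */
proc sql;
  insert into work.scores values('Cy', 88);
quit;

/* the view re-reads WORK.SCORES when used, so it prints Ann and Cy */
proc print data=work.high_scores;
run;
```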
Special SAS Data Sets
There are three kinds of special SAS data set names: the null data set (_NULL_), the default data set (_LAST_), and the automatic naming convention (DATAn).
1. Null SAS Data Sets
If you want to execute a DATA step but do not want to create a SAS data set, you can specify the keyword _NULL_ as the data set name.
The following statement begins a DATA step that does not create a data set:
data _null_;
Using _NULL_ causes SAS to execute the DATA step as if it were creating a new data set, but no observations or variables are written to an output data set.
This saves processing time and disk space when you run a DATA step only for its side effects, such as writing to the log or an external file or creating macro variables, which matters most when you are dealing with large data sets.
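Here is a short sketch of a typical DATA _NULL_ use: it counts the female students in SASHELP.CLASS, writes the count to the log, and saves it in a macro variable named N_FEMALE (a name chosen for this example), all without creating an output data set.

```sas
/* count females without writing any output data set */
data _null_;
  set sashelp.class end=last;
  if sex = 'F' then n_female + 1;        /* sum statement: retained, starts at 0 */
  if last then do;
    put 'Number of female students: ' n_female;
    call symputx('n_female', n_female);  /* store the count in a macro variable */
  end;
run;

/* writes n_female=9 to the log */
%put NOTE: n_female=&n_female;
```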
2. Default Data Sets
SAS remembers the most recently created (the last) SAS data set through the reserved name _LAST_.
If you run a PROC step without the DATA= option, or a DATA step whose SET statement names no data set, SAS uses the most recently created data set, _LAST_, by default.
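The sketch below creates a hypothetical WORK.TEENS data set and then runs PROC PRINT with no DATA= option, so the procedure falls back to _LAST_. The automatic macro variable SYSLAST holds the name of the current _LAST_ data set.

```sas
/* the most recently created data set becomes _LAST_ */
data work.teens;
  set sashelp.class;
  where age >= 13;
run;

/* no DATA= option: PROC PRINT reads _LAST_, which is WORK.TEENS */
proc print;
run;

/* SYSLAST names the current _LAST_ data set */
%put NOTE: _LAST_ is &syslast;
```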
3. Automatic Naming Convention
This feature is referred to as the DATAn naming convention.
If you do not specify a data set name (or the reserved name _NULL_) in a DATA statement, SAS automatically names successive data sets DATA1, DATA2, and so on, and stores them in the WORK library by default.
Here is a simple example that demonstrates how new SAS data sets are named through the automatic naming convention.
/* Automatic Naming Convention */
/* in a new SAS session, this creates a data set named DATA1 */
data;
set sashelp.class;
run;
/* the next unnamed DATA step creates DATA2 */
data;
set sashelp.class;
run;
SAS Built-In Data Sets
These data sets are already available in the installed SAS software. When you log in to SAS Studio, the built-in data sets are available to everyone by default through predefined library references (librefs).
Built-In Libraries With Data Sets:
- SASHELP
- MAPS
- MAPSSAS
- STPSAMP
You have read-only access to these libraries, which means you can’t modify their data sets. You can still read them, analyse them, perform calculations on them, and use them in reports.
Here’s what the SAS built-in data sets look like. You can access them at any time once a SAS session has started.
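A minimal sketch for exploring a built-in library and working around its read-only status: PROC CONTENTS with the _ALL_ member name and the NODS option lists the members of SASHELP without per-variable detail, and the DATA step then copies SASHELP.CLASS into WORK, where you are free to change it. WORK.CLASS_COPY and AGE_NEXT_YEAR are names made up for this example.

```sas
/* list the members of SASHELP without printing each member's variables */
proc contents data=sashelp._all_ nods;
run;

/* SASHELP is read-only, so copy the data set into WORK before changing it */
data work.class_copy;
  set sashelp.class;
  age_next_year = age + 1;   /* new variable exists only in the copy */
run;
```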
FAQ – SAS Data Sets
A SAS data set is a structured file containing data organized in rows and columns, similar to a table or spreadsheet. It consists of observations (rows) and variables (columns) that hold data values. SAS data sets serve as the fundamental unit for data storage, manipulation, and analysis within the SAS environment.
SAS Data Sets can be created in several ways. They can be generated by importing external data files into SAS, created through data manipulation and transformation using SAS procedures or DATA steps, or generated as output from statistical analyses or data processing tasks within SAS. Once created, SAS data sets can be modified, appended, or combined to suit analytical requirements.
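As a small illustration of two of these routes, the sketch below creates a data set with a DATA step and in-stream DATALINES, then combines it with SASHELP.CLASS by listing both data sets in a SET statement. The two students and their values are hypothetical sample data, and the WORK data set names are chosen for this example.

```sas
/* create a small data set from in-stream data (hypothetical values) */
data work.new_students;
  input name :$8. sex :$1. age height weight;
  datalines;
Zoe F 12 58.0 90.5
Liam M 13 61.5 98.0
;
run;

/* concatenate: 19 rows from SASHELP.CLASS followed by the 2 new rows */
data work.all_students;
  set sashelp.class work.new_students;
run;
```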
A SAS data set comprises a descriptor portion and a data portion. The data portion is organized into variables, which represent the columns and hold the data values, and observations, which represent the rows and contain one value for each variable. The descriptor portion holds metadata such as variable names, types, lengths, labels, formats, and informats, which provide additional information about the data contained in the data set.