How To Change Encoding (WLATIN1 to UTF-8) In SAS

SAS reads and writes external files using the current session encoding. Unless you specify otherwise, for example with the ENCODING= option of the FILENAME or INFILE statement, SAS assumes that an external file has the same encoding as the session.

When the actual encoding of the data does not match the encoding SAS assumes, characters outside the ASCII range are misinterpreted and show up as garbled or unexpected characters.

UTF-8 is a variable-width encoding of the Unicode character set that uses 1 to 4 bytes per character. It is the most widely used Unicode encoding, and it is the recommended encoding for using Unicode on operating systems such as Linux.

To support multilingual data, and to prepare for SAS Viya, which runs with a UTF-8 session encoding, you can migrate your data from a SAS session that uses another encoding, such as WLATIN1, to a SAS session that uses UTF-8.

How To Check The Encoding Set In SAS

The ENCODING system option is set explicitly in all SAS Foundation sasv9.cfg configuration files.
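To see the value your session is actually running with, one option is the GETOPTION function, which returns the current setting of a system option. The sketch below prints it to the log through %SYSFUNC and from a DATA step; the variable name enc is only illustrative, and the value printed depends on your installation.

```sas
/* Print the current session encoding to the log using the macro facility */
%put NOTE: Session encoding is %sysfunc(getoption(encoding));

/* The same value is available inside a DATA step */
data _null_;
    length enc $32;                 /* illustrative variable name */
    enc = getoption('encoding');    /* current value of the ENCODING option */
    put 'Session encoding: ' enc;
run;
```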

You can also use PROC OPTIONS to print locale and encoding details. Because PROC OPTIONS lists every system option by default, add GROUP=LANGUAGECONTROL to limit the output to the language-control options, such as ENCODING and LOCALE.

/* check encoding in SAS */
proc options group=languagecontrol;
run;

[Figure: How to check encoding in SAS]

Encoding Of A SAS Dataset

You can also check the encoding of a specific dataset with the CONTENTS procedure. Its output includes an Encoding field that shows the encoding stored with that dataset.

Run the following code to see the encoding of the sashelp.class dataset.

/* check encoding set on sashelp.class dataset */
proc contents data=sashelp.class;
run;

[Figure: How to check encoding of SAS dataset using PROC CONTENTS]

How To Change Encoding Of A SAS Dataset

You can set the encoding of a SAS dataset when you create it in a DATA step, by specifying the ENCODING= dataset option on the output dataset.

The following code creates a sample dataset, work.class, to demonstrate how to change the encoding of a SAS dataset.

/* create sample dataset */
data work.class;
    set sashelp.class;
run;

/* check encoding set on work.class dataset */
proc contents data=work.class;
run;

[Figure: SAS dataset to change encoding in SAS]

Use The ENCODING= Option In The DATA Statement To Change Encoding

In the output above, the work.class dataset has the encoding “utf-8 Unicode (UTF-8)”. Because a new dataset is created in the session encoding by default, this reflects a UTF-8 session.

To change the encoding of work.class from UTF-8 to WLATIN1, recreate the dataset with the ENCODING= option in the DATA statement.

Here is the code that changes encoding from UTF-8 to WLATIN1.

/* use the ENCODING= dataset option in the DATA statement */
data work.class (encoding='wlatin1');
    set sashelp.class;
run;

/* check encoding */
proc contents data=work.class;
run;

[Figure: Change encoding from UTF-8 to WLATIN1 on SAS dataset]
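If you want to confirm the new encoding in code rather than by reading the PROC CONTENTS output, one approach is to query the DICTIONARY.TABLES metadata view, whose ENCODING column holds the same description that PROC CONTENTS prints. This is a sketch that repeats the ENCODING='wlatin1' step so it runs on its own.

```sas
/* Recreate work.class with WLATIN1 encoding (same as the step above) */
data work.class (encoding='wlatin1');
    set sashelp.class;
run;

/* Look up the stored encoding in the DICTIONARY.TABLES view.           */
/* LIBNAME and MEMNAME are stored in uppercase in the dictionary views. */
proc sql;
    select libname, memname, encoding
        from dictionary.tables
        where libname = 'WORK' and memname = 'CLASS';
quit;
```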

How To Change Encoding of Character Data Values

If you see unexpected characters in dataset variable values, or in macro variable values derived from a dataset, check the encoding of the SAS session and of the data.

The common encodings for Western European character data are ISO 8859-1 (LATIN1) and Windows cp1252 (WLATIN1).

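If your character data is encoded as LATIN1 or WLATIN1, there are several factors that you must consider when you migrate to UTF-8. The most important is that characters outside the ASCII range take more bytes in UTF-8 than in WLATIN1, so character variables may need longer lengths to avoid truncation. This section focuses on WLATIN1 characters, which are a superset of the printable LATIN1 characters.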
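As a rough illustration of that growth, the sketch below converts three WLATIN1 byte values to UTF-8 with KCVT and counts the bytes of each result with LENGTHN. The input is given as hexadecimal text and decoded with the $HEX2. informat so that the input bytes do not depend on your session encoding; the dataset and variable names are illustrative only.

```sas
/* Byte counts of single WLATIN1 characters after conversion to UTF-8 */
/* 41 = 'A' (ASCII), F8 = 'ø' and 80 = '€' in WLATIN1                 */
data work.byte_growth;
    length wlatin1_hex $2 utf8 $8;
    input wlatin1_hex $;
    /* decode the hex text into one WLATIN1 byte, then transcode it */
    utf8 = kcvt(input(wlatin1_hex, $hex2.), 'wlatin1', 'utf-8');
    utf8_bytes = lengthn(utf8);     /* bytes, excluding trailing blanks */
    put wlatin1_hex= utf8_bytes=;
    datalines;
41
F8
80
;
run;
```

The log should show 1, 2 and 3 bytes respectively: 'F8'x becomes 'C3B8'x and '80'x becomes 'E282AC'x, so a $1 variable holding 'ø' or '€' in WLATIN1 needs $2 or $3 after migration.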

[Figure: WLATIN1 code page]

For errors or incorrect data that result from migrating characters that are 1 byte in WLATIN1 but 2 or 3 bytes in UTF-8, you can use the KCVT function to convert the data values into UTF-8.

For example, the WLATIN1 code page shows that ‘ø’ is a WLATIN1 character: it is the single byte 'F8'x in WLATIN1 and the two bytes 'C3B8'x in UTF-8. The KCVT function performs this transcoding from one encoding to the other.

The following example converts ‘ø’ from WLATIN1 into UTF-8:

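/* convert WLATIN1 char values into UTF-8                      */
/* the literal 'ø' is stored in the session encoding, so this  */
/* example assumes a WLATIN1 session ('ø' is the byte 'F8'x)   */
data _null_;
    length text $20;
    text = kcvt('ø', 'wlatin1', 'utf-8');
    put "converted value(wlatin1 to utf-8): " text;
    put "as hex: " text $hex4.;
run;

[Figure: Char values converted from WLATIN1 to UTF-8]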

Similarly, you can convert the UTF-8 character “ø” into WLATIN1 with the following code.

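/* convert UTF-8 char values into WLATIN1                      */
/* this example assumes a UTF-8 session, where the literal 'ø' */
/* is stored as the two bytes 'C3B8'x                          */
data _null_;
    length text $20;
    text = kcvt('ø', 'utf-8', 'wlatin1');
    put "converted value(utf-8 to wlatin1): " text;
    put "as hex: " text $hex2.;
run;

[Figure: Char values converted from UTF-8 to WLATIN1]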
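Because both examples above depend on how the literal ‘ø’ is stored in your session, a session-independent variant can start from hexadecimal literals instead. The sketch below uses only KCVT and the $HEX. format to convert the WLATIN1 byte 'F8'x to UTF-8 and back, and checks that the round trip returns the original byte; the variable names are illustrative.

```sas
/* Round trip of 'ø' between WLATIN1 and UTF-8 using hex literals, */
/* so the result does not depend on the session encoding           */
data _null_;
    length utf8 back $8;
    utf8 = kcvt('F8'x, 'wlatin1', 'utf-8');   /* UTF-8 form of 'ø'     */
    back = kcvt(utf8, 'utf-8', 'wlatin1');    /* and back to WLATIN1   */
    put 'UTF-8 bytes:  ' utf8 $hex4.;
    put 'WLATIN1 byte: ' back $hex2.;
    if back = 'F8'x then put 'NOTE: round trip preserved the character.';
    else put 'WARNING: round trip changed the character.';
run;
```

The log should show C3B8 for the UTF-8 bytes and F8 for the WLATIN1 byte.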

How To Change Encoding For Entire Variable Values In SAS

If a character variable contains values that are encoded differently from your session encoding, those values are displayed as unexpected characters.

In that case, you can apply the KCVT function to every value of the variable to convert it from one encoding to another.

If you’re not sure which variables are affected, apply KCVT to each character variable that might contain characters in the other encoding. Apply it only to values that really are in the source encoding, though: converting values that are already UTF-8 from WLATIN1 to UTF-8 a second time garbles their non-ASCII characters.

In the following example, suppose the name variable in the sashelp.class dataset contains WLATIN1-encoded characters, and you want a separate dataset with the encoding issue corrected. You can apply KCVT to the name variable to convert it from WLATIN1 to UTF-8.

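/* How To Change Encoding for entire variable in dataset */
data work.class_utf8;
    /* name is $8 in sashelp.class; a WLATIN1 character can need up to */
    /* 3 bytes in UTF-8, so widen the variable before reading the data */
    length name $24;
    set sashelp.class;
    name = kcvt(name, 'wlatin1', 'utf-8');
run;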
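To apply the conversion to every character variable at once, as suggested above, a common DATA step pattern is an array over _CHARACTER_. The sketch below builds a small hypothetical dataset, work.raw, whose values hold WLATIN1 bytes, then converts all of its character variables; the dataset and variable names are illustrative, and the variables are declared long enough ($20) that the longer UTF-8 values are not truncated.

```sas
/* Hypothetical input: character values stored as WLATIN1 bytes */
data work.raw;
    length name city $20;
    name = 'S' || 'F8'x || 'ren';     /* "Søren" in WLATIN1  */
    city = 'Troms' || 'F8'x;          /* "Tromsø" in WLATIN1 */
    age  = 30;                        /* numeric variables are not converted */
run;

/* Convert every character variable from WLATIN1 to UTF-8 */
data work.fixed;
    set work.raw;
    array chars {*} _character_;      /* all character variables in the PDV */
    do _i = 1 to dim(chars);
        chars{_i} = kcvt(chars{_i}, 'wlatin1', 'utf-8');
    end;
    drop _i;
run;
```

Running the conversion twice on the same data would transcode the bytes a second time, so keep the WLATIN1 original (work.raw here) until you have checked the result.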

			

FAQ – How To Change Encoding In SAS

What is encoding in SAS? 

An encoding is the mapping between characters and the byte values used to store them. SAS uses it to interpret stored bytes as readable characters.

Why would I need to change encoding in SAS? 

Changing encoding in SAS can be necessary when working with data in different languages or when moving data between systems with different default encodings.

What is WLATIN1 and UTF-8 encoding in SAS? 

WLATIN1 is the SAS name for Windows code page 1252 (cp1252), a single-byte encoding for Western European languages. UTF-8 is a multibyte encoding of Unicode that can represent characters from virtually all languages.

Can I change the encoding of an existing SAS dataset? 

Yes. Create a new dataset with the desired encoding, for example by reading the existing dataset in a DATA step that specifies the ENCODING= dataset option.

Does changing encoding in SAS affect my data? 

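Changing encoding in SAS can affect your data. Make sure the new encoding can represent all the characters in your data: characters that the target encoding cannot represent may be lost or cause transcoding errors, and values that need more bytes in the new encoding may be truncated unless you increase the variable lengths.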