Unicode Character Sets in Oracle

Before starting this post, let’s get an idea of what Unicode is. Unicode is a universal encoding scheme designed to include far more characters than a conventional character set; in fact, Unicode aims to list ALL characters. So, with Unicode support in Oracle, data in any language can be stored in and retrieved from Oracle.
Oracle has supported Unicode through a number of character sets, starting from Oracle 7.
Further below is the list of character sets used to support Unicode in Oracle.

 

Database Character Set Statement of Direction

A list of character sets has been compiled in Table A-4, "Recommended ASCII Database Character Sets" and Table A-5, "Recommended EBCDIC Database Character Sets" that Oracle Corporation strongly recommends for usage as the database character set. Other Oracle-supported character sets that do not appear on this list can continue to be used in Oracle Database 10g Release 2, but may be desupported in a future release. Starting with the next major functional release after Oracle Database 10g Release 2, the choice for the database character set will be limited to this list of recommended character sets for new system deployment. Customers will still be able to migrate their existing databases in the next major functional release after Oracle Database 10g Release 2 even if the character set is not on the recommended list. However, Oracle suggests that customers migrate to a recommended character set as soon as possible. At the top of the list of character sets Oracle recommends for all new system deployment is the Unicode character set AL32UTF8.

Choosing Unicode as a Database Character Set

Oracle Corporation recommends using Unicode for all new system deployments. Migrating legacy systems eventually to Unicode is also recommended. Deploying your systems today in Unicode offers many advantages in usability, compatibility, and extensibility. Oracle Database’s comprehensive support enables you to deploy high-performing systems faster and more easily while utilizing the advantages of Unicode. Even if you do not need to support multilingual data today or have any requirement for Unicode, it is still likely to be the best choice for a new system in the long run and will ultimately save you time and money as well as give you competitive advantages. See Chapter 6, "Supporting Multilingual Databases with Unicode" for more information about Unicode.

Choosing a National Character Set

A national character set is an alternate character set that enables you to store Unicode character data in a database that does not have a Unicode database character set. Other reasons for choosing a national character set are:

  • The properties of a different character encoding scheme may be more desirable for extensive character processing operations.

  • Programming in the national character set is easier.

SQL NCHAR, NVARCHAR2, and NCLOB datatypes have been redefined to support Unicode data only. You can use either the UTF8 or the AL16UTF16 character set. The default is AL16UTF16.

 

Character Sets

http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14225/applocaledata.htm#i635016

Oracle-supported character sets are listed in the following sections according to three broad categories.

In addition, common character set subset/superset combinations are listed. Some character sets can only be used with certain data types. For example, the AL16UTF16 character set can only be used as an NCHAR character set, and not as a database character set.

 

Can you use AL16UTF16 as NLS_CHARACTERSET?

No, AL16UTF16 can only be used as NLS_NCHAR_CHARACTERSET in 9i and above. Trying to create a database with AL16UTF16 as the NLS_CHARACTERSET will fail.

(Source of this answer: REPETTAS WORDPRESS BLOG)

 

1) AL24UTFFSS:

This character set was the first Unicode character set supported by Oracle. The AL24UTFFSS encoding scheme was based on the Unicode 1.1 standard, which is now obsolete. It was used from Oracle version 7.2 through 8.1.

 

2) UTF8:

UTF8 was the UTF-8 encoded character set in Oracle8 and 8i. It followed the Unicode 2.1 standard between Oracle 8.0 and 8.1.6, and was upgraded to Unicode version 3.0 for Oracle versions 8.1.7, 9i, 10g and 11g. If supplementary characters are inserted into a UTF8 database encoded with Unicode version 3.0, each one is treated as 2 separate undefined characters, occupying 6 bytes in storage. So for full support of supplementary characters, use the AL32UTF8 character set instead of UTF8.
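
To see where the 6 bytes come from, here is a minimal Python sketch. It assumes that the UTF8 character set stores a supplementary character as a UTF-16 surrogate pair with each half encoded as its own 3-byte sequence (the scheme usually called CESU-8). The function name encode_surrogate_pair_utf8 is made up for this illustration and is not an Oracle API.

```python
def encode_surrogate_pair_utf8(text: str) -> bytes:
    """Encode text CESU-8 style: supplementary characters become two
    3-byte sequences (one per surrogate) instead of one 4-byte sequence.
    Assumption: this mirrors how Oracle's UTF8 character set stores them."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Split the supplementary code point into a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            for unit in (high, low):
                # 'surrogatepass' lets Python emit the 3-byte form of a surrogate.
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


supplementary = "\U00020000"  # an ideograph outside the Basic Multilingual Plane

print(len(encode_surrogate_pair_utf8(supplementary)))  # 6 (two 3-byte halves)
print(len(supplementary.encode("utf-8")))              # 4 (standard UTF-8)
```

The same character needs only 4 bytes in standard UTF-8, which is the form AL32UTF8 uses.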

3) UTFE:

UTFE is the Unicode character set for EBCDIC based platforms. It has the same properties there as UTF8 has on ASCII based platforms. Like UTF8, it has been used across several Oracle versions.

 

4) AL32UTF8:

This is the UTF-8 encoded character set introduced in Oracle9i.
In Oracle 9.2 AL32UTF8 implemented Unicode 3.1,
in 10.1 it implemented the Unicode 3.2 standard,
in Oracle 10.2 it supports the Unicode 4.01 standard and
in Oracle 11.1 it supports Unicode 5.0.
AL32UTF8 was introduced to provide support for the newly defined supplementary characters. All supplementary characters are stored as 4 bytes in AL32UTF8. Because there was no concept of supplementary characters when the UTF8 character set was designed, UTF8 has a maximum of 3 bytes per character.
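
The byte counts are easy to check outside the database. The sketch below uses Python's standard UTF-8 codec as a stand-in for AL32UTF8 (an assumption: both follow standard UTF-8, but the codec is Python's, not Oracle's). The helper name utf8_length and the sample characters are only for illustration.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes one character occupies in standard UTF-8."""
    return len(ch.encode("utf-8"))


samples = [
    ("ASCII letter A", "A"),
    ("Latin e with acute", "\u00e9"),
    ("Euro sign", "\u20ac"),
    ("Supplementary character U+1F600", "\U0001F600"),
]

for label, ch in samples:
    print(f"{label}: {utf8_length(ch)} byte(s)")
```

Only the last sample is a supplementary character, and it is the only one that needs 4 bytes; everything in the Basic Multilingual Plane fits in 3 bytes or fewer.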

5) AL16UTF16:

This is the first UTF-16 encoded character set in Oracle. It was introduced in Oracle9i as the default national character set (NLS_NCHAR_CHARACTERSET). It also provides support for the newly defined supplementary characters. All supplementary characters are stored as 4 bytes.
As with AL32UTF8, the plan is to keep enhancing AL16UTF16 as necessary to support future versions of the Unicode standard.
AL16UTF16 cannot be used as a database character set (NLS_CHARACTERSET); it is only used as the national character set (NLS_NCHAR_CHARACTERSET).
As with AL32UTF8, the supported Unicode version depends on the Oracle release:
in Oracle 9.0 AL16UTF16 implemented Unicode 3.0,
in Oracle 9.2 it implemented Unicode 3.1,
in 10.1 it implemented the Unicode 3.2 standard,
in Oracle 10.2 it supports the Unicode 4.01 standard and
in Oracle 11.1 it supports Unicode 5.0.
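
The 4-byte figure for supplementary characters follows from how UTF-16 works: such a character is written as a pair of 2-byte code units (a surrogate pair). Here is a small Python sketch using the standard big-endian UTF-16 codec as a stand-in for AL16UTF16. The byte order is an assumption made for this illustration, and the helper name utf16_code_units is made up.

```python
def utf16_code_units(ch: str) -> list[int]:
    """Return the 16-bit code units of one character in big-endian UTF-16."""
    raw = ch.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


for ch in ("A", "\u20ac", "\U0001F600"):
    units = utf16_code_units(ch)
    print(f"U+{ord(ch):04X}: {len(units) * 2} bytes,",
          " ".join(f"{u:04X}" for u in units))
```

The first two characters take one code unit (2 bytes) each, while U+1F600 takes two code units, D83D and DE00, for 4 bytes in total.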
