Oracle, Unicode and NVARCHAR2, oh my

I'm the first to admit that I'm by no means an Oracle guru.  I'm familiar enough to connect to the database and use it, but don't ask me any design or configuration information that applies specifically ro Oracle (as opposed to SQL databases in general).  As a result, I wouldn't normally comment on anything associated with Oracle, for fear of looking plain silly.  Still, this little tidbit caught more than just me, so I feel a little safer.

I was trying to insert a string into an NVARCHAR2(64) database field.  The statement itself was pretty straightforward (and modified slightly, for simplicity).

INSERT INTO table (key, stringfield) VALUES (1, 'this is a string longer than 32 characters')

Since the code was coming from .NET, I wasn't surprised that the values being inserted were Unicode.  What did surprise me was the 'value too long' exception that was thrown when this statement was executed.  It seems that when you define an Oracle field as NVARCHAR2(64), you are really only allowing for a maximum of 32 Unicode characters.  In order words, the 64 is the number of bytes taken up by the field, not the number of Unicode characters that can be stored.  Talk about a mental disconnect.  I took about 15 minutes longer to address the problem then I might have while made sure I wasn't missing something obvious.  So consider yourself warned.

Comments

  • bruce October 1, 2004 12:22 PM

    Actually sometimes an NVARCHAR(60), say, will only allow 20 characters. It depend, I think, in what encoding Oracle is using internally for the characters, but with UTF-8 many characters (i.e. CJK chars) need 3 bytes each.

  • bruce January 11, 2005 8:00 PM

    Erm, using Oracle 10g, I can confirm that nvarchar2(100) gives you up to 100 characters (200 bytes). Can't say if this is also true in 9i though

    Re: previous poster: Oracle uses 2-bytes internally AFAIK - it does not use UTF-8 for internal storage (something to do with collating makes the 2-byte option compelling, brain too faded to recall now though)

  • bruce November 24, 2005 4:48 AM

    To overcome this, you have to modify your create statement:
    create table table (
    key number(10,0),
    nvarchar2(64 char)
    )

    Please note the word CHAR after 64. It tells Oracle you want to be 64 characters in length, regardless of how many bytes the encoding of the characters will take.

  • bruce November 24, 2005 8:28 AM

    You're assuming, of course, that I *get* to make those choices. :)

  • bruce January 4, 2006 8:54 PM

    Question: if the characterset is using full 4-byte representations of characters, then would you need to define NVARCHAR2(128) for 32 characters?

  • bruce January 5, 2006 12:51 AM

    From everything I've seen, the answer is 'yes'. The characterset defines the size of each character and the length of the NVARCHAR2 definition needs to be adjusted accordingly.

  • bruce January 19, 2006 11:09 AM

    To determine how length will be computed, check the appropriate parameter in the NLS_DATABASE_PARAMETERS table. If you ask "select value from NLS_DATABASE_PARAMETERS where parameter = 'NLS_LENGTH_SEMANTICS' " and the result is BYTE, then your answer is correct, I think. Of course, if the semantics are CHARACTER instead, then that is another story.

  • bruce January 19, 2006 11:30 AM

    This *may* help clarify (or not). I discovered this seemingly relevant text within the following URL:

    http://www.oracle.com/technology/oramag/oracle/03-nov/o63tech_glob.html

    (shamelessly copied from and willingly attributed to the original authors )

    To make it easy to allocate proper storage for Unicode values, Oracle9i Database introduced character semantics. You can now use a declaration such as VARCHAR2(3 CHAR), and Oracle will set aside the correct number of bytes to accommodate three characters in the underlying character set. In the case of AL32UTF8, Oracle will allocate 12 bytes, because the maximum length of a UTF-8 character encoding is four bytes (3 characters * 4 bytes/character = 12 bytes). On the other hand, if you're using AL16UTF16, in which case your declaration would be NVARCHAR2(3), Oracle allocates just six bytes (3 characters * 2 bytes/character = 6 bytes). One difference worth noting is that for UTF-8, a declaration using character semantics allows enough room for surrogate characters, whereas for UTF-16 that is not the case. A declaration such as NVARCHAR2(3) provides room for three UTF-16 code units, but a single supplementary character may consume two of those code units.

  • bruce March 29, 2006 3:48 AM

    Hmm,
    I got the following problem:

    My column definition says NVARCHAR2(100).

    I use JDBC to insert and retrieve a JAVA String into and from the column.

    Code executes correctly (no runtime errors). However, sometimes, when I compare the inserted and retrieved Java Strings I get .equals() returning false. NOTE: I am using Java Strings with 'strange' characters, e.g. characters outside ASCII set.

    Perhaps a NVARCHAR2(100 CHAR) will do... I will check this

  • bruce March 31, 2006 8:11 AM

    How can I retrieve what encoding my Oracle uses?

  • bruce August 3, 2006 5:00 AM

    Hi i tried enter some chineese character in my oracle database... my oracle version is 10g characterset is
    NLS_NCHAR_CHARACTERSET
    AL16UTF16

    but these chineese characters are not stored as it is.. instead it shows "??????"

    can anyone clarify this..........

  • bruce September 11, 2006 9:21 AM

    I am facing a similar problem. I have the datatype nvarchar2(2000) without the 'char' keyword mentioned. It only allows me to insert 1000 characters. But when I use concatenation i.e sql 'CONCAT' function it does allow me to insert more than 1000 characters. Database is Oracle 9.The storage is in terms of bytes as found by using the select value from NLS_DATABASE_PARAMETERS where parameter = 'NLS_LENGTH_SEMANTICS' query which returned BYTE.

  • bruce September 22, 2006 1:37 AM

    I can create table like this

    CREATE TABLE SAN_ID(ID NVARCHAR2(20 CHAR));

    I AM USING ORACLE 10G 10.1.0.2

  • bruce September 22, 2006 1:37 AM

    Sorry it would be .....

    I can not create table like this

    CREATE TABLE SAN_ID(ID NVARCHAR2(20 CHAR));

    I AM USING ORACLE 10G 10.1.0.2

  • bruce September 26, 2006 8:05 AM

    Stored it as a BLOB or CLOB

  • bruce September 26, 2006 8:05 AM

    Stored it as a BLOB or CLOB

  • bruce September 28, 2006 1:27 AM

    Is it possible to create table in oracle 10g 10.1.0.2 like below...

    CREATE TABLE SAN_ID(ID NVARCHAR2(20 CHAR));

  • bruce November 21, 2006 6:09 AM

    There is an interesting article on all that unicode stuff at:

    http://www.oracle.com/technology/tech/globalization/pdf/TWP_AppDev_Unicode_10gR2.pdf

    The main point here is that Oracle Client does some conversions if the client character set (NLS-parameter) differs from the database character set (set up at datbase creation time).

    hth, Walter

Leave a Comment

(required) 
(optional)
(required) 

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS