Can the N function cause problems with existing queries?

We use Oracle 10g and Oracle 11g.

We also have a layer that automatically builds queries from pseudo-SQL code written in .NET (something like SQLAlchemy for Python).

Currently, our layer wraps every string in single quotes (') and, if the string contains non-ANSI characters, it automatically builds a UNISTR call with the special characters written as Unicode code points (for example, \00E0).

Now we have created a method for performing several inserts with the following construction:
INSERT INTO ... (...) SELECT ... FROM DUAL UNION ALL SELECT ... FROM DUAL ...

This algorithm can produce queries in which the same string field is sometimes passed as 'my simple string' and sometimes as UNISTR('my string with special chars like \00E0').

The described condition raises ORA-12704: character set mismatch.
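
For illustration, here is a minimal sketch of the kind of statement our layer can generate (the table and column names are made up); mixing a plain literal and a UNISTR value in the same column position of the UNION ALL branches is what triggers the error:

 -- hypothetical table with an NVARCHAR2 column, as in our schema
 INSERT INTO my_table (descr)
 SELECT 'my simple string' FROM DUAL
 UNION ALL
 SELECT UNISTR('my string with special chars like \00E0') FROM DUAL;
 -- raises ORA-12704: character set mismatch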

One solution is to use the INSERT ALL construct, but it is very slow compared to what is currently in use.

Another solution is to tell our layer to put N before every string literal (except those that are already wrapped in UNISTR). That is simple to do.
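
A sketch of the same statement after that change (same made-up names); with the N prefix both branches resolve to the national character set, so the mismatch goes away:

 INSERT INTO my_table (descr)
 SELECT N'my simple string' FROM DUAL
 UNION ALL
 SELECT UNISTR('my string with special chars like \00E0') FROM DUAL;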

I just want to know if this could affect existing queries.

Note: all of our fields in the database are either NCHAR or NVARCHAR2.


Oracle ref: http://docs.oracle.com/cd/B19306_01/server.102/b14225/ch7progrunicode.htm

3 answers

Basically, what you are asking is whether there is a difference in how the string is stored with or without the N function.

You can simply check for yourself:

 SQL> create table test (val nvarchar2(20));
 Table TEST created.
 SQL> insert into test select n'test' from dual;
 1 row inserted.
 SQL> insert into test select 'test' from dual;
 1 row inserted.
 SQL> select dump(val) from test;

 DUMP(VAL)
 --------------------------------------------------------------------------------
 Typ=1 Len=8: 0,116,0,101,0,115,0,116
 Typ=1 Len=8: 0,116,0,101,0,115,0,116

As you can see, both values are stored exactly the same, so there is no side effect.

The reason it works so nicely is the elegance of Unicode: the plain literal is implicitly converted to the national character set on insert, and for these characters the result is identical.

If you're interested, here is a good video explaining this:

https://www.youtube.com/watch?v=MijmeoH9LT4


I assume you got the error "ORA-12704: character set mismatch" because the data inside the quotation marks is treated as CHAR while your columns are NCHAR, and the two are mapped with different encodings: CHAR uses NLS_CHARACTERSET, NCHAR uses NLS_NCHAR_CHARACTERSET.

When you use the UNISTR function, it converts the data from CHAR to NCHAR (and in doing so also decodes the encoded values into characters), as the Oracle docs say:

"UNISTR takes as its argument a text literal or an expression that resolves to character data and returns it in the national character set."

If you explicitly convert values using N or TO_NCHAR, you only get NLS_NCHAR_CHARACTERSET values, without any decoding. Values encoded like "\00E0" will not be decoded and will be kept as-is.

So, if you have an insert, for example:

 insert into ... (...)
 select N'my string with special chars like \00E0',
        UNISTR('my string with special chars like \00E0')
 from dual
 ....

the data in the first inserted field will be 'my string with special chars like \00E0', not 'my string with special chars like à'. This is the only side effect I know of. Your other queries should already be using the NLS_NCHAR_CHARACTERSET encoding, so the explicit conversion should not be a problem for them.
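
If you want to verify this side effect yourself, here is a quick check in the spirit of the DUMP example above (a sketch; the byte values assume AL16UTF16 as the national character set):

 SQL> select dump(N'\00E0') from dual;
 -- the five characters \ 0 0 E 0 stored as-is (ten bytes), nothing decoded
 SQL> select dump(UNISTR('\00E0')) from dual;
 -- the single character à (two bytes: 0,224)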

And by the way, why not simply insert all the values as N'my string with special chars like à'? Just encode them in UTF-16 first (I assume you are using UTF-16 for your NCHAR columns) if your top-level software uses a different encoding.

Regarding using the N function: you already have the answers above.

If you have a chance to change the database encoding, it will really simplify your life. I have worked on huge production systems and saw a tendency for everyone to simply move to AL32UTF8, with wasted storage space and internationalization hassles slowly becoming painful memories of the past.

In my experience the easiest setup is to use AL32UTF8 as the encoding of the database instance and just use VARCHAR2 everywhere. We read and write standard Java Unicode strings via JDBC as bind variables without any tricks or hassle.

Your idea of building one huge SQL INSERT statement as text may not scale well, for several reasons:

  • there is a fixed maximum length for a SQL statement, so it will not work with 10,000 inserts
  • bind variables are recommended (and then you don't have the N'xxx' vs UNISTR mess either)
  • dynamically creating a new SQL statement every time is very resource intensive. It does not let Oracle cache any execution plan, and it forces Oracle to parse your very long statement every time it is called.

What you are trying to achieve is bulk insertion. Use the JDBC batch mode of the Oracle driver to do this at high speed; see for example: http://viralpatel.net/blogs/batch-insert-in-java-jdbc/

Note that insertion speed is also affected by triggers (which must fire for every row) and foreign key constraints (which must be checked). So if you intend to insert more than a few thousand rows, consider disabling the triggers and foreign key constraints before the insert and re-enabling them afterwards. (You will lose the trigger executions, and re-validating the constraints after the insert has a cost of its own.)
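
For example (a sketch with made-up table, trigger and constraint names):

 ALTER TABLE my_table DISABLE CONSTRAINT my_table_fk;
 ALTER TRIGGER my_table_trg DISABLE;

 -- ... run the bulk insert here ...

 ALTER TRIGGER my_table_trg ENABLE;
 ALTER TABLE my_table ENABLE CONSTRAINT my_table_fk;  -- re-validates the existing rows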

Also consider the rollback segment size. Inserting a million records in one transaction requires a huge rollback segment, which is likely to cause heavy I/O on the storage media. A good rule of thumb is to commit after every 1000 records.

(Oracle uses versioning instead of shared locks, so a table with uncommitted changes is still readable. Committing every 1000 records works out to roughly one commit per second at typical speeds: seldom enough to make use of the write buffers, yet frequent enough not to get in the way of other people who want to update the same table.)
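
To illustrate the commit rhythm only (a PL/SQL sketch with made-up table and column names; the inserting itself is better done through JDBC batches as described above):

 DECLARE
   v_rows PLS_INTEGER := 0;
 BEGIN
   FOR rec IN (SELECT col1, col2 FROM staging_table) LOOP
     INSERT INTO target_table (col1, col2) VALUES (rec.col1, rec.col2);
     v_rows := v_rows + 1;
     IF MOD(v_rows, 1000) = 0 THEN
       COMMIT;   -- commit every 1000 rows
     END IF;
   END LOOP;
   COMMIT;       -- commit the remainder
 END;
 /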

