Character occupies 6 bytes

We are trying to save the string below to the database. It is a name that we get back from an API call:

株式会社エス・ダブリュー・コミュニケーションズ

When saving through our code (servlet, then Hibernate, then the database), we get this error message:

Caused by: java.sql.BatchUpdateException: ORA-12899: value too large for column "NAME_ON_ACCOUNT" (actual: 138, maximum: 100) 

The name is 23 characters long, but it looks like each character takes 6 bytes, which would account for exactly 138 (23 × 6).

The code below gives me 69:

 byte[] utf8Bytes = string.getBytes("UTF-8");
 System.out.println(utf8Bytes.length);

And this gives me 92:

 byte[] utf32Bytes = string.getBytes("UTF-32");
 System.out.println(utf32Bytes.length);

I will definitely check NLS_CHARACTERSET and look at the I/O classes, but have you ever seen a character that takes 6 bytes? Any help would be greatly appreciated.
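
For reference, here is a minimal sketch that measures the same 23-character name in several standard encodings. The class name `EncodedLengths` and the helper `byteLength` are made up for this illustration, and `UTF-32` is assumed to be available in the running JDK (it is in the measurements above). None of the encodings produces 6 bytes per character, which suggests the 138 comes from the data itself rather than from the charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLengths {
    // The name from the question, written with Unicode escapes so that the
    // encoding of this source file does not matter.
    static final String NAME =
        "\u682a\u5f0f\u4f1a\u793e\u30a8\u30b9\u30fb\u30c0\u30d6\u30ea\u30e5\u30fc"
      + "\u30fb\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3\u30ba";

    // Hypothetical helper: number of bytes the string occupies in a charset.
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        System.out.println(NAME.length());                                // 23 chars
        System.out.println(byteLength(NAME, StandardCharsets.UTF_8));     // 69 (3 bytes each)
        System.out.println(byteLength(NAME, StandardCharsets.UTF_16BE));  // 46 (2 bytes each)
        System.out.println(byteLength(NAME, Charset.forName("UTF-32")));  // 92 (4 bytes each)
    }
}
```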

+6
2 answers

The string probably contains HTML entities, such as &#29123; (the numeric entity for 燃), or possibly URL-style percent-encoding, such as %8C%9A. Or maybe UTF-7, for example [Ay76b. (I made these values up, but your actual ones will look similar.) It is always painful to rely on a framework for character encoding, because its authors were most likely American or European, used to plain ANSI text where one byte equals one character. If you manage to work out which encoding you have and convert it to real UTF-8 or even UTF-16, in this particular case it will take up less space.
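
As a rough illustration of that last point, the sketch below turns decimal numeric character references (the `&#NNNN;` form) back into real characters and compares the lengths. The class `EntityDecoder` and its method are hypothetical names for this example, and it handles only the decimal form, not named or hexadecimal entities. Whether the data actually contains entities is something you would need to verify first.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityDecoder {
    // Matches decimal numeric character references such as &#29123;
    private static final Pattern DECIMAL_ENTITY = Pattern.compile("&#(\\d+);");

    // Replaces each decimal reference with the character it names.
    // Sketch only: malformed or out-of-range numbers will throw.
    static String decodeDecimalEntities(String input) {
        Matcher m = DECIMAL_ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            String ch = new String(Character.toChars(codePoint));
            m.appendReplacement(out, Matcher.quoteReplacement(ch));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String encoded = "&#29123;";                     // 8 ASCII characters
        String decoded = decodeDecimalEntities(encoded); // one real character
        System.out.println(encoded.length());            // 8
        System.out.println(decoded.length());            // 1
        System.out.println(decoded.getBytes(StandardCharsets.UTF_8).length); // 3
    }
}
```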

+3

You probably literally have:

 \u682a\u5f0f\u4f1a\u793e\u30a8\u30b9\u30fb\u30c0\u30d6\u30ea\u30e5\u30fc\u30fb\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3\u30ba 

Compare:

 "\u682a\u5f0f\u4f1a\u793e\u30a8\u30b9\u30fb\u30c0\u30d6\u30ea\u30e5\u30fc\u30fb\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3\u30ba".length(); //23, or 69 UTF-8 bytes 

Vs:

 "\\u682a\\u5f0f\\u4f1a\\u793e\\u30a8\\u30b9\\u30fb\\u30c0\\u30d6\\u30ea\\u30e5\\u30fc\\u30fb\\u30b3\\u30df\\u30e5\\u30cb\\u30b1\\u30fc\\u30b7\\u30e7\\u30f3\\u30ba".length(); //138, or 138 UTF-8 bytes 
0
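
If the value really does arrive as literal escape text, as this answer suspects, one possible fix is to convert the escapes back into characters before saving. The sketch below is only an assumption about how that could look: `UnicodeUnescaper`, `unescape` and `ESCAPED_NAME` are names invented for this example, and it handles only the backslash, `u`, four-hex-digit form.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeUnescaper {
    // A literal backslash, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // The name as 138 ASCII characters of escape text (backslashes doubled
    // so that the Java compiler does not interpret them).
    static final String ESCAPED_NAME =
        "\\u682a\\u5f0f\\u4f1a\\u793e\\u30a8\\u30b9\\u30fb\\u30c0\\u30d6\\u30ea\\u30e5\\u30fc"
      + "\\u30fb\\u30b3\\u30df\\u30e5\\u30cb\\u30b1\\u30fc\\u30b7\\u30e7\\u30f3\\u30ba";

    // Replaces each escape sequence with the UTF-16 code unit it denotes.
    static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String fixed = unescape(ESCAPED_NAME);
        System.out.println(ESCAPED_NAME.length());                         // 138
        System.out.println(fixed.length());                                // 23
        System.out.println(fixed.getBytes(StandardCharsets.UTF_8).length); // 69
    }
}
```

At 69 UTF-8 bytes the unescaped name is well under the 100 reported as the column maximum in the error message.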