How to make a substring for a UTF-8 string in Java?

Question

Suppose I have the following string: "Rückruf ins Ausland". I need to insert it into a database column with a maximum size of 10. I did a normal substring in Java and got "Rückruf in", which is 10 characters long. When I try to insert it into this column, I get the following Oracle error:

    java.sql.SQLException: ORA-12899: value is too large for column "WAEL"."TESTTBL"."DESC" (actual: 11, maximum: 10)

The reason for this is that the database uses the AL32UTF8 character set, so ü takes 2 bytes.

I need to write a function in Java that performs this substring while taking into account that ü takes 2 bytes, so the returned substring in this case should be "Rückruf i" (9 characters). Any suggestions?

+6

java substring oracle

Wael Jul 16 '15 at 13:32

7 answers

You can calculate the encoded length of a String in Java by converting the string to an array of bytes.

See the following code for an example:

    System.out.println("Rückruf i".length());           // prints 9
    System.out.println("Rückruf i".getBytes().length);  // prints 10

If the default encoding is not UTF-8, replace the code with:

    System.out.println("Rückruf i".length());                   // prints 9
    System.out.println("Rückruf i".getBytes("UTF-8").length);   // prints 10

If necessary, replace UTF-8 with whichever encoding you want to measure the string's length in.
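
To illustrate why the character count and the byte count diverge, here is a minimal sketch that measures a few sample strings. The class name `Utf8Widths` and the helper `utf8Length` are hypothetical names chosen for this example, and the non-ASCII samples are written as Unicode escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Widths {
    /** Number of bytes the given string occupies when encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "\u00FC" is ü, "\u20AC" is the euro sign, and "\uD83D\uDE00" is an
        // emoji that Java stores as a surrogate pair (two chars).
        String[] samples = {"u", "\u00FC", "\u20AC", "\uD83D\uDE00"};
        for (String sample : samples) {
            System.out.println(sample.length() + " char(s), "
                    + utf8Length(sample) + " byte(s)");
        }
    }
}
```

A single `char` can therefore take anywhere from 1 to 3 bytes in UTF-8, and a surrogate pair takes 4, so any byte-limited truncation has to measure the encoded form rather than rely on `length()`.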

+2

Davide Lorenzo MARINO Jul 16 '15 at 13:39

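If it has to be done in Java, you can convert the string to bytes and truncate the array.

    String s = "Rückruf ins Ausland";
    byte[] bytes = s.getBytes("UTF-8");
    byte[] bytes2 = new byte[10];
    // Note: a cut at a fixed byte offset can land in the middle of a multibyte
    // character; it happens to fall on a boundary for this particular string.
    System.arraycopy(bytes, 0, bytes2, 0, 10);
    String trim = new String(bytes2, "UTF-8");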
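
The byte-array approach above cuts at a fixed offset, which can split a multibyte character. One possible refinement, sketched here under the assumption that the target encoding is UTF-8, is to step back from the cut until it sits on a character boundary. The class and method names (`Utf8ByteTruncate`, `truncate`) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class Utf8ByteTruncate {
    /** Truncates s to at most maxBytes UTF-8 bytes without splitting a character. */
    static String truncate(String s, int maxBytes) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= maxBytes) {
            return s; // already fits
        }
        int end = Math.max(maxBytes, 0);
        // UTF-8 continuation bytes have the bit pattern 10xxxxxx. If the first
        // dropped byte is one of them, the cut is inside a character, so move
        // the cut back until it sits just before a lead byte.
        while (end > 0 && (bytes[end] & 0xC0) == 0x80) {
            end--;
        }
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u00FC" is ü; the escape avoids depending on the source file encoding.
        System.out.println(truncate("R\u00FCckruf ins Ausland", 10));
    }
}
```

This relies on the self-synchronizing layout of UTF-8 and does not carry over to other multibyte encodings.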

+2

Domk Jul 16 '15 at 13:57

I think the best option in this case would be to take the substring at the database level, with the Oracle SUBSTR function directly in the SQL query.

For instance:

 INSERT INTO ttable (colname) VALUES (SUBSTR( ?, 1, 10 ))

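Where the question mark indicates the SQL parameter sent through JDBC. Note that SUBSTR counts in characters; for a column whose limit is in bytes, Oracle's SUBSTRB, which counts in bytes, may be the closer fit.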

+1

aleroot Jul 16 '15 at 13:39

The following walks, rather clumsily, through the entire string by full Unicode code point, so it also handles char pairs (surrogates).

    import java.nio.charset.StandardCharsets;

    public String trim(String s, int length) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= length) {
            return s;
        }
        int totalByteCount = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            int n = Character.charCount(cp); // 1 char, or 2 for a surrogate pair
            int byteCount = s.substring(i, i + n)
                    .getBytes(StandardCharsets.UTF_8).length;
            if (totalByteCount + byteCount > length) {
                break;
            }
            totalByteCount += byteCount;
            i += n;
        }
        // Decode explicitly as UTF-8 rather than with the platform default charset.
        return new String(bytes, 0, totalByteCount, StandardCharsets.UTF_8);
    }

It can still be optimized.
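
One possible optimization, sketched here as an assumption rather than as the answer's intended follow-up, is to derive each code point's UTF-8 width arithmetically instead of encoding a substring per iteration. The names `Utf8CodePointTrim` and `utf8Bytes` are hypothetical.

```java
final class Utf8CodePointTrim {
    /** UTF-8 length in bytes of a single Unicode code point. */
    static int utf8Bytes(int codePoint) {
        if (codePoint < 0x80) return 1;     // ASCII
        if (codePoint < 0x800) return 2;    // e.g. ü
        if (codePoint < 0x10000) return 3;  // rest of the Basic Multilingual Plane
        return 4;                           // supplementary planes (surrogate pairs)
    }

    /** Longest prefix of s whose UTF-8 encoding fits in maxBytes. */
    static String trim(String s, int maxBytes) {
        int byteCount = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            int n = utf8Bytes(cp);
            if (byteCount + n > maxBytes) {
                break; // the next code point would not fit
            }
            byteCount += n;
            i += Character.charCount(cp); // advance 1 char, or 2 for a pair
        }
        return s.substring(0, i);
    }

    public static void main(String[] args) {
        // "\u00FC" is ü; the escape avoids depending on the source file encoding.
        System.out.println(trim("R\u00FCckruf ins Ausland", 10));
    }
}
```

This version allocates no intermediate byte arrays. An unpaired surrogate is counted as 3 bytes here, whereas `getBytes` would replace it with a single `?`, so the sketch can only overestimate the size and never exceeds the limit.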

+1

Joop eggen Jul 16 '15 at 14:09

You need the encoding in the database to match the encoding used for Java strings. Alternatively, you can convert the string with something like this and measure its length in the database's encoding, which gives you the exact number of bytes. Otherwise, you are just hoping that the encodings will match.

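    String string = "Rückruf ins Ausland";
    int maxBytes = 10;
    int curByteCount = 0;
    String nextChar;
    // Add one char at a time while the running byte count still fits.
    // Assumes the string really is longer than maxBytes bytes; otherwise
    // index would run past the end of the string.
    for (int index = 0;
         curByteCount + (nextChar = string.substring(index, index + 1)).getBytes("UTF-8").length <= maxBytes;
         index++) {
        curByteCount += nextChar.getBytes("UTF-8").length;
    }
    // Size the array to the bytes actually kept so no zero bytes are decoded.
    byte[] subStringBytes = new byte[curByteCount];
    System.arraycopy(string.getBytes("UTF-8"), 0, subStringBytes, 0, curByteCount);
    String trimmed = new String(subStringBytes, "UTF-8");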

That should do it. In addition, it shouldn't truncate a multibyte character in the process. The database is assumed to be UTF-8 encoded. Another assumption is that the string actually needs to be truncated.

0

Carlos Bribiescas Jul 16 '15 at 13:42

Hi, all ASCII characters have a code below 128. You can use the code below.

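    public class Test {
        public static void main(String[] args) {
            String s = "Rückruf ins Ausland";
            int length = 10;
            // Shorten the limit by one for every non-ASCII char. This assumes each
            // such char takes exactly 2 bytes and counts them over the whole
            // string, not just over the part that is kept.
            for (int i = 0; i < s.length(); i++) {
                if (!(((int) s.charAt(i)) < 128)) {
                    length--;
                }
            }
            System.out.println(s.substring(0, length));
        }
    }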

You can copy and paste it and check whether it meets your requirements or whether something breaks somewhere.

0

Kulbhushan singh Jul 16 '15 at 13:49

Giovanni · Accepted Answer · 2015-07-16T13:46:55+0000

If you want to trim the data in Java, you should write a function that truncates the string according to the encoding the database uses, something like this test case:

    package test;

    import java.io.UnsupportedEncodingException;

    public class TrimField {

        public static void main(String[] args) {
            // UTF-8 is the db charset
            System.out.println(trim("Rückruf ins Ausland", 10, "UTF-8"));
            System.out.println(trim("Rüückruf ins Ausland", 10, "UTF-8"));
        }

        public static String trim(String value, int numBytes, String charset) {
            do {
                byte[] valueInBytes = null;
                try {
                    valueInBytes = value.getBytes(charset);
                } catch (UnsupportedEncodingException e) {
                    throw new RuntimeException(e.getMessage(), e);
                }
                if (valueInBytes.length > numBytes) {
                    value = value.substring(0, value.length() - 1);
                } else {
                    return value;
                }
            } while (value.length() > 0);
            return "";
        }
    }
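
The function above re-encodes the whole string once per removed character. As an alternative sketch (not the answer's method), a `CharsetEncoder` can encode into a `ByteBuffer` of exactly `numBytes` bytes and stop when the next character no longer fits; the number of chars consumed then gives the cut point. The class name `EncoderTrim` is hypothetical, and the sketch assumes a stateless charset such as UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncoderTrim {
    /** Longest prefix of value whose encoding in charset fits in numBytes. */
    static String trim(String value, int numBytes, Charset charset) {
        if (numBytes <= 0) {
            return "";
        }
        // REPLACE mirrors what String.getBytes does with unencodable input.
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer in = CharBuffer.wrap(value);
        ByteBuffer out = ByteBuffer.allocate(numBytes);
        // Encoding stops with an overflow result once the next character does
        // not fit in the remaining space; the input position then marks how
        // many chars were fully encoded.
        encoder.encode(in, out, true);
        return value.substring(0, in.position());
    }

    public static void main(String[] args) {
        // "\u00FC" is ü; the escapes avoid depending on the source file encoding.
        System.out.println(trim("R\u00FCckruf ins Ausland", 10, StandardCharsets.UTF_8));
        System.out.println(trim("R\u00FC\u00FCckruf ins Ausland", 10, StandardCharsets.UTF_8));
    }
}
```

The string is encoded at most once, and with UTF-8 a surrogate pair is either written whole or not at all, so the result does not end in half a character.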
