How to find the language from the encoding in Java

I have a component that should be able to parse and process any XML file specified by the user. The XML file may contain timestamp values, such as March 12, 2012 5:00 pm, so the user must provide a timestamp pattern accepted by SimpleDateFormat. We use the pattern with SimpleDateFormat to parse the timestamp values as follows:

 SimpleDateFormat sdt = new SimpleDateFormat(inputTimestampPattern);
 Date date = sdt.parse(inputTimestampString);

But we get a ParseException, as shown below for one specific file.

java.text.ParseException: Unparseable date: "04-6\u57d6-12 18.54:57.169000 \u548c\u601c"

We got this exception when we ran the component on a Japanese system with an input file containing Chinese timestamps. The JVM locale is Japanese, so SimpleDateFormat tries to parse the timestamp string assuming Japanese, and fails. The XML file has encoding information similar to this:

  <?xml version="1.0" encoding="gbk"?>
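To illustrate the locale dependence described above, here is a minimal sketch. The class name LocaleParseDemo and the helper parses are made up for this illustration; the English sample timestamp stands in for our real data. It only shows that the same pattern and input can succeed or fail depending on the Locale passed to SimpleDateFormat, and that the one-argument constructor falls back to the JVM default.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

class LocaleParseDemo {
    // Hypothetical helper: true if the text parses with the pattern under the given locale.
    static boolean parses(String pattern, String text, Locale locale) {
        try {
            // The two-argument constructor uses the given locale's month and am/pm names
            // instead of those of the JVM default locale.
            new SimpleDateFormat(pattern, locale).parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String pattern = "MMMM d, yyyy";
        String text = "March 12, 2012";
        // English month names match the input.
        System.out.println("ENGLISH:  " + parses(pattern, text, Locale.ENGLISH));
        // Japanese month names do not include "March", so this is expected to fail.
        System.out.println("JAPANESE: " + parses(pattern, text, Locale.JAPANESE));
    }
}
```

In our case the roles are reversed: the text uses Chinese names and the default locale is Japanese, but the mechanism is the same.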

If we could somehow determine the Locale from the encoding value, we could create a locale-specific SimpleDateFormat object, which would fix this problem. So my question is: can we get language information from the encoding? I am not asking for the exact language. Even if there is only a way to get a small set of possible locales for a given encoding, I can try each of them until one parses without throwing an exception. Is there an API in Java that helps here?
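A minimal sketch of the fallback I have in mind, assuming a hand-maintained table from encoding names to candidate locales. As far as I know the JDK offers no such lookup, so the table contents, the class name EncodingLocaleGuess and both method names are my own assumptions, not an existing API. Encodings such as UTF-8 carry no language hint at all, so unknown names simply fall back to the default locale.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Collections;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class EncodingLocaleGuess {
    // Hypothetical, hand-written mapping; extend it for the encodings you expect to see.
    private static final Map<String, List<Locale>> CANDIDATES = new HashMap<>();
    static {
        CANDIDATES.put("GBK", Arrays.asList(Locale.SIMPLIFIED_CHINESE));
        CANDIDATES.put("GB2312", Arrays.asList(Locale.SIMPLIFIED_CHINESE));
        CANDIDATES.put("BIG5", Arrays.asList(Locale.TRADITIONAL_CHINESE));
        CANDIDATES.put("SHIFT_JIS", Arrays.asList(Locale.JAPANESE));
        CANDIDATES.put("EUC-JP", Arrays.asList(Locale.JAPANESE));
        CANDIDATES.put("EUC-KR", Arrays.asList(Locale.KOREAN));
    }

    // Returns the candidate locales for an XML encoding name (case-insensitive),
    // or just the JVM default locale when the encoding is not in the table.
    static List<Locale> candidateLocales(String xmlEncoding) {
        List<Locale> found = CANDIDATES.get(xmlEncoding.toUpperCase(Locale.ROOT));
        return found != null ? found : Collections.singletonList(Locale.getDefault());
    }

    // Tries each candidate locale in order and returns the first successful parse,
    // or null if every candidate throws a ParseException.
    static Date parseWithCandidates(String pattern, String text, String xmlEncoding) {
        for (Locale locale : candidateLocales(xmlEncoding)) {
            try {
                return new SimpleDateFormat(pattern, locale).parse(text);
            } catch (ParseException e) {
                // This locale does not fit; try the next candidate.
            }
        }
        return null;
    }
}
```

With the file above, the call would look like parseWithCandidates(inputTimestampPattern, inputTimestampString, "gbk"). The obvious weakness is that an encoding only hints at a language, so a wrong guess can still parse successfully and yield the wrong date.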

Or is there a better way to solve this problem?
