java.net.URI chokes on special characters in the host part

I have a URI string as shown below:

http://www.christlichepartei%F6sterreichs.at/steiermark/

I create an instance of java.net.URI from this string, and that succeeds, but when I ask for the host, it returns null. Opera and Firefox also choke on this URL if I enter it exactly as shown above. Shouldn't the URI class throw a URISyntaxException if the URI is not valid? How else can I detect that a URI is illegal?

It behaves the same way when I first decode the string with URLDecoder, which gives

http://www.christlicheparteiösterreichs.at/steiermark/

Now it is accepted by Opera and Firefox, but java.net.URI still does not like it. How can I handle this URL?

thanks

3 answers

Java 6 has an IDN class for working with internationalized domain names. This creates a URI with an encoded host name:

 URI u = new URI("http://" + IDN.toASCII("www.christlicheparteiösterreichs.at") + "/steiermark/"); 

The correct way to encode non-ASCII characters in host names is called "Punycode".
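As a minimal sketch of what that encoding looks like in practice, the following uses java.net.IDN (available since Java 6) to convert the host name from the question to its ASCII form and back. The class name PunycodeDemo and the helper names toAsciiHost and toUnicodeHost are made up for this illustration; the ö is written as \u00f6 so the example does not depend on the source file encoding.

```java
import java.net.IDN;

public class PunycodeDemo {
    // Hypothetical helper: converts a Unicode host name to its ASCII (Punycode) form.
    static String toAsciiHost(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    // Hypothetical helper: converts an ASCII (Punycode) host name back to Unicode.
    static String toUnicodeHost(String asciiHost) {
        return IDN.toUnicode(asciiHost);
    }

    public static void main(String[] args) {
        String unicodeHost = "www.christlichepartei\u00f6sterreichs.at";
        String asciiHost = toAsciiHost(unicodeHost);
        // Only the label containing the non-ASCII character gets the xn-- prefix.
        System.out.println(asciiHost);
        // Converting back should give the original Unicode host name.
        System.out.println(toUnicodeHost(asciiHost));
    }
}
```

Only the host part should be passed to IDN; the path and query of a URL follow different escaping rules (percent-encoding).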


The URI class does throw a URISyntaxException if you use the appropriate constructor:

 URI someUri = new URI("http", "www.christlicheparteiösterreichs.at", "/steiermark", null);

java.net.URISyntaxException: Illegal character in hostname at index 28: http://www.christlicheparteiösterreichs.at/steiermark
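If the URI has already been built from a single string, a similar check may be possible with URI.parseServerAuthority(), which re-parses the authority as a server-based one and throws a URISyntaxException if that fails. The sketch below is one way to use it to detect the null-host case from the question; the class name HostCheck and the helper hasUsableHost are hypothetical names chosen for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class HostCheck {
    // Hypothetical helper: returns true only if the string parses as a URI
    // whose authority is server-based and yields a non-null host.
    static boolean hasUsableHost(String uriString) {
        try {
            // parseServerAuthority() throws if the authority cannot be
            // parsed into user-info, host and port components.
            URI uri = new URI(uriString).parseServerAuthority();
            return uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Non-ASCII host: the single-argument constructor accepts it, but no host is available.
        System.out.println(hasUsableHost("http://www.christlichepartei\u00f6sterreichs.at/steiermark/"));
        // Punycode host: parses as a normal server-based authority.
        System.out.println(hasUsableHost("http://www.xn--christlicheparteisterreichs-5yc.at/steiermark/"));
    }
}
```

Note that this helper also returns false for URIs that legitimately have no host at all (for example mailto: URIs), so it is only meant for inputs that are expected to be http-style URLs.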

You can use IDN to fix this:

 URI someUri = new URI("http", IDN.toASCII("www.christlicheparteiösterreichs.at"), "/steiermark", null);
 System.out.println(someUri);
 System.out.println("host: " + someUri.getHost());

Output:

http://www.xn--christlicheparteisterreichs-5yc.at/steiermark

host: www.xn--christlicheparteisterreichs-5yc.at

UPDATE regarding the chicken-and-egg problem:

You can let the URL class do the parsing:

 public static URI createSafeURI(final URL someURL) throws URISyntaxException {
     return new URI(someURL.getProtocol(), someURL.getUserInfo(), IDN.toASCII(someURL.getHost()),
             someURL.getPort(), someURL.getPath(), someURL.getQuery(), someURL.getRef());
 }

 URI raoul = createSafeURI(new URL("http://www.christlicheparteiösterreichs.at/steiermark/readme.html#important"));

This is just a quick sketch; it does not handle all the problems associated with converting a URL to a URI. Use it as a starting point.
