Parsing a URI host in Java when it contains umlauts

I am trying to parse a host from a URI containing the character "ü" in the host, like this:

String host = new java.net.URI("http://füllmethodentafel.de").getHost();

However, getHost() returns null. It works with other URIs. Any ideas why this is not working?

1 answer

java.net.URI can only handle URLs conforming to RFC 2396. This RFC defines the host with the following rules:

  hostport      = host [ ":" port ]
  host          = hostname | IPv4address
  hostname      = *( domainlabel "." ) toplabel [ "." ]
  domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
  toplabel      = alpha | alpha *( alphanum | "-" ) alphanum

where alphanum is basically [a-zA-Z0-9]. Characters such as ü are not included.
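To make the grammar concrete, here is a minimal sketch that translates the three hostname rules into a regular expression. The class and method names (Rfc2396Hostname, isValid) are invented for illustration, the IPv4address alternative is ignored, and this is not meant to describe how java.net.URI is implemented internally.

```java
import java.util.regex.Pattern;

// Hypothetical helper: mirrors the RFC 2396 hostname grammar quoted above.
class Rfc2396Hostname {
    // domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
    private static final String DOMAIN_LABEL = "[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?";
    // toplabel = alpha | alpha *( alphanum | "-" ) alphanum
    private static final String TOP_LABEL = "[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?";
    // hostname = *( domainlabel "." ) toplabel [ "." ]
    private static final Pattern HOSTNAME =
            Pattern.compile("(?:" + DOMAIN_LABEL + "\\.)*" + TOP_LABEL + "\\.?");

    // Returns true if the whole string matches the hostname rule.
    static boolean isValid(String host) {
        return HOSTNAME.matcher(host).matches();
    }

    public static void main(String[] args) {
        // The u-umlaut is written as a Unicode escape so the source file encoding does not matter.
        System.out.println(isValid("www.example.com"));            // true
        System.out.println(isValid("f\u00fcllmethodentafel.de"));  // false
        System.out.println(isValid("www.xn--hostwith-e6a.com"));   // true
    }
}
```

The Punycode form passes because it consists only of letters, digits and hyphens, which is why it fits the grammar while the umlaut form does not.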

URI can handle Punycode URLs, for example http://www.xn--hostwith-e6a.com/, which is equivalent to http://www.hostwithü.com/. java.net.IDN is useful for this conversion.

import java.net.IDN;

String host = "www.hostwithü.com";
String toASCII = IDN.toASCII(host); // converts each non-ASCII label to Punycode
System.out.println(toASCII);
// prints: www.xn--hostwith-e6a.com
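
Putting the two pieces together, a minimal sketch might convert the host with IDN.toASCII first and only then hand the result to URI. The class and method names (IdnHostExample, parseHost) are made up for illustration, and the sketch assumes the Unicode host name is already available as a separate string and that the scheme is http.

```java
import java.net.IDN;
import java.net.URI;

// Hypothetical example class; not part of the original answer.
class IdnHostExample {
    // Converts a Unicode host name to its ASCII (Punycode) form and lets URI parse it.
    static String parseHost(String unicodeHost) {
        String asciiHost = IDN.toASCII(unicodeHost);
        URI uri = URI.create("http://" + asciiHost + "/");
        return uri.getHost();
    }

    public static void main(String[] args) {
        // The u-umlaut is written as a Unicode escape so the source file encoding does not matter.
        String host = parseHost("www.hostwith\u00fc.com");
        System.out.println(host);                 // www.xn--hostwith-e6a.com
        System.out.println(IDN.toUnicode(host));  // converts back to the Unicode form
    }
}
```

IDN.toUnicode reverses the conversion, so the readable form can still be shown to users while URI only ever sees the ASCII host.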