java.net.URI and java.net.URL do not work for many modern URLs. java.net.URI adheres to RFC 2396, a really old standard that was obsoleted by RFC 3986 in 2005. java.net.URL sometimes does a good job, but if you work with URLs found in the wild, it will fail in many cases.
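Here is a minimal, stdlib-only sketch of the kind of failure I mean. The sample strings and the `StrictUriDemo` class are hypothetical examples chosen for illustration: browsers happily load URLs like these, but java.net.URI rejects an unescaped `|` in the query or a space in the path. java.net.URL accepts the space because it barely validates, yet the problem resurfaces as soon as you call `toURI()`.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class StrictUriDemo {

    // Returns true if java.net.URI parses the string, false if it throws URISyntaxException.
    static boolean acceptedByUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        // Hypothetical sample inputs, not taken from any real dataset.
        String[] inputs = {
            "http://example.com/search?q=a|b", // '|' is not a legal query character for java.net.URI
            "http://example.com/my file.html", // unescaped space in the path
            "http://example.com/ok?q=1"        // well-formed
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + (acceptedByUri(s) ? "accepted" : "rejected"));
        }

        // java.net.URL does little validation, so this constructor succeeds...
        URL url = new URL("http://example.com/my file.html");
        // ...but converting it to a java.net.URI exposes the same problem.
        try {
            url.toURI();
        } catch (URISyntaxException e) {
            System.out.println("toURI failed: " + e.getMessage());
        }
    }
}
```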
To solve these problems, I wrote galimatias, a library for parsing and normalizing URLs in Java. It should work with almost any URL you can imagine (basically, if it works in a web browser, galimatias will parse it correctly), and it has a very convenient API.
You can get it at: https://github.com/smola/galimatias
smola