How to decompose a URL into its component parts in Java?

My requirements are pretty simple, but I have a lot of URLs to process, so I'm looking for a reliable solution.

Is there a good lightweight library for decomposing URLs into their component parts in Java? I mean hostname, query string, etc.

+4
5 answers

Take a look at java.net.URL. It has methods for exactly what you are trying to do.

Host name: getHost()
Query string: getQuery()
Fragment / ref / anchor: getRef()
Path: getPath()
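
As a rough sketch of how those getters fit together (the class name `UrlParts`, the method `describe` and the example URL are made up for illustration), something like the following should work. It builds the `URL` via `URI.create(...).toURL()` because the `URL(String)` constructor is deprecated in recent JDK releases; on older JDKs `new URL(spec)` would do the same job.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

final class UrlParts {
    /** Returns the main components of an absolute URL, one per line. */
    static String describe(String spec) {
        URL url;
        try {
            // URI.create validates the syntax; toURL() checks that a handler
            // exists for the scheme (http, https, file, ...).
            url = URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Unsupported URL: " + spec, e);
        }
        return String.join("\n",
                "protocol=" + url.getProtocol(),
                "host=" + url.getHost(),
                "port=" + url.getPort(),
                "path=" + url.getPath(),
                "query=" + url.getQuery(),
                "ref=" + url.getRef());
    }

    public static void main(String[] args) {
        System.out.println(describe("http://example.com:8080/docs/index.html?lang=en#top"));
    }
}
```

For the example URL this prints `http`, `example.com`, `8080`, `/docs/index.html`, `lang=en` and `top` for the six components.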

+3

I always forget the URI format, so here it is:

 <scheme>://<userinfo>@<host>:<port><path>?<query>#<fragment> 

And here is an example:

 URI uri = new URI("query://jeff@books.com:9000/public/manuals/appliances?stove#ge"); 

The getters then return the following:

  • uri.getAuthority() will return "jeff@books.com:9000"
  • uri.getFragment() will return "ge"
  • uri.getHost() will return "books.com"
  • uri.getPath() will return "/public/manuals/appliances"
  • uri.getPort() will return 9000
  • uri.getQuery() will return "stove"
  • uri.getScheme() will return "query"
  • uri.getSchemeSpecificPart() will return "//jeff@books.com:9000/public/manuals/appliances?stove"
  • uri.getUserInfo() will return "jeff"
  • uri.isAbsolute() will return true
  • uri.isOpaque() will return false

I found this blog handy: Exploring Java APIs: URIs and URLs

+5

java.net.URI and java.net.URL do not work for many modern URLs. java.net.URI adheres to RFC 2396, a really old standard that has since been superseded by RFC 3986. java.net.URL sometimes does a good job, but if you work with URLs found in the wild, it will fail in many cases.
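
To make the strictness concrete, here is a minimal sketch (the class `StrictUriCheck` and the helper `tryParse` are hypothetical names, and the URLs are invented) showing that java.net.URI rejects an unencoded space, which browsers tolerate:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

final class StrictUriCheck {
    /** Returns the parsed URI, or empty if java.net.URI rejects the input. */
    static Optional<URI> tryParse(String input) {
        try {
            return Optional.of(new URI(input));
        } catch (URISyntaxException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A percent-encoded space is accepted.
        System.out.println(tryParse("http://example.com/a%20b").isPresent()); // true
        // A raw space, common in URLs found in the wild, is rejected.
        System.out.println(tryParse("http://example.com/a b").isPresent());   // false
    }
}
```

Returning an `Optional` here is just one way to handle the failure; when processing many URLs you may prefer to log the `URISyntaxException` message, which includes the index of the offending character.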

To solve these problems, I wrote galimatias, a library for parsing and normalizing URLs in Java. It will work with almost any URL that you can imagine (basically, if it works in a web browser, galimatias will parse it correctly). And it has a very convenient API.

You can get it at: https://github.com/smola/galimatias

+1

Look at the getter methods of the class.

You have everything you need.

0
 URL.getProtocol()
 URL.getHost()
 URL.getPort()

And so on.
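
One thing to watch with getPort(): it returns -1 when the URL does not spell out a port. A small sketch of how you might handle that (the class `PortDemo` and the helper `effectivePort` are illustrative names, not part of the JDK) could fall back to getDefaultPort():

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

final class PortDemo {
    /** Parses an absolute URL, wrapping the checked exception. */
    static URL parse(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Unsupported URL: " + spec, e);
        }
    }

    /** Returns the explicit port if present, otherwise the protocol's default. */
    static int effectivePort(String spec) {
        URL url = parse(spec);
        int port = url.getPort(); // -1 when no port appears in the URL
        return port != -1 ? port : url.getDefaultPort();
    }

    public static void main(String[] args) {
        URL url = parse("https://example.com/index.html");
        System.out.println(url.getProtocol());  // https
        System.out.println(url.getHost());      // example.com
        System.out.println(url.getPort());      // -1
        System.out.println(effectivePort("https://example.com/index.html")); // 443
    }
}
```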

0
