How to parse URIs like this in Java

I am trying to parse the following URI: http://translate.google.com/#zh-CN|en|你

but received the following error message:

    java.net.URISyntaxException: Illegal character in fragment at index 34: http://translate.google.com/#zh-CN|en|你
        at java.net.URI$Parser.fail(URI.java:2809)
        at java.net.URI$Parser.checkChars(URI.java:2982)
        at java.net.URI$Parser.parse(URI.java:3028)

The parser has a problem with the "|" character. If I remove the "|", the trailing Chinese character does not cause any problems. What is the correct way to handle this?

My method is as follows:

    public static void displayFileOrUrlInBrowser(String File_Or_Url) {
        try {
            Desktop.getDesktop().browse(new URI(File_Or_Url.replace(" ", "%20").replace("^", "%5E")));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

Thanks for the answers, but BalusC's solution seems to work only for this particular URL. My method should work with any URL I pass to it. How would it know where to split the URL into two parts so that only the second part gets encoded?

+7
java uri parsing
7 answers

The pipe symbol is considered unsafe for use in URLs. You can fix this by replacing | with its percent-encoded equivalent, "%7C".

However, replacing individual characters in a URL is a fragile solution, because any given URL can contain many different characters that need to be replaced. You are already replacing spaces, carets, and pipes ... but what about brackets, accented letters, and quotation marks? Or question marks and ampersands, which may or may not be structural parts of the URL, depending on how they are used?

Thus, the best solution is to use a facility of the language to encode URLs rather than doing it by hand. For Java, use URLEncoder, as shown in the example in BalusC's answer to this question.
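One caveat worth illustrating: URLEncoder is designed for individual components (form values), not for complete URLs. The following is a minimal sketch, assuming the URL from the question; the class name UrlEncoderScope and the helper encodeComponent are made up for illustration. It shows what happens when the whole URL is encoded versus only the variable part.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UrlEncoderScope {

    // Hypothetical helper: percent-encodes one component as UTF-8
    // (form-style encoding, so a space becomes "+").
    static String encodeComponent(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String base = "http://translate.google.com/#";
        String fragment = "zh-CN|en|\u4f60"; // \u4f60 is the Chinese character from the question

        // Encoding the whole URL also escapes ":", "/" and "#",
        // so the result is no longer usable as a URL.
        System.out.println(encodeComponent(base + fragment));
        // prints http%3A%2F%2Ftranslate.google.com%2F%23zh-CN%7Cen%7C%E4%BD%A0

        // Encoding only the variable part keeps the URL structure intact.
        System.out.println(base + encodeComponent(fragment));
        // prints http://translate.google.com/#zh-CN%7Cen%7C%E4%BD%A0
    }
}
```

So URLEncoder only helps when the caller already knows which part of the string is data and which part is URL structure.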

+13

The URLEncoder solution does not work for me, perhaps because it encodes everything, including the characters that give the URL its structure. I tried to use Apache HttpGet, and it throws an error when given a URL string encoded this way.

In my case, this strange code was correct:

    URL url = new URL(pageURLAsUnescapedString);
    URI uri = new URI(url.getProtocol(), url.getAuthority(), url.getPath(), url.getQuery(), url.getRef());

Somehow url.toURI() does not behave like this. The URI constructors work in two ways: the one that takes a single String assumes the supplied URI is already correctly escaped (hence the error; the same thing happens with HttpGet's String constructor), whereas the multi-argument constructors escape all the illegal characters for you (and HttpGet has another constructor that accepts a URI). Why doesn't URL.toURI() do this? I have no clue ...
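The difference between the two kinds of constructor can be seen in a small sketch. It assumes the URL from the question; the class name UriConstructorDemo and the helper toEscapedUri are hypothetical names, not part of any library.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class UriConstructorDemo {

    // Hypothetical helper: splits an unescaped URL into components and lets the
    // multi-argument URI constructor quote the characters that are illegal in each one.
    static URI toEscapedUri(String unescapedUrl) {
        try {
            URL url = new URL(unescapedUrl);
            return new URI(url.getProtocol(), url.getAuthority(),
                    url.getPath(), url.getQuery(), url.getRef());
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot convert: " + unescapedUrl, e);
        }
    }

    public static void main(String[] args) {
        String raw = "http://translate.google.com/#zh-CN|en|\u4f60";

        try {
            new URI(raw); // single-argument constructor: input must already be escaped
        } catch (URISyntaxException e) {
            System.out.println("new URI(String) rejected it: " + e.getReason());
        }

        URI uri = toEscapedUri(raw);
        // toASCIIString() additionally percent-encodes non-ASCII characters as UTF-8.
        System.out.println(uri.toASCIIString());
        // prints http://translate.google.com/#zh-CN%7Cen%7C%E4%BD%A0
    }
}
```

Note that this approach assumes the input is not escaped at all: the multi-argument constructors always quote a literal "%", so input that already contains escapes would be escaped a second time.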

Hope this helps someone, it took me a few hours to figure this out.

+10

Isn't it better to use URLEncoder than to selectively encode individual characters?

+7

You should use java.net.URLEncoder to URL-encode the query using UTF-8. You do not need a regular expression for this. You don't want a regex to cover all those thousands of Chinese characters, right? ;)

    String query = URLEncoder.encode("zh-CN|en|你", "UTF-8");
    String url = "http://translate.google.com/#" + query;
    Desktop.getDesktop().browse(new URI(url));
+6

Given Federico's answer and Marek's answer, you need to do the following:

    URL url = new URL(pageURLAsUnescapedString);

    // The URI constructor expects the path, query string and fragment to be decoded.
    // If we do not decode them, we will end up with double-encoding.
    String path = url.getPath();
    if (path != null) path = URLDecoder.decode(path, "UTF-8");
    String query = url.getQuery();
    if (query != null) query = URLDecoder.decode(query, "UTF-8");
    String fragment = url.getRef();
    if (fragment != null) fragment = URLDecoder.decode(fragment, "UTF-8");

    URI uri = new URI(url.getProtocol(), url.getAuthority(), path, query, fragment);
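The double-encoding mentioned in the comment can be reproduced with a short sketch. The host example.com, the class name DoubleEncodingDemo and the helper build are placeholders chosen for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class DoubleEncodingDemo {

    // Hypothetical helper: builds a URI string from a path using the
    // five-argument constructor (scheme, authority, path, query, fragment).
    static String build(String path) {
        try {
            return new URI("http", "example.com", path, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Already-escaped input: the "%" itself is quoted again as "%25".
        System.out.println(build("/a%20b")); // prints http://example.com/a%2520b

        // Decoded input: the space is quoted exactly once.
        System.out.println(build("/a b"));   // prints http://example.com/a%20b
    }
}
```

One thing to keep in mind with the decode-first approach: URLDecoder follows form-encoding rules, so it also turns a literal "+" into a space, which may not be what you want in a path or fragment.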
+3

Encode your URL first, as in the following example, then pass the encoded string to the method:

    JSONObject json = new JSONObject();
    json.put("name", "vaquar");
    json.put("age", "30");
    json.put("address", "asasbsa bajsb ");
    System.out.println("in sslRestClientGETRankColl" + json.toString());

    String createdJson = json.toString();
    createdJson = URLEncoder.encode(createdJson, "UTF-8");

    // now call the method
    displayFileOrUrlInBrowser(createdJson);

    public static void displayFileOrUrlInBrowser(String File_Or_Url) {
        try {
            // Desktop.browse() takes a java.net.URI, not a String
            Desktop.getDesktop().browse(new URI(File_Or_Url));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
0

Ok, I found how to do this, for example:

    try {
        Desktop.getDesktop().browse(new URI(
                File_Or_Url.replace(" ", "%20").replace("^", "%5E").replace("|", "%7C")));
    } catch (Exception e) {
        e.printStackTrace();
    }
-1
