Jsoup connect does not work correctly when the link has Turkish letters

I am using Jsoup to fetch HTML from websites, with code like this:

    String url = "http://www.example.com";
    Document doc = Jsoup.connect(url).get();

This works, but when the link contains Turkish letters, for example:

    String url = "http://www.example.com/?q=Türkçe";
    Document doc = Jsoup.connect(url).get();

Jsoup sends the request as follows: "http://www.example.com/?q=Trke"

So, I cannot get the correct result. How can I solve this problem?

3 answers

A working solution, if the encoding is UTF-8, is to pass the query parameter through data():

    Document document = Jsoup.connect("http://www.example.com")
            .data("q", "Türkçe")
            .get();

with the result

 URL=http://www.example.com?q=T%C3%BCrk%C3%A7e 
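
The percent-encoded value in that URL can be reproduced with the standard library alone, which is handy for checking what the server should receive. This is a minimal sketch, not Jsoup's own code: the class name `QueryEncodingDemo` and the helper `encodeUtf8` are hypothetical, and the string literal uses Unicode escapes so the result does not depend on the source file encoding. It assumes Java 10 or later for the `Charset` overload of `URLEncoder.encode`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class QueryEncodingDemo {
    // Hypothetical helper: percent-encodes a single query value as UTF-8.
    static String encodeUtf8(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "T\u00fcrk\u00e7e" is "Türkçe" written with Unicode escapes.
        String encoded = encodeUtf8("T\u00fcrk\u00e7e");
        System.out.println("http://www.example.com?q=" + encoded);
    }
}
```

Each non-ASCII letter becomes two `%XX` groups because UTF-8 stores ü and ç in two bytes each.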

For a custom encoding, encode the query value yourself and append it to the URL:

    // Percent-encode only the query value, using the charset the server expects
    String query = URLEncoder.encode("Türkçe", "ISO-8859-3");
    // Append it to the URL directly; passing it to .data() would encode the '%' signs a second time
    Document doc = Jsoup.connect("http://www.example.com/?q=" + query).get();
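
To see why the charset matters, the same value can be encoded under two charsets and compared. This is an illustrative sketch using only `java.net.URLEncoder`; the class name `CharsetComparison` and the helper `encode` are hypothetical, and it assumes the running JDK ships the ISO-8859-3 charset (it is part of the extended charsets, so a stripped-down runtime may not have it).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class CharsetComparison {
    // Hypothetical helper: percent-encodes a query value in the named charset.
    static String encode(String value, String charsetName) throws UnsupportedEncodingException {
        return URLEncoder.encode(value, charsetName);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "T\u00fcrk\u00e7e" is "Türkçe" written with Unicode escapes.
        String value = "T\u00fcrk\u00e7e";
        // UTF-8 uses two bytes per accented letter, ISO-8859-3 uses one.
        System.out.println("UTF-8:      " + encode(value, "UTF-8"));
        System.out.println("ISO-8859-3: " + encode(value, "ISO-8859-3"));
    }
}
```

A server that decodes the query as ISO-8859-3 will misread the UTF-8 form, and vice versa, so the charset has to match what the target site expects.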

Non-ASCII characters are not allowed in URLs according to the specification. We are used to seeing them because browsers display them in the address bar, but they are not sent to servers in that form.

You must URL-encode the query before passing it to Jsoup. Jsoup.connect("http://www.example.com").data("q", "Türkçe"), as suggested by MariuszS, does exactly that.
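
The conversion from the form a browser displays to the ASCII-only form that goes over the wire can be sketched with `java.net.URI`. This is only an illustration under stated assumptions: the class name `AsciiUrlDemo` and the helper `toAsciiUrl` are hypothetical, and the multi-argument `URI` constructor leaves reserved characters such as `&` and `=` in the query untouched, so it is not a substitute for encoding individual parameter values.

```java
import java.net.URI;
import java.net.URISyntaxException;

class AsciiUrlDemo {
    // Hypothetical helper: builds an http URL and returns its ASCII-only form,
    // with non-ASCII characters percent-encoded as UTF-8.
    static String toAsciiUrl(String host, String path, String query) throws URISyntaxException {
        URI uri = new URI("http", host, path, query, null);
        return uri.toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // "T\u00fcrk\u00e7e" is "Türkçe" written with Unicode escapes.
        System.out.println(toAsciiUrl("www.example.com", "/", "q=T\u00fcrk\u00e7e"));
    }
}
```

The resulting string contains only ASCII characters and can be handed to `Jsoup.connect` as the URL.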


I found this on google: http://turkishbasics.com/resources/turkish-characters-html-codes.php Perhaps you can add it like this:

    String url = "http://www.example.com/?q=Türk&#231e";
    Document doc = Jsoup.connect(url).get();