WebClient.DownloadString results in garbled characters due to encoding problems, but the browser is fine.

The following code:

    var text = (new WebClient()).DownloadString("http://export.arxiv.org/api/query?search_query=au:Freidel_L*&start=0&max_results=20");

results in a text variable that contains, among other things, the string

"$ΞΊ$-Minkowski space, scalar field, and the Lorentz invariance problem"

However, when I visit this URL in Firefox, I get

"$κ$-Minkowski space, scalar field, and the Lorentz invariance problem"

which is actually correct. I also tried

    var data = (new WebClient()).DownloadData("http://export.arxiv.org/api/query?search_query=au:Freidel_L*&start=0&max_results=20");
    var text = System.Text.UTF8Encoding.Default.GetString(data);

but it had the same problem.

I'm not sure where to put the blame. Is the feed encoded in UTF-8, with the browser smart enough to detect that while WebClient is not? Or is the UTF-8 encoding correct and WebClient is simply misbehaving? What can I do to fix this?

unicode utf-8 webclient
Aug 21 '11 at 8:10
1 answer

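WebClient isn't lying; it just decoded the response with the wrong encoding. When the response doesn't declare a charset, WebClient falls back to its Encoding property, which defaults to Encoding.Default (the system's ANSI code page on .NET Framework). Set the encoding before calling DownloadString: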

    using System.Net;
    using System.Text;

    using (WebClient webClient = new WebClient())
    {
        // Tell WebClient to decode the response body as UTF-8.
        webClient.Encoding = Encoding.UTF8;
        string s = webClient.DownloadString("http://export.arxiv.org/api/query?search_query=au:Freidel_L*&start=0&max_results=20");
    }
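To see where the garbage comes from, here is a minimal sketch (the names MojibakeDemo, Decode and CodePoints are hypothetical, chosen for illustration) that encodes κ as UTF-8 and then decodes the two resulting bytes with a single-byte code page. It uses Latin-1 because that encoding is built into every .NET runtime; decoding the same bytes with a Greek code page such as Windows-1253 would instead produce ΞΊ.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // Decode bytes with an explicitly chosen encoding.
    public static string Decode(byte[] bytes, Encoding encoding) => encoding.GetString(bytes);

    // Render a string as its UTF-16 code units, so the output does not
    // depend on the console's own encoding.
    static string CodePoints(string s)
    {
        var parts = new string[s.Length];
        for (int i = 0; i < s.Length; i++)
            parts[i] = "U+" + ((int)s[i]).ToString("X4");
        return string.Join(" ", parts);
    }

    static void Main()
    {
        // κ (GREEK SMALL LETTER KAPPA, U+03BA) takes two bytes in UTF-8.
        byte[] utf8 = Encoding.UTF8.GetBytes("\u03BA");
        Console.WriteLine(BitConverter.ToString(utf8)); // CE-BA

        // Latin-1 maps each byte to one character, so the pair becomes "Îº".
        string wrong = Decode(utf8, Encoding.GetEncoding("iso-8859-1"));
        Console.WriteLine(CodePoints(wrong)); // U+00CE U+00BA

        // Decoding as UTF-8 recovers the original character.
        string right = Decode(utf8, Encoding.UTF8);
        Console.WriteLine(CodePoints(right)); // U+03BA
    }
}
```

The bytes on the wire are the same in both cases; only the decoder chosen on the client side differs, which is why setting WebClient.Encoding fixes the result.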

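As for why your alternative doesn't work, it is a misuse of the API. Default is a static property declared on Encoding, so UTF8Encoding.Default is just Encoding.Default, which on .NET Framework is the system's ANSI code page rather than UTF-8. It should be:

    var text = System.Text.Encoding.UTF8.GetString(data);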
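A small sketch (the class and method names are hypothetical) confirming that UTF8Encoding.Default resolves to the inherited Encoding.Default rather than to a UTF-8 decoder. What Encoding.Default actually is depends on the runtime: on .NET Framework it is the system's ANSI code page, while on .NET Core and .NET 5+ it always returns UTF-8, so the printed WebName will vary.

```csharp
using System;
using System.Text;

static class DefaultEncodingPitfall
{
    // UTF8Encoding.Default compiles to a call to the static Encoding.Default,
    // so both expressions yield the same cached instance.
    public static bool SameAsEncodingDefault() =>
        object.ReferenceEquals(UTF8Encoding.Default, Encoding.Default);

    // Decode downloaded bytes explicitly as UTF-8, as the answer suggests.
    public static string DecodeUtf8(byte[] data) => Encoding.UTF8.GetString(data);

    static void Main()
    {
        Console.WriteLine(SameAsEncodingDefault()); // True
        Console.WriteLine(Encoding.Default.WebName); // runtime- and platform-dependent

        byte[] kappaUtf8 = { 0xCE, 0xBA }; // UTF-8 bytes of κ
        Console.WriteLine(DecodeUtf8(kappaUtf8) == "\u03BA"); // True
    }
}
```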
Aug 21 '11 at 11:31