How to use file_get_contents() with non-English characters in a URL?

Question

I get an error when I try to fetch a non-English (Unicode) URL with the PHP file_get_contents() function. The URL was: http://ml.wikipedia.org/wiki/%E0%B4%B2%E0%B4%AF%E0%B4%A3%E0%B5%BD_%E0%B4%AE%E0%B5%86%E0%B4%B8%E0%B5%8D%E0%B4%B8%E0%B4%BF

The error is:

Warning: file_get_contents(http://ml.wikipedia.org/wiki/%E0%B4%B2%E0%B4%AF%E0%B4%A3%E0%B5%BD_%E0%B4%AE%E0%B5%86%E0%B4%B8%E0%B5%8D%E0%B4%B8%E0%B4%BF) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden ..
Fatal error: Call to a member function find() on a non-object in G:\xampp\htdocs\codes\htmlParse1.php on line 8

Is there a restriction in the file_get_contents() function? Does it accept only English URLs?

+4

url php unicode file-get-contents

Jenson m john Jan 20 '13 at 18:59

2 answers

If you get 403 Forbidden, the connection itself is working. The warning only tells you that the web server responded with status code 403. Wikipedia refuses requests that do not send a valid user agent:

Scripts should use an informative User-Agent string with contact information, or they may be IP-blocked without notice.

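The second error should come from the lines that follow and process the result of your call to file_get_contents(...). When the request fails, the function returns false rather than a string, so the code that then calls find() has no object to work on.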
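As a minimal sketch of guarding against that second error, the result can be checked for false before it is passed on to a parser. The helper name fetch_or_null is hypothetical and not part of the question's code; the @ operator is used here only to keep the warning out of the output.

```php
<?php
// Hypothetical helper (an assumption for illustration): returns the response
// body as a string, or null when file_get_contents() fails and returns false.
function fetch_or_null(string $url): ?string
{
    // @ suppresses the "failed to open stream" warning; the return value
    // still tells us whether the call succeeded.
    $body = @file_get_contents($url);

    return $body === false ? null : $body;
}
```

A caller would test the result for null and stop, or report the failure, before calling find() on anything derived from it.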

Edit: Try setting your user agent, for example with ini_set('user_agent', 'wikiPHP');, before executing the request. This should work fine.
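A minimal sketch of that suggestion, assuming a hypothetical helper named fetch_with_user_agent. The user_agent ini setting is the default User-Agent that PHP's HTTP stream wrapper sends, so setting it affects later file_get_contents() calls on http:// URLs.

```php
<?php
// Hypothetical helper (an assumption for illustration): sets the default
// User-Agent for PHP's HTTP stream wrapper, then fetches the URL.
// Returns the body as a string, or false on failure, like file_get_contents().
function fetch_with_user_agent(string $url, string $userAgent)
{
    // Applies to every subsequent request made through the http:// wrapper.
    ini_set('user_agent', $userAgent);

    return file_get_contents($url);
}
```

It could be called as fetch_with_user_agent($url, 'wikiPHP'), with the agent string replaced by something that identifies your script and includes contact information.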

+1

Concurrenthashmap Jan 20 '13 at 19:25

Baba · Accepted Answer · 2013-01-20T20:09:27+0000

You are missing header information, such as a user agent. I would advise you to just use curl:

    $url = 'http://ml.wikipedia.org/wiki/%E0%B4%B2%E0%B4%AF%E0%B4%A3%E0%B5%BD_%E0%B4%AE%E0%B5%86%E0%B4%B8%E0%B5%8D%E0%B4%B8%E0%B4%BF';
    $ch = curl_init($url); // initialize curl handle
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it directly
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17");
    curl_setopt($ch, CURLOPT_REFERER, "http://ml.wikipedia.org");
    curl_setopt($ch, CURLOPT_ENCODING, ""); // accept any compression curl supports; "UTF-8" is not a content encoding
    $data = curl_exec($ch); // false on failure
    curl_close($ch);
    print($data);

If you must use file_get_contents:

    // Request headers to send, including a browser-like User-Agent
    $options = array(
        'http' => array(
            'method' => "GET",
            'header' => "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n" .
                        "Cookie: centralnotice_bucket=0-4.2; clicktracking-session=M7EcNiC2Zcuko7exVGUvLfdwxzSK3Boap; narayam-scheme=ml\r\n" .
                        "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17"
        )
    );
    $url = 'http://ml.wikipedia.org/wiki/%E0%B4%B2%E0%B4%AF%E0%B4%A3%E0%B5%BD_%E0%B4%AE%E0%B5%86%E0%B4%B8%E0%B5%8D%E0%B4%B8%E0%B4%BF';
    // Pass the headers to file_get_contents() through a stream context
    $context = stream_context_create($options);
    $file = file_get_contents($url, false, $context);
    echo $file;
