Prevent downloading from a remote source if the file size is larger than the specified size

Say I want to download XML files from a remote server only if they are no larger than 10 MB.

Something like

    $xml_file = "http://example.com/largeXML.xml"; // size = 500MB

    // PRACTICAL EXAMPLE: $xml_file = "http://www.cs.washington.edu/research/xmldatasets/data/pir/psd7003.xml"; // size = 683MB

    /* GOAL: Do anything that can be done to keep this large file from being
       loaded by DOMDocument, without having to load the file to check. */
    $dom = new DOMDocument();
    $dom->load($xml_file /* LOAD only IF the file size is <= 10MB, else echo 'File is too large' */);

How can this be achieved? Any ideas, alternatives, or pointers to the best approach would be greatly appreciated.

I checked "PHP: remote file size without downloading the file", but when I try something like

 var_dump( curl_get_file_size( "http://www.dailymotion.com/rss/user/dialhainaut/" ) ); 

I get string 'unknown' (length=7)

When I try to use get_headers as suggested below, Content-Length is not in the headers, so this will not work reliably either.

Please advise how to determine the length and avoid passing the file to DOMDocument if it exceeds 10 MB.

+2
php domdocument filesize
Apr 21 '16 at 6:30
3 answers

OK, I finally got it working. Relying on headers will obviously not work in general. In this solution, we open a file handle and read the XML line by line until we reach the threshold $max_B. If the file is too large, we still pay the overhead of reading it up to the 10 MB mark, but it works as expected. If the file is smaller than $max_B, processing continues.

    $xml_file = "http://www.dailymotion.com/rss/user/dialhainaut/";
    //$xml_file = "http://www.cs.washington.edu/research/xmldatasets/data/pir/psd7003.xml";

    $fh = fopen($xml_file, "r");

    if ($fh) {
        $file_string = '';
        $total_B = 0;
        $max_B = 10485760; // 10 MB

        // run through the lines of the file, concatenating them into a string
        while (!feof($fh)) {
            $line = fgets($fh);
            if ($line !== false) { // strict check, so a line containing only "0" is not skipped
                $total_B += strlen($line);
                if ($total_B < $max_B) {
                    $file_string .= $line;
                } else {
                    break;
                }
            }
        }
        fclose($fh);

        if ($total_B < $max_B) {
            echo 'File ok. Total size = '.$total_B.' bytes. Proceeding...';
            // proceed
            $dom = new DOMDocument();
            $dom->loadXML($file_string); // NOTE the method change because we're loading from a string
        } else {
            // reject
            echo 'File too big! Max size = '.$max_B.' bytes.';
        }
    } else {
        echo 'Could not open file!';
    }
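One caveat with reading line by line: `fgets()` without a length argument reads up to the next newline, so an XML file with very long lines (or none at all) can pull in far more than the limit in a single call. A minimal sketch of the same idea using fixed-size chunks follows; `read_capped` and `$chunkSize` are hypothetical names introduced here, not part of the answer above.

```php
<?php
/**
 * Read at most $maxBytes from an open stream in fixed-size chunks.
 * Returns the data, or null if the stream holds more than $maxBytes
 * or a read error occurs.
 * (read_capped and $chunkSize are hypothetical names for this sketch.)
 */
function read_capped($stream, int $maxBytes, int $chunkSize = 8192): ?string
{
    $data = '';
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false) {
            return null; // read error: treated as a failure in this sketch
        }
        $data .= $chunk;
        if (strlen($data) > $maxBytes) {
            return null; // over the limit: stop reading immediately
        }
    }
    return $data;
}
```

With this helper, the handle returned by `fopen($xml_file, "r")` would be passed to `read_capped($fh, 10485760)`, and the result handed to `$dom->loadXML()` only when it is not null. At most one chunk beyond the limit is ever read.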
+2
Apr 21 '16 at 6:56

10 MB is 10485760 bytes. If the Content-Length header is not present, this falls back to cURL, which is available with PHP 5. I got this code from somewhere on SO, but can't remember where:

    function get_filesize($url) {
        $headers = get_headers($url, 1);
        if (isset($headers['Content-Length'])) return $headers['Content-Length'];
        if (isset($headers['Content-length'])) return $headers['Content-length'];

        // Fallback when no Content-Length header is sent. Note that this
        // downloads the whole body and then reports the bytes received.
        $c = curl_init();
        curl_setopt_array($c, array(
            CURLOPT_URL => $url,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_HTTPHEADER => array('User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3'),
        ));
        curl_exec($c);
        return curl_getinfo($c, CURLINFO_SIZE_DOWNLOAD);
    }

    $filesize = get_filesize("http://www.dailymotion.com/rss/user/dialhainaut/");
    if ($filesize <= 10485760) {
        echo 'Fine';
    } else {
        echo $filesize.' File is too big';
    }


Check out the demo here.
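The two `isset()` checks above only cover two spellings of the header name. As a rough sketch, assuming the array shape that `get_headers($url, 1)` returns (status lines under integer keys, repeated headers collected into arrays when redirects occur), the lookup could be made case-insensitive like this. `content_length_from_headers` is a hypothetical helper name.

```php
<?php
/**
 * Extract Content-Length from the array returned by get_headers($url, 1).
 * Header names are matched case-insensitively; if redirects produced several
 * values, the last one (the final response) is used.
 * Returns null when the header is absent or is not a plain integer.
 * (content_length_from_headers is a hypothetical helper name.)
 */
function content_length_from_headers(array $headers): ?int
{
    foreach ($headers as $name => $value) {
        // Status lines sit under integer keys, so skip anything non-string.
        if (is_string($name) && strcasecmp($name, 'Content-Length') === 0) {
            $last = is_array($value) ? end($value) : $value;
            $last = trim((string) $last);
            return ctype_digit($last) ? (int) $last : null;
        }
    }
    return null;
}
```

A null result means the server did not declare a size (for example, when it uses chunked transfer encoding), so the size has to be enforced while reading instead.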

+1
Apr 21 '16 at 6:44

Edit: The new answer is a bit of a workaround.
You cannot check the length of the DOM elements, but you can make a HEAD request and get the file size from the URL:

    <?php
    function i_hope_this_works( $XmlUrl ) {
        // let's assume we mess up, so we set the result to -1
        $result = -1;
        $request = curl_init( $XmlUrl );

        // Go for a HEAD request, so a 1 GB file takes about as long as a 1 KB one
        curl_setopt( $request, CURLOPT_NOBODY, true );
        curl_setopt( $request, CURLOPT_HEADER, true );
        curl_setopt( $request, CURLOPT_RETURNTRANSFER, true );
        curl_setopt( $request, CURLOPT_FOLLOWLOCATION, true );
        // placeholder user-agent string; substitute your own
        curl_setopt( $request, CURLOPT_USERAGENT, 'Mozilla/5.0' );

        $requesteddata = curl_exec( $request );
        curl_close( $request );

        if( $requesteddata ) {
            $content_length = -1;
            $status = 0;

            if( preg_match( "/^HTTP\/1\.[01] (\d\d\d)/", $requesteddata, $matches ) ) {
                $status = (int)$matches[1];
            }
            if( preg_match( "/Content-Length: (\d+)/i", $requesteddata, $matches ) ) {
                $content_length = (int)$matches[1];
            }

            // you can google the status codes; 200 is OK, for example
            if( $status == 200 || ($status > 300 && $status <= 308) ) {
                $result = $content_length;
            }
        }
        return $result;
    }
    ?>

Now you can get the size of any file from its URL alone with

    $file_size = i_hope_this_works('yourURLasString');
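When redirects are followed, the raw header dump from cURL contains one header block per hop, so matching the first status line and the first Content-Length may describe the redirect rather than the final file. Here is a small sketch of parsing only the last block, assuming the blocks are separated by blank lines. `parse_final_headers` is a hypothetical helper name.

```php
<?php
/**
 * Parse the status code and Content-Length of the final response from a raw
 * header dump, such as the string curl_exec() returns when CURLOPT_HEADER and
 * CURLOPT_NOBODY are set. Assumes one header block per hop, separated by
 * blank lines. Missing values are returned as null.
 * (parse_final_headers is a hypothetical helper name.)
 */
function parse_final_headers(string $raw): array
{
    $blocks = preg_split('/\r?\n\r?\n/', trim($raw));
    $last = (string) end($blocks); // header block of the final response

    $status = null;
    $length = null;
    // Accepts "HTTP/1.1 200 OK" as well as "HTTP/2 200".
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $last, $m)) {
        $status = (int) $m[1];
    }
    // Header names are case-insensitive, hence the "i" modifier.
    if (preg_match('/^Content-Length:\s*(\d+)/mi', $last, $m)) {
        $length = (int) $m[1];
    }
    return ['status' => $status, 'length' => $length];
}
```

A null `length` with a 200 status means the server did not declare a size, so a HEAD request alone cannot decide whether the file is under 10 MB.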
-1
Apr 21 '16 at 6:40


