Uploading PDF or .doc files and security

I have a script that lets users upload text documents (PDF or doc) to the server; the plan is then to convert them to plain text. Until a file is converted, though, it sits on the server in its raw format, which makes me worry about viruses and other nasty things.

Any ideas on what I need to do to minimize the risk from these unknown files? How can I check that a file is clean, that it really is in the format it claims to be, and that it will not break the server?

+4
6 answers

I posted this as a comment on Aerik's answer, but it is really an answer to the question.

If you have PHP >= 5.3, use finfo_file(). If you have an older version of PHP installed, you can use mime_content_type() (less reliable) or install the Fileinfo extension from PECL.

Both of these functions return the file's MIME type (by inspecting the data inside the file rather than its name). For a PDF, it should be

 application/pdf 

For a Word doc, it can be one of several things. Usually it should be

 application/msword 
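
A minimal sketch of the check described above, assuming the Fileinfo extension is loaded. The function name `detectAllowedMime` and the allow-list are illustrative and not part of any library; extend the list if your Word files are reported under a different type.

```php
<?php
// Hypothetical helper: returns the detected MIME type if it is on the
// allow-list, or null otherwise. Detection is based on the file's content,
// not on its name or extension.
function detectAllowedMime(
    string $path,
    array $allowed = ['application/pdf', 'application/msword']
): ?string {
    $finfo = finfo_open(FILEINFO_MIME_TYPE);
    if ($finfo === false) {
        return null; // Fileinfo database could not be opened
    }

    $mime = finfo_file($finfo, $path);

    if ($mime === false || !in_array($mime, $allowed, true)) {
        return null;
    }
    return $mime;
}
```

In an upload handler this would typically be called on `$_FILES['...']['tmp_name']` before the file is moved anywhere, and a null result would mean rejecting the upload.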

If your server is running *nix, make sure that the files you save are not executable. Even better: save them in a folder that the web server does not serve. You can still write code to access the files, but someone requesting a web page will not be able to reach them at all.

+4

If you ever open or execute any user-uploaded file on the server, you should assume that your server is now at risk.

Even a jpg can contain executable PHP. If you include or require the file in any way in your script, that can compromise your server too. Say an image you came across on some website was generated like this ...

 header('Content-type: image/jpeg');
 header('Content-Disposition: inline; filename="test.jpg"');

 echo file_get_contents('/some_image.jpg');
 echo '<?php phpinfo(); ?>';

... which you save and then serve from your own server like this ...

 $q = $_GET['q']; // pretend this is sanitized for the moment
 header('Content-type: ' . mime_content_type($q));
 header('Content-Disposition: inline; filename="' . $_GET['q'] . '"');

 include $q;

... will execute phpinfo() on your server. Users of your site can then simply save the image to their desktop and open it in Notepad to see your server settings. Simply converting the file to a different format will strip out this script, and it should not run any actual virus attached to the file.
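
To make the contrast concrete, here is a small sketch of serving the same file without executing it. The function name `sendFileRaw` is made up for illustration; the only thing it relies on is that readfile() copies bytes to the output, whereas include runs any PHP it finds in the file.

```php
<?php
// Hypothetical helper: streams a stored file to the client as raw bytes.
// Unlike include/require, readfile() never interprets embedded PHP tags.
function sendFileRaw(string $path, string $mime): void
{
    if (!headers_sent()) {
        header('Content-Type: ' . $mime);
        header('Content-Length: ' . filesize($path));
    }
    readfile($path);
}
```

With this, the trailing PHP block in the example image would reach the browser as inert text instead of being run on the server.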

It would also be best to run a virus scan at upload time. You should be able to call a scanner through a system command and parse its output to see whether it finds anything. Users of your site should be scanning the files they download anyway.

Otherwise, even a user-uploaded file that contains a virus should not do any harm while it is just sitting on your server ... as far as I know.
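
A rough sketch of that idea, assuming ClamAV's `clamscan` is installed on the server (any scanner with documented exit codes would work the same way). Both helper names are hypothetical. `clamscan` exits with 0 when nothing is found, 1 when a virus is found, and 2 when an error occurred.

```php
<?php
// Map a clamscan exit code to a status string.
function interpretScanExitCode(int $code): string
{
    switch ($code) {
        case 0:
            return 'clean';
        case 1:
            return 'infected';
        default:
            // 2 means a scanner error; treat any other code the same way
            return 'error';
    }
}

// Hypothetical wrapper: runs the scanner on one file and returns
// 'clean', 'infected' or 'error'. The default $scanner value is an
// assumption; adjust it to the scanner's location on your system.
function scanUploadedFile(string $path, string $scanner = 'clamscan'): string
{
    $cmd = escapeshellcmd($scanner) . ' --no-summary ' . escapeshellarg($path) . ' 2>&1';
    exec($cmd, $outputLines, $exitCode);
    return interpretScanExitCode($exitCode);
}
```

Treating everything except 'clean' as a reason to reject the upload is the conservative choice, since a missing or failing scanner then blocks files instead of letting them through.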

+2

Hmm, IMHO you don't need to worry about the document type or anything like that; if you use a good converter to extract the raw text, it should perform these checks itself without crashing the server.

Just like your client computer, a server should always be protected against viruses and attacks, so a newly uploaded file must be scanned before it is processed.

I have never seen a web application perform these checks itself, have you?

+1

IMHO, until something tries to execute it, it's just a file. However, you can definitely check the file extension (but not rely on it, as explained below), and also look up the file formats to see whether there are specific byte sequences in the file header that you could check for.
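
As a sketch of the header check: PDF files begin with the bytes `%PDF-`, and legacy Word .doc files are OLE2 compound documents that begin with `D0 CF 11 E0 A1 B1 1A E1`. The function name `sniffDocumentType` is illustrative, and a matching header only shows what the file claims to be, not that it is harmless.

```php
<?php
// Hypothetical helper: returns 'pdf', 'doc' or null based on the first
// bytes of the file. The extension and the client-supplied type are ignored.
function sniffDocumentType(string $path): ?string
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return null;
    }
    $head = fread($handle, 8);
    fclose($handle);

    if ($head === false) {
        return null;
    }
    if (strncmp($head, '%PDF-', 5) === 0) {
        return 'pdf';
    }
    // OLE2 compound file signature; also used by other legacy Office
    // formats such as .xls and .ppt, so this is a necessary check only.
    if ($head === "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1") {
        return 'doc';
    }
    return null;
}
```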

+1

If you are viewing a PDF file, there is nothing you can do except run an antivirus and pray that it catches a maliciously crafted PDF.

Conversion software is usually not what such files target, so if you just convert the file and view the output as text, you should be somewhat safer.

Oh, you are worried about the server. Just do not execute the uploaded files ...

+1

There are three ways to secure uploaded files:

best: put the files on another server; this is the most secure option

better: put them outside your WWW folder, which means nobody can reach them by URL, and you have to use readfile() or file_get_contents() to read and display the files

last: put the files under WWW and use .htaccess in that folder to prevent others from executing files or uploading unknown files

This is what I do when uploading files: I put them outside the website root and rename them; I store a fake name in the database and derive the real file name from it with an algorithm.

After uploading a file outside the web root, you can access it the way I do here. These are the contents of a file called getfile.php:

 <?php
 define('DS', DIRECTORY_SEPARATOR);

 // fake name of the file
 $uniqueid = isset($_GET['uniqueid']) ? $_GET['uniqueid'] : '';
 // file extension
 $ext = isset($_GET['ext']) ? $_GET['ext'] : '';
 // make sure the address does not contain ..
 $addrss = isset($_GET['dir']) ? str_replace('..', '_', $_GET['dir']) : '';

 $baseaddress = '..' . DS . 'foldername outside of web root';

 if (strlen($uniqueid) === 32 and strlen($ext) === 3) {
     $path = $baseaddress . DS . $addrss . DS;
     $path .= md5($uniqueid . $uniqueid . $uniqueid . $ext . '*#$%^&') . '.' . $ext;

     if (file_exists($path)) {
         // you can check for all your accessible extensions; I only use this for images
         switch ($ext) {
             case 'jpg':
                 $content_type = 'image/jpeg';
                 break;
             case 'png':
                 $content_type = 'image/png';
                 break;
             case 'gif':
                 $content_type = 'image/gif';
                 break;
             default:
                 // unknown extension: refuse to serve the file
                 exit;
         }
         header('Content-type: ' . $content_type);
         readfile($path);
     }
 }

In the src attribute, or anywhere else you need to show the file (this is for my images):

 <img src="/getfile.php?uniqueid=put fake file name here&amp;ext=put extension here&amp;dir=put rest of file address here" > 

Hope this helps you. Feel free to ask more questions.

0

Source: https://habr.com/ru/post/1311162/

