Detecting whether a user drops the same file twice into the browser window

I want to allow users to drag and drop images from their desktop into the browser window, and then upload those images to the server. I want to upload each file only once, even if it is dropped into the window several times. For security reasons, the information the File object exposes to JavaScript is limited. According to msdn.microsoft.com, only the following properties can be read:

  • name
  • lastModifiedDate

(Safari also provides size and type.)

The user can drop two images with the same name and the same last-modified date, taken from different folders, into the browser window. There is a very small but finite probability that the two images are actually different from each other.

I created a script that reads in the raw dataURL of each image file and compares it with the files that were previously dropped into the window. A side benefit of this is that it can detect identical files with different names.

It works, but it feels like overkill. It also means storing a huge amount of data. I could improve this (and add to the overkill) by hashing the dataURL and storing the hash instead.
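Something like this is what I have in mind for the hashing variant (an untested sketch; it assumes crypto.subtle is available, i.e. a secure context, and hashDataURL is just a name I made up):

    // Sketch only: hash the dataURL with SHA-256 instead of keeping the whole string.
    // Returns a Promise that resolves to a hex string usable with indexOf() / ===.
    function hashDataURL(dataURL) {
      var bytes = new TextEncoder().encode(dataURL)
      return crypto.subtle.digest("SHA-256", bytes).then(function (buffer) {
        return Array.prototype.map.call(new Uint8Array(buffer), function (b) {
          return ("0" + b.toString(16)).slice(-2)
        }).join("")
      })
    }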

I hope there is a more elegant way to achieve this. Any suggestions?

    <!DOCTYPE html>
    <html>
    <head>
    <title>Detect duplicate drops</title>
    <style>
    html, body {
      width: 100%;
      height: 100%;
      margin: 0;
      background: #000;
    }
    </style>
    <script>
    var body
    var imageData = []

    document.addEventListener('DOMContentLoaded', function ready() {
      body = document.getElementsByTagName("body")[0]
      body.addEventListener("dragover", swallowEvent, false)
      body.addEventListener("drop", treatDrop, false)
    }, false)

    function swallowEvent(event) {
      // Prevent browser from loading the dropped image in an empty page
      event.preventDefault()
      event.stopPropagation()
    }

    function treatDrop(event) {
      swallowEvent(event)
      for (var ii = 0, file; file = event.dataTransfer.files[ii]; ii++) {
        importImage(file)
      }
    }

    function importImage(file) {
      var reader = new FileReader()
      reader.onload = function fileImported(event) {
        var dataURL = event.target.result
        var index = imageData.indexOf(dataURL)
        var img, message

        if (index < 0) {
          index = imageData.length
          console.log(dataURL)
          imageData.push(dataURL, file.name)
          message = "Image " + file.name + " imported"
        } else {
          message = "Image " + file.name + " imported as " + imageData[index + 1]
        }

        img = document.createElement("img")
        img.src = imageData[index] // copy or reference?
        body.appendChild(img)

        console.log(message)
      }
      reader.readAsDataURL(file)
    }
    </script>
    </head>
    <body>
    </body>
    </html>
1 answer

Here is a suggestion (one I did not see mentioned in your question):

Create a Blob URL for each File object in the FileList; the browser keeps it in its URL Store, and you hold on to the URL string.
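For instance, a minimal sketch of that step (the entry layout simply anticipates the [fname, fsize, blobUrl, fhash] records used further down):

    // For every dropped File, register a Blob URL in the browser's URL Store
    // and keep only the resulting string plus some cheap metadata.
    function registerFiles(fileList) {
      var entries = []
      for (var i = 0; i < fileList.length; i++) {
        var file = fileList[i]
        entries.push({
          fname: file.name,
          fsize: file.size,
          blobUrl: URL.createObjectURL(file), // "blob:..." string, cheap to store
          fhash: null                         // filled in later by the worker
        })
      }
      return entries
    }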

Then pass that URL string to a web worker (a separate thread), which uses FileReader to read each file (accessed via its Blob URL string) in chunks, reusing a single fixed-size buffer (almost like a circular buffer) to compute the file's hash. There are simple, fast, portable hashes such as CRC32, which can often be combined with vertical and horizontal checksums in the same loop, and which can be computed chunk by chunk.
You can speed this up by reading 32-bit (unsigned) values instead of 8-bit values, using the corresponding buffer view (a Uint32Array), which is about 4 times faster. Endianness does not matter here, so don't waste resources on it!
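A rough sketch of such a worker is below (hasher.js is just an assumed file name; for brevity it uses a plain bitwise CRC32 over bytes with a fixed chunk size, leaving out the literal buffer reuse and 32-bit reads described above, and it assumes fetch() resolves blob: URLs, which current browsers do):

    // hasher.js -- assumed worker file, receives { blobUrl } messages.
    var CHUNK_SIZE = 1024 * 1024        // fixed 1 MiB chunks
    var reader = new FileReaderSync()   // synchronous FileReader, worker-only

    function crc32Chunk(bytes, crc) {
      // Table-less CRC32, safe to feed chunk after chunk.
      for (var i = 0; i < bytes.length; i++) {
        crc ^= bytes[i]
        for (var k = 0; k < 8; k++) {
          crc = (crc >>> 1) ^ (0xEDB88320 & -(crc & 1))
        }
      }
      return crc
    }

    onmessage = function (event) {
      var blobUrl = event.data.blobUrl
      fetch(blobUrl).then(function (response) {
        return response.blob()
      }).then(function (blob) {
        var crc = 0xFFFFFFFF
        for (var offset = 0; offset < blob.size; offset += CHUNK_SIZE) {
          var slice = blob.slice(offset, offset + CHUNK_SIZE)
          crc = crc32Chunk(new Uint8Array(reader.readAsArrayBuffer(slice)), crc)
        }
        postMessage({ blobUrl: blobUrl, fhash: (crc ^ 0xFFFFFFFF) >>> 0 })
      })
    }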

Upon completion, the web worker posts the file's hash back to the main thread/application, which then simply performs your comparison against an array of [fname, fsize, blobUrl, fhash] /* etc. */ entries.
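Glued together with the hypothetical registerFiles() and hasher.js from the sketches above, the main-thread side could look roughly like this:

    var table = []                        // { fname, fsize, blobUrl, fhash } records
    var hasher = new Worker("hasher.js")  // assumed worker file from the sketch above

    hasher.onmessage = function (event) {
      var entry = table.find(function (e) { return e.blobUrl === event.data.blobUrl })
      entry.fhash = event.data.fhash
      var duplicate = table.find(function (e) {
        return e !== entry && e.fhash === entry.fhash && e.fsize === entry.fsize
      })
      console.log(duplicate
        ? entry.fname + " is a duplicate of " + duplicate.fname
        : entry.fname + " is new, upload it")
    }

    // Called from a drop handler like treatDrop() in the question.
    function registerDrop(fileList) {
      registerFiles(fileList).forEach(function (entry) {
        table.push(entry)
        hasher.postMessage({ blobUrl: entry.blobUrl })
      })
    }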

Pro
A reusable fixed-size buffer keeps memory consumption down (at whatever level you specify), and the web worker improves performance by using an extra thread (one that does not block your main browser thread).

Con
You still need a server-side fallback for browsers with JavaScript disabled (you could add a hidden field to the form and set its value with JavaScript, as a crude check for JavaScript support, to reduce server load). However.. even then.. you still need server-side checks to protect against malicious input.
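The hidden-field trick needs only a couple of lines (the field name here is just an example):

    <!-- Stays empty when JavaScript is off, so the server knows not to trust client-side dedup -->
    <input type="hidden" name="js_dedup" id="js_dedup" value="">
    <script>
      document.getElementById("js_dedup").value = "1"
    </script>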

Utility
So.. no net win? Well, if there is a reasonable chance that users will upload duplicate files (or simply reuse them in a web application), then you have saved that wasted bandwidth just by doing this check. That is quite a (green/financial) win in my book.


Extra
Hashes are prone to collisions, period. To reduce the (realistic) chance of a collision you would pick a more advanced hash algorithm (most are easy to port to chunked operation). The obvious trade-off for a more advanced hash is larger code size and lower speed (higher CPU usage).
