How to programmatically evaluate the likelihood that an image is "personal"?

Question: I want to programmatically find image files of a "personal" nature. What characteristics might such files have (or lack) compared to other image files?

So far I just use:

filesystem.allowExt("jpg");
filesystem.allowExt("JPG");
filesystem.allowExt("jpeg");
filesystem.allowExt("JPEG");

(file.size > 750000 && file.size < 750000000) // bytes

// compare() returns 0 on a match, so != 0 means "does not start with this prefix"
(file.name.compare(0, 4, "DSC_") != 0 && file.name.compare(0, 4, "IMG_") != 0) // no raw camera filenames

float ratio = (float) img.getWidth() / img.getHeight(); // the cast avoids integer division if these return ints
if (ratio < 1.8 && ratio > .555555556) // filter out really wide or tall images
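Put together, the checks above might look something like the sketch below. It is only a sketch: `ImageCandidate` is a hypothetical struct standing in for whatever the file system and image loader actually provide, and the thresholds are simply the ones from the snippets above.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical record; in the real program these values would come from
// the file system and the image loader.
struct ImageCandidate {
    std::string name;          // file name without directories, e.g. "beach.JPG"
    std::uintmax_t sizeBytes;  // file size in bytes
    int width;                 // pixels
    int height;                // pixels
};

// Lower-case copy, so "JPG" and "jpg" compare equal.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

bool hasJpegExtension(const std::string& name) {
    const std::size_t dot = name.rfind('.');
    if (dot == std::string::npos) return false;
    const std::string ext = toLower(name.substr(dot + 1));
    return ext == "jpg" || ext == "jpeg";
}

// True for names that start with a typical raw camera prefix.
bool hasCameraPrefix(const std::string& name) {
    return name.compare(0, 4, "DSC_") == 0 || name.compare(0, 4, "IMG_") == 0;
}

bool passesBasicFilters(const ImageCandidate& c) {
    if (!hasJpegExtension(c.name)) return false;
    if (c.sizeBytes <= 750000 || c.sizeBytes >= 750000000) return false;
    if (hasCameraPrefix(c.name)) return false;
    if (c.width <= 0 || c.height <= 0) return false;
    // Floating-point division, so 4000x3000 gives 1.33 rather than 1.
    const float ratio = static_cast<float>(c.width) / static_cast<float>(c.height);
    return ratio > 0.555555556f && ratio < 1.8f;
}
```

Lower-casing the extension once replaces the four separate `allowExt` calls, and keeping each rule in its own function makes it easier to switch rules on and off while tuning.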

Other things that I think might work:

  • Filter out anything that matches linear or scanned text.
  • Use images with a high ratio of display size to file size, that is, images optimized for display on screen.
  • Filter images with large blocks of the same color
  • Locate folders in the file path with names such as "pers", "private" or "old faxes"
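Two of these ideas are cheap to try. The sketch below is one possible take, not a tested heuristic: `countPathHints` counts hint words in a path, and `dominantColorFraction` estimates how much of an image is one coarse color. Both function names are made up for illustration, and the pixel data is assumed to be an interleaved 8-bit RGB buffer. The color measure ignores where the pixels are, so it is only a rough proxy for "large blocks of the same color".

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Count how many hint words occur anywhere in the lower-cased path.
// Hints are expected to be lower case already, e.g. "private".
int countPathHints(std::string path, const std::vector<std::string>& hints) {
    std::transform(path.begin(), path.end(), path.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    int hits = 0;
    for (const std::string& hint : hints) {
        if (path.find(hint) != std::string::npos) ++hits;
    }
    return hits;
}

// Fraction of pixels that fall into the most common coarse color bucket.
// rgb holds interleaved 8-bit R,G,B values; each channel is quantized to
// 4 bits so that near-identical shades count as the same color.
double dominantColorFraction(const std::vector<std::uint8_t>& rgb) {
    assert(rgb.size() % 3 == 0);
    const std::size_t pixels = rgb.size() / 3;
    if (pixels == 0) return 0.0;
    std::unordered_map<int, std::size_t> counts;
    std::size_t best = 0;
    for (std::size_t i = 0; i < pixels; ++i) {
        const int key = ((rgb[3 * i] >> 4) << 8) |
                        ((rgb[3 * i + 1] >> 4) << 4) |
                        (rgb[3 * i + 2] >> 4);
        best = std::max(best, ++counts[key]);
    }
    return static_cast<double>(best) / static_cast<double>(pixels);
}
```

A path with several hint words could raise an image's score, while a dominant color fraction close to 1 suggests a flat graphic rather than a photo. Note that substring matching is crude: "pers" matches "personal" but also "papers".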

Are there any more interesting, complex or funny characteristics that these files have? (Can you programmatically get the number of times a file was opened?)

Background: I make art about privacy and disclosure. The idea is that the software runs on the performer’s personal computer, has access to their personal data (yes, I mean what you think I mean *), and displays image files from their user account in various forms, with errors introduced, mixed together with other visual effects during the performance of the piece. (Yes, of course, they would know what the software does.)

They will look (partially) sort of like this, but, you know, more interesting: [image]

An effective performance involves manipulating things behind the scenes, and I want to maximize the ratio of "personal" files shown: you know, vacation shots rather than random icons in application support folders, web design components, etc. What are some ways I can separate “personal” files from the rest? Obviously, there is no way to do this with 100% accuracy, and that is not what I am looking for. I just want to know, on average, what attributes these files will or will not have.

* How conservative is SO about such discussions, anyway? I am not trying to get political or make people uncomfortable, and I feel that this is an interesting issue that we can discuss here.

1 answer

You could train a neural network for image classification. Prepare two sets of images: one containing only personal images and one containing only non-personal images.

Then use software like Neuroph to train the network to learn the difference. After training, you can let the network decide which of your two categories a new image belongs to.
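To make the train-then-classify workflow concrete, here is a minimal sketch in C++. It does not use Neuroph (which is a Java framework) and it is far simpler than a real image classifier: it is a single-layer network (logistic regression) trained on a few hand-picked numeric features per image. `Sample`, `TinyClassifier` and the choice of features are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One labelled example: hand-picked numeric features plus a 0/1 label
// (1 = "personal"). Which features to use is up to you.
struct Sample {
    std::vector<double> features;
    int label;
};

// Single-layer network (logistic regression) trained by gradient descent.
class TinyClassifier {
public:
    explicit TinyClassifier(std::size_t featureCount)
        : weights_(featureCount, 0.0), bias_(0.0) {}

    // Estimated probability (0..1) that the features describe a personal image.
    double predict(const std::vector<double>& x) const {
        assert(x.size() == weights_.size());
        double z = bias_;
        for (std::size_t i = 0; i < x.size(); ++i) z += weights_[i] * x[i];
        return 1.0 / (1.0 + std::exp(-z));
    }

    // Repeatedly nudge the weights to reduce the error on each sample.
    void train(const std::vector<Sample>& data, int epochs, double learningRate) {
        for (int e = 0; e < epochs; ++e) {
            for (const Sample& s : data) {
                const double error = predict(s.features) - s.label;
                for (std::size_t i = 0; i < weights_.size(); ++i)
                    weights_[i] -= learningRate * error * s.features[i];
                bias_ -= learningRate * error;
            }
        }
    }

private:
    std::vector<double> weights_;
    double bias_;
};
```

The features could be things like the aspect ratio or the share of pixels in the most common color, scaled to roughly the 0 to 1 range. After calling `train` on the two prepared sets, `predict` returns a score that can be thresholded or used to rank files by how "personal" they look.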
