Digital Image Compression

Currently, we are required by law to digitize all financial documents in our company and submit them for evaluation every 3 months.

Since this is sensitive data, we decided to take matters into our own hands and built our own digital archiving tool. The tool works fine, but after 7 months of use, we have begun to worry about the disk space used by these images.

Here is information about the number of documents digitized:

  • 15 thousand documents scanned and archived per day, with a final PNG size of about 860 KB each: 15,000 * 860 kilobits = 1.53779984 gigabytes
  • 30 work days per month: 1.53779984 gigabytes * 30 = 46.1339952 gigabytes
  • Expected disk usage after 1 year: 46.1339952 gigabytes * 12 = 553.607942 gigabytes
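The estimate above gives the file size as 860 KB but multiplies by 860 kilobits, and the post does not say which was meant. The sketch below reproduces the posted figures under the assumption that 1 kilobit is 1024 bits and 1 gigabyte is 2^30 bytes (these conventions are a guess that happens to match the numbers), and shows the result for kilobytes as well. The constant and function names are illustrative only.

```python
# Figures taken from the question.
DOCS_PER_DAY = 15_000
SIZE_PER_DOC = 860          # unit is ambiguous in the post: kilobits or kilobytes
WORK_DAYS_PER_MONTH = 30
GIB = 2 ** 30               # assumption: "gigabyte" means 2^30 bytes


def daily_gib(bytes_per_unit):
    """Daily archive growth in GiB, given how many bytes one size unit holds."""
    return DOCS_PER_DAY * SIZE_PER_DOC * bytes_per_unit / GIB


as_kilobits = daily_gib(1024 / 8)   # assumption: 1 kilobit = 1024 bits
as_kilobytes = daily_gib(1024)      # assumption: 1 KB = 1024 bytes

for label, per_day in (("kilobits", as_kilobits), ("kilobytes", as_kilobytes)):
    per_month = per_day * WORK_DAYS_PER_MONTH
    per_year = per_month * 12
    print(f"{label}: {per_day:.2f} GiB/day, "
          f"{per_month:.1f} GiB/month, {per_year:.1f} GiB/year")
```

If the files really are 860 kilobytes each, the yearly figure is eight times larger than the one posted, so it is worth checking the unit before planning capacity.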

So far, we have used 424 gigabytes of disk space, excluding backups. We use PNG as the image format, but I would like to know if anyone has tips on a better image compression algorithm, alternative strategies for PNG compression, or more efficient ways of archiving images to save disk space.

Any help would be appreciated, thanks.

+4
3 answers

You will be better off with DjVu, a relatively new format that was specifically designed to compress scanned documents. It works well for bitonal, grayscale, and color documents. It combines separation of foreground and background with a sophisticated wavelet compression scheme. I believe the commercial version can also OCR your documents so that you can search them, but there is also an open source version called DjVuLibre.

+3

Presumably, these documents do not have to be constantly online. If so, from the information you provide, I see no reason why you would need to change the workflow.

PNG is a widely supported lossless format (zlib compression), and I assume that is how you are using it. If you do not need lossless compression, good old JPEG will give you denser compression at the cost of a slight loss in quality, provided you set the compression ratio correctly. JPEG2000 may be another alternative, depending on your software stack. LZW-compressed TIFF does not have many advantages over PNG, except for support for 16 bits per pixel, which you probably do not need. Other options include proprietary special-purpose codecs (such as MrSID), which offer extremely good compression of extremely large files, at a price.
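As a rough illustration of what lossless zlib compression means for page-like data, here is a minimal sketch using Python's standard zlib module. It is not a PNG encoder, and the synthetic "page" (white background with a few dark text-like rows) is a made-up stand-in for a real scan, so the sizes it prints say nothing about your actual files.

```python
import zlib


def synthetic_page(width=200, height=100):
    """Build a fake 8-bit grayscale 'scan': mostly white, with dark text-like rows.

    Purely illustrative; a real scan contains noise and compresses far less well.
    """
    rows = []
    for y in range(height):
        if y % 10 < 3:
            # "Text" row: alternating runs of dark and white pixels.
            row = bytes(0 if (x // 4) % 3 == 0 else 255 for x in range(width))
        else:
            # Blank row: all white.
            row = bytes([255]) * width
        rows.append(row)
    return b"".join(rows)


raw = synthetic_page()
fast = zlib.compress(raw, 1)   # fastest compression level
best = zlib.compress(raw, 9)   # slowest, usually smallest

# Lossless: decompressing gives back exactly the original bytes.
assert zlib.decompress(best) == raw

print(f"raw: {len(raw)} bytes, level 1: {len(fast)} bytes, level 9: {len(best)} bytes")
```

The point is only that the round trip is exact: whatever space is saved, no pixel changes. JPEG gets its extra savings by giving up that guarantee.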

Since these are scanned documents, I think that PDF is a "natural" format for encoding them. PDF offers various compression options depending on the contents of the files. But I would not go to great lengths to fix something that isn't broken.

If you think about how much you are spending on disk space now, 1.5 GB per day is nothing. Disk space is cheap and getting cheaper all the time. Just buy three new 1 TB USB drives (primary / backup / backup) every 6 months, at a total cost of about $240. Even backing up to tape is not unreasonable.

+2

500 GB per year is not that much, and hard drives get cheaper every year.

0

Source: https://habr.com/ru/post/1311682/

