Is GridFS fast and reliable for production?

I am developing a new website, and I want to use GridFS as a repository for all user downloads, because it offers many advantages over regular file system repositories.

Tests with GridFS served by nginx indicate that it is not as fast as the regular file system served by nginx.

Test with nginx

Is anyone already using GridFS in production, or planning to use it for a new project?

+72
mongodb nginx gridfs
Aug 05 '10 at 8:53
5 answers

I use GridFS at work on one of our servers, which is part of a price-comparison site with respectable traffic (about 25,000 visitors per day). The server does not have much RAM (2 GB), and the CPU is not very fast either (Core 2 Duo, 1.8 GHz), but it has plenty of storage: 10 TB (SATA) in a RAID 0 configuration. The job the server does is very simple:

Each product in our price comparator has an image (about 10 million products according to our product database). The server's job is to download the image, resize it, store it in GridFS, and deliver it to the visitor's browser if it is not yet present in the grid, or to deliver it straight to the visitor's browser if it is already stored there. So this could be called a "traditional CDN scheme."

We have stored and processed 4 million images on this server since it was launched. Resizing and storing are done by a simple PHP script, though a Python script, or something like Java, could certainly be faster.
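The flow described above (serve from the grid if present, otherwise download, resize, store, then serve) can be sketched roughly as follows. This is only an illustration, not the PHP script the answer refers to. The names `serve_image`, `fetch_original`, `resize`, and `InMemoryStore` are hypothetical. The store object is assumed to offer the `exists`, `get_last_version`, and `put` methods that PyMongo's `gridfs.GridFS` provides; the in-memory stand-in only exists so the sketch runs without a MongoDB server.

```python
import io


class InMemoryStore:
    """Hypothetical stand-in that mimics the few GridFS methods used below."""

    def __init__(self):
        self._files = {}

    def exists(self, **kwargs):
        return kwargs.get("filename") in self._files

    def get_last_version(self, filename):
        # GridFS returns a file-like object; BytesIO plays that role here.
        return io.BytesIO(self._files[filename])

    def put(self, data, **kwargs):
        self._files[kwargs["filename"]] = data
        return kwargs["filename"]


def serve_image(fs, name, fetch_original, resize):
    """Return image bytes, filling the store on a miss.

    fs             -- store with exists/get_last_version/put (assumed interface)
    fetch_original -- hypothetical callable: name -> original image bytes
    resize         -- hypothetical callable: bytes -> resized bytes
    """
    if fs.exists(filename=name):
        # Hit: the resized image is already stored, deliver it directly.
        return fs.get_last_version(name).read()
    # Miss: download, resize, store, then deliver.
    data = resize(fetch_original(name))
    fs.put(data, filename=name)
    return data
```

With a real deployment, `fs` would presumably be something like `gridfs.GridFS(db)` built from a PyMongo database handle, and `resize` would call an image library; both are left out here to keep the sketch self-contained.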

Current data size: 11.23 GB

Current storage size: 12.5 GB

Indexes: 5

Index size: 849.65 MB

About reliability: it is very reliable. The server is not under heavy load, the index size is fine, and queries are fast.

About speed: it is certainly not as fast as local file storage, maybe 10% slower, but it is fast enough to be used in real time, even when the image needs to be processed, which in our case depends heavily on PHP.

Maintenance and development time have gone down: deleting one or more images has become very easy, you just query the database with a simple delete command. Another interesting point: when we rebooted our old server with local file storage (so many files in thousands of folders), it would sometimes hang for hours while the system ran a file system integrity check (it really took hours ...). We no longer have this problem with GridFS; our images are now stored in MongoDB's large data files (2 GB each).

So, in my opinion, yes: GridFS is fast and reliable enough to be used in production.

+101
Apr 11 2018-11-11T00:

As already mentioned, it may not be as fast as a regular file system, but it gives you advantages over regular file systems that, in my opinion, are worth giving up a bit of speed for.

Eventually, with sharding, you may even reach a point where GridFS storage becomes faster than a regular file system on a single node.

+12
05 Sep '10 at 14:09

The mdirolf nginx-gridfs module is great and fairly easy to configure. We use it in production at paint.ly to serve all the paintings, and so far there have been no problems.

+5
Nov 26 '10 at 0:02

A caveat about repairs on larger databases: on a new system we are developing, mongo did not shut down cleanly, and repairing the 7 TB GridFS looks like it would take 130 hours.

Because of this, I think I will look at switching to OpenStack Swift or Ceph. Until then, though, it had been good. And the nginx-gridfs module is sweet.

+4
Feb 24 '14 at 23:00

I do not recommend using GridFS unless you know what you are doing. GridFS is just an abstraction layer that splits files into chunks and stores them in two collections. More files means more overhead. If you expect your files to be of similar size, not exceeding 32 MB or so, you are on the right track. Do not try to store large files in GridFS. Why?

  • Drivers in some languages may read the entire file (that is, all of its chunks) even when you only need a small part of it.
  • Changing a file can affect all of its chunks and increase database load. If your file store grows, you will have to consider sharding GridFS. Be careful! Consistency is not guaranteed while shards are being initialized!
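To make the "chunks in two collections" point concrete, here is a rough sketch of how a file might be laid out and which chunks a partial read actually needs. It does not talk to MongoDB. The helper names `split_into_chunks` and `chunks_for_range` are made up for illustration, and the 255 KiB chunk size is assumed to match the default of current drivers (older versions used a different default).

```python
CHUNK_SIZE = 255 * 1024  # assumed default chunk size; configurable in real drivers


def split_into_chunks(file_id, data, chunk_size=CHUNK_SIZE):
    """Build one metadata document plus a list of chunk documents.

    Mirrors the general shape of the two GridFS collections
    (one for file metadata, one for chunks); the field set is simplified.
    """
    files_doc = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    chunks = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunks


def chunks_for_range(start, length, chunk_size=CHUNK_SIZE):
    """Chunk numbers touched when reading `length` bytes from offset `start`."""
    if length <= 0:
        return []
    first = start // chunk_size
    last = (start + length - 1) // chunk_size
    return list(range(first, last + 1))
```

A driver that supports ranged reads only has to fetch the chunks returned by `chunks_for_range`; a driver that does not will pull every chunk of the file, which is the overhead the first bullet warns about.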

If you are planning a high-load project, consider storing files directly in documents (if the size is 16 MB or less), or choose another cluster file system and keep the file name / inode in your own application logic.
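The size rule above could be expressed as a small decision helper. This is a sketch under assumptions: `choose_storage` and the `HEADROOM` margin are hypothetical, and the 16 MiB figure is MongoDB's per-document size limit, which covers the whole document, so some room is reserved for the other fields.

```python
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # MongoDB per-document size limit
HEADROOM = 64 * 1024  # assumed margin for file name and other metadata fields


def choose_storage(size_in_bytes, headroom=HEADROOM):
    """Return "inline" if the file fits in a single document, else "external".

    "external" stands for whatever you pick for bigger files
    (GridFS or a separate cluster file system).
    """
    if size_in_bytes + headroom <= MAX_DOCUMENT_BYTES:
        return "inline"
    return "external"
```

For example, `choose_storage(200 * 1024)` picks inline storage for a 200 KiB thumbnail, while a file of exactly 16 MiB is sent to external storage because the metadata would push the document over the limit.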

Hope this helps.

+2
Feb 03 '13 at 18:33


