Serve text assets from the file system instead of the database?

I am working on a content management application in which the data stored in the database is extremely varied. In this particular case, a container has many resources, and each resource maps to some kind of digital asset, be it an image, a movie, a downloaded file, or even plain text.

I’ve been arguing with a colleague for a week because, in addition to storing images and the like on the file system, they would like to store text assets there too, and have the application look up the file location (from the database) and read in the text file (from the file system) before responding to the client application.

Common sense seemed to scream at me that this was ridiculous: if we are looking something up in the database anyway, we might as well store the text in a database column and serve it along with the row lookup. Database lookup + file I/O sounded considerably slower than a database lookup alone. After some back and forth, I decided to run some tests and found the results a bit surprising. There seems to be very little consistency in the benchmark times. The only clear winner in the tests was pulling a large set of rows from the database and iterating over the results to display each text asset. Pulling objects out of the database one at a time and displaying their text content, however, seems to be neck and neck between the two approaches.

Now, I know the limitations of running benchmarks, and I'm not sure I am even running the "tests" correctly (for example, writes to the file system are ridiculously faster than writes to the database, which I did not know!). I suppose my question is a request for confirmation. Is file I/O comparable to storing and retrieving text in a database? Am I missing part of the argument here? Thanks in advance for your opinions and advice!

A quick rundown of what I'm using: this is a Ruby on Rails application running on Ruby 1.8.6 with Sqlite3. I plan on moving the same code base to MySQL tomorrow to see whether the benchmarks come out the same.
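The comparison described above can be sketched roughly as follows. This is not the asker's Rails benchmark: it is a minimal stand-in written in Python (chosen only because its standard library bundles SQLite), and the table name `assets`, the column names, the asset count, and the text size are all assumptions made for illustration.

```python
import os
import sqlite3
import tempfile
import timeit


def build_fixture(root, n_assets=200, text_size=20_000):
    """Create a SQLite file that holds every text twice: inline and as a file path."""
    conn = sqlite3.connect(os.path.join(root, "assets.db"))
    conn.execute(
        "CREATE TABLE assets (id INTEGER PRIMARY KEY, body TEXT, path TEXT)"
    )
    for asset_id in range(1, n_assets + 1):
        body = f"asset {asset_id} " + "x" * text_size
        path = os.path.join(root, f"asset_{asset_id}.txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(body)
        conn.execute(
            "INSERT INTO assets (id, body, path) VALUES (?, ?, ?)",
            (asset_id, body, path),
        )
    conn.commit()
    return conn


def read_inline(conn, asset_id):
    """Option A: the text comes back with the row lookup."""
    row = conn.execute(
        "SELECT body FROM assets WHERE id = ?", (asset_id,)
    ).fetchone()
    return row[0]


def read_via_file(conn, asset_id):
    """Option B: look up the path in the database, then read the file."""
    row = conn.execute(
        "SELECT path FROM assets WHERE id = ?", (asset_id,)
    ).fetchone()
    with open(row[0], encoding="utf-8") as fh:
        return fh.read()


def run_benchmark(n_assets=200, repeats=5):
    """Return the best-of-N wall time in seconds for each option."""
    with tempfile.TemporaryDirectory() as root:
        conn = build_fixture(root, n_assets)
        ids = range(1, n_assets + 1)
        inline = min(timeit.repeat(
            lambda: [read_inline(conn, i) for i in ids],
            number=1, repeat=repeats))
        via_file = min(timeit.repeat(
            lambda: [read_via_file(conn, i) for i in ids],
            number=1, repeat=repeats))
        conn.close()
    return inline, via_file
```

Calling `run_benchmark()` returns a pair of timings to compare. On a warm run both the database file and the text files are likely to be served from the operating system's page cache, which may be one reason single-object lookups look neck and neck; the numbers will depend heavily on text size, hardware, and database engine.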

+4
4 answers

I think your test results will depend on how you store the text data in your database. If you store it as a LOB, then behind the scenes it is stored in a regular file. With any kind of LOB, you pay for a database lookup plus file I/O anyway.

VARCHAR is stored in a table space

Conventional text data types (VARCHAR et al.) are very limited in size in typical relational database systems: something like 2000 or 4000 characters (Oracle), sometimes 8000 or even 65536. Some databases support long text types, but they have serious flaws and are not recommended.

LOBs are file system object references

If your text is larger, you need to use the LOB data type (e.g. CLOB in Oracle).

LOBs usually work as follows: A database only stores a reference to a file system object. A file system object contains data (for example, text data). This is very similar to what your colleague suggests, except that the DBMS does the hard work of managing links and files.

Bottom line: if you can store your text in a VARCHAR, then go for it. If not, you have two options: use a LOB, or store the data in a file that is referenced from the database. Both are technically similar, and both are slower than using VARCHAR.
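Whether the text fits in a VARCHAR depends on the engine. As a small sketch for the Sqlite3 setup mentioned in the question only: SQLite does not enforce the declared length of a VARCHAR column, so a long text can be stored inline there. The table and column names below are made up for illustration, and other engines such as MySQL or Oracle do enforce declared lengths, so this check says nothing about them.

```python
import sqlite3


def store_long_text(n_chars=100_000):
    """Insert a text far longer than the declared VARCHAR(10) and read it back."""
    conn = sqlite3.connect(":memory:")
    # SQLite treats VARCHAR(10) as a column with TEXT affinity; the 10 is ignored.
    conn.execute("CREATE TABLE assets (id INTEGER PRIMARY KEY, body VARCHAR(10))")
    conn.execute("INSERT INTO assets (id, body) VALUES (1, ?)", ("x" * n_chars,))
    stored_len, stored_type = conn.execute(
        "SELECT length(body), typeof(body) FROM assets WHERE id = 1"
    ).fetchone()
    conn.close()
    return stored_len, stored_type
```

`store_long_text()` reports the stored length and storage type, which makes it easy to confirm that nothing was truncated before deciding that a LOB or an external file is needed.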

+1

The main benefit you get from not using the file system is that the database will control concurrent access. Suppose that two processes need to change the same text at the same time: synchronizing through the file system can lead to race conditions, whereas you will have no such problems with everything in the database.
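A minimal sketch of that point, assuming SQLite and a hypothetical `assets` table with a single row: several threads each perform a read-modify-write on the same text, and the transaction serializes them so no edit is lost. Doing the same read-modify-write against a plain file would require the application to bring its own locking.

```python
import os
import sqlite3
import tempfile
import threading


def append_line(db_path, line):
    """Read-modify-write of the shared text inside one write transaction."""
    # isolation_level=None lets us issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(db_path, timeout=30, isolation_level=None)
    try:
        # BEGIN IMMEDIATE takes the write lock up front, so the SELECT below
        # cannot see a value that another writer is about to replace.
        conn.execute("BEGIN IMMEDIATE")
        (body,) = conn.execute("SELECT body FROM assets WHERE id = 1").fetchone()
        conn.execute(
            "UPDATE assets SET body = ? WHERE id = 1", (body + line + "\n",)
        )
        conn.execute("COMMIT")
    finally:
        conn.close()


def concurrent_appends(n_threads=4, per_thread=10):
    """Run concurrent writers and return the lines that ended up in the text."""
    with tempfile.TemporaryDirectory() as root:
        db_path = os.path.join(root, "assets.db")
        setup = sqlite3.connect(db_path)
        setup.execute("CREATE TABLE assets (id INTEGER PRIMARY KEY, body TEXT)")
        setup.execute("INSERT INTO assets (id, body) VALUES (1, '')")
        setup.commit()
        setup.close()

        def worker(thread_id):
            for n in range(per_thread):
                append_line(db_path, f"thread {thread_id} edit {n}")

        threads = [
            threading.Thread(target=worker, args=(t,)) for t in range(n_threads)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        check = sqlite3.connect(db_path)
        (body,) = check.execute("SELECT body FROM assets WHERE id = 1").fetchone()
        check.close()
    return body.splitlines()
```

Every edit made by every thread should appear exactly once in the returned list. The order of the lines is not deterministic, only their presence.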

+3

I have done this before. It is a mess: you have to constantly maintain both the file system and the database, so the programming becomes more complicated than you expected. My advice is to go with either an all-file-system solution or an all-database solution, depending on the data. Notably, if you need a lot of querying and searching on conditional data, go with the database; otherwise go with the file system. Please note: a database may not be optimized for storing large binary files. Still, remember that if you use both, you will have to keep them synchronized, and that does not make for an elegant or pleasant (to program) solution. Good luck.

0

At least if your concerns come from the performance side, you could use a "NoSQL" solution such as Redis (for example through Ohm) or CouchDB...

0
