Database vs flat files

The company I work with is trying to switch a product from a flat file format to a database format. We process fairly large data files (e.g. 25 GB per file), and they are updated very frequently. We need to run queries that access the data randomly as well as in a contiguous way. I am trying to convince them of the advantages of using a database, but some of my colleagues seem reluctant. So I was wondering if you could help me with some reasons, or links to posts, on why we should use a database, or at least explain why flat files are better (if they are).

+70
database file
Mar 01
10 answers
  • Databases can handle querying, so you don't have to walk over files manually. Databases can handle very complex queries.
  • Databases can handle indexing, so tasks like retrieving the record with id = x can be VERY fast.
  • Databases can handle multi-process / multi-threaded access.
  • Databases can handle access over the network.
  • Databases can enforce data integrity.
  • Databases can update data easily (see the first point).
  • Databases are reliable.
  • Databases can handle transactions and concurrent access.
  • Databases + ORMs let you manipulate data in a very programmer-friendly way.
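
A few of the points above (querying, indexing, integrity) can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the `records` table, its columns, and the sample data are made up and are not the asker's actual format.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    "id INTEGER PRIMARY KEY, sensor TEXT NOT NULL, value REAL)"
)
# Secondary index so lookups by sensor do not scan the whole table.
conn.execute("CREATE INDEX idx_records_sensor ON records (sensor)")

rows = [(i, f"sensor-{i % 10}", i * 0.5) for i in range(1000)]
with conn:  # commits on success, rolls back if an exception is raised
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)

# Point lookup by primary key (the "id = x" case).
row = conn.execute(
    "SELECT sensor, value FROM records WHERE id = ?", (42,)
).fetchone()

# Ad hoc aggregate query; no hand-written file walking needed.
count, avg = conn.execute(
    "SELECT COUNT(*), AVG(value) FROM records WHERE sensor = ?",
    ("sensor-3",),
).fetchone()

# Integrity: a duplicate primary key is rejected by the database.
try:
    with conn:
        conn.execute("INSERT INTO records VALUES (42, 'dup', 0.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

With a flat file, each of these three behaviours (keyed lookup, aggregation, duplicate detection) would have to be written and maintained by hand.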
+84
Mar 01 '10 at 15:37

This is the answer I gave some time ago:

It depends entirely on the needs of the application. Often, direct access to text or binary files can be extremely fast and efficient, and it also gives you all the file access capabilities of your OS's file system.

In addition, your programming language most likely already has a built-in module (or one is easy to write) for the specific parsing you need.

If what you need is many appends (INSERTs?), sequential access with few reads, and little or no concurrency, files are the way to go.
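
The append-heavy, sequential case can be sketched in a few lines of Python. The `id,value` line format and the helper names `append_record` and `sum_values` are assumptions made for illustration only.

```python
import os
import tempfile


def append_record(path, record_id, value):
    """Append one 'id,value' line; appends are cheap on a flat file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{record_id},{value}\n")


def sum_values(path):
    """Scan the file sequentially, one line at a time, without loading it all."""
    total = 0.0
    with open(path, encoding="utf-8") as f:
        for line in f:
            _, value = line.rstrip("\n").split(",")
            total += float(value)
    return total


# Hypothetical log file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "log.csv")
for i in range(5):
    append_record(path, i, i * 1.5)

total = sum_values(path)
```

This stays simple as long as access is sequential. Finding a single record by id still means scanning from the start, which is where an index begins to pay off.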

On the other hand, when your requirements include concurrency, non-sequential reads and writes, atomicity, or atomic permissions, or when your data is relational by nature, you will be better off with a relational or object-oriented database.

A lot can be accomplished with SQLite3, which is extremely lightweight (under 300 KB), ACID compliant, written in C, and ubiquitous (if it is not already included with your programming language, as it is with Python for example, there is surely a binding available). It can be useful even on database files as large as 140 terabytes, or 128 tebibytes (a link to the database size), possibly more.
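
As a rough sketch of what ACID behaviour looks like in practice, the following uses Python's built-in `sqlite3` module against a single database file. The `accounts` table and the `transfer` helper are hypothetical and exist only to show a failed transaction being rolled back as a whole.

```python
import os
import sqlite3
import tempfile

# The whole database lives in one ordinary file (hypothetical path).
db_path = os.path.join(tempfile.mkdtemp(), "example.db")
conn = sqlite3.connect(db_path)
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)]
    )


def transfer(conn, src, dst, amount):
    """Move funds atomically: both updates apply, or neither does."""
    with conn:  # commit on success, roll back on any exception
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )


try:
    transfer(conn, "a", "b", 500)  # violates the CHECK constraint on 'a'
except sqlite3.IntegrityError:
    pass  # the credit to 'b' was rolled back along with the failed debit

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
conn.close()
```

Getting the same all-or-nothing guarantee with hand-rolled file updates usually means writing your own journaling or write-to-temp-then-rename logic.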

If your requirements were greater than that, there would not even be a discussion: go for a full-fledged RDBMS.

Since you say in a comment that the "system" is just a bunch of scripts, you should take a look at pgbash.

+39
Mar 01 '10 at 15:53

Don't build it if you can buy it.

I heard this quote recently, and it really seems fitting here. Ask yourself this: how much time has been spent on the file-processing part of your application? I suspect that a fair amount of time went into optimizing that code for performance. If you had been using a relational database all along, you would have spent considerably less time on that part of your application, and you would have had more time for its true "business" aspect.

+6
Mar 01 '10 at 15:41

They are faster; unless you load the entire flat file into memory, a database will give you faster access in almost all cases.

They are safer; databases are easier to back up safely, and they have mechanisms for detecting file corruption that flat files lack. Once corruption in your flat file migrates to your backups, you are done for, and you might not even know it yet.

They have more features; databases can allow many users to read and write at the same time.

They are much less complex to work with once they are set up.

+5
Mar 01 '10 at 15:49

Databases, all the way.

However, if you still need to store files and do not have the capacity to take on a new RDBMS (e.g. Oracle, SQL Server, etc.), then look into XML.

XML is a structured file format that lets you store things as a file while still giving you the ability to query the file and the data inside it. XML files are easier to read than flat files, and they can easily be transformed with XSLT for even better readability. XML is also a great way to transport data around if you must.
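
As a small illustration of querying an XML document, here is a sketch using Python's standard `xml.etree.ElementTree` module and its limited XPath support. The `<records>` layout and the attribute names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document layout; the element and attribute names are made up.
xml_text = """
<records>
  <record id="1" sensor="a"><value>1.5</value></record>
  <record id="2" sensor="b"><value>2.5</value></record>
  <record id="3" sensor="a"><value>4.0</value></record>
</records>
"""

root = ET.fromstring(xml_text)

# Select every <record> whose sensor attribute is "a" and read its <value>.
values_a = [
    float(record.findtext("value"))
    for record in root.findall("./record[@sensor='a']")
]
```

Note that `ET.fromstring` builds the whole tree in memory. For files too large for that, `ET.iterparse` processes elements as a stream instead.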

I highly recommend a database, but if you can't go that route, XML is a good second choice.

+4
Mar 01

What about a non-relational (NoSQL) database such as Amazon SimpleDB, Tokyo Cabinet, etc.? I have heard that Google, Facebook, and LinkedIn use them to store their huge data sets.

Can you tell us whether your data is structured, whether your schema is fixed, whether you need easy replication, whether access time is important, etc.?

+3
Mar 01

You have not mentioned what types of files these are. If they are media files, stick with flat files. Perhaps you just need a database for tags and some way to associate the "external BLOBs" with records in the database. But if you need full-text search, there is no way around moving to a full database.

Another thing to consider is that your file system may impose a ceiling on the number of physical files.
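
The "database for tags plus external BLOBs" idea could look roughly like this with Python's `sqlite3` module. The schema, the file paths, and the helpers `add_file` and `paths_with_tag` are assumptions for illustration. The media files stay on disk and only their paths and tags go into the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: files are referenced by path, never stored as BLOBs.
conn.executescript(
    """
    CREATE TABLE files (
        id   INTEGER PRIMARY KEY,
        path TEXT NOT NULL UNIQUE
    );
    CREATE TABLE tags (
        file_id INTEGER NOT NULL REFERENCES files(id),
        tag     TEXT NOT NULL,
        PRIMARY KEY (file_id, tag)
    );
    """
)


def add_file(conn, path, tags):
    """Register an external file and its tags in one transaction."""
    with conn:
        cur = conn.execute("INSERT INTO files (path) VALUES (?)", (path,))
        conn.executemany(
            "INSERT INTO tags VALUES (?, ?)",
            [(cur.lastrowid, tag) for tag in tags],
        )


def paths_with_tag(conn, tag):
    """Return the paths of all files carrying the given tag, sorted."""
    rows = conn.execute(
        "SELECT f.path FROM files f "
        "JOIN tags t ON t.file_id = f.id "
        "WHERE t.tag = ? ORDER BY f.path",
        (tag,),
    )
    return [row[0] for row in rows]


add_file(conn, "/media/clip1.mp4", ["video", "raw"])
add_file(conn, "/media/clip2.mp4", ["video"])
add_file(conn, "/media/photo.jpg", ["image"])

found = paths_with_tag(conn, "video")
```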

+3
Mar 01 '10 at 15:47

SQL's ad hoc query capabilities are reason enough for me. With a good schema and proper indexing on the tables, queries are fast and efficient.

+2
Mar 01 '10 at 15:36

If you load the files into memory every time you start up, use a database. It is as simple as that.

This assumes that your colleagues already have a program for handling queries against the files. If not, use a database.

+2
Apr 08 '13 at 6:31

The differences between a database and flat files are as follows:

  • A database provides more flexibility, while a flat file provides less.
  • A database system ensures data consistency, while a flat file cannot.
  • A database is more secure than flat files.
  • A database supports DML and DDL, while flat files do not.
  • A database has less data redundancy, while flat files have more.

+2
Dec 25 '17 at 4:55


