Slow performance when writing a large binary file

In one of our programs, we create records and save them to a binary file. Once the write is complete, we read this binary file back. The problem is that as long as the file is under 100 MB, performance is good enough, but as soon as the file gets larger, performance drops.

So I was thinking of splitting this large binary file (> 100 MB) into smaller ones (< 100 MB). But it turns out this does not improve performance. So what would be the best approach to this scenario?

It would be a great help if you guys could comment on this.

thanks

+6
c++
4 answers

Perhaps you could use SQLite instead.
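
A minimal sketch of what that could look like, assuming each record is an id plus a binary payload; the table name and record layout here are made up for illustration, and the key point is batching the inserts in one transaction:

    // Sketch: storing binary records in SQLite instead of a flat file.
    // "records", id + blob payload is an assumed layout for illustration.
    #include <sqlite3.h>
    #include <vector>

    int main() {
        sqlite3* db = nullptr;
        if (sqlite3_open("records.db", &db) != SQLITE_OK) return 1;

        sqlite3_exec(db,
            "CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, payload BLOB);",
            nullptr, nullptr, nullptr);

        // Wrapping many inserts in one transaction is what keeps this fast.
        sqlite3_exec(db, "BEGIN;", nullptr, nullptr, nullptr);

        sqlite3_stmt* stmt = nullptr;
        sqlite3_prepare_v2(db, "INSERT INTO records (id, payload) VALUES (?, ?);",
                           -1, &stmt, nullptr);

        for (int id = 0; id < 100000; ++id) {
            std::vector<char> payload(1024, 'x');   // dummy record data
            sqlite3_bind_int(stmt, 1, id);
            sqlite3_bind_blob(stmt, 2, payload.data(), (int)payload.size(), SQLITE_TRANSIENT);
            sqlite3_step(stmt);
            sqlite3_reset(stmt);
        }
        sqlite3_finalize(stmt);

        sqlite3_exec(db, "COMMIT;", nullptr, nullptr, nullptr);
        sqlite3_close(db);
        return 0;
    }

Reading back individual records then becomes an indexed SELECT by id rather than a scan of a large flat file.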

+4

It is always difficult to give accurate answers without seeing the actual system, but have you actually tried measuring the real throughput?
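
Something as simple as the following can tell you whether the disk, rather than your application logic, is the bottleneck. It assumes plain sequential writes to a local disk; adjust the sizes to match your files, and note that the OS cache can inflate the numbers unless the test file is larger than RAM:

    // Rough sequential write throughput check (assumption: local disk, buffered I/O).
    #include <chrono>
    #include <cstdio>
    #include <vector>

    int main() {
        const std::size_t chunk = 1 << 20;      // 1 MiB per write
        const std::size_t total_mb = 512;       // adjust to match your file sizes
        std::vector<char> buf(chunk, 0);

        std::FILE* f = std::fopen("throughput.bin", "wb");
        if (!f) return 1;

        auto start = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < total_mb; ++i)
            std::fwrite(buf.data(), 1, chunk, f);
        std::fflush(f);
        std::fclose(f);
        auto end = std::chrono::steady_clock::now();

        double secs = std::chrono::duration<double>(end - start).count();
        std::printf("wrote %zu MiB in %.2f s -> %.1f MiB/s\n",
                    total_mb, secs, total_mb / secs);
        return 0;
    }
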

As a first solution, I would simply recommend using a dedicated drive (so that there are no concurrent read/write operations from other processes), and a fast one. That way the only cost is a small hardware upgrade, and we all know hardware is usually cheaper than software ;) You could even go for a RAID controller for maximum throughput.

If you are still limited by disk bandwidth, there are newer Flash-based options: USB keys (though this may not look very professional), or the "new" solid-state drives, can provide more bandwidth than mechanical drives.

Now, if the disk approach is not fast enough, or you cannot get good SSDs, you have other solutions, but they involve software changes, and I am suggesting them off the top of my head.

  • Socket approach: a second utility listens on a port and you send the data there. On the local machine this is relatively fast, and you also parallelize the work, so even as the data grows you can still start processing fairly quickly.
  • Memory-mapping approach: write to a dedicated area of memory and have the reading utility read from that area (Boost.Interprocess can help; there are other solutions too). A minimal writer-side sketch follows this list.
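
Here is what the writer side of the memory-mapping idea could look like with Boost.Interprocess, as mentioned above. The segment and area names ("RecordSegment", "RecordArea") and the sizes are assumptions for illustration, and a real setup would also need synchronization (e.g. an interprocess mutex/condition) between writer and reader:

    // Writer-side sketch using Boost.Interprocess managed shared memory.
    #include <boost/interprocess/managed_shared_memory.hpp>
    #include <boost/interprocess/shared_memory_object.hpp>
    #include <cstring>

    namespace bip = boost::interprocess;

    int main() {
        bip::shared_memory_object::remove("RecordSegment");
        bip::managed_shared_memory segment(bip::create_only, "RecordSegment",
                                           64 * 1024 * 1024);   // 64 MB segment

        // Reserve a raw byte area the reading utility can attach to by name.
        char* area = segment.construct<char>("RecordArea")[1024 * 1024](0);

        const char record[] = "example record bytes";
        std::memcpy(area, record, sizeof(record));
        // The reader process locates the same area via segment.find<char>("RecordArea").
        return 0;
    }
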

Note that if the reading is sequential, I find it more "natural" to try a "pipe" approach (à la Unix) so that both processes run at the same time. With a traditional pipe, the data may never hit the disk at all.
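
For example, the producer could stream length-prefixed records to stdout and the reading utility consume them from stdin, run as ./producer | ./consumer; the record format here is an assumption for illustration:

    // Pipe-style producer sketch: stream records to stdout, let the shell pipe
    // connect it to the consumer so both processes run concurrently.
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    int main() {
        for (std::uint32_t id = 0; id < 1000; ++id) {
            std::vector<char> payload(4096, 'x');               // dummy record
            std::uint32_t len = (std::uint32_t)payload.size();
            std::fwrite(&len, sizeof(len), 1, stdout);          // length prefix
            std::fwrite(payload.data(), 1, payload.size(), stdout);
        }
        return 0;
    }
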

It is a shame, isn't it, that in this age of overwhelming computing power we are still struggling with disk I/O?

+1

If your application reads the data sequentially, migrating to a database will not improve performance. If random access is used, you should consider moving the data to a database, especially if different indexes are used. You should also check whether enough resources are available when the data is fully loaded into memory: virtual memory management (swapping, paging) can affect performance, and depending on your OS configuration a limit on file buffers may be reached. The file system itself can be fragmented. To get a better answer, you should provide information about the hardware, OS, memory and file system, and about how your data file is used. That way you can also get tips on kernel tuning, etc.

0

So what is the lookup mechanism here? How does your application know which of the smaller files to search for a record? If you split the large file without implementing some form of keyed access (indexing, partitioning), you have not solved the problem, just restated it.

Of course, if you have implemented some form of indexing, then you have started down the path of building your own database.
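
To make that concrete, "some form of indexing" for split flat files could be as little as an in-memory map from record id to (file, offset, size), so a lookup opens exactly one small file and seeks straight to the record. The names and the record-id type below are assumptions for illustration:

    // Sketch of a key -> location index over several smaller record files.
    #include <cstdint>
    #include <cstdio>
    #include <string>
    #include <unordered_map>
    #include <vector>

    struct RecordLocation {
        std::string file;     // which of the smaller files holds the record
        long        offset;   // byte offset of the record inside that file
        std::size_t size;     // record size in bytes
    };

    // Built while writing the records out.
    std::unordered_map<std::uint64_t, RecordLocation> g_index;

    std::vector<char> read_record(std::uint64_t id) {
        const RecordLocation& loc = g_index.at(id);   // throws if id is unknown
        std::vector<char> buf(loc.size);
        std::FILE* f = std::fopen(loc.file.c_str(), "rb");
        if (f) {
            std::fseek(f, loc.offset, SEEK_SET);
            std::fread(buf.data(), 1, buf.size(), f);
            std::fclose(f);
        }
        return buf;
    }

Which is exactly the point: once you are maintaining such an index yourself, a database is doing the same job for you, with durability and concurrency handled.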

Without knowing more about your application, it would be reckless of us to advise. Perhaps an RDBMS is the solution. Perhaps a NoSQL approach would be better. Perhaps you need a text indexing and search engine.

So...

How often does your application retrieve records? How does it decide which records to get? What is your definition of bad performance? Why did you (your project) decide to use flat files rather than a database in the first place? What kind of records are we talking about?

0
