Listing over 10 million records from Oracle with C#

I have a database containing over 100 million records, and I am running a query that returns over 10 million of them. This process takes too long, so I need to reduce the time. I want to save the resulting records as a CSV file. How can I do this as quickly and efficiently as possible? I look forward to your suggestions. Thanks.

+8
performance c# oracle
4 answers

I assume your query is already restricted to the rows/columns you need and makes good use of indexing.

At this scale, the one critical thing is that you do not try to load it all into memory at once; so forget things like DataTable and most full-fat ORMs (which typically try to associate rows with an identity manager and/or change manager). You will need to use either the raw IDataReader (from DbCommand.ExecuteReader), or any API that builds an unbuffered iterator on top of that (there are several; I'm partial to dapper). For writing CSV, the raw data reader is probably fine.

Beyond that, you cannot make it much faster, since you are bandwidth-constrained. The only way to really speed things up is to create the CSV file on the database server itself, so that there is no network overhead.
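
A minimal sketch of that streaming approach, assuming the Oracle.ManagedDataAccess provider (the connection string and query below are placeholders, not real values):

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;
using Oracle.ManagedDataAccess.Client;

class CsvExport
{
    static void Main()
    {
        const string connStr = "...";                   // placeholder connection string
        const string sql = "SELECT * FROM big_table";   // placeholder query

        using var conn = new OracleConnection(connStr);
        conn.Open();

        using var cmd = new OracleCommand(sql, conn);
        cmd.FetchSize = 1024 * 1024;                    // larger fetch buffer, fewer round trips

        using var reader = cmd.ExecuteReader();         // unbuffered: streams row by row
        using var writer = new StreamWriter("export.csv", false, Encoding.UTF8);

        // Header row.
        var fields = new string[reader.FieldCount];
        for (int i = 0; i < reader.FieldCount; i++)
            fields[i] = Quote(reader.GetName(i));
        writer.WriteLine(string.Join(",", fields));

        // Data rows: only the current row is ever held in memory.
        while (reader.Read())
        {
            for (int i = 0; i < reader.FieldCount; i++)
                fields[i] = reader.IsDBNull(i)
                    ? ""
                    : Quote(Convert.ToString(reader.GetValue(i), CultureInfo.InvariantCulture));
            writer.WriteLine(string.Join(",", fields));
        }
    }

    // Quote a field and double any embedded double quotes (RFC 4180 style).
    static string Quote(string s) => "\"" + s.Replace("\"", "\"\"") + "\"";
}
```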

+11

Most likely you do not need to do this in C#. This is bulk data load/export territory (common in data warehousing scenarios).

Many (free) tools (I believe even Toad by Quest Software) will do this more efficiently and effectively than anything you could write yourself, on any platform.

I have a hunch that the end user does not actually need this (the simple observation being that a department secretary has no use for a 10-million-row CSV; it is too large to be useful that way).

I suggest using the right tool for the job. And whatever you do:

  • do not roll your own data type conversions
  • use CSV with quoted fields, and escape any embedded double quotes
  • mind regional settings (IOW: always use InvariantCulture for export/import!); see the sketch below
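
To make the last two points concrete, a small illustration (the values are hypothetical): a number formatted under a non-invariant culture can itself contain the field separator, and embedded quotes must be doubled:

```csharp
using System;
using System.Globalization;

class CultureDemo
{
    static void Main()
    {
        double value = 1234.5;
        Console.WriteLine(value.ToString(new CultureInfo("de-DE")));     // "1234,5" - breaks the CSV
        Console.WriteLine(value.ToString(CultureInfo.InvariantCulture)); // "1234.5" - safe

        // Embedded double quotes are escaped by doubling them.
        string field = "he said \"hi\"";
        Console.WriteLine("\"" + field.Replace("\"", "\"\"") + "\"");    // "he said ""hi"""
    }
}
```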
+5

"This process takes too much time, so I need to reduce this time."

This process consists of three subprocesses:

  • fetching the >10 million records from the database
  • writing the records to a file
  • transferring the records over the network (my assumption is that you are running a local client against a remote database)

Any or all of these could be the bottleneck. So if you want to reduce the total time, you first need to find out where that time is actually spent. You will probably need to instrument the C# code to gather the metrics.
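
A sketch of such instrumentation, timing the fetch and the write separately so you can see which sub-process dominates (the reader and writer are assumed to be set up as in the earlier sketch; the names are illustrative):

```csharp
using System;
using System.Data;
using System.Diagnostics;
using System.IO;

static class ExportTimer
{
    public static void Export(IDataReader reader, TextWriter writer)
    {
        var fetch = new Stopwatch();
        var write = new Stopwatch();
        long rows = 0;

        fetch.Start();
        while (reader.Read())          // time spent here is query + network
        {
            fetch.Stop();

            write.Start();             // time spent here is formatting + disk I/O
            var fields = new string[reader.FieldCount];
            for (int i = 0; i < reader.FieldCount; i++)
                fields[i] = Convert.ToString(reader.GetValue(i));
            writer.WriteLine(string.Join(",", fields));
            write.Stop();

            rows++;
            fetch.Start();
        }
        fetch.Stop();

        Console.WriteLine($"{rows} rows; fetch: {fetch.Elapsed}, write: {write.Elapsed}");
    }
}
```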

If it turns out that the query is the problem, then you will need to tune it. Indexes will not help here, since you are extracting a large chunk of the table (>10%); what will help is improving the performance of the full table scan, for example by increasing memory to avoid sorting to disk. Parallel query may be useful (if you have Enterprise Edition and enough CPUs). Also check that the problem is not a hardware issue (spindle contention, dodgy interconnects, etc.).

Could writing to the file be the problem? Perhaps your disk is slow for some reason (for example, fragmentation), or perhaps you are contending with other processes writing to the same directory.

Transferring large amounts of data over the network is obviously a potential bottleneck. Are you certain you are sending only the finished data to the client?

An alternative architecture: use PL/SQL to write the records to a file on the database server, using bulk collection to fetch manageable batches of records, and then transfer the file to wherever you need it at the end, for instance via FTP, perhaps compressing it first.
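
From the C# side, that architecture reduces to a single call. A sketch, assuming a hypothetical server-side procedure export_pkg.dump_to_file (imagined here as using BULK COLLECT and UTL_FILE against an Oracle directory object EXPORT_DIR; none of these names come from the question):

```csharp
using System.Data;
using Oracle.ManagedDataAccess.Client;

class ServerSideExport
{
    static void Main()
    {
        using var conn = new OracleConnection("...");    // placeholder connection string
        conn.Open();

        using var cmd = new OracleCommand("export_pkg.dump_to_file", conn)
        {
            CommandType = CommandType.StoredProcedure,
            CommandTimeout = 0                           // long-running job: disable the timeout
        };
        cmd.Parameters.Add("p_directory", OracleDbType.Varchar2).Value = "EXPORT_DIR";
        cmd.Parameters.Add("p_filename", OracleDbType.Varchar2).Value = "export.csv";
        cmd.ExecuteNonQuery();

        // The rows never cross the network; afterwards, pull export.csv from
        // the server (for instance via FTP/SFTP), ideally compressed first.
    }
}
```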

+2

The real question is why you need to read so many rows from the database (and such a large fraction of the underlying dataset). There are many approaches that should make this scenario unnecessary; asynchronous processing, message queuing, and pre-consolidation are the obvious ones.

Leaving that aside for a moment... if you are consolidating or sifting the data, then implementing the bulk of the logic in PL/SQL saves having to haul the data across the network (even if it is just to localhost, there is still a large overhead). On the other hand, if you just want to dump it out to a flat file, implementing it in C# is not buying you anything.

+1