Stream (.NET) Handling Best Practices

The question is titled "stream" because the question below is a concrete example of a more general doubt I have about streams:

I have a problem with two possible approaches, and I want to know which is best:

  • I download the file, save it to disk (2 min), then read it and write the contents to the database (+2 min).
  • I download the file and write the contents directly to the database (3 min).

If the write to the database fails, I will have to download again in the second case, but not in the first.

What's better? What would you use?

+4
8 answers

To answer Jekke in more detail:

Depending on the file system, there are many failure cases (you must generate a valid file name, make sure the file system is not full, make sure the file can be opened and written by you but not by someone else, consider concurrent access, and so on).

The only advantage of writing to a file that I can think of is that you find out the download completed successfully before doing anything with the database. If you can keep the contents in memory, do that instead. If you can't, and you really insist on not touching the database when the download is interrupted, at least use the built-in .NET support to handle the tricky bits (such as IsolatedStorageFileStream).
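A minimal sketch of that suggestion (the class and method names here are my own, not from the answer): stage the download in isolated storage first, and only touch the database once the copy has completed.

```csharp
using System;
using System.IO;
using System.IO.IsolatedStorage;

static class DownloadStaging
{
    // Copies the source stream into isolated storage, so that a later
    // database failure does not force a second download.
    public static void StageDownload(Stream source, string fileName)
    {
        using (IsolatedStorageFile store =
                   IsolatedStorageFile.GetUserStoreForAssembly())
        using (var target = new IsolatedStorageFileStream(
                   fileName, FileMode.Create, FileAccess.Write, store))
        {
            source.CopyTo(target);
        }
    }

    // Reopens the staged file for the database import step.
    public static Stream OpenStaged(string fileName)
    {
        IsolatedStorageFile store = IsolatedStorageFile.GetUserStoreForAssembly();
        return new IsolatedStorageFileStream(
            fileName, FileMode.Open, FileAccess.Read, store);
    }
}
```

If the database write then fails, the staged copy can be replayed from `OpenStaged` without going back to the network.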

+2

Unless the increased latency actually kills you, I would usually choose option 1, as long as there is no good reason to keep the data off the file system (for example, security concerns, capacity, ...).

Or perhaps option 3, proposed by Max Schmeling: save to the file system at the same time as you write to the database.

Disk space is cheap, and it is often useful to keep a copy of the uploaded data (for example, to check against the database record after code changes, or as evidence of what the uploaded data contained ...).

+3

I would suggest that if the database write fails because of something in the contents of the file, it will always fail no matter how many times I try to write the same content. In that case, the only solution is to (fix and) re-download the file anyway. If the write fails because of something in the database itself, you have bigger problems than having to re-download the file.

Go with option 2.

+2

There is no reason why option 2 should take two minutes twice. As you download the file, you can pass it through in-memory variables on its way to the database.

If you have no good reason to store the file on the file system, in most cases I would go with option 2.
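A sketch of that single-pass idea (the helper name, buffer size, and the notion that the destination is a stream your database provider accepts are my assumptions, not from the answer): read the download in chunks and push each chunk straight on, so the whole file never sits in memory at once.

```csharp
using System.IO;

static class StreamRelay
{
    // Pumps bytes from the download stream to a destination (for example,
    // a stream exposed by your database layer) without buffering the
    // entire file; returns the number of bytes transferred.
    public static long Pump(Stream source, Stream destination, int bufferSize = 81920)
    {
        var buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```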

+1

I don't understand the qualifiers you added regarding the timing, or why you would need to download the file twice, but if the system is memory-bound, caching the download to disk and then sending it to the database may really be your only option (assuming your data provider can accept a stream).

EDIT: in the original post, the author describes writing directly to the database as a two-step process, which I believe should be: 1. load the file into a variable, 2. put the contents of the variable in the DB. If he streams directly to the database in option 2, I agree that that is the best way to go.

+1

I would go with option two. Failures should not happen very often, and when they do you can just retry. If for some reason you need a local copy on the file system, then don't download, save, read, and send to the database... just download and send to the database at the same time as you save to the file system.
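A rough sketch of that "write both at once" idea (the `TeeStream` class is my own illustration, not from the answer): a write-only stream that duplicates every write to two targets, e.g. a `FileStream` for the local copy and a stream bound for the database layer.

```csharp
using System;
using System.IO;

// Write-only stream that duplicates every write to two underlying streams.
class TeeStream : Stream
{
    private readonly Stream _first;
    private readonly Stream _second;

    public TeeStream(Stream first, Stream second)
    {
        _first = first;
        _second = second;
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        _first.Write(buffer, offset, count);
        _second.Write(buffer, offset, count);
    }

    public override void Flush() { _first.Flush(); _second.Flush(); }

    public override bool CanRead  { get { return false; } }
    public override bool CanSeek  { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length   { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}
```

Copying the download into a `TeeStream` then produces the disk copy and feeds the database in one pass.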

+1

I would choose option 3. Save it to disk and save the URI in the database. I have never been a fan of storing files in a database.

+1

I would say the option I describe in my blog post about blobstreams has not yet been mentioned (except in the comments): build a pipeline of stream-processing classes that downloads and interprets the required file as it arrives. Then have code read the interpreted records from this composite stream and perform the necessary inserts/updates in your database within one transaction (per file/record, according to your functional requirements).

This scenario uses the Stream classes, which means you never hold the whole file on disk or in memory at any point during processing. As you mentioned, downloading the file takes a few minutes, so it can be large. Can your system afford intermediate storage of the complete file (possibly several times over: memory and disk)? Even when several files are processed at the same time?

Also, if in practice you find the connection is not reliable enough and you would like to be able to temporarily store the downloaded file on disk and retry without downloading it again, that is easy to add. All you need is an extra Stream in the pipeline that checks whether the file is already present in a cache of "already downloaded files" (in some folder, in isolated storage, whatever) and returns the bytes from there rather than from the actual download Stream feeding your processing pipeline.
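As a sketch of that extra caching step (the folder layout and the download delegate are assumptions of mine; the answer only describes the idea): a helper that serves a readable stream from a local cache when a previous download completed, and otherwise downloads, caches, and then serves the file.

```csharp
using System;
using System.IO;

static class DownloadCache
{
    // Returns a readable stream for fileName: straight from the cache
    // folder if the file is already there, otherwise by invoking the
    // download delegate and caching the bytes first.
    public static Stream OpenCachedOrDownload(
        string cacheDir, string fileName, Func<Stream> download)
    {
        Directory.CreateDirectory(cacheDir);
        string path = Path.Combine(cacheDir, fileName);
        if (!File.Exists(path))
        {
            using (Stream remote = download())
            using (FileStream cache = File.Create(path))
            {
                remote.CopyTo(cache);
            }
        }
        return File.OpenRead(path);
    }
}
```

On a retry after a failed database transaction, the delegate is never invoked, so the file is not downloaded again.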

+1
