I would say that the option described in my blog post about blobstreams has not been mentioned yet (except in comments): build a pipeline of streams that downloads and parses the required file on the fly. Your code then reads the parsed records from the end of this composite stream and performs the necessary inserts/updates in your database within a transaction (one per file or per record, according to your functional requirements).
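A minimal sketch of the idea in Python (the answer itself names no specific platform; the line-delimited record format, the `records` table, and the SQLite database are assumptions for illustration only):

```python
import io
import sqlite3

def parse_records(stream, delimiter=b"\n"):
    """Lazily yield one record at a time from a binary stream.

    Only up to one record plus one read chunk is buffered in memory;
    the whole file is never held at once.
    """
    buf = b""
    while True:
        chunk = stream.read(8192)
        if not chunk:
            if buf:
                yield buf
            return
        buf += chunk
        while delimiter in buf:
            record, buf = buf.split(delimiter, 1)
            yield record

def load_into_db(stream, conn):
    """Read parsed records from the pipeline and insert them in one
    transaction per file (commit on success, roll back on any error)."""
    with conn:
        for record in parse_records(stream):
            conn.execute("INSERT INTO records(payload) VALUES (?)", (record,))

# Usage with an in-memory stand-in for the downloaded blob stream:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records(payload BLOB)")
load_into_db(io.BytesIO(b"a\nb\nc"), conn)
```

In a real setup the `io.BytesIO` stand-in would be replaced by the response stream of the actual download, and the transaction scope (per file versus per record) would follow your functional requirements.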
In this scenario only the Stream classes are involved, which means the complete file never exists on disk or in memory at any single moment during processing. As you mentioned, downloading a file takes a few minutes, so it can be large. Can your system afford to stage a complete file in intermediate storage (possibly several copies: in memory and on disk)? Even when several files are processed at the same time?
In addition, if in practice the connection turns out not to be reliable enough and you would like to keep the downloaded file on disk temporarily, so that a retry does not have to download it again, that is easy to add. All you need is one extra Stream in the pipeline that checks whether the file is already present in a cache of "already downloaded files" (in some folder, in isolated storage, whatever) and, if so, feeds those bytes into your processing pipeline instead of the actual downloading Stream.
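One way to sketch that extra caching Stream, again in Python (the class and function names here are hypothetical, not from the original answer; writing to a `.partial` file that is renamed on completion is one assumed strategy to avoid caching half-downloaded files):

```python
import io
import os

class CachingStream(io.RawIOBase):
    """Tee stream: forwards reads from `source` into the pipeline while
    also writing the bytes to a cache file for later retries."""
    def __init__(self, source, cache_path):
        self._source = source
        self._cache_path = cache_path
        # Write to a temporary name first so an aborted download
        # never looks like a valid cached file.
        self._cache = open(cache_path + ".partial", "wb")

    def readable(self):
        return True

    def readinto(self, b):
        data = self._source.read(len(b))
        if not self._cache.closed:
            self._cache.write(data)
            if not data:  # EOF: promote the partial file to the cache
                self._cache.close()
                os.replace(self._cache_path + ".partial", self._cache_path)
        b[:len(data)] = data
        return len(data)

def open_with_cache(cache_path, download):
    """Return a stream for the pipeline: the cached copy if it exists,
    otherwise the live download wrapped in a CachingStream."""
    if os.path.exists(cache_path):
        return open(cache_path, "rb")
    return CachingStream(download(), cache_path)
```

The rest of the pipeline does not change at all: it just reads from whatever stream `open_with_cache` returns, and on a retry the bytes come from disk instead of the network.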