How to speed up batch processing in ColdFusion?

One day I received a large data file that my client uploads and that needs to be processed via CFML. The problem is that if I put the processing code on a CF page, it times out after 120 seconds. I managed to move the processing code into a CFC, where there did not seem to be a timeout problem; however, at some point during processing it causes ColdFusion to crash and the server needs to be restarted. Each line of the file I am working with (8,000+ lines) requires a number of database queries (5 or more, a mixture of updates and selects), as well as other logic written in CFML.

My question is: what would be the best way to process this file? One caveat: I cannot move the file to the database server and process it entirely in the database. Would it be more efficient to pass each line to a stored procedure that takes care of everything? There would still be a lot of database calls, but nothing compared to what I have now. Also, what would be the best way to give the user feedback on how much of the file has been processed?

Edit: I am running CF 6.1

+4
source
7 answers

I just did something very similar, and I use CF for data processing like this often.

1) Maintain a file upload table (parent table). For each file that is uploaded, keep a record of the file and its status (uploaded, unprocessed, processed).

2) Temp table to store all the lines of the data file (child table). Import the entire data file into this temporary table; trying to do it all in memory will inevitably lead to errors. Each row in this table refers back to an entry in the file upload table above.

3) Track processing status. For each line of the data file that you import, set a "processed/unprocessed" flag. That way, if the job breaks, you can pick up where you left off. As you work through each line, mark it "processed".

4) Transactions. Use cftransaction, if possible around the whole run, or at least around one line at a time (with its 5 queries). That way, if something blows up, you are not left with a row of data that is half calculated/processed/updated/tested.

5) Once processing is finished, set the file's entry in the table from step 1 to "processed".

With the above approach, if something fails you can restart from where it stopped, or at the very worst you have a clear starting point for investigating a failure in your data. You also have a clear way to show the user the status of the current upload: how far processing has gotten, and where it stopped if there was an error.
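The steps above can be sketched in CFML roughly like this (a minimal sketch only; the table and column names — file_upload, file_line, status, upload_id, line_id — and the datasource are hypothetical, made up for illustration):

```cfml
<!--- Sketch of steps 2-5: pull unprocessed child rows, process each
      in its own transaction, then mark the parent record done. --->
<cfquery name="qLines" datasource="#myDSN#">
    SELECT  line_id, line_text
    FROM    file_line
    WHERE   upload_id = <cfqueryparam value="#uploadID#" cfsqltype="cf_sql_integer">
    AND     status = 'unprocessed'
</cfquery>

<cfloop query="qLines">
    <!--- One transaction per line: all 5 queries succeed, or none do --->
    <cftransaction>
        <!--- ...your 5 updates/selects against qLines.line_text go here... --->
        <cfquery datasource="#myDSN#">
            UPDATE  file_line
            SET     status = 'processed'
            WHERE   line_id = <cfqueryparam value="#qLines.line_id#" cfsqltype="cf_sql_integer">
        </cfquery>
    </cftransaction>
</cfloop>

<!--- Step 5: mark the parent upload record as done --->
<cfquery datasource="#myDSN#">
    UPDATE  file_upload
    SET     status = 'processed'
    WHERE   upload_id = <cfqueryparam value="#uploadID#" cfsqltype="cf_sql_integer">
</cfquery>
```

For user feedback, a separate status request can simply count processed vs. unprocessed rows in the child table for that upload and report a percentage.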

If you have any questions, let me know.

Other thoughts:

  • You can increase timeouts, give the JVM more memory, and move to 64-bit, but all of these only raise your system's capacity. They are fine as a stopgap, and worth doing in combination with the above.

  • Java has some neat file-handling libraries that are available as CFCs. If you run into a lot of speed problems, you can use one of them to read the file into a variable and from there into the database.

  • If you are dealing with XML, do not use XmlParse() on the whole file. It works great for small files but chokes as things get bigger. There are some CFCs (check RIAForge, etc.) that wrap excellent Java libraries for parsing XML data; you can then build your cfquery calls manually from that data, if necessary.

+6
source

It's hard to say without more detail, but from what you've said, I have three ideas.

First, with that many database operations, you may be generating too much debugging output. Make sure the following settings are disabled in the Administrator's Debug Output Settings:

  • Enable Robust Exception Information
  • Enable AJAX Debug Log Window
  • Request Debugging Output

The second thing I would do is look at those database queries and make sure they are optimized. Make sure the selects are hitting indexes, etc.

Third, I suspect that reading the whole file into memory is probably suboptimal.

I would try looping over the file with file-based cfloop:

<cfloop file="#VARIABLES.filePath#" index="VARIABLES.line">
    <!--- Code to go here --->
</cfloop>
+4
source

Have you tried an event gateway? I believe those threads are not subject to the same timeout settings as page-request threads.
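As a sketch (assuming CF 7+, and a CFML asynchronous event gateway named "fileProcessor" registered in the Administrator — the gateway name, CFC, and file path here are made up), the gateway handler CFC runs outside the request timeout:

```cfml
<!--- Gateway handler CFC: onIncomingMessage runs on a gateway thread,
      outside the HTTP request timeout --->
<cfcomponent>
    <cffunction name="onIncomingMessage" returntype="void">
        <cfargument name="CFEvent" type="struct" required="true">
        <cfset var filePath = arguments.CFEvent.data.filePath>
        <!--- ...long-running line-by-line processing goes here... --->
    </cffunction>
</cfcomponent>
```

The upload page then hands the work off and returns immediately:

```cfml
<cfset msg = structNew()>
<cfset msg.filePath = "c:\uploads\data.csv">
<cfset SendGatewayMessage("fileProcessor", msg)>
```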

+1
source

SQL Server Integration Services (SSIS) is the recommended tool for complex ETL (Extract, Transform, and Load) operations, which is what this sounds like. (It can be configured to access files on other servers.) Perhaps the real question is whether you can build an interface between ColdFusion and SSIS?

0
source

If you can upgrade to CF8, use <cfloop file="...">, which will give you more speed, and the file will not be read into memory (which is probably the cause of the crash).

Depending on the situation you are facing, you can also use cfthread to speed up processing.
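As a sketch of the cfthread idea (assuming CF8+; the chunk size and the per-line processing are placeholders), the file could be split into chunks with each chunk handled on its own thread:

```cfml
<!--- Read the file and split into lines (fileRead is CF8+) --->
<cfset importLines = listToArray(fileRead(filePath), chr(10))>
<cfset chunkSize = 2000>
<cfset threadNames = "">

<cfloop from="1" to="#arrayLen(importLines)#" step="#chunkSize#" index="startRow">
    <cfset threadNames = listAppend(threadNames, "worker#startRow#")>
    <!--- startRow/endRow are copied into each thread's Attributes scope --->
    <cfthread name="worker#startRow#" action="run"
              startRow="#startRow#"
              endRow="#min(startRow + chunkSize - 1, arrayLen(importLines))#">
        <cfloop from="#attributes.startRow#" to="#attributes.endRow#" index="i">
            <!--- ...the 5 queries for importLines[i] go here... --->
        </cfloop>
    </cfthread>
</cfloop>

<!--- Wait for all workers before reporting completion to the user --->
<cfthread action="join" name="#threadNames#"/>
```

Note that the workers read importLines from the page's shared Variables scope; keep writes inside each worker to thread-local variables to avoid race conditions.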

0
source

Currently, an event gateway is the only way to get around the timeout limits of the HTTP request cycle. CF is not able to process CF pages offline, i.e. there is no command-line invocation (one of my biggest complaints about CF).

Your best bet is to use an event gateway or rewrite the parsing logic in straight Java.

0
source

I needed to do the same thing. Ben Nadel has written a bunch of great articles on using java.io for file access, so you can read files faster, write files, etc.

It really helped improve the performance of our CSV import application.
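The basic pattern (a sketch; the file path is made up) is to create a java.io.BufferedReader from CFML and read line by line, so the whole file never sits in memory — this works even on CF 6.1:

```cfml
<cfset fileReader = createObject("java", "java.io.FileReader").init("c:\uploads\data.csv")>
<cfset lineReader = createObject("java", "java.io.BufferedReader").init(fileReader)>

<cfloop condition="true">
    <cfset line = lineReader.readLine()>
    <!--- readLine() returns Java null at EOF, which leaves "line" undefined in CFML --->
    <cfif NOT isDefined("line")>
        <cfbreak>
    </cfif>
    <!--- ...process the line here... --->
</cfloop>

<cfset lineReader.close()>
```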

0
source
