Many queries in a JSON generation task

So, I have a build task that needs to archive a ton of data from our database as JSON.

To give a sense of what is happening: each X has 100s of Ys, and each Y has 100s of Zs, etc. I create a JSON file for each X, Y, and Z. But each X's JSON file has an array of identifiers for that X's child Ys, and each Y's file likewise stores an array of its child Zs.

It's more complicated than that in many cases, but this should give you an idea of the complexity involved, I think.
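As a rough sketch of the file layout described above (field names like `id` and `childYIds` are illustrative, not from the question), each X file might be built with plain Java like this:

```java
import java.util.List;

public class ArchiveShape {
    // Sketch of one per-X file: the X file carries its own fields plus the
    // ids of its child Ys; each Y file would likewise carry its child Z ids.
    public static String xFileJson(String id, List<String> childYIds) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"id\":\"").append(id).append("\",\"childYIds\":[");
        for (int i = 0; i < childYIds.size(); i++) {
            if (i > 0) sb.append(",");
            sb.append("\"").append(childYIds.get(i)).append("\"");
        }
        return sb.append("]}").toString();
    }
}
```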

I was using ColdFusion, but it seems to be a bad choice for this task because it crashes with out-of-memory errors. It seems to me that if it released queries from memory that are no longer referenced during the task (i.e., garbage collected them), the task would have enough memory, but AFAICT ColdFusion does not garbage-collect queries at all during a request and only does so after the request completes.

So, I'm looking for either tips on how best to accomplish my task in CF, or recommendations for other languages to use.

Thanks.

+1
2 answers

Eric, you are absolutely right that ColdFusion's garbage collection does not remove query data from memory until the end of the request, and I have documented it in some detail in another SO question. In short, you end up with OoM exceptions when you loop over queries. You can prove this with a tool like VisualVM to generate heap dumps at runtime, and then run the resulting dump through the Eclipse Memory Analyzer Tool (MAT). What MAT will show you is a large hierarchy, starting with an object named (I'm not making this up) CFDummyContent, which holds, among other things, references to cfquery and cfqueryparam. Note that changing them to stored procs, or even doing the database interaction via JDBC, does not matter.

So. What. To. Do?

It took me a while to figure out, but you have three options, in increasing order of difficulty:

Use cfthread, like so:

 <cfloop ...>
     <cfset threadName = "thread" & createUuid()>
     <cfthread name="#threadName#" input="#value#">
         <!--- do query stuff --->
         <!--- code has access to passed attributes (eg #attributes.input#) --->
         <cfset thread.passOutOfThread = somethingGeneratedInTheThread>
     </cfthread>
     <cfthread action="join" name="#threadName#">
     <cfset passedOutOfThread = cfthread["#threadName#"].passOutOfThread>
 </cfloop>

Note that this code does not take advantage of asynchronous processing, hence the immediate join after each thread call; rather, it exploits the side effect that cfthread runs in its own request-like scope, independent of the page.

I won't cover ColdFusion event gateways here. Daisy-chaining HTTP requests means performing one increment of the work and, at the end of that increment, launching a request to the same algorithm telling it to perform the next increment.
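As a sketch of the bookkeeping behind that daisy-chaining (the method name, batch size, and offset scheme are hypothetical, not from the answer), each request processes one batch and computes where the follow-up request to the same page should pick up:

```java
public class IncrementalArchiver {
    // One "increment" of the daisy chain: the page archives the batch that
    // starts at `offset`, then uses this to decide the offset the follow-up
    // HTTP request to the same page should start from (-1 means finished).
    public static int nextOffset(int offset, int batchSize, int totalRows) {
        int next = offset + batchSize;
        return next < totalRows ? next : -1;
    }
}
```

Because each increment runs as its own request, the query memory from the previous increment is released before the next one starts.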

Basically, all three approaches let those in-memory query references be garbage collected mid-process.

And yes, for those wondering: bugs have been raised with Adobe; see the question referenced above. Also, I believe this issue is specific to Adobe ColdFusion, but I have not tested Railo or OpenBD.

Finally, some griping. I spent a lot of time tracking this down, fixing it in my own large code base, and several others listed in the referenced question did as well. AFAIK Adobe has not acknowledged the problem, much less committed to fixing it. And, yes, it is a bug, plain and simple.

+4

1) If you have debugging enabled, ColdFusion will hold on to your queries until the page completes. Turn it off!

2) You may need to structDelete() the query variable so it can be garbage collected; otherwise it can persist for as long as any scope holds a reference to it. <cfset structDelete(variables,'myQuery') />

3) Cfquery pulls the entire ResultSet into memory. In most cases this is fine. But for reporting on a large result set, you don't want this. Some JDBC drivers support a fetchSize setting which, in a forward-only, read-only fashion, lets you fetch a few results at a time. That way you can deal with thousands and thousands of rows without swamping memory. I just generated a 1 GB csv file in ~80 seconds using less than 100 MB of heap. This requires dropping down to Java. But it kills two birds with one stone. It reduces the amount of data pulled in at a time by the JDBC driver, and since you work directly with the ResultSet, you don't hit the cfloop problem mentioned above. Of course, this isn't for those without some Java chops.

You can do something like this (you need cfusion.jar in your build path):

 import java.io.BufferedWriter;
 import java.io.FileWriter;
 import java.sql.Connection;
 import java.sql.ResultSet;
 import java.sql.Statement;
 import javax.sql.DataSource;
 import au.com.bytecode.opencsv.CSVWriter;
 import coldfusion.server.ServiceFactory;

 public class CSVExport {
     public static void export(String dsn, String query, String fileName) {
         Connection conn = null;
         Statement stmt = null;
         ResultSet rs = null;
         FileWriter fw = null;
         BufferedWriter bw = null;
         try {
             DataSource ds = ServiceFactory.getDataSourceService().getDatasource(dsn);
             conn = ds.getConnection();
             // we want a forward-only, read-only result.
             // you may need to use a PreparedStatement instead.
             stmt = conn.createStatement(
                 ResultSet.TYPE_FORWARD_ONLY,
                 ResultSet.CONCUR_READ_ONLY
             );
             // we only want to go forward!
             stmt.setFetchDirection(ResultSet.FETCH_FORWARD);
             // how many records to pull back at a time.
             // the hard part is balancing memory usage and round trips to the database.
             // basically sacrificing speed for a lower memory hit.
             stmt.setFetchSize(256);
             rs = stmt.executeQuery(query);
             // do something with the ResultSet, for example write to csv using opencsv.
             // the key is to stream it. you don't want it stored in memory.
             // so excel spreadsheets and pdf files are out, but text formats like
             // csv, json, html, and some binary formats like MDB (via jackcess)
             // that support streaming are in.
             fw = new FileWriter(fileName);
             bw = new BufferedWriter(fw);
             CSVWriter writer = new CSVWriter(bw);
             writer.writeAll(rs, true);
             writer.close();
         } catch (Exception e) {
             // handle your exception.
             // maybe try ServiceFactory.getLoggingService() if you want to do a cflog.
             e.printStackTrace();
         } finally {
             try { rs.close(); } catch (Exception e) {}
             try { stmt.close(); } catch (Exception e) {}
             try { conn.close(); } catch (Exception e) {}
             try { bw.close(); } catch (Exception e) {}
             try { fw.close(); } catch (Exception e) {}
         }
     }
 }

Figuring out how to pass parameters, logging, turning this into a background process (hint: extend Thread), etc. are separate problems, but if you grok this code, they shouldn't be too difficult.

4) Maybe look at Jackson for generating your JSON. It supports streaming, and combined with the fetchSize and a BufferedOutputStream, you should be able to keep memory usage low.
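Jackson's streaming generator is the real tool for this; purely to illustrate the principle it relies on, here is a stdlib-only sketch (a hypothetical helper, not Jackson's API) that writes a JSON array element by element instead of materializing the whole document:

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Iterator;

public class StreamingJsonWriter {
    // Streams an array of string values to the writer one element at a time,
    // so only the current element is ever held in memory -- the same idea a
    // streaming generator applies when paired with a small JDBC fetchSize.
    public static void writeArray(Writer out, Iterator<String> values) throws IOException {
        out.write("[");
        boolean first = true;
        while (values.hasNext()) {
            if (!first) out.write(",");
            out.write("\"" + values.next() + "\"");
            first = false;
        }
        out.write("]");
    }
}
```

Point it at a Writer wrapping a BufferedOutputStream and feed it rows straight off the ResultSet, and the heap stays flat no matter how many rows you export.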

+5
