So, for some research work, I need to analyze a ton of raw motion data (currently almost a gigabyte of data, and growing) and spit out quantitative information and graphs.
I wrote most of it using Groovy (with JFreeChart for charting), and when performance became a problem, I rewrote the main parts in Java.
The problem is that analysis and plotting take only about a minute, while loading all the data takes 5-10 minutes. As you can imagine, this gets really annoying when I want to make small changes to a graph and see the result.
I have a couple of ideas to fix this:
1. Load all the data into an SQLite database.
Pros: It'll be fast, and I can run SQL to get aggregate data when I need it.
Cons: I have to write all that code. Also, for some of the graphs I need access to every data point, so loading several hundred thousand rows back out could make those parts slow. (A rough sketch of the bulk load is below.)
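If I went this route, the bulk load might look roughly like this. This is a minimal sketch, assuming the xerial sqlite-jdbc driver is on the classpath; DataPoint, motion.db, and the points schema are all made-up placeholders for my real record type:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class SqliteLoader {
    // Stand-in for whatever the real parsed record type looks like.
    public static class DataPoint {
        public long timestamp;
        public double x, y;
    }

    public static void load(Iterable<DataPoint> points) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:motion.db")) {
            conn.createStatement().execute(
                    "CREATE TABLE IF NOT EXISTS points (ts INTEGER, x REAL, y REAL)");
            conn.setAutoCommit(false); // a single transaction is critical for SQLite insert speed
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO points (ts, x, y) VALUES (?, ?, ?)")) {
                for (DataPoint p : points) {
                    ps.setLong(1, p.timestamp);
                    ps.setDouble(2, p.x);
                    ps.setDouble(3, p.y);
                    ps.addBatch();
                }
                ps.executeBatch(); // one round trip for the whole batch
            }
            conn.commit();
        }
    }
}
```

With the data in SQLite, each aggregate graph becomes a single SQL query; only the graphs that touch every raw point would still need full scans.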
2. Use Java RMI to return the object. All the data gets loaded into one root object, which is about 200 MB when serialized. I'm not sure how long it would take to push a 200 MB object through RMI (client and server would be on the same machine).
I'd have to start a server and have it load all the data, but that's not a big deal.
Main pro: it should take the least time to write. (A sketch of the interface follows.)
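For the RMI idea, the remote interface could be as small as this. Again, just a sketch with invented names (DataService, DataRoot, the "data" registry entry); the point is that the whole ~200 MB graph gets serialized on every fetchAll() call, which is exactly the cost I'm unsure about:

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.server.UnicastRemoteObject;

// Remote interface the analysis client would call.
interface DataService extends Remote {
    DataRoot fetchAll() throws RemoteException;
}

// Placeholder for the real ~200 MB object graph.
class DataRoot implements Serializable {
    // fields holding all the motion data would go here
}

public class DataServer implements DataService {
    // Held in a static field so the exported object is never garbage collected.
    private static DataServer server;

    private final DataRoot root = new DataRoot(); // loaded once at startup

    public DataRoot fetchAll() {
        return root; // serialized and shipped to the client on each call
    }

    public static void main(String[] args) throws Exception {
        server = new DataServer();
        DataService stub = (DataService) UnicastRemoteObject.exportObject(server, 0);
        LocateRegistry.createRegistry(1099).rebind("data", stub);
        System.out.println("Data server ready.");
    }
}
```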
3. Start up a server that loads the data and runs Groovy scripts on command inside the server's VM. Overall, this seems like the best idea (for implementation time and performance, plus other long-term benefits); a sketch is below.
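Option 3 could be little more than a loop around GroovyShell. A minimal sketch, assuming made-up names (loadAllData() stands in for the real 5-10 minute load, plot.groovy for the plotting script I keep tweaking):

```java
import groovy.lang.Binding;
import groovy.lang.GroovyShell;

import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;

public class AnalysisServer {

    // Placeholder for the real, slow loading step.
    static Object loadAllData() {
        return new Object();
    }

    public static void main(String[] args) throws Exception {
        Object data = loadAllData(); // pay the 5-10 minute cost exactly once

        BufferedReader console = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("Press Enter to (re)run plot.groovy, Ctrl-D to quit.");
        while (console.readLine() != null) {
            Binding binding = new Binding();
            binding.setVariable("data", data); // the script sees the in-memory data
            try {
                new GroovyShell(binding).evaluate(new File("plot.groovy"));
            } catch (Exception e) {
                e.printStackTrace(); // a bad script edit shouldn't kill the server
            }
        }
    }
}
```

That way, tweaking a graph costs only the roughly one minute of analysis and plotting, never the reload.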
What I'd like to know is: have other people run into this problem, and how did they solve it?
Post-mortem (3/29/2011): A couple of months after asking this question, I ended up learning R to run some statistics. Using R was far easier and faster to analyze and aggregate the data than what I had been doing.
Eventually, I settled on using Java to run the pre-aggregation and then doing everything else in R. R was also much easier to make pretty charts with than JFreeChart.
java groovy
Reverend gonzo