More efficient than a double-nested ArrayList?

I am creating a back-end Java component that processes a moderate amount of data every day. We have a POJO, let's call it Widget , it has about 10 properties on it. My software needs to process groups from Widget lists: essentially there are other processes (completely different systems) that combine their own List<Widget> and then send them to my software. My software actually gets a workaround POJO that looks like this:

 public class Payload { private List<Widget> widgets; // <-- what I want private String guid; // GUID; my software doesn't need this private boolean fizz; // again, my software doesn't need this ... many other properties that I don't care about } 

My software combines all of these List<Widget> , each of which is created by different systems, and then processes them together in one large batch.

I previously selected ArrayList<ArrayList<Widget>> as the data structure for this batch of Widget lists. There will be about 500,000 List<Widget> groups (external ArrayList ), and each List<Widget> will have about 5 Widget each; for a total of ~ 2.5 million Widget in the internal ArrayList .

In a recent code review, some technical leaders told me that I chose the wrong data structure for this batch widget. They told me that I had to use HashMap<String,List<Widget>> because it is more efficient and easier to work with. The hashmap key is the GUID contained in Payload , which is provided by my software. Not that I need a GUID for any reason, it just serves as a key to keep ~ 500,000 List<Widget> individual - what I need to do.

It made me wonder: who is right?!? The only operations we do in this data structure are “add” (in the case of ArrayList , just adding a Widget or List<Widget> via add(...) ) and then “reading” (I have to go through my software through each Widget and check it for a subject.

 for(List<Widget> widgetList : myDoublyNestedArrayOfWidgets) { for(Widget widget : widgetList) { ... } } 

These are the only operations we need: add the scattered List<Widget> to some large “batch” data structure, and then at a later time examine them all and do everything with each Widget . This software runs on some amplified servers with lots of memory and processing power.

So I ask: ** Is ArrayList<ArrayList<Widget>> right choice, HashMap<String,List<Widget>> or something else ... and why?

+4
source share
6 answers

So I ask: is ArrayList<ArrayList<Widget>> right choice, HashMap<String,List<Widget>> or something else ... and why?

At the very end, it is important that your software solves the problem that it must solve.

A HashMap is more expensive than an ArrayList, and if you don't need access to data with a key, an ArrayList is probably the best choice. Also, the code you need to write for processing seems simpler and more efficient when using an ArrayList.

By the way, the presence of ArrayList<ArrayList<Widget>> or HashMap<String,List<Widget>> smells a bit. Perhaps what you are modeling is an ArrayList<WidgetGroup> , and the WidgetGroup contains a List<Widget> (with all the other properties that you might not need at the moment). But if your WidgetGroup just contains an ArrayList, don't introduce this new class (keep it simpler).

It made me wonder: who is right?!?

Between your decision and your reviewer, I personally prefer yours.

But you can keep it for yourself and follow the "technological conclusions". If this is their role, it is a decision that matters, and their responsibility to provide these decisions. (And the guy who pays your checks is always right)

+3
source

There is a noun that you continue to use but is missing from your data model: Package . If you really care about keeping them in your batches and keeping your code readable, encapsulate them in a package class:

  class Batch {
     String guid;
     List & ltWidget> widgets;
 }

And if you don't need lots, then could you just smooth them all into one List<Widget> ?

+2
source

A hash map is not more efficient or easier to work than a list of arrays. The change may be justified if at some point you need to find the package using its GUID key.

A hash map is less efficient than a list of arrays, since resizing means the need to reassess the hash codes and reallocate the data to fairly random memory locations. On the other hand, resizing the array copies the contents from the old array to the new one linearly, which is much more convenient for the processor cache.

Using a hash map is not easy. To access the records, you need to go through a set of map records that violates the law of Demeter .

+1
source

Perhaps a built-in (built-in) database is what you finally want. Another possibility is something like JavaSpaces / NoSQL, the decoupling of delivery and processing. It depends.

0
source

It is clear from your question that you are doing this.

  • Read the data.
  • Add additional widgets.

The question is, how will changing the data structure from ArrayList<ArrayList<Widget>> to HashMap<String,List<Widget>> above the two actions.

1) Read : you have grouped them into 4 groups, therefore, using hashmap , you will store your groups using hashing, which really does not make sense for a small data set (groups in your case), so there is no need to use hashmap here.

2) Adding additional widgets : you will get access to the list that you are going to add, so again you are not reading. This does not stop using ArrayListObj.get(index) .

Now using ArrayList will always read widgets in sequence. What will not be done with hashmap , but in any case, I don’t think it’s your concern or is it ? hashmap

0
source

A hashmap would be more efficient if you had to access internal lists randomly, and the code using the hash map looked more elegant for reviewers who break out in the hives when they see nested loops. But, if you need to iterate over and visit each node, you will not do better than On ^ 2. You can type them into the database, but it will bring you nothing but complexity. It is more elegant as a hash. Of course, all of this assumes that you have a memory to hold all 2.5 million widgets at once. If you need to do this, then some kind of SQL DB or NoSQL will probably be better.

0
source

All Articles