Collection Class Performance in Java

All,

I have looked at many sites that publish performance figures for the various Collection classes across common operations, that is, adding an item, searching and deleting. But I also notice that they all describe different environments in which the tests were run, i.e. OS, memory, threads, etc.

My question is: is there any website or material that provides the same performance information based on a consistent test environment? That is, the configuration should not be a factor in, or catalyst for, the poor performance of any particular data structure.

[Update]: For example, HashSet and LinkedHashSet both have O(1) complexity for inserting an element. However, Bruce Eckel's test shows that insertion takes longer for a LinkedHashSet than for a HashSet [http://www.artima.com/weblogs/viewpost.jsp?thread=122295]. So should I still rely on Big-O notation?

+6
java performance collections
7 answers

Here are my recommendations:

  • First of all, don’t optimize :) Not that I’m telling you to write sloppy software, but just to focus on design and code quality more than on premature optimization. Assuming you have done that, and now you genuinely need to worry about which collection is best beyond purely conceptual reasons, go to step 2
  • Really, don’t optimize yet (loosely borrowed from M. A. Jackson)
  • OK. So your problem is that, although you have theoretical time-complexity formulas for best, worst and average cases, you notice that people report different things, and that practical settings are quite a different matter from theory. So run your own tests! You can only read so much, and while you do, your code isn’t writing itself. Once you are done with the theory, write your own benchmark, for your real application, not some trivial test gadget, and see what actually happens to your software and why. Then pick the best algorithm. It is empirical, and it could be considered a waste of time, but it is the only way that truly works flawlessly (until you reach the next point).
  • Now that you have done that, you have the fastest application. Until the next JVM update. Or until some underlying operating-system component that your specific bottleneck depends on changes. Guess what? Maybe your customers’ setups differ too. Here is the fun part: you have to be sure your benchmark is valid for other people, or for most cases (or have fun writing code for every case). You need to collect data from users. Lots of it. And then you need to do it over and over to understand what is happening and whether it still holds. And then rewrite your code again and again. The (now discontinued) Windows 7 Engineering Blog is actually a good example of how collecting user data helps make educated decisions to improve the user experience.
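The "run your own tests" advice above can be sketched as a crude timing harness. This is a minimal illustration, not a proper benchmark: naive `System.nanoTime` loops are distorted by JIT warm-up and GC, so real measurements should use a harness such as JMH.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class AddBenchmark {
    // Rough timing of n add() calls on any Set implementation.
    static long timeAdds(Set<Integer> set, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            set.add(i);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // Compare the two implementations on the same workload.
        long hash = timeAdds(new HashSet<>(), n);
        long linked = timeAdds(new LinkedHashSet<>(), n);
        System.out.println("HashSet:       " + hash + " ns");
        System.out.println("LinkedHashSet: " + linked + " ns");
    }
}
```

The point is not the absolute numbers (which vary by JVM and machine) but that you can swap in your own objects and access patterns.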

Or you could... you know... NOT optimize. Platforms and compilers will change, but a good design should, on average, perform well enough.

Other things you can also do:

  • Check out the JVM source code. It is very educational, and you will discover a horde of hidden things (I’m not saying you should use them...)
  • See that thing on your TODO list that needs to get done? Yes, the one near the top that you always skip because it is too hard or not fun enough. That one, right there. Tackle it properly and leave optimization alone: it is the angry child of Pandora’s Box and the Möbius strip. You will never get out of it, and you will deeply regret ever trying.

That said, I don’t know why you need this performance boost, so perhaps you have a very good reason.

And I’m not saying that choosing the right collection doesn’t matter. Only that if you know which one to choose for a particular problem, and you have looked at the alternatives, then you have done your job and need not feel guilty. Collections usually have a semantic meaning, and as long as you respect it, everything will be fine.

+9

In my opinion, all you need to know about a data structure is the Big-O of its operations, not subjective measurements from different architectures. Different collections serve different purposes.

Map: dictionaries
Set: enforces uniqueness
List: provides grouping and preserves iteration order
Tree: provides low-cost ordering and fast search over dynamically changing content that needs to stay ordered

Edited to include bwawok’s suggested use case for tree structures
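The four roles above can be shown side by side. A minimal sketch (the names and values are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;

public class CollectionRoles {
    public static void main(String[] args) {
        // Map: a dictionary, lookup by key
        Map<String, Integer> ages = new HashMap<>();
        ages.put("alice", 30);

        // Set: enforces uniqueness, the duplicate "a" is dropped
        Set<String> tags = new HashSet<>(Arrays.asList("a", "b", "a"));
        System.out.println(tags.size()); // 2

        // List: grouping with iteration order preserved, duplicates allowed
        List<String> steps = new ArrayList<>(Arrays.asList("init", "run", "run"));
        System.out.println(steps); // [init, run, run]

        // Tree: content stays sorted as it changes
        NavigableSet<Integer> sorted = new TreeSet<>(Arrays.asList(5, 1, 3));
        System.out.println(sorted.first() + ".." + sorted.last()); // 1..5
    }
}
```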

Update
From the javadoc for LinkedHashSet:

Hash table and linked list implementation of the Set interface, with predictable iteration order.

...

Performance is likely to be just slightly below that of HashSet, due to the added expense of maintaining the linked list, with one exception: iteration over a LinkedHashSet requires time proportional to the size of the set, regardless of its capacity. Iteration over a HashSet is likely to be more expensive, requiring time proportional to its capacity.

Now we have moved from the general question of choosing a suitable data structure interface to the more specific question of choosing an implementation. Yet we still ultimately conclude that particular implementations suit particular applications, based on the unique, subtle invariant that each implementation offers.
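That capacity effect from the javadoc can be sketched directly: give both sets a huge initial capacity but only a few elements, and iterate. This is a rough illustration only (the timing helper is ad hoc, not a proper benchmark):

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class IterationCost {
    // Times one pass over the set. HashSet iteration walks every bucket,
    // so its cost grows with capacity; LinkedHashSet follows its internal
    // linked list, so its cost grows only with the number of elements.
    static long timeIteration(Set<Integer> set) {
        long start = System.nanoTime();
        long sum = 0;
        for (int v : set) {
            sum += v;
        }
        return System.nanoTime() - start + (sum == Long.MIN_VALUE ? 1 : 0);
    }

    public static void main(String[] args) {
        // Ten elements, but a capacity of about a million buckets.
        Set<Integer> sparseHash = new HashSet<>(1 << 20);
        Set<Integer> sparseLinked = new LinkedHashSet<>(1 << 20);
        for (int i = 0; i < 10; i++) {
            sparseHash.add(i);
            sparseLinked.add(i);
        }
        System.out.println("HashSet:       " + timeIteration(sparseHash) + " ns");
        System.out.println("LinkedHashSet: " + timeIteration(sparseLinked) + " ns");
    }
}
```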

+6

What do you need to know about them, and why? The reason benchmarks state the JDK and hardware setup is so that they can (in theory) be reproduced. What you should take from benchmarks is an idea of how things will perform relative to each other. For ABSOLUTE numbers, you will need to run the tests on your own code, doing its own work.

The most important thing to know is the Big-O running time of the various collections. Knowing that getting an element from an unsorted ArrayList is O(n), while getting it from a HashMap is O(1), is HUGE.
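That O(n) vs O(1) difference is easy to see side by side. A small sketch (the `user…` keys are invented sample data):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LookupDemo {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            list.add("user" + i);
            map.put("user" + i, i);
        }
        // O(n): contains() scans the list element by element
        boolean inList = list.contains("user99999");
        // O(1) expected: containsKey() hashes the key and checks one bucket
        boolean inMap = map.containsKey("user99999");
        System.out.println(inList + " " + inMap); // true true
    }
}
```

With 100,000 entries the list scan does up to 100,000 comparisons per lookup, while the map does roughly one, which is exactly what the Big-O figures predict.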

If you are already using the correct collection for a given task, you are 90% of the way there. The times when you need to worry about exactly how fast you can, for example, get items out of a HashMap should be quite rare.

Once you leave single-threaded land and move into multi-threaded land, you will need to start worrying about things like ConcurrentHashMap vs Collections.synchronizedMap. Until you are multi-threaded, you can simply not worry about it and focus on which collection to use.
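A hedged sketch of that multi-threaded case, assuming a simple shared counter: `Collections.synchronizedMap` wraps every call in one lock, while `ConcurrentHashMap` offers atomic per-key updates such as `merge`, so concurrent writers rarely block each other.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadSafeMaps {
    // Increments one shared counter from several threads using
    // ConcurrentHashMap.merge, which is atomic per key.
    static int countConcurrently(int threads, int perThread) throws InterruptedException {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counts.merge("count", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return counts.get("count");
    }

    public static void main(String[] args) throws InterruptedException {
        // The synchronizedMap alternative: one big lock around every call.
        Map<String, Integer> locked = Collections.synchronizedMap(new HashMap<>());
        locked.put("demo", 1);
        // With merge(), no update is lost even under contention.
        System.out.println(countConcurrently(2, 10_000)); // 20000
    }
}
```

Note that a plain HashMap in the same scenario could lose updates or even corrupt its internal structure, which is why the choice matters at all.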

Update on HashSet vs LinkedHashSet

I have never found a use case where I needed a LinkedHashSet (because if I care about order I have a List, and if I care about O(1) gets I prefer to use a HashSet). In practice, most code will use an ArrayList, HashMap or HashSet. If you need anything else, you are in a "last resort" case.

+5

Different collection classes have different big-O performance, but all that tells you is how they scale as they grow. If your set is large enough, one with O(1) performance will outperform one with O(N) or O(log N), but there is no way to tell at what value of N the break-even point lies, except by experiment.

As a rule, I just use the simplest thing possible, and then, if it becomes a bottleneck, as shown by operations on that data structure taking a large percentage of the time, I switch to something with a better big-O rating. Often, either the number of items in the collection never approaches the break-even point, or there is another simple way to solve the performance problem.

+4

Both HashSet and LinkedHashSet offer O(1) performance. The same goes for HashMap and LinkedHashMap (in fact, the Set classes are implemented on top of the corresponding Map classes). This only tells you how these algorithms scale, not how they actually perform. In this case, a LinkedHashSet still works like a HashSet, but it must also keep the previous and next pointers updated to maintain the ordering. This means that the constant factor (which also matters when talking about an algorithm's real performance) is lower for HashSet than for LinkedHashSet.

Thus, since both have the same Big-O, they scale essentially the same way: as n changes, the performance of both changes in the same manner, and with O(1) the average performance does not change at all.

So now your choice is based on functionality and your requirements (which should really be what you consider first anyway). If you only need fast add and get operations, you should always choose HashSet. If you also need a consistent ordering, such as last-access or insertion order, then you should use the Linked... version of the class.

I have used the "linked" classes in production applications; well, LinkedHashMap specifically. I used it in one case for a symbol-table-like structure, so I needed quick access to the symbols and their related information. But I also wanted to display the information, in at least one context, in the order in which the user defined those symbols (insertion order). That makes the output more user-friendly, since they can find things in the same order in which they were defined.
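That symbol-table use case can be sketched in a few lines (the symbol names and values here are invented for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SymbolTable {
    // Hypothetical symbol table: O(1) lookup by name, displayed in the
    // order the user defined the symbols (insertion order).
    static Map<String, String> defineSymbols() {
        Map<String, String> symbols = new LinkedHashMap<>();
        symbols.put("PI", "3.14159");
        symbols.put("E", "2.71828");
        symbols.put("PHI", "1.61803");
        return symbols;
    }

    public static void main(String[] args) {
        // Iteration order matches definition order: PI, E, PHI.
        for (Map.Entry<String, String> e : defineSymbols().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

With a plain HashMap the display order would depend on the hash codes and could change between runs or JDK versions; LinkedHashMap pins it to insertion order.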

+1

If I had to sort millions of rows, I would try to find a different way. Maybe I could improve my SQL, improve my algorithm, or perhaps write the items to disk and use the operating system's sort command.

I have never had a case where collections were the cause of my performance problems.

0

I ran my own experiment with HashSets and LinkedHashSets. In it, add() and contains() run in O(1), ignoring heavy collisions. In the add() method of my hand-rolled linked hash set, I put the object into a user-created hash table, which is O(1), and then also append it to a separate linked list to record the order. To remove an element from this linked set, you have to find the element in the hash table and then also search the linked list that holds the order. So the running time is O(1) + O(n), which is O(n) for remove(). (The JDK's LinkedHashSet avoids this by having each hash entry link directly into the list, so its remove() stays O(1).)

0
