Java collection insertion: Set vs. List

I am thinking of filling a collection with a large number of unique objects. How does the cost of an insert into a Set (say, a HashSet) compare to an insert into a List (say, an ArrayList)?

My feeling is that duplicate elimination in sets might cause a bit of overhead.

+8
java collections list set insert
7 answers

There is no "duplicate elimination" in the sense of comparing against all existing elements. A hash set is really a dictionary of items keyed by hash code. No duplicate check takes place unless there are already elements with the same hash code. Given a reasonable (well-distributed) hash function, it is not that bad.
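To make the point above concrete, here is a minimal sketch that counts how often equals() is invoked while filling a HashSet. CountingKey and HashSetInsertDemo are hypothetical names invented for this illustration, and the behaviour described assumes the OpenJDK implementation of HashSet, which compares hash codes before it falls back to equals().

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key type that counts how often equals() is invoked.
final class CountingKey {
    static int equalsCalls = 0;
    private final int id;

    CountingKey(int id) {
        this.id = id;
    }

    @Override
    public int hashCode() {
        return id; // distinct ids give distinct hash codes
    }

    @Override
    public boolean equals(Object other) {
        equalsCalls++;
        return other instanceof CountingKey && ((CountingKey) other).id == id;
    }
}

final class HashSetInsertDemo {
    // Inserts n unique keys and reports how many equals() calls that took.
    static int equalsCallsForUniqueInserts(int n) {
        CountingKey.equalsCalls = 0;
        Set<CountingKey> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(new CountingKey(i));
        }
        return CountingKey.equalsCalls;
    }

    // Adds an equal (but not identical) key twice and returns the second add() result.
    static boolean secondAddOfEqualKey() {
        Set<CountingKey> set = new HashSet<>();
        set.add(new CountingKey(42));
        return set.add(new CountingKey(42)); // add() returns false for a duplicate
    }

    public static void main(String[] args) {
        System.out.println("equals() calls for 1000 unique inserts: "
                + equalsCallsForUniqueInserts(1000));
        System.out.println("second add of an equal key accepted: "
                + secondAddOfEqualKey());
    }
}
```

With distinct hash codes no equals() call should be needed at all; equals() only comes into play once two elements share a hash code.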

As Will noted, because of its dictionary structure a HashSet is probably a bit slower than an ArrayList (unless you want to insert "between" existing elements). It also uses a bit more memory. I am not sure the difference is significant, though.

+10

If you are certain your data will be unique, use a List. You can use a Set to enforce that rule.

Supposedly, Sets are faster than Lists for large data sets, while the opposite is true for small ones. I have not personally verified this claim.

What type of list?
Also, consider which List to use. LinkedLists are faster at adding and removing elements.

ArrayLists are faster at random access (indexed for loops, etc.), but this can be worked around by using the LinkedList's Iterator. ArrayLists are also much faster at list.toArray().
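The difference between indexed access and iterator access on a LinkedList can be sketched as follows. LinkedListTraversal and its two methods are hypothetical names used only for illustration; both methods compute the same sum, but the indexed version has to walk the list for every get(i).

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

final class LinkedListTraversal {
    // Indexed loop: on a LinkedList each get(i) walks the nodes from one end,
    // so the whole loop is quadratic in the list size.
    static long sumByIndex(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // Iterator loop: each next() just follows one link, so the loop is linear.
    static long sumByIterator(List<Integer> list) {
        long sum = 0;
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < 100; i++) {
            list.add(i);
        }
        System.out.println(sumByIndex(list) == sumByIterator(list)); // prints "true"
    }
}
```

An enhanced for loop (for (Integer x : list)) uses the iterator under the hood, so it behaves like the second method.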

+4

You are right: set structures are inherently more complex, because they have to recognize and eliminate duplicates. Whether this overhead is significant in your case is something you should test with a benchmark.

Another factor is memory usage. If your objects are very small, the memory overhead introduced by the set structure can be significant. In the most extreme case (TreeSet<Integer> vs. ArrayList<Integer>), the set structure may require more than 10 times the amount of memory.
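A very rough way to run such a test is sketched below. InsertTiming and timeInsertNanos are hypothetical names, and this is a naive System.nanoTime() loop rather than a proper benchmark harness: JIT warm-up, garbage collection and Integer boxing all distort the numbers, so treat the output only as a first impression.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

final class InsertTiming {
    // Adds the integers 0..n-1 to the given collection and returns the elapsed nanoseconds.
    static long timeInsertNanos(Collection<Integer> target, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            target.add(i);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 1_000_000; // arbitrary size chosen for the illustration
        // Repeat a few rounds; the early ones mostly measure JIT warm-up.
        for (int round = 0; round < 5; round++) {
            long listNs = timeInsertNanos(new ArrayList<>(), n);
            long setNs = timeInsertNanos(new HashSet<>(), n);
            System.out.printf("round %d: ArrayList %d ms, HashSet %d ms%n",
                    round, listNs / 1_000_000, setNs / 1_000_000);
        }
    }
}
```

For numbers you intend to rely on, a dedicated harness such as JMH is the safer choice.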

+3

If the goal is to keep the elements unique, you should use an implementation of the java.util.Set interface. The java.util.HashSet and java.util.LinkedHashSet classes have O(alpha) complexity (close to O(1) in the best case) for insert, delete and contains checks.

ArrayList has O(n) complexity for a contains check by object (not by index), because the whole list has to be scanned, and for an insert that is not at the tail of the list, because the rest of the underlying array has to be shifted.

You can use LinkedHashSet, which maintains insertion order and has the same characteristics as HashSet (it only takes up a little more memory).
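As a small illustration of the LinkedHashSet suggestion, the sketch below removes duplicates from a list while keeping the order in which elements first appeared. OrderedUnique and uniqueInOrder are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class OrderedUnique {
    // Returns the distinct elements of input in first-seen order.
    static List<String> uniqueInOrder(List<String> input) {
        Set<String> seen = new LinkedHashSet<>(input); // drops duplicates, keeps insertion order
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b", "a", "b", "c", "a");
        System.out.println(uniqueInOrder(input)); // prints [b, a, c]
    }
}
```

A plain HashSet would also drop the duplicates, but it makes no promise about iteration order.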

+2

You need to compare specific implementations (e.g. a HashSet with an ArrayList), because the abstract Set/List interfaces say nothing about performance.

Inserting into a HashSet is a fairly cheap operation if the hashCode() of the object to be inserted is sane. It will still be slightly slower than inserting into an ArrayList, because the latter is a simple store into an array (assuming you insert at the end and there is still free space; I am not counting the resizing of the internal array, since the same cost applies to a HashSet).
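If the number of elements is known up front, the resizing cost mentioned above can largely be avoided for both collection types by presizing them. The sketch below assumes the default HashSet load factor of 0.75; Presized, listFor and hashSetFor are hypothetical helper names, not part of the JDK.

```java
import java.util.ArrayList;
import java.util.HashSet;

final class Presized {
    // An ArrayList whose backing array already has room for the expected number of elements.
    static <T> ArrayList<T> listFor(int expected) {
        return new ArrayList<>(expected);
    }

    // A HashSet whose table is large enough that adding the expected number of
    // elements should not trigger a rehash (assuming the default load factor 0.75).
    static <T> HashSet<T> hashSetFor(int expected) {
        int capacity = (int) (expected / 0.75f) + 1;
        return new HashSet<>(capacity);
    }

    public static void main(String[] args) {
        ArrayList<Integer> list = listFor(1000);
        HashSet<Integer> set = hashSetFor(1000);
        for (int i = 0; i < 1000; i++) {
            list.add(i);
            set.add(i);
        }
        System.out.println(list.size() + " " + set.size()); // prints "1000 1000"
    }
}
```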

+1

I do not think you can make this judgment based on the cost of building the collection alone. Other things to consider:

  • Is the input dataset ordered? Is there a requirement that the output structure maintain the insertion order?
  • Is there a requirement that the output structure be ordered (or reordered) based on the values of the elements?
  • Will the structure of the output be subsequently changed? How?
  • Is there a requirement that the output structure stay duplicate-free if other elements are subsequently added?
  • Do you know how many elements are likely to be in the input dataset?
  • Can you measure the size of the input dataset? (Or is it provided via an iterator?)
  • Does space usage matter?

All of this can affect your choice of data structure.
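To illustrate how the ordering and duplicate questions above map to concrete classes, here is a minimal sketch that feeds the same input into three standard implementations. OrderingChoices and its method names are hypothetical; the collection classes themselves are the regular java.util ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

final class OrderingChoices {
    // Keeps duplicates and insertion order.
    static String asArrayList(List<Integer> input) {
        return new ArrayList<>(input).toString();
    }

    // Drops duplicates, keeps insertion order.
    static String asLinkedHashSet(List<Integer> input) {
        return new LinkedHashSet<>(input).toString();
    }

    // Drops duplicates, orders elements by value.
    static String asTreeSet(List<Integer> input) {
        return new TreeSet<>(input).toString();
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(3, 1, 2, 3);
        System.out.println(asArrayList(input));     // prints [3, 1, 2, 3]
        System.out.println(asLinkedHashSet(input)); // prints [3, 1, 2]
        System.out.println(asTreeSet(input));       // prints [1, 2, 3]
    }
}
```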

+1

Java List:

If you have no requirement about whether duplicates are allowed, you can use a List instead of a Set.

List is an interface in the Collections Framework that extends the Collection interface. ArrayList and LinkedList are implementations of the List interface.

When to use ArrayList or LinkedList

ArrayList: If your application mainly reads data, you should go with ArrayList. ArrayList implements the RandomAccess interface, a marker interface which signals that the list supports fast access by index; ArrayList can return an element by index in O(1) time. You can also use ArrayList in place of LinkedList when you want to retrieve the data in insertion order.

LinkedList: If your main work is inserting or deleting, you should use LinkedList over ArrayList. In a LinkedList, insertion and deletion take O(1) time once the position has been reached (for example at either end, or through an iterator), whereas in an ArrayList they take O(n) time unless they happen at the tail.

Java Set:

If your application must not store duplicates, you should go with a Set instead of a List, because a Set does not store duplicates. HashSet works on the principle of hashing. When we add an object to a HashSet, it first uses the object's hashCode to find the bucket. If that bucket already holds an element with the same hash code that is also equal according to equals(), the object is not added.
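For this duplicate check to work with your own classes, equals() and hashCode() have to be overridden consistently. The following is a minimal sketch; GridPoint and SetDuplicateDemo are hypothetical names used only for this example.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class with consistent equals() and hashCode().
final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof GridPoint)) {
            return false;
        }
        GridPoint p = (GridPoint) other;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y); // equal points produce equal hash codes
    }
}

final class SetDuplicateDemo {
    // Adds two equal points and one different point, then returns the set size.
    static int sizeAfterAdds() {
        Set<GridPoint> points = new HashSet<>();
        points.add(new GridPoint(1, 2));
        points.add(new GridPoint(1, 2)); // equal to the first point, so it is rejected
        points.add(new GridPoint(3, 4));
        return points.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAdds()); // prints 2
    }
}
```

Without the two overrides, the inherited identity-based equals() would treat the two equal points as distinct objects and the set would keep both.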

0
