In Java (1.5 or later), what is the best way to get (any) element from a collection?

In the code below, I needed to get an element, any element, from toSearch. I could not find a useful method for defining the definition of Set to return only one (random, but not necessarily random) element of the set. So, I used the toArray () [0] technique (presented in the code below).

private Set<Coordinate> floodFill(Value value, Coordinate coordinateStart) { Set<Coordinate> result = new LinkedHashSet<Coordinate>(); Set<Coordinate> toSearch = new LinkedHashSet<Coordinate>(); toSearch.add(coordinateStart); while (toSearch.size() > 0) { Coordinate coordinate = (Coordinate)toSearch.toArray()[0]; result.add(coordinate); toSearch.remove(coordinate); for (Coordinate coordinateAdjacent: getAdjacentCoordinates(coordinate)) { if (this.query.getCoordinateValue(coordinateAdjacent) == value) { if (!result.contains(coordinateAdjacent)) { toSearch.add(coordinateAdjacent); } } } } return result; } 

Another way I've seen is to replace " (Coordinate) toSearch.toArray () [0] " with " toSearch.iterator (). Next () ". Which technique, toArray () or iterator (), is most likely to run most quickly with minimal GC (Garbage Collection) effect?

My intuition (after compiling this question) is that the second method using Iterator will both execute faster and reduce overhead for GC. Given that I don’t know the implementation of the passed set (suppose that a HashSet or LinkedHashSet is most likely), how much overhead comes from each of the toArray () or iterator () methods? Any ideas on this would be greatly appreciated.

Questions (repeated from above):

  • Which method, toArray () or iterator (), is most likely to perform the fastest actions with minimal impact of GC (Garbage Collection)?
  • Given that I do not know the implementation of the transferred set (suppose that a HashSet or LinkedHashSet is most likely), how much overhead is incurred in each of the toArray () and iterator () methods?
+8
java performance iterator set toarray
source share
5 answers

toSearch.iterator().next() will be faster and less memory intensive since it does not need to copy any data, while toArray will select and copy the contents of the set into an array. This is independent of the actual implementation: toArray will always need to copy data.

+9
source share

From what I see, you do a quick search on the heels

The following is an example of how it can be implemented without using toArray:

  private Set<Coordinate> floodFill(Value value, Coordinate coordinateStart) { final Set<Coordinate> visitedCoordinates = new LinkedHashSet<Coordinate>(); final Deque<Coordinate> deque = new ArrayDeque<Coordinate>(); deque.push(coordinateStart); while (!deque.isEmpty()) { final Coordinate currentVertex = deque.poll(); visitedCoordinates.add(currentVertex); for (Coordinate coordinateAdjacent : getAdjacentCoordinates(currentVertex)) { if (this.query.getCoordinateValue(coordinateAdjacent) == value) { if (!visitedCoordinates.contains(coordinateAdjacent)) { deque.add(coordinateAdjacent); } } } } return visitedCoordinates; } 

Implementation notes:

And now I'm worried that the implementation of the contains () method in LinkedList might do a full content check before returning a response.

You are right about a full scan (aka linear search). Nevertheless, in your case it is possible to have an additional set for tracking the peaks already visited (btw, this is actually your result!), Which would solve the problem with contains the method O (1) times.

Greetings

+1
source share

Here's how to implement it:

 private Set<Coordinate> floodFill(Value value, Coordinate start) { Set<Coordinate> result = new LinkedHashSet<Coordinate>(); LinkedList<Coordinate> toSearch = new LinkedList<Coordinate>(); toSearch.add(start); do { Coordinate coordinate = toSearch.removeFirst(); if (result.add(coordinate)) { for (Coordinate ajacent: getAdjacentCoordinates(coordinate)) { if (this.query.getCoordinateValue(adjacent) == value) { toSearch.add(adjacent); } } } } while (!toSearch.isEmpty()); return result; } 

Notes:

  • If you think about it, the toSearch data structure toSearch not contain unique elements.
  • Using LinkedList for toSearch means that there is an easy way to get an item and delete it at a time.
  • We can use the fact that Set.add(...) returns boolean to have the number of searches in the result set ... compared to using Set.contains() .
  • It is better to use a HashSet instead of a LinkedHashSet for the results ... if you do not need to know the order in which the coordinates were added by the fill.
  • Using == to compare instances of Value potentially a bit dodgy.
+1
source share

After Peter's answer, I copied the method and repeated it in accordance with his suggestions. It looks like this:

 private Set<Coordinate> floodFind2(Value value, Coordinate coordinateStart) { Set<Coordinate> result = new LinkedHashSet<Coordinate>(); Queue<Coordinate> toSearch = new LinkedList<Coordinate>(); toSearch.add(coordinateStart); while (!toSearch.isEmpty()) { Coordinate coordinate = toSearch.remove(); result.add(coordinate); for (Coordinate coordinateAdjacent: getAdjacentCoordinates(coordinate)) { if (getCoordinateValue(coordinateAdjacent).equals(value)) { if (!result.contains(coordinateAdjacent)) { if (!toSearch.contains(coordinateAdjacent)) { toSearch.add(coordinateAdjacent); } } } } } return result; } 

Moving from Set to Queue, my efficiency questions switched to a new conditional check that I had to add: " if (! ToSearch.contains (coordAdjacent)) . Using the Set interface, he silently stopped me from adding duplicates. Using the Queue interface, I have to check that I do not add duplicate.

And now I'm worried that the implementation of the contains () method in LinkedList might do a full content check before returning a response. So, comparing this method with the one I originally posted, which is likely to be more efficient (before I go to spend a lot of time on empirical testing)?

0
source share

So, below my last implementation includes feedback (mainly from Stephen, Cameron and Petro), which includes the complete resolution of the conflict toArray () [] - vs-interator (). next (). And I sprinkled with comments to more accurately discern what is happening and why. And to better clarify why I specifically followed Petro's initial tip, “Use the Track Set” (seconded by Cameron). And right after the code snippet I will compare it with other proposed solutions.

 private Set<Coordinate> floodFind3(Coordinate coordinate) { Set<Coordinate> area = new LinkedHashSet<Coordinate>(); //includes only area of value (which is the same as at coordinate) area.add(coordinate); Value value = getCoordinateValue(coordinate); //value upon which to expand area Set<Coordinate> checked = new LinkedHashSet<Coordinate>(); //every coordinate evaluated regardless of value checked.add(coordinate); Queue<Coordinate> candidates = new LinkedList<Coordinate>(); //coordinates evaluated, were of value, and are queued to iterate through their adjacents candidates.add(nordinate); while (!candidates.isEmpty()) { for (Nordinate coordinateAdjacent: this.query.getNordinates().getAdjacent(candidates.remove()).getOrthogonal()) { if (checked.add(coordinateAdjacent)) //only expands containing value and !value { if (getCoordinateValue(coordinateAdjacent) == value) { area.add(coordinateAdjacent); //only expands containing value candidates.add(coordinateAdjacent); //expands and contracts containing value } } } } return area; } 

I updated the method in several significant ways:

  • Another method parameter: I removed the parameter because it was inferred from the search, and fixed a possible logical problem when the starting coordinate points to the location containing the value.
  • The third collection tracks the search; area (Set), marked (Set) and candidates (Queue). Code comments clarify the specific use of each. Used by LinkedHashSet for reliable reproducibility when checking for errors and performance problems (http://stackoverflow.com/questions/2704597/iteration-order-of-hashset). After stable operation, I will most likely return to a faster implementation of HashSet.
  • Reordered test "check if it has already been evaluated" before the test "is the value" to visit each coordinate exactly once. This avoids re-viewing! Shift adjacent coordinate values ​​more than once. Also included is Stehen's clever dual use of the Set add () method. This becomes very important as the area for flooding becomes more labyrinth (snake / spider).
  • Hold "==" to check the value, forcing comparison of links. The value is defined as Java 1.5 Enum, and I did not want to depend on HotSpot both for the inline method of the .equals () method and for comparing it. If the value always changed from Enum, this choice could come back to bite me. Tivm to steven for that.

Petro and Stephan solutions visit the coordinates containing the value only once, but require a second revision of the coordinates containing the value! more than once, which can lead to quite a lot of repeating checks / values ​​for areas consisting of long labyrinth tunnels. Although “long labyrinth tunnels” can be seen as a pathological case, this is more typical of a particular domain for which I need this method. And my “2nd” solution (having a bad call to LinkedList contains ()) was questionable, like a real answer ({nod} for Stephen on this).

Thank you for all your feedback.

Further, there are many empirical tests with individual changes / changes of more than hundreds of millions of calls. This weekend I will clarify this question with details.

0
source share

Source: https://habr.com/ru/post/649836/


All Articles