Why does the Java ArrayList delete function seem to cost so little?

I have a function that manages a very large list exceeding about 250,000 items. For most of these elements, it simply replaces the element at position x. However, about 5% of them should remove them from the list.

Using LinkedList seemed to be the most obvious solution to avoid costly exemptions. However, naturally, accessing LinkedList by index is becoming slower over time. The cost here is minutes (and many of them).

Using Iterator over this LinkedList is also expensive, since it seems to me that a separate copy is needed to avoid problems with Iterator concurrency when editing this list. The cost here is minutes.

However, here, where my mind is a little blown up. If I switch to ArrayList, it starts almost instantly.

For a list with 297,515 items, removing 11,958 items and changing everything else takes 909 ms. I confirmed that the resulting list really has a size of 285557, as expected, and contains the updated information that I need.

Why is it so fast? I looked at the source for ArrayList in JDK6 and it seems to use the arraycopy function as expected. I would like to understand why ArrayList works so well when common sense seems to indicate that an array for this task is a terrible idea that requires shifting several hundred thousand elements.

+50
java performance optimization arraylist
May 23 '11 at 19:14
source share
5 answers

I checked the benchmark by trying each of the following list item filtering strategies:

  • Copy the items you need to the new list.
  • Use Iterator.remove() to remove unwanted elements from an ArrayList
  • Use Iterator.remove() to remove unwanted elements from LinkedList
  • Compact list in place (moving the desired items to the bottom positions)
  • Delete by index ( List.remove(int) ) on ArrayList
  • Delete by index ( List.remove(int) ) on LinkedList

Each time I populate the list with 100,000 random instances of Point and use a filter condition (based on the hash code) that would take 95% of the elements and reject the remaining 5% (the same proportion indicated in the question, but with a smaller list, because I did not have time to run a test on 250,000 items.)

And average time (on my old MacBook Pro: Core 2 Duo, 2.2 GHz, 3Gb RAM):

 CopyIntoNewListWithIterator : 4.24ms CopyIntoNewListWithoutIterator: 3.57ms FilterLinkedListInPlace : 4.21ms RandomRemoveByIndex : 312.50ms SequentialRemoveByIndex : 33632.28ms ShiftDown : 3.75ms 

Thus, removing items by index from LinkedList was more than 300 times more expensive than removing them from ArrayList , and probably somewhere between 6000-10000 times more expensive than other methods (which avoid linear searching and arraycopy )

There is not much difference between the four faster methods, but I ran again with these four with a 500,000 element list with the following results:

 CopyIntoNewListWithIterator : 92.49ms CopyIntoNewListWithoutIterator: 71.77ms FilterLinkedListInPlace : 15.73ms ShiftDown : 11.86ms 

I assume that with a large amount of cache memory becomes a limiting factor, so the cost of creating a second copy of the list becomes significant.

Here is the code:

 import java.awt.Point; import java.security.SecureRandom; import java.util.ArrayList; import java.util.Arrays; import java.util.Collection; import java.util.Iterator; import java.util.LinkedList; import java.util.List; import java.util.Map; import java.util.Random; import java.util.TreeMap; public class ListBenchmark { public static void main(String[] args) { Random rnd = new SecureRandom(); Map<String, Long> timings = new TreeMap<String, Long>(); for (int outerPass = 0; outerPass < 10; ++ outerPass) { List<FilterStrategy> strategies = Arrays.asList(new CopyIntoNewListWithIterator(), new CopyIntoNewListWithoutIterator(), new FilterLinkedListInPlace(), new RandomRemoveByIndex(), new SequentialRemoveByIndex(), new ShiftDown()); for (FilterStrategy strategy: strategies) { String strategyName = strategy.getClass().getSimpleName(); for (int innerPass = 0; innerPass < 10; ++ innerPass) { strategy.populate(rnd); if (outerPass >= 5 && innerPass >= 5) { Long totalTime = timings.get(strategyName); if (totalTime == null) totalTime = 0L; timings.put(strategyName, totalTime - System.currentTimeMillis()); } Collection<Point> filtered = strategy.filter(); if (outerPass >= 5 && innerPass >= 5) { Long totalTime = timings.get(strategyName); timings.put(strategy.getClass().getSimpleName(), totalTime + System.currentTimeMillis()); } CHECKSUM += filtered.hashCode(); System.err.printf("%-30s %d %d %d%n", strategy.getClass().getSimpleName(), outerPass, innerPass, filtered.size()); strategy.clear(); } } } for (Map.Entry<String, Long> e: timings.entrySet()) { System.err.printf("%-30s: %9.2fms%n", e.getKey(), e.getValue() * (1.0/25.0)); } } public static volatile int CHECKSUM = 0; static void populate(Collection<Point> dst, Random rnd) { for (int i = 0; i < INITIAL_SIZE; ++ i) { dst.add(new Point(rnd.nextInt(), rnd.nextInt())); } } static boolean wanted(Point p) { return p.hashCode() % 20 != 0; } static abstract class FilterStrategy { abstract void clear(); abstract Collection<Point> filter(); abstract void populate(Random rnd); } static final int INITIAL_SIZE = 100000; private static class CopyIntoNewListWithIterator extends FilterStrategy { public CopyIntoNewListWithIterator() { list = new ArrayList<Point>(INITIAL_SIZE); } @Override void clear() { list.clear(); } @Override Collection<Point> filter() { ArrayList<Point> dst = new ArrayList<Point>(list.size()); for (Point p: list) { if (wanted(p)) dst.add(p); } return dst; } @Override void populate(Random rnd) { ListBenchmark.populate(list, rnd); } private final ArrayList<Point> list; } private static class CopyIntoNewListWithoutIterator extends FilterStrategy { public CopyIntoNewListWithoutIterator() { list = new ArrayList<Point>(INITIAL_SIZE); } @Override void clear() { list.clear(); } @Override Collection<Point> filter() { int inputSize = list.size(); ArrayList<Point> dst = new ArrayList<Point>(inputSize); for (int i = 0; i < inputSize; ++ i) { Point p = list.get(i); if (wanted(p)) dst.add(p); } return dst; } @Override void populate(Random rnd) { ListBenchmark.populate(list, rnd); } private final ArrayList<Point> list; } private static class FilterLinkedListInPlace extends FilterStrategy { public String toString() { return getClass().getSimpleName(); } FilterLinkedListInPlace() { list = new LinkedList<Point>(); } @Override void clear() { list.clear(); } @Override Collection<Point> filter() { for (Iterator<Point> it = list.iterator(); it.hasNext(); ) { Point p = it.next(); if (! wanted(p)) it.remove(); } return list; } @Override void populate(Random rnd) { ListBenchmark.populate(list, rnd); } private final LinkedList<Point> list; } private static class RandomRemoveByIndex extends FilterStrategy { public RandomRemoveByIndex() { list = new ArrayList<Point>(INITIAL_SIZE); } @Override void clear() { list.clear(); } @Override Collection<Point> filter() { for (int i = 0; i < list.size();) { if (wanted(list.get(i))) { ++ i; } else { list.remove(i); } } return list; } @Override void populate(Random rnd) { ListBenchmark.populate(list, rnd); } private final ArrayList<Point> list; } private static class SequentialRemoveByIndex extends FilterStrategy { public SequentialRemoveByIndex() { list = new LinkedList<Point>(); } @Override void clear() { list.clear(); } @Override Collection<Point> filter() { for (int i = 0; i < list.size();) { if (wanted(list.get(i))) { ++ i; } else { list.remove(i); } } return list; } @Override void populate(Random rnd) { ListBenchmark.populate(list, rnd); } private final LinkedList<Point> list; } private static class ShiftDown extends FilterStrategy { public ShiftDown() { list = new ArrayList<Point>(); } @Override void clear() { list.clear(); } @Override Collection<Point> filter() { int inputSize = list.size(); int outputSize = 0; for (int i = 0; i < inputSize; ++ i) { Point p = list.get(i); if (wanted(p)) { list.set(outputSize++, p); } } list.subList(outputSize, inputSize).clear(); return list; } @Override void populate(Random rnd) { ListBenchmark.populate(list, rnd); } private final ArrayList<Point> list; } } 
+24
May 23 '11 at 21:10
source share

Copying an array is a fairly low operation. This is done at a very basic level (your own static Java method), and you have not yet reached the range when performance becomes really important.

In your example, you copy about 12,000 times an array of size 150,000 (on average). It does not take much time. I tested it here on my laptop and it took less than 500 ms.

Update . I used the following code to measure on my laptop (Intel P8400).

 import java.util.Random; public class PerformanceArrayCopy { public static void main(String[] args) { int[] lengths = new int[] { 10000, 50000, 125000, 250000 }; int[] loops = new int[] { 1000, 5000, 10000, 20000 }; for (int length : lengths) { for (int loop : loops) { Object[] list1 = new Object[length]; Object[] list2 = new Object[length]; for (int k = 0; k < 100; k++) { System.arraycopy(list1, 0, list2, 0, list1.length); } int[] len = new int[loop]; int[] ofs = new int[loop]; Random rnd = new Random(); for (int k = 0; k < loop; k++) { len[k] = rnd.nextInt(length); ofs[k] = rnd.nextInt(length - len[k]); } long n = System.nanoTime(); for (int k = 0; k < loop; k++) { System.arraycopy(list1, ofs[k], list2, ofs[k], len[k]); } n = System.nanoTime() - n; System.out.print("length: " + length); System.out.print("\tloop: " + loop); System.out.print("\truntime [ms]: " + n / 1000000); System.out.println(); } } } } 

Some results:

 length: 10000 loop: 10000 runtime [ms]: 47 length: 50000 loop: 10000 runtime [ms]: 228 length: 125000 loop: 10000 runtime [ms]: 575 length: 250000 loop: 10000 runtime [ms]: 1198 
+16
May 23 '11 at 19:33
source share

I think the performance difference probably comes down to the difference that ArrayList supports random access if LinkedList doesn't.

If I want to get (1000) an ArrayList, I specify a specific index to access it, however LinkedList does not support this, because it is organized using Node links.

If I call get (1000) LinkedList, it will iterate over the whole list until it finds index 1000, and this can be overly expensive if you have a large number of items in LinkedList.

+9
May 23 '11 at 19:22
source share

Interesting and unexpected results. This is just a hypothesis, but ...

On average, one of the elements of your removal of array elements will require moving half of your list (everything after it) back to one element. If each element is a 64-bit pointer to an object (8 bytes), then this means that you are copying 125,000 elements x 8 bytes to a pointer = 1 MB.

A modern processor can quickly copy an adjacent block with a memory capacity of 1 MB to RAM.

Compared to a linked list loop, for each access that requires comparisons and branches, and other unfriendly CPU actions, a copy of RAM is fast.

You should really try to do a comparative analysis of the various operations yourself and see how effective they are with different list implementations. Share your results here if you do!

+5
May 23 '11 at 19:25
source share

I will skip some of the details of the goal here, just to explain the fundamental difference.

To remove the Nth element from the list of M elements, the LinkedList implementation will move to that element, and then simply remove it and update the pointers of the N-1 and N + 1 elements, respectively. This second operation is very simple, but it is suitable for this element, which costs you time.

However, for ArrayList, access time is instantaneous since it is supported by an array, which means contiguous memory spaces. You can go directly to the desired memory address in general to do the following:

  • redistribute a new array of elements M - 1
  • put everything from 0 to N - 1 with index 0 in a new array of arrays
  • put all N + 1 in M ​​at index N into an array of arrays.

As you reflect on this, you will notice that you can even reuse the same array as Java, which ArrayList can use with predefined sizes, so if you delete the elements, you can also skip steps 1 and 2 and directly do step 3 and update your size.

Access to the memory is fast, and copying the block of memory is probably fast enough on modern hardware that switching to the N-position is too time-consuming.

However, if you use your LinkedList in such a way that it removes several elements that follow each other and track your position, you will see a win.

But it is clear that on a long list, performing a simple deletion (i) will be expensive.




To add salt of salt and spices to it:

+5
May 23 '11 at 19:37
source share



All Articles