Memoization Effectiveness Issues (Collatz Hailstone Sequence)

For the last few days I have been particularly interested (more from an algorithmic than a mathematical point of view) in computing the length of a given number's Hailstone sequence (the Collatz conjecture). Implementing a recursive algorithm is probably the simplest way to calculate the length, but it seemed like an unnecessary waste of computation time: many sequences overlap. Take, for example, 3's Hailstone sequence:

3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1

It has a length of 7; more precisely, it takes 7 operations to reach 1. If we then take 6:

6 -> 3 -> ...

We immediately notice that we have already computed the rest, so we just add 3's sequence length instead of stepping through all those numbers again (the length of 6 is simply 1 + 7 = 8). This drastically reduces the number of operations needed to calculate the sequence length of every number up to some bound.

I tried to implement this in Java using a HashMap (which seemed appropriate given the expected O(1) complexity of get/put):

    import java.util.HashMap;

    /* NOTE: cache.put(1,0); is called in main to act as the
     * 'base case' of sorts. */
    private static HashMap<Long, Long> cache = new HashMap<>();

    /* Returns length of sequence, pulling prerecorded values
     * from the cache whenever possible, and saving unrecorded
     * values to the cache. */
    static long seqLen(long n) {
        long count = 0, m = n;
        while (true) {
            if (cache.containsKey(n)) {
                count += cache.get(n);
                cache.put(m, count);
                return count;
            } else if (n % 2 == 0) {
                n /= 2;
            } else {
                n = 3*n + 1;
            }
            count++;
        }
    }

Essentially, seqLen starts with a given number and works through its Hailstone sequence until it reaches a number that is already in the cache, at which point it adds that number's cached length to the current count, then records the starting value and its full sequence length in the HashMap as a (key, val) pair.
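To make that concrete, here is a tiny usage sketch (the calls and printed values below are just my illustration, assuming the base case has been seeded as described in the code comment):

    cache.put(1L, 0L);              // base case: the sequence for 1 has length 0
    System.out.println(seqLen(3));  // walks 3 -> 10 -> 5 -> ... -> 1, prints 7, caches (3, 7)
    System.out.println(seqLen(6));  // 6 -> 3 hits the cache after one step, prints 8, caches (6, 8)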

I also had the following pretty standard recursive algorithm for comparison:

    static long recSeqLen(long n) {
        if (n == 1) {
            return 0;
        } else if (n % 2 == 0) {
            return 1 + recSeqLen(n / 2);
        } else {
            return 1 + recSeqLen(3*n + 1);
        }
    }

The memoizing algorithm should, by all accounts, be considerably faster than the naive recursive method. However, in most cases it is only marginally faster, and for large inputs it is actually slower. Running the following code gives timings whose relationship changes significantly as n grows:

    long n = ... // However many numbers I want to calculate sequence
                 // lengths for.

    long st = System.nanoTime();
    // Iterative memoizing algorithm:
    for (long i = 2; i < n; i++) {
        seqLen(i);
    }
    long et = System.nanoTime();
    System.out.printf("HashMap algorithm: %d ms\n", (et - st) / 1000000);

    st = System.nanoTime();
    // Using recursion without memoizing values:
    for (long i = 2; i < n; i++) {
        recSeqLen(i);
    }
    et = System.nanoTime();
    System.out.printf("Recursive non-memoizing algorithm: %d ms\n", (et - st) / 1000000);
  • n = 1,000: ~2 ms for both algorithms
  • n = 100,000: ~65 ms for the iterative memoizing algorithm, ~75 ms for the recursive non-memoizing one
  • n = 1,000,000: ~500 ms and ~900 ms, respectively
  • n = 10,000,000: ~14,000 ms and ~10,000 ms

At higher values, I get memory errors, so I can’t check if the pattern continues.

So my question is: why does the memoizing algorithm suddenly start taking longer than the naive recursive algorithm for large values of n?


EDIT:

Ditching the HashMap altogether and going with a simple array structure (as well as dropping some of the overhead of checking whether a value is already in the cache) gives the desired performance:

    private static final int CACHE_SIZE = 80000000;
    private static long[] cache = new long[CACHE_SIZE];

    static long seqLen(long n) {
        int count = 0;
        long m = n;
        do {
            if (n % 2 == 0) {
                n /= 2;
            } else {
                n = 3*n + 1;
            }
            count++;
        } while (n > m);
        count += cache[(int)n];
        cache[(int)m] = count;
        return count;
    }

Iterating over the entire cache size (all 80 million entries) now takes only 3 seconds, as opposed to 93 seconds with the recursive algorithm. The HashMap algorithm throws a memory error before it finishes, so it can't even be compared, but given its behaviour at lower values I have a feeling it would not compare well.
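For reference, the timing loop is nothing special; a minimal sketch of what I'm running (the array is zero-initialized by Java, so cache[1] = 0 already serves as the base case):

    long st = System.nanoTime();
    // Walk every starting value that fits in the array-backed cache.
    for (int i = 2; i < CACHE_SIZE; i++) {
        seqLen(i);
    }
    long et = System.nanoTime();
    System.out.printf("Array algorithm: %d ms\n", (et - st) / 1000000);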

1 answer

Off the cuff, I'd guess it's spending a lot of time resizing the hash map. It sounds like you start it out empty and keep adding values to it, which means that as it grows it periodically has to allocate a bigger chunk of memory to hold your data and recompute the hashes of all existing elements, which is O(N). Try pre-allocating it to the size you expect to put into it. See https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html for more details.
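For example, something like this (the expected-entry count is just a placeholder for however many values you actually plan to insert):

    // Sketch: pre-size the map so it never has to rehash while filling up.
    // HashMap resizes once size exceeds capacity * loadFactor (0.75 by
    // default), so request roughly expected / 0.75 capacity up front.
    private static final int EXPECTED_ENTRIES = 10_000_000;
    private static HashMap<Long, Long> cache =
            new HashMap<>((int) (EXPECTED_ENTRIES / 0.75f) + 1);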

