Over the last few days I've been especially interested (more from an algorithmic than a mathematical point of view) in computing the length of a given number's Hailstone sequence (the Collatz conjecture). Implementing a recursive algorithm is probably the simplest way to calculate the length, but it struck me as an unnecessary waste of computation time: many sequences overlap. Take for example the Hailstone sequence of 3:
3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
It has a length of 7; more specifically, it takes 7 operations to reach 1. If we then take 6:
6 -> 3 -> ...
We notice immediately that we have already computed the rest, so instead of running through all of those numbers again we can simply add the cached length of the sequence for 3 (so 6 has length 1 + 7 = 8). This significantly reduces the number of operations needed to compute the sequence length of every number.
I tried to implement this in Java using a HashMap (which seemed appropriate given the amortized O(1) complexity of get/put):
import java.util.HashMap;

private static HashMap<Long, Long> cache = new HashMap<>();

static {
    cache.put(1L, 0L); // base case: the sequence starting at 1 has length 0
}

static long seqLen(long n) {
    long count = 0, m = n;
    while (true) {
        if (cache.containsKey(n)) {
            count += cache.get(n);
            cache.put(m, count);
            return count;
        } else if (n % 2 == 0) {
            n /= 2;
        } else {
            n = 3*n + 1;
        }
        count++;
    }
}
Essentially, seqLen starts at a given number and works through its Hailstone sequence until it hits a number that is already in the cache, at which point it adds that number's cached length to the current count, then stores the starting value and its sequence length in the HashMap as a (key, value) pair.
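For illustration, a minimal driver along these lines (a hypothetical main method, not part of the benchmark below) shows the cache being built up and reused:

public static void main(String[] args) {
    // Compute a few lengths in increasing order; earlier results are
    // reused through the cache.
    for (long i = 2; i <= 6; i++) {
        System.out.println(i + " has sequence length " + seqLen(i));
    }
    // seqLen(6) only performs one step (6 -> 3) before hitting the cached
    // length of 3, so it returns 1 + 7 = 8.
}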
I also had the following pretty standard recursive algorithm for comparison:
static long recSeqLen(long n) {
    if (n == 1) {
        return 0;
    } else if (n % 2 == 0) {
        return 1 + recSeqLen(n / 2);
    } else {
        return 1 + recSeqLen(3*n + 1);
    }
}
The caching algorithm should, by all accounts, be considerably faster than the naive recursive method. However, it mostly isn't, and for large inputs it is actually slower. Running the following code gives timings that change significantly as the magnitude of n changes:
long n = ... // However many numbers I want to calculate sequence lengths for.

long st = System.nanoTime();
// Iterative caching algorithm:
for (long i = 2; i < n; i++) {
    seqLen(i);
}
long et = System.nanoTime();
System.out.printf("HashMap algorithm: %d ms\n", (et - st) / 1000000);

st = System.nanoTime();
// Using recursion without caching values:
for (long i = 2; i < n; i++) {
    recSeqLen(i);
}
et = System.nanoTime();
System.out.printf("Recursive non-caching algorithm: %d ms\n", (et - st) / 1000000);
n = 1,000: ~2 ms for both algorithms
n = 100,000: ~65 ms for the iterative caching algorithm, ~75 ms for the recursive non-caching one
n = 1,000,000: ~500 ms and ~900 ms
n = 10,000,000: ~14,000 ms and ~10,000 ms
At higher values I get memory errors, so I can't check whether the pattern continues.
So my question is: why does the caching algorithm suddenly start to take longer than the naive recursive algorithm for large values of n?
EDIT:
Ditching HashMaps altogether and switching to a simple array structure (as well as dropping some of the overhead of checking whether a value is already in the cache) gives the desired performance:
private static final int CACHE_SIZE = 80000000;
private static long[] cache = new long[CACHE_SIZE];

static long seqLen(long n) {
    int count = 0;
    long m = n;
    do {
        if (n % 2 == 0) {
            n /= 2;
        } else {
            n = 3*n + 1;
        }
        count++;
    } while (n > m); // stop as soon as the sequence drops below the starting value
    // Lengths for all smaller starting values are already in the cache,
    // since the lengths are computed in increasing order of starting value.
    count += cache[(int) n];
    cache[(int) m] = count;
    return count;
}
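This version only works when the lengths are computed bottom-up, so that cache[(int) n] is already populated by the time the do/while loop exits; a driver loop along these lines is assumed (a sketch, not the exact benchmark code):

// Fill the cache in increasing order of starting value, from 2 upwards.
for (long i = 2; i < CACHE_SIZE; i++) {
    seqLen(i);
}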
Iterating over the entire cache size (80 million) now takes only 3 seconds, as opposed to 93 seconds with the recursive algorithm. The HashMap algorithm runs out of memory, so it can't even be compared, but given its behavior at lower values, I have a feeling it wouldn't compare well.