Efficient prime storage

For a library, I need to store the prime numbers up to a limit L. The collection should have O(1) lookup time (to check whether a number is prime), and it should be easy, given a number, to find the next prime (provided it is less than L).

Given that L is fixed, the sieve of Eratosthenes is excellent for generating the list. Right now I am using a packed Boolean array that holds entries only for the odd numbers between 3 and L (inclusive); it takes (L-2)/2 bits of memory. I would like to be able to raise L statically without using more memory.

Is there a data structure that uses less memory and has similar properties, or at least constant lookup time? (Odd numbers could then be enumerated until a prime is found.)

(The language I wrote this in is Factor, but the question is the same in any language that has built-in or easily programmable packed bit arrays.)
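The scheme the question describes can be sketched like this (a Python stand-in for Factor, since the poster says the question is language-agnostic; all names here are illustrative). Bit i of the packed array represents the odd number 2*i + 3.

```python
L = 1000  # the fixed limit

def build_sieve(limit):
    """Odds-only sieve of Eratosthenes packed into a bytearray."""
    n_bits = (limit - 2) // 2                  # one bit each for 3, 5, 7, ..., limit
    bits = bytearray([0xFF] * ((n_bits + 7) // 8))
    for i in range(n_bits):
        p = 2 * i + 3
        if p * p > limit:
            break
        if bits[i >> 3] & (1 << (i & 7)):      # p is still marked prime
            for m in range(p * p, limit + 1, 2 * p):   # odd multiples only
                j = (m - 3) // 2
                bits[j >> 3] &= ~(1 << (j & 7))
    return bits

def is_prime(bits, n):
    """O(1) lookup; 2 is handled as a special case, evens are rejected."""
    if n == 2:
        return True
    if n < 2 or n % 2 == 0:
        return False
    i = (n - 3) // 2
    return bool(bits[i >> 3] & (1 << (i & 7)))

sieve = build_sieve(L)
```

Finding the next prime is then a matter of stepping through odd numbers and testing each bit, as the question notes.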

+21
math data-structures primes factorization
Jun 23 '09 at 12:59
9 answers

You can remove more redundancy by handling additional small primes explicitly.

At the moment you do this only for 2: you check divisibility by two explicitly, and then store primality flags only for the odd numbers.

For 2 and 3, you get residues from 0 to 5 (mod 6), of which only 1 and 5 are not divisible by two or three and can be prime, so you are down to 1/3 of the numbers.

For 2, 3 and 5 you keep 8 residues out of every 30 numbers, which conveniently packs into one byte per block of 30.

This is explained in more detail here.
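The mod-30 wheel this answer describes can be sketched as follows (an illustrative Python fragment, not the poster's code): only the 8 residues coprime to 30 can be prime for n > 5, so each block of 30 numbers needs exactly one byte of flags.

```python
# The 8 residues mod 30 that are coprime to 2, 3 and 5.
RESIDUES = [1, 7, 11, 13, 17, 19, 23, 29]
BIT = {r: b for b, r in enumerate(RESIDUES)}

def wheel_index(n):
    """Map n to (byte, bit) in the flag array, or None if n lies on a
    wheel spoke (divisible by 2, 3 or 5, hence never stored)."""
    r = n % 30
    if r not in BIT:
        return None
    return n // 30, BIT[r]
```

A lookup stays O(1): compute the residue, reject spoke numbers immediately, and otherwise test one bit.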

+22
Jun 23 '09 at 13:21

An alternative to packed bitmaps and wheels, equally effective in certain contexts, is to store the differences between successive primes. If you special-case the number 2, as usual, all the differences are even. By storing difference/2 you can reach ranges up to 2^40-ish (just before 1999066711391) using byte-sized cells.

Primes up to 2^32 then require only 194 MB, compared with 256 MB for an odds-only bitmap. Iterating over delta-stored primes is also much faster than iterating over wheeled storage, which includes the mod-2 wheel otherwise known as the odds-only bitmap.

For ranges from 1999066711391 and above, a larger cell size or variable-length storage is required. The latter can be extremely effective even with very simple schemes (for example, keep adding bytes as long as they read 255, as in LZ4-style compression), because half-gaps longer than 510/2 occur extremely rarely.

For efficiency, it is best to divide the range into sections (pages) and manage them B-tree style.

Entropy coding of the differences (Huffman or arithmetic coding) cuts the persistent-storage requirement to just under half, which is close to the theoretical optimum and better than lists or files compressed with the best packers available.

Even stored uncompressed, the data is still much more compact than binary or text files of the numbers themselves, by an order of magnitude or more. With a B-tree style index in place, it is easy to map sections into memory as needed and iterate over them at high speed.
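The half-gap encoding this answer proposes can be sketched in a few lines (illustrative Python; not the poster's implementation). With 2 handled separately, every gap between consecutive odd primes is even, so gap // 2 fits in one byte as long as no gap exceeds 510.

```python
def encode_deltas(primes):
    """Encode a sorted list of odd primes (starting at 3) as half-gap bytes."""
    out = bytearray()
    prev = primes[0]
    for p in primes[1:]:
        out.append((p - prev) // 2)   # always even gap, so this is exact
        prev = p
    return out

def iterate(first, deltas):
    """Recover the primes by summing the doubled half-gaps."""
    p = first
    yield p
    for d in deltas:
        p += 2 * d
        yield p

primes = [3, 5, 7, 11, 13, 17, 19, 23, 29]
deltas = encode_deltas(primes)        # one byte per prime after the first
assert list(iterate(3, deltas)) == primes
```

One byte per prime gives the 194 MB figure for primes below 2^32 that the answer quotes (there are about 203 million of them).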

+8
Nov 10 '14 at 17:38

At the moment you treat 2 as a special case, and then you have an array in which every odd number maps to an element (even though not every odd number is prime). You could improve this by treating 2 and 3 as special cases and recognizing that the remaining primes are of the form 6n+1 or 6n-1 (i.e. for all primes p > 3, p mod 6 = 1 or 5). This can be generalized further; see Wikipedia. For all primes p > 5, p mod 30 = 1, 7, 11, 13, 17, 19, 23 or 29. You can keep extending this and reduce the memory needed at the cost of processing time (lookup remains O(1), just a slower O(1)).
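The 6n±1 mapping described here can be sketched as follows (illustrative Python): only numbers with residue 1 or 5 mod 6 get a slot, so each block of 6 numbers costs 2 bits instead of 3 in the odds-only scheme.

```python
def slot(n):
    """Map n to its bit index in the packed array, or None when n is
    divisible by 2 or 3 and therefore never needs a flag."""
    r = n % 6
    if r == 1:
        return 2 * (n // 6)
    if r == 5:
        return 2 * (n // 6) + 1
    return None
```

The mod-30 generalization works the same way with 8 slots per block of 30, as the answer notes.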

+4
Jun 23 '09 at 13:28

Perhaps a trie data structure that contains only the primes is what you are looking for. Instead of using characters as indices, you can use the digits of the number. An implementation of this is Judy arrays.

Although they do not meet your O(1) requirement, they are extremely efficient for similar keys (as most prime keys share long prefixes) and lookups are quite fast, at most O(m) (m = key length).

If you look up a number in the pre-generated trie, you either walk down until you find it, or you end up at a node that lies between the previous and the next prime.
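Judy arrays are a C library, not something in Python's standard library, but a plain dict-of-dicts trie keyed on decimal digits sketches the idea (illustrative only; a real implementation would use radix compression as Judy arrays do):

```python
def trie_insert(trie, n):
    """Insert the decimal digits of n as a path; mark the terminal node."""
    node = trie
    for ch in str(n):
        node = node.setdefault(ch, {})
    node['end'] = True

def trie_contains(trie, n):
    """Walk the digit path; n is present only if the walk ends on a marked node."""
    node = trie
    for ch in str(n):
        if ch not in node:
            return False
        node = node[ch]
    return 'end' in node

trie = {}
for p in [2, 3, 5, 7, 11, 13]:
    trie_insert(trie, p)
```

Lookup cost is O(m) in the number of digits, matching the answer's claim, and shared prefixes (11, 13) share storage.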

+2
Jun 23 '09 at 13:27

Given that memory is so cheap, I don't think you can do much better than your existing scheme in terms of speed.

If there is a better solution, I would suggest it exploits the Prime Number Theorem, which shows that as L gets larger, the ratio

π(L) / (L / ln(L)) approaches 1.

Perhaps a better solution would be an adaptive packing scheme in a data structure like a skip list.
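The ratio quoted above converges quite slowly, which is easy to check numerically (an illustrative Python snippet; the simple dense sieve here is only for counting):

```python
import math

def primes_upto(limit):
    """Plain sieve of Eratosthenes, returning all primes <= limit."""
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if flags[p]:
            for m in range(p * p, limit + 1, p):
                flags[m] = False
    return [i for i, f in enumerate(flags) if f]

for L in (10 ** 3, 10 ** 5):
    pi = len(primes_upto(L))
    print(L, pi, round(pi / (L / math.log(L)), 4))
```

π(10^3) = 168 and π(10^5) = 9592, so the ratio drops from about 1.16 to about 1.10: primes thin out, which is what makes gap-based and adaptive schemes attractive at large L.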

0
Jun 23 '09 at 13:22

How about a hash table?

You would need a very good hash function (something like n mod p, where p is not a multiple of any of the lower primes q; choose p high enough to minimize the number of collisions).
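A minimal sketch of this idea (illustrative Python; the table size and prime list are assumptions for the example): buckets indexed by n mod p, with collisions resolved by chaining.

```python
P = 1009  # table size: a prime, so n % P spreads keys evenly

def make_table(primes):
    """Build a fixed-size chained hash table from a list of primes."""
    table = [[] for _ in range(P)]
    for q in primes:
        table[q % P].append(q)
    return table

def contains(table, n):
    """Expected O(1) membership test: scan only the one matching bucket."""
    return n in table[n % P]

table = make_table([2, 3, 5, 7, 11, 13, 17, 19, 23, 29])
```

This trades the bitmap's guaranteed density for expected O(1) lookup, but note it stores the primes themselves, so for primes up to L it is unlikely to beat the packed bitmap on memory.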

0
Jun 23 '09 at 13:37

What about an interval tree? http://www.geeksforgeeks.org/interval-tree/

It may not be O(1), but it is very fast: for example O(log(p(n))), where p(n) is the number of primes up to n. The memory needed is proportional to the number of primes, which significantly reduces the memory cost.

For example, suppose you find a prime p1 and then the next prime p2: insert the interval (p1, p2), and so on. When you search for any number in that range, the tree returns the interval, and you can return p2, which is the next prime and the answer in your case.
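A full interval tree is not in Python's standard library, but since the intervals between consecutive primes never overlap, a sorted array with binary search gives the same O(log p(n)) behavior and sketches the idea (illustrative only):

```python
import bisect

# Precomputed sorted primes; each adjacent pair implicitly defines
# the interval (p1, p2) described in the answer.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

def next_prime(n):
    """Smallest prime >= n: the right endpoint of the interval containing n."""
    i = bisect.bisect_left(PRIMES, n)
    return PRIMES[i] if i < len(PRIMES) else None

def is_prime(n):
    """O(log p(n)) membership test on the sorted array."""
    i = bisect.bisect_left(PRIMES, n)
    return i < len(PRIMES) and PRIMES[i] == n
```

Memory is proportional to the number of primes stored, as the answer claims, rather than to the limit L.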

0
Dec 01 '16 at 12:10

If you can figure out which primes are Mersenne primes or otherwise easily representable, you could save a few bits by flagging those numbers specially.

Also, how about storing each number as the difference from the previous one? Then the size does not grow as fast (but lookup will be slow). Combining this with the approach above, you could store the Mersenne primes plus the differences from the last Mersenne prime.

-2
Jun 23 '09 at 13:13
