Counting a number in a sequence

Question

Counting a number in a sequence

Having a sequence of n <= 10 ^ 6 integers, all not exceeding m <= 3 * 10 ^ 6, I would like to calculate how many pairs are copied in it. Two numbers are coprime if their greatest common divisor is 1.

This can be done trivially in O (n ^ 2 log n), but this is obviously a way to slow down, since the limit offers something closer to O (n log n). One thing that can be done quickly is the factorization of all numbers, as well as throwing out a lot of identical primes in each of them, but this does not lead to a significant improvement. I also thought about counting the opposite - pairs that have a common divisor. This can be done in groups - firstly, by counting all the pairs, that their smallest common prime divisor is 2, then 3, 5, etc., but it seems like a different dead end to me.

+8

algorithm primes counting

Cris Jul 17 '14 at 15:02

source share

5 answers

Further exploitation of the ideas that I mentioned in my question, I really managed to find a solution. Since some of you may be interested in this, I will briefly describe it. It works in O (m log m + n), I already implemented it in C ++ and tested it - it solves the biggest cases (10 ^ 6 integers) in less than 5 seconds.

We have n integers, all at most m. To begin with, Eratosthenes Sieve maps every integer to m so that it reduces the prime factor, allowing us to subtract any number not greater than m in O (log m). Then for all given numbers A [i], if there exists some prime number p, which divides A [i] by a power greater than one, we divide A [i] by it, because, asking the question, are two numbers coprime we can omit the exhibitors. This leaves us with all A [i] being the products of various primes.

Now suppose that we were able in a reasonable time to build a table T such that T [i] is the number of records A [j] such that I divides A [j]. This is somehow similar to the @Brainless approach taken in his second answer. To build table T quickly was a technique that I talked about in the comments below my question.

From now on, we will work on the principle of inclusion-exclusion. Having T, for each I we calculate P [i] - the number of pairs (j, k) such that A [j] and A [k] are divided by i. Then, to calculate the answer, we summarize all P [i], taking the minus sign in front of those P [i] for which I have an even number of prime divisors. Note that all prime divisors i are different, because for all other indices i P [i] is 0. Inclusion-exclusion each pair will be counted only once. To see it differently, take the pair A [i] and A [j], assuming that they have exactly k common prime divisors. Then this pair will be considered k times, then discounted kC2 times, calculated kC3 times, discounted kC4 times ... for nCk see Newton's symbol. Some mathematical manipulations make us see that the pair in question will be considered 1 - (1-1) ^ k = 1 times, which completes the proof.

The steps taken so far require O (m log log m) for the sieve and O (m) to calculate the result. The last thing to do is build an array T. We could for each A [i] just increase T [j] for all j dividing i. Since A [i] can have at most O (sqrt (A [i])) divisors (and in practice even less than this), we could construct T in O (n sqrt m). But we can do it better!

Take a two-dimensional array W. At each moment of time, the following invariant holds: if for each nonzero W [i] [j] we will increase the counter in table T by W [i] [j] for all numbers that divide i and also separate the exact exponents i in j are the smallest prime divisors of i, then T will be constructed properly. Since this may seem a bit confusing, take a look at it in action. In the beginning, to make the invariant true, for each A [i] we simply increase W [A [i]] [0]. We also note that a number not exceeding m can have at most O (log m) prime divisors; therefore, the total size of W is O (m log m). Now we see that the information stored in W [i] [j] can be “pushed forward” as follows: consider p as the (j + 1) -th first divisor of i, if it has one. Then some divisor I can either have p with exponent, as in i, or lower. The first of these cases - W [i] [j + 1] - we add another prime, which should be “fully taken” by the divisor. The second case of W [i / p] [j] as a divisor i that does not have p with the highest exponent, must also divide i / p. And this! Consider everything i in descending order, then j in ascending order. We “promote” information from W [i] [j]. See that if I have exactly j simple divisors, then the information from it cannot be pressed, but we really do not need it! If I have j prime divisors, then W [i] [j] basically says: increment by W [i] [j] contains only the index i in the array T. Therefore, when all the information was clicked on the "last lines" in each W [i], we go through these lines and complete the construction of T. Since each cell W [i] [j] was visited once, this algorithm takes O (m log m) and O (n) time at the beginning. This completes the construction. Here is the C ++ code from the actual implementation:

 FORD(i,SIZE(W)-1,2) //i in descending order { int v = i, p; FOR(j,0,SIZE(W[i])-2) //exclude last row { p = S[v]; //j-th divisor; S[v] - smallest prime divisor of v while (v%p == 0) v /= p; W[i][j+1] += W[i][j]; W[i/p][j] += W[i][j]; } T[i] = W[i].back(); }

In the end, I would say that I think that the array T can be built faster and easier than I showed. If anyone has a clear idea of how this can be done, I would appreciate all the feedback.

0

Cris Jul 18 '14 at 21:03

source share

Here's an idea based on a formula for the complete sequence 1..n found at http://oeis.org/A018805 :

 a(n) = 2*( Sum phi(j), j=1..n ) - 1, where phi is Euler totient function

We proceed to the sequence, S For each member of S_i :

for each of the simple coefficients p , S_i :
if the hash for p does not exist:
create a hash with index p that points to the set of all indices S except i ,
& counter> rest:
delete i in the existing set of indices and increase the counter

sort hashes for S_i simple coefficients by their counters in descending order. Beginning with
the largest counter (which means the smallest set), make a list of indices to i , which are also members of the next smallest set, until the sets are exhausted. Add the remaining number, indexes in the list to the total.

Example:

 sum phi' [4,7,10,15,21] S_0: 4 prime-hash [2:1-4], counters [2:1] 0 indexes up to i in the set for prime 2 total 0 S_1: 7 prime hash [2:1-4; 7:0,2-4], counters [2:1, 7:1] 1 index up to i in the set for prime 7 total 1 S_2: 10 prime hash [2:1,3-4; 5:0-1,3-4; 7:0,2-4], counters [2:2, 5:1, 7:1] 1 index up to i in the set for prime 2, which is also a member of the set for prime 5 total 2 S_3: 15 prime hash [2:1,3-4; 5:0-1,4; 7:0,2-4; 3:0-2,4], counters [2:2: 5:2, 7:1, 3:1] 2 indexes up to i in the set for prime 5, which are also members of the set for prime 3 total 4 S_4: 21 prime hash [2:1,3-4; 5:0-1,4; 7:0,2-3; 3:0-2], counters [2:2: 5:2, 7:2, 3:2] 2 indexes up to i in the set for prime 7, which are also members of the set for prime 3 total 6 6 coprime pairs: (4,7),(4,15),(4,21),(7,10),(7,15),(10,21)

0

גלעד ברקן Jul 19 '14 at 3:53

source share

I would suggest:

1) Use Eratosthene to get a list of sorted primes under 10 ^ 6.

2) For each number n in the list, get its prime factors. Associate it with another number f (n) as follows: say that the prime factors n are 3, 7 and 17. Then the binary representation of f (n) is:

 `0 1 0 1 0 0 1`

The first digit (0 here) is associated with a prime number 2, the second (1 here) is associated with a prime number 3, etc.

Therefore, 2 numbers n and m are coprime if f (n) and f (m) = 0 .

3) It is easy to see that there exists N such that for each n: f (n) <= (2 ^ N) - 1. This means that the largest number f (n) is less than or equal to a number whose binary representation is:

 `1 1 1 1 1 1 1 1 1 1 1 1 1 1 1`

Here N is the number 1 in the above sequence. Get this N and sort the list of numbers f (n). Call this list L. If you want to optimize: instead of sorting duplicates in this list, store a pair containing f (n) and the number of times f (n) is duplicated.

4) Iterations from 1 to N as follows: initialize i = 1 0 0 0 0 and at each iteration move the digit 1 to the right, while all other values will be stored at 0 (implement it using a bit shift).

At each iteration, iterating over L to get the number d (i) of the elements l in L, such that I and l! = 0 (be careful if you use the above optimization). In other words, for each I get the number of elements in L that are not coprime with i, and call this number d (i). Add general

 D = d(1) + d(2) + ... + d(N)

5) This number D is the number of pairs that are not coprime in the original list. Number of mutually simple pairs:

 M*(M-1)/2 - D

where M is the number of elements in the source list. The complexity of this method is O (n log (n)).

Good luck

-one

Brainless Jul 17 '14 at 17:03

source share

My previous answer was wrong, apologized. I suggest a modification here:

Once you get the prime divisors of each list number, match each prime p the number l (p) of the numbers in the list that has p as a divisor. For example, consider the prime number 5, and the number in the list, which can be divided by 5, is 15, 100 and 255. Then l (5) = 3.

To achieve this in O (n logn), go through the list and for each number in this list, sort it out into simple factors; for each prime factor p, increase it l (p).

Then the number of pairs that do not coincide and are divisible by p is

 l(p)*(l(p) - 1) / 2

Sum this number for all primes p, and you get the number of pairs in the list that are not coprime (note that l (p) may be 0). Let say that this sum is equal to D, then the answer

 M*(M-1)/2 - D

where M is the length of the list. Good luck

-one

Brainless Jul 18 '14 at 1:41

source share

j_random_hacker · Accepted Answer · 2014-07-21T21:12:47+0000

I came up with a slightly faster alternative based on your answer. At my PC, my C ++ implementation (below) takes about 350 ms to solve any instance of the problem; on my old laptop it takes a little more than 1 second. This algorithm avoids all division and modulation operations and uses only the O (m) space.

As with your algorithm, the main idea is to apply the exclusion principle by listing each number 2 <= i <= m, which does not contain repeating coefficients exactly once, and for each such i, by counting the number of input numbers that are divided by I and either add or subtract this from the total. The main difference is that we can make the counting part “stupid” by simply checking to see if each of the possible short-term s appears in the input , and it still only takes O (m log m) time.

How many times the innermost line is repeated c += v[j].freq; in countCoprimes() ? The body of the outer loop is executed once for each number 2 <= i <= m, which does not contain repeating simple factors; this number of iterations is trivially bounded above m. The inner loop advances I steps at a time in the range [2..m], so the number of operations that it performs during one iteration of the outer loop is limited to above m / i. Therefore, the total number of iterations of the innermost line is bounded above by the sum from π = 2 to m from m / i. M-factor can be shifted beyond the sum to get the upper bound

 m * sum{i=2..m}(1/i)

This sum is a partial sum in the harmonic series, and it is limited by the log (m) , therefore the total number of Internal iterations of the cycle is O (m log m).

extendedEratosthenes() designed to reduce constant factors by avoiding all divisions and preserving O (m) memory usage. All countCoprimes() really should know that the number 2 <= i <= m is equal to (a) whether it has repeating simple coefficients, and if it is not, (b) whether it has an even or odd number of main factors. To calculate (b), we can use the fact that the Sieve of Eratosthenes effectively “hits” any given self with its various primary factors in ascending order, so we can just flip slightly (the parity field in the struct entry ) to track, has whether I am an odd or even number of factors. Each number starts with a prod field of 1; for the record (a) we simply “knock out” any number that contains the square of the prime number as a factor, setting its prod field to 0. This field serves a double purpose: if v[i].prod == 0 , it means that I was found to have recurring factors; otherwise, it contains the product of (necessarily different) factors discovered so far. A useful (rather insignificant) utility for this is that it allows us to stop the main sieve cycle by the square root of m rather than go to m: now for any given i that has no repeating factors, or v[i].prod == i , and in this case we found all the factors for i or v[i].prod < i , and in this case I should have exactly one factor> sqrt (3000000), which we have not yet taken into account. We can find all of these remaining “big factors” with a second, non-nested loop.

 #include <iostream> #include <vector> using namespace std; struct entry { int freq; // Frequency that this number occurs in the input list int parity; // 0 for even number of factors, 1 for odd number int prod; // Product of distinct prime factors }; const int m = 3000000; // Maximum input value int n = 0; // Will be number of input values vector<entry> v; void extendedEratosthenes() { int i; for (i = 2; i * i <= m; ++i) { if (v[i].prod == 1) { for (int j = i, k = i; j <= m; j += i) { if (--k) { v[j].parity ^= 1; v[j].prod *= i; } else { // j has a repeated factor of i: knock it out. v[j].prod = 0; k = i; } } } } // Fix up numbers with a prime factor above their square root. for (; i <= m; ++i) { if (v[i].prod && v[i].prod != i) { v[i].parity ^= 1; } } } void readInput() { int i; while (cin >> i) { ++v[i].freq; ++n; } } void countCoprimes() { __int64 total = static_cast<__int64>(n) * (n - 1) / 2; for (int i = 2; i <= m; ++i) { if (v[i].prod) { // i must have no repeated factors. int c = 0; for (int j = i; j <= m; j += i) { c += v[j].freq; } total -= (v[i].parity * 2 - 1) * static_cast<__int64>(c) * (c - 1) / 2; } } cerr << "Total number of coprime pairs: " << total << "\n"; } int main(int argc, char **argv) { cerr << "Initialising array...\n"; entry initialElem = { 0, 0, 1 }; v.assign(m + 1, initialElem); cerr << "Performing extended Sieve of Eratosthenes...\n"; extendedEratosthenes(); cerr << "Reading input...\n"; readInput(); cerr << "Counting coprimes...\n"; countCoprimes(); return 0; }

Counting a number in a sequence

More articles: