Further exploitation of the ideas that I mentioned in my question, I really managed to find a solution. Since some of you may be interested in this, I will briefly describe it. It works in O (m log m + n), I already implemented it in C ++ and tested it - it solves the biggest cases (10 ^ 6 integers) in less than 5 seconds.
We have n integers, all at most m. To begin with, Eratosthenes Sieve maps every integer to m so that it reduces the prime factor, allowing us to subtract any number not greater than m in O (log m). Then for all given numbers A [i], if there exists some prime number p, which divides A [i] by a power greater than one, we divide A [i] by it, because, asking the question, are two numbers coprime we can omit the exhibitors. This leaves us with all A [i] being the products of various primes.
Now suppose that we were able in a reasonable time to build a table T such that T [i] is the number of records A [j] such that I divides A [j]. This is somehow similar to the @Brainless approach taken in his second answer. To build table T quickly was a technique that I talked about in the comments below my question.
From now on, we will work on the principle of inclusion-exclusion. Having T, for each I we calculate P [i] - the number of pairs (j, k) such that A [j] and A [k] are divided by i. Then, to calculate the answer, we summarize all P [i], taking the minus sign in front of those P [i] for which I have an even number of prime divisors. Note that all prime divisors i are different, because for all other indices i P [i] is 0. Inclusion-exclusion each pair will be counted only once. To see it differently, take the pair A [i] and A [j], assuming that they have exactly k common prime divisors. Then this pair will be considered k times, then discounted kC2 times, calculated kC3 times, discounted kC4 times ... for nCk see Newton's symbol. Some mathematical manipulations make us see that the pair in question will be considered 1 - (1-1) ^ k = 1 times, which completes the proof.
The steps taken so far require O (m log log m) for the sieve and O (m) to calculate the result. The last thing to do is build an array T. We could for each A [i] just increase T [j] for all j dividing i. Since A [i] can have at most O (sqrt (A [i])) divisors (and in practice even less than this), we could construct T in O (n sqrt m). But we can do it better!
Take a two-dimensional array W. At each moment of time, the following invariant holds: if for each nonzero W [i] [j] we will increase the counter in table T by W [i] [j] for all numbers that divide i and also separate the exact exponents i in j are the smallest prime divisors of i, then T will be constructed properly. Since this may seem a bit confusing, take a look at it in action. In the beginning, to make the invariant true, for each A [i] we simply increase W [A [i]] [0]. We also note that a number not exceeding m can have at most O (log m) prime divisors; therefore, the total size of W is O (m log m). Now we see that the information stored in W [i] [j] can be “pushed forward” as follows: consider p as the (j + 1) -th first divisor of i, if it has one. Then some divisor I can either have p with exponent, as in i, or lower. The first of these cases - W [i] [j + 1] - we add another prime, which should be “fully taken” by the divisor. The second case of W [i / p] [j] as a divisor i that does not have p with the highest exponent, must also divide i / p. And this! Consider everything i in descending order, then j in ascending order. We “promote” information from W [i] [j]. See that if I have exactly j simple divisors, then the information from it cannot be pressed, but we really do not need it! If I have j prime divisors, then W [i] [j] basically says: increment by W [i] [j] contains only the index i in the array T. Therefore, when all the information was clicked on the "last lines" in each W [i], we go through these lines and complete the construction of T. Since each cell W [i] [j] was visited once, this algorithm takes O (m log m) and O (n) time at the beginning. This completes the construction. Here is the C ++ code from the actual implementation:
FORD(i,SIZE(W)-1,2) //i in descending order { int v = i, p; FOR(j,0,SIZE(W[i])-2) //exclude last row { p = S[v]; //j-th divisor; S[v] - smallest prime divisor of v while (v%p == 0) v /= p; W[i][j+1] += W[i][j]; W[i/p][j] += W[i][j]; } T[i] = W[i].back(); }
In the end, I would say that I think that the array T can be built faster and easier than I showed. If anyone has a clear idea of how this can be done, I would appreciate all the feedback.