O (N) Identification of permutations

This answer determines if the two lines are permutations by comparing their contents. If they contain the same number of each character, they are obviously permutations. This is done O (N) times.

I don't like the answer though, because it invents what is_permutation does. However, is_permutation has complexity:

No more than O (N 2 ) predicate applications, or exactly N if the sequences are already equal, where N=std::distance(first1, last1)

Therefore, I cannot advocate the use of is_permutation , where it is several orders of magnitude slower than the manual descent algorithm. But, of course, the standard performer would not have missed such an obvious improvement? So why is_permutation O (N 2 )?

+7
c ++ big-o permutation standard-library string-comparison
source share
3 answers

I wrote this answer.

When the string value_type is char , the number of elements needed in the lookup table is 256. For a double-byte encoding, 65536. For a four-byte encoding, the lookup table will have a little over 4 billion entries, with a probable size of 16 GB! And most of them will not be used.

So, the first thing to recognize, even if we restrict the types char and wchar_t , it can still be untenable. Similarly, if we want to do is_permutation for sequences of type int .

We could have a specialization of std::is_permutation<> for integral types of 1 or 2 bytes in size. But this is somewhat reminiscent of std::vector<bool> , which not everyone considers a good idea in retrospect.

We could also use a lookup table based on std::map<T, size_t> , but it will probably be heavy, so it may not be a win in performance (or at least not always). It might be worth doing one for a detailed comparison.

In general, I am not mistaken in the C ++ standard, not including the high-performance version of is_permutation for char . Firstly, because in the real world I’m not sure that this is the most common use of the template, and secondly, because STL is not all and all algorithms, especially where domain knowledge can be used to speed up calculations for special cases .

If it turns out that is_permutation for char quite common in the wild, C ++ library developers will, within their rights, provide specialization for it.

+7
source share

is_permutation works with almost any data type. The algorithm in your link works for data types with few values.

For the same reason that std::sort is O (N log N), but the sort count is O (N).

+9
source share

The answer you quote works on char s. It is assumed that they are 8 bits (this is not necessary), and therefore only 256 possibilities are available for each value, and you can cheaply switch from each value to the numerical index that will be used for the counter lookup table (for char in this case, the value and the index is the same thing!)

It generates the number of times how many times each char value occurs on each line; then, if these distributions are the same for both rows, the rows are permutations of each other.

What is the time complexity?

  • you need to go through each character of each line, so steps M + N for two inputs of lengths M and N
  • each of these steps involves increasing the number in a fixed-size table with the index given by char, so constant time

So the overall time complexity is O (N + M): linear, as you described.

Now std::is_permutation does not make such assumptions about its input. He does not know that there are only 256 possibilities, or even that they are limited at all. He does not know how to move from an input value to a number that he can use as an index, no matter how to do it in constant time. The only thing he knows is to compare the two values ​​for equality, as the caller passes this information.

So the time complexity:

  • we know that he should consider each element of each input at some point
  • we know that for every element that he has not seen before (I will leave a discussion of how this is defined and why this does not affect O’s great complexity as an exercise), he cannot turn the element into any type of index or key for the counter table , so it is not possible to calculate how many elements of this element exist, which is better than moving linearly through both inputs to see how many elements correspond

therefore, complexity will be quadratic at best.

+4
source share

All Articles