A more efficient way for my program to write every billionth combination?

Question

A more efficient way for my program to write every billionth combination?

Thus, the following program generates combinations of characters in this master line that you will see in the program. First, the program generates all 48 of the 12 combinations, and then up to 48 chooses 19.

The problem is that the total number of combinations is 65 trillion, which cannot be calculated in a reasonable amount of time. I thought, “OK, I’ll just write every one billion to a file.” Well, it will also take a lot of time, because the program should still be up to 65 trillion, even if it only records every billionth combination.

Is there anything that I could change in my program to avoid this, to count on an unusually large amount, but still write every billionth combination in a file?

#include <iostream> #include <string> #include <iostream> #include <fstream> #include <vector> using namespace std; template <typename Iterator> bool next_combination(const Iterator first, Iterator k, const Iterator last) { if ((first == last) || (first == k) || (last == k)) return false; Iterator i1 = first; Iterator i2 = last; ++i1; if (last == i1) return false; i1 = last; --i1; i1 = k; --i2; while (first != i1) { if (*--i1 < *i2) { Iterator j = k; while (!(*i1 < *j)) ++j; std::iter_swap(i1,j); ++i1; ++j; i2 = k; std::rotate(i1,j,last); while (last != j) { ++j; ++i2; } std::rotate(k,i2,last); return true; } } std::rotate(first,k,last); return false; } unsigned long long count = 0; int main() { ofstream myfile; myfile.open ("m = 8.txt"); string s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnop"; for (int i = 12; i <= 19; i++) { std::size_t comb_size = i; do { if (count == 0) myfile << std::string(s.begin(),s.begin() + comb_size) << std::endl; if (++count % 1000000000 == 0) myfile << std::string(s.begin(),s.begin() + comb_size) << std::endl; }while(next_combination(s.begin(),s.begin()+ comb_size,s.end())); } myfile.close(); cout << "Done!" << endl; system("PAUSE"); return 0; }

+7

c ++ permutation combinations combinatorics

user2583947 Jul 15 '13 at 14:52

source share

6 answers

Howard hinnant · Answer 1 · 2013-07-15T21:40:06+0000

I have a simple transformation to use another library, which is about 36 times faster than yours. This is still brute force. But while on my machine I estimate that your code will take 418 days, my code will only take about 3.65 days. Still outrageously long. But it comes to a long weekend.

Here is my code:

 #include <iostream> #include <string> #include <fstream> #include "../combinations/combinations" using namespace std; unsigned long long count = 0; int main() { ofstream myfile("m = 8.txt"); string s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnop"; for (int i = 12; i <= 19; i++) for_each_combination(s.begin(), s.begin() + i, s.end(), [&](std::string::const_iterator f, std::string::const_iterator l) -> bool { if (::count++ % 1000000000 == 0) myfile << std::string(f, l) << std::endl; return false; }); myfile.close(); cout << "Done!" << endl; return 0; }

The reduction in the number of tests on count in the inner loop was 15% higher.

"../combination/combination" refers to this library:

http://howardhinnant.imtqy.com/combinations.html

The link contains a description and full source code.

This test program can also be easily modified to calculate the total number of combinations:

 #include <iostream> #include <string> #include "../combinations/combinations" using namespace std; int main() { string s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnop"; unsigned long long count = 0; for (int i = 12; i <= 19; i++) count += count_each_combination(s.begin(), s.begin() + i, s.end()); cout << "Done! " << count << endl; return 0; }

which outputs:

 Done! 27189132782091

The code is open source with an extended license (it is not part of the boost library). Feel free to use it.

Annie kim · Answer 2 · 2013-07-15T15:35:50+0000

This is the code I wrote earlier to find the k-th permutation of this string. I think my idea is similar to @Tarik, that we do not need to list all permutations before kth.

 string getPermutation(string s, int k) { string res; int n = s.size(); int total = 1, digits = n - 1; for (int i = 1; i < n; ++i) total *= i; while (res.size() < n) { int i = 0; for (int m = 1; m < (int) ceil(k * 1.0 / total); ++m) i++; res += s[i]; s.erase(s.begin() + i); // erase from string is not a good idea:) k = (k % total == 0) ? total : k % total; total = (total == 1) ? 1 : total / digits--; } return res; }

This works well for a short line. For example, getPermutation("12345", 37) will return 24135 .

But for your string s with a length of 48 variable total will overflow even with the type long long . Therefore, we need to do extra work with this.

My code is a little hard to understand :) ~~You can improve your code.~~ I hope this helps you.

UPDADE: I understand that you need a combination not a permutation . I was completely mistaken! Forget my code :)

Bob bryan · Answer 3 · 2013-07-16T07:07:56+0000

There is a way to access k-indices from a rank without having to repeat all the previous ones. I discovered and published this technique a couple of years ago. So, if all you want to do is print every billionth combination, the class I wrote makes it almost trivial, and it takes no more than a few seconds of 1,000,000,000 to calculate all the combinations for the 27+ trillion combinations available at time intervals. 27 trillion / 1 billion = approximately 27,000 combinations to print.

The class is written in C # to handle common functions for working with the binomial coefficient. It performs the following tasks:

Prints all K-indices in a good format for any N that selects K to a file. K-indices can be replaced with more descriptive strings or letters.
Converts K-indices to the corresponding lexicographic index or ranking in a table of sorted binomial coefficients. This method is much faster than older published iteration-based methods. He does this using the mathematical property inherent in the Pascal Triangle, and is very efficient compared to iterating over a set.
Converts an index into a sorted table of binomial coefficients into the corresponding K-indices. The technique used is also much faster than older iterative solutions.
Uses the Mark Dominus method to calculate the binomial coefficient, which is much less likely to overflow and works with large numbers.
The class is written in .NET C # and provides a way to manage the objects associated with the problem (if any) using a common list. The constructor of this class takes a bool value, called InitTable, which, when true, will create a general list for storing objects to be managed. If this value is false, then it will not create a table. The table does not need to be created in order to use the 4 above methods. Access methods are provided to access the table.
There is a related test class that shows how to use the class and its methods. It has been widely tested in many cases, and there are no known errors. The class uses a mathematical relationship that can be seen from the Pascal Triangle to provide an efficient way to translate between rank and K-indices.

To read about this class and download the code, see Tablizing the Binomial Coefficient .

The following verified sample code will go through each unique combination:

 public void Test10Choose5() { String S; int Loop; int N = 10; // Total number of elements in the set. int K = 5; // Total number of elements in each group. // Create the bin coeff object required to get all // the combos for this N choose K combination. BinCoeff<int> BC = new BinCoeff<int>(N, K, false); int NumCombos = BinCoeff<int>.GetBinCoeff(N, K); // The Kindexes array specifies the indexes for a lexigraphic element. int[] KIndexes = new int[K]; StringBuilder SB = new StringBuilder(); // Loop thru all the combinations for this N choose K case. for (int Combo = 0; Combo < NumCombos; Combo++) { // Get the k-indexes for this combination. BC.GetKIndexes(Combo, KIndexes); // Verify that the Kindexes returned can be used to retrive the // rank or lexigraphic order of the KIndexes in the table. int Val = BC.GetIndex(true, KIndexes); if (Val != Combo) { S = "Val of " + Val.ToString() + " != Combo Value of " + Combo.ToString(); Console.WriteLine(S); } SB.Remove(0, SB.Length); for (Loop = 0; Loop < K; Loop++) { SB.Append(KIndexes[Loop].ToString()); if (Loop < K - 1) SB.Append(" "); } S = "KIndexes = " + SB.ToString(); Console.WriteLine(S); } }

You will need to port the BinCoeff class to C ++ and use it instead of ints. Your application will have at least 2 cycles. The outer loop will create a BinCoeff object for each N selects case K. The inner loop will increment a long value by a billion and call the BinCoeff.GetKIndexes method, which will return a combination suitable for printing. The returned combination is in the form of an array with a null value (int / long). You can create another char array or a string that can be indexed with the values returned by GetKIndexes.

If you need numbers that exceed what a long value might contain, you probably want to examine the class to use the BigInteger class.

Thanks for the link to the Wolfram Alpha calculator. The best binomial odds calculator that I found earlier that works with very large numbers (he accurately calculated the case, which gave more than 15,000 digits as a result) can be found here

hivert · Answer 4 · 2013-07-16T18:24:44+0000

From http://en.wikipedia.org/wiki/Combinadic there is an algorithm for directly calculating the kth combination. First you need to save Pascal's triangle. If you need some sample code, you can have a look (Python language) https://github.com/sagemath/sagelib/blob/master/sage/combinat/choose_nk.py .

TemplateRex · Answer 5 · 2013-07-17T20:40:07+0000

You can use the bitvector to speed up some calculations adapted from bit-cool pages in the wiki chess version.

 #include <iostream> #include <iomanip> #include <cstdint> using U64 = uint64_t; // generate the next integer with the same number of bits as c U64 next_combination(U64 c) { auto const smallest = c & -c; auto const ripple = c + smallest; auto ones = c ^ ripple; ones = (ones >> 2) / smallest; return ripple | ones; } // generate all integers with k of the first n bits set template<class Function> void for_each_combination(std::size_t n, std::size_t k, Function fun) { U64 y; auto const n_mask = (1ULL << n) - 1; // mask with all n bits set to 1 auto const k_mask = (1ULL << k) - 1; // mask with first k bits set to 1 auto x = k_mask; fun(x); for (; (y = next_combination(x) & n_mask) > x; x = y) fun(y); } int main() { auto const million = 1000000ULL; auto count = U64 { 0 }; for (auto i = 12; i < 20; ++i) { for_each_combination(48, i, [&](U64 c) { /*if (count++ & million == 0) std::cout << std::dec << std::setfill(' ') << std::setw(8) << (count - 1) / million << ": " << std::hex << std::showbase << std::setfill('0') << std::setw(16) << c << "\n";*/ ++count; }); } std::cout << count << "\n"; }

In one core inside the virtual box of my Xeon E5-1650 @ 3.2 Ghz, my best estimate is that it will take 3.52 days to increase the counter 2.7e13 times (without generating the output itself). It only works for subsets of B (n, k) with n <64 unless you use a 128-bit integer class.

Given a bitvector that has k of n bits set to 1, the direct relation is to match this to the original character sequence or any other type and print any combination. For sequences that do not have random access to the iterator, it is certainly more expensive than Howard Hinnant's approach.

Yangling · Answer 6 · 2013-07-15T15:16:48+0000

if you do not care what the actual score is, you use a 32-bit int, which will still let you know that you have gained 1 billion.

A more efficient way for my program to write every billionth combination?

More articles: