How to improve compilation time for a giant boolean expression?

I need to perform a rather complicated check on a vector, and I have to repeat it thousands or even millions of times. To make it efficient, I translate the formula into C++ source code and compile it into a highly optimized binary, which I then call from my code. The formula is always purely Boolean: only &&, ||, and ! are used. Typical source code looks like this:

 #include <assert.h>
 #include <vector>

 using DataType = std::vector<bool>;

 static const char T = 1;
 static const char F = 0;
 const std::size_t maxidx = 300;

 extern "C" bool check(const DataType& l);
 bool check(const DataType& l)
 {
     assert(l.size() == maxidx);
     return (l[0] && l[1] && l[2]) || (l[3] && l[4] && l[5]);
     // etc., a very long line with && and || everywhere
 }

I compile it as follows:

 g++ -std=c++11 -Ofast -march=native -fpic -c check.cpp 

The performance of the resulting binary is critical.

This worked great until a recent test case with lots of variables (300, as you can see above). On that case, g++ consumes more than 100 GB of memory and hangs forever.

My question is pretty simple: how can I make this code easier on the compiler? Should I introduce additional variables, get rid of the vector, or something else?

EDIT1: OK, here is a screenshot from the top utility.


cc1plus is busy with my code. The actual check function depends on 584 variables (sorry for the inaccurate number in the example above) and contains 450,000 expressions.

I agree with @akakatak's comment below: g++ seems to be doing something that is O(N^2).

2 answers

This is a small contribution, but I still want to share my results.

The solution suggested by Thilo in the comments above is the best one. It is very simple and brings compilation time down to something manageable: just split the expression into chunks of equal size. But in my experience you should choose the subexpression length carefully; with too many subexpressions you may see a significant drop in run-time performance, since the compiler can no longer optimize across the whole expression.
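A minimal sketch of this chunking approach, based on the question's example (the `check_part*` helper names are illustrative, and a real chunk would hold thousands of terms rather than two):

```cpp
#include <vector>

using DataType = std::vector<bool>;

// Each helper holds one manageable chunk of the full expression, so the
// optimizer only sees a bounded number of terms at a time.
static bool check_part1(const DataType& l) {
    return (l[0] && l[1] && l[2]) || (l[3] && l[4] && l[5]);
}

static bool check_part2(const DataType& l) {
    return (l[6] && l[7]) || (l[8] && l[9]);
}

extern "C" bool check(const DataType& l);
bool check(const DataType& l) {
    // Combine the chunks with the same operator (here ||) that joined
    // them in the original expression.
    return check_part1(l) || check_part2(l);
}
```

Splitting at a top-level || (or &&) keeps the rewrite trivially correct; the trade-off is that the compiler can no longer merge or reorder terms across chunk boundaries.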


The obvious optimization is to throw out the vector and use a bit field based on the fastest available integer type:

 uint_fast8_t item[n];

You can declare it like this:

 #define ITEM_BYTES(items) ((items) / sizeof(uint_fast8_t))
 #define ITEM_SIZE(items) \
     ( ITEM_BYTES(items) / CHAR_BIT + (ITEM_BYTES(items) % CHAR_BIT != 0) )

 ...

 uint_fast8_t item[ITEM_SIZE(n)];

Now you have a chunk of memory with n segments, where each segment has the ideal size for your CPU. In each segment, set the bits to 1 = true or 0 = false using bitwise operators.

Depending on how you want to optimize, you could group the bits differently. I would suggest storing 3 bits of data in each segment, since you always want to check 3 adjacent booleans. This means that "n" in the example above would be the total number of booleans divided by 3.
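A hedged sketch of what that packing could look like; the `set_group` helper name is an assumption, not part of the answer:

```cpp
#include <cstdint>
#include <cstddef>

// Pack three adjacent booleans into the low three bits of one segment.
// Bit 0 holds b0, bit 1 holds b1, bit 2 holds b2, so a fully true
// group reads back as 0x7.
static void set_group(uint_fast8_t* item, std::size_t group,
                      bool b0, bool b1, bool b2) {
    item[group] = static_cast<uint_fast8_t>(
        (b0 ? 0x1u : 0u) | (b1 ? 0x2u : 0u) | (b2 ? 0x4u : 0u));
}
```

With this layout, testing "all three booleans in a group are true" becomes the single comparison `(item[i] & 0x7u) == 0x7u` used below.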

Then you can just iterate over the array like this:

 bool items_ok (void)
 {
     for (size_t i = 0; i < n; i++) {
         if ((item[i] & 0x7u) == 0x7u) {
             return true;
         }
     }
     return false;
 }

Using the above method, you optimize:

  • The size of the data on which comparisons are made, along with possible alignment problems.
  • Total memory usage.
  • The number of branches needed for comparisons.

It also eliminates the risk of inefficiency caused by ordinary C++ metaprogramming. I would never trust std::vector , std::array or std::bitset to produce optimal code.

Once the above is working, you can always check whether std::bitset and similar containers produce equally efficient machine code. If you find that they introduce any kind of unrelated insanity into your machine code, do not use them.

