We have a similar discussion about the tuple and structure, and I am writing some simple tests using one of my colleges to determine the differences in lead times between the tuple and structure. First, we start with the default structure and the tuple.
struct StructData { int X; int Y; double Cost; std::string Label; bool operator==(const StructData &rhs) { return std::tie(X,Y,Cost, Label) == std::tie(rhs.X, rhs.Y, rhs.Cost, rhs.Label); } bool operator<(const StructData &rhs) { return X < rhs.X || (X == rhs.X && (Y < rhs.Y || (Y == rhs.Y && (Cost < rhs.Cost || (Cost == rhs.Cost && Label < rhs.Label))))); } }; using TupleData = std::tuple<int, int, double, std::string>;
Then we use Celero to compare the performance of our simple structure and tuple. Below is the comparative code and the results obtained with gcc-4.9.2 and clang-4.0.0:
std::vector<StructData> test_struct_data(const size_t N) { std::vector<StructData> data(N); std::transform(data.begin(), data.end(), data.begin(), [N](auto item) { std::random_device rd; std::mt19937 gen(rd()); std::uniform_int_distribution<> dis(0, N); item.X = dis(gen); item.Y = dis(gen); item.Cost = item.X * item.Y; item.Label = std::to_string(item.Cost); return item; }); return data; } std::vector<TupleData> test_tuple_data(const std::vector<StructData> &input) { std::vector<TupleData> data(input.size()); std::transform(input.cbegin(), input.cend(), data.begin(), [](auto item) { return std::tie(item.X, item.Y, item.Cost, item.Label); }); return data; } constexpr int NumberOfSamples = 10; constexpr int NumberOfIterations = 5; constexpr size_t N = 1000000; auto const sdata = test_struct_data(N); auto const tdata = test_tuple_data(sdata); CELERO_MAIN BASELINE(Sort, struct, NumberOfSamples, NumberOfIterations) { std::vector<StructData> data(sdata.begin(), sdata.end()); std::sort(data.begin(), data.end());
Performance results compiled with clang-4.0.0
Celero Timer resolution: 0.001000 us ----------------------------------------------------------------------------------------------------------------------------------------------- Group | Experiment | Prob. Space | Samples | Iterations | Baseline | us/Iteration | Iterations/sec | ----------------------------------------------------------------------------------------------------------------------------------------------- Sort | struct | Null | 10 | 5 | 1.00000 | 196663.40000 | 5.08 | Sort | tuple | Null | 10 | 5 | 0.92471 | 181857.20000 | 5.50 | Complete.
And the results obtained with gcc-4.9.2
Celero Timer resolution: 0.001000 us ----------------------------------------------------------------------------------------------------------------------------------------------- Group | Experiment | Prob. Space | Samples | Iterations | Baseline | us/Iteration | Iterations/sec | ----------------------------------------------------------------------------------------------------------------------------------------------- Sort | struct | Null | 10 | 5 | 1.00000 | 219096.00000 | 4.56 | Sort | tuple | Null | 10 | 5 | 0.91463 | 200391.80000 | 4.99 | Complete.
From the above results it is clearly seen that
Tuple is faster than the default structure
Clang binaries have better performance than gcc. clang-vs-gcc is not the purpose of this discussion, so I will not go into details.
We all know that writing == or <or> for each structure definition will be a painful and erroneous task. Let us replace our custom comparator using std :: tie and repeat our test.
bool operator<(const StructData &rhs) { return std::tie(X,Y,Cost, Label) < std::tie(rhs.X, rhs.Y, rhs.Cost, rhs.Label); } Celero Timer resolution: 0.001000 us ----------------------------------------------------------------------------------------------------------------------------------------------- Group | Experiment | Prob. Space | Samples | Iterations | Baseline | us/Iteration | Iterations/sec | ----------------------------------------------------------------------------------------------------------------------------------------------- Sort | struct | Null | 10 | 5 | 1.00000 | 200508.20000 | 4.99 | Sort | tuple | Null | 10 | 5 | 0.90033 | 180523.80000 | 5.54 | Complete.
Now we see that using std :: tie makes our code more elegant and harder to make mistakes, however we will lose about 1% of performance. I will stay with the std :: tie solution, since I also get a warning about comparing floating point numbers with a configured comparator.
So far, we have no solution to make our structure code even faster. Letβs take a look at the swap function and rewrite it to see if we can get any performance:
struct StructData { int X; int Y; double Cost; std::string Label; bool operator==(const StructData &rhs) { return std::tie(X,Y,Cost, Label) == std::tie(rhs.X, rhs.Y, rhs.Cost, rhs.Label); } void swap(StructData & other) { std::swap(X, other.X); std::swap(Y, other.Y); std::swap(Cost, other.Cost); std::swap(Label, other.Label); } bool operator<(const StructData &rhs) { return std::tie(X,Y,Cost, Label) < std::tie(rhs.X, rhs.Y, rhs.Cost, rhs.Label); } };
Results obtained using clang-4.0.0
Celero Timer resolution: 0.001000 us ----------------------------------------------------------------------------------------------------------------------------------------------- Group | Experiment | Prob. Space | Samples | Iterations | Baseline | us/Iteration | Iterations/sec | ----------------------------------------------------------------------------------------------------------------------------------------------- Sort | struct | Null | 10 | 5 | 1.00000 | 176308.80000 | 5.67 | Sort | tuple | Null | 10 | 5 | 1.02699 | 181067.60000 | 5.52 | Complete.
And the results obtained with gcc-4.9.2
Celero Timer resolution: 0.001000 us ----------------------------------------------------------------------------------------------------------------------------------------------- Group | Experiment | Prob. Space | Samples | Iterations | Baseline | us/Iteration | Iterations/sec | ----------------------------------------------------------------------------------------------------------------------------------------------- Sort | struct | Null | 10 | 5 | 1.00000 | 198844.80000 | 5.03 | Sort | tuple | Null | 10 | 5 | 1.00601 | 200039.80000 | 5.00 | Complete.
Now our structure is slightly faster than the tuple (about 3% with clang and less than 1% with gcc), however we need to write our custom swap function for all our structures.