std::vector vs. plain array

I am writing a program that needs to be very fast. It runs some work on the GPU using CUDA and then performs some calculations on the CPU. To do this, I need to convert a highly optimized GPU data structure into something I can easily use on the CPU. My data is basically a graph laid out in a grid. I currently use std::vector for the CPU part. Because I know there is quite a lot of overhead if I do many push_back() calls, and because I know how many vertices my graph has, I now use the following code:

new_graph.resize(blockSize * blockSize);
for (unsigned long long y = 0; y < blockSize; y++) {
    for (unsigned long long x = 0; x < blockSize; x++) {
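        // Row-major index; same type as x and y so large grids cannot overflow an int.
        unsigned long long idx = y * blockSize + x;
        new_graph[idx] = Vertex(static_cast<int>(x), static_cast<int>(y));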
    }
}
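With resize(), every Vertex is first value-initialized, and each new_graph[idx] = Vertex(x, y) then builds a temporary and move-assigns it over the existing element. A common alternative is reserve() plus emplace_back(), which constructs each element exactly once. The sketch below assumes a simplified, hypothetical GridVertex type without the edge vectors, so it is self-contained; build_grid is also a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for the question's Pos/Vertex.
struct Pos {
    int x;
    int y;
};

struct GridVertex {
    Pos pos;
    bool hidden = false;
    GridVertex(int x, int y) : pos{x, y} {}
};

// Builds a blockSize x blockSize grid in row-major order.
// reserve() performs one allocation up front; emplace_back() then
// constructs each vertex in place, with no default construction
// followed by assignment.
std::vector<GridVertex> build_grid(std::size_t blockSize) {
    std::vector<GridVertex> grid;
    grid.reserve(blockSize * blockSize);
    for (std::size_t y = 0; y < blockSize; ++y) {
        for (std::size_t x = 0; x < blockSize; ++x) {
            grid.emplace_back(static_cast<int>(x), static_cast<int>(y));
        }
    }
    return grid;
}
```

Because elements are appended in row-major order, grid[y * blockSize + x] still refers to the vertex at (x, y), matching the indexing in the loop above.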

Afterwards I add the edges. Unfortunately, I don't know how many edges each vertex has, but I know it will never be more than 8. Therefore, I call reserve(8) on every std::vector I use for the edges.
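Since the per-vertex bound of 8 edges is known, one option (a different technique from the per-vertex std::vectors used in the question) is a single flat buffer with a fixed stride of 8 slots per vertex. A minimal sketch; the Edge fields to and weight, the EdgeTable class, and kMaxEdges are all hypothetical, since the question never shows its Edge type.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical edge type; the question does not show its Edge definition.
struct Edge {
    std::uint32_t to;
    float weight;
};

constexpr std::size_t kMaxEdges = 8;  // the question's upper bound per vertex

// All edge lists share one contiguous buffer: vertex v owns the slots
// [v * kMaxEdges, v * kMaxEdges + kMaxEdges). Building the table costs two
// allocations for the whole graph instead of one reserve(8) per vector.
class EdgeTable {
public:
    explicit EdgeTable(std::size_t numVertices)
        : slots_(numVertices * kMaxEdges), counts_(numVertices, 0) {}

    // Appends an edge to vertex v; returns false if v already has kMaxEdges.
    bool add(std::size_t v, Edge e) {
        if (counts_[v] == kMaxEdges) return false;
        slots_[v * kMaxEdges + counts_[v]] = e;
        ++counts_[v];
        return true;
    }

    std::size_t degree(std::size_t v) const { return counts_[v]; }

    const Edge& at(std::size_t v, std::size_t i) const {
        return slots_[v * kMaxEdges + i];
    }

private:
    std::vector<Edge> slots_;
    std::vector<std::uint8_t> counts_;
};
```

The trade-off is memory: every vertex reserves 8 slots even if its real degree is lower, in exchange for far fewer allocations and better locality.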

However, both steps seem very slow. If I use a plain array for the graph itself (so basically replacing the outer std::vector), the speed improvement in that part is huge (around 10x).

For the graph this is doable, but not really for the edges, because I do some post-processing on them, and for that I really need something dynamic like std::vector (I add a few edges).

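Currently, converting the data into std::vectors takes about 10 times longer than running my algorithm on the GPU (a smart MST algorithm). That is not what I want, because the overhead is now far too large.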

Does anyone know what is going on, or how I can fix this?

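p.s. I compile with -O2, because I already found out that it can make a big difference. I also tried -O3; no real difference.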

The definitions of Pos and Vertex:

#include <vector>

// Edge must be defined before Vertex; its definition is not shown here.

struct Pos {
    int x, y;
    Pos() : x(0), y(0) {}
    Pos(int x, int y) : x(x), y(y) {}
};

struct Vertex {
    Pos pos;
    bool hidden;
    unsigned long long newIdx;
    int numEdges;
    int numRemovedEdges;
    std::vector<Edge> edges;
    std::vector<bool> removed;
    std::vector<bool> doNotWrite;

    Vertex() : Vertex(Pos()) {}
    Vertex(int x, int y) : Vertex(Pos(x, y)) {}

    // Takes a const reference so temporaries such as Pos(x, y) can be passed.
    // newIdx is zero-initialized so it never holds an indeterminate value.
    Vertex(const Pos &pos)
        : pos(pos), hidden(false), newIdx(0), numEdges(0), numRemovedEdges(0) {}
};

, , vector , ?

Calling reserve means 3 separate heap allocations for every Vertex (edges, removed and doNotWrite).
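Each reserve(8) on the three per-vertex vectors (edges, removed, doNotWrite) is its own heap allocation. A minimal sketch that makes this visible by replacing the global operator new with a counting version; CountedVertex uses int in place of the question's Edge, and g_allocations and build_graph are hypothetical names.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Incremented by the replaced global operator new below.
std::size_t g_allocations = 0;

void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size == 0 ? 1 : size)) return p;
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Hypothetical vertex mirroring the question's three per-vertex vectors
// (int stands in for Edge, which the question does not define).
struct CountedVertex {
    std::vector<int> edges;
    std::vector<bool> removed;
    std::vector<bool> doNotWrite;
};

// Builds n vertices and reserves 8 slots in each of their three vectors.
std::vector<CountedVertex> build_graph(std::size_t n) {
    std::vector<CountedVertex> graph(n);  // one allocation for the outer array
    for (CountedVertex& v : graph) {
        v.edges.reserve(8);       // one allocation per vertex
        v.removed.reserve(8);     // one allocation per vertex
        v.doNotWrite.reserve(8);  // one allocation per vertex
    }
    return graph;
}
```

Measuring g_allocations before and after build_graph(1000) on a typical libstdc++ or libc++ build gives 3001: one for the outer vector plus three per vertex.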

, ( ), vector, .


? , , , ?


, Vertex.pos? Vertex ?


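One option is llvm's SmallVector. It has essentially the same interface as std::vector, but it stores up to N elements inline in the object itself (N is a template parameter, e.g. llvm::SmallVector<Edge, 8>). SmallVector only allocates heap memory once it grows beyond N elements.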
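A stripped-down sketch of the inline-storage idea behind SmallVector, assuming the hard bound of 8 edges from the question. This hypothetical InlineVector is not llvm::SmallVector: SmallVector spills to the heap when it outgrows N, whereas this sketch refuses to grow past N, and it requires T to be default-constructible and copy-assignable.

```cpp
#include <array>
#include <cstddef>

// Stores up to N elements inside the object itself, so creating one
// never touches the heap.
template <typename T, std::size_t N>
class InlineVector {
public:
    // Returns false instead of allocating when the inline buffer is full.
    bool push_back(const T& value) {
        if (size_ == N) return false;
        data_[size_++] = value;
        return true;
    }

    // Precondition: !empty().
    void pop_back() { --size_; }

    // Removes element i in O(1) by moving the last element into its slot;
    // element order is not preserved.
    void swap_remove(std::size_t i) {
        data_[i] = data_[size_ - 1];
        --size_;
    }

    std::size_t size() const { return size_; }
    bool empty() const { return size_ == 0; }
    static constexpr std::size_t capacity() { return N; }

    T& operator[](std::size_t i) { return data_[i]; }
    const T& operator[](std::size_t i) const { return data_[i]; }

    T* begin() { return data_.data(); }
    T* end() { return data_.data() + size_; }

private:
    std::array<T, N> data_{};
    std::size_t size_ = 0;
};
```

An InlineVector<Edge, 8> member in place of std::vector<Edge> would remove the per-vertex edge allocation, at the cost of a larger Vertex.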

Some things to keep in mind about SmallVector:

  • , , 2, , 1 , . 99.99%
  • To release memory, use swap() (SmallVector().swap(vec)).
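The swap idiom from the bullet above works the same way for std::vector: clear() destroys the elements but keeps the buffer, while swapping with an empty temporary hands the buffer to the temporary, which frees it at the end of the statement. A minimal sketch using std::vector rather than LLVM so it is self-contained; release_memory is a hypothetical helper name.

```cpp
#include <vector>

// Frees v's heap buffer by swapping it into a temporary that is
// destroyed at the end of the statement.
template <typename T>
void release_memory(std::vector<T>& v) {
    std::vector<T>().swap(v);
}
```

Since C++11, v.shrink_to_fit() expresses the same intent, but it is only a non-binding request, whereas the swap idiom reliably leaves v with the empty temporary's (zero) capacity on common implementations.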

llvm SmallVector


- , . , , . , GPU ?

AoS (array of structures), …:
1. … Vertex.
2. … std::vector …
3. Replace removed and doNotWrite with std::bitset<8>.
4. Drop numRemovedEdges and use removed.count() instead.
5. … Edge …, [8].
6. …
7. …
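Suggestions 3 and 4 (std::bitset<8> for the flag vectors, removed.count() instead of a separate counter), combined with a fixed array of 8 edges, could look like the sketch below. LeanVertex and the Edge fields to and weight are hypothetical; the point is that none of the members own heap memory, so constructing a vertex allocates nothing.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical edge type (the question does not show Edge).
struct Edge {
    std::uint32_t to;
    float weight;
};

// Fixed-size edge storage plus two 8-bit flag sets; no member allocates.
struct LeanVertex {
    int x = 0, y = 0;
    bool hidden = false;
    std::uint8_t numEdges = 0;       // edges[0 .. numEdges) are valid
    std::array<Edge, 8> edges{};
    std::bitset<8> removed;          // removed[i] marks edges[i] as removed
    std::bitset<8> doNotWrite;

    // Replaces a separately maintained numRemovedEdges counter.
    std::size_t numRemovedEdges() const { return removed.count(); }
};
```

Deriving the count from the bitset means it can never drift out of sync with the removed flags.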

, , . UVA (CUDA Linux), .


Is it not possible to create one Vertex object, memcpy the x and y values into it (so you don't have to call the constructor on every loop iteration), and then memcpy the whole Vertex into your std::vector? The memory of a vector is guaranteed to be laid out like a regular array, so you can bypass the abstraction entirely and manipulate the memory directly. No need for anything complicated. In addition, you may be able to arrange the data you get back from the GPU so that you can memcpy entire blocks at once, saving even more.
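Copying with memcpy is only well-defined for trivially copyable types; the question's Vertex owns three std::vector members, so it does not qualify as written. A minimal sketch of the block-copy idea, assuming a hypothetical plain-data PlainVertex and a contiguous host buffer (for example data already copied back from the GPU); copy_block is also a hypothetical name.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical plain-data vertex: no std::vector members.
struct PlainVertex {
    int x;
    int y;
    bool hidden;
};
static_assert(std::is_trivially_copyable<PlainVertex>::value,
              "memcpy requires a trivially copyable type");

// Copies `count` vertices from a contiguous buffer into a vector with a
// single memcpy, relying on vector's contiguous storage.
std::vector<PlainVertex> copy_block(const PlainVertex* src, std::size_t count) {
    std::vector<PlainVertex> out(count);  // value-initializes, then overwritten
    if (count != 0) {
        std::memcpy(out.data(), src, count * sizeof(PlainVertex));
    }
    return out;
}
```

The vector(count) constructor zero-fills once before the copy; out.assign(src, src + count) avoids that extra pass and is typically lowered to a memmove for trivially copyable types.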
