Unfortunately, TCP cannot send messages, only byte streams. If you want to send messages, you have to apply a protocol on top. The best protocols for high performance are those that use a sanity-checkable header containing a message length - this allows you to read the correct amount of data directly into a suitable buffer object without iterating the stream byte-by-byte looking for a message terminator. You can then queue the buffer POINTER off to another thread and create a new buffer object for the next message. This avoids any copying of bulk data and, for large messages, is sufficiently efficient that using a non-blocking queue for the message-object pointers is somewhat pointless.
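A minimal sketch of such a length-prefixed header, assuming a hypothetical layout (the magic value and field names are illustrative, not from the original):

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical sanity-checkable header: a magic value guards against stream
// desynchronization, and len gives the exact body size to read in one go.
struct MsgHeader {
    uint32_t magic;   // expected to equal kMagic; reject the stream otherwise
    uint32_t len;     // body length in bytes
};
constexpr uint32_t kMagic = 0x4D534721; // arbitrary sentinel value

// Parse a header from raw bytes; returns false if it fails the sanity check.
bool parseHeader(const uint8_t* raw, MsgHeader& out, uint32_t maxLen) {
    std::memcpy(&out, raw, sizeof(MsgHeader));
    return out.magic == kMagic && out.len <= maxLen;
}
```

Once the header passes the check, `len` bytes can be read straight into the buffer object's data area with no per-byte scanning.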
The next optimization is pooling the *buffer objects to avoid continual new/dispose, recycling the *buffers in the consumer thread for reuse by the network-receive thread. This is fairly easy to do with a concurrent queue, preferably a blocking one, to allow flow control rather than data corruption or segfaults/AV if the pool runs empty temporarily.
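A minimal sketch of such a blocking pool in standard C++ (the class name and interface are assumptions; a blocking concurrent-queue library would serve the same purpose):

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

// Minimal blocking pool: pop() waits while the pool is empty, giving the
// receive thread flow control instead of letting it run out of buffers.
template <typename T>
class BlockingPool {
    std::queue<T*> q;
    std::mutex m;
    std::condition_variable cv;
public:
    void push(T* buf) {                    // consumer returns a used buffer
        { std::lock_guard<std::mutex> lk(m); q.push(buf); }
        cv.notify_one();
    }
    T* pop() {                             // receive thread depools a buffer
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this]{ return !q.empty(); });
        T* buf = q.front();
        q.pop();
        return buf;
    }
};
```

The pool is pre-filled with buffer objects at startup; thereafter buffers simply circulate between the receive thread and the consumer thread.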
Then add a [cache-line-size] 'dead zone' at the start of each *buffer's data member, so preventing any thread from false-sharing data with any other.
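One way to realize the dead zone, sketched with an assumed 64-byte cache line (the original's 256-byte zone is a more conservative choice):

```cpp
#include <cstddef>

// Pad and align each buffer so its hot fields can never share a cache
// line with a neighbouring buffer's fields.
struct alignas(64) PaddedBuffer {
    char deadZone[64]; // keeps other allocations off this cache line
    int  dataLen;
    char data[4096];   // smaller than the original's 8 MB, for illustration
};

static_assert(alignof(PaddedBuffer) == 64,
              "buffer must be cache-line aligned");
```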
The result should be a high-bandwidth flow of complete messages into the consumer thread, with very little latency, CPU waste or cache-thrashing. All your 24 cores can run flat-out on different data.
Copying bulk data in multithreaded applications is an admission of poor design and defeat.
Followup:
It looks like you are stuck with iterating over the data because of the differing protocols :(
Simple-case, unflagged PDU buffer object, for example:
```cpp
typedef struct {
    char deadZone[256];   // anti-false-sharing
    int  dataLen;
    char data[8388608];   // 8 meg of data
} SbufferData;

class TdataBuffer {
private:
    TbufferPool *myPool;  // reference to the pool used, in case there is more than one
    EpduState PDUstate;   // enum state variable used to decode the protocol
protected:
    SbufferData netData;
public:
    virtual void reInit();                        // zeros dataLen, resets PDUstate etc. - call when depooling a buffer
    virtual int loadPDU(char *fromHere, int len); // loads protocol data unit
    void release();                               // pushes 'this' back onto 'myPool'
};
```
loadPDU gets passed a pointer to, and the length of, the raw network data. It returns either 0, meaning that it has not yet completely assembled a PDU, or the number of bytes it consumed from the raw network data to completely assemble one, in which case you queue the buffer off, depool another, and call loadPDU() with the unused remainder of the raw data, then continue with subsequent raw data.
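That consumption contract can be illustrated with a self-contained stand-in for the TdataBuffer interface, using a toy protocol (a 1-byte length prefix followed by that many body bytes) purely as an assumption for demonstration:

```cpp
// Toy stand-in for the loadPDU contract: a real implementation would run
// the actual protocol state machine instead of this 1-byte length prefix.
struct ToyBuffer {
    int  dataLen = 0;
    char data[255];
    bool complete = false;
    int  needed = -1; // -1: length prefix not yet seen

    // Consume raw bytes; return 0 while the PDU is incomplete, else the
    // number of bytes consumed from this chunk to complete it.
    int loadPDU(const char* fromHere, int len) {
        int used = 0;
        if (needed < 0 && used < len)           // read the length prefix
            needed = (unsigned char)fromHere[used++];
        while (used < len && dataLen < needed)  // copy body bytes
            data[dataLen++] = fromHere[used++];
        if (needed >= 0 && dataLen == needed) {
            complete = true;
            return used;                        // caller re-feeds the remainder
        }
        return 0;                               // PDU still incomplete
    }
};
```

If the returned count is less than the chunk length, the caller queues this buffer off, depools a fresh one, and feeds it the leftover bytes, exactly as described above.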
You can use pools of different derived buffer classes to serve different protocols, if necessary - a TbufferPool[Eprotocols] array. TbufferPool could just be a BlockingCollection queue. Management becomes almost trivial: buffers can be sent on queues all round your system, to a GUI to display stats, then perhaps on to a logger, as long as release() is called at the end of the queue chain.
Obviously, a 'real' PDU would have more methods, data unions/structs, maybe iterators and a state machine to operate the protocol, but that's the basic idea anyway. The main thing is easy management, encapsulation and, since no two threads can ever operate on the same buffer instance, no locking/synchronization is required for data parsing/access.
Oh, yes, and since no queue has to remain locked for longer than it takes to push/pop one pointer, the chances of actual contention are very low - even ordinary blocking queues would hardly ever need to resort to kernel locking.