C++ socket programming: maximizing throughput over localhost (I only get 3 Gbit/s instead of 23 Gbit/s)

I want to create a C++ server/client that maximizes throughput over a TCP socket on localhost. As preparation, I used iperf to find out the maximum bandwidth on my i7 MacBook Pro.

    ------------------------------------------------------------
    Server listening on TCP port 5001
    TCP window size: 256 KByte (default)
    ------------------------------------------------------------
    [  4] local 127.0.0.1 port 5001 connected with 127.0.0.1 port 51583
    [  4]  0.0-120.0 sec   329 GBytes  23.6 Gbits/sec

Without any special configuration, iperf showed me that I can achieve at least 23.2 Gbit/s. I then wrote my own server/client implementation in C++; you can find the full code here: https://gist.github.com/1116635

In this code, I basically pass a 1024-byte int array (256 ints) with each read/write operation. So my send loop on the server looks like this:

    int n;
    int x[256];

    // fill the int array
    for (int i = 0; i < 256; i++) {
        x[i] = i;
    }

    for (int i = 0; i < (4 * 1024 * 1024); i++) {
        n = write(sock, x, sizeof(x));
        if (n < 0) error("ERROR writing to socket");
    }

My client receive loop looks like this:

    int n;
    int x[256];

    for (int i = 0; i < (4 * 1024 * 1024); i++) {
        n = read(sockfd, x, sizeof(int) * 256);
        if (n < 0) error("ERROR reading from socket");
    }

As mentioned in the title, running this (compiled with -O3) results in the following runtime, which corresponds to about 3 Gbit/s:

    ./client 127.0.0.1 1234
    Elapsed time for Reading 4GigaBytes of data over socket on localhost: 9578ms

Where am I losing bandwidth? What am I doing wrong? Again, the full code can be seen here: https://gist.github.com/1116635

Any help is appreciated!

+4
4 answers

My previous answer was erroneous. I tested your programs and here are the results.

  • Running the original client: 0m7.763s
  • With a buffer 4 times as large: 0m5.209s
  • With a buffer 8 times as large as the original: 0m3.780s

I only changed the client. I suspect more performance can be squeezed out if you also change the server.
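The larger-buffer change amounts to reading in bigger chunks so fewer system calls are made. Here is a minimal, self-contained sketch of that idea; the function names are illustrative, and a socketpair stands in for the TCP connection so the snippet runs on its own:

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <cstddef>
#include <vector>

// Read from fd in chunks of `bufsize` bytes until EOF; return total bytes seen.
// A bigger `bufsize` means fewer read() system calls for the same data.
long read_all(int fd, std::size_t bufsize) {
    std::vector<char> buf(bufsize);
    long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf.data(), buf.size());
        if (n < 0) return -1;      // error
        if (n == 0) return total;  // peer closed the connection
        total += n;
    }
}

// Self-contained demo over a socketpair (a stand-in for the TCP socket).
long demo_larger_buffer() {
    int fds[2];
    if (socketpair(AF_LOCAL, SOCK_STREAM, 0, fds) != 0) return -1;
    char chunk[1024] = {0};
    for (int i = 0; i < 8; ++i)
        write(fds[0], chunk, sizeof(chunk));  // 8 KiB total, 1 KiB at a time
    close(fds[0]);
    // One 8 KiB read can drain what took eight 1 KiB writes to send.
    long total = read_all(fds[1], 8 * 1024);
    close(fds[1]);
    return total;
}
```

The same loop shape works unchanged on the real TCP socket; only the buffer size changes.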

The fact that I got results in the same ballpark as yours (0m7.763s vs 9578ms) even though we have different processors also suggests that the number of system calls is what matters. To squeeze out even more performance:

  • Use scatter-gather I/O calls ( readv and writev )
  • Use zero-copy mechanisms: splice(2) , sendfile(2)
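As a hedged sketch of the scatter-gather idea: writev(2) sends several buffers with a single system call, where the plain write loop would need one call per buffer. The demo below runs over a socketpair so it is self-contained; the function name is made up for illustration:

```cpp
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>
#include <cstring>

// Send three separate 1 KiB buffers with ONE writev() call,
// then read everything back and return the byte count.
long demo_writev() {
    int fds[2];
    if (socketpair(AF_LOCAL, SOCK_STREAM, 0, fds) != 0) return -1;

    char a[1024], b[1024], c[1024];
    std::memset(a, 'a', sizeof a);
    std::memset(b, 'b', sizeof b);
    std::memset(c, 'c', sizeof c);

    iovec iov[3] = {
        {a, sizeof a},
        {b, sizeof b},
        {c, sizeof c},
    };
    ssize_t sent = writev(fds[0], iov, 3);  // one syscall, three buffers
    close(fds[0]);

    char sink[4096];
    long got = 0;
    ssize_t n;
    while ((n = read(fds[1], sink, sizeof sink)) > 0) got += n;
    close(fds[1]);
    return (sent == 3 * 1024) ? got : -1;
}
```

With the question's 1 KiB writes, batching even a handful of buffers per writev call cuts the syscall count by the same factor.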
+3
  • Use larger buffers (i.e. make fewer library/system calls)
  • Use asynchronous APIs
  • Read the documentation (the read/write return value is not just an error indicator; it is also the number of bytes read/written)
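That last bullet matters for correctness, not just speed: read() on a stream socket may return fewer bytes than requested, so a fixed-count loop like the question's can silently miscount data. A minimal sketch of a short-read-safe helper, demoed over a socketpair (the helper name is illustrative):

```cpp
#include <sys/socket.h>
#include <unistd.h>

// read() may return fewer bytes than asked for; loop until `want`
// bytes have arrived, the peer closes, or an error occurs.
long read_exactly(int fd, char* buf, long want) {
    long got = 0;
    while (got < want) {
        ssize_t n = read(fd, buf + got, want - got);
        if (n < 0) return -1;  // error
        if (n == 0) break;     // peer closed early
        got += n;
    }
    return got;
}

long demo_short_reads() {
    int fds[2];
    if (socketpair(AF_LOCAL, SOCK_STREAM, 0, fds) != 0) return -1;
    char msg[3000] = {0};
    write(fds[0], msg, 1000);         // data may arrive in pieces...
    write(fds[0], msg + 1000, 2000);  // ...of arbitrary sizes
    close(fds[0]);
    char buf[3000];
    long got = read_exactly(fds[1], buf, 3000);  // still gets all 3000 bytes
    close(fds[1]);
    return got;
}
```

On a stream socket there is no guarantee that one write corresponds to one read; the loop above is what makes the byte accounting exact.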
+5

You can use strace -f iperf -s localhost to find out what iperf does differently. It turns out it uses significantly larger buffers (131072 bytes, at least with 2.0.5) than you do.

In addition, iperf uses multiple threads. If you have 4 CPU cores, using two threads on both the client and the server will result in approximately double the performance.
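A hedged sketch of that multi-stream idea: pump data over two independent connections, each serviced by its own thread. The function names are made up, and socketpairs stand in for the two TCP connections so the snippet is self-contained:

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <algorithm>
#include <thread>
#include <vector>

// Pump `total` bytes through one socketpair; return bytes received.
long pump(long total) {
    int fds[2];
    if (socketpair(AF_LOCAL, SOCK_STREAM, 0, fds) != 0) return -1;

    // Writer thread plays the role of the server on this connection.
    std::thread writer([&] {
        std::vector<char> buf(64 * 1024, 'x');
        long sent = 0;
        while (sent < total) {
            long chunk = std::min<long>(buf.size(), total - sent);
            ssize_t n = write(fds[0], buf.data(), chunk);
            if (n <= 0) break;
            sent += n;
        }
        close(fds[0]);
    });

    std::vector<char> buf(64 * 1024);
    long got = 0;
    ssize_t n;
    while ((n = read(fds[1], buf.data(), buf.size())) > 0) got += n;
    close(fds[1]);
    writer.join();
    return got;
}

// Two connections, two reader threads, as the answer suggests.
long demo_two_streams(long per_stream) {
    long a = 0, b = 0;
    std::thread t1([&] { a = pump(per_stream); });
    std::thread t2([&] { b = pump(per_stream); });
    t1.join();
    t2.join();
    return a + b;
}
```

Each stream runs its own read/write loop, so with spare cores the two connections can make progress in parallel.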

+3

If you really want to squeeze out maximum performance, use mmap + splice/sendfile , and for communication over localhost use Unix domain sockets ( AF_LOCAL ).
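Setting up the server side of a Unix domain socket looks much like the TCP version, except the address is a filesystem path rather than an IP/port. A minimal sketch, with a hypothetical socket path chosen for illustration:

```cpp
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <cstring>

// Create a listening Unix domain socket bound to `path`.
// Returns the listening fd, or -1 on failure.
int make_local_server(const char* path) {
    int fd = socket(AF_LOCAL, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    sockaddr_un addr{};
    addr.sun_family = AF_LOCAL;
    std::strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

    unlink(path);  // remove a stale socket file from a previous run
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) != 0 ||
        listen(fd, 1) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The client side connects with the same sockaddr_un filled in, and the read/write loops from the question carry over unchanged; the saving comes from skipping the TCP/IP stack entirely on the loopback path.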

+1
