Fast sockets in Python

I have a client written in Python for a server that runs over a LAN. In one part of the algorithm the socket is used heavily, and it runs about 3-6 times slower than a nearly identical client written in C++. What can I do to speed up reading from sockets in Python?

I have some simple buffering, and my socket class looks like this:

    import socket
    import struct

    class Sock():
        def __init__(self):
            self.s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            self.recv_buf = b''
            self.send_buf = b''

        def connect(self):
            self.s.connect(('', 6666))

        def close(self):
            self.s.close()

        def recv(self, lngth):
            while len(self.recv_buf) < lngth:
                self.recv_buf += self.s.recv(lngth - len(self.recv_buf))
            res = self.recv_buf[-lngth:]
            self.recv_buf = self.recv_buf[:-lngth]
            return res

        def next_int(self):
            return struct.unpack("i", self.recv(4))[0]

        def next_float(self):
            return struct.unpack("f", self.recv(4))[0]

        def write_int(self, i):
            self.send_buf += struct.pack('i', i)

        def write_float(self, f):
            self.send_buf += struct.pack('f', f)

        def flush(self):
            self.s.sendall(self.send_buf)
            self.send_buf = b''

PS: Profiling also shows that most of the time is spent reading sockets.

Edit: Since the data arrives in blocks of known size, I can read an entire block at once. So I changed my code to this:

    class Sock():
        def __init__(self):
            self.s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            self.send_buf = b''

        def connect(self):
            self.s.connect(('', 6666))

        def close(self):
            self.s.close()

        def recv_prepare(self, cnt):
            self.recv_buf = bytearray()
            while len(self.recv_buf) < cnt:
                self.recv_buf.extend(self.s.recv(cnt - len(self.recv_buf)))
            self.recv_buf_i = 0

        def skip_read(self, cnt):
            self.recv_buf_i += cnt

        def next_int(self):
            self.recv_buf_i += 4
            return struct.unpack("i", self.recv_buf[self.recv_buf_i - 4:self.recv_buf_i])[0]

        def next_float(self):
            self.recv_buf_i += 4
            return struct.unpack("f", self.recv_buf[self.recv_buf_i - 4:self.recv_buf_i])[0]

        def write_int(self, i):
            self.send_buf += struct.pack('i', i)

        def write_float(self, f):
            self.send_buf += struct.pack('f', f)

        def flush(self):
            self.s.sendall(self.send_buf)
            self.send_buf = b''

Receiving from the socket looks optimal in this code. But now next_int and next_float have become the second bottleneck: they take about 1 µs (roughly 3000 CPU cycles) per call just for unpacking. Is it possible to make them faster, for example by writing them in C++?

1 answer

Your remaining bottleneck is in next_int and next_float, because you create intermediate strings by slicing the bytearray and because you unpack only one value at a time.

The struct module has unpack_from, which takes a buffer and an offset. This is more efficient because it reads directly from your bytearray instead of first copying a slice into an intermediate object:

    def next_int(self):
        self.recv_buf_i += 4
        return struct.unpack_from("i", self.recv_buf, self.recv_buf_i - 4)[0]
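As a quick check that the two approaches read the same bytes, here is a minimal sketch on an in-memory buffer. The sample values and the names buf, offset, via_slice and via_offset are made up for illustration, and it assumes the native "i" format packs three ints back to back without padding, which is the case on common platforms.

```python
import struct

# Hypothetical sample data: three native ints packed back to back.
buf = bytearray(struct.pack("iii", 10, 20, 30))
int_size = struct.calcsize("i")  # 4 on common platforms
offset = int_size                # byte position of the second int

# Slicing copies int_size bytes into a new bytearray before unpacking.
via_slice = struct.unpack("i", buf[offset:offset + int_size])[0]

# unpack_from reads straight out of buf at the given offset, with no copy.
via_offset = struct.unpack_from("i", buf, offset)[0]

print(via_slice, via_offset)
```

Both calls return the second integer; the difference is only in how much temporary work Python does to get there.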

In addition, the struct module can unpack several values at a time. You are currently calling from Python into C (via the module) once for each value. You will be better served by calling it fewer times and letting it do more work on each call:

    def next_chunk(self, fmt):
        # fmt can be a group such as "iifff"
        sz = struct.calcsize(fmt)
        self.recv_buf_i += sz
        return struct.unpack_from(fmt, self.recv_buf, self.recv_buf_i - sz)

If you know that fmt will only ever contain 4-byte integers and floats, with one format character per value (no repeat counts and no byte-order prefix), you can replace struct.calcsize(fmt) with 4 * len(fmt).
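To make the grouping idea concrete, the sketch below unpacks the same record twice, once value by value and once in a single call. The record layout "iifff" and the sample numbers are hypothetical, and it assumes that "i" and "f" are both 4 bytes in native mode, as they are on common platforms.

```python
import struct

# Hypothetical record layout: two ints followed by three floats.
fmt = "iifff"
buf = bytearray(struct.pack(fmt, 1, 2, 0.5, 1.5, 2.5))

# One call per value: five separate trips from Python into the C module.
one_by_one = []
pos = 0
for code in fmt:
    one_by_one.append(struct.unpack_from(code, buf, pos)[0])
    pos += 4  # valid only because "i" and "f" are both 4 bytes here

# One call for the whole group.
grouped = struct.unpack_from(fmt, buf, 0)

print(tuple(one_by_one) == grouped)
print(struct.calcsize(fmt), 4 * len(fmt))
```

The results are identical, but the grouped version crosses the Python/C boundary once instead of five times, which is where the per-call overhead goes.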

Finally, as a preference, I think this reads more cleanly:

    def next_chunk(self, fmt):
        sz = struct.calcsize(fmt)
        chunk = struct.unpack_from(fmt, self.recv_buf, self.recv_buf_i)
        self.recv_buf_i += sz
        return chunk
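For illustration, here is how calling code might use next_chunk. BlockReader is a hypothetical stand-in for the reading half of Sock, fed from memory rather than a socket, and the block layout (a count followed by that many floats) is an assumption made up for this example.

```python
import struct


class BlockReader:
    """Hypothetical stand-in for the reading half of Sock, fed from memory."""

    def __init__(self, data):
        self.recv_buf = bytearray(data)
        self.recv_buf_i = 0

    def next_chunk(self, fmt):
        sz = struct.calcsize(fmt)
        chunk = struct.unpack_from(fmt, self.recv_buf, self.recv_buf_i)
        self.recv_buf_i += sz
        return chunk


# Pretend this block arrived from the server: a count, then that many floats.
block = struct.pack("i", 3) + struct.pack("3f", 0.25, 0.5, 0.75)

reader = BlockReader(block)
(count,) = reader.next_chunk("i")          # unpack_from always returns a tuple
values = reader.next_chunk("%df" % count)  # repeat count: "3f" means three floats
print(count, values)
```

Note that a format with a repeat count such as "3f" is a case where the 4 * len(fmt) shortcut does not apply, so struct.calcsize is kept here.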

