My research dead-ends at struct.py, which essentially just imports from _struct, and I have no idea where that is.
In CPython, _struct is a C extension module built from _struct.c in the Modules directory of the source tree; that's where to look for the code.
Whenever foo.py does an import _foo, _foo is almost always a C extension module, usually built from _foo.c. And if you can't find _foo.c, it's probably built from _foomodule.c instead.
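If you want to check where a given module actually comes from (a quick diagnostic, not something the original question requires), you can ask the import machinery directly; the exact output depends on how your interpreter was built:

import importlib.util

for name in ("struct", "_struct"):
    spec = importlib.util.find_spec(name)
    # struct usually resolves to Lib/struct.py; _struct is typically reported
    # as 'built-in' or as a compiled .so/.pyd, depending on the build.
    print(name, "->", spec.origin)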
It is also often worth looking at the equivalent PyPy source, even if you are not using PyPy. PyPy reimplements almost every extension module in pure Python, and for the rest (including this one) its main "extension language" is RPython, not C.
In this case, however, you don't need to know anything about how struct works beyond what's in the docs.
The byte limit seems arbitrary if the underlying architecture is all 64-bit.
Look at the code that it calls:
self._send(struct.pack("!i", n))
If you look at the documentation, the format character 'i' explicitly means "4-byte C integer", not "whatever ssize_t is". For that you would need 'n'. Or you could explicitly use a long long, with 'q'.
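To see that limit directly (a toy check, not part of the original answer):

import struct

n = 3 * 1024 ** 3                # a length a bit over 2 GiB
print(struct.pack("!q", n))      # fine: 8-byte big-endian signed integer
try:
    struct.pack("!i", n)         # 'i' is a 4-byte C int, so this overflows
except struct.error as e:
    print(e)                     # something like: 'i' format requires -2147483648 <= number <= 2147483647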
You can monkeypatch multiprocessing to use struct.pack('!q', n) instead, or encode the length some way other than struct entirely. Of course, this breaks wire compatibility with un-patched multiprocessing, which can be a problem if you're trying to do distributed processing across multiple machines or something like that. But it should be pretty simple:
def _send_bytes(self, buf):
    # For wire compatibility with 3.2 and lower
    n = len(buf)
    self._send(struct.pack("!q", n))  # was !i
    # The condition is necessary to avoid "broken pipe" errors
    # when sending a 0-length buffer if the other end closed the pipe.
    if n > 0:
        self._send(buf)

def _recv_bytes(self, maxsize=None):
    buf = self._recv(8)  # was 4
    size, = struct.unpack("!q", buf.getvalue())  # was !i
    if maxsize is not None and size > maxsize:
        return None
    return self._recv(size)
Of course, there is no guarantee that this change alone is enough; you'll want to read the rest of the surrounding code and test it thoroughly.
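If you do go the monkeypatch route, here is a rough sketch of how the replacements above might be installed, assuming CPython 3.3-era code where the pure-Python Connection class lives in multiprocessing.connection (the Windows PipeConnection uses a different transport, so this is POSIX-oriented):

import struct
import multiprocessing.connection as mpc

def _send_bytes(self, buf):
    n = len(buf)
    self._send(struct.pack("!q", n))  # 8-byte length prefix instead of 4
    if n > 0:
        self._send(buf)

def _recv_bytes(self, maxsize=None):
    buf = self._recv(8)
    size, = struct.unpack("!q", buf.getvalue())
    if maxsize is not None and size > maxsize:
        return None
    return self._recv(size)

# Install the replacements before any Pipe or Process objects are created.
mpc.Connection._send_bytes = _send_bytes
mpc.Connection._recv_bytes = _recv_bytes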
Note: I suspect that using Queue instead of Pipe will not solve the problem, since I suspect Queue uses a similar intermediate pickling step.
Well, the problem has nothing to do with pickling. Pipe isn't using pickle to send the lengths, it's using struct. You can verify that pickle wouldn't have this problem: pickle.loads(pickle.dumps(1<<100)) == 1<<100 returns True.
(Earlier versions of pickle did have problems with huge objects, e.g. a list of 2G elements, which would have caused trouble at a scale roughly 8x the one you're hitting now, but that was fixed in 3.3.)
Meanwhile... wouldn't it have been faster to just try it and see, rather than digging through the source to figure out whether it would work?
Also, are you sure you really want to pass a 2GB data structure around by implicit pickling?
If I were doing something that slow and memory-hungry, I'd prefer to make it explicit, e.g. pickle to a temporary file and send the path or fd. (If you're using numpy or pandas or something, use its binary format instead of pickle, but same idea.)
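For example, here's a minimal sketch of the "pickle to a temp file and send the path" idea; all the names and sizes are made up for illustration:

import os
import pickle
import tempfile
from multiprocessing import Pipe, Process

def worker(conn):
    path = conn.recv()              # only the short path crosses the pipe
    with open(path, "rb") as f:
        data = pickle.load(f)       # the big object is loaded out-of-band
    os.remove(path)
    conn.send(len(data))            # send back something small

if __name__ == "__main__":
    big = list(range(1_000_000))    # stand-in for the real 2GB structure
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        pickle.dump(big, f)
    parent, child = Pipe()
    p = Process(target=worker, args=(child,))
    p.start()
    parent.send(path)
    print(parent.recv())
    p.join()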
Or, better yet, share the data. Yes, mutable shared state is bad... but sharing immutable objects is fine. Whatever it is you have 2GB of, can you put it in a multiprocessing.Array, or put it in a ctypes array or struct (of arrays or structs of...) that you can share via multiprocessing.sharedctypes, or ctypes it out of a file that you mmap on both sides, or...? There's a bit of extra code to define and pick apart the structures, but when the benefits are likely to be this big, it's worth a try.
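As a rough illustration of the multiprocessing.Array option (sizes made up; a real 2GB payload might need a ctypes struct or an mmap'd file instead):

import multiprocessing as mp

def worker(shared, lo, hi):
    # Each child reads straight from the shared block: no pickling, no copying.
    print(sum(shared[lo:hi]))

if __name__ == "__main__":
    n = 100_000                                  # stand-in size
    shared = mp.Array('d', n, lock=False)        # one flat block of shared doubles
    shared[:] = [float(i) for i in range(n)]     # fill it once, in the parent
    chunk = n // 4
    procs = [mp.Process(target=worker, args=(shared, i * chunk, (i + 1) * chunk))
             for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()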
Finally, whenever you think you've found a bug / obvious missing feature / unreasonable limitation in Python, it's worth searching the bug tracker. It looks like Issue 17560: problem using multiprocessing with really big objects? is exactly your problem, and it has lots of information, including suggested workarounds.