How to protect Python distributed computing layer

Question

These modules are designed to spread computation across multiple computers. What proven methods are available to protect against forged packets? What is the best way to deep-copy the objects that the transferred callable references but that are not included in the payload? Is a function object the best way to encapsulate client jobs? Finally, can this code be improved? P.S. Please excuse my last question; I need to redeem my reputation.

sock.py

    from socket import socket
    from socket import AF_INET
    from socket import SOCK_STREAM
    from socket import gethostbyname
    from socket import gethostname

    class SocketServer:
        def __init__(self, port):
            self.sock = socket(AF_INET, SOCK_STREAM)
            self.port = port

        def send(self, tdata):
            self.sock.bind(("127.0.0.1", self.port))
            self.sock.listen(len(tdata))
            while tdata:
                s = self.sock.accept()[0]
                for x in tdata.pop():
                    s.send(x)
                s.close()
            self.sock.close()

    class Socket:
        def __init__(self, host, port):
            self.sock = socket(AF_INET, SOCK_STREAM)
            self.sock.connect((host, port))

        def recv(self, size):
            return self.sock.recv(size)

        def close(self):
            self.sock.close()

pack.py

    #http://stackoverflow.com/questions/6234586/we-need-to-pickle-any-sort-of-callable
    from marshal import dumps as marshal_dumps
    from pickle import dumps as pickle_dumps
    from struct import pack as struct_pack
    from hashlib import sha224

    class packer:
        def __init__(self):
            self.f = []

        def pack(self, what):
            if type(what) is type(lambda:None):
                self.f = []
                self.f.append(marshal_dumps(what.func_code))
                self.f.append(pickle_dumps(what.func_name))
                self.f.append(pickle_dumps(what.func_defaults))
                self.f.append(pickle_dumps(what.func_closure))
                self.f = pickle_dumps(self.f)
                return (struct_pack('Q', len(self.f)), self.f)
            return None

        def gethash(self):
            hash = sha224(self.f).hexdigest()
            return (struct_pack('Q', len(hash)), hash)

        def getwithhash(self, what):
            a, b = self.pack(what)
            c, d = self.gethash()
            return (a, b, c, d)

unpack.py

    from types import FunctionType
    from pickle import loads as pickle_loads
    from marshal import loads as marshal_loads
    from struct import unpack as struct_unpack
    from struct import calcsize
    from hashlib import sha224
    #http://stackoverflow.com/questions/6234586/we-need-to-pickle-any-sort-of-callable

    class unpacker:
        def __init__(self):
            self.f = []
            self.fcompiled = lambda:None
            self.sizeofsize = calcsize('Q')

        def unpack(self, sock):
            size = struct_unpack('Q', sock.recv(self.sizeofsize))[0]
            self.f = sock.recv(size)
            size = struct_unpack('Q', sock.recv(self.sizeofsize))[0]
            hash0 = sock.recv(size)
            sock.close()
            hash1 = sha224(self.f).hexdigest()
            if hash0 != hash1:
                return None
            self.f = pickle_loads(self.f)
            a = marshal_loads(self.f[0])
            b = globals() # TODO
            c = pickle_loads(self.f[1])
            d = pickle_loads(self.f[2])
            e = pickle_loads(self.f[3])
            self.fcompiled = FunctionType(a, b, c, d, e)
            return self.fcompiled
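On the "can this code be improved" question, one issue is independent of security: `sock.recv(size)` in `unpacker.unpack` may legally return fewer than `size` bytes, and the `'Q'` format uses the machine's native byte order. A minimal Python 3 sketch of length-prefixed framing that handles both; `recv_exact`, `send_frame` and `recv_frame` are hypothetical helper names, not part of the original modules.

```python
import socket
import struct

# '!Q' = unsigned 64-bit length in network (big-endian) byte order, so
# machines with different native byte orders agree on the header.
HEADER = struct.Struct("!Q")


def recv_exact(sock, n):
    """Read exactly n bytes; a single recv() may return fewer."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:  # peer closed the connection mid-frame
            raise ConnectionError(
                "peer closed after %d of %d bytes" % (n - remaining, n))
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_frame(sock, payload):
    """Send one length-prefixed frame; sendall() retries partial sends."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_frame(sock):
    """Receive one frame written by send_frame()."""
    (size,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, size)
```

For example, with `a, b = socket.socketpair()`, calling `send_frame(a, data)` and then `recv_frame(b)` returns `data` regardless of how the kernel splits the bytes.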

test.py

    from unpack import unpacker
    from pack import packer
    from sock import SocketServer
    from sock import Socket
    from threading import Thread
    from time import sleep

    count = 2
    port = 4446

    def f():
        print 42

    def server():
        ss = SocketServer(port)
        pack = packer()
        functions = [pack.getwithhash(f) for nothing in range(count)]
        ss.send(functions)

    if __name__ == "__main__":
        Thread(target=server).start()
        sleep(1)
        unpack = unpacker()
        for nothing in range(count):
            print unpack.unpack(Socket("127.0.0.1", port))

Output:

    <function f at 0x0000000>
    <function f at 0x0000000>

python hash distributed-computing

motoku Jun 05 '11 at 23:12

sarnold · Accepted Answer · 2011-06-06T00:02:49+0000

Having looked at your code carefully, I have a few comments:

This code looks like it will easily protect against accidental modification of the pickled objects while they are in flight. sha224 is a fine hash algorithm, and it will readily catch packets that were accidentally corrupted but still happen to pass the TCP checksum.
This code does not protect against malicious modification of the pickled objects while they are in flight. There is no assurance that the packets come from a trusted member of the compute network, and no assurance that they have not been altered (or dropped entirely).

A hash algorithm alone cannot prove where packets came from or that they were not maliciously modified: an attacker can simply recompute the hash after changing the data and send the modified packet.
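To illustrate both points with the scheme from the question: a bare SHA-224 digest catches a flipped bit, but anyone can compute a valid digest for data of their choosing, because no secret is involved. A minimal Python 3 sketch; `attach_digest` and `check_digest` are hypothetical helpers that mirror what `packer.gethash` and `unpacker.unpack` do.

```python
import hashlib


def attach_digest(data):
    """Like the question's packer: ship the data with its SHA-224 hex digest."""
    return data, hashlib.sha224(data).hexdigest()


def check_digest(data, digest):
    """Like the question's unpacker: recompute the digest and compare."""
    return hashlib.sha224(data).hexdigest() == digest


payload, digest = attach_digest(b"job: print 42")

# Accidental corruption (one flipped bit) is detected...
corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]
print(check_digest(corrupted, digest))        # False

# ...but an attacker who replaces the payload just recomputes the digest.
forged, forged_digest = attach_digest(b"job: malicious code")
print(check_digest(forged, forged_digest))    # True
```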

There are a few standard approaches to this problem. You can use a shared secret: a single key common to all clients participating in the network. The key is used in a keyed hash such as HMAC, and recipients recompute the HMAC authentication code with the shared key and compare it with the one received. It's quick and easy (and legal even in some jurisdictions that restrict cryptographic software), but a single shared key is a huge liability if any one host has its key compromised. (Then again, compromised systems may not even be part of your threat model.)
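A minimal sketch of the shared-secret approach using Python 3's standard `hmac` module. `SHARED_KEY` is a hypothetical placeholder (here random, so the example runs; in practice it would be distributed out of band to every node), and SHA-224 is kept only to match the question. The tag should be verified before calling `pickle.loads` or `marshal.loads`, since unpickling untrusted bytes can execute arbitrary code.

```python
import hashlib
import hmac
import os

# Hypothetical shared key; every node would need the same value.
SHARED_KEY = os.urandom(32)


def sign(key, data):
    """Return an HMAC-SHA224 tag over data."""
    return hmac.new(key, data, hashlib.sha224).digest()


def verify(key, data, tag):
    """Constant-time check; do this BEFORE pickle.loads/marshal.loads."""
    return hmac.compare_digest(sign(key, data), tag)


payload = b"pickled job bytes"
tag = sign(SHARED_KEY, payload)
print(verify(SHARED_KEY, payload, tag))       # True

# Without the key, recomputing a plain hash no longer yields a valid tag.
forged = b"malicious job bytes"
print(verify(SHARED_KEY, forged, hashlib.sha224(forged).digest()))  # False
```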

You can also use pairwise shared secrets, one per pair of hosts. It works the same way as a single secret shared among all nodes, but if one client's key is compromised, only that client's keys need to be replaced on the other systems.
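A minimal sketch of pairwise secrets, assuming one independent key per unordered pair of hosts. The host names and the `make_pairwise_keys` / `rekey_host` helpers are hypothetical; the point is that a compromise only forces replacement of the keys the compromised host knew.

```python
import itertools
import os


def make_pairwise_keys(hosts):
    """One independent 32-byte secret per unordered pair of hosts."""
    return {frozenset(pair): os.urandom(32)
            for pair in itertools.combinations(hosts, 2)}


def rekey_host(keys, compromised):
    """Replace only the secrets that the compromised host knew."""
    for pair in keys:
        if compromised in pair:
            keys[pair] = os.urandom(32)


hosts = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical host names
keys = make_pairwise_keys(hosts)
before = dict(keys)
rekey_host(keys, "node-c")

changed = [sorted(p) for p in keys if keys[p] != before[p]]
print(len(keys), len(changed))  # 6 3
```

A node looks up the key for traffic with a peer as `keys[frozenset({me, peer})]`; the other pairs keep their keys untouched.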

You can also use public-key cryptography to sign packets. Each client has a private key and a corresponding public key that all the other clients know. A compromised private key still compromises the system, but this approach significantly reduces the number of keys you need to provision: one per client rather than one per pair of clients, O(N) versus O(N²).
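To make the O(N) versus O(N²) comparison concrete, the number of secrets for pairwise symmetric keys is N(N-1)/2, while signing needs one key pair per node. The function names below are purely illustrative.

```python
def symmetric_pairwise_keys(n):
    """Secrets needed when every pair of nodes shares its own key."""
    return n * (n - 1) // 2


def public_key_pairs(n):
    """Key pairs needed when each node signs with its own private key."""
    return n


for n in (4, 10, 100):
    print(n, symmetric_pairwise_keys(n), public_key_pairs(n))
# 4 6 4
# 10 45 10
# 100 4950 100
```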

Public-key systems are fun to write yourself as a learning exercise, but they are awfully hard to get right. Protecting against replay attacks, selective deletion of messages, message splicing and reassembly, and so on requires a lot of careful protocol design.
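As a taste of that protocol design, here is a minimal sketch of one defence: binding a sequence number into the HMAC so a receiver rejects replayed, reordered, or skipped messages. `KEY`, `seal` and `Receiver` are hypothetical names, and the sketch deliberately omits key exchange, direction separation (a message reflected back to its sender would still verify) and recovery after a dropped message, which is exactly why existing protocols are preferable.

```python
import hashlib
import hmac
import os
import struct

KEY = os.urandom(32)            # hypothetical shared key
SEQ = struct.Struct("!Q")       # 64-bit sequence number, network byte order
TAG_LEN = hashlib.sha256().digest_size


def seal(key, seq, payload):
    """Prefix a sequence number and append a MAC covering both."""
    header = SEQ.pack(seq)
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    return header + payload + tag


class Receiver:
    def __init__(self, key):
        self.key = key
        self.expected = 0       # next sequence number we will accept

    def open(self, message):
        if len(message) < SEQ.size + TAG_LEN:
            raise ValueError("truncated message")
        header = message[:SEQ.size]
        payload = message[SEQ.size:-TAG_LEN]
        tag = message[-TAG_LEN:]
        good = hmac.new(self.key, header + payload, hashlib.sha256).digest()
        if not hmac.compare_digest(good, tag):
            raise ValueError("bad MAC")
        (seq,) = SEQ.unpack(header)
        if seq != self.expected:  # replayed, reordered, or one was dropped
            raise ValueError("unexpected sequence number %d" % seq)
        self.expected += 1
        return payload


rx = Receiver(KEY)
first = seal(KEY, 0, b"job 0")
print(rx.open(first))            # b'job 0'
try:
    rx.open(first)               # the same bytes sent again
except ValueError as err:
    print("rejected:", err)      # rejected: unexpected sequence number 0
```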

That's why most people deploy an existing transport security scheme such as SSLv3 or TLS (SSLv3 has since been deprecated by RFC 7568, so in practice this means TLS). Combined with client certificates, it can easily assure that both endpoints are who they claim to be (up to the level of compromised keys, of course) and that data sent over a TLS-protected stream arrives in the correct order and unmodified.
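A minimal sketch of mutually authenticated TLS with Python 3's standard `ssl` module. The function names and the certificate/key/CA file arguments are hypothetical; you would need your own CA that signs both node and client certificates. The server demands a client certificate, and the client verifies the server's certificate and hostname.

```python
import ssl


def hardened_server_context():
    """Server-side settings: TLS 1.2 or newer and a mandatory client cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # rules out SSLv3 and TLS 1.0/1.1
    ctx.verify_mode = ssl.CERT_REQUIRED           # reject clients without a valid cert
    return ctx


def server_context(certfile, keyfile, client_ca):
    """certfile/keyfile: this node's PEM cert and key.
    client_ca: PEM bundle of the CA that signs client certificates."""
    ctx = hardened_server_context()
    ctx.load_cert_chain(certfile, keyfile)
    ctx.load_verify_locations(cafile=client_ca)
    return ctx


def client_context(certfile, keyfile, server_ca):
    """Verifies the server's cert and hostname, and presents our own cert."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=server_ca)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.load_cert_chain(certfile, keyfile)
    return ctx
```

On the server, each accepted socket would be wrapped with `ctx.wrap_socket(conn, server_side=True)`; on the client, with `ctx.wrap_socket(sock, server_hostname=host)`. After the handshake, `getpeercert()` on the server-side socket returns the verified client certificate.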

TLS can be hard to configure correctly. You may have just as much success with a simpler tool such as ssh. Libraries are available, so you can manage connections programmatically rather than relying on the system-provided ssh(1) and sshd(8) programs.
