Python Multiprocessing: Only One Process Runs

I am trying to start several parallel processes using the Python multiprocessing module. Basically, I did something like this:

from multiprocessing import Pool

pool = Pool(30)
results = [pool.apply_async(foo, (trainData, featureVector, terms, selLabel)) for selLabel in selLabels]
for r in results:
    tmp = r.get()
    modelFiles[tmp[0]] = tmp[1]

Thirty processes were created, but most of them are just sitting there; only one of them is actually running. Here is what I get from ps:

  PID %CPU %MEM     VSZ     RSS TTY   STAT START  TIME COMMAND
31267 74.6  2.4 7125412 6360080 pts/1 Sl+  13:06 24:25 \_ python2.6 /home/PerlModules/Python/DoOVA.py
31427 27.4  2.3 6528532 6120904 pts/1 R+   13:20  5:18     \_ python2.6 /home/PerlModules/Python/DoOVA.py
31428  0.0  1.3 4024724 3617016 pts/1 S+   13:20  0:00     \_ python2.6 /home/PerlModules/Python/DoOVA.py
31429  0.0  1.3 4024724 3617016 pts/1 S+   13:20  0:00     \_ python2.6 /home/PerlModules/Python/DoOVA.py
31430  0.0  1.3 4024724 3617016 pts/1 S+   13:20  0:00     \_ python2.6 /home/PerlModules/Python/DoOVA.py

DoOVA.py is the script I'm running. Most of the processes have status S+ (sleeping).

Can someone give me some idea of what the problem is? I know that one of the input arguments, featureVector, is quite large, about 300 MB. Would this be a problem? The machine I work on has several TB of memory.

foo does something like:

def foo(trainData, featureVector, terms, selLabel, penalty):
    outputFile = 'train_'+selLabel+'.dat'
    annotation = dict()
    for id in trainData:
        if trainData[id] == selLabel:
            annotation[id] = '1'
        else:
            annotation[id] = '-1'
    try:
        os.mkdir(selLabel)
        os.chdir(selLabel)
    except OSError:
        os.chdir(selLabel)
    # ... more processing here, including a command-line call made with call() from the subprocess module ...
    os.chdir('../')
    return (selLabel, 'SVM_' + selLabel + '.model')

The machine has 100 CPUs. Each call to foo creates its own working directory with os.mkdir().


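You should pass featureVector using the initializer and initargs arguments to Pool. On Unix-like systems this will give a large performance increase (even if selLabels has only 1 item), because the value is handed to the child processes essentially for free through os.fork. Otherwise, each time foo is called, featureVector has to be pickled by the parent process, sent through a pipe, and unpickled by a child process. This takes a long time, and it effectively serializes the child processes: they all end up waiting while the parent pickles and sends a copy of featureVector for each call, one call at a time.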
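To see what "for free through os.fork" means, here is a minimal sketch (Unix only, Python 3.9+ for os.waitstatus_to_exitcode) that forks a child and checks that it can read a large list that was never pickled or written to a pipe. The function name child_can_read and the list big are hypothetical stand-ins, not part of the question's code.

```python
import os

def child_can_read(data):
    """Fork a child (Unix only) and check that it sees `data` without the
    data being pickled or sent through a pipe."""
    expected = sum(data)  # computed in the parent before forking
    pid = os.fork()
    if pid == 0:
        # Child: `data` is already in this process's memory, inherited from
        # the parent at fork time; nothing was serialized.
        os._exit(0 if sum(data) == expected else 1)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status) == 0

big = list(range(1_000_000))  # hypothetical stand-in for featureVector
print(child_can_read(big))    # True
```

The child's memory starts out as a copy-on-write view of the parent's, so no bytes are copied up front. CPython's reference counting will still cause some pages to be copied as the child touches the objects, but that is far cheaper than pickling and piping the whole structure.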

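Since this may not be obvious, here is in more detail what happens with your code as it is currently written: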

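When you create the Pool, 30 worker processes are created immediately, all children of the main process that created the Pool. The parent communicates with them through pipes: it sends tasks to the workers through one pipe, and the workers send their results back through another.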

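When you call pool.apply_async, the parent process sends a task through the pipe instructing a worker to execute foo with the arguments supplied. Since one of the arguments is huge, 300 MB, this takes a very long time. The parent has to pickle the object, which converts it (and, recursively, everything it references) into a byte stream that can be sent through the pipe. The worker then has to read that whole byte stream and unpickle it into its own copy before foo can even start.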
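As a rough illustration of that per-call cost, the sketch below (Python 3) pickles an argument tuple the way the parent would and unpickles it the way a worker would. The helper send_to_worker and the roughly 9 MB stand-in list are hypothetical; the real task message also carries the function reference and some bookkeeping, so this only approximates what goes through the pipe.

```python
import pickle
import time

def send_to_worker(args):
    """Simulate what happens to apply_async arguments: pickled in the
    parent, unpickled in the worker. Returns (worker copy, bytes, seconds)."""
    start = time.perf_counter()
    payload = pickle.dumps(args, protocol=pickle.HIGHEST_PROTOCOL)  # parent side
    worker_copy = pickle.loads(payload)                             # worker side
    elapsed = time.perf_counter() - start
    return worker_copy, len(payload), elapsed

# Hypothetical stand-in for featureVector (about 9 MB pickled, not 300 MB)
feature_vector = [float(i) for i in range(1_000_000)]

worker_copy, nbytes, seconds = send_to_worker((feature_vector, "someLabel"))
print(f"{nbytes} bytes, {seconds:.3f} s for one task")
```

The worker ends up with an independent copy, and this whole round trip is repeated for every element of selLabels.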

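Since a pipe can only hold about 64k (the Linux default), and you're sending much more than that, the transfer effectively synchronizes the parent with one of the workers. The parent can only write the pickled data as fast as the worker reads and unpickles it, and the worker can only receive it as fast as the parent pickles and writes it. While this is going on, all the other workers have to wait. The parent can only send a task to one worker at a time.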
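You can measure the pipe capacity on your own machine with a small sketch like the following (Python 3.5+, Unix). It makes the write end of an OS pipe non-blocking and writes until the kernel refuses more data; the function name pipe_capacity and the 1 KB chunk size are arbitrary choices for illustration.

```python
import os

def pipe_capacity(chunk_size=1024):
    """Count how many bytes fit in an OS pipe before a writer would block."""
    r, w = os.pipe()
    os.set_blocking(w, False)  # a full pipe now raises instead of blocking
    chunk = b"x" * chunk_size
    total = 0
    try:
        while True:
            total += os.write(w, chunk)
    except BlockingIOError:
        pass  # pipe is full: a blocking writer would now wait for a reader
    finally:
        os.close(r)
        os.close(w)
    return total

capacity = pipe_capacity()
print(capacity)  # on Linux this is usually 65536 (the 64k default)
```

A 300 MB pickle therefore has to pass through that small buffer thousands of times, with the parent and one worker taking turns filling and draining it.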

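Once the parent has finished sending all the arguments for the first call of foo, it moves on to the second call and starts pickling and sending featureVector all over again. Meanwhile, as soon as the first worker has received and unpickled its arguments, it calls foo. (This is why it takes so long before even one foo call gets going: the parent needs a long time to send just one set of arguments.) When foo returns, the worker sends the result back through a pipe and waits for its next task. Every other worker has to wait its turn in the same way, so the calls to foo start one after another instead of all at once. If foo finishes quickly compared with the time it takes to send a copy of featureVector, only one worker is ever busy. That matches your ps output: the parent is busy pickling, one worker is running, and the rest are sleeping.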

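With the initializer, featureVector is never sent through the task pipe at all. On Unix, each worker is created with os.fork, so it already has featureVector in its memory; the initializer just stores it in a global variable, once per worker. Each call to foo then only needs to send selLabel, a short string, so the parent can hand out tasks to all 30 workers almost immediately and they can all run foo at the same time.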
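The difference in what travels through the task pipe can be estimated with pickle alone. This sketch (Python 3) compares the pickled argument size of 30 tasks that each carry a featureVector stand-in against 30 tasks that carry only a label; the data, label names and the helper payload_bytes are hypothetical.

```python
import pickle

def payload_bytes(args):
    # Size of the pickled argument tuple for one task
    return len(pickle.dumps(args, protocol=pickle.HIGHEST_PROTOCOL))

feature_vector = [float(i) for i in range(100_000)]  # hypothetical stand-in
sel_labels = ["label%d" % i for i in range(30)]       # hypothetical labels

# As in the question: every task carries its own copy of featureVector
bytes_per_call = sum(payload_bytes((feature_vector, lbl)) for lbl in sel_labels)
# With initializer/initargs under fork: tasks carry only the label
bytes_with_initializer = sum(payload_bytes((lbl,)) for lbl in sel_labels)

print(bytes_per_call, bytes_with_initializer)
```

The first number grows with len(selLabels) times the size of featureVector; the second stays tiny no matter how large featureVector is.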

So, something like this:

from multiprocessing import Pool

def child_initialize(_trainData, _featureVector, _terms):
    # Runs once in each worker process; stores the large objects as globals
    global trainData, featureVector, terms
    trainData = _trainData
    featureVector = _featureVector
    terms = _terms

def foo(selLabel):
    # Use the globals trainData, featureVector and terms here
    ...

pool = Pool(30, initializer=child_initialize, initargs=(trainData, featureVector, terms))
results = [pool.apply_async(foo, (selLabel,)) for selLabel in selLabels]
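For a self-contained version you can run and adapt, here is a sketch (Python 3, Unix) that forces the fork start method so initargs are inherited rather than pickled. The data built in make_data, the helper run, and the extra values returned by foo are hypothetical placeholders for the question's real data and SVM training step.

```python
import multiprocessing as mp

def make_data():
    # Hypothetical stand-ins for the question's trainData, featureVector, terms
    trainData = {"doc%d" % i: ("pos" if i % 2 else "neg") for i in range(100)}
    featureVector = [float(i) for i in range(10_000)]
    terms = ["t%d" % i for i in range(50)]
    return trainData, featureVector, terms

def child_initialize(_trainData, _featureVector, _terms):
    # Runs once per worker; the large objects become worker globals
    global trainData, featureVector, terms
    trainData = _trainData
    featureVector = _featureVector
    terms = _terms

def foo(selLabel):
    # Only selLabel travels through the task pipe; the rest comes from globals
    annotation = {doc_id: ("1" if label == selLabel else "-1")
                  for doc_id, label in trainData.items()}
    positives = sum(1 for v in annotation.values() if v == "1")
    return (selLabel, "SVM_" + selLabel + ".model", positives, len(featureVector))

def run(selLabels, processes=4):
    trainData, featureVector, terms = make_data()
    ctx = mp.get_context("fork")  # Unix only: workers inherit initargs via fork
    with ctx.Pool(processes, initializer=child_initialize,
                  initargs=(trainData, featureVector, terms)) as pool:
        results = [pool.apply_async(foo, (label,)) for label in selLabels]
        return [r.get() for r in results]

if __name__ == "__main__":
    for row in run(["pos", "neg"]):
        print(row)
```

Collecting the results inside the with block matters, because leaving it calls terminate() on the pool.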

If trainData and terms are also large, passing them through initargs as well makes sense.

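Finally, note that ps shows a process as runnable (R) only while it is using, or waiting for, a CPU. Because foo does its main work through a subprocess call, a worker that is waiting for that command to finish shows up as sleeping (S), even though it is busy.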

