I am using the multiprocessing module to convert a list of text files to BERT embeddings.
A BERT embedding is created for each file, but the process never finishes for one specific file.
Previously, I used join() to wait for the processes to finish, but it hung there too.
So, as suggested here:
Process.join() and Queue() do not work in case of large numbers
I changed the code to replace process.join():
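    from multiprocessing import Process
    import multiprocessing
    import time
    import sys
    from datetime import datetime                # needed for datetime.now()
    from bert_serving.client import BertClient   # client used in process()

    def process(file, appended_data):
        start = datetime.now()
        # form_path is the directory prefix, defined elsewhere (not shown)
        with open(form_path + file, 'r') as file1_obj:
            file1 = file1_obj.readlines()
        # drop empty and whitespace-only lines, strip trailing whitespace
        file11 = [i.rstrip() for i in file1 if not (not i or i.isspace())]
        # join the whole file into a single string for the encoder
        file111 = [' |||'.join(file11)]
        try:
            bc = BertClient()
            embedding1 = bc.encode(file111)
            del bc
        except ValueError: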
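To make the join() replacement concrete, here is a minimal sketch of the polling pattern, not my exact code. It assumes a POSIX system (it uses the "fork" start method), and `worker`, `run_with_deadline`, and the sleep-based jobs are hypothetical stand-ins for the BERT call.

```python
import multiprocessing as mp
import queue
import time


def worker(name, delay, results):
    """Hypothetical stand-in for the embedding step: sleep, then report."""
    time.sleep(delay)
    results.put((name, delay))


def run_with_deadline(jobs, deadline_s=5.0, poll_s=0.1):
    """Start one process per job, poll for results, terminate stragglers."""
    # Assumption: POSIX. With "spawn", call this under `if __name__ == "__main__":`.
    ctx = mp.get_context("fork")
    results = ctx.Queue()
    procs = {
        name: ctx.Process(target=worker, args=(name, delay, results))
        for name, delay in jobs
    }
    for p in procs.values():
        p.start()

    finished = {}
    stop_at = time.monotonic() + deadline_s
    while len(finished) < len(procs) and time.monotonic() < stop_at:
        # Drain the queue while waiting, so no child blocks on a full pipe.
        try:
            name, value = results.get(timeout=poll_s)
            finished[name] = value
        except queue.Empty:
            pass

    # Anything that has not reported by the deadline is treated as stuck.
    stuck = [name for name in procs if name not in finished]
    for name in stuck:
        procs[name].terminate()
    for p in procs.values():
        p.join()  # safe now: queue is drained, stragglers are terminated
    return finished, stuck
```

Called as `run_with_deadline([("a.txt", 0.1), ("b.txt", 60)], deadline_s=2)`, this returns the result for `a.txt` and reports `b.txt` as stuck instead of waiting forever.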
Nevertheless, a deadlock still occurs for certain files.
Description below:
Executing the embedding_dic function on a list of files produces the following output:
    No of files available : 7
    Files _names: ['0001368007_10-K_2007-03-22.txt', '0001368007_10-K_2008-03-25.txt', '0001368007_10-K_2009-02-27.txt', '0001368007_10-K_2010-03-01.txt', '0001368007_10-K_2011-02-28.txt', '0001368007_10-K_2012-02-29.txt', '0001368007_10-K_2012-02-29.txt']
    Processes_started: [<Process(Process-1899, started)>, <Process(Process-1900, started)>, <Process(Process-1901, started)>, <Process(Process-1902, started)>, <Process(Process-1903, started)>, <Process(Process-1904, started)>, <Process(Process-1905, started)>]
    0
    [<Process(Process-1899, started)>, <Process(Process-1900, started)>, <Process(Process-1901, started)>, <Process(Process-1902, started)>, <Process(Process-1903, started)>, <Process(Process-1904, started)>, <Process(Process-1905, started)>]
    0
    [<Process(Process-1899, started)>, <Process(Process-1900, started)>, <Process(Process-1901, started)>, <Process(Process-1902, started)>, <Process(Process-1903, started)>, <Process(Process-1904, started)>, <Process(Process-1905, started)>]
    0
    [<Process(Process-1899, started)>, <Process(Process-1900, started)>, <Process(Process-1901, started)>, <Process(Process-1902, started)>, <Process(Process-1903, started)>, <Process(Process-1904, started)>, <Process(Process-1905, started)>]
    0
    [<Process(Process-1899, started)>, <Process(Process-1900, started)>, <Process(Process-1901, started)>, <Process(Process-1902, started)>, <Process(Process-1903, started)>, <Process(Process-1904, started)>, <Process(Process-1905, started)>]
    0
    finished 0001368007_10-K_2009-02-27.txt 0:00:03.055049
    finished 0001368007_10-K_2012-02-29.txt 0:00:03.023879
    finished 0001368007_10-K_2012-02-29.txt 0:00:03.055496
    finished 0001368007_10-K_2010-03-01.txt 0:00:03.096127
    finished 0001368007_10-K_2011-02-28.txt 0:00:03.099099
    [<Process(Process-1899, started)>, <Process(Process-1900, started)>]
    5
    finished 0001368007_10-K_2008-03-25.txt 0:00:04.473414
    [<Process(Process-1899, started)>]
    6
    [<Process(Process-1899, started)>]
    6
    (the two lines above repeat many more times)
    Process Process-1899:
      File "/home/jovyan/.conda/envs/pycp_py3k/lib/python3.6/site-packages/bert_serving/client/__init__.py", line 206, in arg_wrapper
        return func(self, *args, **kwargs)
    [<Process(Process-1899, started)>]
    6
    Traceback (most recent call last):
      File "/home/jovyan/.conda/envs/pycp_py3k/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/jovyan/.conda/envs/pycp_py3k/lib/python3.6/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "<ipython-input-315-ffe782d1c2f5>", line 12, in process
        embedding1=bc.encode(file111)
      File "/home/jovyan/.conda/envs/pycp_py3k/lib/python3.6/site-packages/bert_serving/client/__init__.py", line 291, in encode
        r = self._recv_ndarray(req_id)
So the process hangs on the file 0001368007_10-K_2007-03-22.txt when the full list of files is given as input.
If I run it with only that file as input, it finishes!
It also finishes if the number of files is kept to 5 or fewer.
Even for some other lists with more than 7 files, for example 10 or 12, the process finishes.
I have not been able to debug this.
Another symptom I noticed:
- if I rerun the code after a while, it finishes!
Help appreciated!