How to remove English text from an Arabic string in Python?

I have a line of Arabic text that also contains English text and punctuation. I need to keep only the Arabic text, so I tried to remove the punctuation and the English words using the string module. However, the spaces between the Arabic words were removed as well. Where am I going wrong?

 import string

 exclude = set(string.punctuation)
 main_text = "وزارة الداخلية: لا تتوفر لدينا معلومات رسمية عن سعوديين موقوفين في ليبيا http://alriyadh.com/1031499"

 # Remove ASCII punctuation; main_text is then:
 # "وزارة الداخلية لا تتوفر لدينا معلومات رسمية عن سعوديين موقوفين في ليبيا httpalriyadhcom1031499"
 main_text = ''.join(ch for ch in main_text if ch not in exclude)

 # Remove the remaining English letters and digits
 n = ''.join(filter(lambda x: x not in string.printable, main_text))
 print(n)

Output:

 وزارةالداخليةلاتتوفرلدينامعلوماترسميةعنسعوديينموقوفينفيليبيا

I can remove the punctuation and the English text, but the spaces between the words are lost too. How can I keep the words separated?

2 answers

You can keep the spaces in your string with

 # ''.join(...) turns the filter result back into a string (needed in Python 3)
 n = ''.join(filter(lambda x: True if x == ' ' else x not in string.printable, main_text))

or

 n = ''.join(filter(lambda x: x == ' ' or x not in string.printable, main_text))

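This keeps a character if it is a space; otherwise, it keeps it only if it is not a printable ASCII character, so Arabic letters stay while English letters, digits, and punctuation are dropped.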
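The spaces vanish in the original code because the space character is itself part of string.printable (' ' in string.printable is True), so the second filter removes it together with the English text. A minimal, self-contained sketch of the fix, assuming Python 3 and the example string from the question; the helper name strip_ascii_keep_spaces is hypothetical, and the final ' '.join(...split()) tidy-up, which collapses the extra space left where the URL used to be, is an addition that is not part of the answer above:

```python
import string

def strip_ascii_keep_spaces(text):
    """Hypothetical helper: drop printable ASCII characters except the space."""
    # string.punctuation is a subset of string.printable, so this single pass
    # removes ASCII punctuation, letters and digits at once.
    kept = ''.join(filter(lambda x: x == ' ' or x not in string.printable, text))
    # Optional tidy-up (not in the answer above): collapse runs of spaces
    # and trim both ends, e.g. the space left before the removed URL.
    return ' '.join(kept.split())

main_text = "وزارة الداخلية: لا تتوفر لدينا معلومات رسمية عن سعوديين موقوفين في ليبيا http://alriyadh.com/1031499"
print(strip_ascii_keep_spaces(main_text))
```

A separate punctuation pass is not needed here, since every character in string.punctuation is also in string.printable.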


You can keep it from deleting any whitespace (spaces, tabs, newlines) like this:

 n = ''.join(filter(lambda x: x in string.whitespace or x not in string.printable, main_text))
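string.whitespace contains the space, tab, newline, carriage return, vertical tab and form feed characters, so this variant also preserves line breaks in multi-line text. A small sketch, assuming Python 3; the helper name keep_non_ascii_and_whitespace and the two-line sample string are made up for illustration:

```python
import string

def keep_non_ascii_and_whitespace(text):
    """Hypothetical helper: remove printable ASCII except whitespace."""
    # Keep ASCII whitespace (' ', '\t', '\n', '\r', '\x0b', '\x0c') and every
    # character outside string.printable, which includes all Arabic letters.
    return ''.join(ch for ch in text if ch in string.whitespace or ch not in string.printable)

sample = "سطر أول\nline two سطر ثان"
print(keep_non_ascii_and_whitespace(sample).splitlines())
```

The newline survives, so the result still has two lines; the spaces that surrounded the removed English words are kept as well and can be collapsed per line if needed.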

Source: https://habr.com/ru/post/1216665/

