I have a project where I am given a file and need to extract the strings from it. Basically, think of the "strings" command in Linux, but done in Python. The other condition is that the file is given to me as a stream (for example, a string), so the obvious answer of using one of the subprocess functions to run strings is not an option either.
I wrote this code:
def isStringChar(ch):
    # Letters, digits, and a small set of punctuation count as string characters.
    if 'a' <= ch <= 'z': return True
    if 'A' <= ch <= 'Z': return True
    if '0' <= ch <= '9': return True
    if ch in "/-:.,_$%'()[]<> ": return True
    return False

def process(stream):
    # stream is a str; returns the runs of string characters longer than 4.
    if len(stream) < 4: return None
    results = []
    strString = ''
    for ch in stream:
        if isStringChar(ch):
            strString += ch
        else:
            if len(strString) > 4:
                results.append(strString)
            # Reset on every non-string character, even after a short run.
            strString = ''
    # Flush a run that reaches the end of the stream.
    if len(strString) > 4:
        results.append(strString)
    return results
This technically works, but it is WAY too slow. For example, I ran the strings command on a 500 MB executable and it produced 300,000 lines in less than 1 second. I ran the same file through the code above, and it took 16 minutes.
Is there a library out there that will let me do this without the overhead of pure Python?
Thanks!
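One direction that may be worth trying is a precompiled regular expression from the standard-library re module, since the scanning then runs inside the regex engine rather than in a per-character Python loop. The sketch below is only an illustration under some assumptions: the data is already in memory as a bytes object, the character set is copied from isStringChar, and the minimum run length of 5 mirrors the `len(strString) > 4` check. The names STRING_RUN and extract_strings are made up for this example.

```python
import re

# Same character set as isStringChar: letters, digits, the listed
# punctuation, and space. {5,} keeps runs longer than 4 characters,
# matching the len(strString) > 4 check (assumption: that threshold is intended).
STRING_RUN = re.compile(rb"[A-Za-z0-9/\-:.,_$%'()\[\]<> ]{5,}")

def extract_strings(data):
    """Return every qualifying run found in `data` (a bytes object) as str."""
    # findall returns the matched byte runs; all allowed bytes are ASCII,
    # so decoding as ASCII cannot fail.
    return [run.decode("ascii") for run in STRING_RUN.findall(data)]

if __name__ == "__main__":
    sample = b"\x00\x01hello world\x00ab\x00/usr/bin\xff"
    for s in extract_strings(sample):
        print(s)
```

If the result list itself becomes too large, `STRING_RUN.finditer(data)` yields matches one at a time instead of building the whole list up front.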