Splitting a large text file with a delimiter in Python

I imagine this will be a simple task, but I cannot find exactly what I am looking for in previous StackOverflow questions, so here goes ...

I have large text files in a proprietary format that look like this:

    :Entry
    - Name
    John Doe

    - Date
    20/12/1979
    :Entry

    -Name
    Jane Doe
    - Date
    21/12/1979

And so on.

The text files range in size from 10 KB to 100 MB. I need to split each file on the delimiter :Entry. How can I process each file one :Entry block at a time?

+8
python text-parsing
2 answers

You can use itertools.groupby to group the lines that appear after :Entry into lists:

    import itertools as it

    filename = 'test.dat'

    with open(filename, 'r') as f:
        for key, group in it.groupby(f, lambda line: line.startswith(':Entry')):
            if not key:
                group = list(group)
                print(group)

gives

    ['- Name\n', 'John Doe\n', '\n', '- Date\n', '20/12/1979\n']
    ['\n', '-Name\n', 'Jane Doe\n', '- Date\n', '21/12/1979\n']

Or, if you only need to iterate over the lines of each group, there is no need to convert group to a list:

    with open(filename, 'r') as f:
        for key, group in it.groupby(f, lambda line: line.startswith(':Entry')):
            if not key:
                for line in group:
                    ...
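As a rough illustration of what the loop body might do, the sketch below turns each group into a dict. It assumes, based only on the sample in the question, that a line starting with `-` names a field and that the next non-blank line holds its value. `parse_entry` and `iter_entries` are hypothetical names made up for this example, and an in-memory `io.StringIO` stands in for the real file.

```python
import io
import itertools as it


def parse_entry(lines):
    """Turn the lines of one entry into a dict (hypothetical helper).

    Assumes a line starting with '-' names a field and the next
    non-blank line holds its value, as in the sample from the question.
    """
    record = {}
    field = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines inside an entry
        if line.startswith('-'):
            # Handles both '- Name' and '-Name'.
            field = line.lstrip('-').strip()
        elif field is not None:
            record[field] = line
            field = None
    return record


def iter_entries(f):
    """Yield one dict per :Entry block, reading the file lazily."""
    groups = it.groupby(f, lambda line: line.startswith(':Entry'))
    for is_marker, group in groups:
        if not is_marker:
            yield parse_entry(group)


# In-memory stand-in for a real file, so the sketch runs on its own.
sample = io.StringIO(
    ":Entry\n- Name\nJohn Doe\n\n- Date\n20/12/1979\n"
    ":Entry\n\n-Name\nJane Doe\n- Date\n21/12/1979\n"
)
for record in iter_entries(sample):
    print(record)
```

Because `groupby` pulls lines from the file object one at a time, only the current entry is held in memory, which should matter little at 10 KB but helps toward the 100 MB end. A value that itself begins with `-` would be misread as a field name under this assumption.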
+13

If every entry starts with a colon, you can simply split on it:

    with open('entries.txt') as fp:
        contents = fp.read()

    for entry in contents.split(':'):
        # do something with entry
        pass
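One caveat with splitting on a bare colon: it also cuts at any colon inside a value, and the first piece is empty because the text begins with the delimiter. A slightly more defensive sketch follows, assuming the marker is always the literal `:Entry` at the start of a line and that the whole file fits in memory. `split_entries` is a hypothetical name, and the sample text is the data from the question.

```python
import re


def split_entries(text, marker=':Entry'):
    """Split text into entry bodies on lines that start with the marker."""
    # With re.MULTILINE, ^ anchors the marker to the start of a line, so a
    # colon (or the marker text) inside a value does not cause a split.
    # The rest of the marker line, if any, is consumed along with it.
    pattern = r'^' + re.escape(marker) + r'.*\n?'
    chunks = re.split(pattern, text, flags=re.MULTILINE)
    # The piece before the first marker is empty; drop it and blank pieces.
    return [chunk.strip() for chunk in chunks if chunk.strip()]


text = (
    ":Entry\n- Name\nJohn Doe\n\n- Date\n20/12/1979\n"
    ":Entry\n\n-Name\nJane Doe\n- Date\n21/12/1979\n"
)
for entry in split_entries(text):
    print(repr(entry))
```

Reading a 100 MB file in one go is usually workable, but unlike the `groupby` approach this holds the file and all of its pieces in memory at once.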
+3