If you want to group the lines into sections, you can use itertools.groupby, with the empty lines acting as delimiters:
    from itertools import groupby

    with open("in.txt") as f:
        for k, sec in groupby(f, key=lambda x: bool(x.strip())):
            if k:
                print(list(sec))
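To see what the key function does without a file on disk, here is a minimal sketch of the same grouping on an in-memory list. The sample lines and the name `paragraphs` are made up for illustration; they are not the contents of in.txt.

```python
from itertools import groupby

# Hypothetical sample lines standing in for the contents of a file.
lines = ["TITLE\n", "\n", "\n", "First paragraph.\n", "\n", "Second paragraph.\n"]

# Consecutive lines land in the same group while the key stays the same:
# True for lines containing text, False for blank lines.
paragraphs = [
    "".join(group).strip()
    for has_text, group in groupby(lines, key=lambda x: bool(x.strip()))
    if has_text
]
print(paragraphs)
```

Each run of non-blank lines becomes one entry, so a paragraph that spans several lines would be joined into a single string.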
With a bit more itertools-fu, we can get the sections using the uppercase title as the delimiter:
    from itertools import groupby, takewhile

    with open("in.txt") as f:
        grps = groupby(f, key=lambda x: x.isupper())
        for k, sec in grps:
            # if we hit a title line
            if k:
                # pull all paragraphs
                v = next(grps)[1]
                # skip two empty lines after title
                next(v, ""), next(v, "")
                # take all lines up to next empty line/second paragraph
                print(list(takewhile(lambda x: bool(x.strip()), v)))
Which will give you:
    ['There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.\n']
    ['What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.']
Each section starts with an all-uppercase title, so once we hit one we know that two empty lines follow, then the first paragraph, and then the pattern repeats.
To break it down using explicit loops:
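    from itertools import groupby

    def parse_sec(bk):
        with open(bk) as f:
            # group consecutive lines by whether they are all uppercase (titles)
            grps = groupby(f, key=lambda x: bool(x.isupper()))
            for k, sec in grps:
                if k:
                    print("First paragraph from section titled :{}".format(next(sec).rstrip()))
                    # the next group holds everything up to the following title
                    v = next(grps)[1]
                    # skip two empty lines after title
                    next(v, ""), next(v, "")
                    # print lines until the empty line that ends the first paragraph
                    for line in v:
                        if not line.strip():
                            break
                        print(line)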
For your text:
    In [11]: cat -E in.txt
    THE LAY OF THE LAND$
    $
    $
    There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.$
    $
    Of all the kinds of interest attaching to the study of the world wild animals, there are none that surpass the study of their minds, their morals, and the acts that they perform as the results of their mental processes.$
    $
    $
    WILD ANIMAL TEMPERAMENT & INDIVIDUALITY$
    $
    $
    What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.
The dollar signs mark the line endings. Output:
    In [12]: parse_sec("in.txt")
    First paragraph from section titled :THE LAY OF THE LAND
    There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.

    First paragraph from section titled :WILD ANIMAL TEMPERAMENT & INDIVIDUALITY
    What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.
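If you would rather collect the results than print them, the same idea can be sketched as a generator. This is only one possible variant, under a few assumptions: the name `first_paragraphs` and the sample lines are hypothetical, and it uses `dropwhile` to skip any number of blank lines after a title instead of exactly two.

```python
from itertools import dropwhile, groupby, takewhile


def first_paragraphs(lines):
    """Yield (title, first paragraph) pairs from an iterable of lines."""
    grps = groupby(lines, key=lambda x: x.isupper())
    for is_title, sec in grps:
        if not is_title:
            continue
        title = next(sec).rstrip()
        # The group after a title holds everything up to the next title;
        # fall back to an empty group if the title is the last line.
        _, body = next(grps, (False, iter(())))
        # Assumption: skip however many blank lines follow the title.
        body = dropwhile(lambda x: not x.strip(), body)
        # Take lines up to the blank line that ends the first paragraph.
        paragraph = "".join(takewhile(lambda x: x.strip(), body)).strip()
        yield title, paragraph


# Hypothetical sample lines, laid out like in.txt above.
sample = [
    "THE LAY OF THE LAND\n", "\n", "\n",
    "First para.\n", "\n", "Second para.\n", "\n", "\n",
    "WILD ANIMAL TEMPERAMENT & INDIVIDUALITY\n", "\n", "\n",
    "Another first para.",
]
result = list(first_paragraphs(sample))
print(result)
```

Because it accepts any iterable of lines, you can pass an open file object directly, e.g. `first_paragraphs(f)` inside a `with open("in.txt") as f:` block.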