Print the first paragraph of each section in Python

I have a book in a text file, and I need to print the first paragraph of each section. I thought that if I found the text between \n\n and \n, I would find my answer. Here is my code, but it didn't work. Can you tell me where I am going wrong?

    lines = [line.rstrip('\n') for line in open('G:\\aa.txt')]
    check = -1
    first = 0
    last = 0
    for i in range(len(lines)):
        if lines[i] == "":
            if lines[i+1] == "":
                check = 1
                first = i + 2
        if i+2 < len(lines):
            if lines[i+2] == "" and check == 1:
                last = i + 2
    while (first < last):
        print(lines[first])
        first = first + 1

I also found the following code on Stack Overflow and tried it, but it just printed an empty array.

    f = open("G:\\aa.txt").readlines()
    flag = False
    for line in f:
        if line.startswith('\n\n'):
            flag = False
        if flag:
            print(line)
        elif line.strip().endswith('\n'):
            flag = True

Here is a sample of the book:

I

THE LAY OF THE LAND

There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.

Of all the kinds of interest attaching to the study of the world wild animals, there are none that surpass the study of their minds, their morals, and the acts that they perform as the results of their mental processes.

II

WILD ANIMAL TEMPERAMENT & INDIVIDUALITY

What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.

The output should look like this:

There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.

What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.
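The uppercase-index idea from the sample can be sketched as shown below. The idea is to collect the positions of the uppercase lines and then use those positions to locate each section's first paragraph. This is a minimal sketch, not the asker's code. The helper name `first_paragraphs_by_index` is hypothetical. It uses `enumerate()` instead of `list.index()`, because `list.index()` only returns the first occurrence of a value. It also assumes that section numbers and titles are single all-uppercase lines and that each paragraph is a single line.

```python
def first_paragraphs_by_index(lines):
    """Return the first paragraph of each section (hypothetical helper).

    Assumes section numbers ("I", "II") and titles are single lines for
    which str.isupper() is True, and that paragraphs are single lines
    separated by empty lines.
    """
    lines = [line.rstrip("\n") for line in lines]
    # enumerate() yields every matching index; list.index() would only
    # find the first occurrence of a given value.
    heading_idx = [i for i, line in enumerate(lines) if line.isupper()]
    headings = set(heading_idx)
    result = []
    for i in heading_idx:
        # Skip the empty lines that follow the heading.
        j = i + 1
        while j < len(lines) and not lines[j].strip():
            j += 1
        # After a number such as "I", the next non-empty line is the title,
        # which is itself a heading and gets its own turn in the loop.
        if j < len(lines) and j not in headings:
            result.append(lines[j])
    return result


sample = [
    "I\n", "\n", "THE LAY OF THE LAND\n", "\n",
    "First paragraph.\n", "\n", "Second paragraph.\n", "\n",
    "II\n", "\n", "WILD ANIMAL TEMPERAMENT & INDIVIDUALITY\n", "\n",
    "Another first paragraph.\n",
]
print(first_paragraphs_by_index(sample))
```

To read the real book, pass it an open file object instead of the sample list, for example `first_paragraphs_by_index(open('G:\\aa.txt'))`.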

5 answers

If you want to group the text into blocks, you can use itertools.groupby, with empty lines as separators:

    from itertools import groupby

    with open("in.txt") as f:
        for k, sec in groupby(f, key=lambda x: bool(x.strip())):
            if k:
                print(list(sec))
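This sketch shows what that `groupby` call produces, without needing `in.txt`. It feeds the call an in-memory `io.StringIO` holding made-up text, and the sample content and the name `blocks` are illustrative only. Each run of non-blank lines becomes one list, so a paragraph that spans several lines stays together:

```python
from io import StringIO
from itertools import groupby

# Made-up stand-in for in.txt: a title, a two-line paragraph, another
# paragraph and a second title, separated by empty lines.
text = StringIO(
    "TITLE ONE\n\n\n"
    "Para A line 1.\nPara A line 2.\n\n"
    "Para B.\n\n\n"
    "TITLE TWO\n\n\n"
    "Para C.\n"
)

# The key is True for non-blank lines and False for blank ones. groupby
# yields one (key, group) pair per run of equal keys; keep the non-blank runs.
blocks = [list(sec) for k, sec in groupby(text, key=lambda x: bool(x.strip())) if k]
for block in blocks:
    print(block)
```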

With a bit more itertools-fu, we can get the sections by using the uppercase headings as separators:

    from itertools import groupby, takewhile

    with open("in.txt") as f:
        grps = groupby(f, key=lambda x: x.isupper())
        for k, sec in grps:
            # if we hit a title line
            if k:
                # pull all paragraphs
                v = next(grps)[1]
                # skip two empty lines after title
                next(v, ""), next(v, "")
                # take all lines up to next empty line/second paragraph
                print(list(takewhile(lambda x: bool(x.strip()), v)))

Which gives you:

    ['There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.\n']
    ['What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.']

Each section starts with an all-uppercase heading. As soon as we hit one, we know that two empty lines follow, then the first paragraph, and then the pattern repeats.
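The same `groupby`/`takewhile` logic can be wrapped so that it returns the paragraphs instead of printing them, which makes it easy to check against an in-memory string. This is a minimal sketch. The helper name `first_paragraphs_groupby` is hypothetical, and like the answer it assumes exactly two empty lines between each title and its first paragraph:

```python
from io import StringIO
from itertools import groupby, takewhile


def first_paragraphs_groupby(lines):
    """Return the first paragraph after each all-uppercase title line.

    Hypothetical helper. Assumes each title is followed by exactly two
    empty lines and then the first paragraph.
    """
    result = []
    grps = groupby(lines, key=lambda x: x.isupper())
    for is_title, _group in grps:
        if not is_title:
            continue
        # The next group holds everything up to the following title;
        # the default covers a title on the very last line.
        _, body = next(grps, (False, iter(())))
        next(body, ""), next(body, "")  # skip the two empty lines
        # Collect lines until the next empty line and join them.
        para = "".join(takewhile(lambda x: bool(x.strip()), body)).rstrip("\n")
        if para:
            result.append(para)
    return result


sample = StringIO(
    "THE LAY OF THE LAND\n\n\nPara one.\n\nPara two.\n\n\n"
    "WILD ANIMAL TEMPERAMENT & INDIVIDUALITY\n\n\nPara three."
)
print(first_paragraphs_groupby(sample))
```

The inner group is consumed before the outer iterator advances again. This ordering is required, because a `groupby` sub-iterator becomes unusable once the outer iterator moves on.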

To break it down using plain loops:

    from itertools import groupby

    def parse_sec(bk):
        with open(bk) as f:
            # True for all-uppercase title lines, False for everything else
            grps = groupby(f, key=lambda x: x.isupper())
            for k, sec in grps:
                if k:
                    print("First paragraph from section titled :{}".format(next(sec).rstrip()))
                    # the next group holds the body of the section
                    v = next(grps)[1]
                    # skip the two empty lines after the title
                    next(v, ""), next(v, "")
                    # print lines until the next empty line
                    for line in v:
                        if not line.strip():
                            break
                        print(line.rstrip())

For your text:

    In [11]: cat -E in.txt
    THE LAY OF THE LAND$
    $
    $
    There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.$
    $
    Of all the kinds of interest attaching to the study of the world wild animals, there are none that surpass the study of their minds, their morals, and the acts that they perform as the results of their mental processes.$
    $
    $
    WILD ANIMAL TEMPERAMENT & INDIVIDUALITY$
    $
    $
    What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.

The dollar signs mark the line endings. Output:

    In [12]: parse_sec("in.txt")
    First paragraph from section titled :THE LAY OF THE LAND
    There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.
    First paragraph from section titled :WILD ANIMAL TEMPERAMENT & INDIVIDUALITY
    What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.

There is always a regular expression...

    import re

    with open("in.txt", "r") as fi:
        data = fi.read()

    paras = re.findall(r"""
        ^[IVXLCDM]+\n\n   # Line of Roman numeral characters
        [^a-z]+\n\n       # Line without lower case characters
        (.*)              # First paragraph line (. does not match a newline)
    """, data, re.VERBOSE | re.MULTILINE)

    print("\n\n".join(paras))
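This self-contained sketch tries the pattern without a file. It runs on a made-up, shortened copy of the question's layout (numeral, empty line, uppercase title, empty line, paragraphs). It makes three assumptions about the intended pattern:

- The character class is meant to be `[^a-z]`. A class written as `[^az]` would exclude only the letters a and z.
- The numeral is anchored to the start of a line with `^` under `re.MULTILINE`.
- The paragraph is captured with `(.*)`. Because `.` stops at a newline, this still works when the file does not end with one.

```python
import re

# Made-up, shortened stand-in for the book in the question's layout.
data = (
    "I\n\nTHE LAY OF THE LAND\n\n"
    "First paragraph of section I.\n\n"
    "Second paragraph of section I.\n\n"
    "II\n\nWILD ANIMAL TEMPERAMENT & INDIVIDUALITY\n\n"
    "First paragraph of section II.\n"
)

PATTERN = re.compile(r"""
    ^[IVXLCDM]+\n\n   # a line holding only a Roman numeral
    [^a-z]+\n\n       # a title with no lowercase letters
    (.*)              # the first paragraph line; . stops at the newline
""", re.VERBOSE | re.MULTILINE)

paras = PATTERN.findall(data)
print("\n\n".join(paras))
```

In verbose mode, whitespace and `#` comments in the pattern are ignored, except inside a character class. This lets each piece of the pattern carry its own comment.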

Step through the code you found, line by line:

    f = open("G:\\aa.txt").readlines()
    flag = False
    for line in f:
        if line.startswith('\n\n'):
            flag = True
        if flag:
            print(line)
        elif line.strip().endswith('\n'):
            flag = True

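It never sets the flag variable to True. readlines() returns the file one line at a time, so no line can start with '\n\n'. Also, line.strip() removes the trailing newline, so line.strip().endswith('\n') is always False.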
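A quick way to confirm this is to run both conditions against `readlines()` output. This sketch uses an in-memory `io.StringIO` with made-up text in place of `G:\\aa.txt`:

```python
from io import StringIO

# Made-up three-line text standing in for the book file.
lines = StringIO("TITLE\n\nFirst paragraph.\n").readlines()
print(lines)

# readlines() splits after every "\n", so no element can start with "\n\n",
# and strip() removes the trailing "\n" before endswith() is checked.
print([line.startswith("\n\n") for line in lines])
print([line.strip().endswith("\n") for line in lines])
```

Both comprehensions print only `False` values, so neither branch that sets `flag` can ever run.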

Also, if you can share some samples from your book, it will help everyone.


This should work as long as no paragraph is written entirely in capital letters:

    with open('file.txt') as f:
        for line in f:
            line = line.strip()
            if line:
                for c in line:
                    if c < 'A' or c > 'Z':  # found a non-uppercase character
                        break
                else:
                    # no break: the line is all caps, i.e. I, II, etc., meaning a new section
                    f.readline()  # discard the empty lines and the chapter title
                    f.readline()
                    f.readline()
                    print(f.readline().rstrip())  # print the first paragraph

If you also want the last paragraph, keep track of the most recent line that contains lowercase letters. As soon as you reach an all-uppercase line (I, II, etc.) marking a new section, print that saved line, since it is the last paragraph of the previous section.
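Here is a minimal sketch of that idea. It assumes, as in the question's sample, that section markers are lines made only of the letters A to Z (such as I or II) and that every line containing a lowercase letter is a paragraph. The helper name `first_and_last_paragraphs` is hypothetical. Instead of printing, it returns one (first, last) pair per section:

```python
def first_and_last_paragraphs(lines):
    """Return a (first, last) paragraph pair for each section.

    Hypothetical helper. Assumes section markers are lines made only of
    the letters A-Z (such as "I" or "II") and that any line containing a
    lowercase letter is a paragraph. Titles such as "THE LAY OF THE LAND"
    contain spaces and no lowercase letters, so they are skipped.
    """
    sections = []  # one [first, last] pair per section
    for raw in lines:
        line = raw.strip()
        if line and all("A" <= c <= "Z" for c in line):
            # New section marker: the previous section's "last" entry
            # already holds its final paragraph.
            sections.append([None, None])
        elif sections and any(c.islower() for c in line):
            current = sections[-1]
            if current[0] is None:
                current[0] = line  # first paragraph of this section
            current[1] = line      # keep overwriting with the latest one
    return [tuple(pair) for pair in sections]


sample = [
    "I\n", "\n", "THE LAY OF THE LAND\n", "\n",
    "Para one.\n", "\n", "Para two.\n", "\n",
    "II\n", "\n", "WILD ANIMAL TEMPERAMENT & INDIVIDUALITY\n", "\n",
    "Para three.\n",
]
print(first_and_last_paragraphs(sample))
```

As with the answer's code, a title consisting of a single all-caps word with no spaces would also be taken for a section marker.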


TXR solution

  $ txr firstpar.txr data
 There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored.  It is the Field of Animal Intelligence.
 What I am trying to do here is, find the uppercase lines, and put them all in an array.  Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.

Code in firstpar.txr:

    @(repeat)
    @num

    @title

    @firstpar
    @(require (and (< (length num) 5)
                   [some title chr-isupper]
                   (not [some title chr-islower])))
    @(do (put-line firstpar))
    @(end)

Basically, we are matching the input against a three-element multi-line pattern that binds the variables num, title and firstpar. On its own, this pattern may match in the wrong places, so we add some constraining heuristics with the require directive. The section number must be a short string, and the title line must contain uppercase letters and no lowercase letters. This expression is written in TXR Lisp.

If a match satisfies these constraints, we print the line captured in the firstpar variable.
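For readers who do not have TXR installed, here is a rough Python approximation of this particular query. It is not a translation of TXR's general matching semantics, and the function name `first_paragraphs_txr_style` is hypothetical. It slides over the lines looking for a num / empty / title / empty / firstpar window and applies the same three checks as the `@(require ...)` form:

```python
def first_paragraphs_txr_style(text):
    """Rough Python analogue of the firstpar.txr query (hypothetical helper).

    Looks for five-line windows of num, empty line, title, empty line,
    firstpar, and keeps firstpar when num is shorter than 5 characters and
    title has some uppercase letter but no lowercase letter.
    """
    lines = text.splitlines()
    found = []
    i = 0
    while i + 4 < len(lines):
        num, gap1, title, gap2, firstpar = lines[i:i + 5]
        if (gap1 == "" and gap2 == ""
                and len(num) < 5                          # (< (length num) 5)
                and any(c.isupper() for c in title)       # [some title chr-isupper]
                and not any(c.islower() for c in title)): # (not [some title chr-islower])
            found.append(firstpar)
            i += 5  # resume after the matched lines
        else:
            i += 1  # no match here: try the next line
    return found


text = (
    "I\n\nTHE LAY OF THE LAND\n\nPara one.\n\nPara two.\n\n"
    "II\n\nWILD ANIMAL TEMPERAMENT & INDIVIDUALITY\n\nPara three.\n"
)
for par in first_paragraphs_txr_style(text):
    print(par)
```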

