Extract specific section from LaTeX file using python

Question

I have a set of LaTeX files. I would like to extract the "abstract" section from each of them:

\begin{abstract} ..... \end{abstract}

I tried the suggestion here: How to parse LaTex file

And tried:

 A = re.findall(r'\\begin{abstract}(.*?)\\end{abstract}', data)

where data contains the text of a LaTeX file. But A is just an empty list. Any help would be greatly appreciated!

python regex latex

user3745472 Aug 26 '16 at 1:30

2 answers

By default, . does not match the newline character, so the pattern cannot span several lines. However, you can pass the re.DOTALL flag to make it match newline characters as well.

Example:

 import re

 s = r"""\begin{abstract}
 this is a test of the
 linebreak capture.
 \end{abstract}"""
 pattern = r'\\begin\{abstract\}(.*?)\\end\{abstract\}'
 re.findall(pattern, s, re.DOTALL)
 # output: ['\nthis is a test of the\nlinebreak capture.\n']

James Aug 26 '16 at 1:48

John1024 · Accepted Answer · 2016-08-26T01:45:00+0000

By default, . does not match newlines, so .* cannot span multiple lines unless the re.S flag is specified:

 re.findall(r'\\begin{abstract}(.*?)\\end{abstract}', data, re.S)

Example

Consider this test file:

 \documentclass{report}
 \usepackage[margin=1in]{geometry}
 \usepackage{longtable}
 \begin{document}
 Title maybe
 \begin{abstract}
 Good stuff
 \end{abstract}
 Other stuff
 \end{document}

This gets the abstract:

 >>> import re
 >>> data = open('a.tex').read()
 >>> re.findall(r'\\begin{abstract}(.*?)\\end{abstract}', data, re.S)
 ['\nGood stuff\n']
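
Since the question mentions a set of files, the same pattern could be wrapped in a small helper. This is only a sketch: the name extract_abstract is hypothetical, and it assumes each file contains at most one abstract environment that is not commented out.

```python
import re

# Compiled once; re.S (an alias of re.DOTALL) lets '.' match newlines.
ABSTRACT_RE = re.compile(r'\\begin\{abstract\}(.*?)\\end\{abstract\}', re.S)


def extract_abstract(tex):
    """Return the text of the first abstract environment, or None if absent."""
    match = ABSTRACT_RE.search(tex)
    if match is None:
        return None
    # Drop the newlines that surround the abstract body.
    return match.group(1).strip()


# Illustrative input; in practice this would be the contents of a .tex file.
sample = "Title maybe\n\\begin{abstract}\nGood stuff\n\\end{abstract}\nOther stuff\n"
print(extract_abstract(sample))
```

Calling the helper on the text read from each file in turn would give one abstract (or None) per file.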

Documentation

From the documentation of the re module:

re.S
re.DOTALL
Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
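
To see the effect of the flag side by side, here is a minimal sketch. The variable names are illustrative, and the inline (?s) form is simply another way of switching on the same behaviour from inside the pattern.

```python
import re

data = "\\begin{abstract}\nGood stuff\n\\end{abstract}"
pattern = r'\\begin\{abstract\}(.*?)\\end\{abstract\}'

# Without the flag, '.' stops at the first newline, so nothing matches.
without_flag = re.findall(pattern, data)

# With re.DOTALL (same as re.S), '.' also matches newlines.
with_flag = re.findall(pattern, data, re.DOTALL)

# The inline flag (?s) at the start of the pattern has the same effect.
inline_flag = re.findall(r'(?s)' + pattern, data)

print(without_flag)  # []
print(with_flag)     # ['\nGood stuff\n']
print(inline_flag)   # ['\nGood stuff\n']
```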
