Using ^ to match the beginning of a string in a Python regex

I am trying to extract the publication year from Thomson Reuters ISI-style records. The "Year of publication" line looks like this (the PY tag is at the very beginning of the line):

PY 2015 

For the script I am writing, I defined the following function:

    import re

    # Read the whole export file into one string
    with open('savedrecs.txt') as f:
        wosrecords = f.read()

    def findyears():
        # Capture the four digits that follow "PY "
        result = re.findall(r'PY (\d\d\d\d)', wosrecords)
        print(result)

    findyears()

This, however, gives false positive results, since the pattern may appear elsewhere in the data.

So I only want to match the pattern at the beginning of a line. Normally I would use ^ for this purpose, but r'^PY (\d\d\d\d)' did not return the matches I expected. Prefixing the pattern with \n, on the other hand, seems to do what I want, but it could lead to further complications for me.

3 answers
 re.findall(r'^PY (\d\d\d\d)', wosrecords, flags=re.MULTILINE) 

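This should work; let me know if it does not, as I don't have your data to test against. By default ^ matches only at the very start of the string, while with re.MULTILINE it also matches at the start of every line.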
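As a quick check of the flag's effect, here is a minimal sketch. The `sample` string is made up for illustration and only stands in for the contents of savedrecs.txt; it is not real ISI data.

```python
import re

# Hypothetical sample: "PY 1999" appears mid-line, "PY 2015" starts a line.
sample = "TI Some title mentioning PY 1999\nPY 2015\nER\n"

# Default mode: ^ anchors only at the start of the whole string.
without_flag = re.findall(r'^PY (\d\d\d\d)', sample)

# MULTILINE mode: ^ also anchors right after each newline.
with_flag = re.findall(r'^PY (\d\d\d\d)', sample, flags=re.MULTILINE)

print(without_flag)  # []
print(with_flag)     # ['2015']
```

The mid-line "PY 1999" is ignored in both cases, which is the false positive the question wants to avoid.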


Use re.findall with re.M (short for re.MULTILINE):

    import re

    # re.M makes ^ match at the start of every line
    p = re.compile(r'^PY\s+(\d{4})', re.M)
    test_str = "PY123\nPY 2015\nPY 2017"
    print(p.findall(test_str))  # ['2015', '2017']

See the IDEONE demo

EXPLANATION

  • ^ - start of line (due to re.M )
  • PY - Literal PY
  • \s+ - 1 or more whitespace characters
  • (\d{4}) - capture group containing 4 digits
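One caveat with \s+ is that \s also matches a newline, so the tag and the year may end up being taken from two different lines. The sketch below uses a made-up `text` value to show this, together with a stricter variant, `[ \t]+`, which is my own suggestion and not part of the answer above.

```python
import re

# Hypothetical input: the first "PY" tag has its value on the next line.
text = "PY\n2015\nPY 2017\n"

loose = re.compile(r'^PY\s+(\d{4})', re.M)      # \s+ may cross a newline
strict = re.compile(r'^PY[ \t]+(\d{4})', re.M)  # spaces and tabs only

loose_years = loose.findall(text)
strict_years = strict.findall(text)

print(loose_years)   # ['2015', '2017']
print(strict_years)  # ['2017']
```

Whether the looser behaviour matters depends on how regular the export file is; for well-formed records both patterns should give the same result.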

In this particular case there is no need for regular expressions: the search string is always "PY" and is expected at the beginning of the line, so str.find can be used instead, applied to each line in turn. find returns the index at which the substring occurs in the given string, so the return value is 0 if it is found at the very beginning (and -1 if it is not found at all), that is:

    In [12]: 'PY 2015'.find('PY')
    Out[12]: 0

    In [13]: ' PY 2015'.find('PY')
    Out[13]: 1

It may also be worth stripping the surrounding whitespace first, i.e.:

    In [14]: '  PY 2015'.find('PY')
    Out[14]: 2

    In [15]: '  PY 2015'.strip().find('PY')
    Out[15]: 0

Further, if only the year is of interest, it can be extracted using split, i.e.:

    In [16]: ' PY 2015'.strip().split()[1]
    Out[16]: '2015'
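Put together, the string-method approach could look roughly like the sketch below. The function name `find_years` and the `sample` text are hypothetical, and `str.startswith('PY ')` is used as a shorthand for checking that `find` returns 0; unlike the strip() variant above, it does not tolerate leading whitespace.

```python
def find_years(lines):
    """Collect the year from every line that begins with the 'PY ' tag."""
    years = []
    for line in lines:
        # Same idea as line.find('PY ') == 0: the tag must start the line.
        if line.startswith('PY '):
            fields = line.split()
            # Skip a bare tag with no value after it.
            if len(fields) > 1:
                years.append(fields[1])
    return years


# Hypothetical sample: the first line mentions "PY 1999" only mid-line.
sample = "TI A title that mentions PY 1999\nPY 2015\nPY 2017\n"
years = find_years(sample.splitlines())
print(years)  # ['2015', '2017']
```

Because iterating over an open file yields its lines, the file object could be passed in place of `sample.splitlines()`, which avoids reading the whole file into memory.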