Use regex to extract dates from file paths and save them in Python

I have a text file, file.txt, that contains many file paths:

    C:\data\AS\WO\AS_WOP_1PPPPPP20070506.bin
    C:\data\AS\WO\AS_WOP_1PPPPPP20070606.bin
    C:\data\AS\WO\AS_WOP_1PPPPPP20070708.bin
    C:\data\AS\WO\AS_WOP_1PPPPPP20070808.bin
    ...

Here is what I tried with a regex to extract the date from each path:

    import re
    textfile = open('file.txt', 'r')
    filetext = textfile.read()
    textfile.close()
    data = []
    for line in filetext:
        matches = re.search("AS_[AZ]{3}_(.{7})([0-9]{4})([0-9]{2})([0-9]{2})", line)
        data.append(line)

It does not give what I want.

My output should look like this:

    year  month
    2007  05
    2007  06
    2007  07
    2007  08

and then save it as a list of lists:

 [['2007', '5'], ['2007', '6'], ['2007', '7'], ['2007', '8']] 

or save it as a pandas DataFrame.

Is there any way to get what I want with regex?

2 answers

Try doing this with pandas:

    import pandas as pd

    # one path per line, no header row
    df = pd.read_csv('yourfile.txt', header=None)
    df.columns = ['paths']
    # the pandas string method extract takes a regex
    df['paths'].str.extract(r'(\d{4})(\d{2})')

Output:

          0   1
    0  2007  05
    1  2007  06
    2  2007  07
    3  2007  08
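
To get the year and month column names and the list of lists the question asks for, something like the following might work. It is only a sketch: the sample paths are copied from the question, the named groups year and month and the variable names are my own choices, and the pattern is anchored to the .bin suffix as an extra assumption about the file names.

```python
import pandas as pd

# Sample paths copied from the question; in practice they would come from the text file.
paths = pd.Series([
    r"C:\data\AS\WO\AS_WOP_1PPPPPP20070506.bin",
    r"C:\data\AS\WO\AS_WOP_1PPPPPP20070606.bin",
    r"C:\data\AS\WO\AS_WOP_1PPPPPP20070708.bin",
    r"C:\data\AS\WO\AS_WOP_1PPPPPP20070808.bin",
])

# Named groups become the column names of the resulting DataFrame.
# Assumption: every name ends in YYYYMMDD followed by ".bin".
dates = paths.str.extract(r"(?P<year>\d{4})(?P<month>\d{2})\d{2}\.bin$")

# Plain list of lists, with the leading zero stripped from the month.
pairs = [[year, str(int(month))] for year, month in dates.itertuples(index=False)]
```

Here dates is a DataFrame with string columns year and month, and pairs holds the same values as nested lists.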

You can simplify your regex:

 /(....)(..)..\.bin$/ 

Group 1 captures the year and group 2 captures the month. I assume every file name follows this format.

Here . matches any single character, \. matches a literal dot, and $ anchors the match at the end of the line. So the pattern matches .bin at the end of the line, skips the two day characters, and captures only the year and month.
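
In Python the surrounding slashes are dropped and the pattern goes into the re module. The sketch below is one way to build the list of lists from the question. It swaps . for \d so that only digits are accepted, which is stricter than the pattern above, and the function name extract_year_month is hypothetical.

```python
import re

# Year (4 digits), month (2 digits), then a 2-digit day and the ".bin" suffix.
DATE_RE = re.compile(r"(\d{4})(\d{2})\d{2}\.bin$")

def extract_year_month(lines):
    """Return [year, month] pairs for each line ending in a dated .bin name."""
    result = []
    for line in lines:
        # strip() removes the trailing newline so that $ lines up with ".bin"
        match = DATE_RE.search(line.strip())
        if match:
            year, month = match.groups()
            # str(int(...)) drops the leading zero, e.g. "05" becomes "5"
            result.append([year, str(int(month))])
    return result

# Sample paths copied from the question.
sample = [
    r"C:\data\AS\WO\AS_WOP_1PPPPPP20070506.bin",
    r"C:\data\AS\WO\AS_WOP_1PPPPPP20070606.bin",
    r"C:\data\AS\WO\AS_WOP_1PPPPPP20070708.bin",
    r"C:\data\AS\WO\AS_WOP_1PPPPPP20070808.bin",
]
print(extract_year_month(sample))
```

To run it on the real file, pass the open file object instead of the sample list, for example with open('file.txt') as fh: data = extract_year_month(fh). Iterating over the file object yields one line at a time, whereas iterating over the string returned by read() yields single characters.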
