Extract the text separated by a separator using regex

Question

I have a sample input file with the following columns: Id, Name, start date, end date, age, description, location

220;John;23/11/2008;22/12/2008;28;Working as a Professor in University;Hyderabad
221;Paul;30;23/11/2008;22/12/2008;He is a Software engineer at MNC;Bangalore
222;Emma;23/11/2008;22/12/200825;Working as a mechanical enginner;Chennai

It contains 30 rows of data. My requirement is to extract only the descriptions from the text file.

My output should contain

Working as a Professor in University
He is a Software engineer at MNC
Working as a mechanical enginner

I need a regular expression that extracts the description. I have tried many variants but could not find a solution. Any suggestions?

+4

regex

mahodaya Feb 19 '13 at 4:53

3 answers

/^(?:[^;]+;){3}([^;]+)/ skips three semicolon-terminated fields and captures the fourth.

Although, as pointed out in my comment, you should just split the line on semicolons and grab the 4th element. That is the whole point of a delimited file: you do not need complex pattern matching.

Example Perl implementation using your sample input:

 open(my $IN, '<', 'input.txt') or die $!;
 while (<$IN>) {
     my ($desc) = $_ =~ /^(?:[^;]+;){3}([^;]+)/;
     print "'$desc'\n";
 }
 close $IN;

gives:

 'Working as a Professor in University'
 'He is a Software engineer at MNC'
 'Working as a mechanical enginner'
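
For comparison, the split approach recommended above might look like the following. This is only a sketch, written in Python rather than Perl. The three sample rows in the question do not all have the same number of fields, so this version assumes the description is always the second-to-last field and counts from the end of the line. The function name descriptions_by_split is made up for illustration.

```python
# Sample rows copied from the question; note the inconsistent field counts.
SAMPLE = """\
220;John;23/11/2008;22/12/2008;28;Working as a Professor in University;Hyderabad
221;Paul;30;23/11/2008;22/12/2008;He is a Software engineer at MNC;Bangalore
222;Emma;23/11/2008;22/12/200825;Working as a mechanical enginner;Chennai
"""


def descriptions_by_split(text):
    """Return the second-to-last ';'-separated field of each line.

    Assumption: the description always sits just before the final
    (location) field, whatever the total number of fields is.
    """
    result = []
    for line in text.splitlines():
        fields = line.split(";")
        if len(fields) >= 2:  # skip blank lines and lines without a separator
            result.append(fields[-2])
    return result


for desc in descriptions_by_split(SAMPLE):
    print(desc)
```

If every row in the real file has a fixed layout, indexing from the front (for example fields[5] for the sixth column) would work just as well.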

+2

Lone shepherd Feb 19 '13 at 5:13

This should work

 /^[^\s]+\s+[^\s]+\s+[^\s]+\s+(.+)\s+[^\s]+$/m

or, as Lone shepherd pointed out,

 /^\S+\s+\S+\s+\S+\s+(.+)\s+\S+$/m

or with semicolons

 /^[^;]+;[^;]+;+[^;]+;+(.+);+[^;]+$/m

0

Eric Feb 19 '13 at 5:01

Anirudha · Accepted Answer · 2013-02-19T05:27:04+0000

You can use this regex

 [^;]+(?=;[^;]*$)

[^;] matches any character except ;

+ is a quantifier that matches the preceding character or group 1 or more times

* is a quantifier that matches the preceding character or group 0 or more times

$ is the end of the line

(?=pattern) is a lookahead: it checks that the pattern matches at the current position without including it in the result
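
To see the pieces working together, here is a small sketch in Python (my choice of language; the pattern itself is unchanged). It applies the regex to each line of the sample rows from the question. The lookahead requires that exactly one more ;-separated field follows up to the end of the line, so the match is the second-to-last field. The function name descriptions_by_regex is hypothetical.

```python
import re

# Sample rows copied from the question.
SAMPLE = """\
220;John;23/11/2008;22/12/2008;28;Working as a Professor in University;Hyderabad
221;Paul;30;23/11/2008;22/12/2008;He is a Software engineer at MNC;Bangalore
222;Emma;23/11/2008;22/12/200825;Working as a mechanical enginner;Chennai
"""

# A run of non-';' characters that is followed by ';', then a final
# field with no ';' in it, then the end of the line.
PATTERN = re.compile(r"[^;]+(?=;[^;]*$)")


def descriptions_by_regex(text):
    """Return the first match of PATTERN on each line that has one."""
    result = []
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            result.append(match.group(0))
    return result


for desc in descriptions_by_regex(SAMPLE):
    print(desc)
```

Matching line by line keeps [^;] from running across a line break, which it otherwise could, since a newline is also "not a semicolon".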
