Extract text between separators using a regex

I have an input file like the example below, with the columns Id, Name, start date, end date, age, description, location:

    220;John;23/11/2008;22/12/2008;28;Working as a Professor in University;Hyderabad
    221;Paul;30;23/11/2008;22/12/2008;He is a Software engineer at MNC;Bangalore
    222;Emma;23/11/2008;22/12/2008;25;Working as a mechanical enginner;Chennai

The file contains 30 rows of data. I need to extract only the descriptions from it.

The output should be:

    Working as a Professor in University
    He is a Software engineer at MNC
    Working as a mechanical enginner

I need a regular expression that extracts the description. I have tried several patterns but none of them worked. Any suggestions?

3 answers

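You can use this regex, which matches the second-to-last field on each line. Since location is always the last column, that field is the description: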

 [^;]+(?=;[^;]*$) 

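[^;] matches any character except a semicolon

+ is a quantifier that matches the previous character or group one or more times

* is a quantifier that matches the previous character or group zero or more times

$ matches the end of the line

(?=pattern) is a positive lookahead: it checks that pattern matches right after the current position without consuming any characters, so ;[^;]*$ must follow the match but is not part of it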
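As a minimal Perl sketch of the lookahead pattern, the following applies it to each line of the first two sample rows. The rows are embedded as a heredoc instead of being read from a file, and the sub name descriptions is hypothetical, used only for illustration:

```perl
use strict;
use warnings;

# First two sample rows from the question, one record per line.
my $data = <<'END';
220;John;23/11/2008;22/12/2008;28;Working as a Professor in University;Hyderabad
221;Paul;30;23/11/2008;22/12/2008;He is a Software engineer at MNC;Bangalore
END

# Hypothetical helper: return the second-to-last field of each line.
sub descriptions {
    my ($text) = @_;
    my @out;
    for my $line (split /\n/, $text) {
        # [^;]+ matches a run of non-semicolons; the lookahead (?=;[^;]*$)
        # requires exactly one more ';' after it before the end of the line.
        push @out, $1 if $line =~ /([^;]+)(?=;[^;]*$)/;
    }
    return @out;
}

print "$_\n" for descriptions($data);
```

Because [^;]+ cannot cross a semicolon, the only place the lookahead can succeed is right before the last semicolon, so the match is always the whole second-to-last field.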


/^(?:[^;]+;){5}([^;]+)/ skips the first five semicolon-terminated fields and captures the sixth one, which is the description.

That said, as I pointed out in my comment, you should really just split each line on semicolons and take the 6th element. The whole point of a delimited file is that you do not need complex pattern matching.

A Perl example using your sample input:

    open(my $IN, "<", "input.txt") or die "Cannot open input.txt: $!";
    while (<$IN>) {
        # Skip five "field;" groups, then capture the sixth field.
        my ($desc) = $_ =~ /^(?:[^;]+;){5}([^;]+)/;
        print "'$desc'\n";
    }
    close $IN;

gives:

    'Working as a Professor in University'
    'He is a Software engineer at MNC'
    'Working as a mechanical enginner'
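The split-based approach recommended above might look like this in Perl. This is a minimal sketch, assuming every row has the seven columns listed in the question so that the description is element 5 (zero-based) of the split; the sub name description_of is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: return the description column of one record.
# Assumes seven ';'-separated columns with the description second-to-last,
# i.e. at zero-based index 5 of the split result.
sub description_of {
    my ($line) = @_;
    chomp $line;
    my @fields = split /;/, $line;
    return $fields[5];
}

my @rows = (
    "220;John;23/11/2008;22/12/2008;28;Working as a Professor in University;Hyderabad\n",
    "221;Paul;30;23/11/2008;22/12/2008;He is a Software engineer at MNC;Bangalore\n",
);

# Prints each description wrapped in quotes, like the regex version above.
print "'", description_of($_), "'\n" for @rows;
```

When reading the file, you would call description_of($_) inside the same while (<$IN>) loop in place of the regex match.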

This should work if the columns are separated by whitespace:

 /^[^\s]+\s+[^\s]+\s+[^\s]+\s+(.+)\s+[^\s]+$/m 

or, more concisely, as lone shepherd pointed out (\S is shorthand for [^\s]):

 /^\S+\s+\S+\s+\S+\s+(.+)\s+\S+$/m 

or, if the columns are separated by semicolons:

 /^[^;]+;[^;]+;+[^;]+;+(.+);+[^;]+$/m 
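A quick Perl check that the two whitespace patterns capture the same text. The record below is hypothetical and invented for illustration (the question's own data uses semicolons):

```perl
use strict;
use warnings;

# Hypothetical whitespace-separated record: three leading columns, a free-text
# description, and a single-word location at the end.
my $line = "220 John 28 Working as a Professor in University Hyderabad";

my ($long)  = $line =~ /^[^\s]+\s+[^\s]+\s+[^\s]+\s+(.+)\s+[^\s]+$/m;
my ($short) = $line =~ /^\S+\s+\S+\s+\S+\s+(.+)\s+\S+$/m;

# (.+) is greedy, so it backtracks only far enough to leave the last word
# for the final \S+, capturing everything between column 3 and the location.
print "$long\n";
print "$short\n";
```

This relies on the location being a single word; a multi-word location would be partly swallowed by the greedy (.+).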
