Extract a specific pattern from strings using sed, awk or perl

Can I use sed to extract a pattern that is enclosed between specific delimiters, if it exists in a line?

Suppose I have a file with the following lines:

There are many who dare not kill themselves for [/fear/] of what the neighbors will say.

Advice is what we ask for when we already know the /* answer */ but wish we didn't.

In both cases, I have to scan the line for the first opening pattern, i.e. '[/' or '/*' as appropriate, and capture the text that follows until the closing pattern, i.e. '/]' or '*/' respectively.

In short, I need fear and answer. If possible, can this be extended to multiple lines, i.e. to the case where the closing pattern occurs on a different line than the opening one?

Any help in the form of suggestions or algorithms is welcome. Thanks in advance for your answers.

+7
3 answers
    use strict;
    use warnings;

    while (<DATA>) {
        while (m#/(\*?)(.*?)\1/#g) {
            print "$2\n";
        }
    }

    __DATA__
    There are many who dare not kill themselves for [/fear/] of what the neighbors will say.
    Advice is what we ask for when we already know the /* answer */ but wish we didn't.

As a one-liner:

 perl -nlwe 'while (m#/(\*?)(.*?)\1/#g) { print $2 }' input.txt 

The inner while loop iterates over all matches on a line, thanks to the /g modifier. The backreference \1 ensures that the closing delimiter is the same kind as the opening one.

If you need to match blocks that span multiple lines, you need to slurp the input:

    use strict;
    use warnings;

    $/ = undef;
    while (<DATA>) {
        while (m#/(\*?)(.*?)\1/#sg) {
            print "$2\n";
        }
    }

    __DATA__
    There are many who dare not kill themselves for [/fear/] of what the neighbors will say. /* foofer */
    Advice is what we ask for when we already know the /* answer */ but wish we didn't.
    foo bar /
    baz
    baaz / fooz

As a one-liner:

    perl -0777 -nwe 'while (m#/(\*?)(.*?)\1/#sg) { print "$2\n" }' input.txt

The -0777 switch and $/ = undef both cause file slurping, meaning the whole file is read into a single scalar. I also added the /s modifier so that the wildcard . matches newline characters as well.

Explanation of the regular expression: m#/(\*?)(.*?)\1/#sg

    m#          # a simple m//, but with # as delimiter instead of slash
        /(\*?)  # slash followed by optional *
        (.*?)   # shortest possible string of wildcard characters
        \1/     # backref to optional *, followed by slash
    #sg         # s modifier to make . match \n, and g modifier

The "magic" here is that the backreference requires a star * only if one was found after the opening slash.

+4

Quick and dirty way in awk (gensub requires GNU awk):

 awk 'NF{ for (i=1;i<=NF;i++) if($i ~ /^\[\//) { print gensub (/^..(.*)..$/,"\\1","g",$i); } else if ($i ~ /^\/\*/) print $(i+1);next}1' input_file 

Test:

    $ cat file
    There are many who dare not kill themselves for [/fear/] of what the neighbors will say.
    Advice is what we ask for when we already know the /* answer */ but wish we didn't.

    $ awk 'NF{ for (i=1;i<=NF;i++) if($i ~ /^\[\//) { print gensub (/^..(.*)..$/,"\\1","g",$i); } else if ($i ~ /^\/\*/) print $(i+1);next}1' file
    fear
    answer
+1

Single Line Matches

If you really want to do this in sed, you can extract your delimited patterns fairly easily as long as they are on the same line.

    # Using GNU sed. Escape a whole lot more if your sed doesn't handle
    # the -r flag.
    sed -rn 's![^*/]*(/\*?.*/).*!\1!p' /tmp/foo

Multi-Line Matches

If you want to do multi-line matches with sed, things get a little ugly. However, this can certainly be done.

    # Multi-line matching of delimiters with GNU sed.
    sed -rn ':loop
             /\/[^\/]/ {
                 N
                 s![^*/]+(/\*?.*\*?/).*!\1!p
                 T loop
             }' /tmp/foo

The trick is to look for the starting delimiter, and then keep appending lines in a loop until you find the ending delimiter.

This works well if you really do have an ending delimiter. Otherwise, the contents of the file will keep being appended to the pattern space until sed finds one, or until it reaches the end of the file. This may cause problems with some versions of sed, or with really large files where the pattern space grows out of hand.
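As a separate, stripped-down illustration of that loop (not the script above), the sketch below handles only the /* ... */ form, uses basic regular expressions so it does not depend on -r, and prints the text between the delimiters. The helper name between_stars and both input files are assumptions made for the example. It relies on N ending the script when there is no next line, which together with -n means an unterminated block prints nothing.

```shell
# Hypothetical inputs: one block that closes on the next line, one that never closes.
closed=$(mktemp)
printf '%s\n' 'we already know the /* multi' 'line answer */ but wish we did not' > "$closed"
unclosed=$(mktemp)
printf '%s\n' 'a /* never closed' 'more text' > "$unclosed"

# On a line containing "/*", keep appending lines with N until the pattern
# space also contains "*/", then print only the text between the two.
between_stars() {
    sed -n '
/\/\*/ {
    :a
    /\*\//!{
        N
        ba
    }
    s!.*/\*\(.*\)\*/.*!\1!p
}' "$1"
}

between_stars "$closed"      # prints " multi" and "line answer " (inner spaces kept)
between_stars "$unclosed"    # prints nothing: N runs out of input and sed stops
```

Because both .* parts are greedy, a pattern space holding several blocks would not be split into separate matches; this is only meant to show the append-until-closed idea.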

For more information, see GNU sed's Limitations and Non-limitations.

+1
