How to use regexp with Awk to extract substring between parentheses?

With the following Bash command line, I can get the index of a substring when the substring is given as a literal string between double quotes.

 text='123ABCabc((XYZabc((((((abc123(((123'
 echo $text | awk '{ print index($0, "((((a" )}'  # 20 is the result.

However, in my application I will not know which character appears where "a" is in this example. So I thought I could replace "a" with a regular expression that matches any character other than "(", and I thought that /[^(]/ would be what I needed. However, I was unable to get awk's index function to work with any form of regular expression in place of "((((a" in the example.

UPDATE: William Pursell noted that the index operation does not accept regex as the second operand.
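
The difference can be seen by handing the same bracket expression to both functions. This is only a small sketch, assuming a POSIX awk and reusing the sample text from above; the variable names literal_pos and regex_pos are made up for illustration.

```shell
text='123ABCabc((XYZabc((((((abc123(((123'

# index() looks for the literal four-character string "[^(]", which does not occur in the text.
literal_pos=$(printf '%s\n' "$text" | awk '{ print index($0, "[^(]") }')

# match() treats the same characters as a regex: the first character that is not "(".
regex_pos=$(printf '%s\n' "$text" | awk '{ print match($0, /[^(]/) }')

echo "index: $literal_pos, match: $regex_pos"   # prints: index: 0, match: 1
```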

I ended up needing to extract a substring that is preceded by four or more "(" and followed by one or more ")". Dennis Williamson provided a solution using the following code:

 echo 'dksjfkdj(((((((I-WANT-THIS-SUBSTRING)askdjflsdjf' | mawk '{match($0,/\(\(\(\([^()]*\)/); s = substr($0,RSTART, RLENGTH); gsub(/[()]/, "", s); print s}' 

Thank you all for your help!

3 answers

To get the position of the first character that is not an opening parenthesis after a run of them:

 $ echo "$text" | awk '{ print match($0, /\(\(\(\(([^(])/, arr); print arr[1, "start"]}'
 20
 24

This prints the position where the match of "((((" followed by a non-"(" character begins (20) and the position of the character after the parentheses (24).

The third (array) argument to match() is a GNU awk ( gawk ) extension.
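
Where gawk is not available, the same two numbers can probably be derived from RSTART and RLENGTH alone. The following is a sketch under that assumption, not part of the original answer; it relies on the non-"(" character being the last character of the match, and the variable name positions is made up for illustration.

```shell
text='123ABCabc((XYZabc((((((abc123(((123'

# POSIX match() sets RSTART (start of the match) and RLENGTH (its length).
# The last matched character is the non-"(" one, so it sits at RSTART + RLENGTH - 1.
positions=$(printf '%s\n' "$text" | awk '
    match($0, /\(\(\(\([^(]/) {
        print RSTART, RSTART + RLENGTH - 1
    }')

echo "$positions"   # prints: 20 24
```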

Edit:

 echo 'dksjfkdj(((((((I-WANT-THIS-SUBSTRING)askdjflsdjf' | mawk '{match($0,/\(\(\(\([^()]*\)/); s = substr($0,RSTART, RLENGTH); gsub(/[()]/, "", s); print s}' 
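
As an alternative to stripping the parentheses with gsub, the payload can be cut out directly with substr, since the match always consists of exactly four "(" plus the payload plus one ")". This is a sketch of that arithmetic, not the code from the answer; the variable names input and extracted are made up for illustration.

```shell
input='dksjfkdj(((((((I-WANT-THIS-SUBSTRING)askdjflsdjf'

# The match is "((((" + payload + ")", so the payload starts 4 characters
# after RSTART and is RLENGTH - 5 characters long (4 opening + 1 closing).
extracted=$(printf '%s\n' "$input" | awk '
    match($0, /\(\(\(\([^()]*\)/) {
        print substr($0, RSTART + 4, RLENGTH - 5)
    }')

echo "$extracted"   # prints: I-WANT-THIS-SUBSTRING
```

Because nothing is printed when match() fails, an input without the pattern yields an empty result rather than a stray substring.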

You want match instead of index. And you need to escape the ( characters. For example:

 echo $text | awk '{ print match($0, /\(\(\(\([^(]/) }' 

Note that this does not give the index of the character after the string (((( , but the index of the first ( of the match.


If you want to match four or more opening parentheses in order to find where the following substring begins, you need to compute that position yourself.

 # Use GNU AWK to index the character after the end of a substring.
 echo "$text" | awk --re-interval 'match( $0, /\({4,}/ ) { print RSTART + RLENGTH }'

This should give you the correct starting character index after the sequence of parentheses, which in this case is 24.
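
Some awk implementations (older mawk builds, for example) may not support interval expressions such as {4,}. A sketch of an equivalent pattern that avoids them is shown below, on the assumption that "three literal ( followed by one or more (" is an acceptable way to spell "four or more". The second command is a hypothetical follow-up that uses the computed position to print the rest of the line.

```shell
text='123ABCabc((XYZabc((((((abc123(((123'

# /\(\(\(\(+/ means three literal "(" followed by one or more "(",
# i.e. four or more, without needing interval-expression support.
after=$(printf '%s\n' "$text" | awk 'match($0, /\(\(\(\(+/) { print RSTART + RLENGTH }')

# Hypothetical follow-up: print everything from that position onward.
rest=$(printf '%s\n' "$text" | awk 'match($0, /\(\(\(\(+/) { print substr($0, RSTART + RLENGTH) }')

echo "$after $rest"   # prints: 24 abc123(((123
```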


Source: https://habr.com/ru/post/1415403/

