How to find all words appearing between brackets?

Question

I have a file containing a few words in brackets. I would like to compile a list of all the unique words that appear there, for example:

This is some (text). This (text) has some (words) in parenthesis. Sometimes, there are numbers, such as (123) in parenthesis too.

This will be the resulting list:

 text
 words
 123

How can I list all elements between brackets?

+7

bash grep

Village May 19, '12 at 1:44

5 answers

With GNU grep, you can use a Perl-compatible regular expression with look-around assertions to exclude the parentheses:

 grep -Po '(?<=\().*?(?=\))' file.txt | sort -u
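
To try the command on a short sample, here is a minimal sketch. It assumes GNU grep built with PCRE support (the -P option); the function name bracket_words is illustrative and not part of the answer, and the function reads standard input rather than file.txt.

```shell
# bracket_words is an illustrative name, not part of the answer.
# Requires GNU grep with PCRE support (-P). Reads standard input.
bracket_words() {
    # (?<=\()  look-behind: the match must be preceded by '('
    # .*?      the shortest possible run of characters
    # (?=\))   look-ahead: the match must be followed by ')'
    grep -Po '(?<=\().*?(?=\))' | sort -u
}

# Sample input adapted from the question.
printf '%s\n' 'This is some (text). This (text) has some (words).' | bracket_words
```

Because the parentheses are only asserted and never consumed, nothing has to be stripped from the matches afterwards. For a file, the function would be called as bracket_words < file.txt.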

+5

glenn jackman May 19 '12 at 9:49

grep -oE '\([[:alnum:]]*\)' file.txt | sed 's/[()]//g' | sort | uniq

-o prints only the matching text
-E enables extended regular expressions
\( matches a literal (
[[:alnum:]] is the POSIX character class for letters and digits

The sed script strips the parentheses. This was tested with GNU grep but BSD sed, so beware.
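
Note that [[:alnum:]]* does not match bracketed text containing spaces or punctuation. The following is a minimal sketch of a variant, not the answer's own command, that matches everything up to the closing parenthesis and strips the parentheses with tr. It assumes a grep that supports -o (GNU and BSD grep both do); the function name paren_contents is illustrative.

```shell
# paren_contents is an illustrative name, not part of the answer.
# Reads standard input.
paren_contents() {
    # '([^)]*)' is a basic regular expression, where unescaped parentheses
    # are literal: a '(', any run of characters other than ')', then a ')'.
    grep -o '([^)]*)' | tr -d '()' | sort -u
}

printf '%s\n' 'see (two words) and (x-1) and (two words)' | paren_contents
```

For a file, this would be called as paren_contents < file.txt.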

+3

mkb May 19 '12 at 2:00

To list the words:

 cat file.txt | sed 's/.*(\(.*\)).*/\1/'

To compile a list of unique words, you need to process the list further:

 cat file.txt | sed 's/.*(\(.*\)).*/\1/' | sort | uniq
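
Because the leading .* is greedy, the sed expression above keeps only the last bracketed word on each line, and it prints lines without brackets unchanged. The following is a minimal sketch of one way around this, not the answer's own method: break the input at every ')' with tr, then let sed -n print only the pieces that contain a '('. The function name words_in_parens is illustrative, and text after an unclosed '(' would also be reported.

```shell
# words_in_parens is an illustrative name, not part of the answer.
# Reads standard input.
words_in_parens() {
    # 1. tr turns every ')' into a newline, so each bracketed word ends a line.
    # 2. sed -n with the p flag prints only lines where the substitution
    #    succeeded, keeping the text after the last '(' on that line.
    tr ')' '\n' | sed -n 's/.*(\(.*\)/\1/p' | sort -u
}

printf '%s\n' 'This is some (text). This (text) has (words), e.g. (123).' | words_in_parens
```

For a file, this would be called as words_in_parens < file.txt.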

+2

Mark o'connor May 19 '12 at 2:05

You can try this:

  sed -e 's/(/\n(/g' -e 's/)/\n/g' filename|awk -F'(' '{print $2}'|sort -u

Explanation:

The first sed expression puts each bracketed word on a new line, and the second replaces each ')' character with a newline. GNU sed interprets \n in the replacement as a newline; other sed implementations may not. So, after running the command below

 sed -e 's/(/\n(/g' -e 's/)/\n/g' filename

the output will look like this:

 This is some
 (text
 . This
 (text
  has some
 (words
  in parenthesis. Sometimes, there are numbers, such as
 (123
  in parenthesis too.

Now pipe this output to the awk statement below, which prints the second field, using '(' as the field separator:

 awk -F'(' '{print $2}'

Ignoring the blank lines printed for lines that contain no '(', the output will now be

 text
 text
 words
 123

The above output is piped to sort -u to give the unique words. Hope this explanation helps.

+1

Venkat madhav May 20 '12 at 17:42

Steve · Accepted Answer · 2012-05-19T02:42:45+0000

You can use awk as follows:

awk -F "[()]" '{ for (i=2; i<NF; i+=2) print $i }' file.txt

prints:

 text
 text
 words
 123
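
To see why the loop visits only the even-numbered fields, it can help to print how awk splits a line when both parentheses act as field separators. This is a small sketch with a made-up input string; show_fields is an illustrative name, not part of the answer.

```shell
# show_fields is an illustrative name, not part of the answer.
show_fields() {
    # -F '[()]' makes either parenthesis a field separator, so text outside
    # the brackets lands in odd fields and text inside them in even fields.
    printf '%s\n' "$1" |
        awk -F '[()]' '{ for (i = 1; i <= NF; i++) printf "%d:%s\n", i, $i }'
}

show_fields 'a(b)c(d)e'
# 1:a
# 2:b
# 3:c
# 4:d
# 5:e
```

In a line whose other brackets are balanced, text after an unclosed '(' ends up in the last field, which the i<NF condition in the answer's loop skips.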

You can use an array to print unique values:

awk -F "[()]" '{ for (i=2; i<NF; i+=2) if (!array[$i]++) print $i }' file.txt

prints:

 text
 words
 123

HTH
