How to find all words appearing between brackets?

I have a file containing a few words in brackets. I would like to compile a list of all the unique words that appear there, for example:

This is some (text). This (text) has some (words) in parenthesis. Sometimes, there are numbers, such as (123) in parenthesis too. 

This will be the resulting list:

 text words 123 

How can I list all elements between brackets?

+7
5 answers

You can use awk as follows:

awk -F "[()]" '{ for (i=2; i<NF; i+=2) print $i }' file.txt

prints:

 text
 text
 words
 123

You can use an array to print unique values:

awk -F "[()]" '{ for (i=2; i<NF; i+=2) if (!seen[$i]++) print $i }' file.txt

prints:

 text
 words
 123
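To see why the loop walks the even-numbered fields, here is a minimal sketch that prints every field awk produces with the "[()]" separator. The helper name show_fields and the sample line are made up for illustration. The loop stops before NF because the last field is either the text after the final ')' or the tail of an unclosed bracket.

```shell
# show_fields is a hypothetical helper, not part of the answer above.
# It numbers each field so the bracketed ones (2, 4, ...) are easy to spot.
show_fields() {
    awk -F "[()]" '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }'
}

printf '%s\n' 'some (text) and (words) here' | show_fields
# 1:[some ]
# 2:[text]
# 3:[ and ]
# 4:[words]
# 5:[ here]
```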

HTH

+17

With GNU grep, you can use a Perl-compatible regular expression with look-around assertions to exclude the parentheses:

 grep -Po '(?<=\().*?(?=\))' file.txt | sort -u 
+5

grep -oE '\([[:alnum:]]*\)' file.txt | sed 's/[()]//g' | sort | uniq

  • -o prints only the matching text
  • -E enables extended regular expressions
  • \( matches a literal parenthesis
  • [[:alnum:]] is the POSIX character class for letters and digits

The sed script strips the parentheses. This was tested with GNU grep and BSD sed, so beware of differences on other systems.
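One consequence of [[:alnum:]] is worth checking: bracketed text that contains spaces or punctuation is not matched at all. The sketch below illustrates this on a made-up sample line; the helper name bracketed_alnum is hypothetical, and LC_ALL=C is set only to make the sort order predictable.

```shell
# bracketed_alnum is a hypothetical helper; it reads text on stdin.
# Only brackets holding letters and digits survive the grep stage.
bracketed_alnum() {
    grep -oE '\([[:alnum:]]*\)' | sed 's/[()]//g' | LC_ALL=C sort -u
}

printf '%s\n' 'keep (abc) and (42), skip (two words) and (a-b)' | bracketed_alnum
# 42
# abc
```

If multi-word entries should be kept, a wider class such as [^()] in place of [[:alnum:]] would be one option.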

+3

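To list the words:

 sed -n 's/.*(\([^()]*\)).*/\1/p' file.txt

Because the leading .* is greedy, this prints only the last bracketed word on each line, so it assumes at most one per line.

To compile a list of unique words, you need to process the list further:

 sed -n 's/.*(\([^()]*\)).*/\1/p' file.txt | sort -u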
+2

You can try this:

 sed -e 's/(/\n(/g' -e 's/)/\n/g' filename | awk -F'(' 'NF>1 {print $2}' | sort -u

Explanation:

The first sed expression starts a new line before each '(' character, and the second replaces each ')' character with a newline. In a basic regular expression a plain ( is literal, so it needs no backslash; \n in the replacement is understood by GNU sed. So, after running the command below

 sed -e 's/(/\n(/g' -e 's/)/\n/g' filename

the output will look like this:

 This is some
 (text
 . This
 (text
 has some
 (words
 in parenthesis. Sometimes, there are numbers, such as
 (123
 in parenthesis too.

Now pipe this output to the awk statement below, which uses '(' as the field separator and prints the second field of every line that contains one:

 awk -F'(' 'NF>1 {print $2}'

The output will now be

 text
 text
 words
 123

This output is then piped to sort -u, which gives the unique words. Hope this explanation helps.
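Some sed implementations do not expand \n in the replacement text. As an alternative sketch (a swapped-in technique, not the answer's own method), tr can do the line splitting instead. The helper name unique_bracketed and the temporary sample file are assumptions made for this illustration.

```shell
# unique_bracketed is a hypothetical helper; pass the input file as $1.
# tr turns every ')' into a newline, so each bracketed word ends its line.
# awk then prints the text after '(' and skips lines that have no '('.
unique_bracketed() {
    tr ')' '\n' < "$1" | awk -F'(' 'NF > 1 { print $2 }' | LC_ALL=C sort -u
}

tmp="$(mktemp)"
printf '%s\n' 'This is some (text). This (text) has some (words) and (123).' > "$tmp"
unique_bracketed "$tmp"
# 123
# text
# words
rm -f "$tmp"
```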

+1
