How can I match a square bracket in a regex with grep?

I am trying to match both [ and ] with grep, but I have only managed to match [ . No matter what I try, I can't figure out how to match ] correctly.

Here is some sample code:

 echo "fdsl[]" | grep -o "[ a-z]\+" #this prints fdsl
 echo "fdsl[]" | grep -o "[ \[a-z]\+" #this prints fdsl[
 echo "fdsl[]" | grep -o "[ \]a-z]\+" #this prints nothing
 echo "fdsl[]" | grep -o "[ \[\]a-z]\+" #this prints nothing

Edit: my original regex on which I need to do this is as follows:

 echo "fdsl[]" | grep -o "[ \[\]\t\na-zA-Z\/:\.0-9_~\"'+,;*\=()$\ !@ #&?-]\+" #this prints nothing 

NB: I tried all the answers from this post, but they did not help in this particular case. And I need to use these brackets inside [] .

3 answers

According to the BRE/ERE bracket expression section of the POSIX regular expression specification:

  • [...] The right bracket ( ']' ) loses its special meaning and represents itself in a bracket expression if it occurs first in the list (after an initial circumflex ( '^' ), if any). Otherwise, it terminates the bracket expression, unless it appears in a collating symbol (such as "[.].]" ) or is the ending right bracket for a collating symbol, equivalence class, or character class. The special characters '.' , '*' , '[' and '\' (period, asterisk, left bracket and backslash, respectively) lose their special meaning within a bracket expression.

and

  1. [...] If a bracket expression specifies both '-' and ']' , place ']' first (after '^' , if any) and '-' last within the bracket expression.

Therefore, your regular expression should be:

 echo "fdsl[]" | grep -Eo "[][ a-z]+" 

Note the -E flag, which selects ERE, where the + quantifier is supported. The + quantifier is not supported in BRE (the default mode).
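As a minimal sketch of the two placement rules quoted above (the second sample string is made up for illustration), the following puts ] first and - last inside the bracket expression, including the negated form where ] comes right after ^ :

```shell
# ] placed first is literal, so this class is: ] [ space a-z
both=$(echo "fdsl[]" | grep -Eo '[][ a-z]+')
echo "$both"          # prints fdsl[]

# With a literal hyphen as well: ] first, - last
with_hyphen=$(echo "a-b[c]" | grep -Eo '[][a-z-]+')
echo "$with_hyphen"   # prints a-b[c]

# Negated class: ] goes right after ^
negated=$(echo "fdsl[]" | grep -Eo '[^][]+')
echo "$negated"       # prints fdsl
```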

The solution in Mike Holt's answer, "[][a-z ]\+" with an escaped + , works because it is run on GNU grep, which extends the BRE grammar to support \+ for one-or-more repetition. Strictly speaking, this is undefined behavior according to the POSIX standard (which means that an implementation can give it a meaningful behavior and document it, produce a syntax error, or do something else).

If you are fine with the assumption that your code will only ever run in a GNU environment, then it is perfectly fine to use Mike Holt's answer. Taking sed as an example, you are stuck with BRE when using POSIX sed (there is no flag to switch to ERE), and it is cumbersome to write even a simple regular expression in POSIX BRE, where * is the only defined quantifier apart from the verbose interval form \{m,n\} .
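For the portable route, here is a sketch of how "one or more" can be spelled in plain POSIX BRE, either by repeating the bracket expression before * or with an interval expression. The input string and the replacement text X are placeholders chosen for illustration:

```shell
# One or more, written as "one, then zero or more"
doubled=$(echo "fdsl[] rest" | sed 's/[][a-z][][a-z]*/X/')
echo "$doubled"    # prints X rest

# The same thing with a BRE interval expression
interval=$(echo "fdsl[] rest" | sed 's/[][a-z]\{1,\}/X/')
echo "$interval"   # prints X rest
```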

Original regex

Note that grep consumes the input line by line and checks whether each line matches the regular expression. Therefore, even if you use the -P flag with your original regular expression, \n is always redundant, since the regular expression cannot match across lines.

While it is possible to match a horizontal tab without the -P flag, I think it is more natural to use the -P flag for this task.
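A quick way to see this line-by-line behavior, sketched with made-up input: the pattern below asks for "ab", one whitespace character (the [:space:] class includes the newline), then "cd". It finds nothing when the two halves sit on separate lines.

```shell
# "ab" and "cd" are on separate lines, so no single line matches.
# grep -c exits non-zero when the count is 0, hence the "|| true".
split=$(printf 'ab\ncd\n' | grep -Ec 'ab[[:space:]]cd' || true)

# On one line, separated by a space, the same pattern matches
joined=$(printf 'ab cd\n' | grep -Ec 'ab[[:space:]]cd')

echo "$split $joined"   # prints 0 1
```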
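For reference, one way to get a literal tab into the bracket expression without -P is to let the shell produce it, here through printf in a command substitution. This is only a sketch: the variable name tab and the shortened input are illustrative.

```shell
# Command substitution strips trailing newlines but keeps the tab
tab=$(printf '\t')

# ] first, then [, lowercase letters, and the literal tab held in $tab
found=$(printf 'fds\tl[]<>\n' | grep -Eo "[][a-z${tab}]+")
printf '%s\n' "$found"   # prints fds, a tab, then l[]
```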

Given this input:

 $ echo -e "fds\tl[]kSAJD<>?,./:\";'{}|[]\\ !@ #$%^&*()_+-=~\`89"
 fds l[]kSAJD<>?,./:";'{}|[]\ !@ #$%^&*()_+-=~`89

With -P, the original regex from the question works after one slight modification (unescaping the + at the end):

 $ echo -e "fds\tl[]kSAJD<>?,./:\";'{}|[]\\ !@ #$%^&*()_+-=~\`89" | grep -Po "[ \[\]\t\na-zA-Z\/:\.0-9_~\"'+,;*\=()$\ !@ #&?-]+"
 fds l[]kSAJD
 ?,./:";'
 []
  !@ #$
 &*()_+-=~
 89

We can, however, remove \n (it is redundant, as explained above) along with several other unnecessary escapes:

 $ echo -e "fds\tl[]kSAJD<>?,./:\";'{}|[]\\ !@ #$%^&*()_+-=~\`89" | grep -Po "[ \[\]\ta-zA-Z/:.0-9_~\"'+,;*=()$\ !@ #&?-]+"
 fds l[]kSAJD
 ?,./:";'
 []
  !@ #$
 &*()_+-=~
 89

One problem is that [ is a special character in the expression, and it cannot be escaped with \ (at least not in my flavor of grep). The solution is to write it as [[] .
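A small sketch of that approach, using the sample string from the question: a bracket expression that contains only the bracket itself matches it literally.

```shell
# [ on its own inside a bracket expression
open=$(echo "fdsl[]" | grep -o '[[]')
echo "$open"    # prints [

# ] on its own, which is literal because it comes first in the list
close=$(echo "fdsl[]" | grep -o '[]]')
echo "$close"   # prints ]
```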


According to regular-expressions.info :

In most regex flavors, the only special characters or metacharacters inside a character class are the closing bracket (]), the backslash (\), the caret (^), and the hyphen (-). The usual metacharacters are normal characters inside a character class and do not need to be escaped with a backslash.

... and ...

The closing bracket (]), the caret (^), and the hyphen (-) can be included by escaping them with a backslash or by placing them in a position where they do not take on their special meaning.

So, assuming that the particular flavor of regex syntax supported by grep conforms to this, I would have expected "[ a-z[\]]\+" to work.

However, my version of grep (GNU grep 2.14) only matches "[]" at the end of "fdsl[]" with this regular expression.

I then tried the other technique mentioned in that quote (putting ] in a position within the character class where it cannot take on its usual meaning), and it seems to have worked:

 $ echo "fdsl[]" | grep -o "[][a-z ]\+"
 fdsl[]
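The failure of the backslash-escaped attempt is consistent with the POSIX rule that a backslash is an ordinary character inside a bracket expression, so \] escapes nothing and the first ] closes the list. A sketch that makes this visible (the input a\b is made up for illustration, and the second command relies on the GNU \+ extension):

```shell
# Inside brackets the backslash is literal: [\] is a class containing only \
backslash=$(printf '%s\n' 'a\b' | grep -o '[\]')
printf '%s\n' "$backslash"   # prints \

# So "[ a-z[\]]\+" parses as the class [ a-z[\] followed by one or more ]
tail_only=$(echo "fdsl[]" | grep -o '[ a-z[\]]\+')
printf '%s\n' "$tail_only"   # prints []
```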