Regex - matching leading and trailing spaces, spaces between opening and closing brackets and words, but not between words

I apologize if this question has already been answered, but I searched and could not find an answer. I am trying to write a regular expression that matches all leading and trailing spaces, as well as the spaces between the opening and closing brackets and the words inside them, but does not match the spaces between the words. The following are examples of the data I am processing:

[Header] [ SomeSpace] [ Some1 More Space 15 ] 
  • No leading or trailing spaces, and no spaces between the brackets and the single word.

  • Some leading and trailing spaces, and a space between the opening bracket and the word.

  • Some leading spaces, spaces between the words and numbers, spaces between the brackets and the text, and trailing spaces.

The closest regex I came up with is:

 /[^\[\]a-zA-Z\d]/ 

But I can't figure out how to exclude only the spaces between words and numbers ...
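To illustrate the problem, here is a minimal sketch that applies that character class to one of the sample lines (the variable names are only for illustration):

```ruby
# Sample line in the style of the third example above.
sample = "[ Some1 More Space 15 ]"

# The negated class matches every character that is not a bracket,
# a letter, or a digit, so ALL spaces are removed, including the
# ones between words that should be kept.
stripped = sample.gsub(/[^\[\]a-zA-Z\d]/, "")

puts stripped  # prints "[Some1MoreSpace15]"
```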

The Ruby code I use as a workaround is:

 line.gsub!(/^\s*/, "")
 line.gsub!(/\[/, "")
 line.gsub!(/\]/, "")
 s = line.gsub!(/^\s*|\s*$/, "")
 s = "[" + s + "]\n"

Obviously not very pretty ...

Any help condensing this into a single, elegant gsub line is welcome.

Thanks!

Lee

3 answers

If I understand your question correctly, you are trying to turn this text

 [Header] [ SomeSpace] [ Some1 More Space 15 ] 

into this:

 [Header] [SomeSpace] [Some1 More Space 15] 

This regex will do the job. The key addition here is the non-greedy ? quantifier on the inner character class. It makes the character class match as little as possible, which leaves any trailing space inside the brackets for the following greedy \s* .

 s/^\s*\[\s*([\w\s]*?)\s*\]\s*$/[$1]/g 

Ruby:

 line.gsub! /^\s*\[\s*([\w\s]*?)\s*\]\s*$/, '[\\1]' 

sed (ugly and most likely inefficient; I'm not a sed master!):

 sed -Ee "s/^ *\[([a-zA-Z0-9 ]+)\] *$/\\1/g" -e "s/^ */[/g" -e "s/ *$/]/g" infile 
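As a rough sketch of why the non-greedy quantifier matters, the Ruby substitution above can be compared with a greedy variant. The method names clean_lazy and clean_greedy are hypothetical, and gsub is used instead of gsub! so that a string is always returned:

```ruby
# Hypothetical helper: the pattern from this answer, with the lazy *? quantifier.
def clean_lazy(line)
  line.gsub(/^\s*\[\s*([\w\s]*?)\s*\]\s*$/, '[\\1]')
end

# Hypothetical helper: the same pattern with a greedy * for comparison.
# The character class swallows the trailing space before the following
# \s* gets a chance to match it.
def clean_greedy(line)
  line.gsub(/^\s*\[\s*([\w\s]*)\s*\]\s*$/, '[\\1]')
end

sample = "   [ Some1 More Space 15 ]   "

puts clean_lazy(sample)    # prints "[Some1 More Space 15]"
puts clean_greedy(sample)  # prints "[Some1 More Space 15 ]"
```

The only difference between the two results is the space left before the closing bracket in the greedy version.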

This regex matches all of the extra spaces that need to be removed:

 /(?<=^|\[)\s+|\s+(?=$|\])|(?<=\s)\s+/ 
  • The first part matches all leading spaces at the start of the line and just inside an opening bracket.
  • The second part matches all trailing spaces at the end of the line and just before a closing bracket.
  • The last part matches any space that follows another space, so a run of two or more spaces is reduced to one.

Just replace the matches with an empty string.
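A minimal Ruby sketch of that replacement, assuming each line is processed on its own and has no trailing newline. The constant EXTRA_SPACES and the method strip_extra_spaces are hypothetical names:

```ruby
# The pattern from this answer, stored in a constant (hypothetical name).
EXTRA_SPACES = /(?<=^|\[)\s+|\s+(?=$|\])|(?<=\s)\s+/

# Hypothetical helper: replace every match with an empty string.
def strip_extra_spaces(line)
  line.gsub(EXTRA_SPACES, "")
end

# Leading/trailing spaces, spaces inside the brackets, and a double
# space between two words.
puts strip_extra_spaces("   [ Some1  More Space 15 ]   ")
# prints "[Some1 More Space 15]"
```

Unlike a pattern that only trims around the brackets, the third alternative also collapses repeated spaces between words.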

Test Data

  [Header] [ SomeSpace] [ Some1 More Space 15 ] [ Super Space ] [ ] [ ] [] [a] [a ] [ a] [ a ] [aa] [aaaaab] [ dasdasd dsd ] 

I don't know about elegant, but the simplest is perhaps:

 line.gsub /^\s*(\[)\s*|\s*(\])\s*$/, '\\1\\2' 
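A quick sketch of how this behaves, assuming lines without a trailing newline. The method name trim_brackets is hypothetical. Note that gsub returns a new string rather than modifying line in place:

```ruby
# Hypothetical helper wrapping the one-liner above.
# Each match hits only one of the two alternatives, so one of the
# groups is empty and '\\1\\2' yields either "[" or "]".
def trim_brackets(line)
  line.gsub(/^\s*(\[)\s*|\s*(\])\s*$/, '\\1\\2')
end

puts trim_brackets("  [ SomeSpace]  ")             # prints "[SomeSpace]"
puts trim_brackets("   [ Some1 More Space 15 ]  ") # prints "[Some1 More Space 15]"
```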
