Repeating a regular expression pattern

I don't know whether this is actually possible, but I want to repeat part of a regex pattern. I am using this command:

sed 's/[^-\t]*\t[^-\t]*\t\([^-\t]*\).*/\1/' films.txt 

Input (fields are tab-separated):

 250. 7.9 Shutter Island (2010) 110,675 

Will return:

 Shutter Island (2010) 

I match all the non-tab characters (250.), then a tab, then all the non-tab characters (7.9), then another tab. Then I capture the movie title and match all the remaining characters (110,675).

This works well, but I am learning regex and it looks ugly: the sub-pattern [^-\t]*\t is repeated immediately after itself. Is there a way to repeat it, the way you can repeat a single character with something like {2,2}?

I tried ([^-\t]*\t){2,2} (and variations), but I assume that ends up trying to match [^-\t]*\t\t?

Also, if there is any way to make my above code shorter and cleaner, any help would be greatly appreciated.

+4
6 answers

I think you are going about this the wrong way. If you just want to extract the movie title and its release year, you can try this regex:

 (?:\t)[\w ()]+(?:\t) 

As seen here:

http://regexr.com?2sd3a

Note that it matches a tab character at the start and end of the string you actually want. The (?:...) groups are non-capturing, so the tabs are not captured, although they are still part of the overall match.
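Since the question uses sed, here is a rough translation of the same idea, offered as a sketch rather than an exact equivalent. It assumes GNU sed (for -r and \t). sed has no (?:...) groups, so a normal capture group is used instead, and \w is replaced with [[:alnum:]_] because it cannot be used inside a bracket expression. The function name title_by_charset is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the tab-delimited field made up only of
# letters, digits, underscores, spaces and parentheses.
# Assumes GNU sed (-r for extended syntax, \t for a tab).
title_by_charset() {
    sed -r 's/.*\t([[:alnum:]_ ()]+)\t.*/\1/'
}

# Sample line from the question, written with real tabs.
printf '250.\t7.9\tShutter Island (2010)\t110,675\n' | title_by_charset
# prints: Shutter Island (2010)
```

Like the regex above, this only matches titles made of those characters, so a title containing a colon or an apostrophe would leave the line unchanged.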

+2

This works for me:

 sed 's/\([^\t]*\t\)\{2\}\([^\t]*\).*/\2/' films.txt 

If your sed supports -r, you can get rid of most of the escaping:

 sed -r 's/([^\t]*\t){2}([^\t]*).*/\2/' films.txt 

Change the 2 inside the braces to a value from 0 to 3 to select a different field.

This will also work:

 sed 's/[^\t]\+/\n&/3;s/.*\n//;s/\t.*//' films.txt 

Change 3 to select different fields (1-4).
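For what it's worth, the repeat count in the first command can be turned into a parameter. This is a minimal sketch, assuming GNU sed (which accepts \t inside a regex); nth_field is a hypothetical helper name, not something sed provides.

```shell
#!/bin/sh
# nth_field N: print the Nth tab-separated field (1-based) of each line.
# Hypothetical helper; assumes GNU sed for the \t escape.
nth_field() {
    skip=$(( $1 - 1 ))   # number of "field + tab" groups to skip
    # Double quotes let $skip expand; the backslashes reach sed unchanged.
    sed "s/\([^\t]*\t\)\{$skip\}\([^\t]*\).*/\2/"
}

printf '250.\t7.9\tShutter Island (2010)\t110,675\n' | nth_field 3
# prints: Shutter Island (2010)
```

A line with fewer than N fields does not match the pattern, so sed prints it unchanged.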

+5

To use repetition braces and grouping parentheses in sed, you may need to escape them with backslashes, like this:

sed 's/\([^-\t]*\t\)\{3\}.*/\1/' films.txt

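Yes, this command works on your example: when a group is repeated, \1 refers to its last repetition, which here is the third field together with the tab that follows it.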

If the backslashes annoy you, you can use the -r option, which enables extended regex mode, and drop the backslash escapes on the parentheses and braces.

sed -r 's/([^-\t]*\t){3}.*/\1/' films.txt

I have since noticed that this is almost the same as Denis Williamson's answer, but I am leaving it here because it is a shorter expression that does the same thing.
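One detail worth checking with the single-group version: the backreference picks up the last repetition of the group, and that repetition includes the tab after the title. The sketch below assumes GNU sed; the function names title_raw and title_clean are only for illustration.

```shell
#!/bin/sh
# Assumes GNU sed (-r and \t). Function names are hypothetical.

# \1 is the third repetition of the group: the title plus its trailing tab.
title_raw() {
    sed -r 's/([^-\t]*\t){3}.*/\1/'
}

# Same substitution, followed by a second one that strips the trailing tab.
title_clean() {
    sed -r 's/([^-\t]*\t){3}.*/\1/; s/\t$//'
}

line=$(printf '250.\t7.9\tShutter Island (2010)\t110,675')
printf '%s\n' "$line" | title_raw   | od -c   # the dump shows a \t before the \n
printf '%s\n' "$line" | title_clean | od -c   # no \t before the \n
```

If the trailing tab matters for whatever consumes the output, the two-group form from the other answer avoids it without a second substitution.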

+3

You can repeat anything by wrapping it in parentheses, for example:

 ([^-\t]*\t){2,2} 

The full pattern to match the title would then be:

 ([^-\t]*\t){2,2}([^-\t]+).* 

You said you tried this. I'm not sure what was different in your attempt, but the above worked for me on your sample data.
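One possible explanation for the difference, assuming GNU sed: without -r, sed uses basic regex syntax, where unescaped ( ) { } are ordinary characters, so the pattern looks for literal parentheses and braces instead of a repeated group. The sketch below compares the two modes; the function names are made up for illustration.

```shell
#!/bin/sh
# Assumes GNU sed (-r and \t). Function names are hypothetical.

# Basic syntax: ( ) { } are literal characters here, so the pattern does
# not match the sample line and sed prints it unchanged.
basic_attempt() {
    sed 's/([^-\t]*\t){2,2}.*/X/'
}

# Extended syntax: the same text is a group repeated exactly twice,
# and \2 is the title that follows it.
extended_title() {
    sed -r 's/([^-\t]*\t){2,2}([^-\t]+).*/\2/'
}

line=$(printf '250.\t7.9\tShutter Island (2010)\t110,675')
printf '%s\n' "$line" | basic_attempt    # prints the line unchanged
printf '%s\n' "$line" | extended_title   # prints: Shutter Island (2010)
```

Note that {2,2} means "at least 2 and at most 2", so it can be shortened to {2}.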

+2

Why make it so complicated?

 $ awk '{$1=$2=$NF=""}1' file
 Shutter Island (2010)
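The one-liner above splits on any whitespace and blanks out three fields, so awk rebuilds the line with the emptied fields still separated by spaces, which leaves extra spaces around the title. If the file really is tab-separated, a sketch like the following picks the third field directly; third_field is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the third tab-separated field of each line.
third_field() {
    awk -F '\t' '{ print $3 }'
}

printf '250.\t7.9\tShutter Island (2010)\t110,675\n' | third_field
# prints: Shutter Island (2010)
```

With -F '\t' the spaces inside the title are no longer field separators, so the title stays in a single field.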
+2

If the file is tab-separated with a regular format, I would use cut instead of sed:

cut -d'	' -f3 films.txt

Note that there is a single literal tab between the quotes after -d. You can enter it on the command line by pressing Ctrl+V first, i.e. Ctrl+V Ctrl+I. Tab is also cut's default delimiter, so -d can be left out entirely.
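If typing a literal tab is awkward, for example inside a script, the tab can be produced with printf instead. This is a small sketch; the variable TAB and the function names are made up for illustration.

```shell
#!/bin/sh
# Build a tab character without typing it literally.
TAB=$(printf '\t')

# Explicit delimiter (hypothetical helper name).
title_cut() {
    cut -d "$TAB" -f 3
}

# Same result relying on cut's default delimiter, which is a tab.
title_cut_default() {
    cut -f 3
}

printf '250.\t7.9\tShutter Island (2010)\t110,675\n' | title_cut
printf '250.\t7.9\tShutter Island (2010)\t110,675\n' | title_cut_default
# both print: Shutter Island (2010)
```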

+1
