Repeating a regular expression pattern

Question

First, I don't know if this is really possible, but I want to repeat a regex pattern. I am using this pattern:

sed 's/[^-\t]*\t[^-\t]*\t\([^-\t]*\).*/\1/' films.txt

Input:

 250. 7.9 Shutter Island (2010) 110,675

Will return:

 Shutter Island (2010)

I match all the non-tab characters (250.), then the tab, then all the non-tab characters (7.9), then the tab. Then I capture the name of the movie, and then match all the remaining characters (110,675).

This works fine, but I am learning regex and this looks ugly: the pattern [^-\t]*\t is repeated immediately after itself. Is there a way to repeat it, using something like {2,2}?

I tried ([^-\t]*\t){2,2} (and variations), but I assume that is trying to match [^-\t]*\t\t?

Also, if there is any way to make my above code shorter and cleaner, any help would be greatly appreciated.

+4

regex sed

akd5446 Oct 20 '10 at 16:32


6 answers

This works for me:

 sed 's/\([^\t]*\t\)\{2\}\([^\t]*\).*/\2/' films.txt

If your sed supports -r, you can get rid of most of the escaping:

 sed -r 's/([^\t]*\t){2}([^\t]*).*/\2/' films.txt

Change the first 2 to select different fields (0-3).

This will also work:

 sed 's/[^\t]\+/\n&/3;s/.*\n//;s/\t.*//' films.txt

Change 3 to select different fields (1-4).
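As a rough check of the field-selection idea above, the sketch below runs the -r form on one sample line. It assumes GNU sed (for \t inside the pattern and for -r), and the variable names are made up for illustration.

```shell
# Hypothetical sample line; printf turns each \t into a real tab character.
line="$(printf '250.\t7.9\tShutter Island (2010)\t110,675')"

# Assumes GNU sed. Skip two tab-terminated fields, keep the third.
field3="$(printf '%s\n' "$line" | sed -r 's/([^\t]*\t){2}([^\t]*).*/\2/')"

# Same pattern with the count changed to 3: skip three fields, keep the fourth.
field4="$(printf '%s\n' "$line" | sed -r 's/([^\t]*\t){3}([^\t]*).*/\2/')"

printf '%s\n' "$field3" "$field4"
```

Only the number inside the braces changes between the two commands, which is the point of using a repetition count instead of writing the sub-pattern out several times.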

+5

Dennis Williamson Oct 26 '10 at 1:20


To use repetition braces and grouping parentheses with sed, you may need to escape them with backslashes, like this:

sed 's/\([^-\t]*\t\)\{3\}.*/\1/' films.txt

Yes, this command works correctly on your example.

If the escaping annoys you, you can use the -r option, which enables extended regex mode, and drop the backslashes before the parentheses and braces.

sed -r 's/([^-\t]*\t){3}.*/\1/' films.txt

I later noticed that this is almost the same as Dennis Williamson's answer, but I am leaving it because it is a shorter expression that does the same thing.
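One detail worth checking with this shorter form: a repeated group keeps only its last repetition, so \1 here is the third field together with the tab that follows it. The sketch below, assuming GNU sed and a made-up sample line, shows that and one possible way to drop the trailing tab.

```shell
# Hypothetical sample line with real tab characters.
line="$(printf '250.\t7.9\tShutter Island (2010)\t110,675')"

# Assumes GNU sed. \1 is the last of the three repetitions: the title plus its tab.
with_tab="$(printf '%s\n' "$line" | sed -r 's/([^-\t]*\t){3}.*/\1/')"

# A second substitution removes the tab left at the end of the line.
title="$(printf '%s\n' "$line" | sed -r 's/([^-\t]*\t){3}.*/\1/; s/\t$//')"

printf '[%s]\n' "$with_tab" "$title"
```

The brackets in the final printf are only there to make the trailing tab visible in the first result.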

+3

Ch.Idea Oct 08 '15 at 16:01


You can repeat anything by putting it in parentheses, for example:

 ([^-\t]*\t){2,2}

And the full pattern to match the title would be as follows:

 ([^-\t]*\t){2,2}([^-\t]+).*

You said you tried this. I'm not sure what was different, but the above worked for me on your sample data.
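A possible explanation for the difference is the regex flavor: without -r, sed reads unescaped ( ) { } as literal characters, so the pattern never matches the sample line. The sketch below compares both modes, assuming GNU sed; the sample line and variable names are made up.

```shell
# Hypothetical sample line with real tab characters.
line="$(printf '250.\t7.9\tShutter Island (2010)\t110,675')"

# Basic mode (no -r): the parentheses and braces are literals, nothing matches,
# and the line comes back unchanged.
basic="$(printf '%s\n' "$line" | sed 's/([^-\t]*\t){2,2}//')"

# Extended mode (-r, assumes GNU sed): the group is repeated twice,
# so the first two fields are removed.
extended="$(printf '%s\n' "$line" | sed -r 's/([^-\t]*\t){2,2}//')"

printf '%s\n' "$basic" "$extended"
```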

+2

Samuel Neff Oct 20 '10 at 16:41


Why make it so complicated?

 $ awk '{$1=$2=$NF=""}1' file
 Shutter Island (2010)
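A variation on the same idea, offered only as a sketch: since the file is tab-separated, awk can be told to split on tabs and print the third field directly, which keeps the spaces inside the title intact. The sample line and variable names are made up.

```shell
# Hypothetical sample line with real tab characters.
line="$(printf '250.\t7.9\tShutter Island (2010)\t110,675')"

# -F'\t' sets the field separator to a tab; $3 is then the whole title.
title="$(printf '%s\n' "$line" | awk -F'\t' '{ print $3 }')"

printf '%s\n' "$title"
```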

+2

ghostdog74 Oct 20 '10 at 17:05


If this is a tab-separated file with a regular format, I would use cut instead of sed:

cut -d' ' -f3 films.txt

Note the single tab character between the quotation marks after -d, which can be entered on the command line by typing ctrl+v first, i.e. ctrl+v ctrl+i.
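If typing a literal tab is awkward, two alternatives may help; this is a sketch with a made-up sample line. Tab is already cut's default delimiter, so -d can be left out, or the tab can be produced with printf and passed in a variable.

```shell
# Hypothetical sample line with real tab characters.
line="$(printf '250.\t7.9\tShutter Island (2010)\t110,675')"

# Tab is the default delimiter for cut, so -d is optional here.
title_default="$(printf '%s\n' "$line" | cut -f3)"

# Alternatively, build the tab with printf and pass it explicitly.
tab="$(printf '\t')"
title_explicit="$(printf '%s\n' "$line" | cut -d "$tab" -f3)"

printf '%s\n' "$title_default" "$title_explicit"
```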

+1

Stephen P Oct 20 '10 at 17:05


andy matthews · Accepted Answer · 2010-10-20T16:52:16+0000

I think you are going about this the wrong way. If you just want to extract the name of the movie and its release year, you can try this regex:

 (?:\t)[\w ()]+(?:\t)

As seen here:

http://regexr.com?2sd3a

Please note that it matches the tab characters before and after the string you actually want, but does not capture them in a group.
