Prompted by a duplicate question, I will post a solution that works for "traditional" regular expression implementations that do not support the Perl extensions \s, \w, etc. Beginners who are not even aware that there are different dialects (aka flavors) of regular expressions are encouraged to read, for example, "Why are there so many different dialects of regular expressions?"
If your tool supports POSIX character classes, you can use [[:alnum:]_] for \w, [^[:alnum:]_] for \W, [[:space:]] for \s, etc. But if we assume that the whitespace will always be a plain space, and you want to extract the first three space-separated tokens, you don't really need any of this.
[^ ]+[ ]+[^ ]+[ ]+[^ ]+
matches three tokens, separated by spaces. (I put the spaces in square brackets to make them stand out, and so that they are easy to extend if you want to include characters other than a single regular ASCII space in the set of token separators. For example, if your regex dialect accepts \t for a tab, or if you can type a literal tab in its place, you can extend it to
[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+
In most shells, you can type a literal tab with ctrl+v tab, that is, prefix it with an escape character, which is usually entered by holding down the ctrl key and typing v.)
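If typing a literal tab is awkward, one workaround is to let printf produce it and splice it into a double-quoted pattern. This is only a sketch: the variable names and the sample string are made up for illustration, and it assumes a grep that supports -E and -o.

```shell
# Let printf generate a real tab character; command substitution keeps it intact.
tab=$(printf '\t')

# Made-up sample: tokens separated by a tab, two spaces, and another tab.
sample=$(printf 'alpha\tbeta  gamma\tdelta')

# Double quotes let the shell expand $tab inside the bracket expressions.
first_three=$(printf '%s\n' "$sample" | grep -Eo "[^ $tab]+[ $tab]+[^ $tab]+[ $tab]+[^ $tab]+")

printf '%s\n' "$first_three"
```

The double quotes are safe here only because the pattern contains nothing else the shell would expand.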
To use this, you may want to do
grep -Eo '[^ ]+[ ]+[^ ]+[ ]+[^ ]+' file
where the single quotes are necessary to protect the regular expression from the shell (double quotes would work here, but they are weaker; alternatively, you could backslash-escape every character in the regular expression that the shell treats as a metacharacter) or, perhaps,
sed -r 's/([^ ]+[ ]+[^ ]+[ ]+[^ ]+).*/\1/' file
to replace each line with just the captured expression (the parentheses form a capturing group, which you can refer back to with \1 in the replacement part of the s command in sed). The -r option selects a slightly more featureful regular expression dialect than the bare-bones one traditional sed uses; if your sed doesn't have it, try -E, or put a backslash in front of each parenthesis and plus sign (backslash-plus is itself a GNU extension in basic regular expressions; on a strictly POSIX sed, write \{1,\} instead).
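As a quick sanity check, here is a sketch that runs the grep and sed variants on one made-up line. The variable names are mine, and I use -E rather than -r on the assumption that your sed accepts it (reasonably recent GNU and BSD versions do).

```shell
# Sample input is made up for illustration; note the double space after "two".
line='one two  three four five'

# grep -o prints only the part of the line that matched.
with_grep=$(printf '%s\n' "$line" | grep -Eo '[^ ]+[ ]+[^ ]+[ ]+[^ ]+')

# sed -E: keep group 1 and discard the rest of the line.
with_sed=$(printf '%s\n' "$line" | sed -E 's/([^ ]+[ ]+[^ ]+[ ]+[^ ]+).*/\1/')

# Basic-regex fallback without -E or -r: escaped parentheses, \{1,\} in place of +.
with_bre=$(printf '%s\n' "$line" | sed 's/\([^ ]\{1,\}[ ]\{1,\}[^ ]\{1,\}[ ]\{1,\}[^ ]\{1,\}\).*/\1/')

printf '%s\n' "$with_grep" "$with_sed" "$with_bre"
```

One difference to keep in mind: grep -o prints every non-overlapping match, so a line with six or more tokens produces more than one output line, whereas the sed command substitutes once per line.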
Because of how regular expressions work, getting the first three tokens is easy: the regex engine always returns the first possible match on a line. If you want three tokens starting from the second, you have to add an expression that skips the first. Adapting the sed script above, that would be
sed -r 's/[^ ]+[ ]+([^ ]+[ ]+[^ ]+[ ]+[^ ]+).*/\1/'
where you will notice how I put a token + non-token pair before the capture. (This cannot be done with grep -o unless you have grep -P, in which case the full gamut of Perl extensions is available to you.)
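To see the skip in action, here is a small sketch on a made-up line (the variable names are mine, and -E again stands in for -r on the assumption that your sed accepts it):

```shell
# Made-up sample line with five space-separated tokens.
line='one two three four five'

# The leading [^ ]+[ ]+ consumes the first token and its separator;
# only the following three tokens land inside the capture group.
from_second=$(printf '%s\n' "$line" | sed -E 's/[^ ]+[ ]+([^ ]+[ ]+[^ ]+[ ]+[^ ]+).*/\1/')

printf '%s\n' "$from_second"
```

Everything matched outside the parentheses, both the skipped token in front and the .* behind, is dropped by the substitution.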
If your regex dialect supports {m,n} repetition, you can of course refactor the regular expression to use it. If you need a large number of repetitions, that is certainly more readable and more convenient. Just make sure you do not add parentheses in a way that disturbs the numbering of the backreferences (the first left parenthesis opens the first group \1, the second opens \2, etc.)
sed -r 's/([^ ]+([ ]+[^ ]+){2}).*/\1/' file
Notice how the second parenthesized group is required to delimit the scope of the {2} repetition (we want to repeat more than just the single character immediately before the left curly brace). The OP's attempt had the error of placing the repetition outside the last parenthesis; the backreference \1 (or whatever it is called in your dialect; TextMate seems to use $1, just like Perl) then refers only to the last single match of the parenthesized group, because the repetition lies outside the capturing parentheses and so is not part of the capture.
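The difference is easy to check on a made-up line. In the sketch below, the second command is my own reconstruction of the kind of mistake described, not the OP's actual expression; the variable names are likewise hypothetical, and -E is assumed to be available.

```shell
# Made-up sample line.
line='one two three four five'

# Outer group wraps the repetition, so \1 holds all three tokens.
inside=$(printf '%s\n' "$line" | sed -E 's/([^ ]+([ ]+[^ ]+){2}).*/\1/')

# Hypothetical faulty variant: the {2} sits outside the only group,
# so \1 keeps just the last repetition (a space plus the third token).
outside=$(printf '%s\n' "$line" | sed -E 's/[^ ]+([ ]+[^ ]+){2}.*/\1/')

printf '[%s]\n' "$inside" "$outside"
```

The square brackets in the printf format are only there to make the leading space in the second result visible.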