Regular expression for parsing tags from a string, Flickr style

I wonder if anyone can provide me with the regular expressions needed to parse a string, for example:

'foo bar "multiple word tag"'

into an array of tags, for example:

["foo", "bar", "multiple word tag"]

thanks

+3
6 answers

In Ruby

scan(/\"([\w ]+)\"|(\w+)/).flatten.compact

For example:

"foo bar \"multiple words\" party_like_1999".scan(/\"([\w ]+)\"|(\w+)/).flatten.compact
=> ["foo", "bar", "multiple words", "party_like_1999"]
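For comparison, here is a rough Python equivalent of the Ruby one-liner above. This is only a sketch: the name parse_tags is made up for illustration, and it relies on re.findall returning one tuple per match, with an empty string for the group that did not take part.

```python
import re

# Same pattern as the Ruby answer: a quoted run of word characters and
# spaces, or a single bare word.
TAG_PATTERN = re.compile(r'"([\w ]+)"|(\w+)')

def parse_tags(text):
    """Return the tags in text (parse_tags is a hypothetical helper name)."""
    # findall yields (quoted, bare) tuples; exactly one side is non-empty.
    return [quoted or bare for quoted, bare in TAG_PATTERN.findall(text)]

print(parse_tags('foo bar "multiple words" party_like_1999'))
# ['foo', 'bar', 'multiple words', 'party_like_1999']
```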
+7

You can implement a scanner for this. For example, in Python, it looks something like this:

import re
scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", lambda s,t:t),       # regular tag
    (r"\".*?\"",      lambda s,t:t[1:-1]), # multi-word-tag
    (r"\s+",          None),               # whitespace not in multi-word-tag
    ])
tags, _ = scanner.scan('foo bar "multiple word tag"')
print(tags)
# ['foo', 'bar', 'multiple word tag']

This is called lexical analysis.
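To make the lexical analysis idea concrete without relying on re.Scanner (which is not covered in the official re documentation), here is a minimal hand-written tokenizer sketch. The name tokenize_tags is hypothetical, and treating an unclosed quote as running to the end of the string is an assumption of this sketch, not something the question specifies.

```python
def tokenize_tags(text):
    """Split text into tags; a double-quoted run becomes a single tag."""
    tags = []
    current = []        # characters of the tag being built
    in_quotes = False   # True while inside a "..." section

    def flush():
        # Emit the tag collected so far, if any.
        if current:
            tags.append("".join(current))
            current.clear()

    for ch in text:
        if ch == '"':
            # A quote ends the current tag and toggles quoted mode.
            flush()
            in_quotes = not in_quotes
        elif ch.isspace() and not in_quotes:
            # Whitespace outside quotes separates tags.
            flush()
        else:
            current.append(ch)
    flush()  # emit whatever is left at the end of the input
    return tags

print(tokenize_tags('foo bar "multiple word tag"'))
# ['foo', 'bar', 'multiple word tag']
```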

+2

, split() /, . - , , ( ), , . ,

split('foo bar "multiple word tag"', ' ', 3)

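Here 3 is the maximum number of pieces to return. Then use trim() or strip() (whichever your language provides) to remove the quotes.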
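In Python the same idea might look like the sketch below. Note the assumptions: the multi-word tag comes last, the number of single-word tags before it is known (two here), and Python's maxsplit counts splits rather than resulting pieces, so the 3 above becomes 2.

```python
text = 'foo bar "multiple word tag"'

# maxsplit=2 performs at most two splits, giving at most three pieces.
pieces = text.split(' ', 2)

# Remove the surrounding double quotes from each piece.
tags = [piece.strip('"') for piece in pieces]

print(tags)
# ['foo', 'bar', 'multiple word tag']
```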

, , , , - , , . , ; , . , (-?) Perl- ( - , ):

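import re
string = 'foo bar "multiple word tag"'
tag_re = re.compile(r'\s*(".+?"|\S+)?\s*')
pos = 0
while pos < len(string):
    # tag_re.match(string, pos) starts matching at position pos
    m = tag_re.match(string, pos)
    if m.group(1) is not None:
        tag = m.group(1).strip('"')
        print(tag)  # process the tag
    pos = m.end()  # advance past this match, otherwise the loop never ends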

, , , , DFA ( ), , , ( - , ). , , , , ( ) DFA.

0

If your language can return all regex matches as an array, this pattern is enough:

(?<=")[^"]+|\w+


(Depending on which characters your tags may contain, \S+ might be a better fit than \w+.)


Ruby:

myarray = mystring.scan(/(?<=\")[^\"]+|\w+/)
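A Python rendering of the same pattern, as a sketch. One caveat worth checking before relying on it: the lookbehind is also satisfied right after a closing quote, so text that follows a quoted tag is picked up together with its leading space. The sample strings are made up for illustration.

```python
import re

# Either: one or more non-quote characters preceded by a double quote,
# or: a run of word characters.
pattern = re.compile(r'(?<=")[^"]+|\w+')

print(pattern.findall('foo bar "multiple word tag"'))
# ['foo', 'bar', 'multiple word tag']

# Caveat: right after the closing quote the lookbehind succeeds again,
# so the following word is captured with its leading space.
print(pattern.findall('"multiple word tag" foo'))
# ['multiple word tag', ' foo']
```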

()

0

( Perl):

^(?:"([^"]*?)"|(\S+?)|\s*?)*$

Explanation:

^                  // from the beginning
(?:                // non-capturing group of three alternatives
    "([^"]*?)"     // capture a quoted "tag"
  |
    (\S+?)         // capture a bare tag
  |
    \s*?           // ignore whitespace
)*
$                  // until the end of the line
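The same commented layout can be written directly in Python with re.VERBOSE. This sketch deliberately differs from the pattern above: in Python's re, a capture group inside a repeated group keeps only its last match, so instead of anchoring one big match with ^...$ it walks over the individual matches with finditer. The names TAG_RE and parse_tags are hypothetical.

```python
import re

TAG_RE = re.compile(r'''
    "([^"]*)"    # group 1: a quoted tag, quotes excluded
  |
    (\S+)        # group 2: a bare tag (any run of non-whitespace)
''', re.VERBOSE)

def parse_tags(text):
    """Collect one tag per match; whitespace between matches is skipped."""
    tags = []
    for m in TAG_RE.finditer(text):
        # Exactly one of the two groups took part in each match.
        tags.append(m.group(1) if m.group(1) is not None else m.group(2))
    return tags

print(parse_tags('foo bar "multiple word tag"'))
# ['foo', 'bar', 'multiple word tag']
```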
0

, . Regex . , ,

"^(?<username>[\w\d]+)@.*$"

" "

. . ", " ", ". , .

-1
