Regular expression - replace word except url / uri

Question

I’m writing a globalization module for a web application, and I need a regular expression that replaces all instances of a word with another word (its translation), except where the word appears inside a URL / URI.

EDIT: I forgot to mention that I'm using Ruby, so I can't use "Lookbehind"

url ruby regex replace word

Jose Fernandez Jan 29 '10 at 15:20

3 answers

You can probably use something like

(?<!://[^ ]*)\bfoo\b

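This probably isn't perfect: it only checks that the word is not preceded by :// within the same run of non-space characters.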

PS Home:\> "foo foobar http://foo_bar/baz?gak=foobar baz foo" -replace '(?<!://[^ ]*)\bfoo\b', 'FOO'
FOO foobar http://foo_bar/baz?gak=foobar baz FOO
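Since the question rules out lookbehind in Ruby, a similar heuristic can be sketched without it: let the pattern match any whitespace-delimited token containing :// first, and only otherwise match the word, then decide in the gsub block what to return. This is a minimal sketch, not part of the original answer; the helper name replace_outside_urls is hypothetical, and it protects the whole token (scheme included) rather than only the part after ://.

```ruby
# Hypothetical helper (not from the answer above): replaces whole-word `word`
# everywhere except inside whitespace-delimited tokens that contain "://".
def replace_outside_urls(text, word, replacement)
  # First alternative consumes an entire token containing "://";
  # second alternative matches the word on word boundaries.
  pattern = /\S*:\/\/\S*|\b#{Regexp.escape(word)}\b/
  # Tokens containing "://" are returned unchanged; bare words are replaced.
  text.gsub(pattern) { |m| m.include?('://') ? m : replacement }
end

puts replace_outside_urls("foo foobar http://foo_bar/baz?gak=foobar baz foo", "foo", "FOO")
# prints: FOO foobar http://foo_bar/baz?gak=foobar baz FOO
```

Because the URL alternative comes first, the regex engine swallows the whole URL before it gets a chance to match the word inside it.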

Joey Jan 29 '10 at 15:26

Have you tried splitting the text into words and iterating over them? Then you can check each word, determine whether it is a URI, and translate it if it is not.
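A minimal sketch of that idea in Ruby follows. The helper name translate_words and the translations hash are hypothetical, and the "contains ://" test is only a stand-in for a real URI check. Splitting on a captured whitespace pattern keeps the whitespace runs, so joining the tokens restores the original spacing.

```ruby
# Hypothetical helper: translates whitespace-delimited words, skipping URIs.
# `translations` is an assumed Hash mapping source words to translated words.
def translate_words(text, translations)
  # The capture group makes split keep the whitespace runs as tokens.
  text.split(/(\s+)/).map do |token|
    if token.include?('://')
      token                              # crude URI test: leave it alone
    else
      translations.fetch(token, token)   # translate if known, else keep as-is
    end
  end.join
end

puts translate_words("stack http://example.com/stack stack", "stack" => "pile")
# prints: pile http://example.com/stack pile
```

One limitation of this sketch: a word with punctuation attached, such as "stack,", is looked up as a single token and will not be translated.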

glenn jackman Jan 29 '10 at 17:11

Wayne Conrad · Accepted Answer · 2010-01-31T12:38:55+0000

Split the text on a regular expression that matches URIs, keeping the URIs in the result.
For each part:
- if it is a URI, leave it alone
- otherwise, replace words
Join the parts back together.

The code:

# From RFC 3986 Appendix B, with these modifications:
#   o Spaces disallowed
#   o All groups non-capturing, except for added outermost group
#   o Not anchored
#   o Scheme required
#   o Authority required
URI_REGEX = %r"((?:(?:[^ :/?#]+):)(?://(?:[^ /?#]*))(?:[^ ?#]*)(?:\?(?:[^ #]*))?(?:#(?:[^ ]*))?)"

def replace_except_uris(text, old, new)
  text.split(URI_REGEX).collect do |s|
    if s =~ URI_REGEX
      s
    else
      s.gsub(old, new)
    end
  end.join
end

text = <<END
stack http://www.stackoverflow.com stack
stack http://www.somewhere.come/stack?stack=stack#stack stack
END

puts replace_except_uris(text, /stack/, 'LINKED-LIST')

# => LINKED-LIST http://www.stackoverflow.com LINKED-LIST
# => LINKED-LIST http://www.somewhere.come/stack?stack=stack#stack LINKED-LIST
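The split step in replace_except_uris relies on a detail of String#split: when the pattern contains a capture group, the text matched by that group is included in the resulting array. That is why URI_REGEX has an outermost capturing group. The small sketch below illustrates the difference using a deliberately crude stand-in pattern (an assumption for brevity, not the RFC 3986 regex above); the method names are hypothetical.

```ruby
# Stand-in patterns for illustration only; far cruder than URI_REGEX.
SIMPLE_URI          = /\S+:\/\/\S+/     # no capture group
SIMPLE_URI_CAPTURED = /(\S+:\/\/\S+)/   # same pattern wrapped in a capture group

# Without a capture group, split discards the matched URIs.
def split_dropping_uris(text)
  text.split(SIMPLE_URI)
end

# With a capture group, split keeps the matched URIs as array elements.
def split_keeping_uris(text)
  text.split(SIMPLE_URI_CAPTURED)
end

text = "stack http://example.com/stack stack"
p split_dropping_uris(text)  # prints: ["stack ", " stack"]
p split_keeping_uris(text)   # prints: ["stack ", "http://example.com/stack", " stack"]
```

With the URIs kept as their own elements, each part can be tested and either passed through or rewritten before the final join.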
