Regex to match certain characters only when preceded by a space or nothing (beginning of line)

Consider the following tweets:

  • RT @username This is my tweet
  • Check this! RT @username This is my tweet
  • I have PART 2 downloaded

In a preg_replace() call, I use a regular expression to replace RT (the common retweet syntax) with {RT}. It almost works, but it also matches the RT in PART in the last tweet:

  • I have PART 2 downloaded becomes I have PA{RT} 2 downloaded

I want the regex to allow only nothing (the beginning of a line) or a space (U+0020) before RT.

Current call to preg_replace():

 echo preg_replace("/RT(?=\s)/", '{RT}', $tweet);

4 answers

Add (^|[ ]) before RT in your regular expression to match either the beginning of a line or a space. Add more characters between the square brackets to allow them as well (for example, (^|[ _]) to also match underscores).

Explanation

  • ^ matches the start of a line
  • [ ] matches a space (U+0020) (or any other character placed between [ and ])
  • ( and ) make a group
  • | between ( and ) means "or"

So...

  • (^|[ ]) means a group that is either the beginning of a line or a space (U+0020)

New regex

 echo preg_replace("/(^|[ ])RT(?=\s)/", '$1{RT}', $tweet);
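As a rough sketch of how this behaves on the three tweets from the question, assuming the pattern is written as /(^|[ ])RT(?=\s)/ (the function name mark_retweets is made up for illustration):

```php
<?php
// Hypothetical helper, not part of the original answer.
function mark_retweets(string $tweet): string
{
    // Group 1 captures what precedes RT: an empty string at the start of
    // the subject, or a single space. '$1' puts it back in the result.
    return preg_replace('/(^|[ ])RT(?=\s)/', '$1{RT}', $tweet);
}

// Sample tweets taken from the question.
$tweets = [
    'RT @username This is my tweet',
    'Check this! RT @username This is my tweet',
    'I have PART 2 downloaded',
];

foreach ($tweets as $tweet) {
    echo mark_retweets($tweet), PHP_EOL;
}
```

The first two tweets should come out with {RT} in place of RT, while PART in the third is left alone because the RT there is preceded by a letter. Without the m modifier, ^ only matches at the very start of the string, which is enough when each tweet is processed separately.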

Note: @DVK mentioned that it is bad practice to match only the beginning of a line and a space (and not word boundaries). Since the OP asked for specific characters, matching on word boundaries is not technically correct here. However, since @DVK does have a point, I would like to mention that using (\b) instead of (^|[ ]) will in many cases give results that better fit your idea of "correct" (for example, "Awesome, RT Some tweet"). Keep in mind that this note was added after the answer was accepted and is not part of the answer to this specific question; it is provided only for those who come across this answer with a similar but different problem.

Use \b for the word boundary: \bRT\b
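A minimal sketch of the word-boundary approach, for comparison (the function name mark_rt_word and the second sample string are assumptions made for illustration):

```php
<?php
// Hypothetical helper, not part of the original answer.
function mark_rt_word(string $tweet): string
{
    // \b matches between a word character (letter, digit, underscore)
    // and a non-word character, or at either end of the string.
    return preg_replace('/\bRT\b/', '{RT}', $tweet);
}

// PART is left alone: there is no word boundary between A and R.
echo mark_rt_word('I have PART 2 downloaded'), PHP_EOL;

// RT directly after a comma is matched, because a comma is not a word character.
echo mark_rt_word('Awesome,RT @username'), PHP_EOL;
```

This is looser than what the question asks for: any non-word character (punctuation included) on either side of RT counts as a boundary, not just a space or the start of the line.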

Edited: ^\s*RT

will match any line that starts with RT, optionally preceded by whitespace.

I think the best way to detect RT is to use a regular expression that matches RT (space) @username. That means you end up with something like

 #RT\s@([a-zA-Z0-9_]+)#

Of course, you will need to change [a-zA-Z0-9_]+ depending on which characters are allowed in a username. Twitter allows letters, numbers, and underscores, so this regular expression works fine.
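A small sketch of how that pattern could be used with preg_match() to pull out the retweeted username (the function name retweeted_user is made up for illustration):

```php
<?php
// Hypothetical helper, not part of the original answer.
// Returns the retweeted username, or null when the tweet contains
// no "RT @username" sequence.
function retweeted_user(string $tweet): ?string
{
    // '#' is used as the delimiter; group 1 captures the username.
    if (preg_match('#RT\s@([a-zA-Z0-9_]+)#', $tweet, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

var_dump(retweeted_user('Check this! RT @username This is my tweet'));
var_dump(retweeted_user('I have PART 2 downloaded'));
```

Note that the pattern on its own does not check what comes before RT, so a word ending in RT followed by a space and an @mention would still match. Prefixing it with (^|[ ]) is one way to tighten that.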
