I would use some kind of tokenizer to distinguish between comments and the other tokens of the language.
Since you are processing PHP files, you should use PHP's own token_get_all:
$tokens = token_get_all($source);
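Then you can iterate over the tokens and pick them out by type. Note that single-character tokens such as ";" or "{" are returned as plain strings rather than arrays, and that T_ML_COMMENT only existed in PHP 4, so it is left out here:
foreach ($tokens as &$token) {
    if (!is_array($token)) {
        // single-character token (";", "{", ...): nothing to replace
        continue;
    }
    if (in_array($token[0], array(T_COMMENT, T_DOC_COMMENT), true)) {
        // comment: leave it untouched
    } else {
        // not a comment
        $token[1] = str_replace('example.com', 'example.net', $token[1]);
    }
}
unset($token); // break the reference left over from the loop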
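At the end, put everything back together: take $token[1] for array tokens and the token itself for plain-string tokens, then join the pieces with implode.
For other languages for which you do not have a suitable tokenizer, you can write your own small tokenizer: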
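The steps above can be combined into one function. This is only a minimal sketch: the function name replaceOutsideComments and the sample source are made up for illustration, and it assumes the tokenizer extension is available (it is enabled by default).

```php
<?php
// Hypothetical helper: replaces $search with $replace everywhere except in comments.
function replaceOutsideComments(string $source, string $search, string $replace): string
{
    $output = '';
    foreach (token_get_all($source) as $token) {
        if (is_string($token)) {
            // Single-character tokens such as ";" or "{" come back as plain strings.
            $output .= $token;
        } elseif ($token[0] === T_COMMENT || $token[0] === T_DOC_COMMENT) {
            // Comment: copy it unchanged.
            $output .= $token[1];
        } else {
            // Any other token: replace inside its text.
            $output .= str_replace($search, $replace, $token[1]);
        }
    }
    return $output;
}

// Sample input, made up for this example.
$source = "<?php\n// see example.com\n\$url = 'http://example.com/';\n";
echo replaceOutsideComments($source, 'example.com', 'example.net');
```

Building the output string inside the loop avoids both the by-reference foreach and a separate implode step.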
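// The "s" modifier lets "." match newlines, so block comments may span lines
// and newlines survive as single-character tokens; "//" comments stop at the line end.
preg_match_all('~/\*.*?\*/|//[^\n]*|(example\.com)|.~s', $code, $tokens, PREG_SET_ORDER);
foreach ($tokens as &$token) {
    if (isset($token[1])) {
        // group 1 matched: example.com outside a comment
        $token = str_replace('example.com', 'example.net', $token[1]);
    } else {
        // a comment or any other single character: keep as is
        $token = $token[0];
    }
}
unset($token); // break the reference left over from the loop
$code = implode('', $tokens);
Note that this does not account for any other tokens, such as strings. So example.com will not be replaced if it appears inside a string after something that merely looks like the start of a comment: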
'foo bar'
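If that limitation matters, one possible extension is to treat quoted strings as tokens too, so that a "//" inside a string no longer starts a comment. The following is only a sketch under assumptions: the function name replaceOutsideCommentsRegex is made up, the target language is assumed to use C-style comments and backslash-escaped single- or double-quoted strings, and constructs such as heredocs or regex literals are not handled.

```php
<?php
// Hypothetical helper; the pattern is an illustrative sketch, not a full lexer.
function replaceOutsideCommentsRegex(string $code): string
{
    // Alternatives, in order: block comment, line comment,
    // single-quoted string, double-quoted string, the search text itself.
    $pattern = <<<'REGEX'
~/\*.*?\*/|//[^\n]*|'(?:\\.|[^'\\])*'|"(?:\\.|[^"\\])*"|example\.com~s
REGEX;

    return preg_replace_callback($pattern, function (array $m): string {
        $text = $m[0];
        if (strncmp($text, '/*', 2) === 0 || strncmp($text, '//', 2) === 0) {
            return $text; // comment: leave it untouched
        }
        // String literal or bare occurrence: replace inside it.
        return str_replace('example.com', 'example.net', $text);
    }, $code);
}

// Sample input, made up for this example.
$code = "\$a = 'see // example.com'; // example.com\n";
echo replaceOutsideCommentsRegex($code);
```

Using preg_replace_callback instead of preg_match_all plus implode means unmatched text is passed through automatically, so the catch-all "." alternative is no longer needed.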