Let me start by saying that what I describe below is not exactly what happens internally, but I will simplify it for clarity.
Imagine that two evaluations are performed when you use regular expressions: the first by PHP and the second by PCRE, as if they were separate mechanisms. And unfortunately for us, PHP and PCRE EVALUATE THINGS DIFFERENTLY.
We have 3 "guys" here: 1) the USER; 2) PHP; and 3) PCRE.
The USER interacts with PHP by writing the CODE in the code editor. PHP then evaluates this CODE and sends a different piece of information to PCRE. That piece of information is not the same as what you typed in your CODE. PCRE then evaluates it and returns something to PHP, which evaluates this answer and returns something to the USER.
The example below should make this clearer. In it I use the backslash ("\") to show what is happening.
Assume this bit of CODE in a PHP file:
    <?php
    $sub = "A backslash \ in a string";

    $pat1 = "#\#";
    $pat2 = "#\\#";
    $pat3 = "#\\\#";
    $pat4 = "#\\\\#";

    echo "sub: ".$sub;
    echo "\n\n";
    echo "pat1: ".$pat1;
    echo "\n";
    echo "pat2: ".$pat2;
    echo "\n";
    echo "pat3: ".$pat3;
    echo "\n";
    echo "pat4: ".$pat4;
    ?>
This will print:
    sub: A backslash \ in a string

    pat1: #\#
    pat2: #\#
    pat3: #\\#
    pat4: #\\#
There is no regular expression in this example, so only PHP evaluates the code. PHP leaves a backslash as is unless it precedes a special character. Therefore, it correctly prints the backslash in $sub.
PHP evaluates $pat1 and $pat2 to EXACTLY the same string, because in $pat1 the backslash is left as is, and in $pat2 the first backslash escapes the second, resulting in a single backslash.
Now, in $pat3, the first backslash escapes the second, resulting in a single backslash. PHP then evaluates the third backslash and leaves it as is, because it does not precede anything special. The result is two backslashes.
Now someone may say: "But now we have two backslashes again! Shouldn't the first one escape the second?" The answer is no. After PHP has collapsed the first two backslashes into one, it does not look back; it simply continues evaluating what comes next.
At this point, you already know what happens with $pat4: the first backslash escapes the second, and the third escapes the fourth, leaving two at the end.
Now that we have figured out what PHP does with these strings, let's add some more code after the previous one.
    if (preg_match($pat1, $sub))
        echo "test1: true";
    else
        echo "test1: false";
    echo "\n";

    if (preg_match($pat2, $sub))
        echo "test2: true";
    else
        echo "test2: false";
    echo "\n";

    if (preg_match($pat3, $sub))
        echo "test3: true";
    else
        echo "test3: false";
    echo "\n";

    if (preg_match($pat4, $sub))
        echo "test4: true";
    else
        echo "test4: false";
And the result:
    test1: false
    test2: false
    test3: true
    test4: true
So what happens here is that PHP does not send "what you typed" in the CODE directly to PCRE. Instead, PHP sends what it evaluated earlier (which is exactly what we saw above).
For test1 and test2, even though we wrote a different pattern in the CODE for each test, PHP sends the same pattern #\# to PCRE. The same thing happens for test3 and test4: PHP sends #\\#. Thus the results for test1 and test2 are the same, and so are the results for test3 and test4.
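One way to check this claim without involving PCRE at all is to compare the strings PHP ends up with. This is only a small sketch that reuses the variable names from the example; the comparisons and strlen() calls are my own addition.

```php
<?php
// Same four patterns as in the example.
$pat1 = "#\#";
$pat2 = "#\\#";
$pat3 = "#\\\#";
$pat4 = "#\\\\#";

// Strict comparison of the strings PHP actually holds in memory.
var_dump($pat1 === $pat2); // bool(true)
var_dump($pat3 === $pat4); // bool(true)

// Lengths in bytes: two delimiters plus the surviving backslashes.
var_dump(strlen($pat1)); // int(3)
var_dump(strlen($pat3)); // int(4)
```

Whatever PCRE does afterwards, it cannot tell $pat1 from $pat2, or $pat3 from $pat4.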
Now, what happens when PCRE evaluates these patterns? PCRE does not work like PHP.
In test1 and test2, when PCRE sees a single backslash that precedes nothing special (or nothing at all), it does not leave it as is. Instead, it probably thinks "What the hell is this?" and returns an error to PHP. (I searched but never found a definitive description of what happens to a lone backslash in general; in this particular example the backslash appears to escape the closing # delimiter, so the pattern is malformed and preg_match() raises a warning and returns false.) PHP then treats that result as "false" and hands it to the rest of the code (in this example, the if statement).
In test3 and test4, everything happens as we now expect: PCRE evaluates the first backslash as escaping the second, which results in one literal backslash. This, of course, matches the $sub string and returns a "success message" to PHP, which evaluates it as "true".
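To see the difference between "the pattern failed" and "the pattern did not match", it can help to look at the raw return values instead of an if/else. This is a sketch under the assumption that a malformed pattern makes preg_match() return false; the variable names $r0, $r1 and $r3 are only illustrative.

```php
<?php
$sub = "A backslash \ in a string";

// "@" suppresses the warning PHP raises for the malformed pattern.
$r1 = @preg_match("#\#", $sub);

// Valid pattern (PCRE receives #\\#), and the subject contains a backslash.
$r3 = preg_match("#\\\#", $sub);

// Same valid pattern, but the subject has no backslash in it.
$r0 = preg_match("#\\\#", "no backslash here");

var_dump($r1); // bool(false): the pattern could not be used
var_dump($r3); // int(1): one match found
var_dump($r0); // int(0): valid pattern, no match
```

Both false and 0 take the else branch of an if, which is why test1 and test2 simply print "false" even though the real problem is the pattern itself.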
ANSWERED QUESTIONS
Some characters are special for PHP when preceded by a backslash (for example, \n for NEW LINE, \t for TAB).
Some characters are special for PCRE (for example, . (period) to match any character, \s to match whitespace).
And some characters are special for both (for example, $ is the start of a variable name for PHP, and the end of the subject for PCRE).
This is why you need to escape a newline only once, as in \n. PHP will evaluate it as a REAL NEW LINE character and send that to PCRE.
For a period, if you want to match that particular character, you must use \. and PHP will not do anything with it, because the dot is not a special character in a PHP string. Instead, PHP sends it as is to PCRE. PCRE then "sees" the backslash before the dot and understands that it must match a literal dot. If you use a double escape, \\., the first backslash escapes the second in PHP, leaving you with the same result.
And if you want to match a dollar sign in the string, you should use \\\$. In PHP, the first backslash escapes the second, leaving one backslash. Then the third backslash escapes the dollar sign. The result is \$. This is what PCRE receives. PCRE sees that backslash and understands that the dollar sign does not assert the end of the subject, but is a literal.
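The three cases above (newline, period, dollar sign) can be tried out with a short script. This is just a sketch: the subject strings and the variable names are made up for illustration, and the patterns are written in double quotes as in the rest of this answer.

```php
<?php
// Newline: PHP turns "\n" into a real newline before PCRE sees it.
$newline = preg_match("#\n#", "line one\nline two");

// Period: PHP leaves "\." alone, and "\\." collapses to the same string.
$dotSame = ("#\.#" === "#\\.#");
$dotHit  = preg_match("#\.#", "a.b");
$dotMiss = preg_match("#\.#", "ab");

// Dollar sign: "\\\$" reaches PCRE as \$ (a literal dollar).
// The subjects use single quotes so PHP does not touch the $ itself.
$dollarHit  = preg_match("#\\\$#", 'cost: $5');
$dollarMiss = preg_match("#\\\$#", 'cost: 5');

var_dump($newline);    // int(1)
var_dump($dotSame);    // bool(true)
var_dump($dotHit);     // int(1)
var_dump($dotMiss);    // int(0): an unescaped . would have matched here
var_dump($dollarHit);  // int(1)
var_dump($dollarMiss); // int(0)
```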
QUOTES
And now we come to quotes. The problem with them is that PHP evaluates a string differently depending on the quotes used to enclose it. See the "Strings" page of the PHP manual.
Everything I have said up to this point applies to double quotes. If you write '\n' in single quotes, PHP will treat the backslash as a literal character. But if that string is used as a regular expression, PCRE will receive it as is. And since \n is also special for PCRE, it will interpret it as a newline and BOOM, it "magically" matches a newline in the string. Check the escape sequences here: Escape Sequences
As I said at the beginning, things do not work exactly the way I have explained them here, but I really hope this helps (and does not make it more confusing than it already is).
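A small sketch of the single-quote case, comparing the same pattern written with both kinds of quotes. The variable names $double, $single and $subject are only illustrative.

```php
<?php
$double = "#\n#"; // PHP converts \n: the pattern holds a real newline
$single = '#\n#'; // PHP keeps backslash + n: PCRE does the conversion

// The two strings are different as far as PHP is concerned...
var_dump(strlen($double)); // int(3)
var_dump(strlen($single)); // int(4)

// ...but both end up matching a newline in the subject.
$subject = "first\nsecond";
$hitDouble = preg_match($double, $subject);
$hitSingle = preg_match($single, $subject);

var_dump($hitDouble); // int(1)
var_dump($hitSingle); // int(1)
```

The result is the same, but the newline is produced by a different "guy" in each case: PHP in the first, PCRE in the second.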