Interpret escape characters in single quotes

I have a single-quoted string:

$content = '\tThis variable is not set by me.\nCannot do anything about it.\n'; 

I would like to interpret / process the string as if it were double-quoted. In other words, I would like to replace all possible escape sequences (not only tab and linefeed, as in this example) with their real values, taking into account that the backslash itself can also be escaped, so "\\n" needs to be replaced by "\n". eval() would easily do what I need, but I cannot use it.

Is there any simple solution?

(A similar thread that I found relates to expanding variables in a single quoted string, while I am after replacing escape characters.)

+8
string php escaping
3 answers

There is a very simple way to do this, based on preg_replace and stripcslashes, both built in:

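 // preg_replace_callback stands in for preg_replace's /e modifier,
 // which was removed in PHP 7.
 $result = preg_replace_callback(
     '/\\\\([nrtvf\\\\$"]|[0-7]{1,3}|x[0-9A-Fa-f]{1,2})/',
     function ($m) { return stripcslashes($m[0]); },
     $content
 );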

This works as long as "\\n" should become "\n" and the like.

If you need those strings processed literally instead, see my other answer.

Edit: You asked in a comment:

I am a little puzzled about what the difference is between the output of this and of calling stripcslashes() directly [?]

The difference is not always visible, but there is one: stripcslashes removes the \ character even if no valid escape sequence follows it. In PHP strings, the backslash is not dropped in that case. For example, in "\d", d is not a special character, so PHP keeps the backslash:

 $content = '\d';
 $content;                       # \d
 stripcslashes($content);        # d
 preg_replace(..., $content);    # \d

That's why preg_replace is useful here: it applies the function only to those substrings where stripcslashes works as intended, that is, to all valid escape sequences.
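To make the difference concrete, here is a minimal sketch that runs both approaches over the same input. It assumes PHP 5.3 or later (closures) and uses preg_replace_callback; the helper name unescape_valid is made up for this illustration.

```php
<?php
// Hypothetical helper: unescape only the sequences matched by the pattern
// and leave every other backslash untouched.
function unescape_valid($s)
{
    return preg_replace_callback(
        '/\\\\([nrtvf\\\\$"]|[0-7]{1,3}|x[0-9A-Fa-f]{1,2})/',
        function ($m) {
            return stripcslashes($m[0]);
        },
        $s
    );
}

// Single quotes: \t and \d reach the functions as two characters each.
$content = 'a\tb \d c';

// stripcslashes drops the backslash in front of d.
var_dump(stripcslashes($content) === "a\tb d c");

// The regex-gated version keeps it, like a double-quoted PHP string would.
var_dump(unescape_valid($content) === "a\tb \\d c");
```

Both comparisons should come out true: only the second approach preserves the backslash in front of d.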

+5

If you need to handle escape sequences exactly as PHP does, you need the long version, which is the DoubleQuoted class. I extended the input string a bit to cover more escape sequences than in your question, to make this more general:

 $content = '\\\\t\tThis variable\\string is\x20not\40set by me.\nCannot \do anything about it.\n';
 $dq = new DoubleQuoted($content);
 echo $dq;

Output:

 \\t     This variable\string is not set by me.
 Cannot \do anything about it.

However, if a close approximation is good enough for you, there is a PHP function called stripcslashes. For comparison, I've added its result and that of a PHP double-quoted string:

 echo stripcslashes($content), "\n";

 $compare = "\\\\t\tThis variable\\string is\x20not\40set by me.\nCannot \do anything about it.\n";
 echo $compare, "\n";

Output:

 \t      This variablestring is not set by me.
 Cannot do anything about it.

 \\t     This variable\string is not set by me.
 Cannot \do anything about it.

As you can see, stripcslashes discards some characters here compared to the original PHP output.

(Edit: See also my other answer, which offers something simple and sweet with stripcslashes and preg_replace.)

If stripcslashes does not fit, there is DoubleQuoted. Its constructor accepts a string that is treated like a double-quoted string (minus variable substitution; only the character escape sequences are handled).

As the manual outlines, there are several escape sequences. They are described much like regular expressions and all start with \, so it is natural to use regular expressions to replace them.

However, there is one exception: \\ escapes the escape sequence. A regular expression would need backtracking control and/or atomic groups to handle this, and I'm not fluent with those, so I used a simple trick: I apply the regular expressions only to those parts of the string that do not contain \\, by exploding the string first and imploding it again afterwards.

The two regular-expression replace functions, preg_replace and preg_replace_callback, also accept arrays, so this is quite easy to do.
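The trick can be sketched in a few lines. This is only an illustration with a single pattern for \n and \t, and the function name unescape_parts is hypothetical; the class itself uses more patterns.

```php
<?php
// Hypothetical illustration of the split / replace / join idea.
function unescape_parts($s)
{
    // Two literal backslashes; written as '\\\\' in a single-quoted literal.
    $sep = '\\\\';

    // After splitting, none of the parts contains an escaped backslash.
    $parts = explode($sep, $s);

    // preg_replace_callback accepts an array subject and returns an array.
    $parts = preg_replace_callback(
        '/\\\\([nt])/',
        function ($m) {
            return $m[1] === 'n' ? "\n" : "\t";
        },
        $parts
    );

    // Put the escaped backslashes back, unchanged.
    return implode($sep, $parts);
}

// Input characters: a \ \ n b \ n c
echo unescape_parts('a\\\\nb\nc');
```

The first n is preceded by an escaped backslash and stays as it is, while the second \n becomes a real newline. As in the class, the \\ pair itself is put back unchanged.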

This is done in the __toString() method:

 class DoubleQuoted
 {
     ...
     private $string;

     public function __construct($string)
     {
         $this->string = $string;
     }
     ...
     public function __toString()
     {
         $this->exception = NULL;
         $patterns = $this->getPatterns();
         $callback = $this->getCallback();
         $parts = explode('\\\\', $this->string);
         try {
             $parts = preg_replace_callback($patterns, $callback, $parts);
         } catch (Exception $e) {
             $this->exception = $e;
             return FALSE; # provoke exception
         }
         return implode('\\\\', $parts);
     }
     ...

See explode and implode. They make sure that preg_replace_callback never operates on a string that contains \\, so the replace operation is relieved of the burden of dealing with that special case. The following is the callback function that preg_replace_callback invokes for each pattern match. I wrapped it in a closure so that it is not publicly accessible:

 private function getCallback()
 {
     $map = $this->map;
     return function($matches) use ($map) {
         list($full, $type, $number) = $matches += array('', NULL, NULL);

         if (NULL === $type)
             throw new UnexpectedValueException(sprintf('Match was %s', $full));

         if (NULL === $number)
             return isset($map[$type]) ? $map[$type] : '\\'.$type;

         switch ($type) {
             case 'x':
                 return chr(hexdec($number));
             case '':
                 return chr(octdec($number));
             default:
                 throw new UnexpectedValueException(sprintf('Match was %s', $full));
         }
     };
 }

You need some additional information to understand it, as this is not the complete class yet. I'll go through the missing points and add the missing code as well:

All patterns the class "seeks" contain at least one subgroup. That group ends up in $type and is either the single character to be translated, an empty string for octal numbers, or x for hexadecimal numbers.

The optional second group, $number, is either not set (NULL) or contains the octal/hexadecimal number. The $matches input is normalized into the variables just named on this line:

 list($full, $type, $number) = $matches += array('', NULL, NULL); 
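A small sketch of what that line does, assuming the callback receives either two or three elements depending on which pattern matched (the sample arrays are made up). The array union operator += only adds keys that are missing on the left-hand side:

```php
<?php
// A match for a single-character escape has only two elements,
// so index 2 is filled in with NULL.
$matches = array('\n', 'n');
list($full, $type, $number) = $matches += array('', NULL, NULL);
var_dump($full, $type, $number);

// A hex (or octal) match already has three elements; nothing is overwritten.
$matches = array('\x41', 'x', '41');
list($full, $type, $number) = $matches += array('', NULL, NULL);
var_dump($full, $type, $number);
```

This is why the callback can test NULL === $number to tell single-character escapes apart from numeric ones.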

Patterns are predefined as sequences in a private member variable:

 private $sequences = array(
     '(n|r|t|v|f|\\$|")',       # single escape characters
     '()([0-7]{1,3})',          # octal
     '(x)([0-9A-Fa-f]{1,2})',   # hex
 );

The getPatterns() function simply wraps these definitions into valid PCRE regular expressions, namely:

 /\\(n|r|t|v|f|\$|")/           # single escape characters
 /\\()([0-7]{1,3})/             # octal
 /\\(x)([0-9A-Fa-f]{1,2})/      # hex

It is pretty simple:

 private function getPatterns()
 {
     foreach ($this->sequences as $sequence)
         $patterns[] = sprintf('/\\\\%s/', $sequence);

     return $patterns;
 }

Now that the patterns are laid out, this explains what $matches contains when the callback function is called.
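To see what the callback receives, the generated patterns can be run through preg_match directly. This is just a sketch; the three sample inputs are made up for illustration.

```php
<?php
// The three patterns as produced by getPatterns().
$patterns = array(
    '/\\\\(n|r|t|v|f|\\$|")/',
    '/\\\\()([0-7]{1,3})/',
    '/\\\\(x)([0-9A-Fa-f]{1,2})/',
);

// One sample per pattern, in single quotes so that the backslash
// sequences reach the regex engine unprocessed.
$samples = array('\t', '\40', '\x20');

foreach ($patterns as $i => $pattern) {
    preg_match($pattern, $samples[$i], $matches);
    var_dump($matches);
}
```

The first pattern yields only two elements (full match and type), while the octal and hex patterns yield three, with an empty string or x as the type.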

Another thing you need to know to understand how the callback works is $map . This is just an array containing single replacement characters:

 private $map = array(
     'n' => "\n",
     'r' => "\r",
     't' => "\t",
     'v' => "\v",
     'f' => "\f",
     '$' => '$',
     '"' => '"',
 );

And that is pretty much it for the class. There is another private variable, $this->exception, which records whether an exception has been thrown: __toString() cannot throw exceptions (in PHP versions before 7.4), and an exception escaping from the callback function would otherwise result in a fatal error. So the exception is caught and stored in a private class variable. Here is that part of the code again:

 ...
 public function __toString()
 {
     $this->exception = NULL;
     ...
     try {
         $parts = preg_replace_callback($patterns, $callback, $parts);
     } catch (Exception $e) {
         $this->exception = $e;
         return FALSE; # provoke exception
     }
     ...

In case of an exception while replacing, the function exits with FALSE, which leads to a catchable error. A getter function makes the internal exception available:

 private $exception;
 ...
 public function getException()
 {
     return $this->exception;
 }

To easily access the original string, you can add another getter for it:

 public function getString()
 {
     return $this->string;
 }

And that's the whole class. Hope this is helpful.

+5

A regular expression solution is probably the most maintainable option here (the definitions of valid escape sequences in strings are even given as regular expressions in the documentation):

 $content = '\tThis variable is not set by me.\nCannot do anything about it.\n';

 $replaced = preg_replace_callback(
     '/\\\\(\\\\|n|r|t|v|f|\$|"|[0-7]{1,3}|x[0-9A-Fa-f]{1,2})/',
     'replacer',
     $content
 );

 var_dump($replaced);

 function replacer($match)
 {
     $map = array(
         '\\\\' => "\\",
         '\\n'  => "\n",
         '\\r'  => "\r",
         '\\t'  => "\t",
         '\\v'  => "\v",
         '\\f'  => "\f",
         '\\$'  => '$',
         '\\"'  => '"',
     );

     $match = $match[0]; // So that $match is a scalar, the full matched pattern

     if (isset($map[$match])) {
         return $map[$match];
     }

     // Otherwise it's octal or hex notation
     if ($match[1] == 'x') {
         return chr(hexdec(substr($match, 2)));
     } else {
         return chr(octdec(substr($match, 1)));
     }
 }

The above can also (and really should) be improved:

  • Package the replacer function as an anonymous function instead
  • Possibly replace $map with a switch for a small performance gain
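
A minimal sketch of both suggestions, assuming PHP 5.3 or later for closures. It is one possible arrangement rather than the answerer's own code, and it also covers \f, \$ and \".

```php
<?php
$content = '\tThis variable is not set by me.\nCannot do anything about it.\n';

$replaced = preg_replace_callback(
    '/\\\\(\\\\|n|r|t|v|f|\$|"|[0-7]{1,3}|x[0-9A-Fa-f]{1,2})/',
    function ($match) {
        $seq = $match[1]; // the part after the leading backslash

        switch ($seq) {
            case '\\': return "\\";
            case 'n':  return "\n";
            case 'r':  return "\r";
            case 't':  return "\t";
            case 'v':  return "\v";
            case 'f':  return "\f";
            case '$':  return '$';
            case '"':  return '"';
        }

        // Otherwise it is hex (x followed by digits) or octal notation.
        if ($seq[0] === 'x') {
            return chr(hexdec(substr($seq, 1)));
        }
        return chr(octdec($seq));
    },
    $content
);

var_dump($replaced);
```

Working on the captured group instead of the full match keeps the case labels short; whether the switch is measurably faster than the array lookup would need to be benchmarked.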
0