Using JSON.decode for this has significant flaws that you should be aware of:
- You must enclose the string in double quotation marks
- Many characters are not supported and must themselves be escaped. For example, passing any of the following strings to JSON.decode (after wrapping them in double quotes) will result in an error, even though they are all valid: \\n , \n , \\0 , a"a
- It does not support hexadecimal escapes: \\x45
- It does not support Unicode code point escapes: \\u{045}
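To make a few of these failures concrete, here is a minimal sketch. It assumes that JSON.decode above refers to the standard JSON.parse, and tryJsonUnescape is a hypothetical helper name used only for illustration. String.raw keeps the backslashes literal, so each sample is the raw escape text rather than the decoded character.

```javascript
// Assumption: "JSON.decode" in the text refers to the standard JSON.parse.
// Hypothetical helper: returns the parsed string, or null if parsing throws.
function tryJsonUnescape(raw) {
  try {
    return JSON.parse('"' + raw + '"');
  } catch (err) {
    return null;
  }
}

// String.raw keeps the backslashes literal, so these are the raw escape texts.
const samples = [
  String.raw`\u0041`, // four-digit Unicode escape: valid in JSON
  String.raw`\x45`,   // hexadecimal escape: not part of JSON
  String.raw`\u{45}`, // code point escape: not part of JSON
  String.raw`\0`,     // null escape: not part of JSON
  'a"a',              // the unescaped quote ends the JSON string early
];

for (const s of samples) {
  console.log(JSON.stringify(s), "->", tryJsonUnescape(s));
}
```

Only the first sample should decode; the others should come back as null because JSON.parse throws on them.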
There are other caveats as well. Essentially, using JSON.decode for this purpose is a hack and does not always work the way you would expect. You should stick to using the JSON library to handle JSON, not for general string operations.
I recently ran into this problem myself and wanted a reliable decoder, so I wrote one. It is thoroughly tested and available here: https://github.com/iansan5653/unraw . It follows the JavaScript standard as closely as possible.
Explanation:
The source is about 250 lines, so I won't include all of it here. In essence, it uses the following regular expression to find every escape sequence, then parses each one with parseInt(string, 16) to decode the base-16 number and String.fromCodePoint(number) to get the corresponding character:
/\\(?:(\\)|x([\s\S]{0,2})|u(\{[^}]*\}?)|u([\s\S]{4})\\u([^{][\s\S]{0,3})|u([\s\S]{0,4})|([0-3]?[0-7]{1,2})|([\s\S])|$)/g
Commented (NOTE: this regular expression matches all escape sequences, including invalid ones. If a string would throw an error in JS, it will throw an error in my library, e.g. '\x!!' will fail):
/
    \\ # All escape sequences start with a backslash
    (?: # Starts a group of 'or' statements
        (\\) # If a second backslash is encountered, stop there (it is an escaped backslash)
        | # or
        x([\s\S]{0,2}) # Match valid hexadecimal sequences
        | # or
        u(\{[^}]*\}?) # Match valid code point sequences
        | # or
        u([\s\S]{4})\\u([^{][\s\S]{0,3}) # Match surrogate code points which get parsed together
        | # or
        u([\s\S]{0,4}) # Match non-surrogate Unicode sequences
        | # or
        ([0-3]?[0-7]{1,2}) # Match deprecated octal sequences
        | # or
        ([\s\S]) # Match anything else ('.' does not match newlines)
        | # or
        $ # Match the end of the string
    ) # End the group of 'or' statements
/g # Match as many instances as there are
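As a rough illustration of the parseInt and String.fromCodePoint approach, here is a much simplified sketch. It is not the library's code: SIMPLE_ESCAPE and simpleUnescape are hypothetical names, the pattern only handles \\, \xHH, \u{...} and \uHHHH, and (unlike the full expression) it silently leaves anything else untouched instead of throwing on invalid sequences.

```javascript
// Simplified pattern: handles only \\, \xHH, \u{...} and \uHHHH.
// Anything it does not recognise is left in the string unchanged.
const SIMPLE_ESCAPE =
  /\\(?:(\\)|x([0-9a-fA-F]{2})|u\{([0-9a-fA-F]+)\}|u([0-9a-fA-F]{4}))/g;

// Hypothetical helper name; not part of the unraw API.
function simpleUnescape(raw) {
  return raw.replace(SIMPLE_ESCAPE, (match, backslash, hex, codePoint, unicode) => {
    // An escaped backslash becomes a single backslash.
    if (backslash !== undefined) return "\\";
    // Exactly one of the three digit groups is defined for any other match.
    const digits = hex || codePoint || unicode;
    // Decode the base-16 digits, then map the number to its character.
    // String.fromCodePoint throws a RangeError for values above 0x10FFFF.
    return String.fromCodePoint(parseInt(digits, 16));
  });
}

// String.raw keeps the backslashes literal, so this is the raw escape text.
console.log(simpleUnescape(String.raw`\x45 \u{045} \u0045`));
```

A real decoder also has to deal with octal escapes, single-character escapes such as \n, and malformed input, which is where most of the remaining code goes.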
Example
Using this library:
import unraw from "unraw";
let step1 = unraw('http\\u00253A\\u00252F\\u00252Fexample.com');
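Since \u0025 is the percent sign, the decoded string is still percent-encoded. Assuming the goal is to recover the original URL, a second step with the built-in decodeURIComponent would finish the job. In this sketch step1 is written out as a literal (the value I would expect from the call above) so that it runs without the library installed.

```javascript
// Assumed value of step1 once the \u0025 escapes have been decoded to "%".
const step1 = "http%3A%2F%2Fexample.com";

// decodeURIComponent turns the percent-encoded bytes back into characters.
const step2 = decodeURIComponent(step1);

console.log(step2); // http://example.com
```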
Ian, Aug 19 '19 at 16:25