Regex for quoted string
How do I get the substring " It's big \"problem " using a regular expression?
s = ' function(){ return " It\'s big \"problem "; }';

This comes from nanorc.sample, available on many Linux distributions. It is used for syntax highlighting of C-style strings:
\"(\\.|[^\"])*\"

As provided by ePharaoh, the answer is
/"([^"\\]*(\\.[^"\\]*)*)"/
To apply the above to either single-quoted or double-quoted strings, use
/"([^"\\]*(\\.[^"\\]*)*)"|\'([^\'\\]*(\\.[^\'\\]*)*)\'/

"(?:\\"|.)*?"
Alternating \" and . passes over escaped quotes, while the lazy quantifier *? ensures that you don't go past the end of the quoted string. Works with the .NET Framework RE classes.
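As a quick check, the three patterns above can be run against a sample modeled on the string from the question. This is only a sketch using Python's re module (my choice of engine, not one named in the answers); the dictionary keys and variable names are illustrative.

```python
import re

# Sample text modeled on the question, taken literally with its backslashes.
s = r''' function(){ return " It\'s big \"problem "; }'''

# The three double-quote patterns discussed above, without the /.../ delimiters.
patterns = {
    "nanorc": r'"(\\.|[^"])*"',
    "unrolled": r'"([^"\\]*(\\.[^"\\]*)*)"',
    "lazy": r'"(?:\\"|.)*?"',
}

# Collect the full match of each pattern for comparison.
results = {name: re.search(pattern, s).group(0) for name, pattern in patterns.items()}

for name, text in results.items():
    print(name, "->", text)
```

On this input all three patterns select the same substring; they differ mainly in how much backtracking the engine has to do.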

Most of the proposed solutions use alternative repetition paths, i.e. (A|B)*.
You may encounter stack overflows on large inputs, since some pattern compilers implement this using recursion.
Java, for example: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6337993
Something like this: "(?:[^"\\]*(?:\\.)?)*", or the one provided by Guy Bedford, will reduce the number of parsing steps, avoiding most stack overflows.
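To illustrate the idea of consuming whole runs of ordinary characters per loop iteration, here is a small sketch that applies the pattern above to a large, well-formed input. It uses Python's re module purely for illustration; whether a given engine overflows its stack depends on that engine, and the input below is a made-up example.

```python
import re

# Pattern from the answer above: each iteration of the outer loop consumes a
# whole run of ordinary characters plus at most one escape sequence.
UNROLLED = re.compile(r'"(?:[^"\\]*(?:\\.)?)*"')

# Hypothetical large input: 10,000 chunks, each containing an escaped quote.
body = r'some text \" ' * 10_000
text = 'prefix "' + body + '" suffix'

match = UNROLLED.search(text)
matched_length = len(match.group(0))

# The match should cover the body plus the two surrounding quotes.
print(matched_length == len(body) + 2)
```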

/"(?:[^"\\]++|\\.)*+"/
Taken directly from man perlre on a Linux system with Perl 5.22.0 installed. As an optimization, this regex uses the "possessive" form of both + and * to prevent backtracking, since it is known beforehand that a string without a closing quote cannot match in any case.
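A hedged sketch of the same pattern in Python follows. Python's re module accepts possessive quantifiers only from version 3.11 on, so the code falls back to a plain greedy pattern on older versions; that fallback is my own substitution, not part of perlre.

```python
import re
import sys

# Possessive quantifiers (++ and *+) need Python 3.11 or newer. On older
# versions, fall back to a greedy pattern that matches the same text in these
# examples but is allowed to backtrack.
if sys.version_info >= (3, 11):
    QUOTED = re.compile(r'"(?:[^"\\]++|\\.)*+"')
else:
    QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"')

closed = r'say "a \"quoted\" word" now'
unclosed = r'say "no closing quote here'

found = QUOTED.search(closed).group(0)   # the full quoted string
missing = QUOTED.search(unclosed)        # None: no closing quote, no match

print(found)
print(missing)
```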

/(["\']).*?(?<!\\)(\\\\)*\1/is
This should work with any quoted string.
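Below is a minimal sketch of that pattern in Python, assuming re.I and re.S as stand-ins for the /i and /s flags. The sample strings are invented for illustration.

```python
import re

# (["']) captures the opening quote and \1 requires the same character to close.
# (?<!\\)(\\\\)* only lets the closing quote follow an even number of
# backslashes, so an escaped quote is never taken as the end of the string.
ANY_QUOTED = re.compile(r'''(["']).*?(?<!\\)(\\\\)*\1''', re.I | re.S)

samples = [
    r''' function(){ return " It\'s big \"problem "; }''',
    r"x = 'it\'s' + y",
    r'path = "C:\\dir\\" + name',
]

matches = [ANY_QUOTED.search(sample).group(0) for sample in samples]

for text in matches:
    print(text)
```

The third sample ends in an escaped backslash directly before the closing quote, which is the case the (\\\\)* part is there to handle.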

This one works fine on PCRE and does not fail with a stack overflow.
"(.*?[^\\])??((\\\\)+)?+"
Explanation:
1. Every quoted string begins with the character ".
2. It may contain any number of any characters: .*? {lazy match}, ending with a non-escape character: [^\\].
3. Statement (2) is lazily (!) optional, because the string may be empty (""). So: (.*?[^\\])??
4. Finally, every quoted string ends with the character ", but it may be preceded by an even number of escape characters, i.e. pairs: (\\\\)+. This part is greedily (!) optional: ((\\\\)+)?+ {greedy match}, because the string may be empty or may have no trailing pairs.

Here is one that works with both " and ', and you can easily add others at the start.
("|')(?:\\\1|[^\1])*?\1
It uses a backreference (\1) to match exactly what is in the first group (" or ').
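As a rough illustration, the pattern can be tried in Python on a string that mixes both quote styles. The sample text is made up. One caveat that applies at least to Python's re module: inside a character class, \1 is read as an octal escape rather than a backreference, so in this sketch it is the lazy *? that stops the match at the closing quote.

```python
import re

# Group 1 captures the opening quote; the trailing \1 must match the same one.
# Note: in Python, [^\1] excludes the character \x01, not the captured quote.
EITHER_QUOTE = re.compile(r'''("|')(?:\\\1|[^\1])*?\1''')

# Made-up sample containing one single-quoted and one double-quoted string.
text = r"a = 'it\'s'" + ' and b = ' + r'"say \"hi\""'

found = [m.group(0) for m in EITHER_QUOTE.finditer(text)]

for item in found:
    print(item)
```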

Keep in mind that regular expressions are not a silver bullet for everything string-y. Some things are simpler to do with a cursor and a linear, manual scan. A CFL (context-free language) would do the trick quite trivially, but there are not many CFL implementations (afaik).
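The cursor-based approach mentioned above could look roughly like the following sketch. The function name find_quoted and its return convention are hypothetical, and it assumes that a backslash always escapes exactly the next character and that only double quotes delimit strings.

```python
def find_quoted(text, start=0):
    """Return (begin, end) of the first double-quoted string at or after
    start, or None if no complete quoted string is found."""
    begin = text.find('"', start)
    if begin == -1:
        return None
    i = begin + 1
    while i < len(text):
        ch = text[i]
        if ch == '\\':
            i += 2              # skip the backslash and the character it escapes
        elif ch == '"':
            return begin, i + 1  # end index is exclusive
        else:
            i += 1
    return None                 # the opening quote was never closed


s = r''' function(){ return " It\'s big \"problem "; }'''
span = find_quoted(s)
print(s[span[0]:span[1]])
```

The scan is a single left-to-right pass, so its running time is linear in the input length and it does no backtracking.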

If the search is done from the very beginning, maybe this can work?
\"((\\\")|[^\\])*\"

A more extensive version of https://stackoverflow.com/a/312618/
/"([^"\\]{50,}(\\.[^"\\]*)*)"|\'[^\'\\]{50,}(\\.[^\'\\]*)*\'|“[^”\\]{50,}(\\.[^”\\]*)*”/
This version also contains
- A minimum quote length of 50
- An additional type of quotes (open “ and close ”)

Experimented in regexpal and ended up with this regex (don't ask me how it works, I barely understand it even though I wrote it, lol):
"(([^"\\]?(\\\\)?)|(\\")+)+"