For most strings, you need to allow any escaped character, not just escaped quotes. For example, you most likely need to allow escapes such as "\n" and "\t" and, of course, the escaped escape: "\\".
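Here is a small Python sketch of why handling only \" is not enough. Python's re is used only for illustration, and NAIVE is a hypothetical strawman pattern that knows about escaped quotes and nothing else. The input contains two string literals, and the first one ends with an escaped backslash:

```python
import re

# Hypothetical naive pattern: it only understands escaped quotes (\").
NAIVE = re.compile(r'"(?:\\"|[^"])*"')
# Version 1 from this answer (made non-capturing so findall returns whole
# matches): any backslash escape is consumed as a pair (\n, \t, \\, \", ...).
ESCAPE_AWARE = re.compile(r'"(?:[^"\\]|\\.)*"')

# Two string literals; the first one ends in an escaped backslash.
text = r'x = "a\\" + "c"'

# The naive pattern reads \" as an escaped quote and runs past the real end.
print(NAIVE.findall(text))
# The escape-aware pattern consumes \\ as a pair and stops at the right quote.
print(ESCAPE_AWARE.findall(text))
```

The naive pattern reports a single bogus match spanning both literals, while the escape-aware one finds the two literals separately.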
This is a frequently asked question, and one that was solved (and optimized) long ago. Jeffrey Friedl covers this exact problem in detail (as an example) in his classic book, Mastering Regular Expressions (3rd Edition). Here are the regexes you are looking for:
Good:
"([^"\\]|\\.)*"
Version 1: works correctly but is not very efficient, because the alternation is entered once for every single character of the string.
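A minimal Python sketch of Version 1, written exactly as above (capturing group included). The helper name quoted_strings is made up for this example. It also shows a practical pitfall: because of the capturing group, findall returns only the last character or escape each group captured, so you want group 0 of each match instead:

```python
import re

# Version 1, exactly as written in the answer (note the capturing group).
V1 = re.compile(r'"([^"\\]|\\.)*"')

def quoted_strings(pattern, text):
    """Return each whole quoted-string match (group 0), not the capture."""
    return [m.group(0) for m in pattern.finditer(text)]

src = r'print("tab:\t", "quote:\"", "slash:\\")'

# Whole matches: the three string literals, escapes included.
print(quoted_strings(V1, src))
# findall returns group 1 instead: just the last escape in each literal.
print(V1.findall(src))
```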
Better:
"([^"\\]++|\\.)*" or "((?>[^"\\]+)|\\.)*"
Version 2: more efficient, if your engine supports possessive quantifiers or atomic groups (see sin's correct answer, which uses the atomic group method).
Best:
"[^"\\]*(?:\\.[^"\\]*)*"
Version 3: more efficient still. It implements Friedl's "unrolling the loop" technique and requires neither possessive quantifiers nor atomic groups (i.e. it can be used in JavaScript and other less feature-rich regex engines).
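A Python sketch of Version 3, with Python's re as a stand-in engine and made-up variable names. Written with re.VERBOSE, its normal* (special normal*)* shape is easier to see: ordinary characters are swallowed in runs by [^"\\]*, there is no alternation left to retry on every character, and each backslash escape is handled exactly once. The loop checks that it finds the same strings as Version 1:

```python
import re

# Version 3 ("unrolling the loop"), laid out as  normal* (special normal*)*
V3 = re.compile(r'''
    "              # opening quote
    [^"\\]*        # normal*: run of chars that are neither quote nor backslash
    (?:
        \\.        # special: a backslash and the one character it escapes
        [^"\\]*    # normal*: the ordinary run that follows the escape
    )*
    "              # closing quote
''', re.VERBOSE)

# Version 1 (non-capturing), used only as a reference for comparison.
V1 = re.compile(r'"(?:[^"\\]|\\.)*"')

samples = [
    r'say "hello \"world\"" now',
    r'"unterminated \" string',
    r'"" and "a\\b"',
    r'"tab:\t" "nl:\n"',
]

for s in samples:
    # Both versions should find exactly the same quoted strings.
    assert V3.findall(s) == V1.findall(s), s
    print(V3.findall(s))
```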
Here are the recommended regexes in PHP syntax for double-quoted and single-quoted strings:
$re_dq = '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/s';
$re_sq = "/'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'/s";
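For comparison, a rough Python counterpart of the two PHP patterns (a sketch with made-up names, not part of the PHP code). Each \\\\ in the PHP source is one escaped backslash, \\, in the actual regex, which a Python raw string can write directly. PHP's /s modifier corresponds to re.DOTALL: it lets the . in \\. match a newline, so a backslash followed by a real line break inside a string still counts as an escape:

```python
import re

# Python raw-string equivalents of $re_dq and $re_sq; /s -> re.DOTALL.
RE_DQ = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"', re.DOTALL)
RE_SQ = re.compile(r"'[^'\\]*(?:\\.[^'\\]*)*'", re.DOTALL)
# Same double-quote pattern without the /s (DOTALL) flag, for contrast.
RE_DQ_NO_S = re.compile(RE_DQ.pattern)

# a = "line one \<newline>line two"; b = 'it\'s'
src = 'a = "line one \\' + '\n' + 'line two"; b = ' + "'it\\'s'"

print(RE_DQ.findall(src))       # matches across the escaped newline
print(RE_SQ.findall(src))       # the single-quoted literal with \'
print(RE_DQ_NO_S.findall(src))  # without DOTALL, \<newline> breaks the match
```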
ridgerunner, Apr 17 '11 at 20:13