How can I extract all the quotes in the text?

I am looking for a SimpleGrepSedPerlOrPythonOneLiner that prints every quotation in the text.


Example 1:

echo '"HAL," noted Frank, "said that everything was going extremely well."' | SimpleGrepSedPerlOrPythonOneLiner

standard output:

"HAL,"
"said that everything was going extremely well."

Example 2:

cat MicrosoftWindowsXPEula.txt | SimpleGrepSedPerlOrPythonOneLiner

standard output:

"EULA"
"Software"
"Workstation Computer"
"Device"
"DRM"

and so on.

( link to the corresponding text ).

+5
4 answers

I like this one:

perl -ne 'print "$_\n" foreach /"((?>[^"\\]|\\+[^"]|\\(?:\\\\)*")*)"/g;'

This is a bit verbose, but it handles escaped quotes and backtracking much better than the simplest implementation. What it says is:

my $re = qr{
   "               # Begin it with literal quote
   ( 
     (?>           # prevent backtracking once the alternation has been
                   # satisfied. It either matches or it does not. This expression
                   # only needs one direction, or we fail out of the branch

         [^"\\]    # a character that is not a dquote or a backslash
     |   \\+       # OR if a backslash, then one or more backslashes followed by
         [^"]      # something that is not a quote
     |   \\        # OR again a backslash
         (?>\\\\)* # followed by any number of *pairs* of backslashes (as units)
         "         # and a quote
     )*            # any number of *set* qualifying phrases
  )                # all batched up together
  "                # Ended by a literal quote
}x;
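For readers who prefer Python, the same idea can be sketched roughly as follows. This is not a translation of the regex above: it swaps in a simpler, commonly used pattern in which a backslash always consumes the next character, so an escaped quote never closes the span. The names QUOTED and quoted_spans are made up for this illustration.

```python
import re

# Assumed simplification, not the answer's exact regex: inside the quotes,
# accept either an ordinary character or a backslash plus whatever follows it.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')

def quoted_spans(text):
    """Return the contents of each double-quoted span, escapes left intact."""
    return QUOTED.findall(text)

sample = r'He said "a \"quoted\" word" and "plain"'
for span in quoted_spans(sample):
    print(span)
```

Because each backslash is paired with the character after it, an even run of backslashes before a quote leaves that quote free to close the span, which is the case the pairs-of-backslashes branch handles in the Perl version.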

If you don't need that much power (say, the text is most likely dialogue rather than structured quoting), then

/"([^"]*)"/ 

probably works about as well as anything else.
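A minimal Python sketch of that simple pattern, assuming the input contains no escaped quotes. The helper name simple_quotes and its keep_marks parameter are hypothetical; they only show the difference between the whole match (with quote marks, as in the question's expected output) and the captured group (without them).

```python
import re

SIMPLE = re.compile(r'"([^"]*)"')

def simple_quotes(text, keep_marks=True):
    """Return quoted spans; keep_marks=True keeps the surrounding quote marks."""
    return [m.group(0) if keep_marks else m.group(1)
            for m in SIMPLE.finditer(text)]

line = '"HAL," noted Frank, "said that everything was going extremely well."'
for quote in simple_quotes(line):
    print(quote)
```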

+7

Using a regexp:

$ echo \"HAL,\" noted Frank, \"said that everything was going extremely well\" \
 | perl -n -e 'while (m/(".*?")/g) { print $1."\n"; }'
"HAL,"
"said that everything was going extremely well"

$ cat eula.txt| perl -n -e 'while (m/(".*?")/g) { print $1."\n"; }'
"EULA"
"online"
"Software"
"Workstation Computer"
"Device"
"multiplexing"
"DRM"
"Secure Content"
"DRM Software"
"Secure Content Owners"
"DRM Upgrades"
"WMFSDK"
"Not For Resale"
"NFR,"
"Academic Edition"
"AE,"
"Qualified Educational User."
"Exclusion of Incidental, Consequential and Certain Other Damages"
"Restricted Rights"
"Exclusion des dommages accessoires, indirects et de certains autres dommages"
"Consumer rights"
+5
grep -o "\"[^\"]*\""

This greps for " + anything except a quote, any number of times + "

-o prints only the matched text, not the entire line.
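The -o behaviour described above (only the matched text is printed, one match per output line) can be approximated in Python roughly like this. The function name grep_o is made up for the sketch, and Python's re syntax is not identical to grep's, so treat it as an illustration rather than a drop-in replacement.

```python
import io
import re

def grep_o(pattern, stream):
    """Yield every match of pattern, scanning the stream line by line."""
    regex = re.compile(pattern)
    for line in stream:
        for match in regex.finditer(line):
            yield match.group(0)

# io.StringIO stands in for a file or sys.stdin here.
text = io.StringIO('the "EULA" covers the "Software"\nno quotes here\n"DRM"\n')
for hit in grep_o(r'"[^"]*"', text):
    print(hit)
```

Lines without a match produce no output, and a line with several matches produces several output lines.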

+4
grep -o '"[^"]*"' file

The -o option prints only the part of each line that matches the pattern.

0
