How to parse a command line with regular expressions?

I want to split a command line, given as a string, into its individual string parameters. What would a regular expression for this look like? The problem is that parameters can be quoted. For example:

"param 1" param2 "param 3"

should get:

param 1, param2, param 3

+6
regex parsing
13 answers

I use regexlib for this kind of problem. If you go to http://regexlib.com/ and search for "command line", you will find three results that appear to address this or a similar problem, which should be a good start.

This might work: http://regexlib.com/Search.aspx?k=command+line&c=-1&m=-1&ps=20

+5

You should not use regular expressions for this. Write a parser instead, or use one provided by your language.

I don't understand why I am being downvoted for this. Here's how to do it in Python:

    >>> import shlex
    >>> shlex.split('"param 1" param2 "param 3"')
    ['param 1', 'param2', 'param 3']
    >>> shlex.split('"param 1" param2 "param 3')
    Traceback (most recent call last):
        [...]
    ValueError: No closing quotation
    >>> shlex.split('"param 1" param2 "param 3\\""')
    ['param 1', 'param2', 'param 3"']

Now tell me that wrecking your brain over how a regex could solve this problem is ever worth the hassle.

+12

Without regard to the implementation language, your regular expression might look something like this:

 ("[^"]*"|[^"]+)(\s+|$) 

The first alternative, "[^"]*", matches a quoted string that contains no embedded quotes; the second, [^"]+, matches a run of characters that contains no quotes. \s+ matches the separating whitespace, and $ matches the end of the line. Note that [^"]+ also matches whitespace, so a run of unquoted parameters comes back as a single match; [^\s"]+ would split them.
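As a rough illustration, the expression above can be tried in Python with the standard re module. This is only a sketch: the helper name split_params and the quote-stripping step are assumptions added here, not part of the answer. It also shows that unquoted parameters are not separated from each other, because [^"]+ is allowed to match spaces.

```python
import re

# Pattern copied from the answer; group 1 is the parameter, group 2 the separator.
PARAM = re.compile(r'("[^"]*"|[^"]+)(\s+|$)')

def split_params(line):
    """Return the parameters found in line, with surrounding quotes removed."""
    # split_params is a hypothetical helper name used only for this sketch.
    return [m.group(1).strip('"') for m in PARAM.finditer(line)]

print(split_params('"param 1" param2 "param 3"'))  # ['param 1', 'param2', 'param 3']
print(split_params('a b c'))                       # ['a b c'] (not split)
```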

+5

Most languages have other functions (built in or provided by a standard library) that parse command lines far more easily than a hand-built regular expression, and you know they will do it correctly out of the box. If you edit your post to say which language you are using, I am sure someone here can point you to the one used in that language.

Regexes are very powerful tools and useful for a wide range of things, but there are also many problems for which they are not the best solution. This is one of them.

+2
 ("[^"]+"|[^\s"]+) 

This is what I use, in C++:

    #include <iostream>
    #include <iterator>
    #include <regex>
    #include <string>

    void foo() {
        std::string strArg = " \"par 1\" par2 par3 \"par 4\"";
        // Either a quoted run, or a run with no whitespace and no quotes
        std::regex word_regex("(\"[^\"]+\"|[^\\s\"]+)");
        auto words_begin = std::sregex_iterator(strArg.begin(), strArg.end(), word_regex);
        auto words_end = std::sregex_iterator();

        // Print every match on its own line (quotes are kept)
        for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
            std::smatch match = *i;
            std::string match_str = match.str();
            std::cout << match_str << '\n';
        }
    }

    int main() {
        foo();
    }

Output:

    "par 1"
    par2
    par3
    "par 4"
+2

This splits the exe from its parameters, strips the quotes from the exe, and assumes clean input:

 ^(?:"([^"]+(?="))|([^\s]+))["]{0,1} +(.+)$ 

You get two non-empty captures at a time, out of three capture groups:

  • The exe, if it was enclosed in quotes
  • The exe, if it was not enclosed in quotes
  • The parameter string

Examples:

 "C:\WINDOWS\system32\cmd.exe" /c echo this 

Match 1: C:\WINDOWS\system32\cmd.exe

Match 2: $null

Match 3: /c echo this

 C:\WINDOWS\system32\cmd.exe /c echo this 

Match 1: $null

Match 2: C:\WINDOWS\system32\cmd.exe

Match 3: /c echo this

 "C:\Program Files\foo\bar.exe" /run 

Match 1: C:\Program Files\foo\bar.exe

Match 2: $null

Match 3: /run
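The three examples above can be reproduced with a short Python sketch. The helper name split_exe is an assumption made for illustration. Note also that in Python an unmatched group is reported as None rather than $null.

```python
import re

# Pattern copied from the answer above.
EXE_AND_PARAMS = re.compile(r'^(?:"([^"]+(?="))|([^\s]+))["]{0,1} +(.+)$')

def split_exe(command_line):
    """Return (exe, params), or None if the line does not match."""
    # split_exe is a hypothetical helper name used only for this sketch.
    match = EXE_AND_PARAMS.match(command_line)
    if match is None:
        return None
    quoted_exe, bare_exe, params = match.groups()
    # Exactly one of the first two groups is set; the other is None.
    exe = quoted_exe if quoted_exe is not None else bare_exe
    return exe, params

for line in (
    r'"C:\WINDOWS\system32\cmd.exe" /c echo this',
    r'C:\WINDOWS\system32\cmd.exe /c echo this',
    r'"C:\Program Files\foo\bar.exe" /run',
):
    print(split_exe(line))
```

A line that consists of an exe with no parameters does not match at all, since the pattern requires at least one space followed by a parameter string.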

Thoughts:

I'm fairly sure you will need a loop to capture an arbitrary number of parameters.

This regular expression could be applied again to its own third group, repeating until it no longer matches, at which point there are no more parameters to split off.

+1

Something like:

 "(?:(?<=")([^"]+)"\s*)|\s*([^"\s]+) 

or simpler:

 "([^"]+)"|\s*([^"\s]+) 

(just for searching with a regex ;) )

Apply it repeatedly: group 1 holds the parameter when it was surrounded by double quotes, and group 2 holds it when it was not.
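A minimal Python sketch of applying the simpler expression repeatedly, assuming re.finditer as the "apply it several times" step. The helper name split_with_groups is hypothetical. Because the quotes sit outside the capture group, no separate stripping step is needed.

```python
import re

# The simpler pattern from the answer: group 1 = quoted, group 2 = unquoted.
PARAM = re.compile(r'"([^"]+)"|\s*([^"\s]+)')

def split_with_groups(line):
    """Collect each parameter from whichever group took part in the match."""
    # split_with_groups is a hypothetical helper name used only for this sketch.
    params = []
    for m in PARAM.finditer(line):
        quoted, bare = m.group(1), m.group(2)
        params.append(quoted if quoted is not None else bare)
    return params

print(split_with_groups('"param 1" param2 "param 3"'))
# ['param 1', 'param2', 'param 3']
```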

0

If quotes are all you are worried about, just write a simple loop that copies the input character by character into a string, skipping the quotes.
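One way such a loop might look in Python is sketched below. It goes a little beyond the sentence above by also splitting on whitespace outside quotes, which is an assumption about what the question needs; the function name and the error raised for an unterminated quote are likewise choices made for this sketch.

```python
def split_command_line(line):
    """Split on whitespace outside double quotes; the quotes are dropped."""
    params = []
    current = []        # characters of the parameter being built
    in_quotes = False
    started = False     # True once the current parameter has begun
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes
            started = True          # so that "" yields an empty parameter
        elif ch.isspace() and not in_quotes:
            if started:
                params.append("".join(current))
                current = []
                started = False
        else:
            current.append(ch)
            started = True
    if in_quotes:
        raise ValueError("No closing quotation")
    if started:
        params.append("".join(current))
    return params

print(split_command_line('"param 1" param2 "param 3"'))
# ['param 1', 'param2', 'param 3']
```

Escaped quotes are not handled here; supporting them would need an extra state for the escape character.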

Alternatively, if you use a string-manipulation library, you can use it to strip out all the quotes and then concatenate the pieces.

0

If you want to parse a command and its parameters, I use the following (with ^ and $ matching at line breaks, i.e. multiline mode):

    (?<cmd>^"[^"]*"|\S*) *(?<prm>.*)?

If you want to use it in your C# code, here it is properly escaped:

    try {
        Regex RegexObj = new Regex("(?<cmd>^\\\"[^\\\"]*\\\"|\\S*) *(?<prm>.*)?");
    } catch (ArgumentException ex) {
        // Syntax error in the regular expression
    }

It parses lines like the following and separates the command from the parameters:

    "c:\program files\myapp\app.exe" p1 p2 "p3 with space"
    app.exe p1 p2 "p3 with space"
    app.exe
0

There is a Python answer, so let's have a Ruby answer too :)

    require 'shellwords'
    Shellwords.shellsplit '"param 1" param2 "param 3"'
    #=> ["param 1", "param2", "param 3"]
    # or:
    '"param 1" param2 "param 3"'.shellsplit
0

Regex: /[\/-]?((\w+)(?:[=:]("[^"]+"|[^\s"]+))?)(?:\s+|$)/g

Example: /P1="Long value" /P2=3 /P3=short PwithoutSwitch1=any PwithoutSwitch2

This regular expression parses a parameter list built according to these rules:

  • Parameters are separated by one or more spaces.
  • A parameter may start with an option token (/ or -).
  • A parameter consists of a name and a value, separated by = or :.
  • The name consists of alphanumeric and underscore characters.
  • The value may be missing.
  • If the value is present, it can be any sequence of characters, but if it contains a space it must be quoted.

The regular expression has three groups:

  • the first group contains the whole parameter without the option token,
  • the second group contains only the name,
  • the third group contains only the value (if present).

For the example above:

  • Whole match: /P1="Long value"
    • Group 1: P1="Long value"
    • Group 2: P1
    • Group 3: "Long value"
  • Whole match: /P2=3
    • Group 1: P2=3
    • Group 2: P2
    • Group 3: 3
  • Whole match: /P3=short
    • Group 1: P3=short
    • Group 2: P3
    • Group 3: short
  • Whole match: PwithoutSwitch1=any
    • Group 1: PwithoutSwitch1=any
    • Group 2: PwithoutSwitch1
    • Group 3: any
  • Whole match: PwithoutSwitch2
    • Group 1: PwithoutSwitch2
    • Group 2: PwithoutSwitch2
    • Group 3: none
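
The breakdown above can be checked with a small Python sketch. The pattern is the one from this answer with the /…/g delimiters removed and re.finditer standing in for the g flag; the helper name parse_named_params and the choice to return a dict are assumptions made for illustration. A missing value shows up as None, and a quoted value keeps its quotes because they are inside group 3.

```python
import re

# Pattern from the answer, written as a Python raw string.
PARAM = re.compile(r'[/-]?((\w+)(?:[=:]("[^"]+"|[^\s"]+))?)(?:\s+|$)')

def parse_named_params(line):
    """Map each parameter name (group 2) to its value (group 3, or None)."""
    # parse_named_params is a hypothetical helper name used only for this sketch.
    return {m.group(2): m.group(3) for m in PARAM.finditer(line)}

example = '/P1="Long value" /P2=3 /P3=short PwithoutSwitch1=any PwithoutSwitch2'
for name, value in parse_named_params(example).items():
    print(name, value)
```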
0
 \s*("[^"]+"|[^\s"]+) 

That's it.

-1

(Having re-read your question just before posting, I notice that you say the command line is given AS A STRING, so this information may not be useful to you. Since I have already written it, I will post it anyway; please ignore it if I have misunderstood your question.)

If you clarify your question I will try to help, but going by the general comments you have made, I would say: don't do this :-). You are asking for a regexp to split a series of parameters into an array. Instead, I strongly recommend that you consider using getopt; there are versions of this library for most programming languages. Getopt will do what you ask, and it scales to much more complex argument processing should you need that in the future.
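
As a rough sketch of the same idea in Python, the standard-library getopt module can be combined with shlex.split when the command line arrives as a single string. The option string "ab:d" (flags -a and -d, plus -b taking an argument) and the helper name parse_options are assumptions made for this illustration.

```python
import getopt
import shlex

def parse_options(command_line):
    """Split a command-line string, then separate options from the rest."""
    # parse_options is a hypothetical helper name used only for this sketch.
    args = shlex.split(command_line)       # handles the quoting
    # "ab:d": -a and -d are flags, -b requires an argument.
    return getopt.getopt(args, "ab:d")

opts, rest = parse_options('-a -b "some value" input.txt')
print(opts)   # [('-a', ''), ('-b', 'some value')]
print(rest)   # ['input.txt']
```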

If you tell me which language you use, I will try and send you a sample.

Here are some example source pages:

http://www.codeplex.com/getopt (.NET)

http://www.urbanophile.com/arenn/hacking/download.html (Java)

Sample (from the Java page above):

    Getopt g = new Getopt("testprog", argv, "ab:c::d");

    int c;
    String arg;
    while ((c = g.getopt()) != -1) {
        switch (c) {
            case 'a':
            case 'd':
                System.out.print("You picked " + (char)c + "\n");
                break;

            case 'b':
            case 'c':
                arg = g.getOptarg();
                System.out.print("You picked " + (char)c +
                                 " with an argument of " +
                                 ((arg != null) ? arg : "null") + "\n");
                break;

            case '?':
                break; // getopt() already printed an error

            default:
                System.out.print("getopt() returned " + c + "\n");
        }
    }
-3