Report invalid parameters first (or use regular expressions) with the python argparse module

When using the argparse module in Python, I'm looking for a way to catch invalid parameters and report them better. The documentation https://docs.python.org/3/library/argparse.html#invalid-arguments gives an example:

    >>> parser = argparse.ArgumentParser(prog='PROG')
    >>> parser.add_argument('--foo', type=int)
    >>> parser.add_argument('bar', nargs='?')

    >>> # invalid option
    >>> parser.parse_args(['--bar'])
    usage: PROG [-h] [--foo FOO] [bar]
    PROG: error: no such option: --bar

However, this is quite easy to trip up, because bad options are not reported first. For instance:

    import argparse
    import datetime

    def convertIsoTime(timestamp):
        """read ISO-8601 time-stamp using the AMS conventional format YYYY-MM-DDThh:mm:ssUTC"""
        try:
            return datetime.datetime.strptime(timestamp,"%Y-%m-%dT%H:%M:%SUTC")
        except:
            raise argparse.ArgumentTypeError("'{}' is not a valid ISO-8601 time-stamp".format(timestamp))

    parser = argparse.ArgumentParser()
    parser.add_argument('startTime', type=convertIsoTime)
    parser.add_argument('--good', type=int, help='foo')
    args = parser.parse_args(['--gold','5','2015-01-01T00:00:00UTC'])

will report:

 error: argument startTime: '5' is not a valid ISO-8601 time-stamp 

whereas I would prefer it to report the more useful:

 error: no such option: --gold 

Can this be achieved? It seems to me that this is a fairly basic use case. When writing argument parsers by hand, I usually use a pattern in which anything that starts with the option prefix but is not a known option is rejected immediately. For example, in bash:

    # Process command-line arguments
    while [ $# -gt 0 ]; do
        case "$1" in
            --debug)
                DEBUGOPTION="--debug"
                shift
                break;;
            --)
                shift
                break;;
            --*)
                handleUsageError "$1"
                shift;;
            *)
                break;;
        esac
    done

I believe argparse uses regular expressions internally, but I don't think they are accessible via add_argument().

Is there a way to easily make an equivalent using argparse?

1 answer

Short answer: parse_args uses parse_known_args. That method lets you handle unknown arguments such as --gold yourself. As a result, argument type errors are raised before unknown-argument errors.

I've added a solution that involves subclassing ArgumentParser and modifying a method deep in its call stack.


I will try to describe parse_args in relation to your example.

The first thing it does is classify the strings as either O or A. Put simply, those that begin with - are O, the others are A. It also tries to match each O string with a defined argument.

In your example, it finds OAA. A regex is used to match this string against patterns defined by the arguments' nargs. (If necessary, I can explain this step in more detail.)

--gold does not match any defined argument; at some point (whether in this initial loop or later), it is put on the extras list. (See the code for details.)
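The classification step can be pictured roughly as follows. This is a deliberately simplified sketch, not argparse's actual code: the `classify` helper and its `known_options` parameter are hypothetical names, and the real parser applies extra rules (negative numbers, a bare `--`, abbreviations) and uses more elaborate regex patterns derived from nargs.

```python
import re

def classify(arg_strings, known_options):
    """Return (pattern, unknown) for a list of command-line strings.

    Simplification: anything starting with '-' counts as an optional ('O'),
    everything else as an argument ('A').
    """
    pattern_parts = []
    unknown = []
    for arg in arg_strings:
        if arg.startswith('-'):
            pattern_parts.append('O')
            if arg not in known_options:
                unknown.append(arg)  # no defined argument matches this string
        else:
            pattern_parts.append('A')
    return ''.join(pattern_parts), unknown

pattern, unknown = classify(['--gold', '5', '2015-01-01T00:00:00UTC'], {'--good'})
print(pattern)  # OAA
print(unknown)  # ['--gold']

# A positional that takes exactly one value can be pictured as a regex for a
# single 'A'; in this pattern the first 'A' is at index 1, the string '5'.
match = re.search('A', pattern)
print(match.start())  # 1
```

The point is only that the strings are reduced to a short pattern first, and the decision about which string feeds which argument is made on that pattern.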

On the second loop through the strings, it alternately tries to handle positionals and optionals.

When it tries to match 5 with startTime, your type function raises the type error, which propagates up to printing the usage and exiting. Since --gold is not defined, 5 is not consumed as an optional's argument, so it is parsed as the first positional string. (Some kinds of optionals take 0 arguments, so the parser does not assume that whatever follows a --... string is that optional's argument.)

I think that without the 5, the last string would match startTime. parse_known_args would then return with --gold in the extras list. parse_args uses parse_known_args, but raises an error when extras is not empty.

So, in a sense, the parser detects both errors, but it is the startTime one that triggers the error message. It waits until the end to complain about the unrecognized --gold.
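To illustrate why the parser cannot assume that the string after an optional belongs to it, here is a small sketch with a flag that takes no argument. The `--debug` flag and the `value` positional are made up for this illustration and are not part of the question's parser.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--debug', action='store_true')  # consumes no value
parser.add_argument('value')                          # one positional

# '5' follows an optional, yet it is parsed as the positional.
args = parser.parse_args(['--debug', '5'])
print(args.debug, args.value)  # True 5
```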
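A minimal sketch of that case, with the type function left out so that startTime is kept as a plain string. The `strict_parse` helper is a hypothetical name; it only approximates what parse_args does with leftover strings.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('startTime')
parser.add_argument('--good', type=int, help='foo')

# parse_known_args collects strings it cannot place instead of failing.
namespace, extras = parser.parse_known_args(['--gold', '2015-01-01T00:00:00UTC'])
print(namespace.startTime)  # 2015-01-01T00:00:00UTC
print(extras)               # ['--gold']

def strict_parse(parser, argv):
    """Roughly what parse_args does: reject any leftover strings."""
    namespace, extras = parser.parse_known_args(argv)
    if extras:
        # error() prints the usage and exits with status 2
        parser.error('unrecognized arguments: %s' % ' '.join(extras))
    return namespace
```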

As a general philosophy, argparse does not attempt to detect and present all errors. It does not compile a list of errors for presentation in one final comprehensive message.

I will review the code to check the details. I don’t think you can easily change the basic parsing pattern. If I think of a way to force an earlier unrecognized-option error, I will edit this answer.
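One workaround that leaves the parsing pattern alone is to scan the command line before handing it to argparse, much like the bash loop in the question. This is a sketch under stated assumptions rather than anything argparse provides: `find_unknown_options` and `parse_strictly` are hypothetical helpers, and the list of known options is repeated by hand instead of being read from the parser's private attributes.

```python
import argparse

def find_unknown_options(argv, known_options):
    """Return the '--' style strings in argv that match no known option.

    Assumptions for this sketch: only long options are checked, a bare '--'
    ends option processing, '--name=value' is checked by its name part, and
    prefix abbreviations such as '--goo' for '--good' are let through for
    argparse to resolve.
    """
    unknown = []
    for arg in argv:
        if arg == '--':
            break
        if not arg.startswith('--'):
            continue
        name = arg.split('=', 1)[0]
        if not any(option.startswith(name) for option in known_options):
            unknown.append(arg)
    return unknown

parser = argparse.ArgumentParser()
parser.add_argument('startTime')
parser.add_argument('--good', type=int, help='foo')

def parse_strictly(argv):
    # Report the first unknown option before any type conversion happens.
    unknown = find_unknown_options(argv, ['--good', '--help'])
    if unknown:
        parser.error('no such option: %s' % unknown[0])
    return parser.parse_args(argv)

print(find_unknown_options(['--gold', '5', '2015-01-01T00:00:00UTC'],
                           ['--good', '--help']))  # ['--gold']
```

The trade-off is that the option list has to be kept in sync with the parser by hand, and short options are not checked at all.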


def _parse_optional(self, arg_string): tries to classify one argv string. If the string looks like a positional, it returns None. If it matches an Action's option_string, it returns a tuple (action, option_string, None) with the matching action. Finally, if there is no match, it returns:

    # it was meant to be an optional but there is no such option
    # in this parser (though it might be a valid option in a subparser)
    return None, arg_string, None

I think this is what happens with your --gold. Note the reason given for why it might still be a valid option.

This function is called by

    def _parse_known_args(self, arg_strings, namespace):
        ...
        for i, arg_string in enumerate(arg_strings_iter):
            ....
            option_tuple = self._parse_optional(arg_string)
            if option_tuple is None:
                pattern = 'A'
            else:
                option_string_indices[i] = option_tuple
                pattern = 'O'
            arg_string_pattern_parts.append(pattern)
        ...
        # at the end
        # return the updated namespace and the extra arguments
        return namespace, extras

which collects a pattern string such as 'AOO', as well as a list of these tuples.

During the second loop, it alternates between consuming positionals and optionals. The function that consumes an optional is:

    def consume_optional(start_index):
        option_tuple = option_string_indices[start_index]
        action, option_string, explicit_arg = option_tuple
        if action is None:
            extras.append(arg_strings[start_index])
        ...otherwise...
            take_action(action, args, option_string)

As I wrote earlier, your --gold ends up on the extras list, and 5 remains in the list of strings that can be parsed as positionals.

The namespace and extras are passed back through parse_known_args to you, the user, or to parse_args.

Conceivably you could subclass ArgumentParser and define a modified _parse_optional method. It could raise an error instead of returning that (None, arg_string, None) tuple.

    import argparse
    import datetime

    class MyParser(argparse.ArgumentParser):
        def _parse_optional(self, arg_string):
            arg_tuple = super(MyParser, self)._parse_optional(arg_string)
            if arg_tuple is None:
                return arg_tuple  # positional
            else:
                if arg_tuple[0] is not None:
                    return arg_tuple  # valid optional
                else:
                    msg = 'error: no such option: %s'%arg_string
                    self.error(msg)

    def convertIsoTime(timestamp):
        """read ISO-8601 time-stamp using the AMS conventional format YYYY-MM-DDThh:mm:ssUTC"""
        try:
            return datetime.datetime.strptime(timestamp,"%Y-%m-%dT%H:%M:%SUTC")
        except:
            raise argparse.ArgumentTypeError("'{}' is not a valid ISO-8601 time-stamp".format(timestamp))

    # parser = argparse.ArgumentParser()
    parser = MyParser()
    parser.add_argument('startTime', type=convertIsoTime)
    parser.add_argument('--good', type=int, help='foo')

    args = parser.parse_args(['--good','5','2015-01-01T00:00:00UTC'])
    print(args)

    args = parser.parse_args(['--gold','5','2015-01-01T00:00:00UTC'])

produces

    1505:~/mypy$ python3 stack31317166.py
    Namespace(good=5, startTime=datetime.datetime(2015, 1, 1, 0, 0))
    usage: stack31317166.py [-h] [--good GOOD] startTime
    stack31317166.py: error: error: no such option: --gold
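Because _parse_optional is a private method, the shape of its return value is not guaranteed. The sketch below is a more defensive variant of the subclass above. It assumes the return value is either a single tuple or, as I believe is the case in newer Python releases, a list of tuples, with the Action (or None) as the first item in each. It also drops the 'error: ' prefix from the message, since self.error adds one itself, which is why the output above shows it twice. `StrictParser` and `run` are hypothetical names, and startTime is kept as a plain string to keep the example short.

```python
import argparse
import contextlib
import io

class StrictParser(argparse.ArgumentParser):
    def _parse_optional(self, arg_string):
        result = super()._parse_optional(arg_string)
        if result is None:
            return result  # looks like a positional
        # Assumption: the private return value is either one tuple or a
        # list of tuples, and the first item of each tuple is the Action.
        candidates = result if isinstance(result, list) else [result]
        if any(candidate[0] is None for candidate in candidates):
            self.error('no such option: %s' % arg_string)
        return result

def run(argv):
    """Parse argv and return (exit_code, stderr_text) without exiting."""
    parser = StrictParser(prog='PROG')
    parser.add_argument('startTime')
    parser.add_argument('--good', type=int, help='foo')
    stderr = io.StringIO()
    with contextlib.redirect_stderr(stderr):
        try:
            parser.parse_args(argv)
        except SystemExit as exc:
            return exc.code, stderr.getvalue()
    return 0, stderr.getvalue()

code, message = run(['--gold', '5', '2015-01-01T00:00:00UTC'])
print(code)
print(message)
```

The unknown option is rejected during the first classification loop, before any positional is converted, so the --gold complaint comes first.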

Subclassing to provide custom behavior is good argparse (and Python) practice.

If you want the Python developers to look at this case more closely, consider filing a bug/issue (PEPs are for more developed, formal ideas). But there is quite a backlog of argparse bugs/patches, and a lot of caution about backward compatibility.


http://bugs.python.org/issue?%40columns=id%2Cactivity%2Ctitle%2Ccreator%2Cassignee%2Cstatus%2Ctype&%40sort=-activity&%40filter=status&%40action=searchid&ignore=file%3Acontent&%40search_text=_parse_optional&submit=search&status=-1%2C1%2C2%2C3

is a list of bugs/issues that reference _parse_optional. Possible changes include how ambiguous optionals are handled. (I will scan them to see if I have forgotten anything. Some of the patches are mine.) But because it uses super, my suggested change is not affected by changes inside the function. It would be affected only by changes in how the function is called and what it returns, which are much less likely. By filing your own issue, you at least put the developers on notice that someone depends on this interface.
