Data Input (enhance_argparse)¶
General Background and Instantiation¶
Argparse is the native Python library for parsing command line inputs, and we will be using it throughout our software suite. Argparse provides a great deal of flexibility, allowing you to apply custom functions to inputs in order to process them in some way. On the other hand, it has several glaring deficiencies. Therefore, we want somewhere to store all of our usual routines, in addition to data and methods that supplement the module. There are four main things we want to handle:
- Parser data - We want to have the standard parser data organized sensibly.
- Updating variables - We want to be able to update the variables in specific ways.
- Dumping variables - We want to be able to dump the variables to a dictionary in specific ways.
- Variable dependency - We want to be able to demand certain relations between variables.
If you are not familiar with argparse, there are many excellent resources online. We will only illustrate how we are using it, via our class enhance_argparse.
Let us start by assuming a command line invocation of
$ python script.py params.txt -vname xx,yy -order 1,1 -halfd -schema mkruns
This will result in a sys.argv in python of
print sys.argv[1:]
['params.txt','-vname','xx,yy','-order','1,1','-halfd','-schema','mkruns']
Please note that I did include one positional argument, params.txt; it is required and does not need a flag, though there will not always be such a variable. At present, we will normally pass lists as strings with some delimiter. Normally, a comma will be the delimiter, but we may opt to accept either commas or spaces if the list is long. Argparse can work with lists natively, but it seems less convenient and reliable than passing a string with a known delimiter.
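For long lists, a type function that accepts either commas or whitespace is a small variation on the comma-only converters used below; a minimal sketch (not the converter used in our scripts):
# accept commas and/or whitespace as delimiters (sketch only, not the converter used below)
def split_any(s):
    return [int(tt) for tt in s.replace(',', ' ').split()]
print(split_any('1,1'))     # [1, 1]
print(split_any('1 2 3'))   # [1, 2, 3]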
Now let us see how argparse processes this:
[1]:
# assuming that sys.argv[1:] is given by sysargv in this example
sysargv=['params.txt','-vname','xx,yy','-order','1,1','-halfd','-schema','mkruns']
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("-vname", type=lambda s: s.split(','))
parser.add_argument("-order", type=lambda s: [int(tt) for tt in s.split(',')])
parser.add_argument("-halfd","--half_delta", action="store_true")
parser.add_argument("-schema", choices=['mkruns','evaluate'],default='evaluate')
parser.add_argument("-dpath","--data_path")
parser.add_argument("file")
[1]:
_StoreAction(option_strings=[], dest='file', nargs=None, const=None, default=None, type=None, choices=None, help=None, metavar=None)
I have removed all the help statements, but you can see them in strain_derivative.py
if you want context. We have not parsed the arguments, as enhance_argparse will handle that. Now let us instantiate enhance_argparse:
[16]:
from enhance_argparse import enhance_argparse
eparser=enhance_argparse(parser,sysargv=sysargv)
args=eparser.get_instance()
args
[16]:
Namespace(data_path=None, dump_input=False, half_delta=True, input_file=None, order=[1, 1], schema='mkruns', vname=['xx', 'yy'])
Here eparser contains the original parser, and it organizes the parser's attributes in a sensible way. Furthermore, it internally creates an args instance based on the provided sysargv list, which you can access via eparser.args. However, it may be more intuitive to use a method to return args: you can do either. If you do not provide a sysargv list to the constructor, it will use sys.argv by default.
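As a small sketch, the two access paths are equivalent, and the sysargv argument is optional:
# two equivalent ways to get the parsed arguments
args = eparser.get_instance()   # via the accessor method
args = eparser.args             # or via the stored attribute
# omitting sysargv makes the constructor fall back to sys.argv
# eparser = enhance_argparse(parser)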
Organizing Parser Data¶
The first thing eparser does is find properties of the parser that you may want to know. If a variable has both a long and short form, argparse appears to only store the long version in the namespace. While you could get around this by manually setting multiple destinations, this would be inconvenient. Instead, we organize this information in a series of lists/dictionaries that facilitate its use.
[3]:
print "Useful lists"
print eparser.single
print eparser.double
print eparser.single_wo_double
print eparser.optional
print eparser.const
print eparser.positional
print "\nUseful dictionaries"
print eparser.single_to_double
print eparser.hyphenate
Useful lists
['-h', '-vname', '-schema', '-dpath', '-order', '-halfd']
['--half_delta', '--data_path', '--help']
['-vname', '-schema', '-order']
['vname', 'schema', 'order', 'half_delta', 'data_path']
['half_delta', 'halfd']
['file']
Useful dictionaries
{'-halfd': '--half_delta', '-dpath': '--data_path', '-h': '--help'}
{'help': '--help', 'vname': '-vname', 'h': '-h', 'data_path': '--data_path', 'dpath': '-dpath', 'half_delta': '--half_delta', 'halfd': '-halfd', 'order': '-order', 'schema': '-schema'}
Each variable can have a single hyphen form, a double hyphen form, or both. Once again, argparse only stores the double hyphen form if given. So we need single, double, and single_wo_double to easily execute various tasks.
The next three lists, optional, const, and positional, categorize the different types of variables that have been provided. Here we always default to long form if given, consistent with argparse conventions.
The dictionary hyphenate is useful, as it takes a variable name, in either short or long form, and will tack on the proper number of hyphens. The dictionary single_to_double allows you to switch between long and short form, when both are present.
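As a quick sketch of how these dictionaries are used, with values taken from the output above:
# attach the proper number of hyphens to a bare name
print(eparser.hyphenate['data_path'])      # '--data_path'
print(eparser.hyphenate['dpath'])          # '-dpath'
# switch from short to long form when both exist
print(eparser.single_to_double['-halfd'])  # '--half_delta'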
Hopefully one will not need to deal with the above data, as methods have been created to do all the usual tasks. Let us look at some of the methods.
Argument Data and Methods¶
All variables that have been changed from their default values can be accessed via a dictionary called modified.
[4]:
eparser.modified
[4]:
{'half_delta': True,
'order': [1, 1],
'schema': 'mkruns',
'vname': ['xx', 'yy']}
A useful method is parse, which takes a raw input list, parses it with argparse, and returns a dictionary without updating args.
[5]:
eparser.parse(sysargv[1:])
[5]:
{'half_delta': True,
'order': [1, 1],
'schema': 'mkruns',
'vname': ['xx', 'yy']}
Notice that data_path is not shown, as it was not passed in as a variable to be parsed. Also, the positional variable is not shown, since it is required and there would be no clear reason to change it later.
We will want to be able to update args in different ways. The simplest option is to provide a dictionary whose values are simply assigned: they will not be parsed.
[6]:
eparser.update({'order':(2,1)})
args
[6]:
Namespace(data_path=None, file='params.txt', half_delta=True, order=(2, 1), schema='mkruns', vname=['xx', 'yy'])
Care must be taken, because this allows you to change the nature of a variable and there is no check on the result. Normally, one will want to feed this with data that has been parsed:
[7]:
eparser.update(eparser.parse("-order 2,2".split()))
args
[7]:
Namespace(data_path=None, file='params.txt', half_delta=True, order=[2, 2], schema='mkruns', vname=['xx', 'yy'])
Sometimes you will want to update directly from a dictionary, but pass the dictionary through the parser so that the inputs are processed and checked correctly. Therefore, we have a method called parse_dict to handle this:
[15]:
print eparser.parse_dict({'order':'3,1'})
print eparser.parse_dict({'half_delta':True})
print eparser.parse_dict({'dpath':'/home/cam1'})
print eparser.modified
{'order': [3, 1]}
{'half_delta': True}
{'data_path': '/home/cam1'}
{'schema': 'mkruns', 'half_delta': True, 'vname': ['xx', 'yy'], 'order': [1, 1]}
Another important method is dump, which returns a dictionary, or yaml, of either all variables or only the modified ones; it also has the option to convert tuples and lists back to comma separated strings. Here we illustrate this:
[9]:
print eparser.dump(option='modified')
{'half_delta': True, 'order': '2,2', 'vname': 'xx,yy', 'schema': 'mkruns'}
[10]:
print eparser.dump(option='all')
{'file': 'params.txt', 'vname': 'xx,yy', 'data_path': None, 'half_delta': True, 'order': '2,2', 'schema': 'mkruns'}
[11]:
print eparser.dump(list_to_str=False)
{'half_delta': True, 'order': [2, 2], 'vname': ['xx', 'yy'], 'schema': 'mkruns'}
Another common situation is when we want to provide a list of variable names and get back a dictionary of the processed results; the typical use case is supplying a function with variables from args.
[12]:
print eparser.args_subdict("order schema".split())
{'order': '2,2', 'schema': 'mkruns'}
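For example, the resulting dictionary can be unpacked directly into a function call; a hypothetical sketch (run_case is not part of the suite):
# feed selected args to a hypothetical function via keyword unpacking
def run_case(order, schema):
    print("running with order {} and schema {}".format(order, schema))
run_case(**eparser.args_subdict("order schema".split()))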
Finally, we have the possibility of introducing dependencies between variables.
[13]:
eparser.dependency("schema",['data_path'],ifparent='evaluate')
Here, if 'schema' is set to 'evaluate', then 'data_path' must have a value.
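More than one child variable can be demanded at once, as in the main program example below; a sketch:
# if schema == 'mkruns', then both order and vname are expected to have values
eparser.dependency("schema", ['order', 'vname'], ifparent='mkruns')
For the dependency declared above, a command line such as -schema evaluate with no -dpath given would violate it, while -schema evaluate -dpath strained_xx_yy would satisfy it.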
Methods for Handling Input Files¶
Given that we will often want to allow for a yaml dictionary of inputs, we have several methods to help automate this. We now describe three simple methods that are called in the constructor if the variable setup_io=True, though they can also be used individually.
We begin with the method add_io_args, which adds relevant variables to the parser:
def add_io_args(self):
Our standard argument variable names for inputs and outputs are given as follows, and will be added by this method:
self.parser.add_argument("-i","--input_file", help="Load inputs from yaml file.")
self.parser.add_argument("-di","--dump_input", help="Dump modified args to yaml output and exit.",action="store_true")
This method must be called before any arguments are parsed.
We will then want to check if an input file has been given and, if so, update args using it; the following method executes these few steps:
def proc_input_file(self,name='input_file',update_w_sysargv=True,strict=False):
You can see that it allows for a nonstandard name for the input file variable, and by default it will override the input file values with whatever was provided on the command line.
If you want to dump all modified inputs, the following method will do those few steps if dump_input has been set:
def dump_input_stdout(self,remove_key=['dump_input','input_file'],option='modified',list_to_str=True):
Please note that these methods are all very simple; their only purpose is to avoid repeating oneself in many main programs. The above three methods would be mimicked in the code as follows:
# in the parser definition, we would have
parser.add_argument("-i","--input_file")
parser.add_argument("-di","--dump_input",action="store_true")
# if the user gives an input yaml file, we parse it, load it, and then reapply sys.argv
if args.input_file:
    eparser.update(eparser.parse_dict(yaml.load(open(args.input_file).read())))
    eparser.update(eparser.parse(sys.argv[1:]))
# dump the inputs if requested
if args.dump_input:
    print(yaml.dump(eparser.dump(remove_key=['dump_input','input_file'],option='modified'),default_flow_style=False))
    sys.exit()
Typical Main Program Example¶
Here we illustrate a typical main program structure, and show how inputs may be taken from a yaml file as well as the command line.
[17]:
# here is a sample sys.argv which would come from command line
sysargv=['-vname','xx,yy','-order','1,1','-halfd','-schema','mkruns']
import argparse,sys
from enhance_argparse import enhance_argparse
# define parser and add arguments
parser = argparse.ArgumentParser()
parser.add_argument("-vname", type=lambda s: s.split(','))
parser.add_argument("-order", type=lambda s: [int(tt) for tt in s.split(',')])
parser.add_argument("-halfd","--half_delta", action="store_true")
parser.add_argument("-schema", choices=['mkruns','evaluate'],default='evaluate')
parser.add_argument("-dpath","--data_path")
# instantiate our wrapper class
eparser=enhance_argparse(parser,sysargv=sysargv,setup_io=True)
# enforce whatever dependencies there are
eparser.dependency("schema",['order','vname'],ifparent='mkruns')
eparser.dependency("schema",['data_path'],ifparent='evaluate')
# define args variable for convenience
args=eparser.get_instance()
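From here, a typical main program would dispatch on the parsed values; a hypothetical continuation (the branch bodies are placeholders, not functions from the suite):
# hypothetical continuation of the main program above
if args.schema == 'mkruns':
    print("setting up runs for {} with order {}".format(args.vname, args.order))
elif args.schema == 'evaluate':
    print("evaluating data in {}".format(args.data_path))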
Sometimes we will have many potential arguments, too many to comfortably enter on the command line every time. Furthermore, you may build up some long command and want to store it so it can be rerun later, since it may not be easy to find in your shell history. Of course, you can always put the arguments in a file and use some bash commands:
echo -vname xx,yy -order 1,1 -halfd -schema mkruns > inputs.txt
python script.py `cat inputs.txt`
However, it is nicer to take arguments from a yaml file.
$ cat inputs.txt
data_path: strained_xx_yy
half_delta: true
order: 1,1
schema: mkruns
vname: xx,yy
As defined above, we have an input flag which tells the code to read these variables:
python script.py -i inputs.txt
The yaml file may be some long template, and we may want to simply alter or define a single variable, so we can do
python script.py -i inputs.txt -dpath strain2_xx_yy
All the variables will be taken from the input file, but a value given on the command line always takes precedence (this is just our convention). Finally, note that one can enter lists, and lists of lists, directly in yaml; these will be converted to comma separated strings and then parsed by argparse:
$ cat inputs.txt
data_path: strained_xx_yy
half_delta: true
order:
- 1
- 1
schema: mkruns
vname: xx,yy
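As a rough sketch of the conversion described above (not the actual implementation), a yaml list such as the order entry is flattened to a comma separated string before being handed back through the parser:
# sketch only: how a yaml list could be turned into a CSV string for parse_dict
val = [1, 1]                              # yaml.safe_load gives a python list for the order entry
csv = ','.join(str(v) for v in val)       # '1,1'
print(eparser.parse_dict({'order': csv})) # {'order': [1, 1]}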