jkawk.py

SOURCE: jkawk.py

jkawk.py is an awk emulator written in Python. Awk is a line-oriented text-processing language, which basically executes some code on every line of a file. While Awk is very powerful, it is cumbersome in that one needs to learn yet another syntax. jkawk.py circumvents this: it functions like Awk but takes Python syntax. Some basic things to know:

  • Every line is stored as a string named ii.

  • Every line is converted into a list called l.

    • If all entries on a line are numbers, everything is floated.

  • The file can be piped or appear anywhere on the command line.

  • The variable NR gives the row number.

  • The variable NF is the number of fields in the row.

  • The path is given by the variable path.

  • The working directory is given by cwd.

  • Each line is successively stored in a list-of-list named m.

  • Each line is successively stored in a string named t.

  • One may do preprocessing by giving a string which starts with “b-”, such as “b-print ‘hello world’”.

  • One may do postprocessing by giving a string which starts with “e-”, such as “e-print ‘hello world’”.

  • If you are writing conditionals or loops, you may need to indent. You can directly type \n to create a new line, or use the convenient shortcut of “|”. For example, we could do a preprocessing step:

$ jkawk.py 'b-for i in range(40):|  for j in range(40):print i,j'
  • For convenience, the often typed “print” can be replaced by the shortcut pp.
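
The per-line variables listed above can be pictured with a small sketch. This is not the jkawk.py source: the helper names to_field and run are hypothetical, NR is assumed to start at 1 as in Awk, and numeric fields are assumed to be converted one at a time, which is what the "one/two/three 4 5" example in this section implies. The sketch is written for Python 3, whereas the command-line examples here use Python 2 print statements.

```python
def to_field(token):
    """Return the token as a float if it looks numeric, else as the original string."""
    try:
        return float(token)
    except ValueError:
        return token


def run(lines, body):
    """Call body(ii, l, NR, NF, m) once per line and collect the results.

    ii: the raw line, l: its fields, NR: row number (assumed 1-based),
    NF: number of fields, m: list-of-lists of every row seen so far.
    """
    m = []
    results = []
    for NR, raw in enumerate(lines, start=1):
        ii = raw.rstrip("\n")
        l = [to_field(tok) for tok in ii.split()]
        m.append(l)
        results.append(body(ii, l, NR, len(l), m))
    return results
```

Because m already contains the current row when body is called, the previous row is m[-2], which matches how the derivative example in this section indexes it.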

Let us now look at some silly examples:

  • Print the 1st and 3rd column of a file while adding 5 to the first column:

$ jkawk.py 'pp l[0]+5,l[2]'
  • Split a string and do something:

$ echo one/two/three 4 5 | jkawk.py 'pp l[0].split("/")[-1],l[-1]+2'
$ --> three 7.0
  • Take the derivative of a function:

$ jkawk.py 'if NF>1 and NR>1:pp l[0],(l[1]-m[-2][1])/(l[0]-m[-2][0])'
  • Fold text such that the following $1 rows below the first are shifted to the first row, etc:

$ jkawk.py 'b-a=0' 'print ii, |if a=='$1':print "";a=0 |else: a+=1'
  • Transpose a file:

$ jkawk.py 'e-for i in range(len(m[0])): |  for j in range(len(m)):print m[j][i],|  print ""'
  • Integrate a function which is given on a uniform grid:

# echo this prints the zeroth and first moments for functions x y1 y2 y3 y4...
# echo output is x y1m0 y1m1 y2m0 y2m1...
$ jkawk.py 'e-dw=m[1][0]-m[0][0];ss=[0]*NF;sm=[0]*NF |for i in m: | print "%.5f"%i[0],|
            for j in range(NF-1): ss[j]+=i[j+1]*dw;sm[j]+=i[j+1]*i[0]*dw;
            print "%.5f %.5f"%(ss[j],sm[j]), | print ""'
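
For comparison, the integration one-liner above corresponds roughly to the following standalone Python 3 function. The name running_moments is hypothetical. Like the one-liner, it uses a simple rectangle rule with the spacing taken from the first two rows, and it reports the running sums row by row.

```python
def running_moments(rows):
    """Running zeroth and first moments of y1, y2, ... on a uniform x grid.

    rows is a list of [x, y1, y2, ...]; each output row is
    [x, y1m0, y1m1, y2m0, y2m1, ...] accumulated up to that x.
    """
    dw = rows[1][0] - rows[0][0]      # uniform grid spacing
    ncol = len(rows[0]) - 1           # number of y columns
    m0 = [0.0] * ncol                 # running integral of y
    m1 = [0.0] * ncol                 # running integral of x*y
    out = []
    for row in rows:
        x = row[0]
        line = [x]
        for j in range(ncol):
            m0[j] += row[j + 1] * dw
            m1[j] += row[j + 1] * x * dw
            line += [m0[j], m1[j]]
        out.append(line)
    return out
```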

inputs.py

SOURCE: inputs.py

This is the main class that I use to handle input variables. The docstrings have a fair amount of information so hopefully this is clear. Example usage would be:

import sys

from inputs import inputs

av=inputs("U=float dos=bool t=1.0 ")

av.update(sys.argv[1:])

if dir(av).count("U"):print "U=%s and t=%s"%(av.U,av.t)

Copyright (C) 2019 Marianetti Group

class inputs.inputs(input, pre_exe='')

This is a class which defines a set of variables. One may either provide default values or just define the data type. An example input string would be: aa=int bb=str cc=4.3 In this case the variables aa and bb have no defaults, so only the data type is given.
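
A minimal sketch of how such a definition string could be interpreted is given below. It is an illustration only, not the code in inputs.py: the function names parse_spec and guess_value are hypothetical, and the set of recognized type names is an assumption.

```python
def guess_value(text):
    """Interpret a default value as int, then float, falling back to str."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_spec(spec):
    """Split 'aa=int bb=str cc=4.3' into declared types and default values."""
    known_types = {"int": int, "float": float, "str": str, "bool": bool}  # assumed set
    types, defaults = {}, {}
    for token in spec.split():
        name, _, value = token.partition("=")
        if value in known_types:
            # Only a data type was given, so the variable starts unassigned.
            types[name] = known_types[value]
        else:
            # A default was given; its type is inferred from the value.
            defaults[name] = guess_value(value)
            types[name] = type(defaults[name])
    return types, defaults
```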

count(name)

This method checks if a variable has been assigned.

get_dict(inp=[])

Create a dictionary of the initialized variables and return it. If the list inp is given then only include the subset of initialized variables which are specified by inp.

get_func_args(func, strict=True, required=None)

This method takes a function as an input, finds the argument list, looks in the current assigned variables to see which are assigned, and then returns a dictionary of them. If running in strict mode, it will halt unless all required parameters are passed. If the variable ‘required’ is passed, the method will demand that the listed variables are present, even if they are optional in the function; if a string is passed, it is split on whitespace; otherwise a list is expected.
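
The behavior described above can be sketched with the standard inspect module. This is a hedged Python 3 illustration, not the method itself: collect_func_args is a hypothetical name, the assigned variables are passed in as a plain dict, and raising KeyError stands in for whatever "halt" means in the real class.

```python
import inspect


def collect_func_args(func, assigned, strict=True, required=None):
    """Return the subset of `assigned` whose names are arguments of `func`."""
    if isinstance(required, str):
        required = required.split()          # a string is split on whitespace
    demanded = set(required or [])
    names = []
    for name, param in inspect.signature(func).parameters.items():
        if param.kind in (inspect.Parameter.VAR_POSITIONAL,
                          inspect.Parameter.VAR_KEYWORD):
            continue                         # ignore *args and **kwargs
        names.append(name)
        if strict and param.default is inspect.Parameter.empty:
            demanded.add(name)               # no default: required by the function
    missing = sorted(n for n in demanded if n not in assigned)
    if missing:
        raise KeyError("missing required variables: " + " ".join(missing))
    return {n: assigned[n] for n in names if n in assigned}
```

The returned dictionary can then be passed on as func(**result).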

ifblank(inp)

This method provides a default value to a string variable if it is initialized but blank. This is useful if one does not want to set a default, but often uses the same value.

remap(inp, overwrite=False)

This method will map one variable to another. This can be useful if shortcuts are used for input names to methods. The input is given as a string like this: pos->struct ham->hopping Input will be split on whitespace. It should be noted that you can remap to a variable that does not exist, as well as to one that does. If overwrite is true, then the method will overwrite an existing variable that is assigned.
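
As a rough picture of the remapping, the sketch below works on a plain dict. It rests on assumptions: the function name remap_vars is hypothetical, "pos->struct" is read as "copy the value of pos into struct", and the source variable is kept rather than removed. The real method may differ on any of these points.

```python
def remap_vars(variables, mapping, overwrite=False):
    """Copy values between names according to a string like 'pos->struct ham->hopping'."""
    result = dict(variables)                 # leave the caller's dict untouched
    for pair in mapping.split():             # input is split on whitespace
        src, _, dst = pair.partition("->")
        if src not in result:
            continue                         # nothing assigned to copy
        if dst in result and not overwrite:
            continue                         # keep the existing target value
        result[dst] = result[src]
    return result
```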

report(inp='inputs.out')

Write out all initialized variables to a file.

update(inp, pre_exe='', forbid_unknown=True, enforce_convention=True)

This method updates the value of the variables. The variable inp may be a string that names a file containing a block of variable updates, a string that contains at least one equals sign (=), or simply a list. forbid_unknown will make the method halt if one gives a variable to update that is not one of the known variables. enforce_convention will halt the program if inp does not follow the acceptable convention (i.e. all entries must contain = or be preceded by -variable).

These variables may be updated by supplying an inp with entries like: ['a=2','b=hi'] or ['-a','2','-b','hi'] or ['a=2','-b','bye','cc=5.5'] or "a=2 b=hi" etc. Often one does an update using sys.argv[1:].
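
The accepted conventions can be illustrated with a short parser. This is a sketch under assumptions, not the update method: parse_updates is a hypothetical name, values are left as strings (no type conversion), the file-name form of inp is not handled, and ValueError stands in for halting when the convention is violated.

```python
def parse_updates(inp):
    """Turn 'a=2 b=hi' or ['a=2', '-b', 'bye'] into a {name: value} dict of strings."""
    tokens = inp.split() if isinstance(inp, str) else list(inp)
    updates = {}
    it = iter(tokens)
    for token in it:
        if "=" in token:
            # name=value form
            name, _, value = token.partition("=")
        elif token.startswith("-"):
            # -name value form: the next entry is the value
            name = token[1:]
            try:
                value = next(it)
            except StopIteration:
                raise ValueError("no value given for -%s" % name)
        else:
            raise ValueError("entry %r does not follow the convention" % token)
        updates[name] = value
    return updates
```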

robo_run.py

webscitobib.py

arxiv_to_bib.py

mysub.py

These are a few simple routines that may be useful. The documentation is given in the docstrings.

mysub.html_table(inp, header=1, cap='', merge='')

Print out a formatted table of strings with minimum column width.

class mysub.inputs(input, pre_exe='')

mysub.is_number(s)

Determine if a text string is a number.
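
A common way to write such a check is to attempt a float conversion. The version below is a sketch of that idea and is not taken from mysub.py, so edge cases (for example "nan" or "inf", which float accepts) may be treated differently by the real routine.

```python
def is_number(s):
    """Return True if s can be converted to a float, else False."""
    try:
        float(s)
        return True
    except (TypeError, ValueError):
        return False
```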

mysub.linefit(x1, y1, x2, y2, x=None, y=None)

This method takes two points and makes a line. It will either return a tuple of the slope/intercept, or if one supplies x or y, then it will plug in and return the other variable.
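
The description above maps onto a few lines of arithmetic. The following is a sketch with the same signature, written from the description rather than from the mysub.py source; it assumes x1 != x2, and a nonzero slope when y is supplied.

```python
def linefit(x1, y1, x2, y2, x=None, y=None):
    """Line through (x1, y1) and (x2, y2).

    Returns (slope, intercept), or the matching y if x is given,
    or the matching x if y is given.
    """
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    if x is not None:
        return slope * x + intercept
    if y is not None:
        return (y - intercept) / slope
    return slope, intercept
```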

mysub.pmatrix(input, prec=2, out='')

Print a matrix for viewing. The variable prec specifies the output precision, while out will write the output to a file.

mysub.str_table(inp, sp=[], buf=2, sep=[])

Print out a formatted table of strings with minimum column width.
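
The idea of a minimum-column-width table can be sketched as follows. This is not the str_table implementation: format_table is a hypothetical name, the sp and sep arguments are not modeled because their meaning is not documented here, and buf is assumed to be the number of spaces between columns.

```python
def format_table(rows, buf=2):
    """Left-align cells so each column is as wide as its longest entry plus buf spaces."""
    rows = [[str(cell) for cell in row] for row in rows]
    ncol = max(len(row) for row in rows)
    widths = [0] * ncol
    for row in rows:
        for j, cell in enumerate(row):
            widths[j] = max(widths[j], len(cell))
    lines = []
    for row in rows:
        padded = [cell.ljust(widths[j] + buf) for j, cell in enumerate(row)]
        lines.append("".join(padded).rstrip())   # drop trailing padding
    return "\n".join(lines)
```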

generate_machs.py

This is a simple script to generate info about our cluster. It prints the machine information in four formats, depending on which options you select at the prompt:

  1. .ssh/config: This will ask some questions about your network configuration (whether you’re on grandcentral, your username, etc.) then prepare a file for ~/.ssh/config for each machine in the cluster. It does not overwrite any other machines in ~/.ssh/config, unless they happen to have the same name as one of the machines in the cluster.

  2. machlist : This prints out a list of machines for boomerang.

  3. ping: Lets you know which machines are not responsive

  4. rst: This prints out a table of each machine, IP address, etc. for inclusion in the group documentation.

Cluster scripts

See “New computers and users” for documentation and how to use this script.

Simple Scripts

rme.py random_sub.py prow.py pcol.py len getvalue format.py format.sh derivative.py derivative.sh interpolate.py cumul.py cumul.sh convolve.py tam.py