Tuesday, October 21, 2014

Conventions can do more with less: Clive options continued.

In a previous post I discussed a new package in Clive, opt, which deals with command arguments.
The package turned out to be very easy to use, and it has unified how Clive commands are used beyond giving them a common option syntax.

When defining new options, the user gives a pointer to a Go value, a name, and a help string:

func (f *Flags) NewFlag(name, help string, vp interface{})

The help string follows the convention that

  • for options with no arguments, it simply describes the option.
  • for options with arguments, it starts with "argument name:" followed by the description of the option.

This convention lets the usage method build a usage string more descriptive than the ones in other Go programs. For example, compare the old

usage: flds -options [file...]

with the new

usage: flds [-1D] [-F sep] [-o sep] {-r range} [file...]

The method writing the usage description scans the defined flags for those that have a single rune as a name and no argument, and combines them as in "-1D" above. Then it goes over the flags that have longer names, or have arguments, and adds the usage for each one in the same line. The name of the argument is taken from the start of the help string if it is found there; otherwise a generic name for the Go type of the argument is used. When arguments may be used more than once, curly brackets are printed instead of square ones. The method knows this because the value pointer given to NewFlag is a pointer to a slice of values.

Thus, there is no need to keep the usage summary synchronised with the source: it is always in sync.
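The steps just described can be sketched in a few lines of Go. This is only an illustration of the convention, not the code in opt: the def type, the argName and usage functions, the generic names "str" and "num", and the help strings given for the flds flags are all made up for the example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// def is a hypothetical flag definition; opt may store this differently.
type def struct {
	name, help string
	valp       interface{}
}

// argName returns the argument name found in an "argname: description"
// help string, or a generic name for the Go type if there is no such prefix.
func argName(d def) string {
	if i := strings.Index(d.help, ":"); i > 0 && !strings.ContainsAny(d.help[:i], " \t") {
		return d.help[:i]
	}
	switch d.valp.(type) {
	case *string, *[]string:
		return "str" // assumed generic name
	case *int, *[]int:
		return "num" // assumed generic name
	}
	return "arg"
}

// usage builds the one-line summary: single-rune flags without arguments
// are combined, the rest get one bracketed item each, and flags that may
// repeat (pointers to slices) use curly brackets.
func usage(cmd string, defs []def, tail string) string {
	var combined strings.Builder
	var rest []string
	for _, d := range defs {
		_, isBool := d.valp.(*bool)
		switch {
		case isBool && utf8.RuneCountInString(d.name) == 1:
			combined.WriteString(d.name)
		case isBool:
			rest = append(rest, "[-"+d.name+"]")
		default:
			item := "-" + d.name + " " + argName(d)
			switch d.valp.(type) {
			case *[]string, *[]int:
				rest = append(rest, "{"+item+"}")
			default:
				rest = append(rest, "["+item+"]")
			}
		}
	}
	out := "usage: " + cmd
	if combined.Len() > 0 {
		out += " [-" + combined.String() + "]"
	}
	if len(rest) > 0 {
		out += " " + strings.Join(rest, " ")
	}
	if tail != "" {
		out += " " + tail
	}
	return out
}

func main() {
	var one, debug bool
	var fsep, osep string
	var ranges []string
	// The help strings below are invented for the example.
	defs := []def{
		{"1", "one field per line", &one},
		{"D", "debug", &debug},
		{"F", "sep: input field separator", &fsep},
		{"o", "sep: output field separator", &osep},
		{"r", "range: print this range", &ranges},
	}
	fmt.Println(usage("flds", defs, "[file...]"))
}
```

With these definitions the sketch prints the same line shown above for flds, and nothing but the flag definitions had to be written to get it.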

But this convention has done even more work for Clive. Manual pages in Clive (for all sections but the Go packages section) are written in wr(1). Recently we have
  • added a feature to the opt package so that "-?" can never be a valid flag, which makes all programs report their usage when called with this flag.
  • replaced all the "synopsis" subsections in all command man pages with a new "usage" one.
The new manual pages start with something like:

_ LNS(1): print lines

* USAGE

[sh
lns -? >[2=1] | lns -r 1,-2
]
...

Here, a shell escape runs the command to make it report its usage, and all the output lines except the last one (which is an exit status in Clive) are inserted into the manual.

After doing so, we could remove most option description lists from manual pages, and we no longer have to document the usage of a command. Plus, we are sure that the usage matches the binary installed in the system. For example, the manual for lns looks like:

LNS(1): PRINT LINES

USAGE

    usage: lns [-Dn] {-r range} [file...]
    -D: debug
    -n: print line numbers
    -r range: print this range
    range is addr,addr or addr
    addr is linenb, or +linenb, or -linenb
    

DESCRIPTION
...

Once again, we could do more by actually doing less.

Tuesday, October 14, 2014

Simple things are more powerful. The clive cmd/opt package.

The golang flag package is very powerful. It makes it easy to process your command arguments.
We have been using it for a long time to process Clive command arguments.

But, to say it in a few words, it is not able to operate on different argument vectors in a reasonable way (i.e., using the same interface used for processing the actual command line), it cannot handle combined -abc style flags instead of the utterly verbose -a -b -c, it cannot handle flags that repeat, and it is far more complex than needed. The clive opt package does all this by actually doing less.

For example, the flag package operates on os.Args to retrieve the arguments. Using another array requires using another interface. Instead, opt always takes a []string and works on it. This makes it work on any given set of arguments, which makes it unnecessary to provide any functionality to process "sub-arguments" or "different command lines". Instead, the same interface is used against the desired set of arguments.

Looking at the interface of flag provides more examples of what this post is about. There are multiple StringVar(), String(), IntVar(), Int(), ... functions to define new flags. There are many such functions and they define flags globally. Instead, opt provides a single method

    func (f *Flags) NewFlag(name, help string, vp interface{})

to define new flags. It can be used as in

        debug := false
        opts.NewFlag("d", "debug enable", &debug)
        odir := "/tmp"
        opts.NewFlag("o", "output dir, defaults to /tmp", &odir)

which has several advantages:

  1. The flag is defined on a set of flags, not globally. This permits using different sets of flags, yet the code is the same as it would be for global flags, as discussed above.
  2. There is a single way of defining a new flag. This is a benefit, not a drawback.
  3. There is a single method to call for any flag, not many to remember.
  4. It does not re-do things the language is doing. For example, a default value is simply the value of the variable unless the options change it. Go already has assignment and initialisation, thus, it is not necessary to supply that service in the flag definition. 
  5. It is not possible to introduce errors due to the initialisation of option variables not matching the default value given to the flag definition call.

The implementation also has important differences. The flag package is an elaborate set of internal types defined to provide methods to handle each one of the flag types. Adding a new flag requires defining types and methods and must be done with care. A consequence is that flag has 850 lines and is harder to understand. Opt has 356 lines. Both counts include all the documentation.

Instead of the approach used in flag to define and implement the arguments, opt relies on a type switch on the pointers given when defining each one of the flags. All the package does during parsing is to set the pointer to the value found in the argument, using the type of the pointer to determine how to parse the value by default. This is an excerpt of the code:

switch vp := d.valp.(type) {
case *bool:
	// a boolean flag takes no argument
	*vp = true
	if len(argv[0]) == 0 {
		argv = argv[1:]
	} else { // put back the "-" for the next flag
		argv[0] = "-" + argv[0]
	}
case *string:
	// the argument replaces the previous value
	nargv, arg, err := optArg(argv)
	if err != nil {
		return nil, fmt.Errorf("option '%s': %s", d.name, err)
	}
	argv, *vp = nargv, arg
case *[]string:
	// a repeated flag accumulates its arguments
	nargv, arg, err := optArg(argv)
	if err != nil {
		return nil, fmt.Errorf("option '%s': %s", d.name, err)
	}
	argv, *vp = nargv, append(*vp, arg)
	// ...
}

And that suffices. There is another type switch when a flag is defined to make sure that the pointer type can be handled later by the package during parsing. Because of this, the code walking the arguments trying to parse each option can be fully shared and is not highly coupled with the per-option parsing code.

What about more complex arguments? Easy: the program may define string arguments and then parse them as desired. Or, if the new argument type becomes very popular, it can be added to opt by

  1. Adding an empty case to the type switch of NewFlag (to check the pointer type for validity)
  2. Adding a new case to the type switch of Parse (the excerpt shown above) to actually do the parsing.
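To make the two steps concrete, here is a toy version of the idea that adds an *int argument type. It is a sketch written for this post and not the opt source: the Flags type below, its defs map, and the simplified Parse (single ASCII character names, an argument vector without the command name) are assumptions made to keep the example short.

```go
package main

import (
	"fmt"
	"strconv"
)

// Flags is a toy stand-in for a set of flags; it is not the opt type.
type Flags struct {
	defs map[string]interface{}
}

// NewFlag records the pointer after checking that its type can be parsed.
// The help string is ignored in this sketch.
func (f *Flags) NewFlag(name, help string, vp interface{}) {
	switch vp.(type) {
	case *bool, *string, *[]string:
	case *int: // step 1: an empty case for the new type
	default:
		panic(fmt.Sprintf("flag %q: unsupported type %T", name, vp))
	}
	if f.defs == nil {
		f.defs = map[string]interface{}{}
	}
	f.defs[name] = vp
}

// Parse consumes the leading options in argv (which does not include the
// command name here) and returns the remaining arguments.
func (f *Flags) Parse(argv []string) ([]string, error) {
	for len(argv) > 0 && len(argv[0]) > 1 && argv[0][0] == '-' {
		word := argv[0][1:]
		argv = argv[1:]
		for len(word) > 0 {
			name := word[:1]
			word = word[1:]
			vp, ok := f.defs[name]
			if !ok {
				return nil, fmt.Errorf("unknown option '%s'", name)
			}
			if b, isBool := vp.(*bool); isBool {
				*b = true
				continue // more flags may follow, as in -abc
			}
			// The argument is the rest of the word or the next word.
			arg := word
			word = ""
			if arg == "" {
				if len(argv) == 0 {
					return nil, fmt.Errorf("option '%s': missing argument", name)
				}
				arg, argv = argv[0], argv[1:]
			}
			switch vp := vp.(type) {
			case *string:
				*vp = arg
			case *[]string:
				*vp = append(*vp, arg)
			case *int: // step 2: parse the new type
				n, err := strconv.Atoi(arg)
				if err != nil {
					return nil, fmt.Errorf("option '%s': %s", name, err)
				}
				*vp = n
			}
		}
	}
	return argv, nil
}

func main() {
	debug, odir, n := false, "/tmp", 10 // defaults are plain initial values
	var ranges []string
	opts := &Flags{}
	opts.NewFlag("d", "debug", &debug)
	opts.NewFlag("o", "dir: output dir", &odir)
	opts.NewFlag("n", "num: number of lines", &n)
	opts.NewFlag("r", "range: print this range", &ranges)
	rest, err := opts.Parse([]string{"-dn", "3", "-r", "1,2", "-r5", "file"})
	fmt.Println(debug, odir, n, ranges, rest, err)
}
```

Note that the loop walking the arguments did not change to support the new type. Only the two switches grew by one case each, and the variable odir keeps its initial value because no option changed it.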

All this is not to say that opt is the greatest argument processing package. In fact, it has just been born and it is likely that there are still bugs and things to improve. All this is to say that more can be done by doing less, both in the interface and in the implementation of the package considered.

Go is a very nice language. I'd like its interfaces to be as clean, tiny and powerful as possible. If we want complex interfaces and implementations, we know where to find Java and C++.

Friday, October 10, 2014

Clive for users: finders, trees, and name spaces


    The first example shows that listing all source files under /zx/sys/src can be done with this command:

                 ; lz /zx/sys/src/,
                 /zx/sys/src
                 /zx/sys/src/clive
                 /zx/sys/src/clo
    
    
    
    Here, describing a file (or a set of files) is done by using a combination of a name and a predicate. In this case, all files starting at /zx/sys/src and matching an empty predicate (the empty string) are listed. 

    Or we can list only directories using
                 ; lz /zx/sys/src/,type=d

    Or we can remove all regular files in a hierarchy by using
                 ; rm /zx/sys/src/,type=-
    
    
    
    Commands operating on files find the involved directory entries and then operate on them. A directory entry is a set of name/value pairs, and may have any number of attributes with names and values chosen at will. Of course, there are some conventions about which attributes are expected to be there (like size, mode, etc.).

    Predicates used to find files are general expressions that may use directory attributes as values, which makes it easy for a command to issue an expression to find the entries of interest in a single RPC or a few RPCs. Directory entries are self-describing entities (e.g., they also report the address of the server and the name of the resource in the server).

    This makes it easy for a program to issue requests for a directory entry it found. In short, file trees in Clive are split into two important entities: finders, used to find directory entries, and file trees, which accept operations for directory entries.
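As a rough illustration of entries as attribute sets and of finding them with a "name,predicate" pair, consider the Go sketch below. It is not Clive code: the Dir type, the matches and find functions, and the "path" attribute name are invented here, and the predicate language is reduced to a single "attr=value" test, while real predicates are general expressions.

```go
package main

import (
	"fmt"
	"strings"
)

// Dir is a directory entry: a set of name/value attribute pairs.
type Dir map[string]string

// matches implements a tiny predicate language for the example: the
// empty predicate matches everything and "attr=value" matches entries
// whose attribute has exactly that value.
func matches(d Dir, pred string) bool {
	if pred == "" {
		return true
	}
	attr, val, ok := strings.Cut(pred, "=")
	if !ok {
		return false
	}
	return d[attr] == val
}

// find splits expr into "name,pred" and returns the entries at or under
// name that match pred. The "path" attribute is an assumed convention.
func find(tree []Dir, expr string) []Dir {
	name, pred, _ := strings.Cut(expr, ",")
	name = strings.TrimSuffix(name, "/")
	var out []Dir
	for _, d := range tree {
		p := d["path"]
		if p != name && !strings.HasPrefix(p, name+"/") {
			continue
		}
		if matches(d, pred) {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	tree := []Dir{
		{"path": "/zx/sys/src", "type": "d"},
		{"path": "/zx/sys/src/clive", "type": "d"},
		{"path": "/zx/sys/src/clive/x.go", "type": "-", "size": "120"},
	}
	for _, d := range find(tree, "/zx/sys/src/,type=d") {
		fmt.Println(d["path"])
	}
}
```

Because an entry is just a set of attributes, a command can test any attribute it likes without the finder knowing about it in advance.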

    Each process groups one or more finders into a name space, built from a textual representation (it might inherit the name space from the parent). For example, we can use

                 ; NS='/ /
                 ;;  /zx tcp!zxserver!zx
                 ;;  /dump tcp!zxserver!zx!dump
                 ;; '
                 ; lz /zx/usr/nemo,type=d

    to define a new name space and then issue commands that work in it. In this example, we defined as / the root of the host OS file tree, and then mounted at /zx our main tree and at /dump its dump file system. To say it in a different way, the name space is a finder that may group other finders (among other things). The name space is more powerful, and can mount at a given name a set of directory entries (be they for files or not), but the example suffices for now.
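One way to picture such a name space is as a table built from that text, mapping each mounted name to the finder serving it. The Go sketch below is a simplification made up for this post and not the Clive implementation: the mount type, parseNS and resolve are hypothetical, each line is assumed to hold exactly one name and one address, and lookups simply pick the longest mounted name that covers the path.

```go
package main

import (
	"fmt"
	"strings"
)

// mount binds a name in the name space to the address of a finder.
type mount struct {
	prefix, addr string
}

// parseNS builds the table from text with one "name address" pair per
// line; lines that do not have exactly two fields are ignored.
func parseNS(s string) []mount {
	var ns []mount
	for _, ln := range strings.Split(s, "\n") {
		f := strings.Fields(ln)
		if len(f) == 2 {
			ns = append(ns, mount{f[0], f[1]})
		}
	}
	return ns
}

// resolve returns the address mounted at the longest name covering path,
// along with the path relative to that mount point.
func resolve(ns []mount, path string) (addr, rel string, ok bool) {
	best := -1
	for i, m := range ns {
		covers := m.prefix == "/" || path == m.prefix ||
			strings.HasPrefix(path, m.prefix+"/")
		if covers && (best < 0 || len(m.prefix) > len(ns[best].prefix)) {
			best = i
		}
	}
	if best < 0 {
		return "", "", false
	}
	rel = "/" + strings.TrimPrefix(strings.TrimPrefix(path, ns[best].prefix), "/")
	return ns[best].addr, rel, true
}

func main() {
	ns := parseNS(`/ /
	/zx tcp!zxserver!zx
	/dump tcp!zxserver!zx!dump
	`)
	for _, p := range []string{"/zx/usr/nemo", "/dump/2014", "/tmp"} {
		addr, rel, _ := resolve(ns, p)
		fmt.Println(p, "->", addr, rel)
	}
}
```

In this picture, a command such as lz would first resolve the name and then send the predicate to the finder at the resulting address.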