Tuesday, May 5, 2015

Commands and channels in Clive: Sam is gone.

Recently I removed the port of the venerable Sam command language in Clive.
This command language can iterate over all portions of text matching addresses given by regular expressions and then, for example, for each portion matching another expression, apply a command or loop in some other way.

As part of the Clive application framework (still WIP), I ported most of the commands to the new standard setup: commands have a series of I/O channels that forward interface{} data. When going outside of the Clive world, only []byte data is actually written to the external channels.

By convention, commands expect data to be just []byte messages, forwarded through std. IO channels (similar to file descriptors in UNIX, but using -modified- Go channels).

Now, the interesting thing is the result of these conventions:

  • All commands forward all messages they don't understand. 
  • Commands finding files send not just file contents, but send a zx.Dir for each file found before actually sending the file data to standard output (more on this later in this post).
  • Only []byte messages are considered data; all other types can be used to let friend commands talk without disrupting others in the command pipeline.
To illustrate the consequences, we can run this pipeline in Clive:


gf /tmp/echo,1 | gx  '(^func[^\n]*\n)' '(^}\n)' | gg Run | 
gp '~m.*&1' | trex -u | pf -ws,

This is just an example, and I'll explain the bits and pieces; the interesting thing is how they work together: in Clive we can use separate commands to do what in UNIX would have to be done in a single process, or would be a lot harder to split into separate commands.

The first command, gf /tmp/echo,1, issues a Find request to the ZX file system and sends to its std. output channel a stream of data for matching files. In this case, we asked for files under /tmp/echo with depth not greater than 1. Here, filtering of interesting files happens at the file server, which sends to the client machine a stream of directory entries for matching files plus the data for each one (after its directory entry).

Now, gx is mostly a grep command. In this case, it simply sends to its output all text enclosed by matches of the two given regexps. It can be used in other ways; as called in the example, it reports each function or method defined in the Go source as a []byte in the output.
The funny thing is that gx (as called) also writes unmatched input to the output, but using a string data type instead. Furthermore, it posts addresses (text lines and rune ranges) to its output, should the user want to print them.

The next command, gg, thus receives the entire text for the files found, but only the portions matched by the previous gx are []byte; all other data is forwarded but otherwise ignored. This command sends to the output those []byte messages containing the given expression. At this point, only functions containing Run remain in the output as []byte.

To make the example more fun, gp greps directory entries in the stream and does not forward any file to the output unless its directory entry matches the given predicate. We could have given the predicate to gf instead, but this is more fun. At this point, only functions containing Run, in files with names starting with m and with a depth no greater than 1, remain in the output.

To have more fun, trex translates the input text to uppercase. Note that all unmatched data is still forwarded (but not as []byte), because we didn't suppress it from the stream in any of the previous commands.

Finally, pf -ws is a funny command that, one file at a time, reads all the file data into memory and, once the data is complete, writes it back to the file (using its Dir entry as found in the stream to locate the file to write). We asked it to write the files and to also include data sent for them in string messages (and not just the "official" []byte messages). Thus, this is like a "save the edited files" command in Sam.

All the files processed by previous commands in the stream are written to disk at this point. Because the command buffers in memory the files going through the stream, we don't have the "cat f > f" effect of truncating the files by mistake.

The pipe-line may be silly, but it shows how bits and pieces fit together in Clive. I was not able to do these things in any other system I used before. Clive is fun. Go is fun.

The output of the pipe is also fun (using pf -x instead):

/tmp/echo/mecho.go:#174,#716:
FUNC RUN() {
        X := &XCMD{CTX: APP.APPCTX()}
        X.FLAGS = OPT.NEW("{ARG}")
        X.NEWFLAG("D", "DEBUG", &X.DEBUG)
        X.NEWFLAG("G", "DON'T ADD A FINAL NEWLINE", &X.NFLAG)
        ARGS, ERR := X.PARSE(X.ARGS)
        IF ERR != NIL {
                X.USAGE()
                APP.EXITS("USAGE")
        }
        VAR B BYTES.BUFFER
        FOR I, ARG := RANGE ARGS {
                B.WRITESTRING(ARG)
                IF I < LEN(ARGS)-1 {
                        B.WRITESTRING(" ")
                }
        }
        IF !X.NFLAG {
                B.WRITESTRING("\N")
        }
        OUT := APP.OUT()
        OK := OUT <- B.BYTES()
        IF !OK {
                APP.WARN("STDOUT: %S", CERROR(OUT))
                APP.EXITS(CERROR(OUT))
        }
        APP.EXITS(NIL)
}


 I admit that Clive is even more user-friendly than UNIX ;)
But I also admit I like that...


Thursday, March 26, 2015

Nested control requests in Clive ZX file trees

In a previous post I wrote about ZX file system stacks in Clive. Now I look into their control interfaces, which exhibit a nice feature shown here.

When a ZX file tree operates by relying on another ZX tree deeper in the stack, it adapts its control interface to behave as a facade for it.

For example, the /Ctl file of a ZX file tree can be used to see the status of the debug flags, to see who is using the file tree (in the case of trees served to the network), and to see usage statistics for the system.

This is the result of reading /Ctl on a stack made of a CFS that uses an MFS as a cache and an RFS as the remote tree. The RFS was connected to a remote server exporting a CFS that uses an MFS cache to serve an LFS. What CFS, MFS, RFS, and LFS mean is not really relevant. They are just ZX file servers that rely on underlying file servers to do their job (for the record: CFS is a caching file server, MFS is a RAM FS, RFS is a remote file server, and LFS is a local file server).

I have removed some of the lines where they do not help to further illustrate the case.

> cat /zx/Ctl
cfs:
fdebug off
vdebug off
noperm off
stat 8220 calls 252 errs 16440 msgs 0 bytes
bgn: min    4.911µs avg  625.628µs max 1.333559776s
end: min    5.045µs avg  625.722µs max 1.333559891s
get 3147 calls 0 errs 16501 msgs 38660497 bytes
bgn: min   54.389µs avg 2.657265ms max 856.808309ms
end: min   63.349µs avg 5.141675ms max 856.841591ms
...
cmfs:
debug off
noperm off
stat 78449 calls 2721 errs 156898 msgs 0 bytes
...
lsub:
user rfs:193.147.15.27:65348 nemo as nemo on 2015-03-26
user rfs:193.147.15.27:65349 nemo as nemo on 2015-03-26
ldebug off
rdebug off
stat 2656 calls 0 errs 5312 msgs 0 bytes
...
mfs:lsub:
debug off
noperm off
stat 47588 calls 2487 errs 95176 msgs 0 bytes
...
lsub:
debug off
rdonly off
noperm off
stat 2854 calls 0 errs 5708 msgs 0 bytes
...

Now, note the lines starting with cfs:, cmfs:, and mfs:lsub:. Only the first section comes from the top-level CFS. The remaining lines come from that server reading the /Ctl file of the underlying ZX file tree and appending it to the data served to the user; and so on as we go down the stack.
Thus, the user may inspect the ZX stack as deeply as desired.

In the same way, we can set debug on for CFS by executing

> echo debug on > /zx/Ctl

The ZX file server understands writes to the control file as textual commands
to perform control requests.

But we can also set debug on the third tree in the stack:

> echo pass pass debug on > /zx/Ctl

When a control request starts with pass, this word is removed from the request
and the server writes what remains of the request to the next tree in the stack.
This makes the control interface of all the trees in the stack available to the final user.



Tuesday, March 24, 2015

Go and unix traditions are a good match

Recently I had to start writing a paper on Clive's ZX (more on this soon) and had to add citations to it. I'm writing the paper using wr(1), the Clive text formatter used to write manual pages, papers, and everything else.

Because we had the entire set of bibliography entries from Plan 9 and Plan B in troff's refer format, I thought wr could be taught to use them.

Being short of time, I added a new Go package to Clive that eats all the entries in the lsub refer bib directory, which have a format like

Article in conference proceedings
%A M. Bishop
%A L. Snyder
%T The Transfer of Information and Authority
in a Protection System
%J Proceedings of the 7th SOSP
%P 45-54
%D 1979


The wr/refs package exploits Go maps, strings, and slices to record a map of words to bib entries, which are also maps from keys (author, title, etc.) to strings. As easy as this:

// A reference maps from the key (e.g. 'A') to values (e.g. authors)
type Ref struct {
	Keys map[rune][]string
}

// A bib maps from words found in references to references
type Bib struct {
	refs map[string]map[*Ref]bool
	all  []*Ref
}

Note how easy it is in Go to create sets of arbitrary types by using a map from that type to booleans.
Writing the code for these types feels very much like writing Lisp.

Citing now in wr is as easy as writing [bib: plan 9 networks] in the source text.
Wr simply takes all the words in the citation and looks them up in the Go map to locate entries matching all the words. And that's all that has to be done: the result looks like [1] in the text, and the reference is added at the end of the output as shown here.

REFERENCES

1. The organization of networks in Plan 9. D. Presotto, P. Winterbottom. USENIX Association. Proceedings of the Winter 1993 USENIX Conference. 1993.

We can now easily cite in manual pages, web pages, papers in PDF, ...

The UNIX tradition of using simple text files with a line-oriented format, plus Go code to process them, is a very powerful combination. I'm still amazed.


Thursday, March 19, 2015

Limiting the number of concurrent processes

In some cases, a task can be easily split into multiple concurrent child tasks, but it is not a good idea to spawn each of the child tasks as a concurrent process.

For example, some commands in Clive go through a tree to synchronise it, or to poll each node for changes, etc. At each node in the tree, it is clear that the children can be processed concurrently. This reduces the total latency for the entire process. However, doing it this way would place a high load on the server at the other end of the line, and also on the line itself, so it might not be a good idea.

In Go it is fairly trivial to place a limit on the maximum number of concurrent tasks while still exploiting concurrency up to that limit. The technique shown here is not new and, in fact, has been in use for decades, even in C for Plan 9, and probably in many other languages and by many other authors. Nevertheless, I think this tiny bit of code might help those that never saw it before.

The idea is to send functions through a channel and keep a fixed set of workers at the other end of the channel. All of them receive a function, call it, and loop again. The number of workers places the limit on the number of concurrent tasks running at the same time.

This creates the pool to run tasks:

func NewPool(n int) *Pool {
	p := &Pool{calls: make(chan call)}
	for i := 0; i < n; i++ {
		go p.worker(i)
	}
	return p
}

A worker is simply

func (p *Pool) worker(id int) {
	for c := range p.calls {
		c.fn()
		c.donec <- true
	}
}

And we can run functions using this method:

func (p *Pool) Go(donec chan bool, fn func()) chan bool {
	if donec == nil {
		donec = make(chan bool, 1)
	}
	p.calls <- call{fn: fn, donec: donec}
	return donec
}

Now the thing is fairly easy to use. Go closures make the interface of the previous method powerful enough to do anything and accept/return any input/output values.

For example:

// Create a pool of 5 workers
p := NewPool(5)
// Run 20 functions in the pool
dc := make(chan bool, 20)
for i := 0; i < 20; i++ {
	p.Go(dc, func() {
		dprintf("running this...\n")
	})
}
// Wait until all of them are done
for i := 0; i < 20; i++ {
	<-dc
}

The testing code in this package plots a nice diagram of the execution, as a side effect of testing.
This is one such plot, time on the X axis and each task on the Y axis. 

trace: abcdeAfBCDEghijFkGHIJlmnoKpLMNOqrstPQRST
+----+                                  
 +-----+                                
  +-----+                               
   +-----+                              
    +-----+                             
      +--------+                        
           +-----+                      
            +-----+                     
             +-----+                    
              +-----+                   
                +--------+              
                     +-----+            
                      +-----+           
                       +-----+          
                        +-----+         
                          +--------+    
                               +----+   
                                +----+  
                                 +----+ 
                                  +----+

The tests run fake jobs like this one:

func fakefn(r rune, t time.Duration) {
n := atomic.AddInt32(&ncalls, 1)
calls[n-1] = r
dprintf("fake call %c\n", r)
time.Sleep(t)
r = unicode.ToUpper(r)
n = atomic.AddInt32(&ncalls, 1)
calls[n-1] = r
}

They record a trace of the sequence of calls in a string like

abcdeAfBCDEghijFkGHIJlmnoKpLMNOqrstPQRST

where lower-case indicate the start of a task and upper-case its end.
Thus, checking that the scheduling is as expected for the test is easy, as is producing a plot for fun.

func plot(trz string, ncalls int) {
	c := 'a'
	dprintf("trace: %s\n", trz)
	ch := map[bool]string{true: "-", false: " "}
	for i := 0; i < ncalls; i++ {
		st := c + rune(i)
		end := unicode.ToUpper(st)
		sted := false
		for _, ev := range trz {
			switch ev {
			case st, end:
				dprintf("+")
				sted = !sted
			default:
				dprintf(ch[sted])
			}
		}
		dprintf("\n")
	}
}



Tuesday, March 10, 2015

File System Stacks in Clive

In Clive, the different file systems and file system tools can be combined and used together to provide features like file access, backups, history dumps, and performance measurement. For example, this is a non-trivial stack out of the multiple ones we use:


This stack traces calls made to a caching file system that relies on a local file system to cache a remote file system (wait, there is more); the remote file system serves a caching file system that relies on a local memory-only file system to cache a local file system (at the remote machine). Confusing, isn't it? But that is the nice thing: you can build complex stacks out of simple file system trees.

This is an example of code:


	zx1, err := lfs.New("a tree", "/foo", lfs.RW)
	zx2, err := mfs.New("a cache")
	tr1 := trfs.New(zx1)
	tr2 := trfs.New(zx2)
	cfs, err := cfs.New("a cached tree", tr1, tr2, cfs.RW)
	err = <-cfs.Mkdir("/a/b", zx.Dir{"mode": "0775"})
This code builds a different stack: it creates a ZX tree for an underlying UNIX tree at /foo, then a RAM-only file tree, a tree that traces calls to the first, one that traces calls to the second, and a caching file tree that relies on the two tracing trees. Now we can trace calls to a cached file tree and see the calls to each one of the trees (the cache and the cached one).
There is yet another draft of how stacks work in Clive in the clive file system stacks TR found in the Lsub web site.

Tuesday, December 9, 2014

expr + test = xp

The clive command xp(1) replaces the venerable UNIX commands expr and test. It is interesting to note why. In a few words, both commands do the same thing, and thus they are a single command in Clive.

The purpose of a command line calculator is to evaluate expressions. Now, the purpose of the test command in UNIX is also to evaluate expressions. Only that test knows how to evaluate expressions on file attributes.

Considering that in Clive directory entries are generic maps from attribute names to attribute values, it seems that a single command can do both. The result is that we can do calculations on, for example, file sizes and file modification times in very much the same way we can calculate on floating point numbers. Or we can perform bitwise operations on file permissions. As a result, the grammar known to the calculator is also available for free to the "test" command, because it is the same command.

For example, it is clear that we can do things like
> xp 2 + 2
4
> xp 1k
1024
> xp 1m
1048576

But, we can use the same command to print metadata for files. For example, to print out
the file type:
> xp type .
d
> xp type xp.go
-

Now we can write general purpose expressions as we see fit; e.g., is the size larger than 1 Kbyte?

> xp size xp.go '>' 1k
false


Or, how many full Mbytes are used by a binary?

> xp size '"'`which xp`'"' / 1m
5

Another example. Let's check the mode for a file
> xp mode xp.go
0644

Now we can write an expression to get the write permissions,
> xp mode xp.go '&' 0222
128

or to see if any of them is set
> xp mode xp.go '&' 0222 '!=' 0
true

which can be done easier, by the way:
> xp w xp.go
true

Tuesday, November 4, 2014

Compare go source files in two file trees in clive

I'm still amazed at how pieces fit together in Clive. This is like a recursive diff in UNIX, but in Clive, and it compares just the Go source files found within two trees:

    ; comp ',~\.go$' '/tmp/x,~\.go$'
    #diff comp.go:208: and /tmp/x/comp.go:207:
    -   x.diff()
    #diff: comp.go:209: and /tmp/x/comp.go:209:
    +  x.diff()
    ...

There are several things working together here:
  • Commands accept names made of two elements: a path and a predicate (separated by a comma). In the example the predicates match the name against what would be "*.go" in UNIX.
  • Each name is issued to a finder to find a stream of directory entries
  • Directory entries are self-describing, and thus a stream of directory entries is enough to reach the different file servers involved in the name space used for the search.
  • File and directory contents may be retrieved along with the find request; they are streamed through channels, like everybody else.
  • Directories always report and find entries in the same order, i.e. they are always sorted
As a result, the compare command only has to use a library function to turn each name into a stream and then compare the two streams.