Thursday, July 11, 2013

A memory trace for command execution in Nix


This is the output for executing

% cat /dev/alloc

in the steady state. That is, after executing the same command a few times so that all the previous things that could be cached are indeed cached. The file used, by the way, is reporting the statistics of several allocators in Nix, including the page allocator. So there is more information in the trace described here than just the set of faults and pages allocated or deallocated.

The events include everything since we typed the command until the shell prints out its prompt.
Thus, there is no microbenchmark trick going on.

The first faults are due to fork, used by rc to spawn the child process. Then, we have more faults for the child while it proceeds to execute the cat command. Pid 21 is the original shell process; pid 27 is the child that will execute pwd.

Let's see the trace before and then make a few remarks near the end of this post.

A fault caused because fork makes the data segment memory shared, but copied on reference. Here the parent references a data page:

fault pid 21 0x400000 r
fixfault pid 21 s /bin/rc sref 1 Data 0x400000 addr 0x400000 pg 0x3fe78000 r2 n1
newpage 0x400000 -> 0x3fe24000 src 0xfffffffff018711c
fixfaulted pid 21 s /bin/rc Data 0x400000 addr 0x400000 pg 0x3fe24000 ref 1


The child starts executing and references a new text address:

fault pid 27 0x20f9e0 r
fixfault pid 27 s /bin/rc sref 9 Text 0x200000 addr 0x20c000 pg 0x1160000 r1 n1
fixfaulted pid 27 s /bin/rc Text 0x200000 addr 0x20c000 pg 0x1160000 ref 1


Another fault in the parent due to the copy on reference. This is because we flushed the MMU state during fork, but the page is already there, there is nothing to do to handle this type of fault, other than updating the MMU state to refer to the page, which is already there.

fault pid 21 0x404b30 w
fixfault pid 21 s /bin/rc sref 1 Data 0x400000 addr 0x404000 pg 0x3fe74000 r2 n1
newpage 0x404000 -> 0x3fe64000 src 0xfffffffff018711c
fixfaulted pid 21 s /bin/rc Data 0x400000 addr 0x404000 pg 0x3fe64000 ref 1


The child continues executing and references a new text address:

fault pid 27 0x20a9d9 r
fixfault pid 27 s /bin/rc sref 9 Text 0x200000 addr 0x208000 pg 0x1188000 r1 n1
fixfaulted pid 27 s /bin/rc Text 0x200000 addr 0x208000 pg 0x1188000 ref 1

The child continues executing and references a data address:

fault pid 27 0x400060 w
fixfault pid 27 s /bin/rc sref 1 Data 0x400000 addr 0x400000 pg 0x3fe78000 r1 n1
fixfaulted pid 27 s /bin/rc Data 0x400000 addr 0x400000 pg 0x3fe78000 ref 1

Another fault in the parent due to the copy on reference, like before.

fault pid 21 0x409d5c r
fixfault pid 21 s /bin/rc sref 1 Data 0x400000 addr 0x408000 pg 0x3fe70000 r2 n1
newpage 0x408000 -> 0x3fe68000 src 0xfffffffff018711c
fixfaulted pid 21 s /bin/rc Data 0x400000 addr 0x408000 pg 0x3fe68000 ref 1

The child continues executing and references a new text address:

fault pid 27 0x202aec r
fixfixfault pid 27 s /bin/rc sref 9 Text 0x200000 addr 0x200000 pg 0x10cc000 r1 n1
fixfaulted pid 27 s /bin/rc Text 0x200000 addr 0x200000 pg 0x10cc000 ref 1

Another fault in the parent due to the copy on reference, like before.

fault pid 21 0x40c178 r
fault pid 21 s /bin/rc sref 1 Data 0x400000 addr 0x40c000 pg 0x3fe4c000 r2 n1
newpage 0x40c000 -> 0x3fe6c000 src 0xfffffffff018711c
fixfaulted pid 21 s /bin/rc Data 0x400000 addr 0x40c000 pg 0x3fe6c000 ref 1

The child continues executing and references a data address:

fault pid 27 0x40a738 r
fixfault pid 27 s /bin/rc sref 1 Data 0x400000 addr 0x408000 pg 0x3fe70000 r1 n1
fixfaulted pid 27 s /bin/rc Data 0x400000 addr 0x408000 pg 0x3fe70000 ref 1

The child continues executing and references a new text address:

fault pid 27 0x212b47 r
fixfault pid 27 s /bin/rc sref 9 Text 0x200000 addr 0x210000 pg 0x118c000 r1 n1
fixfaulted pid 27 s /bin/rc Text 0x200000 addr 0x210000 pg 0x118c000 ref 1

The child continues executing and references a data address:

fault pid 27 0x404b30 w
fixfault pid 27 s /bin/rc sref 1 Data 0x400000 addr 0x404000 pg 0x3fe74000 r1 n1
fixfaulted pid 27 s /bin/rc Data 0x400000 addr 0x404000 pg 0x3fe74000 ref 1

The child continues executing and references a data address:

fault pid 27 0x40c17c r
fixfault pid 27 s /bin/rc sref 1 Data 0x400000 addr 0x40c000 pg 0x3fe4c000 r1 n1
fixfaulted pid 27 s /bin/rc Data 0x400000 addr 0x40c000 pg 0x3fe4c000 ref 1


We are now doing the exec in the child process...



pgfree pg 0x3fe78000
pgfree pg 0x3fe74000
pgfree pg 0x3fe70000
pgfree pg 0x3fe4c000

The child references a data address, which is not paged in from the file. The diagnostic indicates that the page is paged in from the cached image of the executable file, but no I/O was required. Simply, a new page is allocated and initial values for data are copied.

fault pid 27 0x400018 w
fixfault pid 27 s /bin/cat sref 1 Data 0x400000 addr 0x400000 pg 0x0 r0 n-1
pagein pid 27 s 0x400000 addr 0x400000 soff 0x0
newpage 0x400000 -> 0x3fe4c000 src 0xfffffffff018711c
fixfaulted pid 27 s /bin/cat Data 0x400000 addr 0x400000 pg 0x3fe4c000 ref 1

0/67108864 cache bytes
0/10 cache segs 0/0 reclaims 0 procs
0/3 rpcs
22/23 segs
8/10 text segs 0/0 reclaims
1071140864 memory
15663104 kernel
0/0 1G pages 0 user 0 kernel 0 bundled 0 split
0/502 2M pages 0 user 0 kernel 0 bundled 0 split
134/236 16K pages 119 user 15 kernel 0 bundled 0 split
61/61 4K pages 0 user 61 kernel 884 bundled 0 split
6/6 pgas
1447/1509 pgs
66/67 path
69/70 chan
11780256/14229504 malloc 1 segs
0/8388608 ialloc bytes
61/61 mmu 4096 pages

And we are done.

pgfree pg 0x3fe4c000


There are a few things to note here. One important thing is that there are no faults in the stack segments. That is because we copy stack memory and pre-install the entries in the MMU. Stacks in nix are small and it pays to copy them.

Another thing to note is that the text is not paged in. It is served from the cache. Thus, only updating the MMU is needed.

Note also how page allocation and deallocation is quite reduced, compared to what it could be. The reason is that stacks retain their memory, and that the page size is 16K, and so, not many page allocations are required.

Also, if you compare this with the trace from a standard Plan 9, you will notice other effects, described in a previous post in the Nix memory management TR.