
Wednesday, May 28, 2014

bytes.Buffer or builtin concatenation to build strings in Go?

During profiling of clive Go code I have found an interesting bit.
It revisits the same old discussion: should you use a string buffer or raw string concatenation to build strings?

A Dir is actually a map[string]string, and its network format is a string of the form

    attr1=val1 attr2=val2 ....

The function Dir.String() builds the string from the map. The question is:

Do we use a bytes.Buffer and Fprintf to append the data, calling Buffer.String at the end to obtain the result? Or do we use a native Go string and concatenate with "+="?

Using bytes.Buffer and Fprintf in Dir.String() yields:
BenchmarkDirString              500000      5163 ns/op

Using strings and += 
BenchmarkDirString              500000      5077 ns/op


Surprisingly (perhaps), using plain strings is both easier to write and faster; forget
about the bytes.Buffer. This likely places more pressure on the GC, because it builds and discards intermediate strings, but it's faster and easier.
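For reference, the two variants compared above look roughly like this. This is a minimal sketch, not the actual clive code: the Dir type and method names are assumptions, and keys are sorted only so that both builders produce the same string despite Go's randomized map iteration order.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
	"strings"
)

// Dir maps attribute names to values, as described in the post.
type Dir map[string]string

// keys returns the attribute names in sorted order, so both
// builders emit identical output for the same map.
func (d Dir) keys() []string {
	ks := make([]string, 0, len(d))
	for k := range d {
		ks = append(ks, k)
	}
	sort.Strings(ks)
	return ks
}

// stringBuffer builds "attr1=val1 attr2=val2" with bytes.Buffer and Fprintf.
func (d Dir) stringBuffer() string {
	var b bytes.Buffer
	for _, k := range d.keys() {
		fmt.Fprintf(&b, "%s=%s ", k, d[k])
	}
	return strings.TrimSuffix(b.String(), " ")
}

// stringConcat builds the same string with native += concatenation.
func (d Dir) stringConcat() string {
	s := ""
	for _, k := range d.keys() {
		s += k + "=" + d[k] + " "
	}
	return strings.TrimSuffix(s, " ")
}

func main() {
	d := Dir{"name": "f", "mode": "0644"}
	fmt.Println(d.stringBuffer()) // mode=0644 name=f
	fmt.Println(d.stringConcat()) // mode=0644 name=f
}
```

Either version can be dropped into a Benchmark function and run with "go test -bench ." to reproduce the comparison on your own maps.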

Saturday, July 13, 2013

A description of the selfish Nix allocator

This TR from the Lsub papers page describes the Nix allocator and provides a bit of initial evaluation for it. In short, about half of the time resources could be allocated by the requesting process without interlocking with other processes or cores and without disturbing any other system component. Plus other benefits described in the TR.

Here is a picture of the allocator as a teaser:

Thursday, July 11, 2013

Selfish processes


There is an important optimization not described in previous posts, and not
considered in the evaluation and the traces shown there. The idea is
to let processes keep a few of the resources they release in case
they are needed later.

In particular, we modified the process structure to keep up to 10 pages
(of the size used for user segments). When a process releases a page
and has fewer than 10 pages kept, it simply keeps the page instead of
releasing it. Later, when a new page is needed, it first tries to take
one from the per-process pool. The pool is not released when a process dies.
Instead, the pool is kept in the process structure and is reused
when that structure is allocated to a new process.
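The mechanism can be sketched as a small private free list in front of the central allocator. This is an illustrative Go sketch, not the Nix kernel code: the page size, the centralAlloc/centralFree names, and the use of a slice as the pool are all assumptions made for the example.

```go
package main

import "fmt"

const poolMax = 10 // pages kept per process, as in the post

type page [4096]byte

// central stands in for the shared page allocator, which in the
// kernel requires interlocking with other processes and cores.
var central []*page

func centralAlloc() *page {
	if n := len(central); n > 0 {
		p := central[n-1]
		central = central[:n-1]
		return p
	}
	return new(page)
}

// proc keeps a small private pool of released pages.
type proc struct {
	pool []*page
}

// alloc takes a page from the private pool when possible and
// falls back to the central allocator only when it is empty.
func (p *proc) alloc() *page {
	if n := len(p.pool); n > 0 {
		pg := p.pool[n-1]
		p.pool = p.pool[:n-1]
		return pg
	}
	return centralAlloc()
}

// free keeps up to poolMax pages locally; beyond that the page
// goes back to the central allocator.
func (p *proc) free(pg *page) {
	if len(p.pool) < poolMax {
		p.pool = append(p.pool, pg)
		return
	}
	central = append(central, pg)
}

func main() {
	var p proc
	pg := p.alloc()   // pool empty: comes from the central allocator
	p.free(pg)        // stays in the private pool
	fmt.Println(len(p.pool))
}
```

The point is that both alloc and free complete without touching the shared allocator in the common case, so no lock is needed on that path.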

The trace output taken after applying this optimization shows that
most pages are reused, and that for small cached programs about 1/3
of the allocations are satisfied from the per-process pool. Thus,
this change greatly reduces contention on the central page allocator.

Per-process resource pools should be used with care. For example, our
attempts to do the same with the kernel memory allocator showed that
it is not a good idea there. Memory allocations have very different
sizes, and some structures are very long-lived while others are very
short-lived. Thus, what happened was that memory was wasted in per-process
pools while, at the same time, few memory allocations could benefit from
the technique.

In general, per-process allocation pools are a good idea when the structures
are frequently used and have the same size. For example, this could also be
applied to the Chan and Path structures as used in Nix.

Thursday, July 4, 2013

Memory management in Nix Mark IV

There is a draft TR describing how memory management has been fully reworked in Nix mark IV. The resulting system is faster, suffers far fewer page faults than its predecessor, and makes better use of concurrent access to file servers.

We are still testing the implementation, and hopefully will make it public soon.
In a future post I will show some evaluation of the system, along with quite illustrative traces obtained from debug output.

Monday, May 2, 2011

What's faster? A vector or a list?

Actual program performance is non-intuitive. Stroustrup, the author
of C++, included measurements in a talk of ordered insertion into C++
vectors and lists. We (E. Soriano and I) measured the same issue using C on Plan 9, Mac
OS X, and Linux, and C++ on Mac OS X.

Well, it seems that it's not clear at all which data structure is better for
something as simple as sorted insertion. Depending on the implementation of
the containers, on the system, and on the language used, one or the other may be
the right choice. There are so many factors that I'd say it cannot be predicted.

Thus, we might say that it doesn't really matter. Pick your preferred one and do not optimize early. All in all, it might be the better one, or the worse one; nobody really knows.

See Some Performance Experiments for Simple Data Structures for the experiments.