Sunday, May 29, 2011

User level semaphores

Multicore computers as found today can share memory. Semaphores are
popular for building other synchronization and communication primitives.
But as found on most operating systems, they require entering the kernel
even if it is not necessary. In many cases, two process will be running each one
in its own core and one would put a ticket in a semaphore before the other would
try to acquire it. Why should they enter the kernel in that case?
Linux includes futexes, with the same rationale, but they are not reliable and, therefore,
cannot be used for real applications.

In the operating system we are building, NIX, we have included user-level semaphores
using this interface:

extern int altsems(int *ss[], int n);
extern int downsem(int *s, int dontblock);
extern void upsem(int *s);

Up and down are similar to other interfaces. They rely on atomic increment and
decrement operations to update the semaphore (an integer value) without races.
However, if a process finds during a down that it should block, it would enter
the kernel and sleep. If another process increments the semaphore and finds that
someone was awaken (because it was negative), it enters the kernel to awake
the sleeper. With some book-keeping, operations may proceed in many cases without paying
the cost of entering and exiting the kernel.

Altsems is a novel operation. It tries to perform simultaneous downs on an array of
semaphores, using the same approach described above. Only one of them will proceed,
but the process does not know which one until it returns. This is useful to implement
user-level channels, equipped with an alt operation to send or receive from multiple

Monday, May 2, 2011

What's faster? A vector or a list?

Actual program performance is non-intuitive. Stroustrup, the author
of C++, included measurements in a talk for ordered insertion in C++
vectors and lists. We (E.Soriano and me) measured the same issue using C in Plan 9, Mac
OS X, and Linux, and C++ in Mac OS X.

Well, it seems that it's not clear at all which data structure is better for
something as simple as sorted insertion. Depending on the implementation of
the containers, on the system, and the language used, one or another may be
the right choice. There are so many factors that I'd say it cannot be predicted.

Thus, we might say that it doesn't really matter. Pick your preferred one and do not optimize early. All in all, it might be the better one, or the worse one; nobody really knows.

See Some Performance Experiments for Simple Data Structures for the experiments.

Systems for multicore computers

With so many cores in modern machines, it is time to rethink how the system should handle them.
There are interesting models, like the multikernel, published in the literature. However, it seems that a simpler way must exist.

For example, we can make some cores execute just user code, with no interrupts, and no round robin. We can make other cores execute system code.