Wednesday, January 18, 2012

Manycore and scheduling

Time for some performance evaluation. After some scaffolding, we are measuring how long it takes to do three things: (1) compiling a kernel with the source on a remote file server, (2) copying files from a local ramfs to a local ramfs, and (3) compiling a kernel with the source in a local ramfs.

The aim is to compare how long this takes with different numbers of cores and different schedulers.

These are the results for nix. Times are user, system, and total. The value for the system time is not reliable, so pay attention mostly to the total and perhaps to the user time.
  • Single scheduler for 32 TCs:
    • 1/output:times 7.03556 50.7456 14.0344
    • 2/output:times 0.233 1.626 1.921
    • 3/output:times 7.663 63.668 10.458
  • Single scheduler for just 4 TCs:
    • 4/output:times 3.989 5.585 10.789
    • 5/output:times 0.156 0.404 0.608
    • 6/output:times 4.062 5.429 5.073
  • 8 scheduling groups, stealing when idle, for 32 TCs:
    • 7/output:times 5.849 22.625 13.859
    • 8/output:times 0.193 0.94 1.132
    • 9/output:times 6.357 28.8 9.404
We are running more experiments, but the interesting thing is that the work takes longer with 32 cores than with 4 cores.

Changing the scheduler so that there is one for every 4 cores helps, but the time is still far from the best case, which, surprisingly, was using just 4 cores.


If we run 8 schedulers as in the last test, but do not permit them to steal or donate jobs, which means that a single group with just 4 cores is in use, the numbers return to those of a single scheduler with just 4 TCs:
  • 8 scheduling groups, but only 1 used (4 out of 32 TCs):
    • 13/output:times 4.147 5.727 9.64
    • 14/output:times 0.174 0.374 0.596
    • 15/output:times 4.049 4.816 4.804
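
To make the comparison easier to read, the totals listed above can be expressed relative to the single-scheduler, 4-TC run. The following is only a small sketch: the configuration labels and the `relative_totals` helper are made up for illustration, and it assumes that the three lines in each group are, in order, the remote compile, the ramfs copy, and the local compile.

```python
# Totals (third column of each "times" line) copied from the results above.
# Assumption: within each configuration the three lines are, in order,
# remote compile, ramfs copy, local compile.
TASKS = ("remote compile", "ramfs copy", "local compile")

# The labels are illustrative names, not identifiers used by the system.
TOTALS = {
    "1 scheduler, 32 TCs": (14.0344, 1.921, 10.458),
    "1 scheduler, 4 TCs": (10.789, 0.608, 5.073),
    "8 groups, stealing, 32 TCs": (13.859, 1.132, 9.404),
    "8 groups, 1 used, 4 TCs": (9.64, 0.596, 4.804),
}

BASELINE = "1 scheduler, 4 TCs"


def relative_totals(totals, baseline):
    """Return {config: per-task total divided by the baseline's total}."""
    base = totals[baseline]
    return {
        name: tuple(t / b for t, b in zip(times, base))
        for name, times in totals.items()
    }


if __name__ == "__main__":
    for name, ratios in relative_totals(TOTALS, BASELINE).items():
        cells = ", ".join(f"{task}: {r:.2f}x" for task, r in zip(TASKS, ratios))
        print(f"{name:28s} {cells}")
```

A ratio above 1 means the configuration took longer than the 4-TC baseline. With these numbers, every ratio for the two 32-TC configurations is above 1, the stealing configuration is closer to 1 than the single 32-TC scheduler on all three tasks, and the run with only one group in use is slightly below 1 on all three.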