Tuesday, January 24, 2012

SMP, AMP, and benchmarks

The new AMP scheduler for NIX can keep up with loads resulting from actual compilation of the system software. Up to 16 cores, adding cores yields a real speedup. Beyond that, contention on other resources keeps times from improving, and even makes them slightly worse.

With the old SMP scheduler, times get worse fast once more than about 6 cores are used. The curve is almost exponential because of high contention on the scheduler.

That was using a system load from the real world. As an experiment, we also measured what happens with a program similar to mk. This program concurrently spawns as many processes as needed to compile all the source files. But it is not a real build, because no dependencies are taken into account, no code is generated by running scripts, etc.
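To make the shape of that test concrete, here is a minimal Python sketch of such a spawner. It is not the program that was measured: the name spawn_all and the stand-in jobs are assumptions for illustration only, and a real run would pass one compiler command per source file instead.

```python
import subprocess
import sys
import time


def spawn_all(commands):
    """Start every command at once, with no dependency ordering,
    then wait for all of them. Returns (exit codes, elapsed seconds)."""
    start = time.perf_counter()
    # Launch everything up front; nothing waits on anything else.
    procs = [subprocess.Popen(cmd) for cmd in commands]
    # Collect exit codes in the order the jobs were started.
    codes = [p.wait() for p in procs]
    return codes, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical stand-in jobs: each one is a trivial child process.
    # Replace with one compile command per source file.
    jobs = [[sys.executable, "-c", "pass"] for _ in range(8)]
    codes, elapsed = spawn_all(jobs)
    failed = sum(c != 0 for c in codes)
    print(f"{len(codes)} jobs in {elapsed:.3f}s, {failed} failed")
```

Because every job is started before any is waited for, the load is one burst of process creation followed by independent work, which is quite different from a real build that interleaves compiles with dependency checks and generated code.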

For this test, which is halfway between a real program and a microbenchmark, the results differ. Instead of getting much worse, SMP holds its time steady beyond 4 cores. Also, AMP achieves a speedup of about 30% to 40%.

Thus, you might infer from this benchmark that SMP is OK for building software with 32 cores and that AMP may indeed speed it up. However, for the actual build process, SMP is not OK; it is much slower. And AMP achieves no speedup by the time you reach 32 cores; you are better off running with 16 cores or fewer.

There are lies, damn lies, and microbenchmarks.