Power efficiency

July 31st, 2013 by

We’ve just done a performance comparison of one of our little dedicated servers versus a dual core VM hosted on VMware through another provider.

The VMware machine has two cores of an Intel Xeon E5530 (Nehalem) at 2.4GHz; ours has four hyperthreaded cores of an i7-3615QM (Ivy Bridge) at 2.3GHz.

Both machines are running the same operating system install and the same application code, so we ran siege for 30s at a time at different concurrency levels as a benchmark, to find out whether our machine was faster and by how much.
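For anyone wanting to repeat this kind of sweep, here is a minimal sketch of how it could be scripted. The URL, the list of concurrency levels and the use of siege's `-b` (benchmark, no delay between requests) flag are assumptions for illustration, not a record of the exact commands we ran.

```python
import subprocess

def siege_command(url, concurrency, duration="30S"):
    """Build one siege invocation.

    -b  benchmark mode (no think time between requests; an assumption here)
    -c  number of concurrent simulated users
    -t  how long to run, e.g. 30S for thirty seconds
    """
    return ["siege", "-b", "-c", str(concurrency), "-t", duration, url]

def run_sweep(url, levels=(1, 2, 4, 8, 16, 32), runner=subprocess.run):
    """Run siege once per concurrency level (levels are hypothetical)."""
    for concurrency in levels:
        runner(siege_command(url, concurrency), check=False)
```

Calling `run_sweep("http://test-host.example/")` (a placeholder URL) on a machine with siege installed would run one 30 second burst per level; siege prints its own summary for each run.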

The initial comparison was the dual core VMware service (green) versus our machine (red). At very low concurrency (1-2 simultaneous requests) our machine is slightly slower to render each page. Beyond this, the existing machine follows exactly the predicted load curve, slowing linearly with additional simultaneous users, whereas the new machine slows only very slightly.

By default we’re running the ondemand CPU frequency governor, which means that when idle the cores are clocked at 1.2GHz. The page render time remains almost constant up to a concurrency of four, as the host spreads the load across four cores each running at 1.2GHz. Beyond this the governor starts to turn up the core speed as the load rises, so at a concurrency of eight we’re still rendering pages in the same average time because each core is now clocked at 2.3GHz, almost twice as fast: we’ve doubled the amount of CPU available. Only then does performance begin to drop off, and then only sub-linearly.
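The behaviour described above can be put into a toy model. This is only a sketch under loud assumptions: the clock is taken to rise in proportion to per-core load until it hits the maximum, and hyperthreading and turbo boost are ignored entirely. The function name, the `work` unit and the default values are made up for illustration.

```python
def render_time(concurrency, cores=4, f_min=1.2, f_max=2.3, work=1.0):
    """Toy estimate of seconds per page with `concurrency` requests in flight.

    `work` is the CPU cost of one page in GHz-seconds (a hypothetical unit).
    """
    # Requests per core; up to one request per core, each has a core to itself.
    per_core = max(1.0, concurrency / cores)
    # Assumed governor behaviour: clock scales with per-core load, capped at f_max.
    freq = min(f_max, f_min * per_core)
    return work * per_core / freq

for c in (1, 2, 4, 8, 16):
    print(c, round(render_time(c), 3), round(render_time(c, f_min=2.3), 3))
```

The first column of times is the scaling-clock case, which stays nearly flat up to a concurrency of eight; the second pins the clock at 2.3GHz (`f_min` equal to `f_max`), where the time doubles each time the concurrency doubles past four. The model drops off linearly after that, so it does not reproduce the sub-linear tail we measured.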

On the existing host the performance is much more predictable – it takes a constant amount of CPU to render each page request and as you double the concurrency the render time doubles.

If you turn off power saving on the i7 and set it to performance mode, it gives the expected linear performance decrease with increasing load. Interestingly, it’s slightly slower at maximum CPU utilisation. I think (but haven’t confirmed) this is because it can’t use the turbo boost feature to raise the clock speed as much as it can under the power-saving option: it’s always running warmer, as it doesn’t cool down as much between benchmark runs.
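On Linux the ondemand and performance modes are cpufreq governors, exposed per CPU through sysfs. A minimal sketch of checking and switching them follows; it assumes the standard `/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor` files are present (they depend on the cpufreq driver in use), and writing to them needs root. The helper names are ours, purely for illustration.

```python
from pathlib import Path

SYSFS_CPU = "/sys/devices/system/cpu"

def governor_files(root=SYSFS_CPU):
    """Find the scaling_governor file for every CPU that has cpufreq support."""
    return sorted(Path(root).glob("cpu[0-9]*/cpufreq/scaling_governor"))

def read_governors(root=SYSFS_CPU):
    """Map CPU name (e.g. 'cpu0') to its current governor."""
    return {p.parent.parent.name: p.read_text().strip() for p in governor_files(root)}

def set_governor(name, root=SYSFS_CPU):
    """Write the governor name to every CPU; needs root on a real system."""
    for p in governor_files(root):
        p.write_text(name + "\n")
```

`read_governors()` reports what each core is using, and `set_governor("performance")` followed by `set_governor("ondemand")` would switch between the two configurations compared here.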

We’re going to leave the machine in ondemand mode. Whilst it’s slightly slower in normal use, it uses less electricity, so it’s cheaper to run and less harmful to the polar bears. It also has significantly better performance for short peaks: it has a stockpile of cold that it can borrow against for short periods of time.

I wonder if they should start teaching thermodynamics in computer science courses.