Rebuilding Raspberry Pi

March 9th, 2016 by

After the Raspberry Pi 2 launch in February 2015, we had a review of how to improve and scale the hosting setup for Raspberry Pi. There were two components that caused us pain during the Pi 2 launch: the main site, running WordPress, and the forums, powered by phpBB.

The first question from our review was whether we should be putting effort into scaling a WordPress site. WordPress is estimated to be powering as many as a quarter of all websites, and it’s popular for a reason: it makes site development very easy. WordPress is easily extensible through themes and plugins, it’s supported by a vast array of existing third-party plugins, and it provides a good built-in framework for delegating and moderating authoring roles.

Unfortunately, this ease of development is at least in part down to a very simplistic execution model, with each page being dynamically generated, executing code from every installed plugin, and typically resulting in multiple database queries. When the Raspberry Pi site gets busy, it’s usually down to a huge number of visitors hitting a single page which is essentially a static news story. WordPress provides no built-in mechanism for caching such content, so by default, we’re dynamically generating many copies of identical, or near-identical, pages.

Losing the flexibility and ease of development offered by WordPress just to cope with the handful of days when the site gets very busy would be unfortunate, so we decided to put effort into making the existing site scalable.

Caching

For pretty much every WordPress problem you can imagine, there’s at least one plugin offering to solve it for you. For site performance, there are a number of plugins such as WP Super Cache, but as WordPress itself provides no framework for identifying cacheable parts of a page, these can only take a very simple and typically over-cautious, page-based approach.

For example, if you’re a logged-in user, you might get served a page that is in some way tailored to you, so Super Cache bypasses its cache and serves you a dynamic page. Similarly, if Super Cache sees a request that looks like a comment being posted, the cache is invalidated, and a dynamic page is served and cached for future requests.
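The cautious, page-based decision described above can be sketched roughly as follows. This is an illustration only, not the plugin’s actual code: the function name is hypothetical, and the cookie prefix is an assumption based on the name WordPress conventionally gives its login cookie.

```python
# Assumption: WordPress login cookies start with this prefix.
LOGGED_IN_COOKIE_PREFIX = "wordpress_logged_in_"


def can_use_page_cache(method, cookie_names):
    """Return True only when a cached page is certainly safe to serve.

    method       -- HTTP method of the request, e.g. "GET" or "POST"
    cookie_names -- iterable of cookie names sent with the request
    """
    # Anything other than a plain read (for example a comment being
    # posted) might change the page, so it is never served from cache.
    if method.upper() not in ("GET", "HEAD"):
        return False
    # A logged-in visitor might see tailored content, so bypass the cache.
    for name in cookie_names:
        if name.startswith(LOGGED_IN_COOKIE_PREFIX):
            return False
    return True
```

Every request for which this returns False costs a full WordPress page generation, which is why a burst of comments or logged-in visitors translates directly into load.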

During the Pi 2 launch, we saw significant problems with load spikes when comments were posted. Clearly, a small delay in comments becoming visible on the site is a minor annoyance compared to thousands of visitors being served an error page, so we set about making our caching more aggressive.

We wrote a small hack called staticify. This fetches the key pages from the blog every 60 seconds and renders them to static HTML. That way we always have a page in our static cache, and because we’re selecting the pages that we cache, we can afford to be more brutal with our caching: we know that there’s no user-specific content on these pages, so we serve up the same cached page even if you’re logged in.
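The post doesn’t show staticify itself, so the following is only a minimal sketch of the idea, assuming a fixed list of page URLs and an output directory that the web server serves directly. The page list, URL, function names and the atomic-rename detail are all assumptions made for illustration.

```python
import os
import tempfile
import time
import urllib.request
from pathlib import Path

# Hypothetical "key pages": output filename -> URL on the dynamic site.
KEY_PAGES = {"index.html": "https://example.org/"}


def fetch_page(url, timeout=10):
    """Fetch one fully rendered page from the dynamic site."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def write_atomically(path, data):
    """Write to a temporary file, then rename it over the target,
    so a visitor is never served a half-written page."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise


def refresh_cache(pages, out_dir, fetch=fetch_page):
    """Re-render every page; if a fetch fails, keep the previous copy."""
    refreshed = []
    for filename, url in pages.items():
        try:
            html = fetch(url)
        except Exception:
            continue  # a slightly stale page beats an error page
        write_atomically(Path(out_dir) / filename, html)
        refreshed.append(filename)
    return refreshed


def run_forever(pages, out_dir, interval=60):
    """Refresh the static copies every `interval` seconds."""
    while True:
        refresh_cache(pages, out_dir)
        time.sleep(interval)
```

In this sketch the dynamic site renders each key page once per refresh cycle however many visitors arrive, and a failed fetch leaves the last good copy in place rather than removing it.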

More virtualisation

An important goal after the Pi 2 launch review was to split out different parts of the site onto separate virtual servers. For example, having the WordPress blog and the forum software on different VMs made it much easier to experiment with HipHop VM (HHVM), which offered a significant performance boost to the blog but is incompatible with the forum software.

Although the Raspberry Pi setup runs as a private cloud on a single host machine, having different components split onto separate VMs makes it much easier to balance resources between them, and if necessary, spin up extra capacity quickly using our public cloud.

IPv6

When Raspberry Pi was hit by a DDoS attack, we built an IPv6-only backend network for the machines to communicate with each other. In the new setup, all access to the back-end VMs comes from either one of four front-end load balancers or a “gateway” VM, so we thought we’d remove IPv4 connectivity from the VMs entirely. For example, this is ifconfig on one of the blog PHP VMs:

eth0      Link encap:Ethernet  HWaddr 52:54:00:3f:8a:5a  
          inet6 addr: 2a00:1098:0:82:1000:x:y:z/64 Scope:Global
          inet6 addr: fe80::5054:ff:fe3f:8a5a/64 Scope:Link

The VM occasionally needs to call out over IPv4. For example, Akismet and Twitter don’t yet have full IPv6 support, so these requests go through a NAT64 gateway provided by Mythic Beasts, which proxies the connections so that it appears almost seamless to the VM. This is part of the Mythic Beasts IPv6 education project: backward ISPs claim there is no demand for IPv6, whereas we provide multiple services from IPv6-only servers and give discounts if you use IPv6-only services.
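To see why this can look almost seamless to an IPv6-only machine, it helps to know how NAT64 addressing works: the IPv4 destination is embedded in the low 32 bits of an IPv6 address under a translation prefix. The sketch below uses the well-known prefix 64:ff9b::/96 from RFC 6052. That prefix is an assumption here, since the post doesn’t say which prefix the Mythic Beasts gateway uses, and the function names are hypothetical.

```python
import ipaddress

# Assumption: the RFC 6052 well-known prefix; a provider may use its own.
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_nat64(ipv4, prefix=WELL_KNOWN_PREFIX):
    """Map an IPv4 address to the IPv6 address a NAT64 gateway routes."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    # With a /96 prefix the IPv4 address fills the last 32 bits.
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


def extract_ipv4(ipv6, prefix=WELL_KNOWN_PREFIX):
    """Recover the embedded IPv4 address, as the gateway would."""
    addr = ipaddress.IPv6Address(ipv6)
    if prefix.prefixlen != 96 or addr not in prefix:
        raise ValueError("address is not inside the NAT64 prefix")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


print(synthesize_nat64("192.0.2.33"))     # 64:ff9b::c000:221
print(extract_ipv4("64:ff9b::c000:221"))  # 192.0.2.33
```

In a typical deployment a DNS64 resolver performs the first mapping when a name has no IPv6 address, so the application simply connects over IPv6 and the gateway translates the traffic to IPv4 on the far side.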

SSL

Officially, we enabled SSL because we wanted to improve our Google ranking, but handy side effects include irritating the security services and preventing third-party networks from injecting adverts or corrupting downloads. SSL decryption is done on the front-end load balancers and, as they have lots of spare CPU, it incurs no performance penalty. The only thing not served over SSL is the image downloads, because of incompatibilities with the current version of NOOBS. We hope to resolve this eventually.

Pi Zero Launch

On November 26th at 7am, the Pi Zero is launched, a $5 computer given away on magazines. The bandwidth graph for the Raspberry Pi server does this:

Launch day bandwidth graph

It’s very exciting and quickly exceeds our previous records for the launch of Pi 2. The two VMs that generate all the webpages for the blog and deliver all the content are humming along at 10-25% capacity. The database VM is almost completely idle: we’ve successfully cached almost everything, so our database server only sees load when a comment is being posted or the cache is being updated. Meanwhile, we neatly exceed the 4,500 users we had for the Pi 2, breaking 10,000 simultaneous users at our peak.


A quick back-of-the-envelope calculation and we conclude that our staticify script avoided executing WordPress a large number of times and the following slightly dubious claim is mostly true:


The MagPi site was a bit more difficult: it hadn’t had the same level of optimisation and went through a number of changes throughout the day to accelerate it. However, the VM setup meant that the excess load was contained to specific virtual machines. Under our original flat hosting setup, the load from the MagPi would have taken everything offline and made identifying the underlying cause much harder.

Raspbian

We now run a full mirror of the main Raspbian site, and we’ve even done a test to make sure that the failover works.

The mirror director is a critical piece of infrastructure: without it, package downloads will fail and updates can’t complete. So in the event of a failure, we need to bring the mirror director back up much more quickly than we can restore 4TB+ of data from backup. As a result of this work we now have a hot spare, which has been fully tested.

Does it work and is WordPress still a good idea?

We weathered the Pi Zero and Christmas Day traffic peaks with ease, and we think we can probably double or triple the number of people using the sites at peak times before we have to think much more or add hardware. The result is a really useful and very busy site that supports our multiple contributors, moderators and users with a relatively minimal amount of engineering and administration time, on a comparatively small server setup.