Archive for November, 2009

Wherein the Toolserver becomes more reliable

Some months ago, the Wikimedia Foundation approved a $40,000 grant to Wikimedia Deutschland to improve Toolserver reliability. We’ve now implemented the first part of this plan: redundant NFS and LDAP.

When we first proposed the grant, the plan (which you can read more about at the above link) was to purchase three database servers, which we would use to provide a redundant backup for the three current servers. However, before we made the purchase, we realised that for the same amount of money, we could purchase two database servers, two smaller servers and a disk array. The Foundation approved this change, and that’s what we ended up buying.

The purpose of the two small servers and the array was to provide redundant service for NFS and LDAP. These services are critical to the platform’s operation; if either is offline, the entire platform is down. Previously, both were hosted on a single server (hyacinth), which meant the entire Toolserver depended on that one server being up. As well as hurting reliability, this made it very difficult to do any maintenance on that server.

Now, however, the NFS and LDAP data is stored on the disk array, which is connected to two servers (turnera and damiana) running Solaris Cluster software. If one server breaks, or we need to do maintenance on it, the services are automatically moved to the other server, with no interruption in service. The array itself has two redundant, independent controllers, making failure quite unlikely.
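
To make the failover behaviour concrete, here is a toy Python model of an active/passive pair. It is only a sketch and says nothing about how Solaris Cluster is implemented or configured: the class names, the health flag, and the choice of turnera as the initially active node are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True  # hypothetical health flag, set by hand in this sketch


@dataclass
class Cluster:
    """Toy active/passive pair; both nodes are assumed to reach the same array."""
    nodes: list
    services: tuple = ("nfs", "ldap")
    active: int = 0  # index of the node currently hosting the services (assumed)

    def owner(self):
        """Return the node hosting the services, moving them if it is unhealthy."""
        if not self.nodes[self.active].healthy:
            for i, node in enumerate(self.nodes):
                if node.healthy:
                    self.active = i  # fail over to the first healthy node
                    break
            else:
                raise RuntimeError("no healthy node: services are down")
        return self.nodes[self.active]


cluster = Cluster([Node("turnera"), Node("damiana")])
first = cluster.owner().name
cluster.nodes[0].healthy = False  # simulate a failure or planned maintenance
second = cluster.owner().name
print(first, "->", second)
```

With a single server, as in the old hyacinth setup, the same model has nowhere to move the services, so `owner()` raises as soon as that one node is marked unhealthy.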

The previous NFS/LDAP server, which is now idle, has exactly the same specification as a database server. We will be using it as the third redundant database server (along with the two we purchased with the grant) to provide redundant access to the MySQL databases. More news on that later.
