<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Toolserver Journal &#187; Tech</title>
	<atom:link href="http://journal.toolserver.org/entry/category/tech/feed/" rel="self" type="application/rss+xml" />
	<link>http://journal.toolserver.org</link>
	<description>Updates from the Toolserver administration</description>
	<lastBuildDate>Fri, 05 Feb 2010 16:23:29 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Where is cassini?</title>
		<link>http://journal.toolserver.org/entry/2009/06/14/where-is-cassini/</link>
		<comments>http://journal.toolserver.org/entry/2009/06/14/where-is-cassini/#comments</comments>
		<pubDate>Sun, 14 Jun 2009 14:36:00 +0000</pubDate>
		<dc:creator>dab</dc:creator>
				<category><![CDATA[Tech]]></category>
		<category><![CDATA[osm]]></category>
		<category><![CDATA[servers]]></category>

		<guid isPermaLink="false">http://journal.toolserver.org/?p=43</guid>
		<description><![CDATA[After the hardware installation was finished yesterday, River began to set up the servers that will run Solaris, while I began to set up cassini, which will run Debian.
After I got through the TFTP network-installation hell again (it took me only 2 hours and a dozen reboots to get the installation started&#8230;), we discovered that the host doesn&#8217;t have a [...]]]></description>
			<content:encoded><![CDATA[<p>After the <a href="http://journal.toolserver.org/entry/2009/06/14/new-servers-are-there/">hardware installation</a> was finished yesterday, River began to set up the servers that will run Solaris, while I began to set up <em>cassini</em>, which will run Debian.</p>
<p>After I got through the TFTP network-installation hell again (it took me only 2 hours and a dozen reboots to get the installation started&#8230;), we discovered that the host doesn&#8217;t have hardware RAID. A quick investigation showed that the OSM squid <em>ortelius</em> (which is nearly identical to <em>cassini</em>) and <em>cassini</em> were swapped during the hardware installation. Because <em>cassini</em> is supposed to run a small copy of the OSM database itself, using it without hardware RAID is not an option.</p>
<p>At the moment we are unsure what the best way to handle the situation is, and how long it will take until <em>cassini</em> can be used.</p>
]]></content:encoded>
			<wfw:commentRss>http://journal.toolserver.org/entry/2009/06/14/where-is-cassini/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New servers are there</title>
		<link>http://journal.toolserver.org/entry/2009/06/14/new-servers-are-there/</link>
		<comments>http://journal.toolserver.org/entry/2009/06/14/new-servers-are-there/#comments</comments>
		<pubDate>Sun, 14 Jun 2009 14:14:53 +0000</pubDate>
		<dc:creator>dab</dc:creator>
				<category><![CDATA[Tech]]></category>
		<category><![CDATA[osm]]></category>
		<category><![CDATA[servers]]></category>

		<guid isPermaLink="false">http://journal.toolserver.org/?p=35</guid>
		<description><![CDATA[Yesterday Mark and Multichil were at our colo again and finished the hardware installation of our new servers.
The Toolserver cluster got two new boxes: daphe, which will replace zedler as the database host of the s2 cluster, and hyacinth, which will host our /home directory.
They also installed new servers for the OpenStreetMap/Wikimedia cooperation: an OpenStreetMap toolserver [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday Mark and Multichil were at our colo again and finished the hardware-installation of our new servers.</p>
<p>The Toolserver cluster got two new boxes: <em>daphe</em>, which will replace <em>zedler</em> as the database host of the s2 cluster, and <em>hyacinth</em>, which will host our <em>/home</em> directory.</p>
<p>They also installed new servers for the OpenStreetMap/Wikimedia cooperation: an OpenStreetMap toolserver named <em>cassini</em>, on which people can play with OSM data (like our Toolserver for Wikimedia), plus a squid proxy named <em>ortelius</em> and a database host named <em>ptolemy</em> &#8211; both to handle the load that the Wikimedia projects will create when they start using OpenStreetMap maps.</p>
<p>Finally, two terminal servers were installed to make it easier for the roots to handle broken servers.</p>
<p>Thanks to Mark and Multichil!</p>
]]></content:encoded>
			<wfw:commentRss>http://journal.toolserver.org/entry/2009/06/14/new-servers-are-there/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Faster, Better, Stronger</title>
		<link>http://journal.toolserver.org/entry/2009/04/29/faster-better-stronger/</link>
		<comments>http://journal.toolserver.org/entry/2009/04/29/faster-better-stronger/#comments</comments>
		<pubDate>Wed, 29 Apr 2009 10:53:41 +0000</pubDate>
		<dc:creator>daniel</dc:creator>
				<category><![CDATA[Tech]]></category>
		<category><![CDATA[osm]]></category>
		<category><![CDATA[servers]]></category>

		<guid isPermaLink="false">http://journal.toolserver.org/?p=26</guid>
		<description><![CDATA[Yesterday, Wikimedia Deutschland ordered five new servers, three of which will be added to the Toolserver cluster. They will be delivered (hopefully) in two to three weeks, and will go online perhaps a week or two after that. Here&#8217;s what the servers will be used for:
The first server will replace Zedler as our database [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday, Wikimedia Deutschland ordered five new servers, three of which will be added to the Toolserver cluster. They will be delivered (hopefully) in two to three weeks, and will go online perhaps a week or two after that. Here&#8217;s what the servers will be used for:</p>
<p>The first server will replace Zedler as our database server for the s2 cluster. Zedler is our oldest database server, and has lately been overloaded nearly constantly. We may keep Zedler around with a backup copy of s2, but it will mostly be idle. We&#8217;ll see what use we can put it to later on.</p>
<p>The second server will take over Hemlock&#8217;s duty of serving the home directories, and it will become the host system of the stable server. This means the stable server becomes virtualized (probably as a Solaris zone). Willow (the current server for stable projects) will then be free; we will probably make it into a second login server where users can run bots. This should take some load off Nightshade.</p>
<p>The third server is the &#8220;OpenStreetMap Toolserver&#8221;: it will be for the <a href="http://en.wikipedia.org/wiki/OpenStreetMap">OpenStreetMap</a> project what the Toolserver cluster has so far been for Wikimedia projects: a place to play with data and host bots and web applications. The <a href="https://wiki.toolserver.org/view/Rules">Toolserver rules</a> have been changed to accommodate this.</p>
<p>All in all, we will then have 12 servers in the Toolserver cluster.</p>
<p>The two remaining servers that were ordered yesterday will also be used for the OpenStreetMap project, but not in the context of the Toolserver. They will be used to integrate interactive maps from OpenStreetMap directly into Wikipedia articles. More information about the <a href="http://meta.wikimedia.org/wiki/OpenStreetMap">OSM integration project</a> is available on Meta.</p>
]]></content:encoded>
			<wfw:commentRss>http://journal.toolserver.org/entry/2009/04/29/faster-better-stronger/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Wherein a new server is delivered to the wrong place, but then remains there</title>
		<link>http://journal.toolserver.org/entry/2009/01/23/wherein-a-new-server-is-delivered-to-the-wrong-place-but-then-remains-there/</link>
		<comments>http://journal.toolserver.org/entry/2009/01/23/wherein-a-new-server-is-delivered-to-the-wrong-place-but-then-remains-there/#comments</comments>
		<pubDate>Fri, 23 Jan 2009 19:51:45 +0000</pubDate>
		<dc:creator>river</dc:creator>
				<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://wordpress.toolserver.org/?p=10</guid>
		<description><![CDATA[So, a while ago, Sun donated some servers to Wikimedia.  We managed to earmark one of them for the Toolserver, but for various reasons, it couldn&#8217;t be delivered to Amsterdam, so it ended up at our Tampa facility.  It&#8217;s been sitting there ever since, until yesterday, when we managed to rack it and [...]]]></description>
			<content:encoded><![CDATA[<p>So, a while ago, Sun donated some servers to Wikimedia.  We managed to earmark one of them for the Toolserver, but for various reasons, it couldn&#8217;t be delivered to Amsterdam, so it ended up at our Tampa facility.  It&#8217;s been sitting there ever since, until yesterday, when we managed to rack it and set up its OS and array.  The server is a Sun Fire X4150, with 2 quad-core 2.8GHz CPUs, 32GB RAM, 4 146GB system disks, and an external array with 12 15&#8242;000 rpm 146GB SAS disks.  Originally, it was meant to be a database server, but there&#8217;s not much point putting a single database server at Tampa, so now we&#8217;re looking for other uses.</p>
<p>The most obvious is to provide a US replica for <tt>cache.stable.toolserver.org</tt>, the caching proxy in front of the stable server; this will improve WikiMiniAtlas performance for North American users (the only tool currently using the cache).  But that&#8217;s only a tiny load, so it would be a waste to use the entire server for just that.  Other possibilities include moving our webapps (e.g. JIRA) there, which would free up some resources from the server hosting them in Amsterdam. No doubt, there are several other things we could use it for&#8230; please feel free to offer suggestions.</p>
]]></content:encoded>
			<wfw:commentRss>http://journal.toolserver.org/entry/2009/01/23/wherein-a-new-server-is-delivered-to-the-wrong-place-but-then-remains-there/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Wherein MySQL replication, binlogs and the problems with s3 are explained</title>
		<link>http://journal.toolserver.org/entry/2009/01/11/wherein-mysql-replication-binlogs-and-the-problems-with-s3-are-explained/</link>
		<comments>http://journal.toolserver.org/entry/2009/01/11/wherein-mysql-replication-binlogs-and-the-problems-with-s3-are-explained/#comments</comments>
		<pubDate>Sun, 11 Jan 2009 05:18:57 +0000</pubDate>
		<dc:creator>river</dc:creator>
				<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://wordpress.toolserver.org/?p=9</guid>
		<description><![CDATA[Many people have noticed that since the maintenance a couple of weeks ago, replication of s3 (the cluster that holds frwiki, and most of the smaller wikis) is halted, which means Toolserver tools are returning outdated information.  So, what is replication, and why isn&#8217;t it working?
One of the main features of the Toolserver is [...]]]></description>
			<content:encoded><![CDATA[<p>Many people have noticed that since the maintenance a couple of weeks ago, replication of s3 (the cluster that holds frwiki, and most of the smaller wikis) is halted, which means Toolserver tools are returning outdated information.  So, what is replication, and why isn&#8217;t it working?</p>
<p>One of the main features of the Toolserver is that it holds a copy of the Wikimedia <a href="http://www.mysql.com">MySQL</a> databases, and allows users (and tools) to run <a href="http://en.wikipedia.org/wiki/SQL">SQL</a> queries on them.  The databases are updated in real time, using a MySQL feature called <em>replication</em>.  Replication is a way of applying changes to one database &#8212; in this case, Wikimedia&#8217;s master database &#8212; to a <em>slave</em> server, the Toolserver.  It works by recording each change on the master to a file, called a <em>binlog</em> (short for &#8220;binary log&#8221;).  The slave server retrieves this file, and applies each change to its own copy of the database, thus bringing it up to date with the master.</p>
<p>If, for some reason, the slave server isn&#8217;t replicating, meaning no changes are being applied to it, the slave&#8217;s copy of the database will gradually become more and more out of date compared to the master.  This difference, usually expressed as a time (e.g. seconds, hours or days) is called the replication lag, or <em>replag</em>.  The most common cause of replag is the slave not being able to apply the changes fast enough.  For example, in an hour, it might only be able to apply 40 minutes&#8217; worth of changes.  This would cause 20 minutes of replag.  As the slave catches up (by applying changes faster than real time), the replag decreases until it reaches 0 again.</p>
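<p>The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and the rates are made up for the example, and on a real slave the lag is reported by MySQL itself rather than computed like this.</p>

```python
def replag_after(hours, applied_minutes_per_hour, start_lag_minutes=0):
    """Return replag in minutes after `hours` of wall-clock time.

    The master produces 60 minutes of changes per hour; the slave applies
    `applied_minutes_per_hour` of them. Lag can never drop below zero.
    """
    lag = start_lag_minutes
    for _ in range(hours):
        lag = max(0, lag + 60 - applied_minutes_per_hour)
    return lag


# A slave that applies only 40 minutes of changes per hour falls behind.
print(replag_after(1, 40))       # 20
# Once it applies 90 minutes per hour, a 60-minute lag drains in two hours.
print(replag_after(2, 90, 60))   # 0
```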
<p>Another cause of replag is the slave not replicating at all.  This is what happened during the recent maintenance; as the servers were offline, no replication was happening.  When the platform came back up, there was about 6 days&#8217; worth of replag.  For s1 (en.wikipedia.org) and s2 (de.wikipedia.org and a few others), there was no problem.  Although the lag was high, it has caught up now, and things are working normally.  However, for s3, an unexpected problem was encountered: <strong>someone deleted the binlogs!</strong></p>
<p>Now, the problem with binlogs is that they take up disk space on the master server, and Wikimedia&#8217;s s3 master is currently rather short of disk space.  To free some space, an admin deleted binlogs older than a couple of days, not realising that the Toolserver still needed them.  Since the record of changes to the master no longer exists, it&#8217;s impossible for the slave to replicate.  To restart replication, we need to dump (export) a copy of the master database, and import it to the slave, replacing the existing copy of the s3 database.  This copy will be new enough that binlogs exist from the time of the dump until now, allowing replication to restart.  Unfortunately, this process takes some time to do, and the database is unavailable while it&#8217;s happening.  Shortly (hopefully, in less than a month), we will be adding a new database server, and when that happens, a dump/import is required anyway.  To avoid having to do this twice, we decided to wait until the new server is ready before re-importing s3.</p>
<p>From a user&#8217;s point of view, the most obvious effect of this will be that tools which query wikis on the s3 database will return outdated information until the problem is fixed.  We realise this is rather inconvenient; sorry.</p>
<p>This has happened a couple of times before, but hopefully won&#8217;t happen again; after this, I created a way for Wikimedia admins to easily see what binlogs the Toolserver needs, so they can avoid deleting them.</p>
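<p>As a rough illustration of the underlying rule (not the actual tool mentioned above), the following Python sketch works out which binlogs are safe to delete, given the file a slave is still reading. The function name and the file names are hypothetical, and the sketch assumes that binlog names sort in creation order.</p>

```python
def binlogs_safe_to_delete(master_binlogs, slave_current_binlog):
    """Return the binlogs that the slave has already finished with.

    `master_binlogs` is the list of binlog file names on the master, and
    `slave_current_binlog` is the file the slave is still reading from.
    Names are assumed to sort in creation order (e.g. "bin.000041").
    """
    ordered = sorted(master_binlogs)
    if slave_current_binlog not in ordered:
        # The slave needs a file that is already gone: replication is
        # broken and only a fresh dump/import can restart it.
        raise ValueError("slave needs a binlog the master no longer has")
    return ordered[:ordered.index(slave_current_binlog)]


logs = ["bin.000040", "bin.000041", "bin.000042", "bin.000043"]
print(binlogs_safe_to_delete(logs, "bin.000042"))
# ['bin.000040', 'bin.000041']
```

<p>With several slaves, the same check would have to use the oldest file that any of them still needs.</p>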
]]></content:encoded>
			<wfw:commentRss>http://journal.toolserver.org/entry/2009/01/11/wherein-mysql-replication-binlogs-and-the-problems-with-s3-are-explained/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>On PHP, fork() and FastCGI</title>
		<link>http://journal.toolserver.org/entry/2008/08/15/on-php-fork-and-fastcgi/</link>
		<comments>http://journal.toolserver.org/entry/2008/08/15/on-php-fork-and-fastcgi/#comments</comments>
		<pubDate>Fri, 15 Aug 2008 23:39:13 +0000</pubDate>
		<dc:creator>river</dc:creator>
				<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://wordpress.toolserver.org/?p=8</guid>
		<description><![CDATA[So, a large number of tools we host are written in PHP.  For privacy and security reasons, these scripts need to run as the user who owns them, instead of the web server user, which means we can&#8217;t use mod_php, the most common method of using PHP with Apache.
Until now we used something called [...]]]></description>
			<content:encoded><![CDATA[<p>So, a large number of tools we host are written in PHP.  For privacy and security reasons, these scripts need to run as the user who owns them, instead of the web server user, which means we can&#8217;t use <tt>mod_php</tt>, the most common method of using PHP with Apache.</p>
<p>Until now we used something called <tt>mod_suphp</tt>.  This is an Apache module which receives PHP requests, and handles them using <tt>suexec</tt> and the CGI PHP binary, <tt>php-cgi</tt>.  While this works fine, it creates a large overhead on every request: one <tt>fork()</tt> and <tt>exec()</tt> to invoke <tt>suexec</tt>, and another <tt>exec</tt> for <tt>suexec</tt> to run <tt>php-cgi</tt>.</p>
<p>Fortunately, PHP supports a CGI-replacement called <a href="http://www.fastcgi.com/">FastCGI</a>.  FastCGI runs requests using a persistent daemon process; the web server connects to the process for each request.  Since there&#8217;s no forking, this is much faster than traditional CGI.</p>
<p>Even though PHP supports FastCGI, neither Apache nor PHP supports per-user FastCGI processes.  Because we want each request to run as the user who owns the script, a separate PHP daemon is needed for each user.  While we could run a separate PHP for every user, it would use a lot of resources for no reason (we have 300+ users, not all of whom use PHP).  The web server configuration to send each request to the right PHP would also be error-prone and difficult to maintain.</p>
<p>Instead, I wrote <a href="http://www.flyingparchment.org.uk/pages/switchboard"><tt>switchboard</tt></a>.  <tt>switchboard</tt> is a daemon which receives FastCGI requests from the web server, and dynamically creates PHP processes on-demand, running as the appropriate user.  After a request is finished, the <tt>php-cgi</tt> process is kept around to serve the next request for that user.  As a side effect, it supports per-user limits on number of processes, which helps to reduce the impact of runaway scripts.</p>
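<p>The idea of on-demand, reusable, per-user workers with a process limit can be modelled in a few lines of Python. This is a toy sketch under stated assumptions, not how <tt>switchboard</tt> is implemented: the class and method names are invented, and worker ids stand in for real <tt>php-cgi</tt> processes.</p>

```python
class PerUserPool:
    """Toy model of a per-user worker pool with a process limit."""

    def __init__(self, max_per_user=2):
        self.max_per_user = max_per_user
        self.idle = {}      # user -> list of idle worker ids
        self.busy = {}      # user -> set of busy worker ids
        self.next_id = 1

    def acquire(self, user):
        """Return a worker id for `user`, reusing an idle one if possible."""
        idle = self.idle.setdefault(user, [])
        busy = self.busy.setdefault(user, set())
        if idle:
            worker = idle.pop()          # reuse: no new process needed
        elif len(busy) >= self.max_per_user:
            raise RuntimeError("process limit reached for " + user)
        else:
            worker = self.next_id        # stands in for starting php-cgi
            self.next_id += 1
        busy.add(worker)
        return worker

    def release(self, user, worker):
        """Mark a worker as idle so the next request can reuse it."""
        self.busy[user].remove(worker)
        self.idle[user].append(worker)


pool = PerUserPool(max_per_user=2)
first = pool.acquire("alice")
pool.release("alice", first)
print(pool.acquire("alice") == first)   # True
```

<p>Keeping the idle worker around is what saves the per-request startup cost; the limit check is what contains a runaway script.</p>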
<p>I haven&#8217;t done much performance testing, but initial results suggest it significantly reduces load.  Right now (due to bugs in earlier versions) it&#8217;s not enabled by default for PHP scripts, which makes it harder to see the difference, but once it&#8217;s been stable for a while, I&#8217;ll probably make it the default handler for PHP.</p>
<p>For more information, and downloads: <a href="http://www.flyingparchment.org.uk/pages/switchboard">http://www.flyingparchment.org.uk/pages/switchboard</a></p>
]]></content:encoded>
			<wfw:commentRss>http://journal.toolserver.org/entry/2008/08/15/on-php-fork-and-fastcgi/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New servers!</title>
		<link>http://journal.toolserver.org/entry/2008/06/21/new-servers/</link>
		<comments>http://journal.toolserver.org/entry/2008/06/21/new-servers/#comments</comments>
		<pubDate>Sat, 21 Jun 2008 22:44:48 +0000</pubDate>
		<dc:creator>river</dc:creator>
				<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://wordpress.toolserver.org/?p=7</guid>
		<description><![CDATA[A month or two ago, we bought two new servers for the toolserver.  One of these (nightshade) was to be the new login server (replacing hemlock, which would remain as the web server), and the other (willow) the new stable server (replacing the current stable, which is a virtual server on vandale).  These [...]]]></description>
			<content:encoded><![CDATA[<p>A month or two ago, we bought two new servers for the toolserver.  One of these (nightshade) was to be the new login server (replacing hemlock, which would remain as the web server), and the other (willow) the new stable server (replacing the current stable, which is a virtual server on vandale).  These are Dell PowerEdge 1950 IIIs, each with 2 quad-core 2GHz Xeons (E5405), 8GB RAM and 2 146GB SAS disks.  Compared to the current login server, a dual Opteron, this is quite an improvement.</p>
<p>Unfortunately, when they arrived, we realised we&#8217;d forgotten to buy DRAC cards, so we couldn&#8217;t use them.  (Dell Remote Access Controller is a management card that allows remote power control and serial console access to servers.)  Last week, we finally got two DRAC cards installed, so I began setting up the new servers.  Mostly, this was pretty simple.  A few minor problems:</p>
<ul>
<li>The Debian Linux installer doesn&#8217;t support the Broadcom NetXtreme II network card, because it requires a so-called &#8220;non-free&#8221; firmware to be loaded.  I fixed this by creating a custom initrd containing the firmware file, after which installation was relatively simple.</li>
<li>Solaris does support the NetXtreme II, but it doesn&#8217;t support the PERC6/i RAID controller.  Again, the fix for this was building a custom install miniroot containing the LSI driver for this card (<tt>mega_sas</tt>).</li>
<li>Serial console redirection on these servers defaults to &#8220;continue after POST&#8221;, which has an unfortunate interaction when GRUB is configured to use the serial console; the server appears to hang after POST.  This is easy to change, but annoying if you don&#8217;t notice it.</li>
</ul>
<p>Since we now have two user servers, the next issue was sharing user account information between them.  The simplest way to do this is just syncing the account data (<tt>/etc/passwd</tt>, <tt>/etc/shadow</tt> and <tt>/etc/group</tt>) between the two servers.  There are a few problems with this, though: you have to wait until the next sync for updates to take effect on both servers, and any local differences between the two servers (for example, Debian likes to add new users during package installations) will be lost.</p>
<p>So, the two common options for shared account information are <a href="http://en.wikipedia.org/wiki/NIS">NIS</a> and <a href="http://en.wikipedia.org/wiki/LDAP">LDAP</a>.  NIS is an older system that provides basic key/value maps (e.g. looking up the username in the &#8216;passwd&#8217; map, and getting the corresponding entry from /etc/passwd).  On the other hand, LDAP provides a database of objects, each of which can have any number of user-defined attributes.  The advantage of LDAP is that each user object can have additional information beyond POSIX account information (for example, email accounts).  I decided to use LDAP for the toolserver, mainly for this reason.</p>
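<p>The difference between the two data models can be shown with plain Python dictionaries. This is a sketch only: the user, the domain and the mail address are invented, and the LDAP entry is abbreviated rather than a complete, schema-valid object.</p>

```python
# NIS-style map: one key, one opaque value (a line from /etc/passwd).
passwd_map = {
    "alice": "alice:x:1001:100:Alice Example:/home/alice:/bin/bash",
}

# LDAP-style entry: an object with any number of named attributes.
ldap_entry = {
    "dn": "uid=alice,ou=people,dc=example,dc=org",
    "objectClass": ["posixAccount", "inetOrgPerson"],
    "uid": "alice",
    "uidNumber": 1001,
    "gidNumber": 100,
    "homeDirectory": "/home/alice",
    "loginShell": "/bin/bash",
    "mail": "alice@example.org",     # extra, non-POSIX information
}


def home_from_nis(username):
    """Parse the home directory out of the colon-separated passwd line."""
    return passwd_map[username].split(":")[5]


print(home_from_nis("alice"))          # /home/alice
print(ldap_entry["homeDirectory"])     # /home/alice
print(ldap_entry["mail"])              # alice@example.org
```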
<p>The LDAP server I used was Sun <a href="http://www.sun.com/software/products/directory_srvr_ee/index.jsp">Directory Server Enterprise Edition</a>.  Despite the name, this is a fairly lightweight LDAP server, with good documentation.  In the past I&#8217;ve used <a href="http://www.openldap.org/">OpenLDAP</a>, but I found it quite fiddly to configure, and not so well documented.  The Linux version of DSEE is only supported on RedHat and SUSE systems, but it installed and ran fine on Debian.</p>
<p>Importing the existing account data to LDAP using the PADL <a href="http://www.padl.com/OSS/MigrationTools.html">LDAP MigrationTools</a> was quite straightforward.  For some reason, Debian has two different LDAP NSS modules; <tt>libnss-ldap</tt> and <tt>libnss-ldapd</tt>.  I chose the latter, which uses a daemon (<tt>nslcd</tt>) to proxy requests between user applications and the LDAP server.  After configuring NSS and PAM to use LDAP, everything seems to be working.</p>
<p>The new stable server was simpler to set up, as it&#8217;s replacing the existing server rather than augmenting it.  Installing Solaris and the web server was straightforward; all that was left was copying a few configuration files from the old stable server.</p>
<p>One thing I did differently was using <a href="http://pkgbuild.sourceforge.net/"><tt>pkgbuild</tt></a> for installing software packages, rather than compiling by hand.  <tt>pkgbuild</tt> builds Solaris packages from RPM spec files; as specs contain all the information required to build a package in a single file, it&#8217;s easy to create repositories of spec files which can be built using <tt>pkgbuild</tt>.  This is much easier to manage than the old stable server, where things were simply built and installed without using packages.  And, as there&#8217;s already a large existing repository of such specs (<a href="http://pkgbuild.sourceforge.net/spec-files-extra/">spec-files-extra</a>) to work from, it&#8217;s easy to add new software.</p>
<p>These new servers should provide enough capacity for the toolserver to last a long time, as well as improving both interactive performance and web serving.  (But look out for a future post about <tt>switchboard</tt>, a fast replacement for suphp using FastCGI &#8212; if I ever get around to finishing it.)</p>
]]></content:encoded>
			<wfw:commentRss>http://journal.toolserver.org/entry/2008/06/21/new-servers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>On tools and memory use</title>
		<link>http://journal.toolserver.org/entry/2008/06/02/on-tools-and-memory-use/</link>
		<comments>http://journal.toolserver.org/entry/2008/06/02/on-tools-and-memory-use/#comments</comments>
		<pubDate>Mon, 02 Jun 2008 03:32:52 +0000</pubDate>
		<dc:creator>river</dc:creator>
				<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://wordpress.toolserver.org/?p=6</guid>
		<description><![CDATA[A lot of Toolserver users run a MediaWiki bot framework called pywikipediabot to perform maintenance tasks on Wikipedia, such as fixing spelling errors or double redirects, and adding interwiki links to articles.
Unfortunately, some of these bots are rather excessive in their memory use.  In the past, it was usual to see a few interwiki [...]]]></description>
			<content:encoded><![CDATA[<p>A lot of Toolserver users run a MediaWiki bot framework called <a href="http://pywikipediabot.sourceforge.net/">pywikipediabot</a> to perform maintenance tasks on Wikipedia, such as fixing spelling errors or <a href="http://en.wikipedia.org/wiki/Wikipedia:Double_redirects">double redirects</a>, and adding <a href="http://meta.wikimedia.org/wiki/Help:Interwiki_linking">interwiki links</a> to articles.</p>
<p>Unfortunately, some of these bots are rather excessive in their memory use.  In the past, it was usual to see a few interwiki bots using 500MB+ each.  Recently, support for caching to disk (instead of using RAM) <a href="http://lists.wikimedia.org/pipermail/toolserver-l/2008-May/001373.html">was added</a> to pywikipediabot.  Hopefully, this will help reduce the memory use of these bots.  Next on the list: a Java program called &#8220;Linky&#8221;, which is apparently some kind of IRC bot.  There are 9 copies of Linky running on the toolserver, using 100MB each.  That&#8217;s 900MB, or nearly ⅛ of the system&#8217;s total memory, dedicated to just that.</p>
<p>The problem with running a shared server is that users often don&#8217;t realise how many resources they&#8217;re using. If you&#8217;re running a bot at home, using 100MB RAM is probably fine.  But when you have to share available resources with other people, you need to start looking more closely at your resource use, and perhaps spend some time reducing it.  (If you think about it for a moment, 100MB just for an IRC bot is a little excessive, isn&#8217;t it?)</p>
<p>The main technical restriction for memory use on the toolserver is a homemade daemon called <a href="http://svn.wikimedia.org/viewvc/mediawiki/trunk/tools/slayerd/">slayerd</a>, which starts killing users&#8217; processes when they use more than the limit (currently set at 1GB).  This is mainly intended to stop runaway processes rather than to enforce policy; usually an administrator will start killing processes long before 1GB, if they&#8217;re noticed.  The only way to really fix the problem is to educate users about it.  Of course, the difficulty then is treading the line between gentle reminders, and whining ;-)</p>
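<p>A limit check of this kind might look roughly like the Python sketch below. It is not taken from slayerd: the function name and the process list are invented, and it assumes the limit applies to the total memory of all of a user&#8217;s processes.</p>

```python
LIMIT_MB = 1024  # the 1GB limit mentioned above


def users_over_limit(processes, limit_mb=LIMIT_MB):
    """Return {user: total_mb} for users whose processes exceed the limit.

    `processes` is a list of (user, pid, memory_mb) tuples.
    """
    totals = {}
    for user, _pid, memory_mb in processes:
        totals[user] = totals.get(user, 0) + memory_mb
    return {u: mb for u, mb in totals.items() if mb > limit_mb}


# Nine 100MB bots stay under the limit; three 500MB bots do not.
procs = [("linky", 100 + i, 100) for i in range(9)]
procs += [("interwiki", 200 + i, 500) for i in range(3)]
print(users_over_limit(procs))   # {'interwiki': 1500}
```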
]]></content:encoded>
			<wfw:commentRss>http://journal.toolserver.org/entry/2008/06/02/on-tools-and-memory-use/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Patch day at the Toolserver</title>
		<link>http://journal.toolserver.org/entry/2008/06/01/patch-day-at-the-toolserver/</link>
		<comments>http://journal.toolserver.org/entry/2008/06/01/patch-day-at-the-toolserver/#comments</comments>
		<pubDate>Sun, 01 Jun 2008 03:46:33 +0000</pubDate>
		<dc:creator>river</dc:creator>
				<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://wordpress.toolserver.org/?p=5</guid>
		<description><![CDATA[Sunday morning (5-7AM UTC) is the scheduled maintenance window for the Toolserver.  During this time, we apply patches (software updates), and do any other work that might result in downtime.  Now, patching a system is an inherently dangerous process: you&#8217;re changing the software running on the system, and there&#8217;s always a chance of [...]]]></description>
			<content:encoded><![CDATA[<p>Sunday morning (5-7AM UTC) is the scheduled maintenance window for the Toolserver.  During this time, we apply patches (software updates), and do any other work that might result in downtime.  Now, patching a system is an inherently dangerous process: you&#8217;re changing the software running on the system, and there&#8217;s always a chance of new bugs being introduced.  Of course, patches go through QA testing before being released, but sometimes problems still slip through.  And, since many patches are large, and have to be installed in single user mode (where no services are running), even a harmless patch means downtime during installation.</p>
<h3>Live Upgrade</h3>
<p>At the Toolserver, we mitigate patching problems with a process known as <em>Live Upgrade</em>.  Live Upgrade works by creating a copy of the running system (called a <em>boot environment</em>), patching the copy, then rebooting into the new system.  There is no risk to the running system while patches are being installed &#8211; because they are applied to the alternate boot environment &#8211; which means the system can continue to run while patches are being installed.  And, because the original boot environment is untouched, if any patches do cause problems, we can simply reboot into the old boot environment and diagnose the problem from there.  Live Upgrade can even be used to upgrade Solaris itself to a new version &#8211; meaning downtime for an operating system upgrade is reduced to a single reboot.  If the upgrade fails, simply reboot into the old boot environment.</p>
<p>There are some prerequisites for using Live Upgrade.  Most importantly, there needs to be free disk space on which to create the new boot environment.  There are two common ways to provide it: #1, split the root mirror, and #2, allocate unused disk space for the new environment.</p>
<h3>#1: Split the root mirror</h3>
<p>Most servers have mirrored root disks, so failure of one disk doesn&#8217;t cause downtime.  If the root disk has been mirrored using Solaris Volume Manager (the software RAID functionality in Solaris), Live Upgrade has built-in support for creating a copy of the existing environment by detaching one side of the mirror.  This turns a single device (a mirror of two disks) into two new devices: the individual disks.  Because both sides of the mirror are already identical, this method also means there&#8217;s no need for a lengthy copy of the current root filesystem.</p>
<p>The downside of this method is that splitting the mirror means it&#8217;s no longer redundant.  If the disk containing the new boot environment fails during the Live Upgrade process, it can&#8217;t be restarted until a replacement disk has been installed.  Even worse, if the disk containing the <em>current</em> boot environment fails, the entire system will probably need to be reinstalled.  Failure of a disk during a 30-minute window is perhaps unlikely, but the possibility needs to be considered.</p>
<p>On x86 systems, there&#8217;s another disadvantage to this method: since the new boot environment is on a different disk, the boot loader (e.g. GRUB) has to be relocated to the new disk, <em>and</em> the BIOS must be told to boot from this new disk.  This is easy to forget, which causes surprising problems.  (On SPARC systems, Live Upgrade can automatically change the default boot device in the system&#8217;s firmware.)</p>
<h3>#2: Allocate unused space for Live Upgrade</h3>
<p>In this method, the root mirror (if there is one) remains intact.  Instead of detaching a device to create space, the system is partitioned at install time to contain an additional unused slice which is the same size as the root slice.  When Live Upgrade is invoked, it copies the current root filesystem into the unused slice.  When it&#8217;s complete, the previously unused slice is the new root filesystem, and the old root slice becomes unused.</p>
<p>The most obvious downside of this method is that it needs more space.  However, in a typical server configuration, two disks will be dedicated to the root filesystem.  Disks these days are large enough (typically at least 72GB) that there&#8217;s easily enough space to spare.  (We use 20GB root filesystems, which are around 50% used.)  The upside, compared to method #1, is that no redundancy is lost during the Live Upgrade operation.</p>
<h3>Using Live Upgrade</h3>
<p>We use method #2 at the Toolserver, for the reasons I mentioned above.  So, how do we actually use Live Upgrade to apply patches?</p>
<p>The first step is to create the new boot environment.  I&#8217;ll demonstrate this on <em>yarrow</em>, one of our database servers.  Yarrow&#8217;s current root filesystem is <tt>/dev/md/dsk/d1</tt>.  We&#8217;ll be creating the new boot environment on <tt>/dev/md/dsk/d7</tt>.  This is pretty simple:</p>
<pre>root@yarrow:~#lucreate -n be1 -m /:/dev/md/dsk/d7:ufs -x /aux0</pre>
<p>This tells Live Upgrade to create a new boot environment, called <em>be1</em>, whose root (<tt>/</tt>) filesystem will be <tt>/dev/md/dsk/d7</tt>.  After some time (around an hour), <tt>lucreate</tt> finishes, and the new environment is ready.  Next, we need to apply the patches we want to it.</p>
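<p>Before patching, it can be worth confirming that the copy completed.  A minimal check, assuming only the standard Live Upgrade tools (output omitted here), is to list the boot environments with <tt>lustatus</tt>; <em>be1</em> should be listed as complete, and the original environment should still be the active one.</p>
<pre>root@yarrow:~#lustatus</pre>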
<p>We use a tool called <tt><a href="http://www.par.univie.ac.at/solaris/pca/">pca</a></tt> to download the patches we need.  Since we want <tt>pca</tt> to operate on the new boot environment, we mount the environment&#8217;s filesystems, and use the <tt>-R</tt> switch to <tt>pca</tt> to make it operate on that filesystem.</p>
<pre>root@yarrow:~#lumount be1
/.alt.be1
root@yarrow:~#pca -R /.alt.be1 -d
Downloading xref file to /var/tmp/patchdiag.xref
Trying http://sunsolve.sun.com/patchdiag.xref (1/1)
[...]</pre>
<p>(We could also use <tt>pca -l</tt> first, which would list the patches it&#8217;s going to download.)</p>
<p>After pca is finished (it produces more output while it&#8217;s running, which I haven&#8217;t shown here), the patches we need to install on <tt>be1</tt> are now in <tt>/root/patches</tt>, because that&#8217;s where we configured <tt>pca</tt> to download them to.  Next, we extract the patches (which come in .zip files):</p>
<pre>root@yarrow:~root/patches#ls
119060-41.zip   124864-04.zip   126510-05.zip   127144-03.zip   127965-05.zip   137018-02.zip   138049-01.zip
119961-03.zip   125333-03.zip   127002-04.zip   127923-04.zip   128401-01.zip   137131-01.zip   138076-01.zip
root@yarrow:~root/patches#for p in *.zip; do unzip -q "$p"; done
root@yarrow:~root/patches#rm *.zip</pre>
<p>Now we use <tt>luupgrade</tt> to apply these patches to the new boot environment:</p>
<pre>root@yarrow:~root/patches#luupgrade -t -n be1 -s $PWD *
Validating the contents of the media .
The media contains 14 software patches that can be added.
Mounting the BE .
Adding patches to the BE .
[...]</pre>
<p>Lastly, we need to <tt>luactivate</tt> the new boot environment, so it becomes active at the next reboot:</p>
<pre>root@yarrow:~root/patches#luactivate be1
[...]
Activation of boot environment  successful.</pre>
<p>Now we reboot, and the new system comes up with our patches installed.</p>
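<p>If the patched environment misbehaves, falling back might look like the sketch below.  The name of the original boot environment isn&#8217;t given in this post, so <em>be0</em> is purely a placeholder; <tt>lustatus</tt> shows the real names.  Note that Sun&#8217;s documentation says to reboot with <tt>init</tt> or <tt>shutdown</tt> after <tt>luactivate</tt>, because <tt>reboot</tt> and <tt>halt</tt> do not switch boot environments.</p>
<pre># be0 is a placeholder; use the original environment name reported by lustatus
root@yarrow:~#lustatus
root@yarrow:~#luactivate be0
root@yarrow:~#init 6</pre>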
<p>You can read more about Live Upgrade in the <a href="http://docs.sun.com/app/docs/doc/820-4041"><em>Solaris Live Upgrade and Upgrade Planning</em></a> manual on <a href="http://docs.sun.com">docs.sun.com</a> (it&#8217;s probably a good idea to read that before you try this at home!).</p>
]]></content:encoded>
			<wfw:commentRss>http://journal.toolserver.org/entry/2008/06/01/patch-day-at-the-toolserver/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Integrating Apache Roller with Crowd</title>
		<link>http://journal.toolserver.org/entry/2008/05/29/integrating-apache-roller-with-crowd/</link>
		<comments>http://journal.toolserver.org/entry/2008/05/29/integrating-apache-roller-with-crowd/#comments</comments>
		<pubDate>Thu, 29 May 2008 22:55:05 +0000</pubDate>
		<dc:creator>river</dc:creator>
				<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://wordpress.toolserver.org/?p=4</guid>
		<description><![CDATA[I chose Apache Roller for the Toolserver blog.  Roller is mature software, widely used on large sites (e.g. blogs.sun.com), it&#8217;s written in Java which makes it easy to deploy, and lastly it can be integrated with Atlassian Crowd, the centralised authentication software we use for Toolserver web properties.
Unfortunately, although Roller itself was easy to [...]]]></description>
			<content:encoded><![CDATA[<p>I chose <a href="http://roller.apache.org">Apache Roller</a> for the Toolserver blog.  Roller is mature software, widely used on large sites (e.g. <a href="http://blogs.sun.com/">blogs.sun.com</a>), it&#8217;s written in Java which makes it easy to deploy, and lastly it can be integrated with <a href="http://www.atlassian.com/software/crowd/">Atlassian Crowd</a>, the centralised authentication software we use for Toolserver web properties.</p>
<p>Unfortunately, although Roller itself was easy to set up, making it work properly with Crowd took a bit more effort.  Firstly, there are three ways to integrate Roller with Crowd: LDAP (if your Crowd directory is using LDAP), the <a href="http://confluence.atlassian.com/display/CROWDEXT/Crowd+JAAS+Login+Module">Crowd connector for JAAS</a> (a standardised security framework for Java), and the Crowd connector for <a href="http://www.acegisecurity.org/">Acegi Security</a>, the security framework Roller itself uses.  After a lot of trial and error, the easiest to set up of all of these appears to be Acegi.  So, here is how I did it:</p>
<p>Firstly, make sure you&#8217;re using Crowd 1.4.2 or later.  Earlier versions (at least 1.4, which I tried) have a bug in the Acegi connector that makes it unusable.  You don&#8217;t need to upgrade Crowd itself, just make sure you have the 1.4.2 client libraries.  Copy the Crowd connector to Roller&#8217;s <tt>WEB-INF/lib</tt>:</p>
<pre>$ cp $CROWD_DIR/client/lib/* WEB-INF/lib/
$ cp $CROWD_DIR/client/crowd-integration-client-1.4.2.jar WEB-INF/lib/</pre>
<p>The Crowd client libraries include newer versions of some libraries which Roller also includes.  If you don&#8217;t remove these older versions, Crowd will try to use them, and things won&#8217;t work:</p>
<pre>$ rm WEB-INF/lib/commons-httpclient-2.0.2.jar
$ rm WEB-INF/lib/ehcache-1.1.jar</pre>
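<p>To check that no older copies remain, a quick listing can help.  This is only a sketch: the exact jar versions depend on your Roller and Crowd releases, but each library should appear exactly once.</p>
<pre>$ ls WEB-INF/lib/commons-httpclient* WEB-INF/lib/ehcache*</pre>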
<p>Now you can follow the <a href="http://confluence.atlassian.com/display/CROWD/Integrating+Crowd+with+Acegi+Security">Crowd Acegi</a> setup instructions.  Where they refer to your Acegi Security configuration, that means Roller&#8217;s <tt>WEB-INF/security.xml</tt> file.</p>
<p>After you&#8217;re done, a little bit more work is still needed.  Firstly, the <tt>applicationContext-CrowdClient.xml</tt> from Crowd didn&#8217;t work for me; it needs a little bit of editing.  Extract the file from the Crowd jar (the path below assumes you are in Roller&#8217;s <tt>WEB-INF</tt> directory, so the extracted file ends up there):</p>
<pre>$ jar xvf lib/crowd-integration-client-1.4.2.jar applicationContext-CrowdClient.xml</pre>
<p>The top of the file should look like this:</p>
<pre>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd"&gt;</pre>
<p>Change that to read:</p>
<pre>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN"
"http://www.springframework.org/dtd/spring-beans.dtd"&gt;
&lt;beans&gt;</pre>
<p>Then edit <tt>web.xml</tt> so it refers to <tt>/WEB-INF/applicationContext-CrowdClient.xml</tt> instead of the one in <tt>classpath:/</tt>.  My <tt>web.xml</tt> looks like this:</p>
<pre>&lt;context-param&gt;
&lt;param-name&gt;contextConfigLocation&lt;/param-name&gt;
&lt;param-value&gt;
/WEB-INF/applicationContext-CrowdClient.xml
/WEB-INF/security.xml
&lt;/param-value&gt;
&lt;/context-param&gt;</pre>
<p>Finally, you need to map Crowd groups to Roller groups.  Create two new groups (e.g. <strong>roller-editor</strong> and <strong>roller-admin</strong>) in Crowd.  Then edit <tt>security.xml</tt> again, and find the &#8220;AUTHENTICATION&#8221; section.  It should have several lines that look like this:</p>
<pre>/roller-ui/login-redirect**=admin,editor</pre>
<p>On each line, change &#8220;<strong>admin</strong>&#8221; to &#8220;<strong>ROLE_roller-admin</strong>&#8221;, and &#8220;<strong>editor</strong>&#8221; to &#8220;<strong>ROLE_roller-editor</strong>&#8221;.</p>
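<p>As an illustration, assuming the group names suggested above, the example line shown earlier would end up looking like this:</p>
<pre>/roller-ui/login-redirect**=ROLE_roller-admin,ROLE_roller-editor</pre>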
<p>That&#8217;s it!  Hopefully, your Roller installation should now authenticate from Crowd.  Easy, huh?  (Remember to re-apply these customisations each time you upgrade Roller&#8230;)</p>
]]></content:encoded>
			<wfw:commentRss>http://journal.toolserver.org/entry/2008/05/29/integrating-apache-roller-with-crowd/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
