<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Toolserver Journal</title>
	<atom:link href="http://journal.toolserver.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://journal.toolserver.org</link>
	<description>Updates from the Toolserver administration</description>
	<lastBuildDate>Tue, 17 Nov 2009 01:12:29 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Wherein the Toolserver becomes more reliable</title>
		<link>http://journal.toolserver.org/entry/2009/11/17/wherein-the-toolserver-becomes-more-reliable/</link>
		<comments>http://journal.toolserver.org/entry/2009/11/17/wherein-the-toolserver-becomes-more-reliable/#comments</comments>
		<pubDate>Tue, 17 Nov 2009 01:12:29 +0000</pubDate>
		<dc:creator>river</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://journal.toolserver.org/?p=53</guid>
		<description><![CDATA[Some months ago, the Wikimedia Foundation approved a $40,000 grant to Wikimedia Deutschland, for the purpose of improving Toolserver reliability.  We&#8217;ve now implemented the first part of this plan: redundant NFS and LDAP.
When we first proposed the grant, the plan (which you can read more about at the above link) was to purchase 3 [...]]]></description>
			<content:encoded><![CDATA[<p>Some months ago, the Wikimedia Foundation <a href="http://journal.toolserver.org/entry/2009/07/29/wikimedia-foundation-grants-40000-to-the-toolserver/">approved</a> a $40,000 grant to Wikimedia Deutschland, for the purpose of improving Toolserver reliability.  We&#8217;ve now implemented the first part of this plan: redundant NFS and LDAP.</p>
<p>When we first proposed the grant, the plan (which you can read more about at the above link) was to purchase 3 database servers, which we would use to provide a redundant backup for the 3 current servers.  However, before we made the purchase, we realised that for the same amount of money, we could purchase 2 database servers, 2 smaller servers and a disk array.  The Foundation approved this change, and that&#8217;s what we ended up buying.</p>
<p>The purpose of the two small servers and array was to provide redundant service for NFS and LDAP.  These services are critical to the platform operation; if either is offline, the entire platform is down.  Previously, both were hosted on a single server (<tt>hyacinth</tt>), which meant the entire Toolserver depended on this server being up.  As well as hurting reliability, this made it very difficult to do any maintenance on that server.  </p>
<p>Now, however, the NFS and LDAP data is stored on the disk array, which is connected to two servers (<tt>turnera</tt> and <tt>damiana</tt>) running <a href="http://en.wikipedia.org/wiki/Solaris_Cluster">Solaris Cluster</a> software.  If one server breaks, or we need to do maintenance on it, the services are automatically moved to the other server, with no interruption in service.  The array itself has two redundant, independent controllers, making failure quite unlikely.</p>
<p>The previous NFS/LDAP server, which is now idle, has exactly the same specification as a database server.  We will be using this as the third redundant database server (along with the two we purchased with the grant) to provide redundant access to the MySQL databases.  More news on that later.</p>
]]></content:encoded>
			<wfw:commentRss>http://journal.toolserver.org/entry/2009/11/17/wherein-the-toolserver-becomes-more-reliable/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Platform outage on August 24th</title>
		<link>http://journal.toolserver.org/entry/2009/08/25/platform-outage-on-august-24th/</link>
		<comments>http://journal.toolserver.org/entry/2009/08/25/platform-outage-on-august-24th/#comments</comments>
		<pubDate>Tue, 25 Aug 2009 20:09:28 +0000</pubDate>
		<dc:creator>river</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://journal.toolserver.org/?p=50</guid>
		<description><![CDATA[From the afternoon of August 24th until August 25th, the Toolserver was offline due to an unscheduled outage.  Everything is now back online, and the technical details of the outage are documented here for anyone who&#8217;s interested.
]]></description>
			<content:encoded><![CDATA[<p>From the afternoon of August 24th until August 25th, the Toolserver was offline due to an unscheduled outage.  Everything is now back online, and the technical details of the outage are documented <a href="https://confluence.toolserver.org/display/tech/Platform+outage+2009-08-24">here</a> for anyone who&#8217;s interested.</p>
]]></content:encoded>
			<wfw:commentRss>http://journal.toolserver.org/entry/2009/08/25/platform-outage-on-august-24th/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wikimedia Foundation Grants $40,000 to the Toolserver</title>
		<link>http://journal.toolserver.org/entry/2009/07/29/wikimedia-foundation-grants-40000-to-the-toolserver/</link>
		<comments>http://journal.toolserver.org/entry/2009/07/29/wikimedia-foundation-grants-40000-to-the-toolserver/#comments</comments>
		<pubDate>Wed, 29 Jul 2009 13:49:41 +0000</pubDate>
		<dc:creator>daniel</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://journal.toolserver.org/?p=46</guid>
		<description><![CDATA[The Wikimedia Foundation has approved a grant of $40,000 towards improving the Toolserver&#8217;s reliability. We requested the grant in April, and we are very happy it worked out. The background of the grant is that the most central feature of the Toolserver, live replication of the nearly 800 wiki databases, is far too shaky. If [...]]]></description>
			<content:encoded><![CDATA[<p>The Wikimedia Foundation has approved a grant of $40,000 towards improving the Toolserver&#8217;s reliability. We <a title="Wikimedia chapters/WMF grants/WM DE/Improve toolserver reliability" href="http://meta.wikimedia.org/wiki/Wikimedia_chapters/WMF_grants/WM_DE/Improve_toolserver_reliability">requested the grant in April</a>, and we are very happy it worked out. The background of the grant is that the most central feature of the Toolserver, live replication of the nearly 800 wiki databases, is far too shaky. If it breaks for a day or so, or we have any kind of corruption, we need to import a full new dump, causing days and weeks of outdated information for Toolserver users (and for the users of Toolserver users&#8217; tools). It also means that during such times, there is no up to date off-site backup of the wiki databases.</p>
<p>To improve this situation, we plan to buy three new database servers, so we can keep <em>two</em> copies of each database, instead of just one. This way, one copy will remain available when the other breaks, and we will be able to fix things without too much interruption. The new servers will very likely be the same as our other newer database servers, namely, Sun Fire X4250s with 32GB RAM and 16 internal disks with 146 GB each. We hope to have these online some time in September or October. This should greatly improve the availability of live replication, and thus of any tools relying on real time information.</p>
]]></content:encoded>
			<wfw:commentRss>http://journal.toolserver.org/entry/2009/07/29/wikimedia-foundation-grants-40000-to-the-toolserver/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Where is cassini?</title>
		<link>http://journal.toolserver.org/entry/2009/06/14/where-is-cassini/</link>
		<comments>http://journal.toolserver.org/entry/2009/06/14/where-is-cassini/#comments</comments>
		<pubDate>Sun, 14 Jun 2009 14:36:00 +0000</pubDate>
		<dc:creator>dab</dc:creator>
				<category><![CDATA[Tech]]></category>
		<category><![CDATA[osm]]></category>
		<category><![CDATA[servers]]></category>

		<guid isPermaLink="false">http://journal.toolserver.org/?p=43</guid>
		<description><![CDATA[After the hardware installation was finished yesterday, River began to set up the servers that will run Solaris, while I began to set up cassini, which will run Debian.
After I got through the TFTP network installation hell again (it took me only 2h and a dozen reboots to get the installation started&#8230;), we discovered that the host doesn&#8217;t have a [...]]]></description>
			<content:encoded><![CDATA[<p>After the <a href="http://journal.toolserver.org/entry/2009/06/14/new-servers-are-there/">hardware installation</a> was finished yesterday, River began to set up the servers that will run Solaris, while I began to set up <em>cassini</em>, which will run Debian.</p>
<p>After I got through the TFTP network installation hell again (it took me only 2h and a dozen reboots to get the installation started&#8230;), we discovered that the host doesn&#8217;t have hardware RAID. A quick investigation showed that the OSM Squid <em>ortelius</em> (which is nearly identical to <em>cassini</em>) and <em>cassini</em> were swapped during the hardware installation. Because <em>cassini</em> is supposed to hold a small copy of the OSM database itself, using it without hardware RAID is not an option.</p>
<p>At the moment, we are unsure what the best way to handle the situation is, or how long it will take until <em>cassini</em> can be used.</p>
]]></content:encoded>
			<wfw:commentRss>http://journal.toolserver.org/entry/2009/06/14/where-is-cassini/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New servers are there</title>
		<link>http://journal.toolserver.org/entry/2009/06/14/new-servers-are-there/</link>
		<comments>http://journal.toolserver.org/entry/2009/06/14/new-servers-are-there/#comments</comments>
		<pubDate>Sun, 14 Jun 2009 14:14:53 +0000</pubDate>
		<dc:creator>dab</dc:creator>
				<category><![CDATA[Tech]]></category>
		<category><![CDATA[osm]]></category>
		<category><![CDATA[servers]]></category>

		<guid isPermaLink="false">http://journal.toolserver.org/?p=35</guid>
		<description><![CDATA[Yesterday Mark and Multichil were at our colo again and finished the hardware installation of our new servers.
The Toolserver cluster got two new boxes: daphe, which is going to replace zedler as the database host of the s2 cluster, and hyacinth, which is going to host our /home directory.
They also installed new servers for the OpenStreetMap/Wikimedia cooperation: an OpenStreetMap toolserver [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday Mark and Multichil were at our colo again and finished the hardware installation of our new servers.</p>
<p>The Toolserver cluster got two new boxes: <em>daphe</em>, which is going to replace <em>zedler</em> as the database host of the s2 cluster, and <em>hyacinth</em>, which is going to host our <em>/home</em> directory.</p>
<p>They also installed new servers for the OpenStreetMap/Wikimedia cooperation: an OpenStreetMap toolserver named <em>cassini</em>, on which people can play with OSM data (like our Toolserver for Wikimedia); a Squid proxy named <em>ortelius</em>; and a database host named <em>ptolemy</em>. The latter two will handle the load the Wikimedia projects create when they use OpenStreetMap maps.</p>
<p>Finally, two terminal servers were installed to make it easier for the roots to handle broken servers.</p>
<p>Thanks to Mark and Multichil!</p>
]]></content:encoded>
			<wfw:commentRss>http://journal.toolserver.org/entry/2009/06/14/new-servers-are-there/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wherein msnbot behaves badly, and is banished</title>
		<link>http://journal.toolserver.org/entry/2009/06/09/wherein-msnbot-behaves-badly-and-is-banished/</link>
		<comments>http://journal.toolserver.org/entry/2009/06/09/wherein-msnbot-behaves-badly-and-is-banished/#comments</comments>
		<pubDate>Tue, 09 Jun 2009 02:27:52 +0000</pubDate>
		<dc:creator>river</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://journal.toolserver.org/?p=32</guid>
		<description><![CDATA[So, I&#8217;ve just blocked msnbot (which is, I assume, the search spider for Microsoft Live Search) from indexing the Toolserver.  Many spiders, such as Google and Yahoo!, index our website every day, and cause no problems; in fact, we don&#8217;t even notice them.  Msnbot is different.  Specifically, it seems to have no [...]]]></description>
			<content:encoded><![CDATA[<p>So, I&#8217;ve just blocked msnbot (which is, I assume, the search spider for Microsoft Live Search) from indexing the Toolserver.  Many spiders, such as Google and Yahoo!, index our website every day, and cause no problems; in fact, we don&#8217;t even notice them.  Msnbot is different.  Specifically, it seems to have no rate limiting.  Microsoft claim it will only request pages around once every 10 seconds; in reality, it was making 5-10 requests <em>per</em> second.  Unfortunately, the page in question was a slow CGI script, and msnbot seemed to have obtained a list of every possible parameter it could pass to the script, which it then did, as fast as possible, until the web server was so overloaded it could hardly serve user requests:</p>
<pre>wolfsbane     up   53+10:55,     1 user,   load 53.93, 55.49, 55.22</pre>
<p>It doesn&#8217;t seem to have noticed that it&#8217;s blocked.  It&#8217;s still hammering away as fast as it can, and getting nothing but 403 in reply.  I&#8217;ve even added it to robots.txt, but it doesn&#8217;t seem to have noticed that either yet.  Fortunately, our web server is quite fast at returning 403, so the load is looking much happier:</p>
<pre>wolfsbane     up   53+11:58,     4 users,  load 0.68, 1.15, 3.44</pre>
<p>After I blocked it, I tried to find a contact at Microsoft to report the problem too&mdash;as the spider clearly isn&#8217;t behaving like they expect, I thought they might appreciate a warning.  Well, I can now report that Live Search <em>really</em> don&#8217;t want to be contacted.  The closest thing I could find to a contact form, linked from the &#8220;troubleshooting problems with msnbot&#8221; page, had a list of categories for me to choose from.  None of them was even slightly related to search.  Some Googling suggested that &#8220;msnbot@microsoft.com&#8221; might work, but nope (&#8220;Returned mail: user unknown&#8221;).  There&#8217;s a feedback link on the MSN front page, but who knows where that would go, and whether the feedback would ever reach someone who could deal with it?  (Certainly not me, as they clearly state that they won&#8217;t reply to your feedback.)</p>
<p>I gave up in the end.  If someone reading this happens to have a contact at Microsoft who would be interested in this issue, please feel free to let them know.  Otherwise, I imagine Live Search users will just have to live without the Toolserver.</p>
<p>PS: I know msnbot (supposedly) supports the Crawl-Delay parameter in robots.txt.  But given what I&#8217;ve seen today, I don&#8217;t particularly want to rely on this, even if it does, some day, reload our robots.txt.</p>
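<p>For readers curious how a well-behaved crawler would interpret such rules, here is a minimal sketch using Python&#8217;s standard <tt>urllib.robotparser</tt> module.  The two rule sets and the example URL are illustrative assumptions, not a copy of our actual robots.txt, and the sketch says nothing about how msnbot itself reads the file.</p>

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text):
    """Parse robots.txt content given as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Hypothetical rule set 1: ban msnbot outright, leave other spiders alone.
blocked = parse_rules("User-agent: msnbot\nDisallow: /\n")

# Hypothetical rule set 2: ask msnbot to wait 10 seconds between requests.
delayed = parse_rules("User-agent: msnbot\nCrawl-delay: 10\n")

url = "http://toolserver.example/some-tool"  # placeholder URL, not a real tool

print(blocked.can_fetch("msnbot", url))     # False: msnbot is disallowed
print(blocked.can_fetch("Googlebot", url))  # True: no rule applies to it
print(delayed.crawl_delay("msnbot"))        # 10
```

<p>Either rule only helps if the spider actually reloads robots.txt and honours it, which is why the block at the web server is doing the real work here.</p>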
]]></content:encoded>
			<wfw:commentRss>http://journal.toolserver.org/entry/2009/06/09/wherein-msnbot-behaves-badly-and-is-banished/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Faster, Better, Stronger</title>
		<link>http://journal.toolserver.org/entry/2009/04/29/faster-better-stronger/</link>
		<comments>http://journal.toolserver.org/entry/2009/04/29/faster-better-stronger/#comments</comments>
		<pubDate>Wed, 29 Apr 2009 10:53:41 +0000</pubDate>
		<dc:creator>daniel</dc:creator>
				<category><![CDATA[Tech]]></category>
		<category><![CDATA[osm]]></category>
		<category><![CDATA[servers]]></category>

		<guid isPermaLink="false">http://journal.toolserver.org/?p=26</guid>
		<description><![CDATA[Yesterday, Wikimedia Deutschland ordered five new servers, three of which will be added to the Toolserver cluster. They will be delivered (hopefully) in two to three weeks, and will go online perhaps a week or two after that. Here&#8217;s what the servers will be used for:
The first server will replace Zedler as our database [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday, Wikimedia Deutschland ordered five new servers, three of which will be added to the Toolserver cluster. They will be delivered (hopefully) in two to three weeks, and will go online perhaps a week or two after that. Here&#8217;s what the servers will be used for:</p>
<p>The first server will replace Zedler as our database server for the s2 cluster. Zedler is our oldest database server, and has lately been overloaded nearly constantly. We may keep Zedler around with a backup copy of s2, but it will mostly be idle. We&#8217;ll see what use we can put it to later on.</p>
<p>The second server will take over Hemlock&#8217;s duty of serving the home directories, and it will become the host system of the stable server. This means the stable server becomes virtualized (probably as a Solaris zone). Willow (the current server for stable projects) will then be free; we will probably make it into a second login server where users can run bots. This should take some load off Nightshade.</p>
<p>The third server is the &#8220;OpenStreetMap Toolserver&#8221;: it will be for the <a href="http://en.wikipedia.org/wiki/OpenStreetMap">OpenStreetMap</a> project what the Toolserver cluster has so far been for Wikimedia projects: a place to play with data and host bots and web applications. The <a href="https://wiki.toolserver.org/view/Rules">Toolserver rules</a> have been changed to accommodate this.</p>
<p>All in all, we will then have 12 servers in the Toolserver cluster.</p>
<p>The two remaining servers that were ordered yesterday will also be used for the OpenStreetMap project, but not in the context of the Toolserver. They will be used to integrate interactive maps from OpenStreetMap directly into Wikipedia articles. More information about the <a href="http://meta.wikimedia.org/wiki/OpenStreetMap">OSM integration project</a> is available on meta.</p>
]]></content:encoded>
			<wfw:commentRss>http://journal.toolserver.org/entry/2009/04/29/faster-better-stronger/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Wherein the Toolserver journal is reborn in a much less annoying fashion</title>
		<link>http://journal.toolserver.org/entry/2009/04/28/wherein-the-toolserver-journal-is-reborn-in-a-much-less-annoying-fashion/</link>
		<comments>http://journal.toolserver.org/entry/2009/04/28/wherein-the-toolserver-journal-is-reborn-in-a-much-less-annoying-fashion/#comments</comments>
		<pubDate>Tue, 28 Apr 2009 22:47:39 +0000</pubDate>
		<dc:creator>river</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://wordpress.toolserver.org/?p=24</guid>
		<description><![CDATA[So, for a while we used Apache Roller for the journal.  Roller was something I&#8217;d used before (as a user, not an admin), and it could be integrated into our web SSO system, so it seemed like a good fit.  Unfortunately, it didn&#8217;t work out so well.  Roller is rather clunky to [...]]]></description>
			<content:encoded><![CDATA[<p>So, for a while we used <a href="http://roller.apache.org">Apache Roller</a> for the journal.  Roller was something I&#8217;d used before (as a user, not an admin), and it could be integrated into our web SSO system, so it seemed like a good fit.  Unfortunately, it didn&#8217;t work out so well.  Roller is rather clunky to use, and has a few small bugs (mostly related to SSO) that made it somewhat unpleasant to use.  Today, it decided not to allow a new user to log in, and wouldn&#8217;t provide any sort of useful error message besides Java stack traces.  So, I decided it was time for a change.</p>
<p>The Toolserver journal is now running on <a href="http://wordpress.org">WordPress</a>.  WordPress is much nicer to use, looks nicer for users, and has a large community of users.  It was also very easy to set up, and allowed us to import the old posts from Roller.  The only downside is the lack of an up-to-date LDAP integration plugin.  However, I don&#8217;t think that should be too hard to write&#8230;</p>
<p>PS: In case Planet gets confused by the change and re-displays all our old posts&#8211;sorry!</p>
]]></content:encoded>
			<wfw:commentRss>http://journal.toolserver.org/entry/2009/04/28/wherein-the-toolserver-journal-is-reborn-in-a-much-less-annoying-fashion/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Wherein a new server is delivered to the wrong place, but then remains there</title>
		<link>http://journal.toolserver.org/entry/2009/01/23/wherein-a-new-server-is-delivered-to-the-wrong-place-but-then-remains-there/</link>
		<comments>http://journal.toolserver.org/entry/2009/01/23/wherein-a-new-server-is-delivered-to-the-wrong-place-but-then-remains-there/#comments</comments>
		<pubDate>Fri, 23 Jan 2009 19:51:45 +0000</pubDate>
		<dc:creator>river</dc:creator>
				<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://wordpress.toolserver.org/?p=10</guid>
		<description><![CDATA[So, a while ago, Sun donated some servers to Wikimedia.  We managed to earmark one of them for the Toolserver, but for various reasons, it couldn&#8217;t be delivered to Amsterdam, so it ended up at our Tampa facility.  It&#8217;s been sitting there ever since, until yesterday, when we managed to rack it and [...]]]></description>
			<content:encoded><![CDATA[<p>So, a while ago, Sun donated some servers to Wikimedia.  We managed to earmark one of them for the Toolserver, but for various reasons, it couldn&#8217;t be delivered to Amsterdam, so it ended up at our Tampa facility.  It&#8217;s been sitting there ever since, until yesterday, when we managed to rack it and set up its OS and array.  The server is a Sun Fire X4150, with 2 quad-core 2.8GHz CPUs, 32GB RAM, four 146GB system disks, and an external array with twelve 15,000 rpm 146GB SAS disks.  Originally, it was meant to be a database server, but there&#8217;s not much point putting a single database server at Tampa, so now we&#8217;re looking for other uses.</p>
<p>The most obvious is to provide a US replica for <tt>cache.stable.toolserver.org</tt>, the caching proxy in front of the stable server; this will improve WikiMiniAtlas performance for North American users (the only tool currently using the cache).  But that&#8217;s only a tiny load, so it would be a waste to use the entire server for just that.  Other possibilities include moving our webapps (e.g. JIRA) there, which would free up some resources from the server hosting them in Amsterdam. No doubt, there are several other things we could use it for&#8230; please feel free to offer suggestions.</p>
]]></content:encoded>
			<wfw:commentRss>http://journal.toolserver.org/entry/2009/01/23/wherein-a-new-server-is-delivered-to-the-wrong-place-but-then-remains-there/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Wherein MySQL replication, binlogs and the problems with s3 are explained</title>
		<link>http://journal.toolserver.org/entry/2009/01/11/wherein-mysql-replication-binlogs-and-the-problems-with-s3-are-explained/</link>
		<comments>http://journal.toolserver.org/entry/2009/01/11/wherein-mysql-replication-binlogs-and-the-problems-with-s3-are-explained/#comments</comments>
		<pubDate>Sun, 11 Jan 2009 05:18:57 +0000</pubDate>
		<dc:creator>river</dc:creator>
				<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://wordpress.toolserver.org/?p=9</guid>
		<description><![CDATA[Many people have noticed that since the maintenance a couple of weeks ago, replication of s3 (the cluster that holds frwiki, and most of the smaller wikis) is halted, which means Toolserver tools are returning outdated information.  So, what is replication, and why isn&#8217;t it working?
One of the main features of the Toolserver is [...]]]></description>
			<content:encoded><![CDATA[<p>Many people have noticed that since the maintenance a couple of weeks ago, replication of s3 (the cluster that holds frwiki, and most of the smaller wikis) is halted, which means Toolserver tools are returning outdated information.  So, what is replication, and why isn&#8217;t it working?</p>
<p>One of the main features of the Toolserver is that it holds a copy of the Wikimedia <a href="http://www.mysql.com">MySQL</a> databases, and allows users (and tools) to run <a href="http://en.wikipedia.org/wiki/SQL">SQL</a> queries on them.  The databases are updated in real time, using a MySQL feature called <em>replication</em>.  Replication is a way of applying changes to one database &#8212; in this case, Wikimedia&#8217;s master database &#8212; to a <em>slave</em> server, the Toolserver.  It works by recording each change on the master to a file, called a <em>binlog</em> (short for &#8220;binary log&#8221;).  The slave server retrieves this file, and applies each change to its own copy of the database, thus bringing it up to date with the master.</p>
<p>If, for some reason, the slave server isn&#8217;t replicating, meaning no changes are being applied to it, the slave&#8217;s copy of the database will gradually become more and more out of date compared to the master.  This difference, usually expressed as a time (e.g. seconds, hours or days), is called the replication lag, or <em>replag</em>.  The most common cause of replag is the slave not being able to apply the changes fast enough.  For example, in an hour, it might only be able to apply 40 minutes&#8217; worth of changes.  This would cause 20 minutes of replag.  As the slave catches up (by applying changes faster than real time), the replag decreases until it reaches 0 again.</p>
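<p>The arithmetic in that example can be written down as a small Python sketch.  The function name and its parameters are made up for illustration; real replag is read from the database server rather than computed like this.</p>

```python
def replag_after(hours, applied_minutes_per_hour, starting_replag=0.0):
    """Return the replag in minutes after `hours` of wall-clock time.

    Simplified model: the master produces 60 minutes' worth of changes
    every hour, and the slave applies a fixed amount per hour.
    """
    replag = float(starting_replag)
    for _ in range(hours):
        # Each hour the gap grows by whatever the slave failed to apply,
        # or shrinks if the slave applied more than 60 minutes' worth.
        replag += 60 - applied_minutes_per_hour
        # A slave can never be ahead of the master.
        replag = max(replag, 0.0)
    return replag


# Falling behind: only 40 minutes of changes applied in one hour.
print(replag_after(1, 40))                       # 20.0
# Catching up: 90 minutes applied per hour, starting one hour behind.
print(replag_after(2, 90, starting_replag=60))   # 0.0
```

<p>The second call shows the catch-up case: applying changes at 1.5 times real speed recovers 30 minutes of replag per hour.</p>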
<p>Another cause of replag is the slave not replicating at all.  This is what happened during the recent maintenance; as the servers were offline, no replication was happening.  When the platform came back up, there was about 6 days&#8217; worth of replag.  For the s1 (en.wikipedia.org) and s2 (de.wikipedia.org and a few others) clusters, there was no problem.  Although the lag was high, it has caught up now, and things are working normally.  However, for s3, an unexpected problem was encountered: <strong>someone deleted the binlogs!</strong></p>
<p>Now, the problem with binlogs is that they take up disk space on the master server, and Wikimedia&#8217;s s3 master is currently rather short of disk space.  To free some space, an admin deleted binlogs older than a couple of days, not realising that the Toolserver still needed them.  Since the record of changes to the master no longer exists, it&#8217;s impossible for the slave to replicate.  To restart replication, we need to dump (export) a copy of the master database, and import it to the slave, replacing the existing copy of the s3 database.  This copy will be new enough that binlogs exist from the time of the dump until now, allowing replication to restart.  Unfortunately, this process takes some time to do, and the database is unavailable while it&#8217;s happening.  Shortly (hopefully, in less than a month), we will be adding a new database server, and when that happens, a dump/import is required anyway.  To avoid having to do this twice, we decided to wait until the new server is ready before re-importing s3.</p>
<p>From a user&#8217;s point of view, the most obvious effect of this will be that tools which query wikis on the s3 database will return outdated information until the problem is fixed.  We realise this is rather inconvenient; sorry.</p>
<p>This has happened a couple of times before, but hopefully won&#8217;t happen again; after this, I created a way for Wikimedia admins to easily see what binlogs the Toolserver needs, so they can avoid deleting them.</p>
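<p>To illustrate the underlying rule (a binlog may only be deleted once every slave has finished with it), here is a minimal Python sketch.  It is not the tool mentioned above; the function, the slave names and the binlog file names are hypothetical, and it assumes binlog names carry zero-padded sequence numbers so that sorting them gives creation order.</p>

```python
import bisect


def binlogs_safe_to_purge(master_binlogs, slave_current_binlog):
    """Return the master binlogs that every listed slave has finished reading.

    master_binlogs: binlog file names currently present on the master.
    slave_current_binlog: dict mapping a slave's name to the binlog file
    it is currently reading (hypothetical input, gathered however you like).
    """
    ordered = sorted(master_binlogs)
    if not slave_current_binlog:
        # No information about the slaves: be conservative, purge nothing.
        return []
    oldest_needed = min(slave_current_binlog.values())
    if oldest_needed not in ordered:
        # A slave needs a binlog that is already gone.  That slave can no
        # longer replicate and will need a fresh dump of the master.
        raise ValueError("binlog %s is needed but missing" % oldest_needed)
    # Everything strictly older than the oldest needed file is safe.
    return ordered[:bisect.bisect_left(ordered, oldest_needed)]


binlogs = ["binlog.000001", "binlog.000002", "binlog.000003", "binlog.000004"]
slaves = {"toolserver": "binlog.000003", "other-slave": "binlog.000004"}
print(binlogs_safe_to_purge(binlogs, slaves))
# ['binlog.000001', 'binlog.000002']
```

<p>The <tt>ValueError</tt> branch corresponds to the s3 situation described above: once a needed binlog has been deleted, the only way back is a new dump and import.</p>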
]]></content:encoded>
			<wfw:commentRss>http://journal.toolserver.org/entry/2009/01/11/wherein-mysql-replication-binlogs-and-the-problems-with-s3-are-explained/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
