Archive for June, 2009

Where is cassini?

After the hardware installation was finished yesterday, River began to set up the servers that will run Solaris, while I began to set up cassini, which will run Debian.

After I had been through TFTP network-installation hell again (it took me only two hours and a dozen reboots to get the installation started…), we discovered that the host doesn’t have hardware RAID. A quick investigation showed that the OSM squid ortelius (which is nearly identical to cassini) and cassini had been swapped during the hardware installation. Because cassini is supposed to run a small copy of the OSM database itself, using it without hardware RAID is not an option.

At the moment, we are unsure what the best way to handle the situation is, and how long it will take until cassini can be used.



New servers are there

Yesterday, Mark and Multichil were at our colo again and finished the hardware installation of our new servers.

The Toolserver cluster got two new boxes: daphe, which is going to replace zedler as the database host of the s2 cluster, and hyacinth, which is going to host our /home directory.

They also installed new servers for the OpenStreetMap/Wikimedia cooperation: an OpenStreetMap toolserver named cassini, on which people can play with OSM data (like our Toolserver for Wikimedia); a Squid proxy named ortelius; and a database host named ptolemy. The last two will handle the load the Wikimedia projects will create once they start using OpenStreetMap maps.

Finally, two terminal servers were installed to make it easier for the roots to deal with broken servers.

Thanks to Mark and Multichil!



Wherein msnbot behaves badly, and is banished

So, I’ve just blocked msnbot (which is, I assume, the search spider for Microsoft Live Search) from indexing the Toolserver. Many spiders, such as Google and Yahoo!, index our website every day, and cause no problems; in fact, we don’t even notice them. Msnbot is different. Specifically, it seems to have no rate limiting. Microsoft claim it will only request pages around once every 10 seconds; in reality, it was making 5-10 requests per second. Unfortunately, the page in question was a slow CGI script, and msnbot seemed to have obtained a list of every possible parameter it could pass to the script, which it then did, as fast as possible, until the web server was so overloaded it could hardly serve user requests:

wolfsbane     up   53+10:55,     1 user,   load 53.93, 55.49, 55.22

It doesn’t seem to have noticed that it’s blocked. It’s still hammering away as fast as it can, and getting nothing but 403 in reply. I’ve even added it to robots.txt, but it doesn’t seem to have noticed that either yet. Fortunately, our web server is quite fast at returning 403, so the load is looking much happier:

wolfsbane     up   53+11:58,     4 users,  load 0.68, 1.15, 3.44
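The post doesn’t say how the block was implemented; it may well have been a rule in the web server’s own configuration. As a minimal sketch of the idea only, assuming a Python WSGI application (the middleware name `block_agents`, the stand-in app `slow_cgi_app`, and the `BLOCKED_AGENTS` list are all hypothetical), matching the User-Agent header and answering with a short 403 before the expensive script runs looks like this:

```python
# Hypothetical sketch: refuse requests from listed crawlers with a cheap 403.
# The real Toolserver block may have been done in web server config instead.

BLOCKED_AGENTS = ("msnbot",)  # User-Agent substrings to refuse (case-insensitive)


def block_agents(app, blocked=BLOCKED_AGENTS):
    """Wrap a WSGI app so that matching user agents never reach it."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(b.lower() in agent for b in blocked):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]  # tiny response, no expensive work done
        return app(environ, start_response)
    return middleware


def slow_cgi_app(environ, start_response):
    # Stand-in for the slow CGI script; blocked agents never get here.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"expensive result\n"]


application = block_agents(slow_cgi_app)
```

Answering early is what keeps the load down: a 403 with a tiny body costs almost nothing compared with running the slow script once for every parameter combination the spider tries.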

After I blocked it, I tried to find a contact at Microsoft to report the problem to. As the spider clearly isn’t behaving as they expect, I thought they might appreciate a warning. Well, I can now report that Live Search really don’t want to be contacted. The closest thing I could find to a contact form, linked from the “troubleshooting problems with msnbot” page, had a list of categories for me to choose from. None of them was even slightly related to search. Some Googling suggested that “msnbot@microsoft.com” might work, but nope (“Returned mail: user unknown”). There’s a feedback link on the MSN front page, but who knows where that would go, and whether the feedback would ever reach someone who could deal with it? (Certainly not me, as they clearly state that they won’t reply to your feedback.)

I gave up in the end. If someone reading this happens to have a contact at Microsoft who would be interested in this issue, please feel free to let them know. Otherwise, I imagine Live Search users will just have to live without the Toolserver.

PS: I know msnbot (supposedly) supports the Crawl-Delay parameter in robots.txt. But given what I’ve seen today, I don’t particularly want to rely on this, even if it does, some day, reload our robots.txt.
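For comparison, this is roughly how a crawler that does honour robots.txt would read a `Disallow` or `Crawl-delay` rule, using Python’s standard `urllib.robotparser` module (`crawl_delay` needs Python 3.6 or later). The robots.txt contents, the example URL, and the helper `polite_plan` are hypothetical; the post doesn’t reproduce the Toolserver’s actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: msnbot is shut out entirely, while every other
# crawler may fetch anything but is asked to wait 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow:
"""


def polite_plan(agent, url, robots_lines):
    """Return (allowed, delay_seconds) as a compliant crawler would see them."""
    parser = RobotFileParser()
    parser.parse(robots_lines)            # parse lines directly, no network fetch
    allowed = parser.can_fetch(agent, url)
    delay = parser.crawl_delay(agent)     # None when no Crawl-delay applies
    return allowed, delay


lines = ROBOTS_TXT.splitlines()
url = "http://example.org/cgi-bin/slow-script?page=1"  # hypothetical URL
print(polite_plan("msnbot/2.0b", url, lines))    # → (False, None)
print(polite_plan("Googlebot/2.1", url, lines))  # → (True, 10)
```

All of this is only advisory: the file does nothing unless the crawler actually re-reads it and obeys it, which is exactly the behaviour the PS doubts.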
