A lot of Toolserver users run a MediaWiki bot framework called pywikipediabot to perform maintenance tasks on Wikipedia, such as fixing spelling errors or double redirects, and adding interwiki links to articles.
Unfortunately, some of these bots are rather excessive in their memory use. In the past, it was common to see a few interwiki bots using 500MB or more each. Recently, support for caching to disk (instead of using RAM) was added to pywikipediabot. Hopefully, this will help reduce the memory use of these bots. Next on the list: a Java program called “Linky”, which is apparently some kind of IRC bot. There are 9 copies of Linky running on the Toolserver, using 100MB each. That’s 900MB, or nearly ⅛ of the system’s total memory, dedicated to just that.
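To illustrate the general idea of caching to disk rather than RAM, here is a minimal sketch using Python's standard shelve module. This is not pywikipediabot's actual implementation, which isn't described here; the function names cache_pages and lookup and the (title, text) page format are made up for the example.

```python
import shelve


def cache_pages(path, pages):
    """Write (title, text) pairs to a disk-backed store at `path`.

    Returns the number of entries in the store afterwards. Only the
    entry currently being written needs to be held in memory.
    """
    with shelve.open(path) as db:
        for title, text in pages:
            db[title] = text
        return len(db)


def lookup(path, title):
    """Read one page's text back from disk, or None if it is not cached."""
    with shelve.open(path) as db:
        return db.get(title)
```

The trade-off is the usual one: lookups hit the disk and are slower than a dictionary in RAM, but the process no longer grows with the number of pages it has seen.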
The problem with running a shared server is that users often don’t realise how many resources they’re using. If you’re running a bot at home, using 100MB RAM is probably fine. But when you have to share available resources with other people, you need to start looking more closely at your resource use, and perhaps spend some time reducing it. (If you think about it for a moment, 100MB just for an IRC bot is a little excessive, isn’t it?)
The main technical restriction on memory use on the Toolserver is a homemade daemon called slayerd, which starts killing users’ processes when they use more than the limit (currently set at 1GB). This is mainly intended to stop runaway processes rather than to enforce policy; usually an administrator will start killing processes long before 1GB, if they’re noticed. The only way to really fix the problem is to educate users about it. Of course, the difficulty then is treading the line between gentle reminders and whining ;-)
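For the curious, the kind of decision a daemon like slayerd has to make can be sketched in a few lines of Python. This is not slayerd's code. It assumes the limit applies to each user's total resident memory and that the largest processes are killed first; both are guesses for the sake of illustration, and the names Proc and pick_victims are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple

LIMIT_BYTES = 1024 ** 3  # 1GB, the limit quoted above


class Proc(NamedTuple):
    pid: int
    user: str
    rss: int  # resident memory in bytes


def pick_victims(procs, limit=LIMIT_BYTES):
    """Return the PIDs to kill so each user's total RSS is at or below `limit`.

    Assumption: the limit is per user, and the biggest processes go first.
    """
    by_user = defaultdict(list)
    for p in procs:
        by_user[p.user].append(p)

    victims = []
    for plist in by_user.values():
        total = sum(p.rss for p in plist)
        # Walk this user's processes from largest to smallest,
        # stopping as soon as the user is back under the limit.
        for p in sorted(plist, key=lambda q: q.rss, reverse=True):
            if total <= limit:
                break
            victims.append(p.pid)
            total -= p.rss
    return victims
```

A real daemon would also have to collect the process list (for example from /proc on Linux) and actually send the signals (os.kill in Python); the sketch only covers the policy. Note that under these assumptions nine 100MB processes owned by one user add up to 900MB and would not be touched, which matches the point that the limit is a backstop against runaway processes, not a policy tool.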
#1 by Darkoneko at June 2nd, 2008
God, 100MB for an IRC bot… ? I know Java tends to be a memory whore, but still, that’s impressive…
#2 by Tim Landscheidt at June 3rd, 2008
BTW, what is the purpose of running multiple instances of Interwiki bots (with the same code base)? Should not one bot have the same effect?
#3 by Darkoneko at June 3rd, 2008
One is the time a robot takes to make a full loop of the article list (especially on the biggest wikis)
The other is the fact that they are launched from several different Wikipedias (and since their analysis starts from the origin wiki, the article list is different). If, for example, someone starts an article [[aaa]] on frwiki and puts interwikis on it but does not update the other wikis’ interwiki lists (a common case), that page will only be seen by a bot that has the French Wikipedia as its origin.
sorry if I’m not too clear :)
#4 by Tim Landscheidt at June 5th, 2008
I’m still not convinced :-). A thoughtfully engineered single bot should consume considerably fewer resources than a bunch of unsynchronized ones. But as long as RAM and CPU speed are cheap … :-)
#5 by Darkoneko at June 5th, 2008
Well, I never said they were not useful :)
I don’t think we need more than 2 bots starting from each major wiki, but there’s no organisation.
#6 by Platonides at June 10th, 2008
I have a linking bot working with less than 2MB (and still most of it is probably the runtime). It’s true that it’s not in Java but… How did they code it so badly to need 100MB??