Playing Fair (Jan 15 2008)


Message boards : Technical News : Playing Fair (Jan 15 2008)

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 700293 - Posted: 16 Jan 2008, 0:37:05 UTC

Yeah... we're really pushing the boundaries of our mysql database these days. I'm finally catching up on several years of backlogged archives and inserting zillions of rows into credited_job, and this, on top of general increased usage, is gumming up the works. In fact, optimizing this table alone during today's outage took three hours (normally only a few minutes) - which explains the extreme length of today's downtime. I guess we'll have to turn off credited_job optimization until we actually use the table.

This brings up several questions, the first of which was asked in a previous thread: Why are you guys using mysql instead of a more robust commercial product? Two main reasons: BOINC projects generally are small academic ventures with limited funds, and BOINC is an open-source project itself utilizing other open-source pieces of software. So all you need is a relatively cheap linux box which comes with php, apache, mysql, etc. and it's pretty much plug and play. Remember the project specific data, i.e. the science database, can be whatever you want. In our case, it's Informix. Why Informix? We got it for free 10 years ago - we now have 10 years of experience using it as a group and it is still free to us. Would we consider changing to Oracle/SQL Server/etc.? If somebody wants to buy such a license and donate a man-year to change all our back end software to do so, then we would perhaps entertain the thought, but we have higher priorities, especially as Informix works perfectly well at this point. It's the BOINC/mysql part that needs help, and we're sticking with it for reasons stated above, and with SETI@home being the flagship project of BOINC we don't want to diverge from the standard.

In other news, it seems that every day there's a different reason our web sites are so darn slow. Yesterday afternoon we were getting hit by some seemingly nefarious activity which I was able to block quite easily once I discovered it. But we were also getting hit by some scraping of stats pages via a robot (called BoincBot) that was not obeying robots.txt. I blocked these hits as well. We don't allow such activity on our web sites. If you want BOINC stats you can download the daily xml dumps just like everybody else.
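
For anyone writing a stats bot, here is a minimal sketch of what "obeying robots.txt" can look like, using Python's standard urllib.robotparser. The robots.txt rules, the /stats/ path and the example.org URLs below are made up for illustration; they are not the actual SETI@home rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this sketch; the real
# file served by the project is not shown in this thread).
ROBOTS_TXT = """\
User-agent: *
Disallow: /stats/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow this user agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
# A well-behaved bot asks before every request and skips disallowed URLs.
print(may_fetch(parser, "BoincBot", "http://example.org/stats/top_users.php"))
print(may_fetch(parser, "BoincBot", "http://example.org/index.php"))
```

With these assumed rules the first call prints False and the second prints True, so the bot would skip the stats page and leave it to the daily xml dumps.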

On the bright side, we obtained another server donation yesterday from a private party: a 1U dual-opteron (2.4GHz) server with 16GB memory. I installed FC8 on it just now, though there was a little bit of tweaking to get that to go. There's no DVD drive in the thing (only a CD drive) and for some reason there was some disconnect with the 3ware disk controller such that the linux installer couldn't see the two root drives. I ultimately took that out of the equation and plugged the drives straight into the SATA ports on the motherboard. All's well and it's getting all yummed up now.

So we're looking for a KVM-over-IP, at least 16 ports (24 preferable), easy-to-use but secure connections via a web browser, etc. Any thoughts? The Belkin Omniview seems the cheapest/easiest, but only allows one person to connect to the whole unit at a time - not a showstopper. Any suggestions, experience with such devices, etc. out there?

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Profile Neil Blaikie
Volunteer tester
Joined: 17 May 99
Posts: 142
Credit: 6,466,200
RAC: 6
Canada
Message 700315 - Posted: 16 Jan 2008, 1:27:59 UTC

The Belkin is an excellent choice, though as you mentioned it is not the best with only one user connection at a time.

What is good though is that you can control Power Distribution units (if you have them?) allowing you to perform remote hard reboots.
Given the constraints that you unfortunately have to work under I would say the Belkin will be more than adequate for your needs.

Another one to possibly look at as well is the Startech Server Remote Control External KVM control over IP. I personally haven't used this but have heard good things about it.

Without hesitation I would recommend the Raritan (Dominion KX432), but the downside for you is money: at $7100 US for 32 ports it is too steep compared to the Belkin.

Hope this small and brief info helps a bit; I am sure more people will come up with opinions soon.

Keep up the good work guys



____________

Profile Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 15993
Credit: 690,597
RAC: 6
United States
Message 700389 - Posted: 16 Jan 2008, 5:34:58 UTC


Great Post Matt - and excellent News to hear about the Server Donation . . .

Nice going Berkeley all around . . . and Thanks to Each of You


____________
BOINC Wiki . . .

Science Status Page . . .

Profile Uioped1
Volunteer tester
Joined: 17 Sep 03
Posts: 50
Credit: 1,179,926
RAC: 2
United States
Message 700393 - Posted: 16 Jan 2008, 5:48:32 UTC - in response to Message 700293.

It's the BOINC/mysql part that needs help...


You have mentioned numerous times that the stats queries in particular need work.

I've looked through the seti svn repository via the web interface and can't seem to locate those queries. Have the modifications that have been made to the boinc sample stats export been checked in to a publicly accessible repository somewhere?

Thanks.

Profile adrianxw
Joined: 14 Jul 99
Posts: 155
Credit: 155,425
RAC: 0
Denmark
Message 700828 - Posted: 17 Jan 2008, 16:43:09 UTC

Hello Matt,

In other news, it seems that every day there's a different reason our web sites are so darn slow. Yesterday afternoon we were getting hit by some seemingly nefarious activity which I was able to block quite easily once I discovered it. But we were also getting hit by some scraping of stats pages via a robot (called BoincBot) that was not obeying robots.txt. I blocked these hits as well. We don't allow such activity on our web sites. If you want BOINC stats you can download the daily xml dumps just like everybody else.

This caught my eye as our team's SETI stats have not been updating for a couple of days (bold added by me).

I can manually access the RPC URL, but when my automaton/bot, (which is called BOINCBot - like many others I expect), tries to do the same thing, it gets 403'd.

The implication...
scraping of stats pages

... there, is that I am screen scraping. That is not the case. I am using the web RPCs exactly as documented in the BOINC Software Development pages.

Can you shed some light on this?
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 700878 - Posted: 17 Jan 2008, 20:57:03 UTC - in response to Message 700828.

I am using the web RPCs exactly as documented in the BOINC Software Development pages.

Can you shed some light on this?


Fair enough. I don't know what documentation you're talking about. Please point the pages out to me. Basically, the activity I saw looked just like web scraping. I asked Dave if this was kosher and he said no. I blocked it. That's basically the whole story. Maybe we have misleading text somewhere.

- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Profile adrianxw
Joined: 14 Jul 99
Posts: 155
Credit: 155,425
RAC: 0
Denmark
Message 701067 - Posted: 18 Jan 2008, 6:57:08 UTC
Last modified: 18 Jan 2008, 7:19:34 UTC

I don't know how much detail you want me to go into. Suffice to say, if you start here you will find the calling criteria for the various RPCs. The one I use most is project/team_email_list.php?teamid=X&account_key=Y&xml=1. I don't need the additional info provided when I specify the account_key, so I omit it; this reduces the amount of XML returned. I have been looking at using team_lookup.php as well, but have never had a reply to my query in this thread.
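
To make the call shape concrete, here is a rough sketch in Python of building that RPC URL and reading a reply. The script name and the teamid, account_key and xml parameters come from the URL above; the example.org project URL, the team id and the XML element names in the sample reply are assumptions for illustration only, so check the real reply format before relying on them. Nothing is fetched over the network here.

```python
from typing import Dict, Optional
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def team_email_list_url(project_url: str, team_id: int,
                        account_key: Optional[str] = None) -> str:
    """Build the team_email_list.php RPC URL; account_key is optional."""
    params = {"teamid": team_id}
    if account_key is not None:
        params["account_key"] = account_key
    params["xml"] = 1
    return f"{project_url.rstrip('/')}/team_email_list.php?{urlencode(params)}"


# Hypothetical reply: these element names are assumed for this sketch and
# are not taken from the BOINC documentation.
SAMPLE_REPLY = """<users>
  <user><id>1</id><name>alice</name><total_credit>1200.5</total_credit></user>
  <user><id>2</id><name>bob</name><total_credit>300</total_credit></user>
</users>"""


def total_credit_by_name(xml_text: str) -> Dict[str, float]:
    """Map each user's name to total credit, given the assumed layout."""
    root = ET.fromstring(xml_text)
    return {user.findtext("name"): float(user.findtext("total_credit"))
            for user in root.iter("user")}


print(team_email_list_url("http://example.org/project/", 42))
print(total_credit_by_name(SAMPLE_REPLY))
```

Leaving out account_key, as described above, just drops that parameter from the query string.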

I am still using the original LWP::UserAgent in the bot; if you would prefer I use the somewhat newer curl:: primitives, that is quite easy to do.

I cannot, of course, guarantee that every bot out there called BOINCBot is doing it that way. The original code I inherited used the same bot to screen scrape the user pages. I believe the same software was being used by other teams. I migrated our stats system away from the screen scraper because it was unreliable, and went the RPC route after advice I received here and at Einstein. Those software mods were done about a year ago, however.

I can, of course, change the name of the Bot if that is what is required. I am a software developer, and like many of my kind, when I need a name for something my thought process is...

1. The software requires I name the bot
2. the bot is for BOINC
3. the name is unimportant to the rest of my system
4. BOINCBot

If you want more detail, let me know.
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Conrad Human
Volunteer tester
Joined: 17 Nov 00
Posts: 67
Credit: 2,009,224
RAC: 0
South Africa
Message 701096 - Posted: 18 Jan 2008, 9:47:36 UTC - in response to Message 701067.

Is it possible to rename your bot to something unique?
like BoincBot46681 ?

Maybe it is not even your bot that is doing this.



BOINCBot


____________

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38336
Credit: 561,630,103
RAC: 645,733
United States
Message 701101 - Posted: 18 Jan 2008, 9:58:18 UTC - in response to Message 701096.
Last modified: 19 Jan 2008, 1:01:38 UTC

Is it possible to rename your bot to something unique?
like BoincBot46681 ?

Maybe it is not even your bot that is doing this.



BOINCBot


I don't think the name of the stats acquiring script has anything to do with it. I believe that adrianxw acknowledges that his program is what is in question, as he is no longer able to get the stats he is looking for.
The discussion is about how the script acquires the information, whether it is within the scope of what the Boinc rules advise and allow, and adrianxw believes he is doing so within those rules.
The name of the script (bot, whatever) could be anything including 'monster screenscraping get yer stats bot', but he seems to be sincere in thinking that he is working within the rules and doesn't know why he has been locked out.
I think that further discussion here between him and Matt will work things out.
Matt is just trying to shunt away excessive loads on the servers, and adrianxw is just trying to get the stats that he wants.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Profile adrianxw
Joined: 14 Jul 99
Posts: 155
Credit: 155,425
RAC: 0
Denmark
Message 701110 - Posted: 18 Jan 2008, 10:36:13 UTC
Last modified: 18 Jan 2008, 10:40:20 UTC

Is it possible to rename your bot to something unique?

Easily, but basically, "what he has rather eloquently said"!
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Profile adrianxw
Joined: 14 Jul 99
Posts: 155
Credit: 155,425
RAC: 0
Denmark
Message 701178 - Posted: 18 Jan 2008, 18:37:21 UTC

Our stats have started updating again. So either, thanks for that, or, (if you haven't done anything), sorry for bothering you and a big oops!
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 701285 - Posted: 18 Jan 2008, 22:04:06 UTC

First off, renaming BoincBot is beside the point. The name isn't the issue. In fact, renaming it is tantamount to hacking as I'm blocking it by name.
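
For readers wondering what "blocking by name" can mean in practice, here is a rough sketch. It assumes the name is matched against the User-Agent request header and that blocked requests get a 403 (the status adrianxw reported); the function name and the matching rule are illustrative guesses, not the actual server configuration.

```python
# Assumed block list for this sketch: case-insensitive substrings of the
# User-Agent header. The real rule used on the servers is not shown here.
BLOCKED_AGENT_SUBSTRINGS = ("boincbot",)


def status_for_request(user_agent: str) -> int:
    """Return the HTTP status a name-based filter would give this client."""
    agent = user_agent.lower()
    if any(blocked in agent for blocked in BLOCKED_AGENT_SUBSTRINGS):
        return 403  # Forbidden: the client name is on the block list
    return 200  # The filter only looks at the name, not at the behaviour


print(status_for_request("BOINCBot/1.0"))  # 403
print(status_for_request("Mozilla/5.0"))   # 200
```

A filter like this only sees the name a client chooses to send, which is why the name itself isn't the real issue.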

That said, I just unblocked it. I guess I didn't realize we had so many of these mechanisms available and documented, so I don't want to be "false advertising." Though I think we need to address this in the BOINC community.

As well, I'm finding most of our problems are not with BoincBots as much as actual real DOS activity from elsewhere. So as you were.

- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Brian Silvers
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 701357 - Posted: 19 Jan 2008, 1:50:06 UTC - in response to Message 701285.

First off, renaming BoincBot is beside the point. The name isn't the issue. In fact, renaming it is tantamount to hacking as I'm blocking it by name.


Oh how COULD you go there? 50,000 lashes with the proverbial wet noodle!

Conrad Human
Volunteer tester
Send message
Joined: 17 Nov 00
Posts: 67
Credit: 2,009,224
RAC: 0
South Africa
Message 702052 - Posted: 20 Jan 2008, 20:09:22 UTC - in response to Message 701285.

Sorry.
I did not know you were blocking these bots by name, and bypassing your ban of BoincBot was not my intention.
My intention was to get a unique name for his bot so it would have been easier to link it to his userid.
Conrad

First off, renaming BoincBot is beside the point. The name isn't the issue. In fact, renaming it is tantamount to hacking as I'm blocking it by name.


____________


Copyright © 2014 University of California