Team-stats and Berkeley load


Questions and Answers : Web site : Team-stats and Berkeley load

Profile S@NL - Eesger - www.knoop.nl
Joined: 7 Oct 01
Posts: 384
Credit: 37,139,952
RAC: 14,067
Netherlands
Message 872787 - Posted: 6 Mar 2009, 10:10:52 UTC

I have started to (re)build the SETI-stats of our team Seti@Netherlands to "their former glory".

The first step was to rebuild the "statsengine" that reads the Berkeley data. The software we used before didn't work very well and needed a complete rebuild. That part is now finished and works very well. (I paid special attention to not downloading the GZ files unnecessarily.)

I have also updated our forum sig stats update tool, and while doing so I found this handy URL:

http://setiathome.berkeley.edu/userw.php?id=3339763

I am thinking of using this URL / information to create more accurate and more recent stats for our team. But what is acceptable usage?
This is what we have in mind:
A S@NL member needs to have a RAC of more than 50, or will be passed by (position change). In our team that will amount to about 500 accounts.

Now, I don't want to strain the resources of the Berkeley network, so "userw.php" seems ideal: I have read that this PHP code (even the entire system) caches the SQL output (by the hour, I believe).

Would it be a problem if I requested this information (for those approx. 500 users) every hour from 7:30 to 24:00 GMT+1?
____________
The SETI@Home Gauntlet 2012 april 16 - 30| info / chat | STATS

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24681
Credit: 522,659
RAC: 40
United States
Message 873605 - Posted: 8 Mar 2009, 2:40:06 UTC - in response to Message 872787.

Don't. Screen scraping is hard on the bandwidth of the project and can get the IP used to do so banished from the site.

Use the http://setiathome.berkeley.edu/stats/ directory instead, and please check the time stamp before downloading.
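
A minimal sketch of "check the time stamp before downloading", assuming the server sends a standard HTTP Last-Modified header for files in the stats directory (not confirmed in this thread). The helper names and the local path handling are hypothetical; only the directory URL comes from the post above.

```python
import os
import urllib.request
from email.utils import parsedate_to_datetime

STATS_URL = "http://setiathome.berkeley.edu/stats/"  # directory named in the post


def is_newer(last_modified_header: str, local_mtime: float) -> bool:
    """True if the server's Last-Modified time is later than the local copy's mtime."""
    remote = parsedate_to_datetime(last_modified_header).timestamp()
    return remote > local_mtime


def fetch_if_changed(file_name: str, local_path: str) -> bool:
    """Send a HEAD request first; download only when the remote file is newer.

    Returns True if a download happened. Assumes the server answers HEAD
    requests and reports Last-Modified (an assumption, see above).
    """
    url = STATS_URL + file_name
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head, timeout=30) as resp:
        last_modified = resp.headers.get("Last-Modified")
    local_mtime = os.path.getmtime(local_path) if os.path.exists(local_path) else 0.0
    if last_modified is not None and not is_newer(last_modified, local_mtime):
        return False  # unchanged: skip the large download
    urllib.request.urlretrieve(url, local_path)
    return True
```

Nothing is fetched until `fetch_if_changed` is called, so the HEAD request is the only regular traffic while the export is unchanged.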
____________


BOINC WIKI

Profile S@NL - Eesger - www.knoop.nl
Joined: 7 Oct 01
Posts: 384
Credit: 37,139,952
RAC: 14,067
Netherlands
Message 873908 - Posted: 8 Mar 2009, 23:34:57 UTC - in response to Message 873605.


Use the http://setiathome.berkeley.edu/stats/ directory instead, and please check the time stamp before downloading.

I already said:
(I paid special attention to not downloading the GZ files unnecessarily.)

I am a programmer by profession; I am not the new kid on the block.


Don't. Screen scraping is hard on the bandwidth of the project and can get the IP used to do so banished from the site.

I found nothing about excessive usage and banishing. I did find this:
http://boinc.berkeley.edu/trac/ticket/268

This mentions 5 to 10 requests per second!? I am talking about 1 request of approx. 400 bytes per 7 seconds!?
As for bandwidth: I don't even download hosts.gz, which gives me "bandwidth credit" for approx. 1,000,000,000 userw.php files a day..
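
As a rough check of the request rate (a back-of-the-envelope sketch using only the figures from this thread: about 500 accounts, one pass per hour, about 400 bytes per reply, polling from 7:30 to 24:00). Treating that window as 16.5 hours of continuous polling is an assumption.

```python
USERS = 500              # approx. number of accounts, from the first post
WINDOW_SECONDS = 3600    # one pass over all users per hour
BYTES_PER_REPLY = 400    # approx. size of one userw.php reply
HOURS_PER_DAY = 16.5     # 7:30 to 24:00, assumed to be polled continuously

seconds_between_requests = WINDOW_SECONDS / USERS
requests_per_second = USERS / WINDOW_SECONDS
bytes_per_day = USERS * BYTES_PER_REPLY * HOURS_PER_DAY

print(f"one request every {seconds_between_requests:.1f} s")
print(f"{requests_per_second:.3f} requests per second")
print(f"about {bytes_per_day / 1e6:.1f} MB per day")
```

That comes to one request every 7.2 seconds, which is where the "per 7 seconds" figure comes from.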

But stop.. I don't want to start out on the wrong foot here.. I could just have gone ahead and done it, and from what I have read (like in that ticket) I didn't think I was doing anything weird. I thought I was being nice by asking what is accepted. I didn't expect to get a slap on the wrist for a decent question.

Aurora Borealis
Volunteer tester
Joined: 14 Jan 01
Posts: 2981
Credit: 5,087,448
RAC: 2,312
Canada
Message 873919 - Posted: 8 Mar 2009, 23:52:16 UTC

You may want to contact mattl at ssl dot berkeley dot edu with your intentions and ask if this will cause any problems just to be on the safe side.
He's the one that keeps an eye on the system for bots and such, and would be the one to take action.

Profile S@NL - Eesger - www.knoop.nl
Joined: 7 Oct 01
Posts: 384
Credit: 37,139,952
RAC: 14,067
Netherlands
Message 883039 - Posted: 7 Apr 2009, 11:47:12 UTC

Oh.. I almost forgot:

Thanks Aurora! I wrote to Matt and got the go-ahead :D (some time ago, before "thumper" ;) )

One slight worry though:

....and if there are problems, you'll know.

I asked whether the "knowing part" would come in the form of an email, a PM or an IP block.. I do hope it won't be the latter ;)

For now, my team has already proudly announced the new stats system, in Dutch, as are the stats themselves (maybe, with the idea below, it will be nice to make them available in English as well :) )

There are still some issues that need to be taken care of before I start on this, but if requested I will be more than pleased to keep you informed of my progress.. (here?)

Best regards,

Eesger

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24681
Credit: 522,659
RAC: 40
United States
Message 883042 - Posted: 7 Apr 2009, 12:07:12 UTC - in response to Message 873908.


Sorry I was a little abrupt. However, I do know that a screen scraper had their IP banished from the site yesterday. You do not get an email, you just suddenly can't get to the site anymore.

Profile Ageless
Joined: 9 Jun 99
Posts: 12324
Credit: 2,626,514
RAC: 994
Netherlands
Message 883043 - Posted: 7 Apr 2009, 12:11:47 UTC - in response to Message 883039.

For now, my team has already proudly announced the new stats system, in Dutch, as are the stats themselves (maybe, with the idea below, it will be nice to make them available in English as well :) )

Peculiar Dutch over there and equally peculiar English here. If you need help with translation, just shout. Even though I am not a member of S@NL and never will be, the offer stands anyway. :-)
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

Profile S@NL - Eesger - www.knoop.nl
Joined: 7 Oct 01
Posts: 384
Credit: 37,139,952
RAC: 14,067
Netherlands
Message 883051 - Posted: 7 Apr 2009, 13:28:46 UTC - in response to Message 883042.

Sorry I was a little abrupt. However, I do know that a screen scraper had their IP banished from the site yesterday. You do not get an email, you just suddenly can't get to the site anymore.

Apology accepted, of course. I wanted to PM you.. but didn't know how to start... ;) We started out on the wrong foot. I really do understand the risk of the idea I am working out.. This "system" can/will be tunable to your wishes (as to what is acceptable), since I believe in "asking before doing"... ;) (you understand.. ;)

OK.. then may I ask a last "worst case scenario" question?
What if, even with all my best intentions, software coding etc., our server does get blocked(*)?
Obviously I'll find out practically immediately (assuming I have "the system" up and running).
I hereby promise to build a "dead man's switch" into this part of the stats code and to tell "the S@NL board" how to use it (just imagine I am on holiday when it happens..).
Can I then ask you (John McLeod VII) or mail Matt to release the IP number? Otherwise we would have to start pulling stunts to keep the daily stats update going....

(*) There is always the risk of introducing a disastrous software bug, resulting in...
I log practically everything, and check the logs regularly to see whether everything goes as planned..
But a (complex) piece of software can be a bit like life sometimes.. you never know 100% for sure.. ;)
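
One way such a "dead man's switch" could look (only a sketch, not the actual S@NL code): polling stops as soon as a stop file exists, and the code creates that file itself after a few consecutive failed requests. The class name, the stop-file idea and the threshold of 3 are all hypothetical.

```python
import os
import tempfile


class KillSwitch:
    """Blocks polling when a stop file exists or after repeated failures."""

    def __init__(self, stop_file: str, max_failures: int = 3):
        self.stop_file = stop_file
        self.max_failures = max_failures
        self.failures = 0

    def record(self, success: bool) -> None:
        """Call after every request; trips the switch on repeated failures."""
        self.failures = 0 if success else self.failures + 1
        if self.failures >= self.max_failures:
            # Persist the trip so a restart does not silently resume polling.
            with open(self.stop_file, "w") as fh:
                fh.write("tripped after repeated failures\n")

    def may_poll(self) -> bool:
        """Check before every request; anyone can create the file by hand."""
        return not os.path.exists(self.stop_file)


# Small demonstration in a temporary directory.
demo_dir = tempfile.mkdtemp()
switch = KillSwitch(os.path.join(demo_dir, "STOP_USERW_POLLING"), max_failures=2)
switch.record(False)
print(switch.may_poll())   # still allowed after one failure
switch.record(False)
print(switch.may_poll())   # blocked after two failures in a row
```

Someone without coding knowledge can stop the polling by creating the file, and re-enable it by deleting the file once the cause is understood.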


John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24681
Credit: 522,659
RAC: 40
United States
Message 883053 - Posted: 7 Apr 2009, 13:39:15 UTC - in response to Message 883051.

Matt and Eric will respond if you send a nice email about a problem. There have been stats sites with coding errors that were blocked, and then unblocked after some negotiation as to the fix.

Most stats sites check the timestamp on the XML a few times a day and, when the timestamp changes, download the XML. I don't know of any that follow this plan and got their IP blocked.

Most of the problem with screen scraping is actually the database access, and for S@H the DB is always a bottleneck.

Profile S@NL - Eesger - www.knoop.nl
Joined: 7 Oct 01
Posts: 384
Credit: 37,139,952
RAC: 14,067
Netherlands
Message 883054 - Posted: 7 Apr 2009, 13:42:53 UTC - in response to Message 883043.

Peculiar Dutch over there and equally peculiar English here. If you need help with translation, just shout. Even though I am not a member of S@NL and never will be, the offer stands anyway. :-)

Peculiar.. I just reread my last post, and indeed.. there are a few 'funny' typos in it. That matters less here, but should you find 'annoying' typos on the S@NL site, 'don't be a stranger': reporting them on the forum is easiest (for me ;)), but a PM works too, of course ;)

Profile S@NL - Eesger - www.knoop.nl
Joined: 7 Oct 01
Posts: 384
Credit: 37,139,952
RAC: 14,067
Netherlands
Message 883064 - Posted: 7 Apr 2009, 14:06:00 UTC - in response to Message 883053.
Last modified: 7 Apr 2009, 14:08:13 UTC

.....
OK.. then may I ask a last "worst case scenario" question?
What if, even with all my best intentions, software coding etc, our server does get blocked(*).
.......

Matt and Eric will respond if you send a nice email about a problem. There have been stats sites with coding errors that were blocked, and then unblocked after some negotiation as to the fix.

Most stats sites check the timestamp on the XML a few times a day and, when the timestamp changes, download the XML. I don't know of any that follow this plan and got their IP blocked.


I download tables.xml (usually around 650 bytes) every half hour to check the 'update_time' (and I plan to use 'nusers_total' and 'nteams_total' as check values; not yet implemented).
I could have done a "get url" on the stats directory, but that would be about 1,600 bytes.. and XML is easier and safer to read..
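
A sketch of that check. The three element names are the ones mentioned above; the flat layout under a `tables` root and the sample values are assumptions for illustration, not copied from the real file.

```python
import xml.etree.ElementTree as ET

# Made-up sample; the real tables.xml may contain more elements.
SAMPLE = """<tables>
    <update_time>1239112800</update_time>
    <nusers_total>1000</nusers_total>
    <nteams_total>50</nteams_total>
</tables>"""

CHECK_FIELDS = ("update_time", "nusers_total", "nteams_total")


def read_check_values(xml_text: str) -> dict:
    """Extract the timestamp and the two totals as integers."""
    root = ET.fromstring(xml_text)
    return {name: int(root.findtext(name)) for name in CHECK_FIELDS}


def export_changed(xml_text: str, last_seen_update_time: int) -> bool:
    """True when the export is newer than the one processed last time."""
    return read_check_values(xml_text)["update_time"] > last_seen_update_time


print(read_check_values(SAMPLE))
print(export_changed(SAMPLE, last_seen_update_time=0))
```

The totals can later be compared with the number of records actually parsed from the big files, as a sanity check on a download.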

Most of the problem with screen scraping is actually the DataBase access, and for S@H the DB is always a bottleneck.

And that is indeed exactly what I am going to be using.. As I have mentioned, I'll spread the requests nicely over the hour and cap the system at 1 request every 5 seconds (and I promise to visit the Berkeley stats pages less, which will save you at least 100 SQL requests a day :p.. and to promote our stats rigorously among our members.. who knows, maybe this system will even lower the load!)
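
Roughly, the spreading could be done like this (a sketch only; the function name and defaults are hypothetical, the 5-second cap and the one-hour window are the figures from this post):

```python
def request_offsets(n_users: int, window_s: float = 3600.0,
                    min_gap_s: float = 5.0) -> list[float]:
    """Start times, in seconds from the top of the hour, for n_users requests.

    Requests are spread evenly over the window, but never closer together
    than min_gap_s. If the cap applies, the pass takes longer than the window.
    """
    if n_users <= 0:
        return []
    gap = max(window_s / n_users, min_gap_s)
    return [i * gap for i in range(n_users)]


offsets = request_offsets(500)
print(offsets[1] - offsets[0])   # gap between two requests for 500 users
print(request_offsets(1000)[1])  # with 1000 users the 5-second cap applies
```

With 500 users the gap is 7.2 seconds; the cap only starts to matter above 720 users per hour.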

But (more) seriously, I'll go ahead and work this out. If it does become a problem, I'll / we'll find out and nicely ask to have the IP released (promising to shut down this part of the system...). The maximum loss is the time I will have put into the software.. but hey, it's a hobby ;) And I honestly believe that with a (quite) active team like ours these "super up-to-date stats" may even lower your load "on balance" and shift it to our server..

We/you'll see ;)

Profile S@NL - Eesger - www.knoop.nl
Joined: 7 Oct 01
Posts: 384
Credit: 37,139,952
RAC: 14,067
Netherlands
Message 895685 - Posted: 16 May 2009, 23:04:09 UTC

OK,

I have "the system" up and running after two months of "overtime" ;)

I've got it using our/my "SNL-RAC" (which drops to zero faster than the standard RAC), so it now queries userw.php only once per 10 seconds, using fewer than 500 queries over two hours. Against a total of approx. 400 queries per second, that is approx. 0.017% of the load..
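
The percentage can be reproduced from the two figures above (fewer than 500 queries per two hours, against approx. 400 queries per second in total); a quick sketch:

```python
MY_QUERIES = 500                 # upper bound from the post
MY_WINDOW_SECONDS = 2 * 3600     # two hours
SITE_QUERIES_PER_SECOND = 400    # approx. total, as stated in the post

my_rate = MY_QUERIES / MY_WINDOW_SECONDS
share_percent = 100 * my_rate / SITE_QUERIES_PER_SECOND

print(f"{my_rate:.4f} queries/s, {share_percent:.3f}% of the total")
```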

That low a percentage is, I think, quite unnoticeable in the total load :D

(And our team now has very nice up-to-date stats :D)

Next step.. getting to the graphics ;) (the plan is to get this up and running by the end of this year ;)


Copyright © 2014 University of California