Team-stats and Berkeley load |
Questions and Answers : Web site : Team-stats and Berkeley load
| Author | Message |
|---|---|
|
I have started to (re)build the SETI-stats of our team Seti@Netherlands to "their former glory". | |
| ID: 872787 · | |
I have started to (re)build the SETI-stats of our team Seti@Netherlands to "their former glory". Don't. Screen scraping is hard on the project's bandwidth and can get the IP address used for it banished from the site. Use the http://setiathome.berkeley.edu/stats/ directory instead, and please check the time stamp before downloading. ____________ BOINC WIKI | |
| ID: 873605 · | |
I already said: (I paid special attention to not downloading the GZ files unnecessarily). I am a programmer by profession; I am not the newbie on the block.
I found nothing about excessive usage and banishing. I did find this: http://boinc.berkeley.edu/trac/ticket/268 This mentions 5 to 10 requests per second!? I am talking about 1 request of approx. 400 bytes per 7 seconds!? As for bandwidth, I don't even download the hosts.gz, which gives me "bandwidth credit" for approx. 1,000,000,000 userw.php files a day.. But stop.. I don't want to start out on the wrong foot here.. I could just have done it, and from what I have read (like in that ticket) I thought I wasn't doing weird things. I thought I was being nice by asking what is accepted. I didn't expect to get a slap on the wrist for a decent question. ____________ The SETI@Home Gauntlet 2012 april 16 - 30| info / chat | STATS | |
| ID: 873908 · | |
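To put the figures quoted in this post in perspective, here is a rough back-of-the-envelope sketch of what one request of about 400 bytes every 7 seconds adds up to per day. The two input numbers are the ones from the post; the function name is only illustrative, and the result counts response payload only, not HTTP headers or the database work each page triggers on the server.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def daily_load(bytes_per_request: int, seconds_between_requests: float):
    """Return (requests per day, bytes per day) for a steady polling rate."""
    requests_per_day = SECONDS_PER_DAY / seconds_between_requests
    return requests_per_day, requests_per_day * bytes_per_request

# Figures from the post: ~400 bytes per request, one request every 7 seconds.
requests, total_bytes = daily_load(400, 7)
print(f"{requests:.0f} requests/day, about {total_bytes / 1e6:.1f} MB/day")
# prints "12343 requests/day, about 4.9 MB/day"
```

The byte volume is small, which is the poster's point; whether the number of requests is acceptable depends on what each request costs the server, which this calculation does not capture.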
|
You may want to contact mattl at ssl dot berkeley dot edu with your intentions and ask if this will cause any problems just to be on the safe side. | |
| ID: 873919 · | |
|
Oh.. almost forgot: ....and if there are problems, you'll know. I asked whether the "knowing part" would come in the form of a mail, a PM or an IP block.. I do hope it won't be the latter ;) For now my team has already proudly announced the new stats system, in Dutch, as are the stats themselves (maybe, with the idea below, it will be nice to make them in English as well :) ). There are still some issues that need to be taken care of before I start on this, but if requested I will be more than pleased to inform you of my progress.. (here?) Best regards, Eesger | |
| ID: 883039 · | |
Sorry I was a little abrupt. However, I do know that a screen scraper had their IP banished from the site yesterday. You do not get an email, you just suddenly can't get to the site anymore. | |
| ID: 883042 · | |
For now my team already has proudly announced the new stats-system, in Dutch, as are the stats themselves (maybe with the idea below it will be nice to make them in English as well :) ) Peculiar Dutch over there and equally peculiar English here. If you need help translating, just shout. Even though I am not a member of S@NL and never will be, the offer stands all the same. :-) ____________ Jord - BOINC FAQ Service - BOINC User Wiki Real is just a matter of perception. | |
| ID: 883043 · | |
Sorry I was a little abrupt. However, I do know that a screen scraper had their IP banished from the site yesterday. You do not get an email, you just suddenly can't get to the site anymore. Apology accepted, of course. I wanted to PM you.. but didn't know how to start... ;) We started out on the wrong foot. I really do understand the risk of the idea I am working out.. This "system" can/will be tunable to your wishes (as to what is acceptable), and since I believe in "asking before doing"... ;) (you understand.. ;) OK.. then may I ask one last "worst case scenario" question? What if, even with all my best intentions, software coding etc., our server does get blocked(*)? Obviously I'll find out practically immediately (assuming I have "the system" up and running). I hereby promise to create a "dead man's switch" in this part of the stats code and to inform "the S@NL board" on how to use it (just imagine I am on holiday when it happens..). Can I then ask you (John McLeod VII) or mail Matt to release the IP number? Otherwise we would have to start pulling stunts to keep the daily stats update going.... (*) There is always the risk of creating a disastrous software bug resulting in... I log practically everything and check the logs regularly to see whether everything goes as planned.. But a (complex) piece of software can be a bit like life sometimes.. you never know 100% for sure.. ;) | |
| ID: 883051 · | |
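The "dead man's switch" promised in this post could look something like the minimal sketch below. It is not the poster's code: the class name, the flag-file approach and the threshold of three consecutive failures are all assumptions chosen for illustration. The idea is that polling stops by itself after repeated failures (for example when the server starts refusing connections), and that anyone with access to the machine can also stop it by creating the flag file by hand.

```python
import os
import tempfile

class KillSwitch:
    """Stops all polling once tripped; stays tripped until a person resets it."""

    def __init__(self, flag_path: str, max_failures: int = 3):
        self.flag_path = flag_path        # hypothetical location of the stop file
        self.max_failures = max_failures  # assumed threshold, tune as needed
        self.failures = 0

    def tripped(self) -> bool:
        # The switch is "on" whenever the flag file exists, however it got there.
        return os.path.exists(self.flag_path)

    def trip(self, reason: str) -> None:
        with open(self.flag_path, "w", encoding="utf-8") as flag:
            flag.write(reason + "\n")

    def record(self, ok: bool) -> None:
        """Call after every request; consecutive failures trip the switch."""
        self.failures = 0 if ok else self.failures + 1
        if self.failures >= self.max_failures:
            self.trip(f"{self.failures} consecutive failed requests")

    def reset(self) -> None:
        """Manual step for an administrator, after the cause is understood."""
        self.failures = 0
        if self.tripped():
            os.remove(self.flag_path)

# Demonstration with a throwaway flag file and simulated request outcomes.
flag_file = os.path.join(tempfile.mkdtemp(), "stats_polling.stop")
switch = KillSwitch(flag_file)
for ok in (True, False, False, False):
    if switch.tripped():
        break
    switch.record(ok)
print("polling stopped:", switch.tripped())  # prints "polling stopped: True"
```

Keeping the state in a file rather than in memory means the stop survives a restart of the stats job, so a scheduled run cannot quietly resume polling while the maintainer is away.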
Sorry I was a little abrupt. However, I do know that a screen scraper had their IP banished from the site yesterday. You do not get an email, you just suddenly can't get to the site anymore. Matt and Eric will respond if you send a nice email about a problem, and there have been stats sites with coding errors that were blocked and then unblocked after some negotiation about the fix. Most stats sites check the timestamp on the XML a few times a day and download the XML only when the timestamp has changed. I don't know of any that follow this plan and got their IP blocked. Most of the problem with screen scraping is actually the database access, and for S@H the DB is always a bottleneck. | |
| ID: 883053 · | |
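The "check the timestamp, download only when it changed" routine described in this post can be sketched roughly as follows. This is one possible way to do it, not a description of how any particular stats site works: it assumes the server sends a standard HTTP `Last-Modified` header for the export files, and both function names are made up for this example.

```python
import urllib.request
from email.utils import parsedate_to_datetime
from typing import Optional

def is_newer(last_modified_header: Optional[str], last_seen_epoch: float) -> bool:
    """True if the header timestamp is later than the one stored last time."""
    if not last_modified_header:
        # No header: we cannot tell. Treated as changed here, which is only
        # reasonable because the check itself runs just a few times a day.
        return True
    return parsedate_to_datetime(last_modified_header).timestamp() > last_seen_epoch

def remote_last_modified(url: str) -> Optional[str]:
    """HEAD request: asks for the headers only, never the file body."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.headers.get("Last-Modified")

# Offline demonstration with a fixed header value.
header = "Wed, 21 Oct 2015 07:28:00 GMT"
print(is_newer(header, 0.0))            # stored copy is older
print(is_newer(header, 2_000_000_000))  # stored copy is newer
```

A stats job would call `remote_last_modified` on the export file a few times a day, compare the result with the timestamp saved after the previous download, and fetch the file itself only when `is_newer` returns true.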
Peculiar Dutch over there and equally peculiar English here. If you need help translating, just shout. Even though I am not a member of S@NL and never will be, the offer stands all the same. :-) Peculiar.. I just reread my last post, and indeed.. there are a few "funny" typos in it. That matters less here, but should you find "annoying" typos on the S@NL site, don't be a stranger: reporting them on the forum is easiest (for me ;)), but a PM is of course fine too ;) | |
| ID: 883054 · | |
..... I download the tables.xml (usually around 650 bytes) every half hour to check the 'update_time' (and plan to use 'nusers_total' & 'nteams_total' as check values (not yet implemented)). I could have done a "get url" on the stats dir, but that would be about 1,600 bytes.. and an XML file is easier & safer to read.. Most of the problem with screen scraping is actually the DataBase access, and for S@H the DB is always a bottleneck. And that is indeed exactly what I am going to be using.. As I have mentioned, I'll nicely spread the requests over the hour and cap the system at 1 request every 5 seconds (and I promise to visit the Berkeley stats pages less, which will save you at least 100 SQL requests a day :p.. and to promote our stats rigorously among our members.. who knows, maybe this system will even lower the load!) But (more) seriously, I'll go ahead and work this out. If it does become a problem, I'll / we'll find out and nicely ask to release the IP (promising to shut down this part of the system...) The maximum loss is the time I will be putting into the software.. but hey, it's a hobby ;) And I honestly believe that with a (quite) active team like ours these "super up-to-date stats" may even lower your load "on balance" and shift it to our server.. We/you'll see ;) | |
| ID: 883064 · | |
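The two mechanics mentioned in this post, reading 'update_time' from tables.xml and spreading requests over the hour with at most one request per 5 seconds, could be sketched as below. Only the three field names come from the post; the root element name, the sample values and all function names are assumptions made for illustration, so the real file layout should be checked before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the three field names come from the post above;
# the root element name and the values are made up for illustration.
SAMPLE = """<tables>
  <update_time>1230768000</update_time>
  <nusers_total>1000</nusers_total>
  <nteams_total>50</nteams_total>
</tables>"""

def read_summary(xml_text: str) -> dict:
    """Extract the timestamp and the two check values as integers.

    Raises TypeError if one of the fields is missing from the file.
    """
    root = ET.fromstring(xml_text)
    fields = ("update_time", "nusers_total", "nteams_total")
    return {name: int(root.findtext(name)) for name in fields}

def needs_refresh(summary: dict, last_update_time: int) -> bool:
    """Only fetch the large dumps when the export is newer than our copy."""
    return summary["update_time"] > last_update_time

def spread_offsets(n_requests: int, window: float = 3600.0, min_gap: float = 5.0):
    """Start offsets in seconds that spread requests evenly over the window.

    The gap never drops below min_gap; if too many requests are asked for,
    the schedule runs past the window instead of shrinking the gap.
    """
    if n_requests <= 0:
        return []
    gap = max(window / n_requests, min_gap)
    return [i * gap for i in range(n_requests)]

summary = read_summary(SAMPLE)
print(needs_refresh(summary, last_update_time=0))
print(spread_offsets(4))  # prints [0.0, 900.0, 1800.0, 2700.0]
```

Computing the offsets up front, rather than sleeping a fixed 5 seconds between requests, keeps a small batch from arriving in one burst at the top of the hour while still enforcing the minimum gap for large batches.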
|
OK, | |
| ID: 895685 · | |
| Copyright © 2013 University of California |