Small Word (Sep 20 2007)

Author	Message
Sirius B Volunteer tester Send message Joined: 26 Dec 00 Posts: 24891 Credit: 3,081,182 RAC: 7	Message 646212 - Posted: 22 Sep 2007, 0:32:35 UTC - in response to Message 646184. The servers are not down; they are just having to deal with a large volume of traffic due to small work units and 10 day caches. Thanks Lee ID: 646212 ·

Robert Send message Joined: 2 May 00 Posts: 5 Credit: 12,853,177 RAC: 10	Message 646260 - Posted: 22 Sep 2007, 2:13:51 UTC - in response to Message 646132. An adjustment is in order, and I suspect that it will be done when enough data is in place to do so. Mark, I'm sure there's enough data now! As you know, it's so annoying when someone just buggers off and leaves the rest of us high and dry for another couple of months. The BOINC system needs a heartbeat detector so that if someone doesn't make any contact at all for a week, then all uncrunched WUs are returned to the pool. If they are actively crunching through their cache, then deadlines could be extended for any WUs in progress but at risk of not completing. The worst that can happen is that if they do come back, then they have to download some new WUs. An itsy bitsy inconvenience compared to our enduring frustration! Some people have obviously gone AWOL, and an admin override to return WUs to the pool would be nice. Make it clearer in the docs that it is very impolite to leave a project in limbo. Oh, and while I'm at it, implementing redirect after post when submitting a message here is the correct way to write a bulletin board, to avoid double posts and paying twice etc. Andy. If you're looking for a candidate to flush, how about this guy? He's holding 52 hostages, I'm one of them. Add me to the list also... I can't even find this person in the user list! That's my IP. I run several computers. What's the issue? ID: 646260 ·

Philadelphia Volunteer tester Send message Joined: 12 Feb 07 Posts: 1590 Credit: 399,688 RAC: 0	Message 646302 - Posted: 22 Sep 2007, 3:27:22 UTC - in response to Message 646260. An adjustment is in order, and I suspect that it will be done when enough data is in place to do so. Mark, I'm sure there's enough data now! As you know, it's so annoying when someone just buggers off and leaves the rest of us high and dry for another couple of months. The BOINC system needs a heartbeat detector so that if someone doesn't make any contact at all for a week, then all uncrunched WUs are returned to the pool. If they are actively crunching through their cache, then deadlines could be extended for any WUs in progress but at risk of not completing. The worst that can happen is that if they do come back, then they have to download some new WUs. An itsy bitsy inconvenience compared to our enduring frustration! Some people have obviously gone AWOL, and an admin override to return WUs to the pool would be nice. Make it clearer in the docs that it is very impolite to leave a project in limbo. Oh, and while I'm at it, implementing redirect after post when submitting a message here is the correct way to write a bulletin board, to avoid double posts and paying twice etc. Andy. If you're looking for a candidate to flush, how about this guy? He's holding 52 hostages, I'm one of them. Add me to the list also... I can't even find this person in the user list! That's my IP. I run several computers. What's the issue? There are 52 WUs that were downloaded between August 13 and 18, over a month ago, that are not being crunched. ID: 646302 ·

Robert Send message Joined: 2 May 00 Posts: 5 Credit: 12,853,177 RAC: 10	Message 646314 - Posted: 22 Sep 2007, 3:52:22 UTC - in response to Message 646302. An adjustment is in order, and I suspect that it will be done when enough data is in place to do so. Mark, I'm sure there's enough data now! As you know, it's so annoying when someone just buggers off and leaves the rest of us high and dry for another couple of months. The BOINC system needs a heartbeat detector so that if someone doesn't make any contact at all for a week, then all uncrunched WUs are returned to the pool. If they are actively crunching through their cache, then deadlines could be extended for any WUs in progress but at risk of not completing. The worst that can happen is that if they do come back, then they have to download some new WUs. An itsy bitsy inconvenience compared to our enduring frustration! Some people have obviously gone AWOL, and an admin override to return WUs to the pool would be nice. Make it clearer in the docs that it is very impolite to leave a project in limbo. Oh, and while I'm at it, implementing redirect after post when submitting a message here is the correct way to write a bulletin board, to avoid double posts and paying twice etc. Andy. If you're looking for a candidate to flush, how about this guy? He's holding 52 hostages, I'm one of them. Add me to the list also... I can't even find this person in the user list! That's my IP. I run several computers. What's the issue? There are 52 WUs that were downloaded between August 13 and 18, over a month ago, that are not being crunched. Not sure what's happening there; all WUs I can track at the moment have been crunched. Will have another beer and check my servers. ID: 646314 ·

Gary Roberts Volunteer tester Send message Joined: 31 Oct 99 Posts: 95 Credit: 2,301,228 RAC: 0	Message 646343 - Posted: 22 Sep 2007, 6:02:22 UTC - in response to Message 646314. .... That's my IP. I run several computers. What's the issue? There are 52 WUs that were downloaded between August 13 and 18, over a month ago, that are not being crunched. Not sure what's happening there; all WUs I can track at the moment have been crunched. Will have another beer and check my servers. Despite your assertion to the contrary, the host with the 52 "hostages" is NOT one of your machines. You said that it had your IP - which is something that you cannot know since IP addresses are not published. Actually I presume you meant host ID and it certainly isn't one of yours. You don't even seem to have a dual core 6600 in your list, let alone a matching host ID. Why did you think it was one of your machines? ID: 646343 ·

Henk Haneveld Volunteer tester Send message Joined: 16 May 99 Posts: 154 Credit: 1,577,293 RAC: 1	Message 646443 - Posted: 22 Sep 2007, 9:19:47 UTC - in response to Message 645989. Last modified: 22 Sep 2007, 9:20:15 UTC The BOINC system needs a heartbeat detector so that if someone doesn't make any contact at all for a week, then all uncrunched WUs are returned to the pool. A week is too short. I have an old host that does about 1 WU per day. But recently it downloaded an Einstein result and went into EDF for 2 weeks because of the short return time of these huge things. A heartbeat detector is in principle OK but it should look at an interval of at least a month. ID: 646443 ·

Andy Lee Robinson Send message Joined: 8 Dec 05 Posts: 630 Credit: 59,973,836 RAC: 0	Message 646456 - Posted: 22 Sep 2007, 10:03:49 UTC - in response to Message 646443. Last modified: 22 Sep 2007, 10:11:13 UTC The BOINC system needs a heartbeat detector so that if someone doesn't make any contact at all for a week, then all uncrunched WUs are returned to the pool. A week is too short. I have an old host that does about 1 WU per day. But recently it downloaded an Einstein result and went into EDF for 2 weeks because of the short return time of these huge things. A heartbeat detector is in principle OK but it should look at an interval of at least a month. A week isn't too short, you'll still be contacting the server occasionally while processing a long WU. A heartbeat is just to say "I'm here and working", and probably a daily event. If you do 1 WU a day, and a 2 day cache, then you'll have 2-3 WUs in your queue. If you run continuously then the most that anyone would wait for your results is 2-3 days. If you run 8 hours a day, then the most anyone would wait is 8-9 days. If you take on Einstein too, and can't complete your SETI cache, then they should be returned to the pool in advance of the deadline. It is really no big deal to give up unprocessed WUs. If your host contacts SETI daily, then it will know you're alive and could extend the deadline for a WU in progress and could even extend deadlines for cached WUs, though this situation should never arise as they should have been returned. If you switch off your machine for a week, and don't make any contact (regardless of WU size or machine power) then your queued WUs would return to the pool, and life goes on for everyone else. You get new ones when you reconnect, but of course lose the one that the machine was working on. If you do plan to go away without the machine, then set no new work, otherwise try to suspend after a WU has completed.
Perhaps an AWAY button would be an idea to return queued WUs to the pool, while allowing the current WU to complete. Einstein has some 56 hour WUs (P4 estimate) with a deadline of 3 weeks. SETI has 8 hour WUs with a deadline of 8 weeks. Go figure. The current system works, but I think it is flawed and could be done more intelligently and equitably for all. ID: 646456 ·
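
As a rough illustration of the heartbeat idea discussed above, here is a minimal Python sketch. It is not BOINC code: the function name, the argument names, and the plain list of work-unit names are all hypothetical, and the one-week limit is simply the figure proposed in the thread.

```python
from datetime import datetime, timedelta

# Assumption: one week of silence, the threshold proposed in the post above.
SILENCE_LIMIT = timedelta(days=7)


def wus_to_return(last_contact, uncrunched, now):
    """Return the work units that would go back to the pool.

    last_contact: datetime of the host's most recent contact with the server
    uncrunched:   names of the WUs still assigned to the host (hypothetical)
    now:          current datetime
    """
    if now - last_contact > SILENCE_LIMIT:
        # The host has been silent for more than a week: hand everything back.
        return list(uncrunched)
    # The host is still in touch, so its cache is left alone.
    return []


if __name__ == "__main__":
    now = datetime(2007, 9, 22)
    print(wus_to_return(datetime(2007, 9, 10), ["wu_1", "wu_2"], now))
    print(wus_to_return(datetime(2007, 9, 20), ["wu_1", "wu_2"], now))
```

Changing `SILENCE_LIMIT` to a month would model the longer interval suggested earlier in the thread for slow hosts.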

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51470 Credit: 1,018,363,574 RAC: 1,004	Message 646457 - Posted: 22 Sep 2007, 10:03:59 UTC - in response to Message 646443. The BOINC system needs a heartbeat detector so that if someone doesn't make any contact at all for a week, then all uncrunched WUs are returned to the pool. A week is too short. I have an old host that does about 1 WU per day. But recently it downloaded an Einstein result and went into EDF for 2 weeks because of the short return time of these huge things. A heartbeat detector is in principle OK but it should look at an interval of at least a month. I agree. There are many slow crunchers out there that still contribute and should not be cut off. If the servers could determine whether there is any work really in progress, no matter how slow, the host should not be disturbed. "Time is simply the mechanism that keeps everything from happening all at once." ID: 646457 ·

Henk Haneveld Volunteer tester Send message Joined: 16 May 99 Posts: 154 Credit: 1,577,293 RAC: 1	Message 646474 - Posted: 22 Sep 2007, 11:20:25 UTC - in response to Message 646456. The BOINC system needs a heartbeat detector so that if someone doesn't make any contact at all for a week, then all uncrunched WUs are returned to the pool. A week is too short. I have an old host that does about 1 WU per day. But recently it downloaded an Einstein result and went into EDF for 2 weeks because of the short return time of these huge things. A heartbeat detector is in principle OK but it should look at an interval of at least a month. A week isn't too short, you'll still be contacting the server occasionally while processing a long WU. A heartbeat is just to say "I'm here and working", and probably a daily event. If you do 1 WU a day, and a 2 day cache, then you'll have 2-3 WUs in your queue. If you run continuously then the most that anyone would wait for your results is 2-3 days. If you run 8 hours a day, then the most anyone would wait is 8-9 days. If you take on Einstein too, and can't complete your SETI cache, then they should be returned to the pool in advance of the deadline. It is really no big deal to give up unprocessed WUs. If your host contacts SETI daily, then it will know you're alive and could extend the deadline for a WU in progress and could even extend deadlines for cached WUs, though this situation should never arise as they should have been returned. If you switch off your machine for a week, and don't make any contact (regardless of WU size or machine power) then your queued WUs would return to the pool, and life goes on for everyone else. You get new ones when you reconnect, but of course lose the one that the machine was working on. If you do plan to go away without the machine, then set no new work, otherwise try to suspend after a WU has completed.
Perhaps an AWAY button would be an idea to return queued WUs to the pool, while allowing the current WU to complete. Einstein has some 56 hour WUs (P4 estimate) with a deadline of 3 weeks. SETI has 8 hour WUs with a deadline of 8 weeks. Go figure. The current system works, but I think it is flawed and could be done more intelligently and equitably for all. If my slow host is working in EDF on an Einstein result (a result takes about 250hrs) then it does not contact Seti for more than a week unless it has a Seti result with a short deadline. Your idea of a host telling the server "I'm here and working" will create a lot of extra traffic on the already very busy servers. ID: 646474 ·

Andy Lee Robinson Send message Joined: 8 Dec 05 Posts: 630 Credit: 59,973,836 RAC: 0	Message 646477 - Posted: 22 Sep 2007, 11:29:17 UTC - in response to Message 646474. If my slow host is working in EDF on an Einstein result (a result takes about 250hrs) then it does not contact Seti for more than a week unless it has a Seti result with a short deadline. Your idea of a host telling the server "I'm here and working" will create a lot of extra traffic on the already very busy servers. As indeed it would if everyone upgraded to Quad cores! What's the bandwidth problem with a simple packet? ID: 646477 ·

HDRW Send message Joined: 18 Oct 02 Posts: 14 Credit: 189,189 RAC: 0	Message 646490 - Posted: 22 Sep 2007, 12:29:47 UTC If it's suspected that a host isn't working (by heartbeat or whatever method), rather than returning their WUs to the pool, why not issue them to others as the 3rd cruncher? The current standard is that 2 people get each WU, but is there anything in the logic that stops a third one being added at some point? That way all 3 people should get the credit if they do finally submit the result. Or maybe I've misunderstood how it works... Cheers, Howard ID: 646490 ·

Henk Haneveld Volunteer tester Send message Joined: 16 May 99 Posts: 154 Credit: 1,577,293 RAC: 1	Message 646506 - Posted: 22 Sep 2007, 12:52:18 UTC - in response to Message 646490. Last modified: 22 Sep 2007, 12:53:26 UTC If it's suspected that a host isn't working (by heartbeat or whatever method), rather than returning their WUs to the pool, why not issue them to others as the 3rd cruncher? The current standard is that 2 people get each WU, but is there anything in the logic that stops a third one being added at some point? That way all 3 people should get the credit if they do finally submit the result. Or maybe I've misunderstood how it works... Cheers, Howard Good idea. And if a host is running BOINC 5.10.xx then the unreturned result will get the "Aborted by server" message. Another thing came to mind just now. There is no need for a heartbeat message from a host. The server already knows when the last contact was. It is in the computer info on your account page. ID: 646506 ·

Andy Lee Robinson Send message Joined: 8 Dec 05 Posts: 630 Credit: 59,973,836 RAC: 0	Message 646511 - Posted: 22 Sep 2007, 13:02:11 UTC - in response to Message 646506. Another thing came to mind just now. There is no need for a heartbeat message from a host. The server already knows when the last contact was. It is in the computer info on your account page. Same thing, different name. The client contacts the server at least as often as the cache setting, so the server can know this. It just doesn't care about delinquent users! ID: 646511 ·

S@NL - XP_Freak Send message Joined: 10 Jul 99 Posts: 99 Credit: 6,248,265 RAC: 0	Message 646521 - Posted: 22 Sep 2007, 13:58:47 UTC - in response to Message 646064. Last modified: 22 Sep 2007, 14:02:03 UTC If you're looking for a candidate to flush, how about this guy? He's holding 52 hostages, I'm one of them. It's already down to 40. :) And in 3 days time it will be down to 23. Goodbye Seti Classic ID: 646521 ·

Josef W. Segur Volunteer developer Volunteer tester Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 646523 - Posted: 22 Sep 2007, 14:00:28 UTC - in response to Message 646477. If my slow host is working in EDF on an Einstein result (a result takes about 250hrs) then it does not contact Seti for more than a week unless it has a Seti result with a short deadline. Your idea of a host telling the server "I'm here and working" will create a lot of extra traffic on the already very busy servers. As indeed it would if everyone upgraded to Quad cores! What's the bandwidth problem with a simple packet? For a host which is not always connected, none of these ideas make sense. The deadlines range from 8.68 days to about 113 days because the splitter estimate calculations assume there's a crunch time range of about 13 times. The actual range is much less, and there's no logical reason not to change those calculations to match. Joe ID: 646523 ·

Philadelphia Volunteer tester Send message Joined: 12 Feb 07 Posts: 1590 Credit: 399,688 RAC: 0	Message 646526 - Posted: 22 Sep 2007, 14:02:24 UTC - in response to Message 646521. If you're looking for a candidate to flush, how about this guy? He's holding 52 hostages, I'm one of them. It's already down to 40. :) While that is good news, the bad news is the reason for the drop is a result of expired deadlines, which isn't good. If those deadlines had been months out, as some are now, who knows when they would have expired. There really should be a workaround for computers that at least appear to be idle; a month qualifies as idle to me :) ID: 646526 ·

perryjay Volunteer tester Send message Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0	Message 646537 - Posted: 22 Sep 2007, 14:29:32 UTC - in response to Message 646526. If you're looking for a candidate to flush, how about this guy? He's holding 52 hostages, I'm one of them. It's already down to 40. :) While that is good news, the bad news is the reason for the drop is a result of expired deadlines, which isn't good. If those deadlines had been months out, as some are now, who knows when they would have expired. There really should be a workaround for computers that at least appear to be idle; a month qualifies as idle to me :) You mean like this guy? http://setiathome.berkeley.edu/workunit.php?wuid=147057956 I guess he decided SETI wasn't for him. I'll find out in a couple of weeks. PROUD MEMBER OF Team Starfire World BOINC ID: 646537 ·

KWSN THE Holy Hand Grenade! Volunteer tester Send message Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0	Message 646545 - Posted: 22 Sep 2007, 14:37:56 UTC - in response to Message 646523. [snip] What's the bandwidth problem with a simple packet? For a host which is not always connected, none of these ideas make sense. [snip] Joe I agree, some hosts don't connect, except when specifically directed to: E.G. computers still on dial-up (like me!) My computers (except one, on a cable modem - at a remote location) usually spend about 23 hrs a day on "network activity suspended", because the modem isn't connected - or the modem (one modem, computers networked) is handling my other Internet traffic, which I don't want slowed down by BOINC traffic... . Hello, from Albany, CA!... ID: 646545 ·

KWSN THE Holy Hand Grenade! Volunteer tester Send message Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0	Message 646547 - Posted: 22 Sep 2007, 14:45:43 UTC Last modified: 22 Sep 2007, 14:47:43 UTC Everyone - please remember that there may be reasons why a person "drops out" of SETI other than getting tired of the project or the project's woes... I've had to drop WU's because a) something major happened to my BOINC installation, causing me to have to detach and re-install; (losing all issued WU's in progress...) and b) my OS decided to take a dive, again losing all WU's in progress. (NTM the use of that computer for a number of days!) . Hello, from Albany, CA!... ID: 646547 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14661 Credit: 200,643,578 RAC: 874	Message 646575 - Posted: 22 Sep 2007, 15:37:51 UTC Last modified: 22 Sep 2007, 15:38:16 UTC I don't see any reason why these ideas shouldn't be put forward as a serious proposal, notwithstanding dialup, EDF or anything else. The longest supported cache length is 10 days. Double that for luck: so say "Any host which has not contacted the server for over 3 weeks (21 days), and which has WUs 'in progress' assigned to it, isn't pulling its weight: it is causing stress to the project by blocking scarce database and storage resources." In the first instance, we don't have to do anything about the idle host. Just mark its WUs for re-issue, and let them be sent again to the next candidate who asks. It becomes a bit like an initial replication of two-and-a-bit, and in the process slightly reduces the demand on the splitters (since the re-issued WU datapak is already, by definition, on the download server). That should ensure that the quorum is met within about a month, and the idle host's result becomes 'Redundant - cancelled by server' (subject to the usual vagaries of download error, compute error, no consensus, etc. etc. - take it as read that these would generate re-sends as normal). If the idle host wakes up again at this point, nothing is lost: whatever the reason for the absence (holiday, summer heat, breakdown, road trip without internet access, priority given to other BOINC projects), s@h work continues according to the rules of BOINC. Newer clients will react to the server cancellation, and download replacement work: older clients will just crunch it anyway, and claim (and be awarded) credits for the work even though it's redundant. The more interesting question is: what would happen if the idle host remains AWOL and incommunicado, even after the quorum is formed? There could be another two months or more to go before the ultimate deadline. 
Personally, I think the project should bite the bullet, and be firm about this: if a host has not made *any contact at all* with the project for *over a month*, and a quorum has been formed, then the WU should be assimilated unconditionally and the associated results deleted and purged from the BOINC database. We might get a few - a very, very few - who wake up like Rip van Winkle, crunch their ancient WUs, and try to cash them in for credit: and then complain when the server rejects them. But I think the good of the project is more important than the feelings of this particular category of users. It would need to be explained, politely but firmly, that Berkeley needed its storage space back, and has acted accordingly. And we should instill in users - current and new - a sense of community. In my book, it is plain *antisocial* to download a whole bunch of WUs and then deliberately walk away without crunching them, aborting them, detaching from the project, or in some such way telling the servers that the results won't be coming back. Note that I say 'deliberately'. Nobody would criticise anyone who loses a few WUs to a comms or computer glitch, or even a whole cache-full to a hard drive failure or suchlike. Events like that are unplanned, and can usually be sorted out in under a month anyway: if you can re-attach under the same ID, or merge hosts, you can nowadays get the lost WUs reissued and crunch them as if nothing had happened. ID: 646575 ·
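
The two-stage proposal above can be sketched as a small decision function. This is only an illustration in Python under stated assumptions, not how the BOINC server works: all names are hypothetical, "over 3 weeks" is taken as more than 21 days as stated in the post, "over a month" is assumed here to mean more than 30 days, and re-issue is assumed to be needed only while the quorum has not yet formed.

```python
from datetime import datetime, timedelta
from enum import Enum

REISSUE_AFTER = timedelta(days=21)  # "over 3 weeks (21 days)" from the post
PURGE_AFTER = timedelta(days=30)    # assumption: "over a month" read as 30 days


class Action(Enum):
    NONE = "leave the host alone"
    REISSUE = "mark the WU for re-issue to the next host that asks"
    PURGE = "assimilate the WU and purge the idle host's result"


def idle_host_action(last_contact, now, quorum_formed):
    """Pick the action for one WU held by a host (hypothetical policy sketch)."""
    silent_for = now - last_contact
    if quorum_formed:
        # Stage two: a quorum exists, so only the purge question remains.
        return Action.PURGE if silent_for > PURGE_AFTER else Action.NONE
    # Stage one: no quorum yet, so re-issue once the host has been silent too long.
    return Action.REISSUE if silent_for > REISSUE_AFTER else Action.NONE


if __name__ == "__main__":
    now = datetime(2007, 10, 31)
    for last, quorum in [(datetime(2007, 10, 25), False),
                         (datetime(2007, 10, 5), False),
                         (datetime(2007, 9, 20), True)]:
        print(last.date(), quorum, idle_host_action(last, now, quorum).value)
```

A host that reconnects before the purge stage loses nothing under this sketch, which matches the point made in the post that newer clients would simply react to the server cancellation and fetch replacement work.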
