Small Word (Sep 20 2007)
Author | Message |
---|---|
Sirius B Joined: 26 Dec 00 Posts: 24912 Credit: 3,081,182 RAC: 7 |
The servers are not down; they are just having to deal with a large volume of traffic due to small work units and 10-day caches. Thanks, Lee |
Robert Joined: 2 May 00 Posts: 5 Credit: 12,853,177 RAC: 10 |
An adjustment is in order, and I suspect that it will be done when enough data is in place to do so. That's my IP. I run several computers; what's the issue? |
Philadelphia Joined: 12 Feb 07 Posts: 1590 Credit: 399,688 RAC: 0 |
An adjustment is in order, and I suspect that it will be done when enough data is in place to do so. There are 52 WUs that were downloaded between August 13 and 18, over a month ago, that are not being crunched. |
Robert Joined: 2 May 00 Posts: 5 Credit: 12,853,177 RAC: 10 |
An adjustment is in order, and I suspect that it will be done when enough data is in place to do so. Not sure what's happening there; all WUs I can track at the moment have been crunched. Will have another beer and check my servers. |
Gary Roberts Joined: 31 Oct 99 Posts: 95 Credit: 2,301,228 RAC: 0 |
.... Despite your assertion to the contrary, the host with the 52 "hostages" is NOT one of your machines. You said that it had your IP - which is something that you cannot know since IP addresses are not published. Actually I presume you meant host ID and it certainly isn't one of yours. You don't even seem to have a dual core 6600 in your list, let alone a matching host ID. Why did you think it was one of your machines? |
Henk Haneveld Joined: 16 May 99 Posts: 154 Credit: 1,577,293 RAC: 1 |
The BOINC system needs a heartbeat detector so that if someone doesn't make *any* contact at all for a week, then all uncrunched WUs are returned to the pool. A week is too short. I have an old host that does about 1 WU per day. But recently it downloaded an Einstein result and went into EDF for 2 weeks because of the short return time of these huge things. A heartbeat detector is in principle OK, but it should look at an interval of at least a month. |
Andy Lee Robinson Joined: 8 Dec 05 Posts: 630 Credit: 59,973,836 RAC: 0 |
The BOINC system needs a heartbeat detector so that if someone doesn't make *any* contact at all for a week, then all uncrunched WUs are returned to the pool. A week isn't too short; you'll still be contacting the server occasionally while processing a long WU. A heartbeat is just to say "I'm here and working", and probably a daily event. If you do 1 WU a day and have a 2-day cache, then you'll have 2-3 WUs in your queue. If you run continuously, then the most that anyone would wait for your results is 2-3 days. If you run 8 hours a day, then the most anyone would wait is 8-9 days. If you take on Einstein too, and can't complete your SETI cache, then the remaining WUs should be returned to the pool in advance of the deadline. It is really no big deal to give up unprocessed WUs. If your host contacts SETI daily, then the server will know you're alive and could extend the deadline for a WU *in progress* and could even extend deadlines for cached WUs, though this situation should never arise as they should have been returned. If you switch off your machine for a week, and don't make any contact (regardless of WU size or machine power), then your queued WUs would return to the pool, and life goes on for everyone else. You get new ones when you reconnect, but of course lose the one that the machine was working on. If you do plan to go away without the machine, then set no new work; otherwise, try to suspend after a WU has completed. Perhaps an AWAY button would be an idea to return queued WUs to the pool, while allowing the current WU to complete. Einstein has some 56-hour WUs (P4 estimate) with a deadline of 3 weeks. SETI has 8-hour WUs with a deadline of 8 weeks. Go figure. The current system works, but I think it is flawed and could be done more intelligently and equitably for all. |
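
The heartbeat rule proposed in this post could be sketched roughly as follows. This is only an illustration in Python, not BOINC server code: the `Host` class, the `reclaim_stale_work` function, and the `HEARTBEAT_TIMEOUT` constant are hypothetical names, and the one-week threshold is simply the figure suggested above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Assumed threshold, taken from the proposal: one week with no contact at all.
HEARTBEAT_TIMEOUT = timedelta(days=7)


@dataclass
class Host:
    """Hypothetical record of a volunteer host as the server might see it."""
    host_id: int
    last_contact: datetime
    queue: list = field(default_factory=list)  # IDs of WUs assigned to the host


def reclaim_stale_work(host, now, timeout=HEARTBEAT_TIMEOUT):
    """Return the WU IDs to put back in the pool, emptying a silent host's queue."""
    if now - host.last_contact <= timeout:
        return []  # the host has been heard from recently; leave its cache alone
    reclaimed, host.queue = host.queue, []
    return reclaimed


now = datetime(2007, 9, 20)
alive = Host(1, now - timedelta(days=2), [101, 102])
silent = Host(2, now - timedelta(days=30), [201, 202, 203])
print(reclaim_stale_work(alive, now))   # nothing is taken from an active host
print(reclaim_stale_work(silent, now))  # everything is taken from a silent one
```

As in the post, a host that stays silent past the threshold loses its whole queue, including the WU it was working on, and would simply be given new work when it reconnects.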
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
The BOINC system needs a heartbeat detector so that if someone doesn't make *any* contact at all for a week, then all uncrunched WUs are returned to the pool. I agree. There are many slow crunchers out there that still contribute and should not be cut off. If the servers could determine whether there is any work really in progress, no matter how slow, the host should not be disturbed. "Time is simply the mechanism that keeps everything from happening all at once." |
Henk Haneveld Joined: 16 May 99 Posts: 154 Credit: 1,577,293 RAC: 1 |
The BOINC system needs a heartbeat detector so that if someone doesn't make *any* contact at all for a week, then all uncrunched WUs are returned to the pool. If my slow host is working in EDF on an Einstein result (a result takes about 250 hrs), then it does not contact SETI for more than a week unless it has a SETI result with a short deadline. Your idea of a host telling the server "I'm here and working" will create a lot of extra traffic on the already very busy servers. |
Andy Lee Robinson Joined: 8 Dec 05 Posts: 630 Credit: 59,973,836 RAC: 0 |
If my slow host is working in EDF on an Einstein result (a result takes about 250 hrs), then it does not contact SETI for more than a week unless it has a SETI result with a short deadline. As indeed it would if everyone upgraded to quad cores! What's the bandwidth problem with a simple packet? |
HDRW Joined: 18 Oct 02 Posts: 14 Credit: 189,189 RAC: 0 |
If it's suspected that a host isn't working (by heartbeat or whatever method), rather than returning their WUs to the pool, why not issue them to others as the 3rd cruncher? The current standard is that 2 people get each WU, but is there anything in the logic that stops a third one being added at some point? That way all 3 people should get the credit if they do finally submit the result. Or maybe I've misunderstood how it works... Cheers, Howard |
Henk Haneveld Joined: 16 May 99 Posts: 154 Credit: 1,577,293 RAC: 1 |
If it's suspected that a host isn't working (by heartbeat or whatever method), rather than returning their WUs to the pool, why not issue them to others as the 3rd cruncher? The current standard is that 2 people get each WU, but is there anything in the logic that stops a third one being added at some point? That way all 3 people should get the credit if they do finally submit the result. Good idea. And if a host is running BOINC 5.10.xx, then the unreturned result will get the "Aborted by server" message. Another thing came to mind just now. There is no need for a heartbeat message from a host. The server already knows when the last contact was. It is in the computer info on your account page. |
Andy Lee Robinson Joined: 8 Dec 05 Posts: 630 Credit: 59,973,836 RAC: 0 |
Another thing came to mind just now. There is no need for a heartbeat message from a host. The server already knows when the last contact was. It is in the computer info on your account page. Same thing, different name. The client contacts the server at least as often as the cache setting, so the server can know this. It just doesn't care about delinquent users! |
S@NL - XP_Freak Joined: 10 Jul 99 Posts: 99 Credit: 6,248,265 RAC: 0 |
If you're looking for a candidate to flush, how about this guy? He's holding 52 hostages; I'm one of them. It's already down to 40. :) And in 3 days' time it will be down to 23. Goodbye Seti Classic |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
If my slow host is working in EDF on an Einstein result (a result takes about 250 hrs), then it does not contact SETI for more than a week unless it has a SETI result with a short deadline. For a host which is not always connected, none of these ideas make sense. The deadlines range from 8.68 days to about 113 days because the splitter estimate calculations assume there's a crunch time range of about 13 times. The actual range is much less, and there's no logical reason not to change those calculations to match. Joe |
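
The "about 13 times" figure in this post can be checked against the two deadlines it quotes. The snippet below is just that arithmetic in Python; the variable names are illustrative, and nothing here reproduces the actual splitter calculation.

```python
# Deadline extremes quoted in the post, in days.
shortest_deadline_days = 8.68
longest_deadline_days = 113.0

# Ratio between the longest and the shortest deadline.
ratio = longest_deadline_days / shortest_deadline_days
print(f"longest/shortest deadline ratio: {ratio:.1f}")  # prints 13.0
```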
Philadelphia Joined: 12 Feb 07 Posts: 1590 Credit: 399,688 RAC: 0 |
If you're looking for a candidate to flush, how about this guy? He's holding 52 hostages; I'm one of them. While that is good news, the bad news is that the drop is a result of expired deadlines, which isn't good. If those deadlines had been months out, as some are now, who knows when they would have expired. There really should be a workaround for computers that at least appear to be idle; a month qualifies as idle to me :) |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
If you're looking for a candidate to flush, how about this guy? He's holding 52 hostages; I'm one of them. You mean like this guy? http://setiathome.berkeley.edu/workunit.php?wuid=147057956 I guess he decided SETI wasn't for him. I'll find out in a couple of weeks. PROUD MEMBER OF Team Starfire World BOINC |
KWSN THE Holy Hand Grenade! Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
[snip] I agree; some hosts don't connect except when specifically directed to, e.g. computers still on dial-up (like me!). My computers (except one, on a cable modem at a remote location) usually spend about 23 hrs a day on "network activity suspended", because the modem isn't connected, or the modem (one modem, computers networked) is handling my other Internet traffic, which I don't want slowed down by BOINC traffic... . Hello, from Albany, CA!... |
KWSN THE Holy Hand Grenade! Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
Everyone - please remember that there may be reasons why a person "drops out" of SETI other than getting tired of the project or the project's woes... I've had to drop WU's because a) something major happened to my BOINC installation, causing me to have to detach and re-install; (losing all issued WU's in progress...) and b) my OS decided to take a dive, again losing all WU's in progress. (NTM the use of that computer for a number of days!) . Hello, from Albany, CA!... |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
I don't see any reason why these ideas shouldn't be put forward as a serious proposal, notwithstanding dialup, EDF or anything else. The longest supported cache length is 10 days. Double that for luck: so say "Any host which has not contacted the server for over 3 weeks (21 days), and which has WUs 'in progress' assigned to it, isn't pulling its weight: it is causing stress to the project by blocking scarce database and storage resources." In the first instance, we don't have to do anything about the idle host. Just mark its WUs for re-issue, and let them be sent again to the next candidate who asks. It becomes a bit like an initial replication of two-and-a-bit, and in the process slightly reduces the demand on the splitters (since the re-issued WU datapak is already, by definition, on the download server). That should ensure that the quorum is met within about a month, and the idle host's result becomes 'Redundant - cancelled by server' (subject to the usual vagaries of download error, compute error, no consensus, etc. etc. - take it as read that these would generate re-sends as normal). If the idle host wakes up again at this point, nothing is lost: whatever the reason for the absence (holiday, summer heat, breakdown, road trip without internet access, priority given to other BOINC projects), s@h work continues according to the rules of BOINC. Newer clients will react to the server cancellation, and download replacement work: older clients will just crunch it anyway, and claim (and be awarded) credits for the work even though it's redundant. The more interesting question is: what would happen if the idle host remains AWOL and incommunicado, even after the quorum is formed? There could be another two months or more to go before the ultimate deadline. 
Personally, I think the project should bite the bullet, and be firm about this: if a host has not made any contact at all with the project for over a month, and a quorum has been formed, then the WU should be assimilated unconditionally and the associated results deleted and purged from the BOINC database. We might get a few - a very, very few - who wake up like Rip van Winkle, crunch their ancient WUs, and try to cash them in for credit: and then complain when the server rejects them. But I think the good of the project is more important than the feelings of this particular category of users. It would need to be explained, politely but firmly, that Berkeley needed its storage space back, and has acted accordingly. And we should instill in users - current and new - a sense of community. In my book, it is plain antisocial to download a whole bunch of WUs and then deliberately walk away without crunching them, aborting them, detaching from the project, or in some such way telling the servers that the results won't be coming back. Note that I say 'deliberately'. Nobody would criticise anyone who loses a few WUs to a comms or computer glitch, or even a whole cache-full to a hard drive failure or suchlike. Events like that are unplanned, and can usually be sorted out in under a month anyway: if you can re-attach under the same ID, or merge hosts, you can nowadays get the lost WUs reissued and crunch them as if nothing had happened. |
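
The two-stage policy described in this post (re-issue after three weeks of silence, then assimilate and purge once a quorum exists and the host has been silent for over a month) might be sketched as the decision function below. It is a rough Python illustration under stated assumptions, not BOINC's scheduler logic: the `Action` enum and `action_for_result` function are hypothetical names, and reading "over a month" as 30 days is an assumption.

```python
from datetime import timedelta
from enum import Enum

REISSUE_AFTER = timedelta(days=21)  # "over 3 weeks (21 days)" without contact
PURGE_AFTER = timedelta(days=30)    # "over a month"; 30 days is an assumed reading


class Action(Enum):
    KEEP = "keep waiting"
    REISSUE = "mark WU for re-issue"
    CANCEL = "mark result redundant, cancelled by server"
    PURGE = "assimilate WU and purge results"


def action_for_result(silence, quorum_formed):
    """Decide what to do with a WU held by a host that has been silent for `silence`."""
    if silence <= REISSUE_AFTER:
        return Action.KEEP     # host is within the normal cache window
    if not quorum_formed:
        return Action.REISSUE  # send the WU to the next candidate who asks
    if silence <= PURGE_AFTER:
        return Action.CANCEL   # quorum met; the idle host's copy is redundant
    return Action.PURGE        # quorum met and host silent for over a month


print(action_for_result(timedelta(days=22), quorum_formed=False).value)
print(action_for_result(timedelta(days=45), quorum_formed=True).value)
```

One design choice in this sketch: a host that stays silent for more than a month without a quorum still only triggers a re-issue, since the post makes the purge conditional on the quorum having formed.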