Small Word (Sep 20 2007)
Author | Message |
---|---|
Jesse Viviano Joined: 27 Feb 00 Posts: 100 Credit: 3,949,583 RAC: 0 |
Everyone - please remember that there may be reasons why a person "drops out" of SETI other than getting tired of the project or the project's woes... I've had to drop WU's because a) something major happened to my BOINC installation, causing me to have to detach and re-install; (losing all issued WU's in progress...) and b) my OS decided to take a dive, again losing all WU's in progress. (NTM the use of that computer for a number of days!) I once had to drop all work units because I opened a Trojan horse that was so new that Symantec had no idea about it. I wound up having to abort all my results, upload the new Trojan to Symantec, perform a backup, reformat, and reinstall. Symantec later emailed me stating that I had sent in something that they did not know about, and included a link to definitions that detected the new threat. Unfortunately, by that time, I had already nuked my hard drive. I did not want to have my hijacked laptop possibly be part of any spam botnet. The definitions were able to catch a piece of the Trojan I accidentally backed up, though. Anyways, I am pretty happy with Norton because it has caught some emails that contained BIOS-erasing Trojan horses and other really nasty malware packages. I understand that someone has to be the first victim of some malware who uploads it to Symantec or other antivirus company before anyone can be protected from it. However, if it happens too often, I will get upset and switch. |
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
There is no heartbeat detector time that works. I have a couple of hosts that entered EDF for CPDN with a collection of other tasks on the host. All of those other tasks were completed and returned by the deadline, but the hosts were scheduled to be in EDF for well over 6 months. These hosts only contacted CPDN to upload intermediate work. They did not contact any of the other projects. One of those is still going and has a completion time in January for the CPDN task (late, but not much of a problem for CPDN). The other crashed the CPDN task. Just be patient. Your credit will come eventually. BOINC WIKI |
Henk Haneveld Joined: 16 May 99 Posts: 154 Credit: 1,577,293 RAC: 1 |
I can agree with the proposal of Richard Haselgrove. The only thing that is not completely right is the maximum cache size. It is 10 days of connection interval + 10 days of additional cache for a total of 20 days. I don't think many people will have this setting, but it is theoretically possible. So I think 1 month is the better cutoff time. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Everyone - please remember that there may be reasons why a person "drops out" of SETI other than getting tired of the project or the project's woes... I would suggest that the vast majority of crunchers are not even aware of the "project's woes." BOINC is loaded, and otherwise ignored. |
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
The BOINC system needs a heartbeat detector so that if someone doesn't make *any* contact at all for a week, then all uncrunched WUs are returned to the pool. There is nothing wrong with the system, only the implementation of it by individual projects. Workunits can send "trickles" to the project server that indicate their progress. For example, CPDN does this, and its project servers use this as a "client heartbeat" for the issued workunits. Since their WUs take 6-12 months each to crunch, trickles are critical for the efficiency of the project. I think Einstein and perhaps others could be improved by using trickles. SETI's WUs are really too small IMHO for trickles, and their servers are already overloaded without another server function to run. |
Tommy Joined: 26 Jul 00 Posts: 9 Credit: 530,369 RAC: 0 |
MATT, ARE WE HAVING PROBLEMS IN GRANTING CREDIT? I HAVE NEVER HAD SO MANY PENDING WORK UNITS, OVER 600+ HOURS. MY PERCEPTION IS THAT IT STARTED BUILDING UP AFTER THE LAST BACKUP OUTAGE, COULD BE WRONG ON THAT. By now you probably agree with me there is a problem. I have been a user for over 6 years now (have some understanding of how the system works - very low level and do not want to know more); all I want to do is have my 3 PCs crunch their little processors until they are content. I now have 19 WU pending, that has NEVER happened to me! The rate of credit declined sharply either on Monday or Tuesday. I have been reading all the posts today, some very good thoughts out there. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13854 Credit: 208,696,464 RAC: 304 |
By now you probably agree with me there is a problem. There is no problem. You just have to wait for others to return their results. Grant Darwin NT |
Gary Charpentier Joined: 25 Dec 00 Posts: 31006 Credit: 53,134,872 RAC: 32 |
Nothing wrong or out of the ordinary. As more people cache more work it will take longer for the results to get returned. Expect the granting of credit to get slower. Your machine alone returning a workunit means nothing. Another machine must also return it and with the same results. Just checked and I have 43 for about 3000 cobblestones. Higher than when 3 computers were needed for a result but average since they went down to 2 computers. Why you ask? Well with people caching more work it means that you are more likely to get paired with a slow computer (slow to return result) rather than a fast one. That's life. Gary MATT, ARE WE HAVING PROBLEMS IN GRANTING CREDIT? I HAVE NEVER HAD SO MANY PENDING WORK UNITS, OVER 600+ HOURS. MY PERCEPTION IS THAT IT STARTED BUILDING UP AFTER THE LAST BACKUP OUTAGE, COULD BE WRONG ON THAT. |
Palomar Jack Joined: 9 Nov 04 Posts: 44 Credit: 503,405 RAC: 0 |
Well with people caching more work it means that you are more likely to get paired with a slow computer (slow to return result) rather than a fast one. That's life. No, it's not "life", it's inconsiderate. For example, if you "recruited" Aunt Bertha's computer and it's only on a couple of hours a week to check email and you really think it needs to run a DC project, you don't need a ten day cache. It won't run out of work if there's a two or three day outage, it just won't. And even if it did, just how much credit are you going to lose? A dozen or so off of your RAC will not... make... a difference. |
RandyC Joined: 20 Oct 99 Posts: 714 Credit: 1,704,345 RAC: 0 |
Well with people caching more work it means that you are more likely to get paired with a slow computer (slow to return result) rather than a fast one. That's life. Which just goes to show that every user's situation is different. So different users/machines need different cache settings. BOINC allows that. That's life in the DC world. |
n7rfa Joined: 13 Apr 04 Posts: 370 Credit: 9,058,599 RAC: 0 |
Well with people caching more work it means that you are more likely to get paired with a slow computer (slow to return result) rather than a fast one. That's life. Some folks have gotten "gun shy" because of the recent problems with getting work. As a result, they've upped their cache size to make sure that they will get through any rough times. Then there are those clients that for one reason or another haven't connected to the Berkeley servers for weeks. Their work won't be re-issued until it hits the deadline and then you have to wait some more for the new system to return the result. I'm connected 24x7 and personally, I've upped my cache from 3-5 days to 7 days. This allows for a 4 day weekend, 1 day for Berkeley to fix things, the Tuesday outage, and the splitters to get caught up on either Wednesday or Thursday. I'm currently running my Windows systems down to zero so I can work on them. Once I'm done with the work, I'll go back to 7 days. |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Well with people caching more work it means that you are more likely to get paired with a slow computer (slow to return result) rather than a fast one. That's life. If you only run the computer for a couple hours per week, the % (% of time computer is running & while running, % of time BOINC is allowed to run) stats would drop, affecting the calculation for a 10 day cache. That user won't get a full 10 day cache for very long. And if they only run the machine a couple of hours a day and bumped the cache up to 10, the % stats are already low, which will prevent the user from getting a full 10 day cache. BOINC is designed so that your scenario of inconsiderate people just doesn't happen. So any pending credits you have are due to the speed of the crunchers themselves and not an over-inflated cache setting. Not everyone has a fast computer. As said, that's life. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Well with people caching more work it means that you are more likely to get paired with a slow computer (slow to return result) rather than a fast one. That's life. Well, the user will still get a 10 day cache, but 10 days of their usual pattern of BOINC activity, rather than 10 days of 24/7 mega-overclock multicore multiCPU, which is what people tend to notice. BOINC is designed so that your scenario of inconsiderate people just doesn't happen. LOL - now there's a statement! Could we offer BOINC to the UN, as the solution to all those inconsiderate people who keep starting wars, killing each other, that sort of thing? Seriously, BOINC reacts well to past events - sometimes by adjusting averages, sometimes by taking immediate corrective action (viz. RDCF). But it doesn't predict future events ("Motherboard going to fail in nine days time, I see. Better stop fetching more work, then."), it doesn't prevent external failures ("You knocked the network cable out of the wall jack while you were hoovering last night. I've plugged it back in for you."), and it doesn't prevent its human operator from making spontaneous changes of plan ("Someone was rude to me on a forum. I'm not going to crunch for that lot again."). That's where the inconsiderate/antisocial behaviour comes in - not just having a cache (I have one myself, though less than 10 days) but having a very large cache and just sitting on it, not returning work so that the project, and other crunchers, can clear the results and move on. |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Well, the user will still get a 10 day cache, but 10 days of their usual pattern of BOINC activity, rather than 10 days of 24/7 mega-overclock multicore multiCPU, which is what people tend to notice. Umm.. yeah, that's what I said. 8-) Seriously, BOINC reacts well to past events - sometimes by adjusting averages, sometimes by taking immediate corrective action (viz. RDCF). But it doesn't predict future events ("Motherboard going to fail in nine days time, I see. Better stop fetching more work, then."), it doesn't prevent external failures ("You knocked the network cable out of the wall jack while you were hoovering last night. I've plugged it back in for you."), and it doesn't prevent its human operator from making spontaneous changes of plan ("Someone was rude to me on a forum. I'm not going to crunch for that lot again."). That's where the inconsiderate/antisocial behaviour comes in - not just having a cache (I have one myself, though less than 10 days) but having a very large cache and just sitting on it, not returning work so that the project, and other crunchers, can clear the results and move on. No, it doesn't know future events, but the second it becomes the past, it learns from it. So no, it won't know about a motherboard failure, etc. But it will eventually notice that you're not running at 100% 24/7 and adjust accordingly. It will straighten itself out. I thought I've shown myself to be a reasonable person. I didn't know someone would take a statement of mine to the extreme and infer something that certainly wasn't intended. (Ah, who am I kidding? That's the nature of humans! To take things to extremes when making a point. LOL) Your point is taken and understood, but in doing so, you lost or obfuscated the meaning of my post. |
Andy Lee Robinson Joined: 8 Dec 05 Posts: 630 Credit: 59,973,836 RAC: 0 |
By now you probably agree with me there is a problem. I have no problem with having to wait while other users are working through their caches, but I keep a 1-2 day cache so that I don't keep others waiting. I do (as should all of us) have a problem with those that have buggered off, for whatever reason, and there's only a blunt and unintelligent tool of WU expiry to recover from that situation. The deadlines for that are just far too long. I'm simply proposing a better solution that will enhance workflow, and make the system more efficient and responsive and make more people happy. The credit delay is minimal annoyance, but as a perfectionist and professional system analyst, suboptimal systems irritate me. WU management can and should be improved. Any user that has gone AWOL should have the WUs returned to the pool. Period. I'll leave it to dynamicists to decide thresholds. Those that opt to process a large cache offline can set a flag to be ignored. Those that are connected 24x7 should be more involved and the core should have more awareness of the participants' status. It is simply unacceptable that parts of a supercomputer should just disappear without intervention for months on end while running a job!!! Andy. |
ML1 Joined: 25 Nov 01 Posts: 21209 Credit: 7,508,002 RAC: 20 |
... It is simply unacceptable that parts of a supercomputer should just disappear without intervention for months on end while running a job!!! That is actually a design parameter of Boinc... The system recovers gracefully even if a proportion of the supporting hosts disappear without trace! However, I guess the credit whores will always be upset at whatever small delay is incurred for their credits... Happy crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
However, I guess the credit whores will always be upset at whatever small delay is incurred for their credits... Martin, I hope you will agree, from my posting history, that I'm not a credit whore (and I'm sure that you weren't directing that comment at me - no offence taken). But I get upset too. I get upset because the scientific aims of the project are put at risk, because the ancient and underfunded hardware is stressed beyond its limits by keeping an excessive and unnecessary volume of archival material in working, front-line, active storage. I agree that BOINC is designed to tolerate and recover from stresses like this, but I still think, and say, that it's antisocial to test those stress limits when they can be avoided with a little thoughtfulness. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13854 Credit: 208,696,464 RAC: 304 |
The deadlines for that are just far too long. Shorter deadlines would be good; 2-3 weeks for the longest of Work Units should be more than enough for the slowest of crunchers. It'd help reduce the load on the servers by having less "in progress" work. But as it is, the credit does come through eventually, so it's not a big deal, just an extremely minor one IMHO. Grant Darwin NT |
john_morriss Joined: 5 Nov 99 Posts: 72 Credit: 1,969,221 RAC: 48 |
If we're looking for bad examples, try Computer # 3639917. It's been MIA since Aug 30, sitting on 120 uncrunched Results, including one of mine. It's part of a dozen or so computers run by Technical Edge. All the others are reporting in almost daily... I also noticed that this computer is paired on one WU with Computer # 3742648, which was last heard from Sept 10, with 446 Results. That's a lot of strain on the database from just two computers... Interesting what you can find out just poking around... |
WimTea Joined: 15 Feb 02 Posts: 34 Credit: 909,865 RAC: 0 |
Who can beat comp #3768714 with 719 results, not heard from since Sept 5th... Really, I don't care if it takes a week or two months for a result to get validated and granted credit; as long as it meets the project's scientific goals I consider that to be perfectly OK. |
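
For anyone who wants to see the "heartbeat" idea from this thread in concrete terms (return uncrunched WUs to the pool when a host has made no contact for some cutoff, with an opt-out flag for hosts that deliberately work through a large cache offline), here is a rough sketch. It is not BOINC or SETI@home server code. The `Host` record, the `offline_cache` flag, the `reclaim_stale_work` function and the host IDs and WU names are all made up for illustration, and the 30-day default simply follows the "1 month" cutoff suggested above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Host:
    """Hypothetical server-side record for one volunteer host."""
    host_id: int
    last_contact: datetime          # last scheduler request or trickle seen
    offline_cache: bool = False     # hypothetical opt-out flag for offline crunching
    in_progress: list = field(default_factory=list)  # names of WUs issued, not yet returned


def reclaim_stale_work(hosts, now, cutoff=timedelta(days=30)):
    """Collect WUs from hosts that have been silent for longer than `cutoff`.

    Hosts with the opt-out flag set are skipped. Reclaimed WUs are removed
    from the host record and returned so the caller can reissue them.
    """
    reclaimed = []
    for host in hosts:
        if host.offline_cache:
            continue
        if now - host.last_contact > cutoff:
            reclaimed.extend(host.in_progress)
            host.in_progress = []
    return reclaimed


if __name__ == "__main__":
    now = datetime(2007, 9, 20)
    hosts = [
        Host(1, datetime(2007, 8, 1), in_progress=["wu_a", "wu_b"]),   # silent for 50 days
        Host(2, datetime(2007, 9, 18), in_progress=["wu_c"]),          # contacted 2 days ago
        Host(3, datetime(2007, 7, 1), offline_cache=True, in_progress=["wu_d"]),  # opted out
    ]
    print(reclaim_stale_work(hosts, now))
```

The cutoff is the whole argument in miniature: a one-week value would reclaim work from hosts with a legitimately large cache, while a one-month value leaves room for the 20-day maximum mentioned earlier in the thread.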
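
The cache-size argument in this thread (a 10 day cache means 10 days of the host's usual pattern of activity, not 10 days of 24/7 crunching) can be put into numbers with a small sketch. This is a simplified model and not BOINC's actual work-fetch calculation. The function name `effective_cache_cpu_days` and its parameters are assumptions made for illustration; the two fractions stand in for the "% of time computer is running" and "% of time BOINC is allowed to run" statistics mentioned above, and the 10 + 10 day maximum is the figure quoted in the thread.

```python
# Maximum settings as quoted in the thread: 10 days of connection interval
# plus 10 days of additional cache.
MAX_CONNECT_INTERVAL_DAYS = 10
MAX_ADDITIONAL_CACHE_DAYS = 10
MAX_CACHE_DAYS = MAX_CONNECT_INTERVAL_DAYS + MAX_ADDITIONAL_CACHE_DAYS


def effective_cache_cpu_days(cache_days, on_frac, active_frac, n_cpus=1):
    """CPU-days of work a host would hold under this simplified model.

    cache_days  -- requested cache size in wall-clock days
    on_frac     -- fraction of time the computer is powered on (0..1)
    active_frac -- fraction of powered-on time BOINC is allowed to run (0..1)
    n_cpus      -- number of CPUs/cores crunching
    """
    if not 0 <= cache_days <= MAX_CACHE_DAYS:
        raise ValueError("cache_days must be between 0 and MAX_CACHE_DAYS")
    if not (0 <= on_frac <= 1 and 0 <= active_frac <= 1):
        raise ValueError("fractions must be between 0 and 1")
    return cache_days * on_frac * active_frac * n_cpus


if __name__ == "__main__":
    # A machine that is on for about 2 hours out of a 168-hour week.
    occasional = effective_cache_cpu_days(10, on_frac=2 / 168, active_frac=1.0)
    # A quad-core host crunching 24/7.
    always_on = effective_cache_cpu_days(10, on_frac=1.0, active_frac=1.0, n_cpus=4)
    print(f"occasional host: {occasional:.3f} CPU-days")
    print(f"24/7 quad-core:  {always_on:.3f} CPU-days")
```

Under this model both hosts take about the same wall-clock time to work through their cache; the difference is only in how much work each one is holding, which is the distinction the posts above are drawing.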
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.