Extended Outage July 23 2010 Problems

Author	Message
Pappa Volunteer tester Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0	Message 1018934 - Posted: 24 Jul 2010, 0:30:44 UTC Here is the post-outage report of problems. Please consider a Donation to the Seti Project.

soft^spirit Joined: 18 May 99 Posts: 6497 Credit: 34,134,168 RAC: 0	Message 1019008 - Posted: 24 Jul 2010, 3:18:29 UTC - in response to Message 1018934. After the outage was over, for some reason it stopped requesting CPU work (on an Intel i5 machine), with 4 units uncrunched, estimated at about 1 hour each for 3 processors. I took a nap and the computer had dry CPUs. I hit upload, and it came clear. Running 6.10.58, not attached to other projects, latest NVIDIA drivers, no custom settings or applications. No limits shown. Cache was set to 5 days. Janice

Uli Volunteer tester Joined: 6 Feb 00 Posts: 10923 Credit: 5,996,015 RAC: 1	Message 1019033 - Posted: 24 Jul 2010, 4:05:12 UTC No problems here, except the boards are slowing down. Pluto will always be a planet to me. Seti Ambassador Not too late to order an Anni Shirt

Ianab Volunteer tester Joined: 11 Jun 08 Posts: 732 Credit: 20,635,586 RAC: 5	Message 1019153 - Posted: 24 Jul 2010, 10:36:19 UTC Downloads of new work were painfully slow, with a few retries, probably due to the higher limits this time around. There was not enough congestion to cause a real problem, but I think a slightly lower limit initially would make things smoother, ramping up as the traffic slows down? Ian

JohnDK Volunteer tester Joined: 28 May 00 Posts: 1222 Credit: 451,243,443 RAC: 1,127	Message 1019249 - Posted: 24 Jul 2010, 16:47:31 UTC Not sure if it's a problem, but it's strange how difficult it is to get more tasks at the moment. The Cricket graph shows it's only running at half capacity, but 9 times or more out of 10 there's no work sent when requested. By the way, I'm not short of WUs at all just now, just wondering.

JohnDK Volunteer tester Joined: 28 May 00 Posts: 1222 Credit: 451,243,443 RAC: 1,127	Message 1019256 - Posted: 24 Jul 2010, 17:11:45 UTC The current result creation rate is very low; I guess that's why it's so difficult to get WUs.

Josef W. Segur Volunteer developer Volunteer tester Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 1019281 - Posted: 24 Jul 2010, 19:27:32 UTC - in response to Message 1019256. JohnDK wrote: "The current result creation rate is very low; I guess that's why it's so difficult to get WUs." As long as there are "Results ready to send" of both types, creation merely adds to the end of that queue. If anything, a low creation rate reduces database load somewhat and allows other processes to run a bit more efficiently. Joe

perryjay Volunteer tester Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0	Message 1019286 - Posted: 24 Jul 2010, 19:44:20 UTC - in response to Message 1019256. JohnDK wrote: "The current result creation rate is very low; I guess that's why it's so difficult to get WUs." Looking through the new work I've got, I found a lot of -2s and -3s, meaning they've been out before. Most were timeouts and either aborts or client detaches. One is a -6 that has been waiting since Feb. 26th for someone else to complete it. It looks like a big portion of them were ghosts timing out. Anyway, my point is that the result creation rate isn't all that important to the amount of work going out, so long as there are ghosts, aborts, timeouts, and detaches to send. PROUD MEMBER OF Team Starfire World BOINC

Donald L. Johnson Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20	Message 1019365 - Posted: 25 Jul 2010, 2:31:11 UTC - in response to Message 1019281. Last modified: 25 Jul 2010, 2:32:38 UTC JohnDK wrote: "The current result creation rate is very low; I guess that's why it's so difficult to get WUs." Josef W. Segur replied: "As long as there are 'Results ready to send' of both types, creation merely adds to the end of that queue. If anything, a low creation rate reduces database load somewhat and allows other processes to run a bit more efficiently. Joe" The real bottleneck, besides high network traffic, is the Download Feeder process. It holds only 100 WUs at a time, a mix of S@H Enhanced CPU/GPU, Astropulse, and CUDA tasks. It refills on a 5-6 second cycle, so even though there are several hundred thousand "Results Ready to Send", only 100 are actually available in each 6-second cycle. If your request for work hits the Scheduling Server during the portion of the Feeder cycle when it has no tasks of the type you are requesting, you get the "No Tasks Available" message and have to try again later. So, as Joe and perryjay said, as long as there are plenty of "Results Ready to Send", the actual creation rate is irrelevant.
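The Feeder behaviour described in this post can be illustrated with a toy simulation. This is a minimal sketch, not BOINC's actual feeder code: it assumes the 100 slots are refilled all at once every 6 seconds, that every scheduler request takes exactly one task, and that requests arrive at random (Poisson) intervals. It also ignores the mix of task types. The function name and all parameters are hypothetical.

```python
import random


def simulate_feeder(slots=100, cycle_s=6.0, requests_per_s=50.0,
                    duration_s=600.0, seed=0):
    """Toy model of a fixed-size feeder cache (assumed behaviour, not BOINC code).

    Returns (served, empty): requests that got a task vs. requests that
    found the cache drained and would see "No Tasks Available".
    """
    rng = random.Random(seed)
    available = slots          # the cache starts full
    next_refill = cycle_s      # time of the next refill
    served = empty = 0
    t = 0.0
    while True:
        # The next request arrives after an exponentially distributed gap.
        t += rng.expovariate(requests_per_s)
        if t >= duration_s:
            break
        # Apply any refills that were due before this request arrived.
        while t >= next_refill:
            available = slots
            next_refill += cycle_s
        if available > 0:
            available -= 1
            served += 1
        else:
            empty += 1
    return served, empty


if __name__ == "__main__":
    for rate in (1.0, 50.0):
        served, empty = simulate_feeder(requests_per_s=rate)
        print(f"{rate:5.1f} req/s: served={served}, empty-handed={empty}")
```

Under these assumptions, however many results are "ready to send", at most slots / cycle_s tasks (roughly 17 per second with the figures above) can be handed out, and any requests beyond that come back empty-handed.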

Pappa Volunteer tester Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0	Message 1019392 - Posted: 25 Jul 2010, 5:54:03 UTC Jeff, Eric: Looking at the Cricket graphs, and knowing I have one machine that has been attempting to get work all day, there is a process that is stuck for downloads. Regards

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13745 Credit: 208,696,464 RAC: 304	Message 1019398 - Posted: 25 Jul 2010, 6:21:47 UTC - in response to Message 1019392. Pappa wrote: "Looking at the Cricket graphs, and knowing I have one machine that has been attempting to get work all day, there is a process that is stuck for downloads." It could be the machine. Traffic is around the level it has been after the last few outages once everyone has hit the per-client Work Unit limit. I hit the limit ages back but have been getting work after each Work Unit is completed. Grant Darwin NT

perryjay Volunteer tester Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0	Message 1019723 - Posted: 26 Jul 2010, 15:41:06 UTC Problems?? GHOSTS, GHOSTS, GHOSTS!!!!! There has to be something the guys at Berkeley can do to slow these things down! I've got a ton of them again. A lot of them are past ghosts from others. One of them is already a -6 so it might kill it if it gets sent in again. I have too much work to do a run-down/detach right now so I will have to hold on to them until Friday at the least.

Pappa Volunteer tester Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0	Message 1019963 - Posted: 27 Jul 2010, 4:54:36 UTC Last modified: 27 Jul 2010, 4:54:49 UTC As quoted from Joe: Question: "My latest CPU MB WUs are way underestimated, such as 17 minutes and 6 minutes; this will cause catastrophic failures (-177, here we come) going forward. I have to agree that DA seems to have fumbled the ball again. And it's a pity, as things seemed to be working quite well the last week or two. I have about 100 of these now; I have suspended new tasks until I find a way to handle these. Must I abort them, because they will all error out with -177? Or is there something simple I can do with them? Thanks for your help!" Joe's reply: "There are at least two relatively simple fixes. The more sophisticated one is Fred M's new rescheduler, which you can get from http://www.efmer.eu/forum_tt/index.php?topic=428.0. It can boost the rsc_fpops_bound values for all S@H MB tasks to 5e17, which amounts to more than a year on even the fastest hosts. That removes the protection against a hung task which the bound is meant to provide, but there's no other downside AFAIK. The even simpler alternative is to shut BOINC down completely and do a global replace in client_state.xml of all <rsc_fpops_bound> with <rsc_fpops_bound>3. That boosts the bound by a factor of 4 at least, but affects all tasks for all projects. If you can wait until the beginning of the outage, doing that just twice gives a boost of at least 34. That should be sufficient protection against -177 errors. Joe" So there are some things afoot.
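The arithmetic behind Joe's "factor of 4 at least" and "at least 34" can be checked with a short sketch. It assumes the bound is stored as a plain positive number (decimal or exponent format) directly after the opening tag, so that inserting a `3` simply prepends a digit to the number. The sample values and function names are made up for illustration. To try the replacement on a real client_state.xml, shut BOINC down completely first, as Joe says, and work on a backup copy.

```python
def boost_bounds(xml_text, passes=1):
    """Prepend a '3' to every rsc_fpops_bound value, `passes` times."""
    tag = "<rsc_fpops_bound>"
    for _ in range(passes):
        xml_text = xml_text.replace(tag, tag + "3")
    return xml_text


def boost_factor(value_text, passes=1):
    """Ratio of the boosted bound to the original for one stored value."""
    boosted = "3" * passes + value_text
    return float(boosted) / float(value_text)


if __name__ == "__main__":
    # Hypothetical sample element; real files hold one per workunit.
    sample = "<rsc_fpops_bound>9.99e15</rsc_fpops_bound>"
    print(boost_bounds(sample))
    # A leading digit of 9 is the worst case, a leading digit of 1 the best.
    for value in ("9.99e15", "1.0e15"):
        print(value, boost_factor(value, 1), boost_factor(value, 2))
```

The smallest gain comes from values whose leading digits are all 9s: one pass turns 9.99... into 39.99..., a little over 4 times larger, and a second pass gives 339.99..., a little over 34 times larger. Values with a smaller leading digit gain more.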

Pappa Volunteer tester Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0	Message 1020219 - Posted: 28 Jul 2010, 1:06:14 UTC Okay, the odd one... My machine with the 9800 GT, which had a bunch of the shorties (~3 min each) and a DCF of 0.145, is now showing a DCF of 1.0004, and the run times look normal. The machine with the 250 has not caught up yet. Regards

Pappa Volunteer tester Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0	Message 1020398 - Posted: 28 Jul 2010, 16:55:32 UTC Last modified: 28 Jul 2010, 16:55:44 UTC I have had a user state that they received a Detach and Reattach message for no apparent reason. Can anyone else confirm this? Regards

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13745 Credit: 208,696,464 RAC: 304	Message 1020413 - Posted: 28 Jul 2010, 17:43:26 UTC - in response to Message 1020398. Last modified: 28 Jul 2010, 17:43:44 UTC It hasn't happened to me, but I remember quite a few posts about it during the first couple of extended outages. Grant Darwin NT

Lonnie Christensen Volunteer tester Joined: 1 Feb 04 Posts: 7 Credit: 3,091,656 RAC: 0	Message 1020511 - Posted: 29 Jul 2010, 3:30:22 UTC I can't seem to get any work out of Seti. I have eighteen computers and they are set to accept four days of work ahead. I can't even get one...

Uli Volunteer tester Joined: 6 Feb 00 Posts: 10923 Credit: 5,996,015 RAC: 1	Message 1020534 - Posted: 29 Jul 2010, 4:34:03 UTC - in response to Message 1020511. Welcome to the boards, Lonnie. I won't hide or move your post, but you won't be able to get any work until sometime tomorrow, Berkeley time, just as the title of this thread states.

parl Joined: 22 May 04 Posts: 95 Credit: 4,476,976 RAC: 0	Message 1020843 - Posted: 30 Jul 2010, 17:16:55 UTC Oddly enough, the u/l and d/l servers as well as the splitters are currently not operational. So no work yet.

parl Joined: 22 May 04 Posts: 95 Credit: 4,476,976 RAC: 0	Message 1020846 - Posted: 30 Jul 2010, 17:21:02 UTC OK. I was able to u/l and report, even though the Server status page says the u/l server is Disabled.

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.