Panic Mode On (78) Server Problems?




Message boards : Number crunching : Panic Mode On (78) Server Problems?

Author Message
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 26 May 99
Posts: 7082
Credit: 27,544,151
RAC: 36,221
United Kingdom
Message 1304872 - Posted: 11 Nov 2012, 11:56:09 UTC - in response to Message 1304870.

I cannot believe this!!

I accidentally unset NNT on one machine. I realised quite quickly and reset it. However, although it reported "Project communication failed", when I checked I had 56 ghosts.

Now to get them back I had to unset NNT again, and I got them plus another 81 ghosts!! As this was a mistake, and I don't crunch SETI on this machine any more, I am trying to decide what to do. I don't really want to abandon them, but SETI@Home no longer deserves my time and effort!

Very annoying!!!

Set a smaller cache size, and don't try to get the full 20 resends at once. Just because AP isn't being split doesn't mean everything is magically fixed; scheduler contacts still sometimes take a long time:

11/11/2012 10:26:12 SETI@home [sched_op_debug] Starting scheduler request
11/11/2012 10:26:12 SETI@home Sending scheduler request: Requested by user.
11/11/2012 10:26:12 SETI@home Reporting 1 completed tasks, requesting new tasks for CPU
11/11/2012 10:26:12 SETI@home [sched_op_debug] CPU work request: 409918.70 seconds; 0.00 CPUs
11/11/2012 10:26:12 SETI@home [sched_op_debug] NVIDIA GPU work request: 0.00 seconds; 0.00 GPUs
11/11/2012 10:26:12 SETI@home [sched_op_debug] ATI GPU work request: 0.00 seconds; 0.00 GPUs
11/11/2012 10:30:20 SETI@home Scheduler request completed: got 7 new tasks
11/11/2012 10:30:20 SETI@home [sched_op_debug] Server version 701
11/11/2012 10:30:20 SETI@home Message from server: Resent lost task 04se12aa.20122.12750.140733193388047.10.230_0
11/11/2012 10:30:20 SETI@home Message from server: Resent lost task 04se12aa.20122.12750.140733193388047.10.233_0
11/11/2012 10:30:20 SETI@home Message from server: Resent lost task 04se12aa.20269.12750.140733193388048.10.255_0
11/11/2012 10:30:20 SETI@home Message from server: Resent lost task 04se12aa.20122.12750.140733193388047.10.241_0
11/11/2012 10:30:20 SETI@home Message from server: Resent lost task 04se12aa.20122.12750.140733193388047.10.247_0
11/11/2012 10:30:20 SETI@home Message from server: Resent lost task 04se12aa.20269.12750.140733193388048.10.205_1
11/11/2012 10:30:20 SETI@home Message from server: Resent lost task 04se12aa.20269.12750.140733193388048.10.200_0
11/11/2012 10:30:20 SETI@home Project requested delay of 303 seconds
11/11/2012 10:30:20 SETI@home [sched_op_debug] estimated total CPU job duration: 8399 seconds
11/11/2012 10:30:20 SETI@home [sched_op_debug] estimated total NVIDIA GPU job duration: 0 seconds
11/11/2012 10:30:20 SETI@home [sched_op_debug] estimated total ATI GPU job duration: 0 seconds
11/11/2012 10:30:20 SETI@home [sched_op_debug] handle_scheduler_reply(): got ack for result 29se12ab.30551.24198.140733193388036.10.127_1
11/11/2012 10:30:20 SETI@home [sched_op_debug] Deferring communication for 5 min 3 sec
11/11/2012 10:30:20 SETI@home [sched_op_debug] Reason: requested by project
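As an aside, the ghost resends in a log excerpt like the one above can be tallied by script rather than counted by eye. A minimal sketch (the sample data is abbreviated from the log above; point it at your own event log text in practice):

```python
# Tally "Resent lost task" lines in a pasted BOINC event-log excerpt.
# Sample data abbreviated from the log above.
sample_log = """\
11/11/2012 10:30:20 SETI@home Message from server: Resent lost task 04se12aa.20122.12750.140733193388047.10.230_0
11/11/2012 10:30:20 SETI@home Message from server: Resent lost task 04se12aa.20122.12750.140733193388047.10.233_0
11/11/2012 10:30:20 SETI@home Project requested delay of 303 seconds
"""

resent = [line.rsplit(" ", 1)[1]                # task name is the last field
          for line in sample_log.splitlines()
          if "Resent lost task" in line]
print(len(resent), "ghosts resent this contact")   # 2 ghosts resent this contact
```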

Claggy

Sorry, I've realised I don't care. I will abandon the ones I have, and the others will time out naturally.
____________


Today is life, the only life we're sure of. Make the most of today.

ClaggyProject donor
Volunteer tester
Send message
Joined: 5 Jul 99
Posts: 4141
Credit: 33,627,110
RAC: 28,119
United Kingdom
Message 1304877 - Posted: 11 Nov 2012, 12:03:10 UTC - in response to Message 1304872.

The servers are still in recovery after the AP splitting; no doubt it'll be some time before everyone's ghosts are resent.

There is a scheduler bug fix in the works; hopefully it'll be deployed at SETI Beta on Monday. I'm not expecting it to be a total cure, just a step in the right direction.

Claggy

Profile Khangollo
Avatar
Send message
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1304911 - Posted: 11 Nov 2012, 14:01:15 UTC - in response to Message 1304863.

My RAC has been steadily declining, and I have noticed that the task list for two of my rigs shows most of the assigned tasks under Error, with the status as abandoned. Could anyone tell me why this might occur? The rigs still have all the tasks and are crunching them but, obviously, not gaining any credit for the work being done. Should I reset the rig, or is this something that will get sorted out automatically?

The same happened to me on one computer. After thousands of scheduler timeouts, one request apparently got mangled/misinterpreted badly enough that the server thought I had reset the project and decided to abandon all tasks. That's at least what I think happened...
In any case, you should abort all those tasks with BOINC Manager, as they will not get deleted automatically and the server will just ignore them (you won't get any credit; they were already marked abandoned for you and re-sent to other crunchers).
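If there are hundreds of them, aborting tasks one at a time in BOINC Manager gets tedious; one possible shortcut is to script boinccmd's `--task <URL> <name> abort` operation. This is only a sketch: the task names below are hypothetical placeholders (substitute the ones shown as abandoned on your host), it assumes boinccmd is on your PATH, and the dry-run flag means nothing is aborted until you flip it:

```python
import subprocess

PROJECT_URL = "http://setiathome.berkeley.edu/"
# Hypothetical placeholder names; use the ones marked abandoned on your host.
abandoned = [
    "example_task_name_1_0",
    "example_task_name_2_0",
]

DRY_RUN = True  # set to False to actually issue the aborts

commands = [["boinccmd", "--task", PROJECT_URL, name, "abort"]
            for name in abandoned]

for cmd in commands:
    if DRY_RUN:
        print(" ".join(cmd))        # preview only
    else:
        subprocess.run(cmd, check=True)
```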

____________

Profile Bill GProject donor
Avatar
Send message
Joined: 1 Jun 01
Posts: 349
Credit: 43,136,905
RAC: 47,816
United States
Message 1304961 - Posted: 11 Nov 2012, 16:19:17 UTC - in response to Message 1304877.

The servers are still in recovery after the AP splitting; no doubt it'll be some time before everyone's ghosts are resent.

There is a scheduler bug fix in the works; hopefully it'll be deployed at SETI Beta on Monday. I'm not expecting it to be a total cure, just a step in the right direction.

Claggy


Yes, it will be some time. I am now under 3000 ghosts on one computer and with the cache limit it will be a long time before they are all sent to me. But at least downloads are working as they should.

Most of the CPU tasks are being set to run at High Priority once they are scheduled to run. It is interesting to note that only 5 of the 8 CPU tasks are at High Priority at a time; they start in normal running mode, then become High Priority as a WU gets finished.
____________

Grant (SSSF)
Send message
Joined: 19 Aug 99
Posts: 5868
Credit: 60,607,065
RAC: 47,475
Australia
Message 1304998 - Posted: 11 Nov 2012, 18:13:53 UTC - in response to Message 1304877.

There is a scheduler Bug fix in the works, hopefully it'll be deployed at Seti Beta on Monday, not expecting it to be a total cure, just a step in the right direction,

Will that fix the Scheduler timeout problems, or fix the increasing number of Ghosts that are created when the Scheduler keeps timing out?

____________
Grant
Darwin NT.

Profile Michael W.F. Miles
Avatar
Send message
Joined: 24 Mar 07
Posts: 244
Credit: 28,850,004
RAC: 11,160
Canada
Message 1305011 - Posted: 11 Nov 2012, 18:39:04 UTC

What gets me here is that, after fixing the connection issue with a proxy server, I now have to check on this machine every four hours, as I am only getting enough work to keep it going for four hours before the limits kick in.

I have been running Ghostdet, and two days ago I was getting 54% ghost tasks.
Yesterday was 15%
Today 0% with 200 tasks on board.

They are all mostly shorties though.

Seems to be working itself out.

Now every time I put in a work request, the servers will only do one-to-one:
report one, get one.

I hope this gets solved really fast, as my patience is wearing thin, as most people's is.

We have built the fastest computer system in the world; let's keep it busy.

fscheel
Send message
Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1305017 - Posted: 11 Nov 2012, 19:01:29 UTC

How does one go about finding a proxy that is safe to use?

Frank

ClaggyProject donor
Volunteer tester
Send message
Joined: 5 Jul 99
Posts: 4141
Credit: 33,627,110
RAC: 28,119
United Kingdom
Message 1305020 - Posted: 11 Nov 2012, 19:06:52 UTC - in response to Message 1304998.

There is a scheduler Bug fix in the works, hopefully it'll be deployed at Seti Beta on Monday, not expecting it to be a total cure, just a step in the right direction,

Will that fix the Scheduler timeout problems, or fix the increasing number of Ghosts that are created when the Scheduler keeps timing out?

It'll fix the bug of resending work to the wrong device, i.e. BOINC asks for CPU work only but gets resends for the GPU instead (which wasn't asking for work), which then times out on any VLARs it encounters.

Claggy

Profile Vipin Palazhi
Avatar
Send message
Joined: 29 Feb 08
Posts: 249
Credit: 107,294,587
RAC: 74,691
India
Message 1305226 - Posted: 12 Nov 2012, 2:49:02 UTC - in response to Message 1304911.

The same happened to me on one computer. After thousands of scheduler timeouts, one request apparently got mangled/misinterpreted badly enough that the server thought I had reset the project and decided to abandon all tasks. That's at least what I think happened...
In any case, you should abort all those tasks with BOINC Manager, as they will not get deleted automatically and the server will just ignore them (you won't get any credit; they were already marked abandoned for you and re-sent to other crunchers).

Thanks Khangollo, I shall do that.

Grant (SSSF)
Send message
Joined: 19 Aug 99
Posts: 5868
Credit: 60,607,065
RAC: 47,475
Australia
Message 1305282 - Posted: 12 Nov 2012, 5:52:14 UTC - in response to Message 1305226.


While I was at work, for some reason my internet connection died.
When I was able to reconnect and upload all the work that had piled up, naturally the Scheduler timed out on all requests for work & reporting.
Even with No New Tasks set, it took several attempts to get a response from the Scheduler.
And even now, with only one task to report on one system and a couple on the other, all I'm getting are Scheduler timeout errors.

A few more hours and I'll be completely out of work, even before the weekly outage, during which I was expecting to run out of GPU work at least.
____________
Grant
Darwin NT.

Profile Fred E.Project donor
Volunteer tester
Send message
Joined: 22 Jul 99
Posts: 768
Credit: 24,139,004
RAC: 1
United States
Message 1305351 - Posted: 12 Nov 2012, 13:47:33 UTC

While I was at work, for some reason my internet connection died.
When I was able to reconnect and upload all the work that had piled up, naturally the Scheduler timed out on all requests for work & reporting.
Even with No New Tasks set, it took several attempts to get a response from the Scheduler.
And even now, with only one task to report on one system and a couple on the other, all I'm getting are Scheduler timeout errors.

A few more hours and I'll be completely out of work, even before the weekly outage, during which I was expecting to run out of GPU work at least.

Also getting timeouts, even on NNT. Dropping my max reported setting. I'll also run out because I've got mostly shorties. Low limits and shorties = cruelty to crunchers!

I crunched 3 tasks this morning with 60-day deadlines - 10 Jan. I don't remember seeing that before.
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

N9JFE David SProject donor
Volunteer tester
Avatar
Send message
Joined: 4 Oct 99
Posts: 11992
Credit: 14,659,228
RAC: 12,191
United States
Message 1305380 - Posted: 12 Nov 2012, 15:25:16 UTC

I haven't put my hands on my i7 for a few days, but I will have to when I get home from work today. What concerns me, though, is that my account page says it only has 83 in progress, well below its limit of 200. Before all the trouble started, it typically ran in the 1100-1600 range. I know it just downloaded some new units from Einstein, but I don't know what the cause/effect relationship is. Is it getting Einstein because it can't get SETI, or is it feeling debt to Einstein and favoring it for now? My other two machines each reported one unit back to Einstein over the weekend without asking for more, leaving one of them with only SETI work on board. And I just slid back a position in my joining date class. :-(

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


N9JFE David SProject donor
Volunteer tester
Avatar
Send message
Joined: 4 Oct 99
Posts: 11992
Credit: 14,659,228
RAC: 12,191
United States
Message 1305533 - Posted: 12 Nov 2012, 20:29:04 UTC - in response to Message 1305408.
Last modified: 12 Nov 2012, 20:31:31 UTC

Media alert......
The kitties have inbound WUs!!!!

Purrrrr......

[edit]Purr also for the fact that, according to the weather thing in my signature at the time I posted this, we actually got up to 33F here. Woo hoo.
____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Keith White
Avatar
Send message
Joined: 29 May 99
Posts: 370
Credit: 2,896,916
RAC: 2,406
United States
Message 1305604 - Posted: 12 Nov 2012, 23:03:11 UTC - in response to Message 1305408.

You do seem to have a metric buttload of GPU tasks, even though your CPU pile finally dropped below 100 on two kitties.

I think the all tasks web page still needs a filter for CPU vs GPU tasks, or at least a count.
____________
"Life is just nature's way of keeping meat fresh." - The Doctor

Profile ivan
Volunteer tester
Avatar
Send message
Joined: 5 Mar 01
Posts: 625
Credit: 143,542,422
RAC: 150,031
United Kingdom
Message 1305632 - Posted: 13 Nov 2012, 0:13:01 UTC - in response to Message 1305604.

You do seem to have a metric buttload of GPU tasks, even though your CPU pile finally dropped below 100 on two kitties.

I think the all tasks web page still needs a filter for CPU vs GPU tasks, or at least a count.

Well, this does it for me, but for a particular machine with particular software. On a Windows machine you'd need either Cygwin or some replacement for wc (word count); ISTR there's a DOS equivalent of grep -- find? Bug: the grep for 'fermi' returns a line not associated with jobs in progress, so the third line overcounts by one.

[eesridr:BOINC] > cat showjobs
date
grep 'received_time' client_state.xml|wc
grep 'fermi' client_state.xml|wc
[eesridr:BOINC] > . showjobs
Tue Nov 13 00:06:50 GMT 2012
    687     687   36411
    589     589   23560
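For anyone without Cygwin, the same counts can be taken with a few lines of Python instead of grep|wc. A sketch only: it mimics grep's line-matching on a tiny stand-in string (read the real client_state.xml from your BOINC data directory instead), and it inherits the same 'fermi' overcount ivan mentions:

```python
def count_matching_lines(text, pattern):
    """Equivalent of `grep PATTERN file | wc -l`: lines containing the pattern."""
    return sum(1 for line in text.splitlines() if pattern in line)

# Tiny stand-in for client_state.xml; in practice use open("client_state.xml").read().
sample = """\
<result>
<received_time>1352764010.000000</received_time>
</result>
<result>
<received_time>1352764020.000000</received_time>
</result>
<plan_class>cuda_fermi</plan_class>
"""

print(count_matching_lines(sample, "received_time"))  # 2 tasks on board
print(count_matching_lines(sample, "fermi"))          # 1 GPU-plan line
```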

____________

Keith White
Avatar
Send message
Joined: 29 May 99
Posts: 370
Credit: 2,896,916
RAC: 2,406
United States
Message 1305645 - Posted: 13 Nov 2012, 0:56:52 UTC - in response to Message 1305640.

I was just talking about one of the rigs that recently got CPU units. You still had around 1500 GPU units for the 3 GPUs; at 500 seconds per GPU unit, that's nearly 3 days' worth left. Even if you get down to 100 per GPU, that's still half a day's worth. What did you normally run your queue at? 10 days?

It doesn't make a difference in bandwidth usage in the long run, once the whole seti@home ecosystem hits steady state; it'll just mean that when a super cruncher's nVidia card goes off the rails, they can only shaft at most 100 wingmen per GPU as opposed to thousands. (Please check your results daily -- not directed at you msattler, just nVidia users in general -- to catch when your system starts to produce mostly inconclusive/error/invalid GPU results.)
____________
"Life is just nature's way of keeping meat fresh." - The Doctor

juan BFBProject donor
Volunteer tester
Avatar
Send message
Joined: 16 Mar 07
Posts: 5414
Credit: 306,538,657
RAC: 328,896
Brazil
Message 1305773 - Posted: 13 Nov 2012, 12:12:47 UTC - in response to Message 1305645.

I was just talking about one of the rigs that recently got CPU units. You still had around 1500 GPU units for the 3 GPUs; at 500 seconds per GPU unit, that's nearly 3 days' worth left. Even if you get down to 100 per GPU, that's still half a day's worth. What did you normally run your queue at? 10 days?

It doesn't make a difference in bandwidth usage in the long run, once the whole seti@home ecosystem hits steady state; it'll just mean that when a super cruncher's nVidia card goes off the rails, they can only shaft at most 100 wingmen per GPU as opposed to thousands. (Please check your results daily -- not directed at you msattler, just nVidia users in general -- to catch when your system starts to produce mostly inconclusive/error/invalid GPU results.)

Each 690 crunches a WU in less than 7 min, running 3 WUs at a time on each GPU (it has 2), which is about 48 per hour or more. So on a big cruncher (3x690) a 100 WU cache is simply ridiculous; it won't last even 1 hour. I have 2x690 sleeping on a bed waiting for the limits to be raised; with the current limits it is a waste of time/resources to put them to work, as they simply will not receive the WUs they need to keep busy.
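For what it's worth, juan's rates can be sanity-checked from his own figures (a sketch; the 7-minute runtime and 3-WUs-at-a-time numbers are taken from the post, not measured):

```python
# Figures from the post: a GTX 690 has 2 GPUs, each running 3 WUs at a time,
# and a WU finishes in about 7 minutes.
gpus_per_card = 2
wus_at_a_time = 3
minutes_per_wu = 7

per_card_per_hour = gpus_per_card * wus_at_a_time * 60 / minutes_per_wu
print(round(per_card_per_hour))       # ~51, i.e. "about 48 per hour or more"

# On a 3x690 rig, a 100-WU cache drains in well under an hour:
rig_per_hour = per_card_per_hour * 3
print(round(100 / rig_per_hour, 2))   # ~0.65 hours
```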
____________

WezH
Volunteer tester
Send message
Joined: 19 Aug 99
Posts: 113
Credit: 4,536,287
RAC: 30,446
Finland
Message 1305804 - Posted: 13 Nov 2012, 19:55:14 UTC

Yay! Back from the normal Tuesday outage. (BTW, the people in the lab are really morning people...)

Let's see what comes next... Cricket at the top now, AP splitting disabled... Let's watch and hope for better...
____________



Copyright © 2014 University of California