Panic Mode On (79) Server Problems?

Message boards : Number crunching : Panic Mode On (79) Server Problems?

Previous · 1 · 2 · 3 · 4 · 5 · 6 · 7 · 8 . . . 22 · Next

kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1308098 - Posted: 20 Nov 2012, 14:02:45 UTC

Hopefully it's ghost task reissues, which should be returned a bit more quickly by hungry hosts, helping to clean up the database.
Results in the field and results awaiting validation have both been dropping.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1308098
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1308106 - Posted: 20 Nov 2012, 14:40:14 UTC - in response to Message 1308023.  

On a side note: since switching from 6.2.19 to 6.10.58, I have noticed that my cache is not being processed in FIFO order. It is all APs... about 17 days' worth, and APs with a deadline four days sooner than the ones that keep getting picked to run next are still sitting there, not getting started.

I know there are cache/queue changes along the way through the build history, but each WU has a 25-day deadline, so wouldn't it still make sense to run the soonest deadlines first (which also happen to be the ones that were acquired first)? I mean, I'm sure it works out in the end, but it's just weird.

Not true. FIFO does not necessarily equate to EDF (earliest deadline first). A task's deadline is determined by how long it is estimated to take to run. So it is possible to download a bunch of tasks yesterday that are estimated to take 20 hours to run and have deadlines in late January, and then a bunch today that are estimated to take 1 hour and have deadlines in mid-December. And don't forget, if you're running more than one project, BOINC has to balance all of them, and different projects do their time estimates and deadlines differently.

However, if all of the tasks you're looking at have the same time estimate, then I agree, it is weird for them not to run FIFO, which presumably is also EDF.
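David's point can be sketched in a few lines of Python. This is a purely hypothetical illustration with made-up batch names, dates, and deadlines (not BOINC's actual scheduler code): two batches downloaded a day apart sort one way by arrival and the opposite way by deadline.

```python
from datetime import datetime

# Hypothetical batches matching the example above: the earlier download has
# the longer estimated runtime, and therefore the *later* deadline.
tasks = [
    {"name": "long-batch",  "downloaded": datetime(2012, 11, 19),
     "est_hours": 20, "deadline": datetime(2013, 1, 25)},
    {"name": "short-batch", "downloaded": datetime(2012, 11, 20),
     "est_hours": 1,  "deadline": datetime(2012, 12, 15)},
]

# FIFO: oldest download first.  EDF: earliest deadline first.
fifo_order = [t["name"] for t in sorted(tasks, key=lambda t: t["downloaded"])]
edf_order  = [t["name"] for t in sorted(tasks, key=lambda t: t["deadline"])]

print(fifo_order)  # ['long-batch', 'short-batch']
print(edf_order)   # ['short-batch', 'long-batch']
```

With equal time estimates (and hence equal deadline offsets from issue), the two orders coincide, which is why a cache of identical APs would be expected to run FIFO.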

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1308106
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1308127 - Posted: 20 Nov 2012, 16:02:34 UTC - in response to Message 1308106.  

On a side note: since switching from 6.2.19 to 6.10.58, I have noticed that my cache is not being processed in FIFO order. It is all APs... about 17 days' worth, and APs with a deadline four days sooner than the ones that keep getting picked to run next are still sitting there, not getting started.

I know there are cache/queue changes along the way through the build history, but each WU has a 25-day deadline, so wouldn't it still make sense to run the soonest deadlines first (which also happen to be the ones that were acquired first)? I mean, I'm sure it works out in the end, but it's just weird.

Not true. FIFO does not necessarily equate to EDF (earliest deadline first). A task's deadline is determined by how long it is estimated to take to run. So it is possible to download a bunch of tasks yesterday that are estimated to take 20 hours to run and have deadlines in late January, and then a bunch today that are estimated to take 1 hour and have deadlines in mid-December. And don't forget, if you're running more than one project, BOINC has to balance all of them, and different projects do their time estimates and deadlines differently.

However, if all of the tasks you're looking at have the same time estimate, then I agree, it is weird for them not to run FIFO, which presumably is also EDF.

If tasks from the same download batch don't appear to run in FIFO (and be very careful to observe that you haven't applied a sort order to one of the columns in BOINC Manager, before you jump to that conclusion), then it's a long-standing bug which applies some slight randomisation to the display order when data is transferred from the server to the BOINC Client to the BOINC Manager. In short, it's cosmetic only.

BOINC v6.10.58 is still a very old version. We applied a lot of pressure to get that bug (and many others) fixed - I forget just when. The latest ones - I'm running v7.0.38 - have had display order and running order in perfect step for a long time - possibly even since sometime in the v6.12.xx range - but I wouldn't advise upgrading just for this. Like I said, it's cosmetic only.
ID: 1308127
fscheel

Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1308180 - Posted: 20 Nov 2012, 23:19:34 UTC

Welp... Looks like the cricket graph just bottomed out. :(
ID: 1308180
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1308181 - Posted: 20 Nov 2012, 23:21:07 UTC - in response to Message 1308180.  
Last modified: 20 Nov 2012, 23:22:02 UTC

Welp... Looks like the cricket graph just bottomed out. :(

Just waiting for the next server status update...
[Edit] Which is almost totally green...
ID: 1308181
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1308237 - Posted: 21 Nov 2012, 3:25:40 UTC - in response to Message 1308127.  

On a side note: since switching from 6.2.19 to 6.10.58, I have noticed that my cache is not being processed in FIFO order. It is all APs... about 17 days' worth, and APs with a deadline four days sooner than the ones that keep getting picked to run next are still sitting there, not getting started.

I know there are cache/queue changes along the way through the build history, but each WU has a 25-day deadline, so wouldn't it still make sense to run the soonest deadlines first (which also happen to be the ones that were acquired first)? I mean, I'm sure it works out in the end, but it's just weird.

Not true. FIFO does not necessarily equate to EDF (earliest deadline first). A task's deadline is determined by how long it is estimated to take to run. So it is possible to download a bunch of tasks yesterday that are estimated to take 20 hours to run and have deadlines in late January, and then a bunch today that are estimated to take 1 hour and have deadlines in mid-December. And don't forget, if you're running more than one project, BOINC has to balance all of them, and different projects do their time estimates and deadlines differently.

However, if all of the tasks you're looking at have the same time estimate, then I agree, it is weird for them not to run FIFO, which presumably is also EDF.

If tasks from the same download batch don't appear to run in FIFO (and be very careful to observe that you haven't applied a sort order to one of the columns in BOINC Manager, before you jump to that conclusion), then it's a long-standing bug which applies some slight randomisation to the display order when data is transferred from the server to the BOINC Client to the BOINC Manager. In short, it's cosmetic only.

BOINC v6.10.58 is still a very old version. We applied a lot of pressure to get that bug (and many others) fixed - I forget just when. The latest ones - I'm running v7.0.38 - have had display order and running order in perfect step for a long time - possibly even since sometime in the v6.12.xx range - but I wouldn't advise upgrading just for this. Like I said, it's cosmetic only.

Thank you for the very informative insight into my observation. Since the tasks are all APs, they all have a 25-day deadline from when they were issued. In 6.2.19, they would crunch in FIFO order unless, for some crazy reason, high-priority mode kicked in. I switched to 6.10.58 a few days ago and, for example, I have a pile of APs that are due Dec 3, but ones due Dec 6 were running in high priority instead. High priority has since ended, and the ones due Dec 3 and 4 still haven't been touched, but the Dec 6-8 ones are being crunched pretty much in order.

I do notice that the sort order in Manager operates a little differently than in the older version, but I figure it will sort itself out eventually.
Linux laptop:
record uptime: 1511d 20h 19m (ended when the power brick gave up)
ID: 1308237
Keith White
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 1308241 - Posted: 21 Nov 2012, 4:38:41 UTC

Well, I'm back to NNT if I want to get an acknowledgement from the server of the tasks being reported. Allowing tasks greets me with:

11/20/2012 11:21:44 PM | SETI@home | work fetch resumed by user
11/20/2012 11:25:39 PM | SETI@home | Sending scheduler request: To fetch work.
11/20/2012 11:25:39 PM | SETI@home | Requesting new tasks for CPU and ATI
11/20/2012 11:25:47 PM | | Project communication failed: attempting access to reference site
11/20/2012 11:25:47 PM | SETI@home | Scheduler request failed: Failure when receiving data from the peer
11/20/2012 11:25:49 PM | | Internet access OK - project servers may be temporarily down.
11/20/2012 11:27:13 PM | SETI@home | Sending scheduler request: To fetch work.
11/20/2012 11:27:13 PM | SETI@home | Requesting new tasks for CPU and ATI
11/20/2012 11:32:26 PM | | Project communication failed: attempting access to reference site
11/20/2012 11:32:26 PM | SETI@home | Scheduler request failed: Timeout was reached
11/20/2012 11:32:28 PM | | Internet access OK - project servers may be temporarily down.

I currently have 15 ghosts, all GPU units, and counting those ghosts I'm now down to 104 units "In Progress". I'm probably down to under two days' worth of units for either the CPU or GPU, and the only reason I haven't run out of GPU units, other than it being a weak GPU, is that I routinely suspend GPU crunching to play games or watch movies.

Well, Turkey Day is coming up; I guess even a computer could use the break.
"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1308241
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 1308266 - Posted: 21 Nov 2012, 7:30:30 UTC - in response to Message 1308241.  
Last modified: 21 Nov 2012, 7:40:48 UTC

Well, since the outage I've picked up some work. I'm also getting new errors when trying to contact the scheduler.
Still getting the timeouts, but on top of that I'm now getting "Server returned nothing (no headers, no data)" & "Failure when receiving data from the peer".
As before, even with NNT set, whether or not you get a response from the scheduler seems to depend on the wind direction & how you hold your tongue while clicking repeatedly on the Retry button.
Grant
Darwin NT
ID: 1308266
fscheel

Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1308321 - Posted: 21 Nov 2012, 12:20:03 UTC

As expected, with the AP splitters not running this morning I am able to connect, and the tasks are flowing quite well.

Frank
ID: 1308321
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1308364 - Posted: 21 Nov 2012, 14:51:04 UTC

My i7 is finally up to its full 200 WU limit. I suppose this means it finished enough Einstein GPU work to ask Seti for some, and the Seti servers were actually able to deliver it.

I also see that the five APs I got yesterday are done already, four valid and one pending.

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1308364
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 1308417 - Posted: 21 Nov 2012, 18:11:11 UTC - in response to Message 1308266.  
Last modified: 21 Nov 2012, 18:24:41 UTC

Still getting the timeouts, but on top of that I'm now getting "Server returned nothing (no headers, no data)" & "Failure when receiving data from the peer".

And "Couldn't connect to server" pops up occasionally as well.

Well, it was occasional.
20 minutes of clicking on Update with NNT set & that's the only response I've got.

EDIT- just had a look at the server status page & it appears the Scheduler has been disabled.
Grant
Darwin NT
ID: 1308417
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1308437 - Posted: 21 Nov 2012, 18:32:48 UTC

Well, something's afoot in da lab...
Scheduling server is disabled and the crickets take a dive.
Hmmmmmmmmmmm.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1308437
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1308445 - Posted: 21 Nov 2012, 18:54:20 UTC

Message from Eric:

We've got some things to try. Let us know if it starts working.


I find these two helpful:

findstr /C:"[SETI@home] Scheduler request failed: " stdoutdae.txt >sched_failures-%computername%.txt

findstr /C:"[SETI@home] Scheduler request completed: " stdoutdae.txt >sched_successes-%computername%.txt


They work in the "command prompt" environment in Windows.

Save them (separately or together) in one or two files in BOINC's Data directory: give the files names with the extension ".cmd"

Then, double-clicking the file(s) will quickly give you an overview of how well the scheduler requests have been going.

Don't swamp Eric with data, but if a few of us (those who feel confident working with that minimalist instruction - don't bother if you're not comfortable doing that) keep an eye on his experiments and provide feedback, it may help. Remember your logs will be timestamped in your local timezone - please supply the UTC offset so he can match them up with the server changes.
ID: 1308445
mikeej42

Joined: 26 Oct 00
Posts: 109
Credit: 791,875,385
RAC: 9
United States
Message 1308495 - Posted: 21 Nov 2012, 20:23:15 UTC - in response to Message 1308445.  

After the change(s) this afternoon, I had several nodes with empty caches that could not get a successful scheduler update. I was able to get them downloading some tasks by decreasing the minimum work buffer to 0.25 days. Now they are slowly getting some resent tasks.
ID: 1308495
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1308497 - Posted: 21 Nov 2012, 20:33:27 UTC

Things seem to have started again, and this time we're talking to Synergy over the Campus data network (128.32.18.157) - anybody using a manually configured hosts file, please note. We're still using setiboinc.ssl.berkeley.edu, so the proxies should pick up the change automatically.

So far, the only difference that I've noticed (apart from the fact that it works...) is a re-allocation and download of some of the little graphics files used in Simple View.
ID: 1308497
mikeej42

Joined: 26 Oct 00
Posts: 109
Credit: 791,875,385
RAC: 9
United States
Message 1308504 - Posted: 21 Nov 2012, 20:53:39 UTC - in response to Message 1308497.  

After I flushed the DNS caches on all my nodes I was able to go back to multi-day work buffers and got scheduler updates to complete.
ID: 1308504
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1308505 - Posted: 21 Nov 2012, 20:59:20 UTC - in response to Message 1308504.  

After I flushed the DNS caches on all my nodes I was able to go back to multi-day work buffers and got scheduler updates to complete.

Ah. So that's why I can't download the modest few I've been allocated...
ID: 1308505
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1308506 - Posted: 21 Nov 2012, 21:01:47 UTC - in response to Message 1308445.  

Message from Eric:

We've got some things to try. Let us know if it starts working.


I find these two helpful:

findstr /C:"[SETI@home] Scheduler request failed: " stdoutdae.txt >sched_failures-%computername%.txt

findstr /C:"[SETI@home] Scheduler request completed: " stdoutdae.txt >sched_successes-%computername%.txt


They work in the "command prompt" environment in Windows.

Save them (separately or together) in one or two files in BOINC's Data directory: give the files names with the extension ".cmd"

Then, double-clicking the file(s) will quickly give you an overview of how well the scheduler requests have been going.

Don't swamp Eric with data, but if a few of us (those who feel confident working with that minimalist instruction - don't bother if you're not comfortable doing that) keep an eye on his experiments and provide feedback, it may help. Remember your logs will be timestamped in your local timezone - please supply the UTC offset so he can match them up with the server changes.

I had not thought to check the logs that way. Quite a good idea. I took it a bit further and added a third search for just "[SETI@home] Scheduler request failed: Timeout was reached", to separate out the other failures. Then I have the batch file count the lines and give me the failure percentage, total and timeout-only. So far, checking several machines with data going back to the 5th, the failure rate is between 14% and 19% for all failures.
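The same counting can be done cross-platform. Here is a hypothetical Python sketch: the match strings are the ones from the findstr commands quoted above, while the function name and the example log path are assumptions for illustration, not official BOINC tooling.

```python
# Hypothetical sketch: count scheduler successes/failures in BOINC's
# stdoutdae.txt, matching the same strings as the findstr commands above.
FAILED    = "[SETI@home] Scheduler request failed: "
COMPLETED = "[SETI@home] Scheduler request completed: "
TIMEOUT   = "[SETI@home] Scheduler request failed: Timeout was reached"

def scheduler_stats(lines):
    """Return (total failures, timeout failures, failure percentage)."""
    n_fail    = sum(1 for ln in lines if FAILED in ln)
    n_timeout = sum(1 for ln in lines if TIMEOUT in ln)
    n_ok      = sum(1 for ln in lines if COMPLETED in ln)
    total = n_fail + n_ok
    pct = 100.0 * n_fail / total if total else 0.0
    return n_fail, n_timeout, pct

# Usage (path assumes a default Windows install; adjust to your Data directory):
# with open(r"C:\ProgramData\BOINC\stdoutdae.txt", errors="replace") as f:
#     print(scheduler_stats(f))
```

Timeout lines are counted in both tallies, matching the "total and timeout" percentages described above.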
SETI@home classic workunits: 93,865 · CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1308506
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1308508 - Posted: 21 Nov 2012, 21:21:37 UTC - in response to Message 1308497.  
Last modified: 21 Nov 2012, 21:29:02 UTC

Things seem to have started again, and this time we're talking to Synergy over the Campus data network (128.32.18.157) - anybody using a manually configured hosts file, please note. We're still using setiboinc.ssl.berkeley.edu, so the proxies should pick up the change automatically.

So far, the only difference that I've noticed (apart from the fact that it works...) is a re-allocation and download of some of the little graphics files used in Simple View.


No real change on my side: downloads are very slow without a proxy (<0.5 kbps), and only a little better with one (still <5 kbps). The same host/connection gives >1 MBps downloading an Einstein WU, so the slowness is not my internet connection. The scheduler is very slow too, and uploads are fast but take a long time to clear from the screen (I don't know what you call the state a task is in after the upload reaches 100%).

Bad, but at least data is flowing with little or no error (I really haven't seen any scheduler error yet)... though it takes more time to download than to crunch, so at these rates the caches will never fill on the fastest hosts, even with 100 WU.

But the AP splitters are still offline...
ID: 1308508
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1308511 - Posted: 21 Nov 2012, 21:31:21 UTC - in response to Message 1308497.  
Last modified: 21 Nov 2012, 21:41:10 UTC

Things seem to have started again, and this time we're talking to Synergy over the Campus data network (128.32.18.157) - anybody using a manually configured hosts file, please note. We're still using setiboinc.ssl.berkeley.edu, so the proxies should pick up the change automatically.

So far, the only difference that I've noticed (apart from the fact that it works...) is a re-allocation and download of some of the little graphics files used in Simple View.

Brilliant - scheduler contacts now just work. Even if I get no work (at the Main project), at least we've got a workaround for the scheduler timeouts. For example:

21/11/2012 21:37:27 SETI@home Beta Test [sched_op_debug] Starting scheduler request
21/11/2012 21:37:27 SETI@home Beta Test Sending scheduler request: To fetch work.
21/11/2012 21:37:27 SETI@home Beta Test Requesting new tasks for CPU and GPU
21/11/2012 21:37:27 SETI@home Beta Test [sched_op_debug] CPU work request: 82381.29 seconds; 0.00 CPUs
21/11/2012 21:37:27 SETI@home Beta Test [sched_op_debug] NVIDIA GPU work request: 0.00 seconds; 0.00 GPUs
21/11/2012 21:37:27 SETI@home Beta Test [sched_op_debug] ATI GPU work request: 20239.74 seconds; 0.00 GPUs
21/11/2012 21:37:33 SETI@home Beta Test Scheduler request completed: got 20 new tasks
21/11/2012 21:37:33 SETI@home Beta Test [sched_op_debug] Server version 701
21/11/2012 21:37:33 SETI@home Beta Test Project requested delay of 7 seconds
21/11/2012 21:37:33 SETI@home Beta Test [sched_op_debug] estimated total CPU job duration: 76666 seconds
21/11/2012 21:37:33 SETI@home Beta Test [sched_op_debug] estimated total NVIDIA GPU job duration: 0 seconds
21/11/2012 21:37:33 SETI@home Beta Test [sched_op_debug] estimated total ATI GPU job duration: 19036 seconds
21/11/2012 21:37:33 SETI@home Beta Test [sched_op_debug] Deferring communication for 7 sec
21/11/2012 21:37:33 SETI@home Beta Test [sched_op_debug] Reason: requested by project


Claggy
ID: 1308511
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.