Message boards : Number crunching : Panic Mode On (79) Server Problems?
kittyman · Joined: 9 Jul 00 · Posts: 51468 · Credit: 1,018,363,574 · RAC: 1,004
Hopefully, it's ghost-task reissues, which should be returned a bit more quickly by hungry hosts, helping to clean up the database. Results in the field and results awaiting validation have both been dropping.
"Freedom is just Chaos, with better lighting." Alan Dean Foster
David S · Joined: 4 Oct 99 · Posts: 18352 · Credit: 27,761,924 · RAC: 12
"On a side note, since switching from 6.2.19 to 6.10.58, I have noticed that my cache is not being processed in FIFO order. It is all APs, about 17 days' worth, and APs with a deadline four days sooner than the ones that keep getting picked to run next are still sitting there not getting started."
Not true. FIFO does not necessarily equate to EDF (earliest deadline first). A task's deadline is determined by how long it is estimated to take to run. So it is possible to download a bunch of tasks yesterday that are estimated to take 20 hours to run and have deadlines in late January, and then a bunch today that are estimated to take 1 hour and have deadlines in mid-December. And don't forget, if you're running more than one project, BOINC has to balance all of them, and different projects set their time estimates and deadlines differently. However, if all of the tasks you're looking at have the same time estimate, then I agree, it is weird for them not to run FIFO, which presumably is also EDF.
David
Sitting on my butt while others boldly go, Waiting for a message from a small furry creature from Alpha Centauri.
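To make the FIFO-versus-EDF distinction concrete, here is a minimal Python sketch. It is a simplification, not the BOINC client's actual scheduler (the real client also accounts for things like project resource shares and deadline risk); the task names and dates are hypothetical, chosen to mirror the 20-hour/1-hour example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Task:
    name: str
    received: datetime  # when the task was downloaded
    deadline: datetime  # report deadline set by the server


def fifo_order(tasks):
    """First in, first out: run tasks in the order they were downloaded."""
    return [t.name for t in sorted(tasks, key=lambda t: t.received)]


def edf_order(tasks):
    """Earliest deadline first: run the task that is due soonest."""
    return [t.name for t in sorted(tasks, key=lambda t: t.deadline)]


# Hypothetical batches: long tasks fetched yesterday get late deadlines,
# short tasks fetched today get early ones.
yesterday = datetime(2012, 11, 20)
today = datetime(2012, 11, 21)
tasks = [
    Task("long_1", yesterday, yesterday + timedelta(days=60)),
    Task("long_2", yesterday, yesterday + timedelta(days=60)),
    Task("short_1", today, today + timedelta(days=20)),
]

print(fifo_order(tasks))  # ['long_1', 'long_2', 'short_1']
print(edf_order(tasks))   # ['short_1', 'long_1', 'long_2']
```

The short task was downloaded last but is due first (Dec 11 versus Jan 19), so the two policies disagree even though nothing is wrong with either.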
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14650 · Credit: 200,643,578 · RAC: 874
"On a side note, since switching from 6.2.19 to 6.10.58, I have noticed that my cache is not being processed in FIFO order. It is all APs, about 17 days' worth, and APs with a deadline four days sooner than the ones that keep getting picked to run next are still sitting there not getting started."
If tasks from the same download batch don't appear to run in FIFO order (and be very careful to check that you haven't applied a sort order to one of the columns in BOINC Manager before you jump to that conclusion), then it's a long-standing bug which applies some slight randomisation to the display order when data is transferred from the server to the BOINC Client to the BOINC Manager. In short, it's cosmetic only. BOINC v6.10.58 is still a very old version. We applied a lot of pressure to get that bug (and many others) fixed - I forget just when. The latest ones - I'm running v7.0.38 - have had display order and running order in perfect step for a long time, possibly even since sometime in the v6.12.xx range, but I wouldn't advise upgrading just for this. Like I said, it's cosmetic only.
fscheel · Joined: 13 Apr 12 · Posts: 73 · Credit: 11,135,641 · RAC: 0
Welp... Looks like the cricket graph just bottomed out. :(
ivan · Joined: 5 Mar 01 · Posts: 783 · Credit: 348,560,338 · RAC: 223
"Welp... Looks like the cricket graph just bottomed out. :("
Just waiting for the next server status update... [Edit] Which is almost totally green...
Cosmic_Ocean · Joined: 23 Dec 00 · Posts: 3027 · Credit: 13,516,867 · RAC: 13
"On a side note, since switching from 6.2.19 to 6.10.58, I have noticed that my cache is not being processed in FIFO order. It is all APs, about 17 days' worth, and APs with a deadline four days sooner than the ones that keep getting picked to run next are still sitting there not getting started."
Thank you for the very informative insight into my observation. Since the tasks are all APs, they all have a 25-day deadline from when they were issued. In 6.2.19, they would crunch in FIFO order, unless for some crazy reason high priority mode kicked in. I switched to 6.10.58 a few days ago and, for example, I have a pile of APs that are due Dec 3, but ones due Dec 6 were running in high priority instead. High priority has since ended, and the ones due Dec 3 and 4 still haven't been touched, but the ones due Dec 6-8 are being crunched pretty much in order. I do notice that the sort order in Manager operates a little differently than in the older version, but I figure it will sort itself out eventually.
Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
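The reasoning in that reply can be checked with a few lines of Python. If every AP task gets the same 25-day window from issue (as stated in the post), sorting by issue time and sorting by deadline must give the same order, so a client that runs a Dec 6 task ahead of a Dec 3 one is presumably choosing by something other than deadline alone. A minimal sketch; the issue times below are hypothetical.

```python
from datetime import datetime, timedelta

# Per the post: AP tasks are due 25 days after they are issued.
AP_DEADLINE_WINDOW = timedelta(days=25)


def deadline(issued):
    """Report deadline for an AP task issued at `issued`."""
    return issued + AP_DEADLINE_WINDOW


# Hypothetical issue times, deliberately listed out of order.
issued = [
    datetime(2012, 11, 11, 9, 30),
    datetime(2012, 11, 8, 14, 0),
    datetime(2012, 11, 9, 22, 15),
]

by_issue = sorted(issued)                   # FIFO order
by_deadline = sorted(issued, key=deadline)  # EDF order

# With a fixed window the two orders are identical.
assert by_issue == by_deadline
print([deadline(t).strftime("%b %d") for t in by_issue])  # ['Dec 03', 'Dec 04', 'Dec 06']
```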
Keith White · Joined: 29 May 99 · Posts: 392 · Credit: 13,035,233 · RAC: 22
Well, I'm back to setting NNT (No New Tasks) if I want the server to acknowledge the tasks being reported. Allowing new tasks greets me with:
11/20/2012 11:21:44 PM | SETI@home | work fetch resumed by user
11/20/2012 11:25:39 PM | SETI@home | Sending scheduler request: To fetch work.
11/20/2012 11:25:39 PM | SETI@home | Requesting new tasks for CPU and ATI
11/20/2012 11:25:47 PM | | Project communication failed: attempting access to reference site
11/20/2012 11:25:47 PM | SETI@home | Scheduler request failed: Failure when receiving data from the peer
11/20/2012 11:25:49 PM | | Internet access OK - project servers may be temporarily down.
11/20/2012 11:27:13 PM | SETI@home | Sending scheduler request: To fetch work.
11/20/2012 11:27:13 PM | SETI@home | Requesting new tasks for CPU and ATI
11/20/2012 11:32:26 PM | | Project communication failed: attempting access to reference site
11/20/2012 11:32:26 PM | SETI@home | Scheduler request failed: Timeout was reached
11/20/2012 11:32:28 PM | | Internet access OK - project servers may be temporarily down.
I currently have 15 ghosts, all GPU units, and I'm now down to 104 units "In Progress", counting those ghosts. I'm probably down to under 2 days' worth of units for either the CPU or GPU, and the only reason I haven't run out of GPU units, other than its being a weak GPU, is that I routinely suspend GPU crunching to play games or watch movies. Well, Turkey Day is coming up; I guess even a computer could use the break.
"Life is just nature's way of keeping meat fresh." - The Doctor
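Tallying which failure reasons dominate is easy to automate. The sketch below assumes the Event Log line layout shown above (`<date> <time> <AM|PM> | <project> | <message>`, with an empty project field for client-wide messages); it is a hypothetical helper, not part of BOINC, and other BOINC versions or locales may format lines differently. The sample lines are copied from the post.

```python
import re
from collections import Counter

# One Event Log line as pasted above: "<date> <time> <AM|PM> | <project> | <message>".
# The project field is empty for client-wide messages.
LINE_RE = re.compile(r"^(\d+/\d+/\d+ \d+:\d+:\d+ [AP]M) \|\s*([^|]*?)\s*\| (.*)$")
FAILED_PREFIX = "Scheduler request failed: "


def scheduler_failures(lines, project="SETI@home"):
    """Count scheduler-request failures for `project`, keyed by reason."""
    reasons = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not an Event Log line in this format
        _timestamp, proj, message = match.groups()
        if proj == project and message.startswith(FAILED_PREFIX):
            reasons[message[len(FAILED_PREFIX):]] += 1
    return reasons


sample = [
    "11/20/2012 11:25:39 PM | SETI@home | Sending scheduler request: To fetch work.",
    "11/20/2012 11:25:47 PM | | Project communication failed: attempting access to reference site",
    "11/20/2012 11:25:47 PM | SETI@home | Scheduler request failed: Failure when receiving data from the peer",
    "11/20/2012 11:32:26 PM | SETI@home | Scheduler request failed: Timeout was reached",
]
print(dict(scheduler_failures(sample)))
# {'Failure when receiving data from the peer': 1, 'Timeout was reached': 1}
```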
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13727 · Credit: 208,696,464 · RAC: 304
Well, since the outage I've picked up some work. I'm also getting new errors when trying to contact the Scheduler. Still getting the timeouts, but to add to that I'm now getting "Server returned nothing (no headers, no data)" & "Failure when receiving data from the peer". As before, even with NNT set, whether or not you get a response from the Scheduler appears to depend on the wind direction & how you hold your tongue while clicking repeatedly on the retry button.
Grant
Darwin NT
fscheel · Joined: 13 Apr 12 · Posts: 73 · Credit: 11,135,641 · RAC: 0
As expected, with the AP splitters not running this morning, I am able to connect and the tasks are flowing quite well.
Frank
David S · Joined: 4 Oct 99 · Posts: 18352 · Credit: 27,761,924 · RAC: 12
My i7 is finally up to its full 200 WU limit. I suppose this means it finished enough Einstein GPU work to ask Seti for some, and the Seti servers were actually able to deliver it. I also see that the five APs I got yesterday are done already, four valid and one pending.
David
Sitting on my butt while others boldly go, Waiting for a message from a small furry creature from Alpha Centauri.
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13727 · Credit: 208,696,464 · RAC: 304
"Still getting the timeouts, but to add to that I'm now getting 'Server returned nothing (no headers, no data)' & 'Failure when receiving data from the peer'."
And "Couldn't connect to server" pops up occasionally as well. Well, it was occasionally: 20 minutes of clicking on Update with NNT set & that's the only response I've got. EDIT: just had a look at the server status page & it appears the Scheduler has been disabled.
Grant
Darwin NT
kittyman · Joined: 9 Jul 00 · Posts: 51468 · Credit: 1,018,363,574 · RAC: 1,004
Well, something's afoot in da lab... Scheduling server is disabled and the crickets take a dive. Hmmmmmmmmmmm.
"Freedom is just Chaos, with better lighting." Alan Dean Foster
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14650 · Credit: 200,643,578 · RAC: 874
Message from Eric: "We've got some things to try. Let us know if it starts working."
I find these two helpful:
findstr /C:"[SETI@home] Scheduler request failed: " stdoutdae.txt >sched_failures-%computername%.txt
findstr /C:"[SETI@home] Scheduler request completed: " stdoutdae.txt >sched_successes-%computername%.txt
They work in the "command prompt" environment in Windows. Save them (separately or together) in one or two files in BOINC's Data directory, giving the files names with the extension ".cmd". Then double-clicking the file(s) will quickly give you an overview of how well the scheduler requests have been going.
Don't swamp Eric with data, but if a few of us (those who feel confident working with that minimalist instruction - don't bother if you're not comfortable doing that) keep an eye on his experiments and provide feedback, it may help. Remember your logs will be timestamped in your local timezone - please supply the UTC offset so he can match them up with the server changes.
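For anyone not on Windows, or who prefers a script to `findstr`, the same two filters can be sketched in Python. This assumes, as the commands above do, that the BOINC Data directory contains `stdoutdae.txt` and that the literal strings below appear in it; `filter_log` and `run` are hypothetical helper names, and the output file names simply mirror the `.cmd` versions, with the machine's network name (`platform.node()`) standing in, roughly, for `%computername%`.

```python
import platform

# The literal search strings from the findstr commands above.
FAILED = "[SETI@home] Scheduler request failed: "
COMPLETED = "[SETI@home] Scheduler request completed: "


def filter_log(log_path, needle, out_path):
    """Copy every line of log_path that contains `needle` (a literal,
    case-sensitive substring) to out_path; return how many lines matched."""
    matched = 0
    with open(log_path, encoding="utf-8", errors="replace") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if needle in line:
                dst.write(line)
                matched += 1
    return matched


def run(log_path="stdoutdae.txt"):
    """Write both summary files, named like the .cmd versions."""
    host = platform.node()  # roughly equivalent to %computername%
    failed = filter_log(log_path, FAILED, f"sched_failures-{host}.txt")
    completed = filter_log(log_path, COMPLETED, f"sched_successes-{host}.txt")
    return failed, completed
```

Calling `run()` from inside the BOINC Data directory writes the two files and also returns the failure and success counts, which saves opening the files just to count lines.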
mikeej42 · Joined: 26 Oct 00 · Posts: 109 · Credit: 791,875,385 · RAC: 9
After the change(s) this afternoon, I had several nodes that had empty caches but could not get a successful scheduler update. I was able to get them to start downloading some tasks by decreasing the minimum work buffer to 0.25 days. Now they are slowly getting some resent tasks.
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14650 · Credit: 200,643,578 · RAC: 874
Things seem to have started again, and this time we're talking to Synergy over the Campus data network (128.32.18.157) - anybody using a manually configured hosts file please note. We're still using setiboinc.ssl.berkeley.edu, so the proxies should pick up the change automatically. So far, the only difference that I've noticed (apart from the fact that it works...) is a re-allocation and download of some of the little graphics files used in Simple View.
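For anyone with a manually configured hosts file, a quick check for a stale `setiboinc.ssl.berkeley.edu` entry could look like the sketch below. The expected address is the one given in the post; the parser assumes the usual `address name [aliases] # comment` layout, and the function names are hypothetical. On Windows the file is normally `C:\Windows\System32\drivers\etc\hosts`, on Linux `/etc/hosts`.

```python
SCHEDULER_HOST = "setiboinc.ssl.berkeley.edu"
CAMPUS_ADDRESS = "128.32.18.157"  # address quoted in the post


def hosts_entries(hosts_text, hostname):
    """Return every address that a hosts file maps to `hostname`."""
    addresses = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        if hostname.lower() in (name.lower() for name in names):
            addresses.append(address)
    return addresses


def stale_entries(hosts_text):
    """Entries for the scheduler host that point anywhere but the campus address."""
    return [a for a in hosts_entries(hosts_text, SCHEDULER_HOST) if a != CAMPUS_ADDRESS]


# Hypothetical hosts file; 192.0.2.10 is a documentation-only placeholder address.
example = """\
127.0.0.1      localhost
192.0.2.10     setiboinc.ssl.berkeley.edu   # old manual entry
"""
print(stale_entries(example))  # ['192.0.2.10']
```

An empty result means there is either no manual entry (so DNS decides) or one that already points at the new address.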
mikeej42 · Joined: 26 Oct 00 · Posts: 109 · Credit: 791,875,385 · RAC: 9
After I flushed the DNS caches on all my nodes I was able to go back to multi-day work buffers and got scheduler updates to complete.
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14650 · Credit: 200,643,578 · RAC: 874
"After I flushed the DNS caches on all my nodes I was able to go back to multi-day work buffers and got scheduler updates to complete."
Ah. So that's why I can't download the modest few I've been allocated...
HAL9000 · Joined: 11 Sep 99 · Posts: 6534 · Credit: 196,805,888 · RAC: 57
"Message from Eric:"
I had not thought to check the logs that way. Quite a good idea. I took it a bit further and added a third command that checks only for "[SETI@home] Scheduler request failed: Timeout was reached", to separate timeouts from other failures. Then I have the batch file count the lines and give me the % failure for total and timeout. So far I've checked several machines that have data going back to the 5th: the failure rate for all failures is between 14% and 19%.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
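The percentages described above can be reproduced from the same log. A minimal sketch, assuming the failure rate is failed requests divided by all requests (completed plus failed), and that a timeout is a failure line ending in "Timeout was reached"; the exact formula in the batch file isn't given, and the counts in the example are hypothetical.

```python
FAILED = "[SETI@home] Scheduler request failed: "
COMPLETED = "[SETI@home] Scheduler request completed: "
TIMEOUT = FAILED + "Timeout was reached"


def scheduler_failure_rates(lines):
    """Return (total failure %, timeout %) over all scheduler requests in `lines`."""
    completed = sum(COMPLETED in line for line in lines)
    failed = sum(FAILED in line for line in lines)
    timeouts = sum(TIMEOUT in line for line in lines)  # a subset of `failed`
    requests = completed + failed
    if requests == 0:
        return 0.0, 0.0
    return 100.0 * failed / requests, 100.0 * timeouts / requests


# Hypothetical log: 17 successes, 2 timeouts, 1 other failure.
log = (
    [COMPLETED + "got 0 new tasks"] * 17
    + [TIMEOUT] * 2
    + [FAILED + "Couldn't connect to server"]
)
print(scheduler_failure_rates(log))  # (15.0, 10.0)
```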
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799
"Things seem to have started again, and this time we're talking to Synergy over the Campus data network (128.32.18.157) - anybody using a manually configured hosts file please note. We're still using setiboinc.ssl.berkeley.edu, so the proxies should pick up the change automatically."
No real changes on my side. Downloads are very slow without a proxy (<0.5 kbps) and a little better with one (still <5 kbps). The same host and connection give >1 MBps when downloading an Einstein WU, so the slowness is not due to my internet connection. The scheduler is very slow too, and uploads are fast but take a long time to clear from the screen (I don't know what you call the task's state after the upload reaches 100%). Bad, but at least data is flowing with little or no error (I really haven't seen any scheduler errors yet)... but it takes more time to download than to crunch... so at these rates the caches will never fill on the fastest hosts, even with 100 WU. And the AP splitters are still offline...
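Whether a download takes longer than the crunch is simple arithmetic: time = size / rate. A sketch follows; "kbps" can mean kilobits or kilobytes per second, so both are handled, decimal prefixes (1 kB = 1000 bytes) are assumed, and the 400 kB file size is a hypothetical placeholder, not an actual SETI@home workunit size.

```python
# Bytes per second for one unit of each rate; decimal prefixes assumed.
BYTES_PER_SECOND = {
    "kbit/s": 1000 / 8,  # kilobits per second
    "kB/s": 1000,        # kilobytes per second
    "MB/s": 1_000_000,   # megabytes per second
}


def transfer_seconds(size_bytes, rate, unit):
    """Seconds needed to move `size_bytes` at a steady `rate` in `unit`."""
    return size_bytes / (rate * BYTES_PER_SECOND[unit])


size = 400_000  # hypothetical workunit size in bytes
for rate, unit in [(0.5, "kbit/s"), (0.5, "kB/s"), (5, "kB/s"), (1, "MB/s")]:
    print(f"{rate} {unit}: {transfer_seconds(size, rate, unit):.1f} s")
```

At 0.5 kilobits per second the hypothetical file takes well over an hour and a half, versus a fraction of a second at 1 MB/s, which is why a fast host can drain its cache faster than it refills.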
Claggy · Joined: 5 Jul 99 · Posts: 4654 · Credit: 47,537,079 · RAC: 4
"Things seem to have started again, and this time we're talking to Synergy over the Campus data network (128.32.18.157) - anybody using a manually configured hosts file please note. We're still using setiboinc.ssl.berkeley.edu, so the proxies should pick up the change automatically."
Brilliant: scheduler contacts now just work. Even if I get no work (at the main project), at least we've got a workaround for the scheduler timeouts. For example:
21/11/2012 21:37:27 SETI@home Beta Test [sched_op_debug] Starting scheduler request
Claggy
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.