Panic Mode On (79) Server Problems? |
Message boards : Number crunching : Panic Mode On (79) Server Problems?
| Author | Message |
|---|---|
Yes, it could be ghosts being resent - some of my machines had developed a new crop of hauntings overnight. But it looks like it's beginning to decay now - this might be a good time to try manual updates, and help flush the remaining gremlins out of the system. Maybe because of the daylight; ghosts don't like daylight. ____________ | |
| ID: 1308076 · | |
Yes, it could be ghosts being resent - some of my machines had developed a new crop of hauntings overnight. But it looks like it's beginning to decay now - this might be a good time to try manual updates, and help flush the remaining gremlins out of the system. Well as long as it's just Pocha Hauntis... ____________ BSG Anthem My Facebook page | |
| ID: 1308096 · | |
|
Hopefully, it's ghost task reissues. Which should be returned a bit more quickly by hungry hosts, helping to clean up the database. | |
| ID: 1308098 · | |
On a side-note, Since switching from 6.2.19 to 6.10.58, I have noticed that my cache is not being processed in FIFO. It is all APs..about 17 days worth, and APs with a deadline four days sooner than the ones that keep getting picked to run next are still sitting there not getting started. Not true. FIFO does not necessarily equate to EDF (earliest deadline first). A task's deadline is determined by how long it is estimated to take to run. So it is possible to download a bunch of tasks yesterday estimated to take 20 hours to run and have deadlines in late January, and then a bunch today estimated to take 1 hour and have deadlines in mid-December. And don't forget, if you're running more than one project, BOINC has to balance all of them, and different projects do their time estimates and deadlines differently. However, if all of the tasks you're looking at have the same time estimate, then I agree, it is weird for them not to run FIFO, which presumably is also EDF. ____________ David Sitting on my butt while others boldly go, Waiting for a message from a small furry creature from Alpha Centauri. | |
| ID: 1308106 · | |
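To illustrate the point above that FIFO and EDF can disagree, here is a minimal Python sketch. The task names, download dates and deadlines are invented for illustration, and this is not how the BOINC client actually schedules work; it only sorts the same small list two ways.

```python
from datetime import date

# Hypothetical tasks: (name, downloaded on, deadline). All values are invented.
tasks = [
    ("long_1", date(2012, 11, 20), date(2013, 1, 25)),
    ("long_2", date(2012, 11, 20), date(2013, 1, 26)),
    ("short_1", date(2012, 11, 21), date(2012, 12, 15)),
    ("short_2", date(2012, 11, 21), date(2012, 12, 16)),
]


def fifo_order(tasks):
    """Order task names by download date (first in, first out)."""
    return [name for name, _, _ in sorted(tasks, key=lambda t: t[1])]


def edf_order(tasks):
    """Order task names by deadline (earliest deadline first)."""
    return [name for name, _, _ in sorted(tasks, key=lambda t: t[2])]


print("FIFO:", fifo_order(tasks))
print("EDF: ", edf_order(tasks))
```

In this made-up batch the long tasks arrive first but the short ones are due first, so the two orders differ. If every task had the same deadline offset from its download date, the two lists would come out the same.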
On a side-note, Since switching from 6.2.19 to 6.10.58, I have noticed that my cache is not being processed in FIFO. It is all APs..about 17 days worth, and APs with a deadline four days sooner than the ones that keep getting picked to run next are still sitting there not getting started. If tasks from the same download batch don't appear to run in FIFO (and be very careful to observe that you haven't applied a sort order to one of the columns in BOINC Manager, before you jump to that conclusion), then it's a long-standing bug which applies some slight randomisation to the display order when data is transferred from the server to the BOINC Client to the BOINC Manager. In short, it's cosmetic only. BOINC v6.10.58 is still a very old version. We applied a lot of pressure to get that bug (and many others) fixed - I forget just when. The latest ones - I'm running v7.0.38 - have had display order and running order in perfect step for a long time - possibly even since sometime in the v6.12.xx range - but I wouldn't advise upgrading just for this. Like I said, it's cosmetic only. | |
| ID: 1308127 · | |
|
Welp... Looks like the cricket graph just bottomed out. :( | |
| ID: 1308180 · | |
Welp... Looks like the cricket graph just bottomed out. :( Just waiting for the next server status update... [Edit] Which is almost totally green... ____________ | |
| ID: 1308181 · | |
On a side-note, Since switching from 6.2.19 to 6.10.58, I have noticed that my cache is not being processed in FIFO. It is all APs..about 17 days worth, and APs with a deadline four days sooner than the ones that keep getting picked to run next are still sitting there not getting started. Thank you for the very informative insight into my observation. Since the tasks are all APs, they all have a 25-day deadline from when they were issued. In 6.2.19, they would crunch in FIFO, unless for some crazy reason high priority mode kicked in. I switched to 6.10.58 a few days ago and, for example, I have a pile of APs that are due Dec 3, but ones for Dec 6 were running in high priority instead. High priority has since ended, and the ones due Dec 3 and 4 still haven't been touched, but 6-8 are being crunched pretty much in order. I do notice that the sort order in Manager operates a little differently than in the older version, but I figure it will sort itself out eventually. ____________ Linux laptop uptime: 1484d 22h 42m Ended due to UPS failure, found 14 hours after the fact | |
| ID: 1308237 · | |
|
Well, I'm back to NNT if I want to get an acknowledgement from the server of the tasks being reported. Allowing tasks greets me with | |
| ID: 1308241 · | |
|
Well, since the outage I've picked up some work. I'm also getting new errors when trying to contact the Scheduler. | |
| ID: 1308266 · | |
|
As expected, with the AP splitters not running this morning I am able to connect and the tasks are flowing quite well. | |
| ID: 1308321 · | |
|
My i7 is finally up to its full 200 WU limit. I suppose this means it finished enough Einstein GPU work to ask Seti for some, and the Seti servers were actually able to deliver it. | |
| ID: 1308364 · | |
Still getting the timeouts, but to add to that I'm now getting "Server returned nothing (no headers, no data)" and "Failure when receiving data from the peer". And "Couldn't connect to server" pops up occasionally as well. Well, it was occasionally. After 20 min of clicking on update with NNT set, that's the only response I've got. EDIT: just had a look at the server status page and it appears the Scheduler has been disabled. ____________ Grant Darwin NT. | |
| ID: 1308417 · | |
|
Well, something's afoot in da lab... | |
| ID: 1308437 · | |
|
Message from Eric: We've got some things to try. Let us know if it starts working. I find these two helpful: `findstr /C:"[SETI@home] Scheduler request failed: " stdoutdae.txt >sched_failures-%computername%.txt` and `findstr /C:"[SETI@home] Scheduler request completed: " stdoutdae.txt >sched_successes-%computername%.txt`. They work in the "command prompt" environment in Windows. Save them (separately or together) in one or two files in BOINC's Data directory, giving the files names with the extension ".cmd". Then, double-clicking the file(s) will quickly give you an overview of how well the scheduler requests have been going. Don't swamp Eric with data, but if a few of us (those who feel confident working with that minimalist instruction - don't bother if you're not comfortable doing that) keep an eye on his experiments and provide feedback, it may help. Remember your logs will be timestamped in your local timezone - please supply the UTC offset so he can match them up with the server changes. | |
| ID: 1308445 · | |
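On the UTC offset point above, a rough Python sketch of converting a local log timestamp to UTC before posting it. It assumes each line in stdoutdae.txt starts with a timestamp laid out like "05-Dec-2012 14:23:01"; that layout, the sample line and the helper name `to_utc` are assumptions for illustration, so check them against your own log first. Month abbreviations are parsed using the current locale, so this expects English month names.

```python
from datetime import datetime, timedelta, timezone

# Assumed timestamp layout at the start of each log line,
# e.g. "05-Dec-2012 14:23:01" (20 characters). Verify against your own log.
STAMP_FORMAT = "%d-%b-%Y %H:%M:%S"
STAMP_LENGTH = 20


def to_utc(log_line, utc_offset_hours):
    """Return the line's leading timestamp converted to UTC.

    utc_offset_hours is your local offset from UTC, e.g. -8 for UTC-8.
    """
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    stamp = datetime.strptime(log_line[:STAMP_LENGTH], STAMP_FORMAT)
    return stamp.replace(tzinfo=local_zone).astimezone(timezone.utc)


# Made-up sample line in the assumed layout.
sample = "05-Dec-2012 14:23:01 [SETI@home] Scheduler request failed: Timeout was reached"
print(to_utc(sample, -8).isoformat())
```

Passing the offset in by hand keeps the sketch independent of the machine's timezone settings; it does not try to handle daylight saving changes within a log.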
|
After the change(s) this afternoon, I had several nodes that had empty caches but could not get a successful scheduler update. I was able to get them to start downloading some tasks by decreasing the minimum work buffer to 0.25 days. Now they are slowly getting some resent tasks. | |
| ID: 1308495 · | |
|
Things seem to have started again, and this time we're talking to Synergy over the Campus data network (128.32.18.157) - anybody using a manually configured hosts file please note. We're still using setiboinc.ssl.berkeley.edu, so the proxies should pick up the change automatically. | |
| ID: 1308497 · | |
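For anyone unsure whether a node is still holding a stale address (from a hosts file entry or a cached lookup), here is a small Python sketch that compares what the machine resolves against the address quoted above. The helper name `resolves_to` is made up for this example, and `socket.gethostbyname` returns only one IPv4 address, so treat a mismatch as a hint to look closer and not as proof.

```python
import socket

# Hostname and address as quoted in the post above.
SCHEDULER_HOST = "setiboinc.ssl.berkeley.edu"
EXPECTED_IP = "128.32.18.157"


def resolves_to(hostname, expected_ip, resolver=socket.gethostbyname):
    """Return True if hostname resolves to expected_ip on this machine.

    resolver is injectable so the check can be tried without network access.
    """
    try:
        return resolver(hostname) == expected_ip
    except OSError:
        # Covers lookup failures (socket.gaierror is a subclass of OSError).
        return False


# To check a live node, run:
#   print(resolves_to(SCHEDULER_HOST, EXPECTED_IP))
```

If it reports False on a node that cannot reach the scheduler, checking the hosts file or flushing the DNS cache would be the obvious next step.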
|
After I flushed the DNS caches on all my nodes I was able to go back to multi-day work buffers and got scheduler updates to complete. | |
| ID: 1308504 · | |
After I flushed the DNS caches on all my nodes I was able to go back to multi-day work buffers and got scheduler updates to complete. Ah. So that's why I can't download the modest few I've been allocated... | |
| ID: 1308505 · | |
Message from Eric: I had not thought to check the logs that way. Quite a good idea. I took it a bit further and added a 3rd check just for "[SETI@home] Scheduler request failed: Timeout was reached" to separate out the other failures. Then I have the bat count the lines and give me the % failure for total and timeout. So far I have checked several machines that have data going back to the 5th, and the failure rate is between 14% and 19% for all failures. ____________ SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group today! | |
| ID: 1308506 · | |
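The counting described above can be sketched in Python for anyone not on Windows. This is not the poster's batch file; it is a rough equivalent that assumes the failure percentage means failed requests divided by failed plus completed requests, which the post does not spell out. The search strings are the ones quoted in this thread, and the function name `failure_rates` is made up for this example.

```python
# Search strings as quoted earlier in this thread.
FAILED = "[SETI@home] Scheduler request failed: "
COMPLETED = "[SETI@home] Scheduler request completed: "
TIMEOUT = "[SETI@home] Scheduler request failed: Timeout was reached"


def failure_rates(lines):
    """Return (all-failure %, timeout %) out of all scheduler requests.

    Assumption: total requests = failed + completed.
    """
    lines = list(lines)
    failed = sum(FAILED in line for line in lines)
    timeouts = sum(TIMEOUT in line for line in lines)
    completed = sum(COMPLETED in line for line in lines)
    total = failed + completed
    if total == 0:
        return 0.0, 0.0
    return 100.0 * failed / total, 100.0 * timeouts / total


# To run it on a real log from BOINC's Data directory:
#   with open("stdoutdae.txt", errors="replace") as log:
#       print(failure_rates(log))
```

Timeouts are counted as a subset of all failures, so the second figure can never exceed the first.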
| Copyright © 2013 University of California |