Panic Mode On (78) Server Problems? |
Message boards : Number crunching : Panic Mode On (78) Server Problems?
| Author | Message |
|---|---|
|
Fred, | |
| ID: 1303135 · | |
|
Okay, I didn't pick up on the point you made that 100 worked now. I think I'll stay with a lower number to increase the probability of success in this environment. I also miss Matt's expertise, and funding is the key. I was crunching Orbit@Home when it ran out of money, and it's not pretty. | |
| ID: 1303147 · | |
|
Sorry for the late response: Scheduler was modified a few weeks ago to accept a max of 64... It was actually months ago (May), and the figure was increased relatively quickly to accept 256 tasks reported at once. | |
| ID: 1303160 · | |
|
I believe others have been reporting the scheduler problem -- here's what I see. | |
| ID: 1303215 · | |
|
Besides the max. number of tasks set, has anyone else noticed the following, or is it just me: | |
| ID: 1303224 · | |
Big problem with reporting and getting work if I have not set no new work. The scheduler goes into suspended animation -- it takes 5 to 10 minutes for the scheduler to time out and release that action. The report back is 'Timeout was reached, servers may be down'. They are not, but the scheduler is in trouble as it has been for the past week.

There is some general confusion over the timeout message, and I'm not surprised: even the project staff were caught out by this one. In fact, you see the timeout message when the BOINC client - your own computer - decides that nothing is coming back from the server, and gives up - it stops listening. The full message includes lines like

04/11/2012 22:13:51 | SETI@home | [sched_op] NVIDIA GPU work request: 43326.80 seconds; 0.00 GPUs

The actual timeout value - as in the 'too slow' line above - is 300 seconds or five minutes, but from what I've seen, some other activity (maybe a task finishing and uploading) can fool the communications subsystem into thinking that something is happening after all, and it allows some extra time.

The root cause of the problem is, obviously, the server taking too long to work out which tasks to send and assemble them into a suitable reply to your request. But there's no simple switch on the server to say 'extend the time limit': we'll just have to wait for them to uncover the root cause, and fix that instead. | |
| ID: 1303227 · | |
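For anyone who wants to pull the numbers out of event-log lines like the one quoted in the post above, here is a minimal Python sketch. The helper name `parse_work_request` is hypothetical (it is not part of BOINC), and the date is assumed to be in day/month/year order; only the line format itself comes from the post.

```python
import re
from datetime import datetime


def parse_work_request(line):
    """Hypothetical helper: split one event-log line into its fields."""
    # The line has three fields separated by '|': timestamp, project, message.
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    # Assumption: day/month/year order, so 04/11/2012 is 4 November 2012.
    when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
    match = re.search(r"work request: ([\d.]+) seconds; ([\d.]+) GPUs", message)
    if match is None:
        raise ValueError("not a work-request line")
    return {
        "when": when,
        "project": project,
        "seconds": float(match.group(1)),  # seconds of work requested
        "gpus": float(match.group(2)),     # idle GPU instances reported
    }


line = ("04/11/2012 22:13:51 | SETI@home | [sched_op] NVIDIA GPU "
        "work request: 43326.80 seconds; 0.00 GPUs")
request = parse_work_request(line)
print(request["project"], request["seconds"], request["gpus"])
```

Lines that are not work requests raise `ValueError`, so a loop over a whole log would need to catch that and skip them.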
It appears I don't have a cc_config.xml on my system already. I believe it goes in programdata\boinc (I'm running Win 7 64-bit) but it didn't seem to help (yes I shut the BOINC Manager down and restarted it). I'll paste the cc_config.xml below to get an opinion if I built it right. Thanks. I didn't think the cc_config file would do anything in my case considering I'm seeing the same symptoms as last week. It appears that the uploaded units are being reported and processed OK, it's just that the client isn't getting the reply, or the new units being assigned. Unless of course I do a scheduler request with NNT and then those "ready to report" marked units are wiped from my tasks tab. Also Ghost Detector was failing with the message "Hmm, Server indicates less Work Units 'In Progress' than client_state.xml thinks you have on board ... Aborted". I assume that's due to the syncing error. Once the done units were cleared out with NNT/report it ran fine and surprise, ghosts once again. Edit: Oh, and I'm running around 300-340 pending with 3 CPU and 1 GPU task running when the schedule Nazi said "no more units for you" (Seinfeld joke variant, US TV comedy series for those who aren't familiar, I'm talking about the "This computer has reached a limit on tasks in progress" message). Split was around 90 for the CPU and the rest GPU but something like 95% of the GPU ones were shorties. ____________ "Life is just nature's way of keeping meat fresh." - The Doctor | |
| ID: 1303228 · | |
|
I don't want to have it extend the time limit -- I have other projects going and when the scheduler does this, it preempts other projects reporting for the duration. I'm hoping that the root cause is acknowledged, identified and dealt with -- until then, I'm completing SETI work on hand and getting and processing work from other projects.
| |
| ID: 1303251 · | |
|
It would seem that these server troubles started right when the New York disaster took place. | |
| ID: 1303256 · | |
|
I just had a thought, but I don't know how feasible it is or whether it might help anything. | |
| ID: 1303270 · | |
|
11/7/2012 2:32:31 PM | |
| ID: 1303271 · | |
11/7/2012 2:32:31 PM The behaviour is unchanged - the BOINC client has never requested work when some download is stalled. The trouble is, the old (silent) way generated thread after thread saying 'SETI isn't sending me any work'. Wrong. The computer wasn't requesting work, but the user didn't know it. Now, with the additional messages (there are several of them), you can see at a glance what the reason for the drought is, and decide whether it's one you can (or want to) do something about yourself. | |
| ID: 1303274 · | |
|
I guess I never got the old messages. It just upset me last night before I | |
| ID: 1303277 · | |
I guess I never got the old messages... You couldn't get the old messages, because there haven't been any (as Richard just stated ;-). Gruß, Gundolf | |
| ID: 1303281 · | |
I guess I never got the old messages... Oops, didn't catch that. Sorry, just frustrated and venting. I'm good now, LOL!! | |
| ID: 1303283 · | |
|
Just let me know if that project will ever be working again... ATM I'm sick of it | |
| ID: 1303329 · | |
|
:)...This reminds me of a song from HeeHaw. | |
| ID: 1303336 · | |
Got a computational error on one of my CUDA tasks. Easy fix - don't shop on weekends! :=} Don't worry about that one. It's a -12 error due to the application and data. These are usually regarded as no-fault. I often have one or two in my errors list. On your website task page, you can drill down on the task to see the Std_Err report, which includes this: SETI@home error -12 Unknown error cudaAcc_find_triplets doesn't support more than MAX_TRIPLETS_ABOVE_THRESHOLD numBinsAboveThreshold in find_triplets_kernel Don't do anything special. Most of us run unattended and check it periodically. If you encounter more errors or just have some questions, feel free to open your own thread and ask for help. We usually use this one for comments and observations on how the project is running. Welcome to the project! ____________ Another Fred Support SETI@home when you search the Web or shop online with GoodSearch and GoodShop | |
| ID: 1303507 · | |
Got a computational error on one of my CUDA tasks.

If you want to suspend GPU computing while you are away from your computer, you can tell BOINC to do that. There are options in the manager for "snooze" and "snooze GPU". However, they only last 1 hour IIRC. If you want to suspend GPU processing while you are out for a few hours, there is a command-line option. To suspend GPU processing for 4 hours you could use:

boinccmd --set_gpu_mode never 14400

Then after 4 hours BOINC would resume the previous state you had the GPU processing set to. Normally it would be "auto" or "based on preferences". I just leave the GPU to do what it will. Sometimes there will be an error which I have no control over & the GPU trashes several hundred tasks. I just restart the machine when I find there is an issue & make sure everything is OK afterwards. ____________ SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group today! | |
| ID: 1303519 · | |
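The duration argument in the `boinccmd --set_gpu_mode never 14400` command above is in seconds (4 hours x 3600). A small Python sketch for working out the figure for other durations follows; the function name `gpu_suspend_command` is made up for illustration, and the script only builds and prints the command rather than running it.

```python
import shlex


def gpu_suspend_command(hours):
    """Hypothetical helper: build the boinccmd arguments for a GPU pause."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    # boinccmd takes the duration in seconds, so convert from hours.
    seconds = int(round(hours * 3600))
    return ["boinccmd", "--set_gpu_mode", "never", str(seconds)]


if __name__ == "__main__":
    # Print the command instead of executing it, so it can be checked first.
    print(shlex.join(gpu_suspend_command(4)))
```

To actually run it, the returned list could be handed to `subprocess.run`, assuming `boinccmd` is on the PATH and is started from a directory where it can authenticate to the client.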
| Copyright © 2013 University of California |