Panic Mode On (78) Server Problems?
Message boards : Number crunching : Panic Mode On (78) Server Problems?
| Author | Message |
|---|---|
|
Just got two lots of 20!! | |
| ID: 1302308 · | |
|
I'm starting to get the hang of this. If your cache is a long way below normal, it helps to reduce your cache size settings - that way you're not asking for so much in one go. | |
| ID: 1302310 · | |
I'm starting to get the hang of this. If your cache is a long way below normal, it helps to reduce your cache size settings - that way you're not asking for so much in one go. So very true (in both cases): a smaller cache, e.g. 3 days (or less) plus an additional 2 (or 1) days, does work better and also has a shorter turnaround time. Less work to report in one go and less work needed per day; if we *all* ask for 10 + 10 days, we're surely in for SERVER trouble... :-\ In Holland we have a saying: a donkey doesn't hit the same stone twice. | |
| ID: 1302324 · | |
I've just had a note back from Eric: Richard, The LAST thing I want to do is get into some sort of trouble, but I read this several hours ago and it's been bugging me ever since. Does Eric know you couldn't report 6 tasks any better than you could report 6,000? I'm not talking about "limiting" the reporting to 6 at a time. I'm saying that if all I had was 6 tasks, I couldn't report them. If there's some really esoteric reason why limiting a machine to 20 work units means that another machine would be able to report 6, I can't fathom it. I can't even make up a story that sounds plausible. Nor do I understand why using a proxy would eliminate the problem with reporting. I can't invent a reason why restricting work units in progress would make this better or worse. I already KNOW I don't know what I'm talking about, but it would make me feel better if someone would explain in layman's terms how Eric's fix might fix a problem that can be overcome by using a proxy. Methinks Eric "knows" what the problem is, but he really doesn't. | |
| ID: 1302394 · | |
|
Well, the kitties won't be happy having their caches limited, but I guess if that's what it takes to right the ship........ | |
| ID: 1302396 · | |
Well, the kitties won't be happy having their caches limited, but I guess if that's what it takes to right the ship........ You know what? I've just edited this message away. It doesn't matter. The obvious doesn't matter, the occult doesn't matter, it just doesn't matter. | |
| ID: 1302401 · | |
|
I didn't say I agreed with it, or understand the logic behind it. | |
| ID: 1302403 · | |
|
Well all's right with the world now. Ghosts have been downloaded, scheduler requests are working. Okay there aren't any new units being made right now but the odd updating behavior and ghost generation is fixed at least for now. Just waiting for the cricket graph to drop off as the download backlog is cleared up. | |
| ID: 1302404 · | |
|
I doubt that the number of tasks awaiting validation is an issue - there is plenty of disk space, and the server doing the validation is well up to it. | |
| ID: 1302411 · | |
|
On a side note - suddenly I have around 2900 WUs with timed out errors... | |
| ID: 1302418 · | |
On a side note - suddenly I have around 2900 WUs with timed out errors... Not yet, but it will happen when VALRs get re-issued to the CUDA device instead of the CPU. ____________ Grant Darwin NT. | |
| ID: 1302421 · | |
I've just had a note back from Eric: So does that mean it will take 10 minutes for it to time out now? I figure if it's not going to respond within 5 minutes, that's as good a time as any for it to time out. Usually, when it did respond while the timeouts were at their worst, it was within a couple of minutes; when things are going well, most responses are within 20 seconds or so. Do we know why the Scheduler is having such a hard time keeping up with the load - more RAM required, faster disk subsystem? New system? ____________ Grant Darwin NT. | |
| ID: 1302424 · | |
Not yet, but it will happen when VALRs get re-issued to the CUDA device instead of the CPU. Hmm, but they were all initially sent as CUDAs to me as I don't ask for CPU-work. ____________ | |
| ID: 1302425 · | |
Not yet, but it will happen when VALRs get re-issued to the CUDA device instead of the CPU. Might be worth posting a link to a few of them so those that know about these things can have a look. ____________ Grant Darwin NT. | |
| ID: 1302426 · | |
|
so much for trying to heat the greenhouse with seti power tonight..hope my plants don't freeze .. | |
| ID: 1302428 · | |
Might be worth posting a link to a few of them so those that know about these things can have a look. http://setiathome.berkeley.edu/results.php?hostid=6167352&offset=0&show_names=0&state=6&appid= Pick one ;-) Hope it helps... ____________ | |
| ID: 1302432 · | |
|
So, how are we all doing this fine morning? | |
| ID: 1302467 · | |
Mark Sattler posted an interesting theory yesterday. He wondered whether asking Synergy to run the Scheduler, several MB splitters, and several AP splitters all at the same time might have been too much, and caused the initial slowdown we saw after maintenance last week. Sounds plausible to me. Richard Congrats, now you are on the right path; I was talking about that months ago. The problem always returns when the AP splitters start. Maybe a clue: put fewer AP splitters to work for a while and see what happens; we could all be surprised by the results. Another clue: during the last problem, I was able to DL (>150 kbps), UL, and report everything with the help of a proxy with no problem (without a proxy: DL <1 kbps, UL OK, report NO). That's interesting because it points away from a bandwidth problem (the proxy uses the same bandwidth). Talk about that with the others at the lab; this could show another path to follow too. Have a good week. ____________ | |
| ID: 1302468 · | |
|
Success!!! | |
| ID: 1302476 · | |
| Copyright © 2013 University of California |