Panic Mode On (78) Server Problems? |
Message boards : Number crunching : Panic Mode On (78) Server Problems?
| Author | Message |
|---|---|
Success!!! I also cleared my 476 lost tasks overnight and earlier this AM. Tried Richard's suggestion on lowering the cache settings, but it didn't help in my case. Couldn't get scheduler for 5-6 hours after the splitters were disabled. I went back to my normal 5.75 days and eventually got them. There's another issue besides the timeouts. Why did Scheduler keep assigning work when we already had lost tasks? In the past, it has always filled those first. When mine came down, I was still getting the "no tasks available" (empty feeder) message at the end of each batch of 20, suggesting it was still trying to assign new tasks. Think that may need some looking into - it was the potential for very large numbers that got me worried. As to the timeouts, I've also been in the "too much load on Synergy" camp, but I'm not so sure now after seeing how long it took me to connect after the splitters were disabled. I don't buy the bandwidth argument as a sole cause, but it certainly contributes, and some packet dumping router may have a role after the load gets heavy. Is there another possibility - database contention or something like that? I freely admit I don't know much about the issue, just that it sometimes caused problems with strange symptoms during my working years. ____________ Another Fred Support SETI@home when you search the Web or shop online with GoodSearch and GoodShop | |
| ID: 1302482 · | |
Might be worth posting a link to a few of them so those that know about these things can have a look. WU 1109239375 is enough to demonstrate that they weren't all VLAR. They were judged infeasible for some other reason. All 2877 were expired between 3:57:47 UTC and 3:57:54 UTC, so the database or other server delays apparently took about 7 seconds to get through that long list of "lost" tasks. Joe | |
| ID: 1302499 · | |
|
ok, now all my machines are empty ...what the heck are you guys doing? | |
| ID: 1302510 · | |
ok, now all my machines are empty ...what the heck are you guys doing? I don't think any of my rigs actually ran out yet. But I haven't checked all 9 of them. If they do, they all have Einstein as a backup project. But hopefully da boyz in da lab will have on their best thinking hats and kicking boots today and will start to get to the root of the problem. Best of luck with it, guys. ____________ ****** "Ask not, what your kitty can do for you. Ask what you can do for your kitty." As it is kitten, so shall it be done. | |
| ID: 1302511 · | |
Mark Sattler posted an interesting theory yesterday. He wondered whether asking Synergy to run the Scheduler, several MB splitters, and several AP splitters all at the same time might have been too much, and caused the initial slowdown we saw after maintenance last week. Sounds plausible to me. Take a look at the database graphs: usual activity these days is around 700-800 queries/s. Until the splitters were shut down, it didn't drop below 1,000/s, with sustained periods of just below 1,500/s & many peaks over 1,500/s. Even now there are many surges to 1,500/s+, but it's also dropping down to 700/s or less on occasion. ____________ Grant Darwin NT. | |
| ID: 1302527 · | |
|
@Joe | |
| ID: 1302528 · | |
My notional list of "work in progress" has gone up from 1,500 to 2,100 in the last two hours. It has been 27 hours since you said that. The ready to send has been at or near 0 (I assume it only ticks upward because of occasional timeout reassignments) and the splitters off for six hours that I'm aware of, probably a lot longer, and the Crickets are still maxed out! There was a mild downspike yesterday and an even smaller one just now, but there can't possibly still be that many ghost resends going on, can there? It's got me wondering if either something is wrong with the servers or there's an outside DoS attack going on. Or perhaps a web spider slipped through the filters and is trying to catalog every one of those 9 million results out in the field and 7 million waiting for validation, or something like that. ____________ David Sitting on my butt while others boldly go, Waiting for a message from a small furry creature from Alpha Centauri. | |
| ID: 1302589 · | |
|
I was going to remark on the high number of error WUs for my i7 where I had a short time timeout and now my original wingman and my replacement have both had natural timeouts and it's been sent to two more hosts, but now I'm wondering if the first two hosts completed the work and have been unable to upload and report due to the server problems the last few days. (And if that's the case, they'll eventually report late and the WUs will end up stuck and take even longer to disappear off my error list.) | |
| ID: 1302593 · | |
, and the Crickets are still maxed out! Seems to be slowly dropping back, hopefully this is a good sign! ____________ | |
| ID: 1302595 · | |
|
Dave (N9JFE) | |
| ID: 1302597 · | |
Dave (N9JFE) I know what happened to me (and also to my wingman in at least one case). What I found remarkable was that so many of my timeouts have now had full-time (i.e., not impossible) timeouts by the other users, and I wondered if *they* might be caused by the server problems. But that's a minor issue. ____________ David Sitting on my butt while others boldly go, Waiting for a message from a small furry creature from Alpha Centauri. | |
| ID: 1302623 · | |
|
Well the Crickets aren't maxed out anymore, and I just made a scheduler contact and it took three seconds to acknowledge five completed APs. That's pretty quick. | |
| ID: 1302645 · | |
|
Here comes fun - they've turned on every splitter, and the cricket graph has fallen through the floor. I'm setting NNT and going to bed - tell me about it in the morning. | |
| ID: 1302665 · | |
|
The largest blind is who don´t want to see... | |
| ID: 1302681 · | |
The largest blind is who don´t want to see... I have one pc that's getting that message also..wonder what it means or what the limit is. | |
| ID: 1302695 · | |
|
I'm back to timeouts when just reporting on NNT. Wonder what they worked on today? | |
| ID: 1302698 · | |
|
Timeouts are back plus now I can't download tons of lost tasks I still have; when scheduler is finally successful, all I'm getting is limit reached message. | |
| ID: 1302717 · | |
@Joe Yes, you're right. Just a slight delay in actually getting tasks to 2 hosts. Joe | |
| ID: 1302719 · | |
Timeouts are back plus now I can't download tons of lost tasks I still have; when scheduler is finally successful, all I'm getting is limit reached message. So you might as well reduce your cache settings to something close to the limits, to reduce the number of failed scheduler requests and ease the strain on the servers..... ____________ Donald Infernal Optimist / Submariner, retired | |
| ID: 1302760 · | |
|
Message from server: This computer has reached a limit on tasks in progress I don't know what the limit is this time .. (no admin announced it) .. If I remember correctly, the last time it was max. 50 WUs/CPU-thread and 400 WUs/GPU in BOINC. So my Intel Core2 Duo E7600 with NVIDIA GeForce GTX260 should get (50 x 2) + 400 = 500 WUs - maybe also this time. * Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. * ____________ >Das Deutsche Cafe. The German Cafe.< | |
| ID: 1302761 · | |
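The limit arithmetic in the post above can be written out as a small sketch. It assumes the caps the poster recalls from the previous occasion (50 WUs per CPU thread, 400 WUs per GPU), which no admin confirmed for this time; the function name and parameters are hypothetical and are not part of BOINC.

```python
def tasks_in_progress_limit(cpu_threads, gpus, per_cpu_thread=50, per_gpu=400):
    """Estimate a host's cap on tasks in progress.

    The default caps (50 per CPU thread, 400 per GPU) are the values
    recalled in the post, not confirmed server settings.
    """
    return cpu_threads * per_cpu_thread + gpus * per_gpu


# Core2 Duo E7600 (2 threads) with one GeForce GTX260, as in the post.
print(tasks_in_progress_limit(cpu_threads=2, gpus=1))  # 500
```

If the guess is right, a cache setting that asks for more work than this cap would only produce more "limit reached" replies from the scheduler.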
| Copyright © 2013 University of California |