Panic Mode On (78) Server Problems?

Message boards : Number crunching : Panic Mode On (78) Server Problems?

Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1302499 - Posted: 5 Nov 2012, 16:42:41 UTC - in response to Message 1302432.  

Might be worth posting a link to a few of them so those that know about these things can have a look.

http://setiathome.berkeley.edu/results.php?hostid=6167352&offset=0&show_names=0&state=6&appid=

Pick one ;-) Hope it helps...

WU 1109239375 is enough to demonstrate that they weren't all VLAR. They were judged infeasible for some other reason.

All 2877 were expired between 3:57:47 UTC and 3:57:54 UTC, so the database or other server delays apparently took about 7 seconds to get through that long list of "lost" tasks.
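As a rough check on that figure, here is a minimal sketch of the arithmetic. The task count and the two timestamps are taken from the post; the function name and the assumption that both times fall on the same UTC day are illustrative only.

```python
from datetime import datetime

def expiry_rate(task_count, start_utc, end_utc):
    """Return (elapsed seconds, tasks per second) for a batch of expired tasks.

    Assumes both timestamps are H:MM:SS on the same UTC day.
    """
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end_utc, fmt) - datetime.strptime(start_utc, fmt)
    elapsed = delta.total_seconds()
    return elapsed, task_count / elapsed

# Figures quoted in the post: 2877 tasks expired between 3:57:47 and 3:57:54 UTC.
elapsed, rate = expiry_rate(2877, "3:57:47", "3:57:54")
print(f"{elapsed:.0f} s elapsed, roughly {rate:.0f} tasks expired per second")
```

On those numbers the server worked through a bit over 400 "lost" tasks per second, which is consistent with the list being cleared in one pass rather than task by task over several scheduler contacts.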
Joe
ID: 1302499
Tron
Joined: 16 Aug 09
Posts: 180
Credit: 2,250,468
RAC: 0
United States
Message 1302510 - Posted: 5 Nov 2012, 17:24:04 UTC

OK, now all my machines are empty... what the heck are you guys doing?

turn the work back on! it's freezing in here :P
ID: 1302510
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1302511 - Posted: 5 Nov 2012, 17:27:24 UTC - in response to Message 1302510.  

OK, now all my machines are empty... what the heck are you guys doing?

turn the work back on! it's freezing in here :P

I don't think any of my rigs actually ran out yet.
But I haven't checked all 9 of them.
If they do, they all have Einstein as a backup project.

But hopefully da boyz in da lab will have on their best thinking hats and kicking boots today and will start to get to the root of the problem.
Best of luck with it, guys.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1302511
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 1302527 - Posted: 5 Nov 2012, 18:13:45 UTC - in response to Message 1302467.  

Mark Sattler posted an interesting theory yesterday. He wondered whether asking Synergy to run the Scheduler, several MB splitters, and several AP splitters all at the same time might have been too much, and caused the initial slowdown we saw after maintenance last week. Sounds plausible to me.

Take a look at the database graphs: usual activity these days is around 700-800 queries/s. Until the splitters were shut down, it didn't drop below 1,000/s, with sustained periods of just below 1,500/s and many peaks over 1,500/s.
Even now there are many surges to 1,500/s+, but it's also dropping down to 700/s or less on occasion.
Grant
Darwin NT
ID: 1302527
Mad Fritz
Joined: 20 Jul 01
Posts: 87
Credit: 11,334,904
RAC: 0
Switzerland
Message 1302528 - Posted: 5 Nov 2012, 18:13:50 UTC

@Joe
Thanks for looking into it :-)
Am I right that there will not really be any serious harm as the tasks were given to other crunchers in the meantime?

Andy
ID: 1302528
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1302589 - Posted: 5 Nov 2012, 20:17:29 UTC - in response to Message 1302155.  

My notional list of "work in progress" has gone up from 1,500 to 2,100 in the last two hours.

Everything is getting set to NNT and staying there, until I see zero tasks ready to send AND the splitters disabled.

It has been 27 hours since you said that. Ready to send has been at or near 0 (I assume it only ticks upward because of occasional timeout reassignments) and the splitters have been off for six hours that I'm aware of, probably a lot longer, yet the Crickets are still maxed out! There was a mild downspike yesterday and an even smaller one just now, but there can't possibly still be that many ghost resends going on, can there? It's got me wondering if either something is wrong with the servers or there's an outside DoS attack going on. Or perhaps a web spider slipped through the filters and is trying to catalog every one of those 9 million results out in the field and 7 million waiting for validation, or something like that.

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1302589
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1302593 - Posted: 5 Nov 2012, 20:35:23 UTC

I was going to remark on the high number of error WUs for my i7, where I had a short timeout, my original wingman and my replacement have since both had natural timeouts, and the WU has been sent to two more hosts. But now I'm wondering whether the first two hosts completed the work and have been unable to upload and report because of the server problems over the last few days. (If that's the case, they'll eventually report late, and the WUs will end up stuck and take even longer to disappear from my error list.)

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1302593
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1302595 - Posted: 5 Nov 2012, 20:53:50 UTC

, and the Crickets are still maxed out!


Seems to be slowly dropping back, hopefully this is a good sign!
ID: 1302595
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1302597 - Posted: 5 Nov 2012, 21:01:01 UTC

Dave (N9JFE)
A look at one of your timed-out tasks shows that it was sent to you with an "impossible" deadline, so you timed out; the same thing happened, at the same time, to your wingman. The task was now sent out to two more crunchers, one of whom has reported, while the other is still "in progress". Since the task was delivered to you in September and faulted on the same day, it is pretty safe to say that it is not affected by the current server woes.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1302597
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1302623 - Posted: 5 Nov 2012, 21:47:40 UTC - in response to Message 1302597.  

Dave (N9JFE)
A look at one of your timed-out tasks shows that it was sent to you with an "impossible" deadline, so you timed out; the same thing happened, at the same time, to your wingman. The task was now sent out to two more crunchers, one of whom has reported, while the other is still "in progress". Since the task was delivered to you in September and faulted on the same day, it is pretty safe to say that it is not affected by the current server woes.

I know what happened to me (and also to my wingman in at least one case). What I found remarkable was that so many of my timeouts have now had full-time (i.e., not impossible) timeouts by the other users, and I wondered if *they* might be caused by the server problems.

But that's a minor issue.

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1302623
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1302645 - Posted: 5 Nov 2012, 22:45:00 UTC

Well the Crickets aren't maxed out anymore, and I just made a scheduler contact and it took three seconds to acknowledge five completed APs. That's pretty quick.

Regarding having too many AP splitters going.. I've been wondering/asking for a while now if we can just knock it down to one AP splitter. If you load up 10 full tapes and let the splitters go as fast as they can, AP finishes all 10 tapes in usually around the same time as MB takes to get through 2-3. Maybe just slow AP down and limit the hindering effect it has on everything else?
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1302645
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1302665 - Posted: 5 Nov 2012, 23:35:33 UTC

Here comes fun - they've turned on every splitter, and the cricket graph has fallen through the floor. I'm setting NNT and going to bed - tell me about it in the morning.
ID: 1302665
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1302681 - Posted: 6 Nov 2012, 0:03:53 UTC
Last modified: 6 Nov 2012, 0:40:19 UTC

The blindest are those who don't want to see...

Hope I'm wrong...

(edit) 05/11/2012 22:33:08 SETI@home Message from server: This computer has reached a limit on tasks in progress

A new limit?
ID: 1302681
fscheel
Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1302695 - Posted: 6 Nov 2012, 0:47:40 UTC - in response to Message 1302681.  

The blindest are those who don't want to see...

Hope I'm wrong...

(edit) 05/11/2012 22:33:08 SETI@home Message from server: This computer has reached a limit on tasks in progress

A new limit?


I have one PC that's getting that message too. I wonder what it means, or what the limit is.
ID: 1302695
Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1302698 - Posted: 6 Nov 2012, 1:08:10 UTC
Last modified: 6 Nov 2012, 1:10:33 UTC

I'm back to timeouts when just reporting on NNT. Wonder what they worked on today?

Edit: The next try was successful, of course. Will try work fetch now, but may go NNT all night.
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1302698
Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1302717 - Posted: 6 Nov 2012, 2:55:11 UTC
Last modified: 6 Nov 2012, 3:00:00 UTC

Timeouts are back, plus now I can't download the tons of lost tasks I still have; when the scheduler is finally successful, all I get is the limit-reached message.
Looks like the limits are as ridiculous as they were before (50 tasks/CPU), which means even my slowest hosts are going to make scheduler requests *endlessly*, never able to fill a 5-day cache, only compounding the scheduler overload problem...
ID: 1302717
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1302719 - Posted: 6 Nov 2012, 3:05:49 UTC - in response to Message 1302528.  

@Joe
Thanks for looking into it :-)
Am I right that there will not really be any serious harm as the tasks were given to other crunchers in the meantime?

Andy

Yes, you're right. Just a slight delay in actually getting tasks to 2 hosts.
Joe
ID: 1302719
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1302760 - Posted: 6 Nov 2012, 8:02:22 UTC - in response to Message 1302717.  

Timeouts are back, plus now I can't download the tons of lost tasks I still have; when the scheduler is finally successful, all I get is the limit-reached message.
Looks like the limits are as ridiculous as they were before (50 tasks/CPU), which means even my slowest hosts are going to make scheduler requests *endlessly*, never able to fill a 5-day cache, only compounding the scheduler overload problem...

So you might as well reduce your cache settings to something close to the limits, to reduce the number of failed scheduler requests and ease the strain on the servers.....
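To put a number on "something close to the limits", here is a rough sketch of how many days of work a capped queue represents. The 50 tasks/CPU figure is the one quoted above; the average task runtime and the host in the example are hypothetical, so substitute your own values.

```python
def days_covered(task_limit, avg_task_hours, tasks_in_parallel):
    """Days of crunching represented by a full, capped task queue.

    task_limit        -- maximum tasks the server allows in progress
    avg_task_hours    -- assumed average runtime per task (hypothetical)
    tasks_in_parallel -- number of tasks the host runs at once
    """
    total_hours = task_limit * avg_task_hours / tasks_in_parallel
    return total_hours / 24.0

# Hypothetical host: 2 CPU threads, 50 tasks per thread, 1.5 h per task.
cache_days = days_covered(task_limit=100, avg_task_hours=1.5, tasks_in_parallel=2)
print(f"A full queue is about {cache_days:.1f} days of work")
```

If the result is well below the cache setting, every scheduler request for more work is bound to hit the limit, so setting the cache near that figure avoids the pointless requests.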
Donald
Infernal Optimist / Submariner, retired
ID: 1302760
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1302761 - Posted: 6 Nov 2012, 8:07:11 UTC
Last modified: 6 Nov 2012, 8:12:11 UTC

Message from server: This computer has reached a limit on tasks in progress


I don't know what it is this time (no admin has announced it).

If I remember correctly, the last time it was max. 50 WUs/CPU thread and 400 WUs/GPU in BOINC.

So my Intel Core2 Duo E7600 with NVIDIA GeForce GTX260 should get (50 x 2) + 400 = 500 WUs - maybe also this time.
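The same sum as a small sketch, in case anyone wants to plug in their own host. The per-thread and per-GPU values are the remembered ones from last time, not anything announced for the current limit, and the function name is just for illustration.

```python
def task_limit(cpu_threads, gpus, per_cpu_thread=50, per_gpu=400):
    """Tasks in progress allowed for a host under per-device limits.

    The defaults are the values remembered from the previous limit
    (50 WUs per CPU thread, 400 WUs per GPU); they are unconfirmed
    for the current one.
    """
    return cpu_threads * per_cpu_thread + gpus * per_gpu

# Core2 Duo E7600 (2 threads) with one GeForce GTX260.
print(task_limit(cpu_threads=2, gpus=1))
```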


* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *
ID: 1302761
Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1302768 - Posted: 6 Nov 2012, 8:49:06 UTC
Last modified: 6 Nov 2012, 9:26:06 UTC

If I remember correctly, the last time it was max. 50 WUs/CPU thread and 400 WUs/GPU in BOINC.

So my Intel Core2 Duo E7600 with NVIDIA GeForce GTX260 should get (50 x 2) + 400 = 500 WUs - maybe also this time.


Not even sure it is 50 per core. I'm down to 230 CPU tasks for my hexcore, and still get the limit message. That's with a CPU-only request. I'm only using 5 for crunching, but if that value is used I'd still get 250. I'm way over 400 on GPU tasks, and hate to see a limit there. I won't be able to handle much more than a one day outage, especially if there are a lot of shorties. New card is faster than the one I was running the last time we had these limits.
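A quick sketch of why the numbers don't fit a 50-per-core rule. The 230-task count and the 5 and 6 core cases come from the paragraph above; the helper name is made up, and the check assumes the limit message refers to CPU tasks only.

```python
def under_cpu_limit(tasks_on_host, cores, per_core=50):
    """True if the host is still below a per-core CPU task limit."""
    return tasks_on_host < cores * per_core

# 230 CPU tasks on hand, counted against 5 crunching cores or all 6.
for cores in (5, 6):
    limit = cores * 50
    print(f"{cores} cores: limit {limit}, room for more: {under_cpu_limit(230, cores)}")
```

Either way the host would still be under a 50-per-core cap, so the limit message suggests that the real CPU limit is lower than that, or that it is counted some other way (for example across CPU and GPU tasks together).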

Haven't seen more timeouts like I reported earlier in the thread.
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1302768


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.