Panic Mode On (84) Server Problems?

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13414
Credit: 208,696,464
RAC: 304
Australia
Message 1382778 - Posted: 19 Jun 2013, 17:55:18 UTC - in response to Message 1382777.  


Looks like they're out of data to split, although even when there was data to split & all the splitters were running, not much was being produced.
Should be out of GPU work in an hour or 2.

Help may be on the way...
I just checked the Cricket graph again and it looks like some upload spikes have started. Should mean more datasets on the way to the servers.

Hopefully once they've got some data to split, Eric will be able to sort out why the splitters are so slow at actually splitting it.
Grant
Darwin NT
ID: 1382778
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1382779 - Posted: 19 Jun 2013, 17:59:36 UTC - in response to Message 1382778.  


Looks like they're out of data to split, although even when there was data to split & all the splitters were running, not much was being produced.
Should be out of GPU work in an hour or 2.

Help may be on the way...
I just checked the Cricket graph again and it looks like some upload spikes have started. Should mean more datasets on the way to the servers.

Hopefully once they've got some data to split, Eric will be able to sort out why the splitters are so slow at actually splitting it.

Yup....first things first, I guess.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1382779
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13414
Credit: 208,696,464
RAC: 304
Australia
Message 1382783 - Posted: 19 Jun 2013, 18:17:33 UTC - in response to Message 1382779.  


Prior to this week's outage their splitting rate was as good as it's been since the PFB splitters came online. Not great, but at least enough to keep the ready-to-send buffer full.
Since the outage they've struggled to produce 10/s (20/s appears to be the minimum to meet demand these days).
Grant
Darwin NT
ID: 1382783
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1382784 - Posted: 19 Jun 2013, 18:18:32 UTC

Yep, one of my (work-time-only) crunchers only has some GPU work left; it ran totally out of CPU WUs.

Would be a nice gesture from the staff to post just a short notice of what's going on...
Aloha, Uli

ID: 1382784
Richard Haselgrove (Project Donor)
Volunteer tester

Joined: 4 Jul 99
Posts: 14542
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1382800 - Posted: 19 Jun 2013, 19:33:48 UTC - in response to Message 1382784.  

Yep, one of my (work-time-only) crunchers only has some GPU work left; it ran totally out of CPU WUs.

Would be a nice gesture from the staff to post just a short notice of what's going on...

Matt's posted quite a long one:

Technical News : Spring Cleaning
ID: 1382800
Wiggo
Joined: 24 Jan 00
Posts: 29399
Credit: 261,360,520
RAC: 489
Australia
Message 1382869 - Posted: 19 Jun 2013, 23:14:04 UTC - in response to Message 1382800.  

I wonder if the creation rate could be related to file 20jn12ac as it has been in its current state for way longer than is usual.

Cheers.
ID: 1382869
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13414
Credit: 208,696,464
RAC: 304
Australia
Message 1382979 - Posted: 20 Jun 2013, 10:10:11 UTC - in response to Message 1382869.  
Last modified: 20 Jun 2013, 10:12:15 UTC

A question for those doing AP WUs-
Have there been a lot more errors than usual over the last 6 hours or so?
Since the splitters have been cranking out more work, the number of AP results returned per hour has climbed considerably and is holding at a much higher than normal level.
Other than the occasional sharp spike, the number received per hour is generally around 4,000. It's presently over 7,000 & has been for several hours now.


EDIT- the average turnaround time is usually around 60hrs, presently it's down to 24.
Grant
Darwin NT
ID: 1382979
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1382981 - Posted: 20 Jun 2013, 10:16:55 UTC - in response to Message 1382979.  
Last modified: 20 Jun 2013, 10:22:14 UTC

A question for those doing AP WUs-
Have there been a lot more errors than usual over the last 6 hours or so?
Since the splitters have been cranking out more work, the number of AP results returned per hour has climbed considerably and is holding at a much higher than normal level.
Other than the occasional sharp spike, the number received per hour is generally around 4,000. It's presently over 7,000 & has been for several hours now.


EDIT- the average turnaround time is usually around 60hrs, presently it's down to 24.

Linux OpenCL AP apps were released almost 11 hours ago, along with new Windows OpenCL ATI AP apps, and the Windows Hybrid AP app.

Claggy
ID: 1382981
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13414
Credit: 208,696,464
RAC: 304
Australia
Message 1382983 - Posted: 20 Jun 2013, 10:22:42 UTC - in response to Message 1382981.  
Last modified: 20 Jun 2013, 10:23:06 UTC

A question for those doing AP WUs-
Have there been a lot more errors than usual over the last 6 hours or so?
Since the splitters have been cranking out more work, the number of AP results returned per hour has climbed considerably and is holding at a much higher than normal level.
Other than the occasional sharp spike, the number received per hour is generally around 4,000. It's presently over 7,000 & has been for several hours now.


EDIT- the average turnaround time is usually around 60hrs, presently it's down to 24.

Linux OpenCL AP apps were released almost 11 hours ago.


That might do it.
I'm thinking that if that is the case, the number returned per hour should settle down at the higher level, but the turnaround time should head back up to around what it was previously as their caches fill.
Grant
Darwin NT
ID: 1382983
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 1382996 - Posted: 20 Jun 2013, 11:52:10 UTC
Last modified: 20 Jun 2013, 11:52:24 UTC

Maybe this has something to do with it.
ID: 1382996
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1383004 - Posted: 20 Jun 2013, 12:17:57 UTC - in response to Message 1382996.  

Maybe this has something to do with it.


Yes, the newly released Brook app for older cards is having issues.
One for Raistmer to investigate and debug.

Apps go through beta, but some problems only crop up with main's larger and more varied user base.

At least that one should be fairly easy to diagnose and fix, not like the old DLL problem we've so far failed to solve.
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1383004
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1383143 - Posted: 20 Jun 2013, 18:41:39 UTC - in response to Message 1382979.  

A question for those doing AP WUs-
Have there been a lot more errors than usual over the last 6 hours or so?.....

Not here. AP errors on my 2 AP machines are 2 and 1 respectively; AP inconclusives are also at "normal" levels (8 & 6).

T.A.
ID: 1383143
bill

Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1383161 - Posted: 20 Jun 2013, 19:33:43 UTC

To lend some balance to the Multiverse:

Oh Waily Waily Waily! My Rac has gotten huger than
it's ever been "Recent average credit 31,490.38".
The Dread Lord is toying with my sanity!

Smashing Panic Button Now!
ID: 1383161
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1383228 - Posted: 20 Jun 2013, 23:01:49 UTC

I have noticed I'm getting a lot of tasks that aren't _0 or _1 in the past 24 hours.

Such as..

wu1266348384
wu1266363116
wu1266367979
wu1266475608
wu1266597583

Just to name a few. I would say they're all related.. looks to be all ATI errors.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1383228
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13414
Credit: 208,696,464
RAC: 304
Australia
Message 1383280 - Posted: 21 Jun 2013, 6:29:38 UTC - in response to Message 1383228.  
Last modified: 21 Jun 2013, 6:31:21 UTC

Once again, the PFB splitters have fallen over.
Server status page shows green, but they're not producing any work (1.5/s doesn't count) even though there's plenty of data ready to be split. Ready-to-send buffer continues to shrink.
Grant
Darwin NT
ID: 1383280
Wiggo
Joined: 24 Jan 00
Posts: 29399
Credit: 261,360,520
RAC: 489
Australia
Message 1383284 - Posted: 21 Jun 2013, 6:46:52 UTC - in response to Message 1382869.  

I wonder if the creation rate could be related to file 20jn12ac as it has been in its current state for way longer than is usual.

I see that this file is still stuck in the splitters. :-(

Cheers.
ID: 1383284
Cruncher-American (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1383535 - Posted: 22 Jun 2013, 0:49:42 UTC

Server status page frozen as of 1650 UTC; that's about 8 hours now.
Didn't anyone notice?

Waiting for the Marines....
ID: 1383535
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1383536 - Posted: 22 Jun 2013, 0:53:42 UTC - in response to Message 1383535.  

Server status page frozen as of 1650 UTC; that's about 8 hours now.
Didn't anyone notice?

Waiting for the Marines....

That's strange, it says [As of 22 Jun 2013, 0:50:04 UTC] here.

Claggy
ID: 1383536
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1383606 - Posted: 22 Jun 2013, 7:14:27 UTC

Server status updating normally at present.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1383606
WezH
Volunteer tester

Joined: 19 Aug 99
Posts: 576
Credit: 67,033,957
RAC: 95
Finland
Message 1384249 - Posted: 24 Jun 2013, 16:32:30 UTC - in response to Message 1382869.  

I wonder if the creation rate could be related to file 20jn12ac as it has been in its current state for way longer than is usual.


That file is still in SSP.


"Please keep Your signature under four lines so Internet traffic doesn't go up too much"

- In 1992 when I had my first e-mail address -
ID: 1384249


 