Panic Mode On (83) Server Problems?

ExchangeMan
Volunteer tester
Joined: 9 Jan 00
Posts: 115
Credit: 157,719,104
RAC: 0
United States
Message 1362865 - Posted: 30 Apr 2013, 11:25:32 UTC

Yesterday evening I had a feeling we were going to run into problems overnight. Ya, the MB splitters seem to have been running OK, but with the AP splitters not running at full capacity, problems were inevitable overnight.

It seems like for a few days, things weren't too bad. But partial functionality on either of the splitters eventually results in problems. Maybe someone forgot to change tapes.

It seems that they need to have 6 splitters of each type to keep up with work unit demand.

ID: 1362865
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1362867 - Posted: 30 Apr 2013, 11:29:45 UTC

I was running low until I hit the right moment, and got 69 new tasks. Apart from one solitary mid-AR resend, every single one was a shorty. They just get snapped up too darn quickly during a shorty storm - we need mid-AR or AP to damp things down a bit.
ID: 1362867
Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1362872 - Posted: 30 Apr 2013, 11:51:03 UTC - in response to Message 1362867.  

I was running low until I hit the right moment, and got 69 new tasks. Apart from one solitary mid-AR resend, every single one was a shorty. They just get snapped up too darn quickly during a shorty storm - we need mid-AR or AP to damp things down a bit.

Had a similar result with a lucky fetch. I'm out of AP and have mostly shorties, so my 83 GPU tasks sum to 2 hours of crunch time per BoincTasks. I have set up my zero resource share project for the maintenance window.
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1362872
ExchangeMan
Volunteer tester
Joined: 9 Jan 00
Posts: 115
Credit: 157,719,104
RAC: 0
United States
Message 1362877 - Posted: 30 Apr 2013, 12:09:37 UTC - in response to Message 1362867.  

I was running low until I hit the right moment, and got 69 new tasks. Apart from one solitary mid-AR resend, every single one was a shorty. They just get snapped up too darn quickly during a shorty storm - we need mid-AR or AP to damp things down a bit.

It seemed that for a few days there was a nice mix of shorties, mid-ARs and AP work units. I don't know if anyone planned it that way or if it was just random chance. With a decent mix, the system can almost maintain itself until someone needs to change tapes - then it's pot luck.

ID: 1362877
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1362900 - Posted: 30 Apr 2013, 14:11:04 UTC

It is a game of hungry hungry hippos right now, it would seem. With my 24-core box reporting a 3 hour queue, it looks like it will be doing some PG work today once maintenance starts.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1362900
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13932
Credit: 208,696,464
RAC: 304
Australia
Message 1362909 - Posted: 30 Apr 2013, 18:34:28 UTC - in response to Message 1362860.  

+ 1, the splitters our next bottleneck?

It would appear so.
Just coming back from the outage, there is only 1 splitter running, and it is only producing 6 WUs/sec. With shorties we need at least 55/s. So we really need 10 PFB splitters, and there are only 6, and it's been a while since they've all been running. No wonder we keep running out of work, even while it is being split.
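
The arithmetic behind that estimate can be sketched as below. This is only an illustration of the figures quoted in the post; the variable names are made up here, and it assumes every splitter produces the same 6 WUs/sec as the one that was running, which the thread does not confirm.

```python
import math

# Figures quoted in the post above
per_splitter_rate = 6     # WUs/sec from the single running splitter
required_rate = 55        # WUs/sec needed during a shorty storm
available_splitters = 6   # PFB splitters that exist

# Assumption (not from the thread): output scales linearly with splitter count
splitters_needed = math.ceil(required_rate / per_splitter_rate)
best_case_rate = available_splitters * per_splitter_rate

print(splitters_needed)                # 10
print(best_case_rate)                  # 36
print(required_rate - best_case_rate)  # 19 WUs/sec short even with all 6 running
```

Under that linear assumption, even all six splitters running flat out would fall short of shorty-storm demand.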
Grant
Darwin NT
ID: 1362909
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13932
Credit: 208,696,464
RAC: 304
Australia
Message 1362925 - Posted: 30 Apr 2013, 19:38:09 UTC - in response to Message 1362909.  

+ 1, the splitters our next bottleneck?

It would appear so.
Just coming back from the outage, there is only 1 splitter running, and it is only producing 6 WUs/sec. With shorties we need at least 55/s. So we really need 10 PFB splitters, and there are only 6, and it's been a while since they've all been running. No wonder we keep running out of work, even while it is being split.

All the splitters are now running, and they're producing less than half the rate needed.
Grant
Darwin NT
ID: 1362925
andybutt
Volunteer tester
Joined: 18 Mar 03
Posts: 262
Credit: 164,205,187
RAC: 516
United Kingdom
Message 1362958 - Posted: 30 Apr 2013, 21:14:15 UTC - in response to Message 1362925.  

Hungry GPUs sitting here doing nothing! Ho Hum!
ID: 1362958
Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1362966 - Posted: 30 Apr 2013, 21:32:34 UTC

Just got a "Project down for maintenance" and a one hour backoff, so something's underway.

Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1362966
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24929
Credit: 3,081,182
RAC: 7
Ireland
Message 1362967 - Posted: 30 Apr 2013, 21:33:17 UTC

Hmmn, I get this....

30/04/2013 22:30:55 | SETI@home | Requesting new tasks for CPU
30/04/2013 22:30:57 | SETI@home | Scheduler request completed: got 0 new tasks
30/04/2013 22:30:57 | SETI@home | Project is temporarily shut down for maintenance

ID: 1362967
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9958
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1362986 - Posted: 30 Apr 2013, 22:35:48 UTC

My main cruncher just downloaded 98 GPU tasks, but my second one still can't get any.
ID: 1362986
Sakletare
Joined: 18 May 99
Posts: 132
Credit: 23,423,829
RAC: 0
Sweden
Message 1362994 - Posted: 30 Apr 2013, 22:42:51 UTC

And what is this 90 Mbits/sec upload that's running 24/7, even when the project is down for maintenance?

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=/router-interfaces/inr-211/gigabitethernet6_17&ranges=d%3Aw&view=Octets
ID: 1362994
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1362998 - Posted: 30 Apr 2013, 22:45:54 UTC - in response to Message 1362994.  

And what is this 90 Mbits/sec upload that's running 24/7, even when the project is down for maintenance?

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=/router-interfaces/inr-211/gigabitethernet6_17&ranges=d%3Aw&view=Octets

I don't think anyone ever guaranteed that that link was exclusively for seti@home.
ID: 1362998
Wiggo
Joined: 24 Jan 00
Posts: 38019
Credit: 261,360,520
RAC: 489
Australia
Message 1363013 - Posted: 30 Apr 2013, 23:08:57 UTC - in response to Message 1362998.  

The MB splitters are certainly having a hard time getting up to their usual speed.

Cheers.
ID: 1363013
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1363019 - Posted: 30 Apr 2013, 23:38:30 UTC - in response to Message 1362998.  

And what is this 90 Mbits/sec upload that's running 24/7, even when the project is down for maintenance?

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=/router-interfaces/inr-211/gigabitethernet6_17&ranges=d%3Aw&view=Octets

I don't think anyone ever guaranteed that that link was exclusively for seti@home.

By the nature of a gigabit router interface, that link must be for SETI@home exclusively. My guess is the backup operation which is the reason for the Tuesday outage is sensibly being done to someplace outside the colocation facility.
Joe
ID: 1363019
Tom*
Joined: 12 Aug 11
Posts: 127
Credit: 20,769,223
RAC: 9
United States
Message 1363060 - Posted: 1 May 2013, 1:46:36 UTC
Last modified: 1 May 2013, 1:56:32 UTC

Never thought I'd say this -

Please split some APs to slow down the feeder, link and client systems.
ID: 1363060
RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,314,058
RAC: 0
United States
Message 1363061 - Posted: 1 May 2013, 2:53:07 UTC - in response to Message 1362998.  

And what is this 90 Mbits/sec upload that's running 24/7, even when the project is down for maintenance?

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=/router-interfaces/inr-211/gigabitethernet6_17&ranges=d%3Aw&view=Octets

I don't think anyone ever guaranteed that that link was exclusively for seti@home.

Then let's program the upload and download servers to report their data rates and show them on the status page.
ID: 1363061
Keith White
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 1363062 - Posted: 1 May 2013, 3:05:59 UTC
Last modified: 1 May 2013, 3:12:09 UTC

What screams at me looking at the Munin graphs is that only when we ran out of AstroPulse units did the MB units start to plummet from 300K to 0. This suggests to me that a lot of GPU AstroPulse is being done and when that ran out the MB reserve was quickly devoured by hungry hungry GPUs.

It's interesting that 30 units/s creation rate isn't enough, at least during a shorty storm.
"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1363062
ExchangeMan
Volunteer tester
Joined: 9 Jan 00
Posts: 115
Credit: 157,719,104
RAC: 0
United States
Message 1363073 - Posted: 1 May 2013, 3:30:20 UTC - in response to Message 1363062.  

What screams at me looking at the Munin graphs is that only when we ran out of AstroPulse units did the MB units start to plummet from 300K to 0. This suggests to me that a lot of GPU AstroPulse is being done and when that ran out the MB reserve was quickly devoured by hungry hungry GPUs.

It's interesting that 30 units/s creation rate isn't enough, at least during a shorty storm.

I noticed this too before the shutdown. When MB and AP units were both being split, things kind of behaved themselves and I got all the MB I could crunch. I sure hope this is fixed tomorrow.

The current server status page shows 6 MB splitters active. 8 were active earlier and couldn't keep up. This is very frustrating for dedicated SETI crunchers like me.

ID: 1363073
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13932
Credit: 208,696,464
RAC: 304
Australia
Message 1363106 - Posted: 1 May 2013, 6:51:32 UTC - in response to Message 1363062.  
Last modified: 1 May 2013, 7:00:02 UTC

It's interesting that 30 units/s creation rate isn't enough, at least during a shorty storm.

During a shorty storm 55/s is the minimum needed to meet demand.
The storm is over, but still the splitters aren't able to crank out enough work. For a while they were doing about 30/s (barely enough when there's a lot of VLARs in the mix). Now they've dropped down to less than 20.

Someone in the lab needs to take a look at what is going on: the splitters used to be able to sustain 70/s with no problems at all, and now they can't even reach it as a peak.
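
To put those rates in perspective, here is a rough sketch of how long a ready-to-send buffer would last at each creation rate. The 55/s demand and the 70, 30 and 20/s rates come from this post, and the 300K buffer is the figure read off the Munin graphs earlier in the thread. The function name and the constant-demand model are assumptions made for illustration only; real demand varies with the mix of work, and 20/s is an upper bound since the post says "less than 20".

```python
def hours_until_empty(buffer_wus, demand_per_s, supply_per_s):
    """Hours until the buffer reaches zero, or None if supply keeps up with demand."""
    deficit = demand_per_s - supply_per_s
    if deficit <= 0:
        return None  # the buffer never drains
    return buffer_wus / deficit / 3600  # seconds converted to hours

DEMAND = 55        # WUs/sec, shorty-storm minimum quoted above
BUFFER = 300_000   # MB units ready to send, per the Munin graph figure

for supply in (70, 30, 20):
    hours = hours_until_empty(BUFFER, DEMAND, supply)
    label = "keeps up" if hours is None else f"empty in about {hours:.1f} h"
    print(f"{supply}/s: {label}")
```

With these assumed inputs the old 70/s rate keeps up with demand, while 30/s empties the buffer in roughly 3.3 hours and 20/s in roughly 2.4 hours.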


EDIT: at least the shorty storm was over for a while. The work my systems were able to get after the outage while I was at work didn't have many shorties in it, but one of the systems was just able to get some more work (still nowhere near enough...) and it was almost all shorties.
Grant
Darwin NT
ID: 1363106