Servers are back on line after Monday's outage

Profile KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1327712 - Posted: 15 Jan 2013, 22:58:50 UTC - in response to Message 1327709.  

Thank you Jeff: thank you one and all.
That is some really special work!
Just looking forward to reporting a few tasks and maybe collecting some new ones ;-)



ID: 1327712 · Report as offensive
Profile Keno
Joined: 11 Apr 02
Posts: 20
Credit: 8,491,109
RAC: 13
United States
Message 1327719 - Posted: 15 Jan 2013, 23:09:01 UTC

I work in a data center as a critical site engineer.
Without the AC working at 100%, all the servers would crash within an hour or less.
The added cost of electricity and AC maintenance to keep the place cool is figured in for most cases. Unfortunately, unexpected AC breakdowns happen.
ID: 1327719 · Report as offensive
Thomas
Volunteer tester

Joined: 9 Dec 11
Posts: 1499
Credit: 1,345,576
RAC: 0
France
Message 1327818 - Posted: 16 Jan 2013, 6:02:41 UTC - in response to Message 1327709.  

Cool, Jeff!
Thanks for the update :)
ID: 1327818 · Report as offensive
Andy

Joined: 24 Jun 07
Posts: 2
Credit: 988,375
RAC: 0
Message 1327821 - Posted: 16 Jan 2013, 6:07:54 UTC - in response to Message 1320226.  

Is it still down? I request updates but they just get deferred.
06:00, 16 Jan 2013
ID: 1327821 · Report as offensive
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9947
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1327876 - Posted: 16 Jan 2013, 9:51:16 UTC - in response to Message 1327821.  

Is it still down? I request updates but they just get deferred.
06:00, 16 Jan 2013

It's not "down". It's just that after an outage of this length (the project was down for a day before this) there are thousands of machines all trying to report completed work and request new work. It could take a few days before things settle down.
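
For readers wondering why requests get "deferred": when thousands of clients retry at once, a common general technique is exponential backoff with random jitter, so that retries spread out over time instead of arriving together. The sketch below is a generic illustration only. The function name, the 60-second base and the 4-hour cap are assumptions made for the example, not BOINC's actual retry logic or settings.

```python
import random

def backoff_delay(attempt, base=60.0, cap=4 * 3600.0, rng=random):
    """Return a randomized wait in seconds before retry number `attempt` (0-based).

    `base` and `cap` are hypothetical values chosen for illustration.
    """
    # The upper bound doubles with each failed attempt, limited by `cap`.
    upper = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly in [0, upper] so clients do not retry in lockstep.
    return rng.uniform(0.0, upper)

# Seeded generator so the example is repeatable.
rng = random.Random(42)
delays = [backoff_delay(n, rng=rng) for n in range(8)]
for n, d in enumerate(delays):
    print(f"retry {n}: wait up to {min(4 * 3600.0, 60.0 * 2 ** n):.0f}s, chose {d:.0f}s")
```

With a scheme like this, each client waits longer after every failed attempt, which is one reason recovery after a long outage can take days rather than minutes.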
ID: 1327876 · Report as offensive
William Sommers

Joined: 19 Oct 00
Posts: 4
Credit: 42,566,282
RAC: 201
United States
Message 1327921 - Posted: 16 Jan 2013, 13:38:10 UTC

Is there a reason that work units download so very slowly from Seti?
Downloads run 10 to 15 times faster on Rosetta.
This has been going on for months.
Rosetta downloads at 500 to 1200 KBps and Seti downloads at 2.49 to 9.78 KBps.
Am I doing something wrong?
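
As a quick sanity check on the figures quoted above, the ratio between the two sets of download rates can be worked out directly. The variable names are illustrative; the rates are the ones given in the post.

```python
# Download rates quoted in the post, in KBps (low end, high end).
rosetta = (500.0, 1200.0)
seti = (2.49, 9.78)

# Most conservative comparison: slowest Rosetta rate vs. fastest Seti rate.
min_ratio = rosetta[0] / seti[1]
# Most extreme comparison: fastest Rosetta rate vs. slowest Seti rate.
max_ratio = rosetta[1] / seti[0]

print(f"Rosetta is roughly {min_ratio:.0f}x to {max_ratio:.0f}x faster")
```

On these numbers the gap is closer to 50x at best and several hundred times at worst, considerably more than the "10 to 15 times" estimate.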
ID: 1327921 · Report as offensive
Rolf

Joined: 16 Jun 09
Posts: 114
Credit: 7,817,146
RAC: 0
Switzerland
Message 1327926 - Posted: 16 Jan 2013, 14:00:56 UTC - in response to Message 1327921.  

Am I doing something wrong?

No, definitely not!
After the outage there is simply a gigantic traffic jam! The easiest way to see it is http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d%3Aw%3Am%3Ay
The green part of the graph is outgoing traffic from SETI, and it is maxed out (no more traffic possible)!
Patience, please!
ID: 1327926 · Report as offensive
Grasnek

Joined: 26 Feb 00
Posts: 7
Credit: 365,990
RAC: 0
Netherlands
Message 1327983 - Posted: 16 Jan 2013, 17:11:22 UTC

I understand the traffic jam after a few days of being offline.

It's a shame that a lot of bandwidth is probably wasted though, because currently I only seem to get partial downloads. Work units download anywhere between 0.94% and 96.73%. Tons are stuck in download, and only a few made it to 100% today.

I also noticed that a lot of workunits (that didn't download, btw) are for my GPU with an estimated run time of 2.5 minutes. So if they had downloaded, they would have created even more congestion in a matter of minutes.

Of course I have no idea what is and isn't possible with the scheduler, but wouldn't it be better to schedule a few long running work units after you've been offline, so you don't get bothered by those machines for a few hours and can tend to the rest of the clients?
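
The suggestion above can be sketched in a few lines. This is purely a hypothetical illustration of the idea: the `Task` type, the `pick_tasks` function and the `host_was_offline` flag are invented for the example and say nothing about how the real scheduler works or what it is able to know about a workunit in advance.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    est_runtime_s: float  # estimated run time in seconds

def pick_tasks(available, count, host_was_offline):
    """Choose `count` tasks; prefer long-running ones for hosts returning from an outage."""
    if host_was_offline:
        # Longest estimated run time first, so the host stays busy for longer.
        ordered = sorted(available, key=lambda t: t.est_runtime_s, reverse=True)
    else:
        # Otherwise keep the existing order.
        ordered = list(available)
    return ordered[:count]

queue = [Task("shorty", 150.0), Task("normal", 3600.0), Task("long", 14400.0)]
print([t.name for t in pick_tasks(queue, 2, host_was_offline=True)])
```

The trade-off is that a policy like this only helps if long and short workunits are both available to hand out at the time of the request.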
ID: 1327983 · Report as offensive
Profile DeD
Joined: 7 Jan 13
Posts: 6
Credit: 53,853
RAC: 0
United States
Message 1328173 - Posted: 17 Jan 2013, 5:25:46 UTC

Wow, tons of bandwidth being used! Hopefully it's an influx of new users trying to help out.
ID: 1328173 · Report as offensive
Profile Lynn Special Project $75 donor
Volunteer tester
Joined: 20 Nov 00
Posts: 13977
Credit: 79,603,650
RAC: 123
United States
Message 1328176 - Posted: 17 Jan 2013, 5:44:44 UTC - in response to Message 1328173.  

I keep getting the message "Won't get new tasks." Hoping this will change.
ID: 1328176 · Report as offensive
Profile DeD
Joined: 7 Jan 13
Posts: 6
Credit: 53,853
RAC: 0
United States
Message 1328192 - Posted: 17 Jan 2013, 6:18:25 UTC

ID: 1328192 · Report as offensive
Profile KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1328670 - Posted: 18 Jan 2013, 15:48:34 UTC - in response to Message 1327983.  
Last modified: 18 Jan 2013, 15:52:08 UTC

I understand the traffic jam after a few days of being offline.

It's a shame that a lot of bandwidth is probably wasted though, because currently I only seem to get partial downloads. Work units download anywhere between 0.94% and 96.73%. Tons are stuck in download, and only a few made it to 100% today.

I also noticed that a lot of workunits (that didn't download, btw) are for my GPU with an estimated run time of 2.5 minutes. So if they had downloaded, they would have created even more congestion in a matter of minutes.

Of course I have no idea what is and isn't possible with the scheduler, but wouldn't it be better to schedule a few long running work units after you've been offline, so you don't get bothered by those machines for a few hours and can tend to the rest of the clients?


a) The "stuck downloads" syndrome is common for the servers when there's a "traffic jam". It's normal, and will work itself out.

b) The folks at Berkeley don't necessarily know what type of WU (workunit) a "tape" will produce. Sometimes a "tape" starts with normal WUs and, about halfway through, starts producing shorties. It depends on what the Arecibo telescope is doing while the "tape" is being produced; sometimes that changes while the "tape" is running. The GPU run time estimates are often wrong; they depend on what kind of WUs you've previously processed. (With an estimate of 2.5 minutes, you must have high-end NVIDIA Kepler(s)!)

Hello, from Albany, CA!...
ID: 1328670 · Report as offensive
Profile Cornhusker

Joined: 20 Apr 09
Posts: 41
Credit: 45,415,265
RAC: 37
United States
Message 1328748 - Posted: 18 Jan 2013, 19:31:56 UTC - in response to Message 1328192.  

yea check out http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d%3Aw%3Am%3Ay


they are having a severe network load right now...


I'd appreciate it if someone in the know would explain the chart. It says it's a chart for a gigabit Ethernet controller. That's 1,000 megabits, yet the graph shows it's being utilized at less than 100 megabits. So what's the problem?

ID: 1328748 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 20237
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1328761 - Posted: 18 Jan 2013, 19:50:14 UTC

The easy bit first...
The 1 Gb bandwidth has been throttled to 100 Mb by "campus politics".

Next, the less easy bits:
The green mass is the data rate from S@H to the rest of the world (confusingly called "in").
The blue line is the data rate into S@H (confusingly called "out").
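
To make the point concrete, here is a small sketch comparing the same traffic reading against the nominal link speed and against the 100 Mb cap. The 95 Mb/s reading is a made-up example value, not a number taken from the graph, and the names are illustrative.

```python
LINK_CAPACITY_MBPS = 1000.0  # nominal gigabit interface
POLICY_CAP_MBPS = 100.0      # the 100 Mb throttle described above

def utilization(observed_mbps, limit_mbps):
    """Fraction of the given limit that is in use."""
    return observed_mbps / limit_mbps

observed = 95.0  # hypothetical reading in Mb/s, for illustration only
print(f"vs. physical link: {utilization(observed, LINK_CAPACITY_MBPS):.1%}")
print(f"vs. 100 Mb cap:    {utilization(observed, POLICY_CAP_MBPS):.1%}")
```

Measured against the gigabit interface the link looks nearly idle, but measured against the cap that actually applies it is close to full.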
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1328761 · Report as offensive
Profile Cornhusker

Joined: 20 Apr 09
Posts: 41
Credit: 45,415,265
RAC: 37
United States
Message 1328991 - Posted: 19 Jan 2013, 4:33:42 UTC - in response to Message 1328761.  

Thanks for taking the time to explain it, Bob. It makes sense now!
ID: 1328991 · Report as offensive
Grasnek

Joined: 26 Feb 00
Posts: 7
Credit: 365,990
RAC: 0
Netherlands
Message 1329012 - Posted: 19 Jan 2013, 6:21:24 UTC - in response to Message 1328670.  


a) The "stuck downloads" syndrome is common for the servers when there's a "traffic jam". It's normal, and will work itself out.

b) The folks at Berkeley don't necessarily know what type of WU (workunit) a "tape" will produce. Sometimes a "tape" starts with normal WUs and, about halfway through, starts producing shorties. It depends on what the Arecibo telescope is doing while the "tape" is being produced; sometimes that changes while the "tape" is running. The GPU run time estimates are often wrong; they depend on what kind of WUs you've previously processed. (With an estimate of 2.5 minutes, you must have high-end NVIDIA Kepler(s)!)


Thanks for this insight. I didn't know it depended on what material is being delivered. I thought the data stream was just being chopped up into shorter and longer parts for slower and faster systems, with some made compatible for GPU processing rather than CPU.

I recently upgraded to a manufacturer-overclocked GTX 670 (I just plugged it in, didn't change anything about it), so you guessed right about the high-end Kepler. Most of the estimated 2.5-minute WUs ended up taking a bit longer than 3.5 minutes though. But to be fair, I had never seen such small WUs before, so estimates are bound to be off.
ID: 1329012 · Report as offensive
robspan

Joined: 3 Apr 99
Posts: 2
Credit: 259,550
RAC: 0
United Kingdom
Message 1329230 - Posted: 19 Jan 2013, 22:16:10 UTC

I've had nothing for 10 days,

just the usual "got 0 new tasks" nonsense...

I guess my other projects get all the cpu time !
ID: 1329230 · Report as offensive
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9947
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1329271 - Posted: 20 Jan 2013, 1:07:12 UTC - in response to Message 1329230.  

I've had nothing for 10 days,

just the usual "got 0 new tasks" nonsense...

I guess my other projects get all the cpu time !

Looks like you just got some
ID: 1329271 · Report as offensive
Profile Al Capone

Joined: 13 Jul 01
Posts: 4
Credit: 23,063,498
RAC: 0
Sweden
Message 1329340 - Posted: 20 Jan 2013, 5:34:43 UTC

It seems that the Seti Project is dying.

The bandwidth cap to 100 Mb is probably the first step.

I haven't received any new WUs to process.

If this continues for much longer, I will delete my SETI account on all my computers.



ID: 1329340 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 22933
Credit: 261,360,520
RAC: 489
Australia
Message 1329350 - Posted: 20 Jan 2013, 7:36:18 UTC - in response to Message 1329340.  

It seems that the Seti Project is dying.

The bandwidth cap to 100 Mb is probably the first step.

I haven't received any new WUs to process.

If this continues for much longer, I will delete my SETI account on all my computers.

If you haven't received any work, then I suggest that you check the settings at your end, as work has been flowing freely, if a bit slowly.

Cheers.
ID: 1329350 · Report as offensive


 
©2021 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.