Panic Mode On (24) Server problems


Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 932072 - Posted: 9 Sep 2009, 14:01:33 UTC - in response to Message 932056.  

More like Catch22
for days - no work
Now - work available
BUT ya can't get it 'cause ya can't upload.


I got some! GPUs no longer idle...for now. If the uploads stay stalled I won't be able to download more...


Same here and I guess everywhere..

UL/DL slow or not possible... errors...
Also not possible to reach the scheduler..


52 Aces
Joined: 7 Jan 02
Posts: 497
Credit: 14,261,068
RAC: 67
United States
Message 932074 - Posted: 9 Sep 2009, 14:19:18 UTC - in response to Message 932049.  

More like Catch22


A Catch two-twenty-two, as all your wingmen are also in the same situation, so even stuff you had pending can't be credited as they can't upload either ;-)

I do wish the User total chart inside BOINC graphed credit based on the timestamp when I completed the WU. I consistently crunch the same small amount of work every day. But my chart looks like I go dark for days on end, as my silly wingmen seem to use their systems for other stuff; thus I wait and my graph goes flat ;-)
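A minimal sketch of the charting wished for above: bucketing credit by the day the work unit was completed instead of the day credit was granted. This is not how the BOINC Manager statistics chart actually works; the record layout (completion time, grant time, credit) and the function name `daily_credit` are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, Tuple

# Hypothetical record: (time the WU was completed, time credit was granted, credit)
Record = Tuple[datetime, datetime, float]


def daily_credit(records: Iterable[Record], by: str = "completed") -> Dict[date, float]:
    """Sum credit per calendar day, keyed by completion time or by grant time."""
    if by not in ("completed", "granted"):
        raise ValueError("by must be 'completed' or 'granted'")
    totals: Dict[date, float] = defaultdict(float)
    for completed_at, granted_at, credit in records:
        stamp = completed_at if by == "completed" else granted_at
        totals[stamp.date()] += credit
    # Return days in chronological order, ready for plotting.
    return dict(sorted(totals.items()))
```

Bucketed by completion, a steady cruncher gets a steady line even when slow wingmen delay validation; bucketed by grant, the same work piles up on whatever day the last wingman finally reports.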

Samdani
Joined: 21 Oct 00
Posts: 85
Credit: 13,480,553
RAC: 0
Pakistan
Message 932097 - Posted: 9 Sep 2009, 17:29:51 UTC

Finally started to receive work, at the exact moment the last GPU unit was being crunched. Am I lucky or what :)

Jet
Joined: 25 Sep 07
Posts: 12
Credit: 1,586,013
RAC: 0
Ukraine
Message 932104 - Posted: 9 Sep 2009, 18:08:20 UTC - in response to Message 932049.  

Yes, you are right, that's an exact description of the problem. Upload pending :-(

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 932109 - Posted: 9 Sep 2009, 18:36:52 UTC

As others have noted already, bandwidth is pretty much pegged:

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=/router-interfaces/inr-250/gigabitethernet2_3&ranges=d%3Aw&view=Octets

It's a 100-megabit link running at 92 megabits per second. Once you load it above 80% or so, lots of packets get dropped due to congestion.
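The "pegged" figure is just load divided by capacity. Why trouble starts well before 100% can be illustrated with the textbook M/M/1 queue, where the average number of packets waiting or in service is rho / (1 - rho) for utilization rho. This is an idealized model assumed here for illustration, not a measurement of the Berkeley router; with a real, finite buffer, a queue that long means packets get dropped.

```python
def utilization(load_mbps: float, capacity_mbps: float) -> float:
    """Fraction of the link's capacity currently in use."""
    if capacity_mbps <= 0:
        raise ValueError("capacity must be positive")
    return load_mbps / capacity_mbps


def mm1_mean_in_system(rho: float) -> float:
    """Mean number of packets queued or in service in an idealized M/M/1 queue.

    Valid only for 0 <= rho < 1; the value grows without bound as rho -> 1.
    """
    if not 0 <= rho < 1:
        raise ValueError("M/M/1 is only stable for 0 <= rho < 1")
    return rho / (1 - rho)


if __name__ == "__main__":
    rho = utilization(92, 100)  # 92 Mbit/s on a 100 Mbit/s link, per the post
    print(f"utilization: {rho:.0%}")
    for r in (0.5, 0.8, rho, 0.98):
        print(f"rho={r:.2f}: about {mm1_mean_in_system(r):.1f} packets in the system")
```

Going from 50% to 80% load quadruples the average queue in this model, and 92% nearly triples it again, which is why a link that still has a little headroom on paper can behave as if it is full.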

It would be nice if BOINC could "flow control" uploads: slowing down all of the attempts would reduce collisions and increase throughput.
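A minimal sketch of what client-side "flow control" could look like, assuming a token bucket that paces how often uploads may start. This is not BOINC code; the class `TokenBucket`, its rate, and its burst size are hypothetical, and the clock can be injected so the behaviour is easy to test.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical pacing helper: on average at most `rate` upload starts per
    second, with short bursts of up to `capacity` starts."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum tokens that can be saved up
        self.tokens = capacity    # start full so the first few uploads go at once
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        # Add the tokens earned since the last call, never exceeding capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Spend one token and return True if an upload may start now."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A client would call `try_acquire()` before each upload attempt and simply wait when it returns False. If every client paced itself this way, the combined load would stay below the point where collisions take over.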

In the meantime, hang on and enjoy the ride. It always recovers eventually.

Vistro
Joined: 6 Aug 08
Posts: 233
Credit: 316,549
RAC: 0
United States
Message 932116 - Posted: 9 Sep 2009, 19:07:08 UTC - in response to Message 932109.  

Now my "kill network for a week then upload everything, rinse lather repeat" plan isn't so stupid, lol.

God I wish I knew what it is like to worry about being on your last 1,000 GPU units.
30+ Computers heading our way! Currently at the "Zomg we need to talk to our tech expert at the co-op about this first!!!" stage. 16 lab machines and 14+ staff machines, each with 2.2 GHz CPUs and 256 MB RAM. Think they balance? The RAM certainly is bad.

PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 932117 - Posted: 9 Sep 2009, 19:10:16 UTC

Can we learn anything about these outages, where the bandwidth gets pegged and we all "suffer"? Like, what is the maximum time Matt can leave Berkeley before all hell breaks loose? Or, is there something more interesting?

hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 932119 - Posted: 9 Sep 2009, 19:17:05 UTC

Uploads keep hitting 100%, then just go back to "retry in...". At least they are making progress.
Official Abuser of Boinc Buttons...
And no good credit hound!

Vistro
Joined: 6 Aug 08
Posts: 233
Credit: 316,549
RAC: 0
United States
Message 932120 - Posted: 9 Sep 2009, 19:17:06 UTC - in response to Message 932117.  

The problem at this moment (in about 3.5 moments there will be a new problem :p lol jk) is that everybody's computers are able to get work now, so they use massive amounts of bandwidth. Because of this, nobody can upload their tasks to get more. It's a vicious cycle that only a new internet connection can fix, but right now SETI just doesn't have the cash for it.


30+ Computers heading our way! Currently at the "Zomg we need to talk to our tech expert at the co-op about this first!!!" stage. 16 lab machines and 14+ staff machines, each with 2.2 GHz CPUs and 256 MB RAM. Think they balance? The RAM certainly is bad.

Labbie
Joined: 19 Jun 06
Posts: 4083
Credit: 5,930,102
RAC: 0
United States
Message 932145 - Posted: 9 Sep 2009, 22:10:03 UTC

Think it's bad here?

Pigeon transfers data faster than South Africa's Telkom

JOHANNESBURG (Reuters) – A South African information technology company on Wednesday proved it was faster for them to transmit data with a carrier pigeon than to send it using Telkom , the country's leading internet service provider.

Internet speed and connectivity in Africa's largest economy are poor because of a bandwidth shortage. It is also expensive.

Local news agency SAPA reported the 11-month-old pigeon, Winston, took one hour and eight minutes to fly the 80 km (50 miles) from Unlimited IT's offices near Pietermaritzburg to the coastal city of Durban with a data card strapped to his leg.

Including downloading, the transfer took two hours, six minutes and 57 seconds -- the time it took for only four percent of the data to be transferred using a Telkom line.
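Working backwards from those figures, and assuming the Telkom line would have kept the same average rate for the rest of the transfer, the full transfer over Telkom would have taken a little under 53 hours, about 25 times as long as the pigeon:

```python
from datetime import timedelta

# Figures from the article: flight plus downloading took 2 h 6 min 57 s,
# and in that time the Telkom line moved only 4% of the data.
pigeon_total = timedelta(hours=2, minutes=6, seconds=57)
telkom_fraction_done = 0.04

# Assumption: the Telkom line would have kept the same average rate to the end.
telkom_total = pigeon_total / telkom_fraction_done
speedup = telkom_total / pigeon_total  # timedelta / timedelta -> float

print(f"pigeon: {pigeon_total}")
print(f"Telkom, extrapolated: {telkom_total}")
print(f"pigeon was about {speedup:.0f}x faster")
```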

SAPA said Unlimited IT performed the stunt after becoming frustrated with slow internet transmission times.

The company has 11 call-centers around the country and regularly sends data to its other branches.

Telkom could not immediately be reached for comment.

Internet speed is expected to improve once a new 17,000 km underwater fiber optic cable linking southern and East Africa to other networks becomes operational before South Africa hosts the soccer World Cup next year.

Local service providers are currently negotiating deals for more bandwidth.


Calm Chaos Forum...Join Calm Chaos Now

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 932146 - Posted: 9 Sep 2009, 22:17:24 UTC - in response to Message 932117.  

Can we learn anything about these outages, where the bandwidth gets pegged and we all "suffer"? Like, what is the maximum time Matt can leave Berkeley before all hell breaks loose? Or, is there something more interesting?

If by "we" you mean forum members, I think there are a couple of interesting lessons.

Probably the most interesting is: "when you crush the network, it does eventually recover."

If by "we" you mean SETI@Home, I think there are some experiments that could be run, like setting the "back-off" time to more than 11 seconds.
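A minimal sketch of retry back-off with a configurable floor and random jitter, the kind of knob the post suggests experimenting with. The function name `backoff_delay` and the 60-second base and 4-hour cap are hypothetical values, not SETI@home's or BOINC's actual settings.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 60.0, cap: float = 4 * 3600.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0 = first retry).

    The ceiling doubles with each consecutive failure, up to `cap`, and the
    actual delay is drawn uniformly between `base` and that ceiling, so that
    clients which failed at the same moment do not all retry at the same moment.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    rng = rng or random.Random()
    ceiling = max(base, min(cap, base * 2 ** attempt))
    return rng.uniform(base, ceiling)


if __name__ == "__main__":
    r = random.Random(42)
    for n in range(6):
        print(n, round(backoff_delay(n, rng=r)))
```

Raising `base` is the experiment described in the post; the jitter matters just as much, because thousands of hosts that all retry after exactly the same fixed delay simply recreate the congestion on schedule.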

If by "we" you mean the BOINC developers, I think we have a testbed for some kind of flow control, but someone has to write the code.

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 932147 - Posted: 9 Sep 2009, 22:20:36 UTC - in response to Message 932145.  

Think it's bad here?

Pigeon transfers data faster than South Africa's Telkom

Which makes me wonder why they haven't implemented IPoAC (IP over Avian Carriers, RFC 1149) in South Africa.

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 932148 - Posted: 9 Sep 2009, 22:24:37 UTC - in response to Message 932147.  

Think it's bad here?

Pigeon transfers data faster than South Africa's Telkom

Which makes me wonder why they haven't implemented IPoAC in South Africa.

Thinking about it a little more, I wonder if IPoAC could be used to improve the bandwidth situation on campus....

Labbie
Joined: 19 Jun 06
Posts: 4083
Credit: 5,930,102
RAC: 0
United States
Message 932150 - Posted: 9 Sep 2009, 22:30:01 UTC - in response to Message 932148.  

Think it's bad here?

Pigeon transfers data faster than South Africa's Telkom

Which makes me wonder why they haven't implemented IPoAC in South Africa.

Thinking about it a little more, I wonder if IPoAC could be used to improve the bandwidth situation on campus....


LOL - but it would take an entire flock!!!!

Calm Chaos Forum...Join Calm Chaos Now

PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 932152 - Posted: 9 Sep 2009, 22:38:58 UTC - in response to Message 932146.  

Can we learn anything about these outages, where the bandwidth gets pegged and we all "suffer"? Like, what is the maximum time Matt can leave Berkeley before all hell breaks loose? Or, is there something more interesting?

If by "we" you mean forum members, I think there are a couple of interesting lessons.

Probably the most interesting is: "when you crush the network, it does eventually recover."

If by "we" you mean SETI@Home, I think there are some experiments that could be run, like setting the "back-off" time to more than 11 seconds.

If by "we" you mean the BOINC developers, I think we have a testbed for some kind of flow control, but someone has to write the code.


Ok, you have some good thoughts. I would love to know if anyone relevant has read them.

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 932153 - Posted: 9 Sep 2009, 22:40:47 UTC - in response to Message 932145.  

Telkom could not immediately be reached for comment.

I wonder why not?

Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 932154 - Posted: 9 Sep 2009, 22:42:44 UTC - in response to Message 932153.  

Telkom could not immediately be reached for comment.

I wonder why not?

Lack of bandwidth...

F.

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 932155 - Posted: 9 Sep 2009, 22:49:01 UTC - in response to Message 932154.  

Telkom could not immediately be reached for comment.

I wonder why not?

Lack of bandwidth...

Or lack of pigeons...

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 932157 - Posted: 9 Sep 2009, 22:50:18 UTC - in response to Message 932152.  

Can we learn anything about these outages, where the bandwidth gets pegged and we all "suffer"? Like, what is the maximum time Matt can leave Berkeley before all hell breaks loose? Or, is there something more interesting?

If by "we" you mean forum members, I think there are a couple of interesting lessons.

Probably the most interesting is: "when you crush the network, it does eventually recover."

If by "we" you mean SETI@Home, I think there are some experiments that could be run, like setting the "back-off" time to more than 11 seconds.

If by "we" you mean the BOINC developers, I think we have a testbed for some kind of flow control, but someone has to write the code.


Ok, you have some good thoughts. I would love to know if anyone relevant has read them.

I assume you aren't saying that those of us here are not relevant. :-)

Since every time this happens it seems to cause a minor tempest in the forums, I would say that some forum members still haven't figured out that it isn't as big a problem as it might seem.

For the folks at SETI@Home, I'm sure they've experienced what happens when you try to speed up the recovery process -- my experience is that it generally makes recovery slower.

For the BOINC developers, one of my suggestions is in 6.6.38 and later.

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 932158 - Posted: 9 Sep 2009, 22:50:37 UTC - in response to Message 932155.  

Telkom could not immediately be reached for comment.

I wonder why not?

Lack of bandwidth...

Or lack of pigeons...

Same thing.


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.