Panic Mode On (58) Server problems?

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 1160392 - Posted: 8 Oct 2011, 22:39:55 UTC - in response to Message 1160132.  


Does anyone want to hazard a guess as to when GPU crunching estimates will finally come back to realistic values?

The DCF on both of my machines is now around 1.0 & CPU estimates are close enough. But the GPU ones are still way out, and although they slowly get closer to the actual value (though never even remotely near it), as soon as a CPU WU is done they're back to where they were.

Remember the sequence of events.

......

But all we can do from the outside is observe, advise, warn, cajole..... The decisions will be taken in Berkeley.

Thanks for the info.
So with a bit of luck, it should be some time between now & then.


With things as they are, it didn't take long to hit the new CPU limits, but due to the estimated completion times I'm never going to hit the GPU limits.
Grant
Darwin NT
ID: 1160392 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 1160399 - Posted: 8 Oct 2011, 22:50:40 UTC - in response to Message 1160293.  
Last modified: 8 Oct 2011, 23:12:02 UTC

Are there any plans of making version 6.12 usable by high end hosts before trying to roll to V7?

I presume you're referring to the increased backoffs in BOINC 6.12.x, and as that's a fundamental design of the series I don't expect the BOINC devs to modify it. They're in bugfixing only mode for that branch, and of course assuming that because 6.12 is the recommended version it's reasonable to consider its effects as if all users have adopted the recommendation.

The issue isn't really the backoffs so much as work delivery here,

Nah, the backoffs are an issue, because even when the work is available & can be downloaded, all it takes is one or two hiccups with the download and the 6.12.33 client goes into project backoff mode for 2+ hours. Not only does nothing get downloaded, nothing gets reported either, so it all just backs up. In the end, if I'm not here to keep hitting the retry button, the system runs out of work. Another miss & I've seen backoffs of over 8 hours.
More than just a bit silly IMHO.


EDIT- although I have to agree that more network bandwidth would be good. The backoffs are a problem, but if the network were maxed out for only an hour or two instead of permanently, the backoffs wouldn't occur & wouldn't be an issue.
Grant
Darwin NT
ID: 1160399 · Report as offensive
James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 1160402 - Posted: 8 Oct 2011, 23:04:04 UTC

I agree with Grant. I had to abuse the retry button just to get 14 work units. It kind of peeves me to go to transfers and see a work unit 80% downloaded with a backoff of 2 hours.



Old James
ID: 1160402 · Report as offensive
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34347
Credit: 79,922,639
RAC: 80
Germany
Message 1160413 - Posted: 8 Oct 2011, 23:25:32 UTC


Keep in mind results out in the field are only ~3 million.
Usually it's more like 6 million.
It hasn't been that low since the last big outage last year.

I'm normally able to download 200+ APs and 2000 MBs a day without a glitch with 6.12.

I totally agree with Joe.




With each crime and every kindness we birth our future.
ID: 1160413 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 1160434 - Posted: 9 Oct 2011, 0:29:23 UTC - in response to Message 1160413.  

I'm normally able to download 200+ APs and 2000 MBs a day without a glitch with 6.12.

I get excited when I get 6 downloads in a row without it backing off for more than 2 hours.

Grant
Darwin NT
ID: 1160434 · Report as offensive
soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1160437 - Posted: 9 Oct 2011, 0:46:34 UTC

It is bad enough on a fast cruncher that I went BACK to 6.10.58 on it.

On the slower machine with the GT430 card, it is a minor nuisance. On the GTX 590 / 2x E5620 machine it was intolerable.
Janice
ID: 1160437 · Report as offensive
Blake Bonkofsky
Volunteer tester
Joined: 29 Dec 99
Posts: 617
Credit: 46,383,149
RAC: 0
United States
Message 1160440 - Posted: 9 Oct 2011, 1:00:21 UTC - in response to Message 1160437.  

Same here. My triple 460 machine would never build a cache with 6.12.33. I ran it for about 5-6 days before switching all of my machines back to 6.10.58. I couldn't care less about the manager, as I use BoincTasks to manage my 3 hosts simultaneously, but the client backoffs are entirely too long, and the client gives up entirely too quickly. If it gets a No Tasks Available message 3-4 times in a row, it will back off 1-2 hours before even asking again. If that results in another No Tasks Available, it can go even longer than that. With as quickly as the feeder runs dry on this project, it is basically worthless to a machine with a significant RAC.
ID: 1160440 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 1160444 - Posted: 9 Oct 2011, 1:28:14 UTC


Downloads continue to be difficult- time out straight away, or time counts down but no data until it eventually times out (5-15min later). Or it eventually starts to download at 2-5KB/s.
Or you get lucky & it starts straight away & at 25-40kB/s.


I wonder if it is a problem with one of the download servers, or if there's still a problem with their load sharing?
When network traffic is at reasonable levels, all downloads come down at similar speeds. But when things are maxed out one server gives little or no downloads while on the other they fly.
Grant
Darwin NT
ID: 1160444 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1160446 - Posted: 9 Oct 2011, 1:36:01 UTC - in response to Message 1160444.  


Downloads continue to be difficult- time out straight away, or time counts down but no data until it eventually times out (5-15min later). Or it eventually starts to download at 2-5KB/s.
Or you get lucky & it starts straight away & at 25-40kB/s.


I wonder if it is a problem with one of the download servers, or if there's still a problem with their load sharing?
When network traffic is at reasonable levels, all downloads come down at similar speeds. But when things are maxed out one server gives little or no downloads while on the other they fly.

Or the routing table in our flaky router...more RAM is on the way.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1160446 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 1160447 - Posted: 9 Oct 2011, 1:37:07 UTC - in response to Message 1160446.  


Downloads continue to be difficult- time out straight away, or time counts down but no data until it eventually times out (5-15min later). Or it eventually starts to download at 2-5KB/s.
Or you get lucky & it starts straight away & at 25-40kB/s.


I wonder if it is a problem with one of the download servers, or if there's still a problem with their load sharing?
When network traffic is at reasonable levels, all downloads come down at similar speeds. But when things are maxed out one server gives little or no downloads while on the other they fly.

Or the routing table in our flaky router...more RAM is on the way.

Unlikely, but you never know.
Grant
Darwin NT
ID: 1160447 · Report as offensive
soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1160458 - Posted: 9 Oct 2011, 2:22:32 UTC - in response to Message 1160447.  

I think we have other problems, but those should be more apparent and easier to work on after the router and saturation issues are resolved.
Janice
ID: 1160458 · Report as offensive
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1160482 - Posted: 9 Oct 2011, 4:55:03 UTC - in response to Message 1160458.  

I think we have other problems, but those should be more apparent and easier to work on after the router and saturation issues are resolved.


From the science paper "Status of the UC-Berkeley SETI Efforts"
(E. J. Korpela, D. P. Anderson, R. Bankay, J. Cobb, A. Howard,
M. Lebofsky, A. P. V. Siemion, J. von Korff, and D. Werthimer;
University of California, Berkeley, CA, 94720 USA;
Kansas State University, Manhattan, KS, 66502 USA), posted online at:

http://setiathome.berkeley.edu/sah_papers/status_of_ucb_seti_efforts_2011.pdf

we read the following:

"SETI@home is one of the largest supercomputers on our planet, currently averaging 3.5 PFLOP actual performance."

A 3.5 PFLOP supercomputer is the equivalent of two CRAY Jaguar supercomputers (Oak Ridge National Labs) which cost about $100,000,000.00 each. One would THINK having a free $200,000,000.00 super-computer would justify someone's ponying-up for a bigger data pipe.

You know why it doesn't, don't you?

Because we're looking for little green men and not something politically useful in someone's quest for power.

We've got hundreds, possibly thousands, of TRULY stupid government programs being funded at ludicrous levels, for no reason at all... bridges to nowhere, midnight basketball, sex-change operations as a standard health benefit, weapons systems nobody wants, $400 toilet seats, hundreds of millions to solar panel companies with outdated technologies, not to mention things like a big hole in the ground for a SCSC that was never built...

Our government practically BURNS more money than many first-world countries' GDP on the most asinine things... you can get lots of funds if you want to study what makes teenagers want to have sex with each other (the researchers need to get out once in a while) or even more to study the effects of alcohol on reflex speeds (the researchers need a pencil, paper, a six-pack and a stopwatch, or a case of Scotch and a sundial).

But we can't get an adequate data connection to a 3.5 PFLOP supercomputer looking for one of the major discoveries of all time? And it is being donated for nothing, maintained for nothing, improved for nothing, powered for nothing, and just gets faster and faster every stinking day.

I wonder what the typical system administrator for a supercomputer makes as a salary?

But a lot of the world doesn't want us to find "a Martian" and the rest thinks we're all tin-foil hat wearing cranks.

The money isn't coming folks. Not now, not ever.

Sometimes I think Berkeley keeps this thing going more out of habit than anything else.

"'Life.' Don't talk to me about 'life.'"
ID: 1160482 · Report as offensive
Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1160492 - Posted: 9 Oct 2011, 6:12:24 UTC
Last modified: 9 Oct 2011, 6:16:16 UTC

Something's borked. Scheduler requests are timing out, D/L's are getting the "HTTP error", result creation rates are zero and all the transitioners are offline. U/L's seem to be working though, albeit slowly.
ID: 1160492 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 1160493 - Posted: 9 Oct 2011, 6:21:56 UTC - in response to Message 1160492.  


Yes, I noticed the "Ready to send" buffer had shrunk, and the splitters hadn't cranked it up. Checked out the server status page & it appears all the transitioners have decided to take a break. End result- everything will come to a grinding halt.
Grant
Darwin NT
ID: 1160493 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1160496 - Posted: 9 Oct 2011, 6:29:41 UTC
Last modified: 9 Oct 2011, 6:47:09 UTC

"She's comin' apart, Jim..."

My little portion of the Cray supercomputer is having problems connecting to the control node.

Completed results are all going into pending and no validations are occurring....or at least no credits are being awarded.

That little yellow dude has done it again.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1160496 · Report as offensive
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1160498 - Posted: 9 Oct 2011, 6:53:57 UTC
Last modified: 9 Oct 2011, 6:55:59 UTC

09/10/2011 10:47:25 SETI@home Message from server: This computer has reached a limit on tasks in progress

?...

Max tasks per day 396
Number of tasks today 260


Max tasks per day 672
Number of tasks today 3
ID: 1160498 · Report as offensive
Wiggo
Joined: 24 Jan 00
Posts: 36329
Credit: 261,360,520
RAC: 489
Australia
Message 1160499 - Posted: 9 Oct 2011, 6:54:34 UTC - in response to Message 1160496.  

And all the transitioners are showing as being down now too.

Cheers.
ID: 1160499 · Report as offensive
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1160500 - Posted: 9 Oct 2011, 6:57:37 UTC - in response to Message 1160496.  

My little portion of the Cray supercomputer is having problems connecting to the control node.

Like :-)

ID: 1160500 · Report as offensive
MikeN

Joined: 24 Jan 11
Posts: 319
Credit: 64,719,409
RAC: 85
United Kingdom
Message 1160510 - Posted: 9 Oct 2011, 8:11:02 UTC

All OK here, just got up and my home PC can upload and report. Not sure about downloads as it is sitting on 3 APs with estimated run times of 10-11 days so BOINC thinks it is full. Once it finishes crunching shorties in about 2 hours it should be able to start on two of the APs and bring the estimates down. I just hope there is some work left by then.
ID: 1160510 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1160512 - Posted: 9 Oct 2011, 8:14:59 UTC - in response to Message 1160510.  

I just hope there is some work left by then.

Doubtful.
Even though the server status page shows the splitters as running, it also shows the creation rate for MB and AP at zero. And ready to send is dropping on both.

That little yellow guy better head for the hills, because in a short while the Cricket graphs are gonna tank.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1160512 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.