Panic Mode On (69) Server problems?


msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 38875
Credit: 577,821,956
RAC: 523,335
United States
Message 1200110 - Posted: 26 Feb 2012, 19:34:42 UTC

Ahhh!! Kitties spy Boincstats has picked up fresh stats for tomorrow's update.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2245
Credit: 8,589,438
RAC: 4,316
United States
Message 1200174 - Posted: 26 Feb 2012, 22:41:46 UTC - in response to Message 1200056.

My G4 boxes take about 55 hours for a mid-range MB, 35-40 for a VLAR, and 10-15 for a "shorty", depending on how much I am using that computer - I have NO dedicated crunchers. It has been my experience (YMMV) that when "I" have to wait more than a week to get a Task validated and credited, my wingman is a much newer/faster box with one or more GPUs, a cache of over 1000 Tasks, and an average return time of 20-30 days.

This might indicate he does not have a 24/7 broadband connection, and reports and requests new work in large batches. I have also had wingmen who were new to the project, loaded up on Tasks, then never completed them and left them to time-out (45 days + resend time). I suspect these folks are a significant part of the problem with database bloat.

Very true. I remember a few months back when things were mostly smooth, even with my 10-day cache, WUs that I got 10 days ago and returned were STILL waiting for quite possibly the slowest allowed machines to crunch AP. Oh, and guess what? They would never return them anyway.

I know everybody gets stuck with bad wingmates from time to time, but I felt like I was being punished for some reason. Nearly every AP I fought the congested pipe for ended up having a wingmate that had like a 1.6GHz P4, running stock, and their cache was two tasks. RAC of like 15. Or I would get one where their computer was created like a day before they became my wingmate and had hundreds of tasks.. and never returned a single one of them. Having to wait 25/45 days for that kind of nonsense is something that can be fixed.

Just drop the deadline for MB back down to 3 weeks like it used to be. 21 days is more than plenty of time for someone to ask for work, download it, crunch it in 2-5 days, have internet only one day per week and still return it before the deadline.
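To put rough numbers on that, here is a minimal sketch. It assumes the task is downloaded on a connection day and that the host only connects once every 7 days; the function name and the model are illustrative only and are not taken from the BOINC scheduler.

```python
import math

DEADLINE_DAYS = 21  # the proposed MB deadline


def report_day(crunch_days, connect_interval_days=7):
    """Day on which a finished task gets reported.

    Assumes the task was downloaded on day 0 (a connection day) and that
    the host only connects every `connect_interval_days` days.
    """
    # A finished result waits for the first connection day at or after it is done.
    return math.ceil(crunch_days / connect_interval_days) * connect_interval_days


for crunch in (2, 5):
    day = report_day(crunch)
    print(f"crunch {crunch}d -> reported on day {day}, slack {DEADLINE_DAYS - day}d")
```

Under those assumptions both the 2-day and the 5-day case report on day 7, two weeks inside a 21-day deadline.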
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Yezok/Zeek.Seti.Cluster
Joined: 6 Jul 99
Posts: 52
Credit: 2,811,153
RAC: 2
Canada
Message 1200189 - Posted: 26 Feb 2012, 23:54:39 UTC - in response to Message 1198679.

I think he means this one. Everyone that runs SETI should have one in their back pocket and should have read it 100 times.


It should be a REQUIREMENT !!!
____________
Gimme BEER and WU's!!!!

Seahawk
Volunteer tester
Joined: 8 Jan 08
Posts: 916
Credit: 3,765,845
RAC: 1
United States
Message 1200232 - Posted: 27 Feb 2012, 2:50:41 UTC

I just watched half a page worth of WUs d/l without a flaw. First time in over a month. Then I noticed that no APs are being sent out as the tapes are all done.
____________
I used to be a cruncher like you, then I took an arrow to the knee.

arkayn
Project donor
Volunteer tester
Joined: 14 May 99
Posts: 3622
Credit: 48,536,941
RAC: 34,853
United States
Message 1200242 - Posted: 27 Feb 2012, 3:46:57 UTC - in response to Message 1200232.

I just watched half a page worth of WUs d/l without a flaw. First time in over a month. Then I noticed that no APs are being sent out as the tapes are all done.


I went from 32 WUs ready to 732 WUs ready in about 2 hours.
____________

Donald L. Johnson
Project donor
Joined: 5 Aug 02
Posts: 6123
Credit: 673,784
RAC: 1,180
United States
Message 1200264 - Posted: 27 Feb 2012, 6:05:32 UTC - in response to Message 1200174.

My G4 boxes take about 55 hours for a mid-range MB, 35-40 for a VLAR, and 10-15 for a "shorty", depending on how much I am using that computer - I have NO dedicated crunchers. It has been my experience (YMMV) that when "I" have to wait more than a week to get a Task validated and credited, my wingman is a much newer/faster box with one or more GPUs, a cache of over 1000 Tasks, and an average return time of 20-30 days.

This might indicate he does not have a 24/7 broadband connection, and reports and requests new work in large batches. I have also had wingmen who were new to the project, loaded up on Tasks, then never completed them and left them to time-out (45 days + resend time). I suspect these folks are a significant part of the problem with database bloat.

Very true. I remember a few months back when things were mostly smooth, even with my 10-day cache, WUs that I got 10 days ago and returned were STILL waiting for quite possibly the slowest allowed machines to crunch AP. Oh, and guess what? They would never return them anyway.

I know everybody gets stuck with bad wingmates from time to time, but I felt like I was being punished for some reason. Nearly every AP I fought the congested pipe for ended up having a wingmate that had like a 1.6GHz P4, running stock, and their cache was two tasks. RAC of like 15. Or I would get one where their computer was created like a day before they became my wingmate and had hundreds of tasks.. and never returned a single one of them. Having to wait 25/45 days for that kind of nonsense is something that can be fixed.

Just drop the deadline for MB back down to 3 weeks like it used to be. 21 days is more than plenty of time for someone to ask for work, download it, crunch it in 2-5 days, have internet only one day per week and still return it before the deadline.

3-4 weeks (down from the current 6.5) I could live with. Much more reasonable than some of the other suggestions I've seen here recently.

____________
Donald
Infernal Optimist / Submariner, retired

TPCBF
Joined: 18 May 99
Posts: 50
Credit: 990,811
RAC: 1,788
United States
Message 1200266 - Posted: 27 Feb 2012, 6:15:43 UTC - in response to Message 1200056.

It has been my experience (YMMV) that when "I" have to wait more than a week to get a Task validated and credited, my wingman is a much newer/faster box with one or more GPUs, a cache of over 1000 Tasks, and an average return time of 20-30 days.

This might indicate he does not have a 24/7 broadband connection, and reports and requests new work in large batches. I have also had wingmen who were new to the project, loaded up on Tasks, then never completed them and left them to time-out (45 days + resend time). I suspect these folks are a significant part of the problem with database bloat.
+1

That's my take on this problem as well. A few weeks ago, I was almost scolded when mentioning this. I had WUs already sitting in PV jail, and when checking on the wingman, he had loaded up on tasks but had returned nothing in a couple of weeks already, with another 4-5 weeks to just sit and wait for those WUs to time out and eventually be reassigned. And he had a "top notch" crunching machine, while I currently have only two rather mediocre systems crunching for S@H.

I think that the problems are rather self-induced by those folks with their high-end crunching machines, boasting all over the place how they "load up" on week(s) of WUs. But then they aren't likely the ones who will admit to that...

Ralf

Blake Bonkofsky
Volunteer tester
Joined: 29 Dec 99
Posts: 617
Credit: 46,332,781
RAC: 0
United States
Message 1200280 - Posted: 27 Feb 2012, 7:00:34 UTC - in response to Message 1200266.
Last modified: 27 Feb 2012, 7:02:28 UTC

I would think that anyone who posts here on the forums boasting about power crunchers probably knows the inner workings of SETI/BOINC, and is hardly the problem group you refer to. I'm not saying those users don't exist, but I'd say they are not the ones taking active roles on these forums. I've been "back" to SETI for almost 2 years now, and don't plan on going anywhere. Even if I do, I will run my caches down first, as I had to do last year when my AC system failed in August.

Second, I wish I could "load up on week(s)" of tasks. I can get to my 10-day target on the CPU if I pick up enough APs, but even my slowest GPU with its 400 WU limit only holds about 1.5-2 days. MB only on the CPUs and I get maybe 4 days on the Q8300.

Also, what is the difference, really, between a power user that holds a 10-day cache and reports it regularly, and an ancient machine that takes several days to crunch a single task? They are both effectively "holding" that WU for several days; one machine just happens to need all of that CPU time to crunch it.
____________

Wiggo
Joined: 24 Jan 00
Posts: 6783
Credit: 92,944,278
RAC: 75,877
Australia
Message 1200281 - Posted: 27 Feb 2012, 7:03:32 UTC - in response to Message 1200266.

It has been my experience (YMMV) that when "I" have to wait more than a week to get a Task validated and credited, my wingman is a much newer/faster box with one or more GPUs, a cache of over 1000 Tasks, and an average return time of 20-30 days.

This might indicate he does not have a 24/7 broadband connection, and reports and requests new work in large batches. I have also had wingmen who were new to the project, loaded up on Tasks, then never completed them and left them to time-out (45 days + resend time). I suspect these folks are a significant part of the problem with database bloat.
+1

That's my take on this problem as well. A few weeks ago, I was almost scolded when mentioning this. I had WUs already sitting in PV jail, and when checking on the wingman, he had loaded up on tasks but had returned nothing in a couple of weeks already, with another 4-5 weeks to just sit and wait for those WUs to time out and eventually be reassigned. And he had a "top notch" crunching machine, while I currently have only two rather mediocre systems crunching for S@H.

I think that the problems are rather self-induced by those folks with their high-end crunching machines, boasting all over the place how they "load up" on week(s) of WUs. But then they aren't likely the ones who will admit to that...

Ralf

My take is the opposite: I always seem to be waiting on old slow machines. But then again there's also "Murphy's Law", which affects both old and new machines. Even "top notch" machines suffer from the same problems as old ones and just plain fail, either due to lightning strikes, power surges, faulty parts or just because they weren't properly built in the first place.

Also take into account that there are a lot of ghosts floating about the system again, even with resends turned on.

All these things are just a fact of life, so stop worrying about it as it'll only give ya's grey hair (I know as I've plenty of them). ;)

Cheers.
____________

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2245
Credit: 8,589,438
RAC: 4,316
United States
Message 1200307 - Posted: 27 Feb 2012, 11:17:25 UTC - in response to Message 1200281.
Last modified: 27 Feb 2012, 11:33:17 UTC

My take is the opposite: I always seem to be waiting on old slow machines. But then again there's also "Murphy's Law", which affects both old and new machines. Even "top notch" machines suffer from the same problems as old ones and just plain fail, either due to lightning strikes, power surges, faulty parts or just because they weren't properly built in the first place.

Also take into account that there are a lot of ghosts floating about the system again, even with resends turned on.

All these things are just a fact of life, so stop worrying about it as it'll only give ya's grey hair (I know as I've plenty of them). ;)

Cheers.

However.. I would have to venture a guess that some of these "top notch" machines have a somewhat higher likelihood of having a backed-up copy of the data dir (maybe with an empty cache?). Most importantly, the files that say what computer it is. Even if there was some catastrophic failure, the HDD is almost always readable, and the computer-ID files can be pulled off of it once the repairs have been made.

In both of those situations, after a catastrophic failure the machine can be rebuilt, the proper files thrown into the data dir, and they'll get all their old tasks back as resends. Or at least most of them.

Bottom line.. pro-tip: make a back-up of your data dir now (minus all the actual WUs)! Never know when you'll need it.

Completely totally unrelated question: If I were to decide to upgrade from 6.2.19 to 6.10.58/60 (which one is better, btw?), does the <no_gpus> cc_config flag prevent BOINC from even seeing GPUs entirely? ie: on the computer list.. will the GPU column stay blank? Only reason I'm thinking of upgrading is because of what 6.2.19 sees for my OS. Other than that, it works fine, and I like that I can see how many seconds of work I'm asking for (I know there are debug/log flags that can activate that on the newer stuff.. but I like that I don't have to). I do kind of miss what 5.x had for the average download/upload rate when a transfer completed, but it's not that important.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4067
Credit: 32,879,098
RAC: 6,797
United Kingdom
Message 1200311 - Posted: 27 Feb 2012, 11:43:19 UTC - in response to Message 1200307.

Completely totally unrelated question: If I were to decide to upgrade from 6.2.19 to 6.10.58/60 (which one is better, btw?), does the <no_gpus> cc_config flag prevent BOINC from even seeing GPUs entirely? ie: on the computer list.. will the GPU column stay blank? Only reason I'm thinking of upgrading is because of what 6.2.19 sees for my OS. Other than that, it works fine, and I like that I can see how many seconds of work I'm asking for (I know there are debug/log flags that can activate that on the newer stuff.. but I like that I don't have to). I do kind of miss what 5.x had for the average download/upload rate when a transfer completed, but it's not that important.

It really doesn't matter if you go for 6.10.58 or 6.10.60, the only difference with 6.10.60 is that it has a screen saver fix:

• Linux: Project list issue in the attach wizard that lead to a crash.
• SCR: Follow the Mac's lead and gracefully exit the Data Management thread. Preserve the handle to take more drastic actions should that not work.
• SCR: Fix compile breaks.


The <no_gpus>1</no_gpus> option stops Boinc from even looking for GPUs.
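For anyone unsure where that option goes: it sits inside the <options> section of cc_config.xml in the BOINC data directory. A minimal sketch that builds such a file's contents with Python's standard library follows; the function name build_cc_config is made up for illustration, and any other settings you already have in cc_config.xml would need to be kept alongside it.

```python
import xml.etree.ElementTree as ET


def build_cc_config(no_gpus=True):
    """Return the text of a minimal cc_config.xml holding only the no_gpus option."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # 1 tells the client not to look for GPUs, 0 leaves detection on.
    ET.SubElement(options, "no_gpus").text = "1" if no_gpus else "0"
    return ET.tostring(root, encoding="unicode")


print(build_cc_config())
```

The printed text can be saved as cc_config.xml; the client has to re-read its config file (or be restarted) before the change is picked up.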

Claggy

N9JFE David S
Project donor
Volunteer tester
Joined: 4 Oct 99
Posts: 11161
Credit: 13,934,981
RAC: 12,821
United States
Message 1200359 - Posted: 27 Feb 2012, 15:32:32 UTC - in response to Message 1200056.

- I have NO dedicated crunchers. It has been my experience (YMMV) that when "I" have to wait more than a week to get a Task validated and credited, my wingman is a much newer/faster box with one or more GPUs, a cache of over 1000 Tasks, and an average return time of 20-30 days.

This might indicate he does not have a 24/7 broadband connection, and reports and requests new work in large batches.

I have a 24/7 broadband connection. My i7's limits of 400 each for CPU and GPU tend to result in about a 5-day turnaround (depending on how much Einstein it's doing, which is affected by Seti server outages; right now it has a huge pile of Einstein, running 8 at a time in high-priority and letting Seti run only on the GPU). If everything is running smoothly, the same scheduler request that reports a finished task will also get a new one downloaded. It's not that it actually takes the i7 5 days to crunch the unit, simply that there are 5 days' worth of other tasks ahead of it in line. Once a particular task's turn comes up, it crunches in anything from a few minutes to a few hours and, if possible, is reported shortly thereafter.
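As a back-of-the-envelope sketch of that queueing effect: the 400-task limit and the roughly 5-day turnaround are from the description above, while the 144-minute average per task and the 8 tasks running at once are assumed figures picked purely for illustration.

```python
def turnaround_days(tasks_in_cache, minutes_per_task, tasks_at_once):
    """Approximate days until a newly downloaded task has worked through the queue.

    Treats the cache as a simple first-in, first-out queue in which
    `tasks_at_once` tasks run side by side.
    """
    total_minutes = tasks_in_cache * minutes_per_task / tasks_at_once
    return total_minutes / (60 * 24)


# 400 tasks ahead in line, assumed 144 minutes each, assumed 8 running at once
print(turnaround_days(400, 144, 8))
```

With those assumed figures the queue works out to 5 days, even though any single task only occupies the machine for a couple of hours.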

FWIW, my i7 was bought with the intention of using it for other things besides Boinc. I even bought Win 7 Pro and installed it as a dual boot (because the video editing program I bought only runs on Pro). But that hasn't happened yet. There are other parts of my life that I need to straighten out before I can sit down and really play with that.

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


N9JFE David S
Project donor
Volunteer tester
Joined: 4 Oct 99
Posts: 11161
Credit: 13,934,981
RAC: 12,821
United States
Message 1200361 - Posted: 27 Feb 2012, 15:34:55 UTC - in response to Message 1199668.

The humor was very much appreciated! I took the thread down because the problem was resolved. I just un-hid it.

I appreciate that you appreciate the humor, and that you read these forums. On a Saturday, no less.
____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Anthony Arbuzoff
Volunteer tester
Joined: 6 Apr 00
Posts: 204
Credit: 2,479,497
RAC: 2,368
Russia
Message 1200369 - Posted: 27 Feb 2012, 15:46:57 UTC

It's about eight on Monday morning in Berkeley. Guys, wake up and give us back our result status page, please!
____________

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3404
Credit: 19,587,230
RAC: 18,754
Sweden
Message 1200371 - Posted: 27 Feb 2012, 16:00:02 UTC - in response to Message 1200369.

It's about eight on Monday morning in Berkeley. Guys, wake up and give us back our result status page, please!


Yes, the result pages are really missed. It's the only way I can see if my machines produce a lot of errors. It's also nice to have for other reasons, but it's the ONLY way to catch early any computer that's going crazy and producing crap results.

So, please turn on the result pages.
____________

Anthony Arbuzoff
Volunteer tester
Joined: 6 Apr 00
Posts: 204
Credit: 2,479,497
RAC: 2,368
Russia
Message 1200392 - Posted: 27 Feb 2012, 17:29:47 UTC

They don't hurry up at all :( C'mon, pray turn that great tumbler switch on :)))
____________

Kevin Olley
Joined: 3 Aug 99
Posts: 368
Credit: 35,232,287
RAC: 1,772
United Kingdom
Message 1200394 - Posted: 27 Feb 2012, 17:40:28 UTC

Cricket looks pretty smooth, I hope someone's looking out for a yellow thing :-)



____________
Kevin


Wiggo
Joined: 24 Jan 00
Posts: 6783
Credit: 92,944,278
RAC: 75,877
Australia
Message 1200420 - Posted: 27 Feb 2012, 20:05:26 UTC - in response to Message 1200394.

Cricket looks pretty smooth, I hope someone's looking out for a yellow thing :-)

The shotgun is loaded and ready for that critter. ;)

Cheers.
____________

Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1200430 - Posted: 27 Feb 2012, 20:31:15 UTC

Too late! Yellow rubber duck has arrived, as requested.
Splitters are down.
____________

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3404
Credit: 19,587,230
RAC: 18,754
Sweden
Message 1200437 - Posted: 27 Feb 2012, 20:49:46 UTC - in response to Message 1200430.

Too late! Yellow rubber duck has arrived, as requested.
Splitters are down.


Indeed so, and from here on it's downhill all the way until after tomorrow's outage.
____________


Copyright © 2014 University of California