Panic Mode On (69) Server problems?

Message boards : Number crunching : Panic Mode On (69) Server problems?

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1200035 - Posted: 26 Feb 2012, 17:20:55 UTC - in response to Message 1200003.  

Hi Claggy,
Thanks, I don't know much about the in[f]ternal workings of BOINC or the drivers.

Cheers,

That throws most people for a loop the first time they see it. In reality, depending on the speed of your CPU & GPU, a task will spend much less than .04% of its processing time on the CPU, as there is a ~30 second "load time" to get GPU tasks started.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1200038 - Posted: 26 Feb 2012, 17:24:35 UTC - in response to Message 1200035.  

Hi Claggy,
Thanks, I don't know much about the in[f]ternal workings of BOINC or the drivers.

Cheers,

That throws most people for a loop the first time they see it. In reality, depending on the speed of your CPU & GPU, a task will spend much less than .04% of its processing time on the CPU, as there is a ~30 second "load time" to get GPU tasks started.

That depends on the host, drivers, and apps being used.
My top rig has approx. a 9-10 second load time to initialize a GPU task.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1200044 - Posted: 26 Feb 2012, 17:42:29 UTC

Hmmm....
Stats exporting seems to be held up. Nothing has gone to Boincstats for yesterday, and I'm soon gonna be approaching a million creds since the last update.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1200056 - Posted: 26 Feb 2012, 18:20:21 UTC - in response to Message 1199928.  
Last modified: 26 Feb 2012, 18:21:47 UTC

My single-core machine is nowhere near fast, but it does well enough. A 2.5-day cache is ~8 VLARs, or ~20 shorties, or ~14 "average" MBs. A 10-day cache of AP is about 5.

Not that we're trying to discourage anyone from participating, but realistically.. if a computer can't do a single MB in less than 24 hours (actual crunching time, not turn-around time), buh-bye. If it means helping the database be as lean as it can be and not have to wait ~40 days for a task to complete, then so be it.

My G4 boxes take about 55 hours for a mid-range MB, 35-40 for a VLAR, and 10-15 for a "shorty", depending on how much I am using that computer - I have NO dedicated crunchers. It has been my experience (YMMV) that when "I" have to wait more than a week to get a Task validated and credited, my wingman is a much newer/faster box with one or more GPUs, a cache of over 1000 Tasks, and an average return time of 20-30 days.

This might indicate he does not have a 24/7 broadband connection, and reports and requests new work in large batches. I have also had wingmen who were new to the project, loaded up on Tasks, then never completed them and left them to time-out (45 days + resend time). I suspect these folks are a significant part of the problem with database bloat.
Donald
Infernal Optimist / Submariner, retired
cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1200087 - Posted: 26 Feb 2012, 19:04:29 UTC - in response to Message 1200056.  

I wonder if the folks who load up a lot of WUs and then just disappear might be doing so because they run into the sort of problems we have been seeing recently, which are to some extent still an issue.

To those more used to 'instant' gratification, the lack thereof makes for wanderlust and a move to pastures new, where credits are gained more rapidly and stats [which are the only 'visible' reward for effort] are seen to increase at a rapid rate.

When they find it a non-stop hassle to d/l work, when servers fall over dramatically now and then, and when there is a regular outage on Tuesdays that is posted as a 3 DAY outage rather than the current 3-hour one, methinks they just say what the hell and either dump the whole idea of distributed computing or move to easier pastures..

Some people just can't take setbacks in their stride :-) Others thrive on them.
I have learned over time that what can't be cured must be endured :-/ And unless the whole proposition is on a losing wicket, I just meander on..

But if it's a case of diminishing returns and no sign of improvement, even I will reconsider my options and set a limit on how long to persevere..

All things considered, each project is part of a greater whole, i.e. distributed computing, which has been seen as a way of utilising resources that were being wasted..

However, the original concept has been overshadowed by those who build powerful rigs and farms dedicated to crunching WUs.. Not quite the original concept, is it?
Still I must admit to being as guilty as anyone else in so doing..

It's the lure of those damn stats wot is to blame, I says. Not to mention all those downloadable certificates of immense cobblestone gratification :-)

Cheers,
Cliff,
Been there, Done that, Still no damn T-shirt!
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1200110 - Posted: 26 Feb 2012, 19:34:42 UTC

Ahhh!! Kitties spy Boincstats has picked up fresh stats for tomorrow's update.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1200174 - Posted: 26 Feb 2012, 22:41:46 UTC - in response to Message 1200056.  

My G4 boxes take about 55 hours for a mid-range MB, 35-40 for a VLAR, and 10-15 for a "shorty", depending on how much I am using that computer - I have NO dedicated crunchers. It has been my experience (YMMV) that when "I" have to wait more than a week to get a Task validated and credited, my wingman is a much newer/faster box with one or more GPUs, a cache of over 1000 Tasks, and an average return time of 20-30 days.

This might indicate he does not have a 24/7 broadband connection, and reports and requests new work in large batches. I have also had wingmen who were new to the project, loaded up on Tasks, then never completed them and left them to time-out (45 days + resend time). I suspect these folks are a significant part of the problem with database bloat.

Very true. I remember a few months back when things were mostly smooth, even with my 10-day cache, WUs that I got 10 days ago and returned were STILL waiting for quite possibly the slowest allowed machines to crunch AP. Oh, and guess what? They would never return them anyway.

I know everybody gets stuck with bad wingmates from time to time, but I felt like I was being punished for some reason. Nearly every AP I fought the congested pipe for ended up having a wingmate that had like a 1.6GHz P4, running stock, and their cache was two tasks. RAC of like 15. Or I would get one where their computer was created like a day before they became my wingmate and had hundreds of tasks.. and never returned a single one of them. Having to wait 25/45 days for that kind of nonsense is something that can be fixed.

Just drop the deadline for MB back down to 3 weeks like it used to be. 21 days is more than plenty of time for someone to ask for work, download it, crunch it in 2-5 days, have internet only one day per week and still return it before the deadline.
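A rough sketch of the arithmetic behind the 21-day argument, under a hypothetical model: the host fetches work during a weekly connection window, crunches each task in 2-5 days, and can only report at its next window. The function names and the weekly-window model are illustrative assumptions, not anything the SETI@home scheduler actually computes.

```python
import math


def worst_case_turnaround_days(crunch_days: float, connect_interval_days: float) -> float:
    """Days from download to report for a host that is only online every
    `connect_interval_days` days (hypothetical model).

    Work is fetched at day 0 during a connection window; the result can only
    be reported at the first window on or after the crunch finishes.
    """
    if crunch_days <= 0 or connect_interval_days <= 0:
        raise ValueError("crunch time and connection interval must be positive")
    windows_needed = math.ceil(crunch_days / connect_interval_days)
    return windows_needed * connect_interval_days


def meets_deadline(crunch_days: float, connect_interval_days: float, deadline_days: float) -> bool:
    """True if the worst-case turnaround fits inside the deadline."""
    return worst_case_turnaround_days(crunch_days, connect_interval_days) <= deadline_days


# Crunching in 2-5 days with internet only once a week, against a 21-day deadline.
for crunch in (2, 5):
    t = worst_case_turnaround_days(crunch, 7)
    print(f"crunch {crunch} d -> reported by day {t:g}, fits 21 days: {meets_deadline(crunch, 7, 21)}")
```

Under this model both ends of the 2-5 day range are reported by day 7, which leaves two weeks of slack before a 21-day deadline.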
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)
Warren Kozey
Joined: 6 Jul 99
Posts: 54
Credit: 5,026,721
RAC: 0
Canada
Message 1200189 - Posted: 26 Feb 2012, 23:54:39 UTC - in response to Message 1198679.  

I think he means this one. Everyone who runs SETI should have one in their back pocket and should have read it 100 times.


It should be a REQUIREMENT !!!
Gimme BEER and WU's!!!!
Seahawk
Volunteer tester
Joined: 8 Jan 08
Posts: 937
Credit: 8,157,029
RAC: 5
United States
Message 1200232 - Posted: 27 Feb 2012, 2:50:41 UTC

I just watched half a page worth of WUs d/l without a flaw. First time in over a month. Then I noticed that no APs are being sent out as the tapes are all done.
I used to be a cruncher like you, then I took an arrow to the knee.
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1200242 - Posted: 27 Feb 2012, 3:46:57 UTC - in response to Message 1200232.  

I just watched half a page worth of WUs d/l without a flaw. First time in over a month. Then I noticed that no APs are being sent out as the tapes are all done.


I went from 32 WUs ready to 732 WUs ready in about 2 hours.

Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1200264 - Posted: 27 Feb 2012, 6:05:32 UTC - in response to Message 1200174.  

My G4 boxes take about 55 hours for a mid-range MB, 35-40 for a VLAR, and 10-15 for a "shorty", depending on how much I am using that computer - I have NO dedicated crunchers. It has been my experience (YMMV) that when "I" have to wait more than a week to get a Task validated and credited, my wingman is a much newer/faster box with one or more GPUs, a cache of over 1000 Tasks, and an average return time of 20-30 days.

This might indicate he does not have a 24/7 broadband connection, and reports and requests new work in large batches. I have also had wingmen who were new to the project, loaded up on Tasks, then never completed them and left them to time-out (45 days + resend time). I suspect these folks are a significant part of the problem with database bloat.

Very true. I remember a few months back when things were mostly smooth, even with my 10-day cache, WUs that I got 10 days ago and returned were STILL waiting for quite possibly the slowest allowed machines to crunch AP. Oh, and guess what? They would never return them anyway.

I know everybody gets stuck with bad wingmates from time to time, but I felt like I was being punished for some reason. Nearly every AP I fought the congested pipe for ended up having a wingmate that had like a 1.6GHz P4, running stock, and their cache was two tasks. RAC of like 15. Or I would get one where their computer was created like a day before they became my wingmate and had hundreds of tasks.. and never returned a single one of them. Having to wait 25/45 days for that kind of nonsense is something that can be fixed.

Just drop the deadline for MB back down to 3 weeks like it used to be. 21 days is more than plenty of time for someone to ask for work, download it, crunch it in 2-5 days, have internet only one day per week and still return it before the deadline.

3-4 weeks (down from the current 6.5) I could live with. Much more reasonable than some of the other suggestions I've seen here recently.

Donald
Infernal Optimist / Submariner, retired
TPCBF
Joined: 18 May 99
Posts: 54
Credit: 4,594,980
RAC: 0
United States
Message 1200266 - Posted: 27 Feb 2012, 6:15:43 UTC - in response to Message 1200056.  

It has been my experience (YMMV) that when "I" have to wait more than a week to get a Task validated and credited, my wingman is a much newer/faster box with one or more GPUs, a cache of over 1000 Tasks, and an average return time of 20-30 days.

This might indicate he does not have a 24/7 broadband connection, and reports and requests new work in large batches. I have also had wingmen who were new to the project, loaded up on Tasks, then never completed them and left them to time-out (45 days + resend time). I suspect these folks are a significant part of the problem with database bloat.
+1

That's my take on this problem as well. A few weeks ago, I was almost scolded for mentioning this. I had a WU already sitting in PV jail, and when I checked on the wingman, he had loaded up on tasks but returned nothing in a couple of weeks, leaving another 4-5 weeks to just sit and wait for those WUs to time out and eventually be re-assigned. And he had a "top notch" crunching machine, while I currently have only two rather mediocre systems crunching for S@H.

I think that the problems are rather self-induced by those folks with their high-end crunching machines, boasting all over the place about how they "load up" on week(s) of WUs. But then they aren't likely to be the ones who will admit to that...

Ralf
Blake Bonkofsky
Volunteer tester
Joined: 29 Dec 99
Posts: 617
Credit: 46,383,149
RAC: 0
United States
Message 1200280 - Posted: 27 Feb 2012, 7:00:34 UTC - in response to Message 1200266.  
Last modified: 27 Feb 2012, 7:02:28 UTC

I would think that anyone who posts here on the forums boasting about power crunchers probably knows the inner workings of SETI/BOINC, and is hardly the problem group you refer to. I'm not saying those users don't exist, but I'd say they are not the ones taking active roles on these forums. I've been "back" to SETI for almost 2 years now, and don't plan on going anywhere. Even if I do, I will run my caches down first, as I had to do last year when my AC system failed in August.

Second, I wish I could "load up on week(s)" of tasks. I can get to my 10-day target on the CPU if I pick up enough APs, but even my slowest GPU, with its 400 WU limit, only holds about 1.5-2 days. With MB only on the CPUs, I get maybe 4 days on the Q8300.

Also, what is the difference, really, between a power user who holds a 10-day cache and reports it regularly, and an ancient machine that takes several days to crunch a single task? They are both effectively "holding" that WU for several days; one machine just happens to need all of that CPU time to crunch it.
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1200281 - Posted: 27 Feb 2012, 7:03:32 UTC - in response to Message 1200266.  

It has been my experience (YMMV) that when "I" have to wait more than a week to get a Task validated and credited, my wingman is a much newer/faster box with one or more GPUs, a cache of over 1000 Tasks, and an average return time of 20-30 days.

This might indicate he does not have a 24/7 broadband connection, and reports and requests new work in large batches. I have also had wingmen who were new to the project, loaded up on Tasks, then never completed them and left them to time-out (45 days + resend time). I suspect these folks are a significant part of the problem with database bloat.
+1

That's my take on this problem as well. A few weeks ago, I was almost scolded for mentioning this. I had a WU already sitting in PV jail, and when I checked on the wingman, he had loaded up on tasks but returned nothing in a couple of weeks, leaving another 4-5 weeks to just sit and wait for those WUs to time out and eventually be re-assigned. And he had a "top notch" crunching machine, while I currently have only two rather mediocre systems crunching for S@H.

I think that the problems are rather self-induced by those folks with their high-end crunching machines, boasting all over the place about how they "load up" on week(s) of WUs. But then they aren't likely to be the ones who will admit to that...

Ralf

My take is the opposite. I always seem to be waiting on old slow machines, but then again there's also "Murphy's Law", which affects both old and new machines. Even "top notch" machines suffer from the same problems as old ones and just plain fail, whether due to lightning strikes, power surges, faulty parts, or just because they weren't properly built in the first place.

Also take into account that there are a lot of ghosts floating about the system again, even with resends turned on.

All these things are just a fact of life, so stop worrying about it as it'll only give ya's grey hair (I know as I've plenty of them). ;)

Cheers.
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1200307 - Posted: 27 Feb 2012, 11:17:25 UTC - in response to Message 1200281.  
Last modified: 27 Feb 2012, 11:33:17 UTC

My take is the opposite. I always seem to be waiting on old slow machines, but then again there's also "Murphy's Law", which affects both old and new machines. Even "top notch" machines suffer from the same problems as old ones and just plain fail, whether due to lightning strikes, power surges, faulty parts, or just because they weren't properly built in the first place.

Also take into account that there are a lot of ghosts floating about the system again, even with resends turned on.

All these things are just a fact of life, so stop worrying about it as it'll only give ya's grey hair (I know as I've plenty of them). ;)

Cheers.

However.. I would venture to guess that some of these "top notch" machines have a somewhat higher likelihood of having a backed-up copy of the data dir (maybe with an empty cache?), most importantly the files that say what computer it is. Even if there was some catastrophic failure, the HDD is almost always readable, and the computer-ID files can be pulled off of it once the repairs have been made.

In both of those situations, a catastrophic failure of that machine.. and it can be rebuilt, and the proper files thrown into the data dir, and they'll get all their old tasks back as resends. Or at least most of them.

Bottom line.. pro-tip: make a backup of your data dir now (minus all the actual WUs)! You never know when you'll need it.
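A minimal sketch of that pro-tip in Python, assuming the work units sit in the data dir's `projects` and `slots` subfolders and that the host identity lives in top-level files such as `client_state.xml`. The function name and any paths are hypothetical; check your own data dir layout first, and ideally stop the BOINC client before copying so the state files are not caught mid-write.

```python
import shutil
from pathlib import Path

# Assumption: these subfolders of the BOINC data dir hold downloaded work units,
# application files, and in-progress task data, so they are left out of the backup.
# Note: ignore_patterns matches names at every directory level, not just the top.
SKIPPED_DIRS = ("projects", "slots")


def backup_boinc_data_dir(data_dir: Path, backup_dir: Path) -> Path:
    """Copy the BOINC data dir to `backup_dir`, skipping the work-unit folders.

    `backup_dir` must not exist yet; shutil.copytree creates it.
    """
    shutil.copytree(data_dir, backup_dir, ignore=shutil.ignore_patterns(*SKIPPED_DIRS))
    return backup_dir
```

Usage would look like `backup_boinc_data_dir(Path("<your BOINC data dir>"), Path("<backup location>"))`, with both placeholders filled in for your own machine. Restoring is the reverse: copy the saved files back into the rebuilt data dir before starting the client.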

Completely totally unrelated question: If I were to decide to upgrade from 6.2.19 to 6.10.58/60 (which one is better, btw?), does the <no_gpus> cc_config flag prevent BOINC from even seeing GPUs entirely? ie: on the computer list.. will the GPU column stay blank? Only reason I'm thinking of upgrading is because of what 6.2.19 sees for my OS. Other than that, it works fine, and I like that I can see how many seconds of work I'm asking for (I know there are debug/log flags that can activate that on the newer stuff.. but I like that I don't have to). I do kind of miss what 5.x had for the average download/upload rate when a transfer completed, but it's not that important.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1200311 - Posted: 27 Feb 2012, 11:43:19 UTC - in response to Message 1200307.  

Completely totally unrelated question: If I were to decide to upgrade from 6.2.19 to 6.10.58/60 (which one is better, btw?), does the <no_gpus> cc_config flag prevent BOINC from even seeing GPUs entirely? ie: on the computer list.. will the GPU column stay blank? Only reason I'm thinking of upgrading is because of what 6.2.19 sees for my OS. Other than that, it works fine, and I like that I can see how many seconds of work I'm asking for (I know there are debug/log flags that can activate that on the newer stuff.. but I like that I don't have to). I do kind of miss what 5.x had for the average download/upload rate when a transfer completed, but it's not that important.

It really doesn't matter if you go for 6.10.58 or 6.10.60, the only difference with 6.10.60 is that it has a screen saver fix:

• Linux: Project list issue in the attach wizard that led to a crash.
• SCR: Follow the Mac's lead and gracefully exit the Data Management thread. Preserve the handle to take more drastic actions should that not work.
• SCR: Fix compile breaks.


The <no_gpus>1</no_gpus> option stops Boinc from even looking for GPUs.
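For reference, a small Python sketch that writes that option into a `cc_config.xml`. It assumes the usual layout, where client options sit inside an `<options>` element under the `<cc_config>` root and the file lives in the BOINC data dir; since GPU detection happens when the client starts, a client restart is presumably needed for it to take effect. The helper names are hypothetical, and `write_cc_config` overwrites any existing cc_config.xml, so merge by hand if you already have other options set.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def make_cc_config(no_gpus: bool = True) -> str:
    """Build a minimal cc_config.xml document containing the <no_gpus> option."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # 1 = don't look for GPUs at all; 0 = normal GPU detection.
    ET.SubElement(options, "no_gpus").text = "1" if no_gpus else "0"
    return ET.tostring(root, encoding="unicode")


def write_cc_config(data_dir: Path, no_gpus: bool = True) -> Path:
    """Write cc_config.xml into the given BOINC data dir (overwrites any existing file)."""
    path = data_dir / "cc_config.xml"
    path.write_text(make_cc_config(no_gpus) + "\n", encoding="utf-8")
    return path


print(make_cc_config())
# <cc_config><options><no_gpus>1</no_gpus></options></cc_config>
```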

Claggy
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1200359 - Posted: 27 Feb 2012, 15:32:32 UTC - in response to Message 1200056.  

- I have NO dedicated crunchers. It has been my experience (YMMV) that when "I" have to wait more than a week to get a Task validated and credited, my wingman is a much newer/faster box with one or more GPUs, a cache of over 1000 Tasks, and an average return time of 20-30 days.

This might indicate he does not have a 24/7 broadband connection, and reports and requests new work in large batches.

I have a 24/7 broadband connection. My i7's limits of 400 each for CPU and GPU tend to result in about a 5-day turnaround (depending on how much Einstein it's doing, which is affected by Seti server outages; right now it has a huge pile of Einstein, running 8 at a time in high-priority and letting Seti run only on the GPU). If everything is running smoothly, the same scheduler request that reports a finished task will also get a new one downloaded. It's not that it actually takes the i7 5 days to crunch the unit, simply that there are 5 days' worth of other tasks ahead of it in line. Once a particular task's turn comes up, it crunches in anything from a few minutes to a few hours and, if possible, is reported shortly thereafter.
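The point that turnaround is mostly time spent waiting in line rather than crunching can be put as rough arithmetic: wait ≈ tasks ahead ÷ tasks finished per day, which is essentially Little's law for a steady queue. The 80 tasks/day figure below is a hypothetical throughput chosen so a 400-task queue works out to roughly 5 days; it is not measured from the i7 described above.

```python
def estimated_turnaround_days(tasks_ahead: int, tasks_per_day: float,
                              crunch_hours: float, report_delay_hours: float = 0.0) -> float:
    """Rough turnaround for a newly downloaded task: time spent waiting behind
    the queue, plus its own crunch time, plus any delay before reporting."""
    if tasks_per_day <= 0:
        raise ValueError("tasks_per_day must be positive")
    queue_wait_days = tasks_ahead / tasks_per_day
    return queue_wait_days + (crunch_hours + report_delay_hours) / 24.0


# Hypothetical host: 400 tasks queued, finishing ~80 per day,
# and the task itself takes 2 hours once its turn comes up.
print(round(estimated_turnaround_days(400, 80, crunch_hours=2), 2))  # 5.08
```

The task's own crunch time barely moves the result; the length of the queue ahead of it is what sets the turnaround.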

FWIW, my i7 was bought with the intention of using it for other things besides Boinc. I even bought Win 7 Pro and installed it as a dual boot (because the video editing program I bought only runs on Pro). But that hasn't happened yet. There are other parts of my life that I need to straighten out before I can sit down and really play with that.

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1200361 - Posted: 27 Feb 2012, 15:34:55 UTC - in response to Message 1199668.  

The humor was very much appreciated! I took the thread down because the problem was resolved. I just un-hid it.

I appreciate that you appreciate the humor, and that you read these forums. On a Saturday, no less.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

Belthazor
Volunteer tester
Joined: 6 Apr 00
Posts: 219
Credit: 10,373,795
RAC: 13
Russia
Message 1200369 - Posted: 27 Feb 2012, 15:46:57 UTC

It's about eight on Monday morning in Berkeley. Guys, wake up and give us back our results status page, please!
Belthazor
Volunteer tester
Joined: 6 Apr 00
Posts: 219
Credit: 10,373,795
RAC: 13
Russia
Message 1200392 - Posted: 27 Feb 2012, 17:29:47 UTC

They're in no hurry at all :( C'mon, pray flip that great toggle switch on :)))

©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.