Panic Mode On (83) Server Problems?

juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1358473 - Posted: 19 Apr 2013, 8:43:54 UTC - in response to Message 1358464.  
Last modified: 19 Apr 2013, 8:45:04 UTC

The limits are there to prevent issues from arising in the first place. I fail to see how raising them would help the project at all.

The project scope is clearly stated: to use spare CPU cycles that would otherwise be "wasted" running screensavers or other idle tasks. It was never intended to cope with dedicated computers or specially built crunching farms with multiple GPUs and demands for massive caches to sustain them running 24x7.

By constantly demanding that the limits be raised you are making your need to execute as many SETI tasks as possible more important than the well-being of the project itself.

The fix to this problem is larger workunits, which will reduce the number of "in progress" entries in the database.


I really agree with Mark, and please don't blame those who have the so-called "crunching farms" for the last year's SETI DB problems.

Another thing: I don't believe a small rise in the limits (from 100 WU per GPU host to 100 WU per GPU) will "crash" the SETI servers. In numbers: if we have 1,000 hosts with 2 GPUs each (I'm very optimistic; it's surely a lot less), they would increase the total WUs handled by 100k, nothing compared with the ~3 million already waiting in the DB.

Nobody is asking for a 10-day cache, just a cache that could hold for a few hours (maybe a quarter of a day) instead of minutes like the one we have now. Think about that.
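
Putting rough numbers on that claim (a back-of-the-envelope sketch using juan's assumed figures, not measured data):

```python
# Raising the limit from 100 WU per GPU host to 100 WU per GPU,
# for hypothetical 2-GPU hosts (juan's "very optimistic" numbers).
hosts = 1000
gpus_per_host = 2
old_limit_per_host = 100                   # current: 100 WU per GPU host
new_limit_per_host = 100 * gpus_per_host   # proposed: 100 WU per GPU

extra_wu = hosts * (new_limit_per_host - old_limit_per_host)
in_progress = 3_000_000                    # "~3 million already waiting in the DB"

print(f"Extra in-progress WUs: {extra_wu:,}")              # 100,000
print(f"Relative increase: {extra_wu / in_progress:.1%}")  # 3.3%
```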
ID: 1358473
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1358477 - Posted: 19 Apr 2013, 9:00:05 UTC - in response to Message 1358473.  
Last modified: 19 Apr 2013, 9:09:00 UTC

The limits are there to prevent issues from arising in the first place. I fail to see how raising them would help the project at all.

The project scope is clearly stated: to use spare CPU cycles that would otherwise be "wasted" running screensavers or other idle tasks. It was never intended to cope with dedicated computers or specially built crunching farms with multiple GPUs and demands for massive caches to sustain them running 24x7.

By constantly demanding that the limits be raised you are making your need to execute as many SETI tasks as possible more important than the well-being of the project itself.

The fix to this problem is larger workunits, which will reduce the number of "in progress" entries in the database.


I really agree with Mark, and please don't blame those who have the so-called "crunching farms" for the last year's SETI DB problems.

Another thing: I don't believe a small rise in the limits (from 100 WU per GPU host to 100 WU per GPU) will "crash" the SETI servers. In numbers: if we have 1,000 hosts with 2 GPUs each (I'm very optimistic; it's surely a lot less), they would increase the total WUs handled by 100k, nothing compared with the ~3 million already waiting in the DB.

Nobody is asking for a 10-day cache, just a cache that could hold for a few hours (maybe a quarter of a day) instead of minutes like the one we have now. Think about that.

I don't have all the answers......
But, let me tell you this much.

I am holding my own with mostly 2-3 year old hardware. I pretty much know how to make it run. If I had the funds to go balls out......I could take on the best.
And.....please let this be known, because I am proud of it.....
I have done this 'at home' with my own funds, on my own power bill.
Not by running on a business's power or a bunch of school computers.

My own. Here in my home. Nothing but.

I have survived with the kind donations of a number of very generous souls who, going back a few years, have donated to me more than a few motherboards, a few PSUs, and many more than a few GPUs. I have just received a donation of more than a few 580s......
Which, as I have quipped before, might be trash to some, but gold to the kitties.

I can't afford to be cutting edge, my last power bill just arrived....
Over $550.00....for running all 9 rigs 24/7.

I have another donated mobo, and I have apologized to the donor that I have not yet fully set it up to run.......because I cannot afford to pay more for the juice to run another rig.

Y'all have been so kind.

Just know the kittyman is doing all he can for the project, and will continue to do so as long as he can.


Want to respond to me?
My PM box is open to all....................
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1358477
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1358478 - Posted: 19 Apr 2013, 9:08:02 UTC

And the kitties have regained most of their caches.

Meow!!!
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1358478
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1358479 - Posted: 19 Apr 2013, 9:10:28 UTC

Mark.

We all know how much you love this project and all you've done for its success.

Go kitties!

ID: 1358479
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1358481 - Posted: 19 Apr 2013, 9:13:40 UTC - in response to Message 1358479.  

Mark.

We all know how much you love this project and all you've done for its success.

Go kitties!

Well, it's all legend now...LOL.
Some do not appreciate it when I make some grandiose statements.
But when I do, I have been there, and done that.

I do not posture much. When I say I have been there and done that, I have done it.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1358481
Andy Williams
Volunteer tester
Joined: 11 May 01
Posts: 187
Credit: 112,464,820
RAC: 0
United States
Message 1358535 - Posted: 19 Apr 2013, 11:37:28 UTC - in response to Message 1358477.  

I can't afford to be cutting edge, my last power bill just arrived....
Over $550.00....for running all 9 rigs 24/7.


Wow, cheap. When I was running SETI flat out (many GPUs, >200,000 RAC), my electric bill was over $1,000 per month. That was years ago, though, with less efficient GPUs.
--
Classic 82353 WU / 400979 h
ID: 1358535
Vipin Palazhi
Joined: 29 Feb 08
Posts: 286
Credit: 167,386,578
RAC: 0
India
Message 1358596 - Posted: 19 Apr 2013, 16:56:55 UTC

Am I glad that I am currently living in this part of the world! Electricity is included in the apartment rent, no matter how much I use. I pray it stays that way for a looooooong time.

ID: 1358596
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1358602 - Posted: 19 Apr 2013, 17:12:03 UTC - in response to Message 1358481.  

Mark.

We all know how much you love and all you done for the success of this project.

Go kitties!

Well, it's all legend now...LOL.
Some do not appreciate it when I make some grandiose statements.
But when I do, I have been there, and done that.

I do not posture much. When I say I have been there and done that, I have done it.

Been there, done that, still saving up for the toaster... :D

SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1358602
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 1358669 - Posted: 19 Apr 2013, 21:27:17 UTC - in response to Message 1358602.  


Still struggling to get work; splitters still not cranking up to match demand.
Grant
Darwin NT
ID: 1358669
Lionel
Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1358679 - Posted: 19 Apr 2013, 21:57:34 UTC - in response to Message 1358669.  


It is just a pity that we do not have caches ... it might smooth out some of the unhappiness that's around.
ID: 1358679
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1358682 - Posted: 19 Apr 2013, 22:01:01 UTC

All good here, 5 machines with max tasks.
ID: 1358682
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 1358700 - Posted: 19 Apr 2013, 22:50:26 UTC - in response to Message 1358682.  

All good here, 5 machines with max tasks.

Down to 362 of 400 here.
Every now & then I finally get some work, which bumps it up, then dozens of "No tasks available" responses to work requests till the next bump-up. But I'm still unable to get the full 400 limit.
Grant
Darwin NT
ID: 1358700
Lionel
Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1358765 - Posted: 20 Apr 2013, 3:10:41 UTC - in response to Message 1358700.  

Results Ready to Send ... 0

ID: 1358765
RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,314,058
RAC: 0
United States
Message 1358768 - Posted: 20 Apr 2013, 3:38:25 UTC - in response to Message 1358446.  
Last modified: 20 Apr 2013, 3:38:58 UTC

The limits are there to prevent issues from arising in the first place. I fail to see how raising them would help the project at all.

The project scope is clearly stated: to use spare CPU cycles that would otherwise be "wasted" running screensavers or other idle tasks. It was never intended to cope with dedicated computers or specially built crunching farms with multiple GPUs and demands for massive caches to sustain them running 24x7.

By constantly demanding that the limits be raised you are making your need to execute as many SETI tasks as possible more important than the well-being of the project itself.

The fix to this problem is larger workunits, which will reduce the number of "in progress" entries in the database.


I'm for 10-times-larger MB tasks (rough numbers sketched below).

If that "spare CPU cycle" theory were real, then get rid of RAC and credit and don't keep score. See how many stick around; then that will be a valid argument.
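
On the larger-tasks point, the database payoff is direct (a sketch using the ~3 million in-progress figure quoted earlier in the thread):

```python
# With the same total compute in flight, 10x-larger tasks need roughly
# 1/10 of the "in progress" rows (figures are this thread's estimates).
in_progress_rows = 3_000_000   # current in-progress entries
size_factor = 10               # proposed task-size multiplier

print(f"Rows with 10x tasks: {in_progress_rows // size_factor:,}")  # 300,000
```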
ID: 1358768
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 1358769 - Posted: 20 Apr 2013, 3:41:28 UTC - in response to Message 1358765.  

Results Ready to Send ... 0

It's been that way for almost 2 days; the splitters can't keep up with demand, so the ready-to-send buffer has shrunk to 0.
Hence the multitude of "No tasks sent" messages, with the odd request resulting in work.

There also appears to have been a problem with downloads; traffic dropped off to bugger all for a while there, and I ended up with all of my downloads in backoff mode. Hit retry a couple of times & they finally came through.
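
For anyone wondering what "backoff mode" does: the client waits longer after each failed transfer before retrying, and a manual Retry skips the remaining wait. A minimal sketch of that general shape (illustrative only, not BOINC's actual code; the delay values are assumptions):

```python
import random
import time

def fetch_with_backoff(fetch, max_delay=4 * 3600):
    """Retry fetch() with randomized, roughly exponential backoff.

    fetch() returns True on success, False on failure. Hitting "Retry"
    in the client amounts to skipping the remaining wait.
    """
    delay = 60.0                                  # assumed initial wait (seconds)
    while not fetch():
        wait = random.uniform(0.5, 1.0) * delay   # jitter the wait
        print(f"Transfer failed; backing off ~{wait:.0f}s")
        time.sleep(wait)
        delay = min(delay * 2, max_delay)         # double the wait, capped
```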
Grant
Darwin NT
ID: 1358769
ExchangeMan
Volunteer tester
Joined: 9 Jan 00
Posts: 115
Credit: 157,719,104
RAC: 0
United States
Message 1358770 - Posted: 20 Apr 2013, 3:46:47 UTC - in response to Message 1358769.  

Results Ready to Send ... 0

It's been that way for almost 2 days; the splitters can't keep up with demand, so the ready-to-send buffer has shrunk to 0.
Hence the multitude of "No tasks sent" messages, with the odd request resulting in work.

There also appears to have been a problem with downloads; traffic dropped off to bugger all for a while there, and I ended up with all of my downloads in backoff mode. Hit retry a couple of times & they finally came through.

I've seen this odd download behavior also.

ID: 1358770
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 1358771 - Posted: 20 Apr 2013, 3:50:48 UTC - in response to Message 1358768.  

The fix to this problem is larger workunits, which will reduce the number of "in progress" entries in the database.


I'm for 10-times-larger MB tasks.

The problem is that's a kludge, not a fix.
Several times over the years they've upped the sensitivity of the application, resulting in greatly increased processing times. But the fact is CPUs, & now GPUs, continue to improve their performance at the same incredible rate they have for years.

The other option is more capable hardware on the server side, but once again that's not fixing the problem, just working around it.


The problem is the database, & the only real fix will probably be to redesign it from scratch.

One database for the in-progress work (WUs waiting to be sent, WUs returned and awaiting validation, etc.) & another for the work that has been completed & validated. The second database will continue to grow over time; the first will only ever be as large as the amount of work in progress. That would allow people to have large caches, and the scheduler & the database wouldn't be struggling with the huge number of entries they need to manage at present.
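
A minimal sketch of that split (hypothetical table and column names; the real SETI@home schema isn't shown in this thread), using SQLite for illustration:

```python
import sqlite3

live = sqlite3.connect("live_work.db")      # work in flight: stays bounded
archive = sqlite3.connect("validated.db")   # completed work: grows forever

live.execute("""CREATE TABLE IF NOT EXISTS in_progress (
    result_id INTEGER PRIMARY KEY,
    host_id   INTEGER,
    sent_at   TEXT,
    state     TEXT   -- 'unsent', 'in_progress', 'awaiting_validation'
)""")
archive.execute("""CREATE TABLE IF NOT EXISTS validated (
    result_id INTEGER PRIMARY KEY,
    host_id   INTEGER,
    credit    REAL,
    validated_at TEXT
)""")

def archive_result(result_id, host_id, credit, validated_at):
    """Once a result validates, move its row out of the live database,
    so the scheduler only ever scans rows for work still in flight."""
    archive.execute("INSERT INTO validated VALUES (?, ?, ?, ?)",
                    (result_id, host_id, credit, validated_at))
    live.execute("DELETE FROM in_progress WHERE result_id = ?", (result_id,))
    archive.commit()
    live.commit()
```

The point of the design is that scheduler queries only ever touch the small, bounded live database, no matter how large the historical archive grows.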
Grant
Darwin NT
ID: 1358771
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1358779 - Posted: 20 Apr 2013, 4:48:47 UTC - in response to Message 1358776.  

Wow! Now I've seen a peak,

"Current result creation rate 67.6120/sec",

but sadly that isn't producing an increase in the ready-to-send cache, though both my rigs are sitting on their limits.

Cheers.
ID: 1358779
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 1358784 - Posted: 20 Apr 2013, 4:57:18 UTC - in response to Message 1358779.  

Wow! Now I've seen a peak,

"Current result creation rate 67.6120/sec",

but sadly that isn't producing an increase in the ready-to-send cache, though both my rigs are sitting on their limits.

Cheers.

Yeah, the problem is it's only doing brief bursts, then dropping down to even less than when there aren't shorties in the system.
Normally it'll peak at around 50, but hold it there for an hour or 3 till the buffer has built up, then drop down again. Lately it's been struggling to do even 40, whereas I reckon 55/s or more would be needed to meet demand and build up the buffer.
Received in the last hour is usually around 70,000; it's been around 120,000 for a couple of days now, with peaks up to 140,000. If it were able to meet demand, I'd suspect it would have been 140,000 for the last few days, with even higher peaks.
There's an amazing number of shorties in the system at the moment.
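
Converting those hourly figures (Grant's estimates) into per-second rates shows how tight the margin is:

```python
# Hourly "results received" figures from this post, as per-second rates.
for label, per_hour in [("normal", 70_000),
                        ("current", 120_000),
                        ("peak", 140_000)]:
    print(f"{label:>7}: {per_hour:7,}/hour = {per_hour / 3600:4.1f} results/sec")

#  normal:  70,000/hour = 19.4 results/sec
# current: 120,000/hour = 33.3 results/sec
#    peak: 140,000/hour = 38.9 results/sec
```

Each returned result needs a replacement sent out, so splitters managing ~40/sec barely match the peak return rate and never rebuild the ready-to-send buffer; a sustained ~55/sec would cover demand with headroom to refill it.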
Grant
Darwin NT
ID: 1358784
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1358785 - Posted: 20 Apr 2013, 4:59:38 UTC - in response to Message 1358784.  

Wow! Now I've seen a peak,

"Current result creation rate 67.6120/sec",

but sadly that isn't producing an increase in the ready-to-send cache, though both my rigs are sitting on their limits.

Cheers.

Yeah, the problem is it's only doing brief bursts, then dropping down to even less than when there aren't shorties in the system.
Normally it'll peak at around 50, but hold it there for an hour or 3 till the buffer has built up, then drop down again. Lately it's been struggling to do even 40, whereas I reckon 55/s or more would be needed to meet demand and build up the buffer.
Received in the last hour is usually around 70,000; it's been around 120,000 for a couple of days now, with peaks up to 140,000. If it were able to meet demand, I'd suspect it would have been 140,000 for the last few days, with even higher peaks.
There's an amazing number of shorties in the system at the moment.

Actually that should have read "167.6120/sec" but for some reason the 1st digit went AWOL.

Cheers.
ID: 1358785