Panic Mode On (100) Server Problems?

Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1727238 - Posted: 20 Sep 2015, 4:38:53 UTC - in response to Message 1727220.  

Good argument here for bumping up the per CPU/GPU cache above 100 WUs, to lessen the impact of these things.
Been out of work here for hours, and outages are going to happen from time to time.

Better argument for choosing an alternate project or several. My machines have been fully occupied all day.

Well, if there was alternate project crunching SETI data, or anything else I was interested in, I would most assuredly be signed up :)

Well, Einstein@Home is crunching the same data as Seti from Arecibo, just looking for pulsars.

Yeah, I got another reference to that in Email, (probably trying to keep us on-topic here) and I do appreciate it.
But Richard's suggestion to look for other work is very valid if my concern is 100% utilization of the hardware. My concern, though, is keeping SAH processing going. So it doesn't address the issue: with the high performance of GPU processing, an absolute peg-count limit on work is less effective than a relative limit based on performance at smoothing out the peaks and valleys of traffic caused by outages and other factors.
Don't get me wrong. I think Seti does a wonderful job, and I very much appreciate the efforts that make it happen. Just think that this limit is a legacy of older days that should be re-examined. Chances are, there are a ton of reasons my suggestion isn't valid ...
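
A rough sketch of the absolute-versus-relative argument, in Python. The throughput figures and the 12-hour target are made up for illustration, and `relative_limit` is a hypothetical helper, not anything the project or BOINC is known to implement.

```python
import math


def hours_of_work(cached_tasks: int, tasks_per_hour: float) -> float:
    """Hours a cache lasts with no server contact, at a steady crunch rate."""
    if tasks_per_hour <= 0:
        raise ValueError("tasks_per_hour must be positive")
    return cached_tasks / tasks_per_hour


def relative_limit(tasks_per_hour: float, target_hours: float) -> int:
    """Hypothetical performance-based limit: enough tasks to cover target_hours."""
    if tasks_per_hour <= 0 or target_hours <= 0:
        raise ValueError("rate and target must be positive")
    return max(1, math.ceil(tasks_per_hour * target_hours))


if __name__ == "__main__":
    FIXED_LIMIT = 100      # the per CPU/GPU limit discussed in this thread
    TARGET_HOURS = 12.0    # assumed outage length to ride out (illustrative)

    # Assumed throughputs in tasks per hour; not measurements.
    devices = {"slow CPU core": 0.5, "fast GPU": 20.0}

    for name, rate in devices.items():
        lasts = hours_of_work(FIXED_LIMIT, rate)
        needed = relative_limit(rate, TARGET_HOURS)
        print(f"{name}: {FIXED_LIMIT} tasks last {lasts:.1f} h; "
              f"{needed} tasks would cover {TARGET_HOURS:.0f} h")
```

Under these assumed rates the same fixed count covers days of work on the slow device and only a few hours on the fast one, which is the mismatch described above.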
ID: 1727238
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1727244 - Posted: 20 Sep 2015, 4:52:33 UTC - in response to Message 1727238.  

It just isn't going to ever change back to the way it was.
The project servers do better without maintaining mega WU caches on the users' rigs.
And the project has enough user capacity on tap to crunch anything they are able to send out.
So, other than massaging a user's wish to keep his own rigs running when the servers are down, there is no reason for the project to change this.

I know....LOL. Used to be a pet peeve of my own, but I now just accept it as the way it is, and continue to contribute as much as I possibly can.

If my rigs run dry during an outage, I guess I save a few pennies on the electric bill.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1727244
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1727245 - Posted: 20 Sep 2015, 4:53:59 UTC - in response to Message 1727244.  

If my rigs run dry during an outage, I guess I save a few pennies on the electric bill.

That's how I look at it.
Grant
Darwin NT
ID: 1727245
Profile Gary Charpentier Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 30646
Credit: 53,134,872
RAC: 32
United States
Message 1727258 - Posted: 20 Sep 2015, 6:03:47 UTC - in response to Message 1727238.  

Yeah, I got another reference to that in Email, (probably trying to keep us on-topic here) and I do appreciate it.
But Richard's suggestion to look for other work is very valid if my concern is 100% utilization of the hardware. My concern, though, is keeping SAH processing going. So it doesn't address the issue: with the high performance of GPU processing, an absolute peg-count limit on work is less effective than a relative limit based on performance at smoothing out the peaks and valleys of traffic caused by outages and other factors.
Don't get me wrong. I think Seti does a wonderful job, and I very much appreciate the efforts that make it happen. Just think that this limit is a legacy of older days that should be re-examined. Chances are, there are a ton of reasons my suggestion isn't valid ...

Well, the limit was imposed IIRC because the result table was growing without bound. See the invalid host messaging thread for a good part of the reason. Bad GPU setups return thousands of invalids in seconds if there isn't a limit. All of those eat an entry in the table, until two results validate against each other or enough bad results come back. As many are still crunching on CPU those returns can take a while. There is only so much space on disk for the table. So they put a hard download limit in place so that they wouldn't run out of disk space. Now they are also running into limits on how much data they transfer to/from the disk in a reasonable amount of time.

If you want to get them some kit that is a heck of a lot faster in I/O bandwidth and a pile of hard disk space, talk to them so you get it spec'd right.

However, if you haven't been reading Matt's updates, we are now crunching data faster than they are collecting it. So we will run dry from time to time. Unlimited will just make us reach that point sooner.
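
The table-size reasoning above can be put as back-of-the-envelope arithmetic. This is only a sketch: the device count is invented, and it assumes one result-table row per task sitting in a host cache, which simplifies how the real database works.

```python
def max_in_progress_rows(device_count: int, per_device_limit: int) -> int:
    """Upper bound on result rows tied up by tasks waiting in host caches."""
    if device_count < 0 or per_device_limit < 0:
        raise ValueError("counts must be non-negative")
    return device_count * per_device_limit


if __name__ == "__main__":
    ASSUMED_DEVICES = 200_000  # hypothetical number of active CPUs/GPUs

    # Compare the current limit with a hypothetical tenfold larger one.
    for limit in (100, 1000):
        rows = max_in_progress_rows(ASSUMED_DEVICES, limit)
        print(f"limit {limit}: at most {rows:,} rows in progress")
```

The bound grows linearly with the limit, so raising the limit raises the worst-case table size by the same factor.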
ID: 1727258
Profile Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1727263 - Posted: 20 Sep 2015, 6:23:18 UTC - in response to Message 1727258.  

Here's the message Matt posted right after the move to the Campus Data Center/CoLo, where he cites the results server I/O issues as the main reason to throttle data throughput, which I presume includes the limits on GPU tasks. And as Gary points out, until they get the splitters and applications written to crunch Green Bank data, we may, from time to time, actually RUN OUT of data to crunch. So no, I would NOT expect to see any changes in the GPU task limits in the near future. You can't always get what you want.....
Donald
Infernal Optimist / Submariner, retired
ID: 1727263
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1727646 - Posted: 21 Sep 2015, 23:16:50 UTC

So SSP looks good, but wondering if I'm the only one getting uploads rejected?
ID: 1727646
Profile Blurf
Volunteer tester

Joined: 2 Sep 06
Posts: 8962
Credit: 12,678,685
RAC: 0
United States
Message 1727648 - Posted: 21 Sep 2015, 23:18:02 UTC - in response to Message 1727646.  
Last modified: 21 Sep 2015, 23:19:09 UTC

So SSP looks good, but wondering if I'm the only one getting uploads rejected?


Getting it here too

[EDIT] Mine just uploaded


ID: 1727648
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1727651 - Posted: 21 Sep 2015, 23:46:40 UTC

Short-lived issue, whatever it was. Late afternoon traffic spike, perhaps?
ID: 1727651
WezH
Volunteer tester

Joined: 19 Aug 99
Posts: 576
Credit: 67,033,957
RAC: 95
Finland
Message 1727917 - Posted: 22 Sep 2015, 15:13:25 UTC

VLAR storm, trouble getting GPU work...
ID: 1727917
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1727922 - Posted: 22 Sep 2015, 15:37:30 UTC - in response to Message 1727917.  

I noticed that too, but just filled up before maintenance.
ID: 1727922
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1727930 - Posted: 22 Sep 2015, 16:15:42 UTC

Today's outrage is getting a late start and APs continue to not be split.
ID: 1727930
WezH
Volunteer tester

Joined: 19 Aug 99
Posts: 576
Credit: 67,033,957
RAC: 95
Finland
Message 1727946 - Posted: 22 Sep 2015, 16:40:40 UTC - in response to Message 1727930.  
Last modified: 22 Sep 2015, 16:42:54 UTC

Today's outrage is getting a late start and APs continue to not be split.


Very soon, no MB files are being split right now...
ID: 1727946
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1728237 - Posted: 23 Sep 2015, 17:31:35 UTC

A day later MBs seem to be splitting well and APs continue to not be split.
ID: 1728237
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1728404 - Posted: 24 Sep 2015, 4:07:47 UTC - in response to Message 1728237.  

The ready-to-send buffer isn't full, yet the MB splitters' output has taken a dive.
Grant
Darwin NT
ID: 1728404
Iona
Joined: 12 Jul 07
Posts: 790
Credit: 22,438,118
RAC: 0
United Kingdom
Message 1728584 - Posted: 24 Sep 2015, 17:57:35 UTC

Anyone else getting slow downloads?
Don't take life too seriously, as you'll never come out of it alive!
ID: 1728584
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1728585 - Posted: 24 Sep 2015, 17:59:12 UTC

Dunno...
But the kitties have the caches full, so even if so, it's not really hampering workfetch here so far.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1728585
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1728606 - Posted: 24 Sep 2015, 19:02:21 UTC - in response to Message 1728584.  

I have noticed some come in really slowly, then the next request comes in fast and finishes far before the first one. Don't know why.
ID: 1728606
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1728719 - Posted: 25 Sep 2015, 5:04:59 UTC - in response to Message 1728584.  

Anyone else getting slow downloads?

Looking at my logs, a few hours ago I was getting some sticky downloads.
Grant
Darwin NT
ID: 1728719
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1729055 - Posted: 26 Sep 2015, 5:53:56 UTC - in response to Message 1728719.  

Anyone else getting slow downloads?

Looking at my logs, a few hours ago I was getting some sticky downloads.

And looking at the Server Status page that may be due to one of the download servers being off line.


And it appears the AP assimilators have given up- the backlog continues to grow. And the AP validators don't appear to be doing much, although the Server Status page shows them as working.
Grant
Darwin NT
ID: 1729055
Profile Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1729060 - Posted: 26 Sep 2015, 6:07:04 UTC - in response to Message 1729055.  
Last modified: 26 Sep 2015, 6:07:46 UTC

Did you also notice that the Astropulse Science Database on marvin is disabled? Can't assimilate if the database is off-line. Whatever happened, it will get fixed soon enough, and APs will validate and get credit.
Donald
Infernal Optimist / Submariner, retired
ID: 1729060


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.