Panic Mode On (100) Server Problems?

Author	Message
Jimbocous Volunteer tester Send message Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349	Message 1727238 - Posted: 20 Sep 2015, 4:38:53 UTC - in response to Message 1727220. Good argument here for bumping up the per CPU/GPU cache above 100 WUs, to lessen the impact of these things. Been out of work here for hours, and outages are going to happen from time to time. Better argument for choosing an alternate project or several. My machines have been fully occupied all day. Well, if there was an alternate project crunching SETI data, or anything else I was interested in, I would most assuredly be signed up :) Well, Einstein@Home is crunching the same data as Seti from Arecibo, just looking for pulsars. Yeah, I got another reference to that in Email, (probably trying to keep us on-topic here) and I do appreciate it. But Richard's suggestion to look for other work, is very valid if my concern is 100% utilization of the hardware. My concern, though, is keeping SAH processing, so this doesn't address the issue that with the high performance of GPU processing, an absolute peg count limit of work is less effective than a relative limit based on performance in smoothing out the peaks and valleys of traffic based on outages and other factors. Don't get me wrong. I think Seti does a wonderful job, and I very much appreciate the efforts that make it happen. Just think that this limit is a legacy of older days that should be re-examined. Chances are, there are a ton of reasons my suggestion isn't valid ... ID: 1727238 ·
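
To put rough numbers on the absolute-versus-relative limit argument above, here is a minimal sketch of the arithmetic. It is not how the project's scheduler works; the function names and the per-task run times (120 minutes on a CPU core, 10 minutes on a GPU) are hypothetical values chosen only for illustration. Only the 100-WU cap comes from the thread.

```python
import math


def buffer_hours(task_limit, minutes_per_task, concurrent_tasks=1):
    """Hours of work a full cache holds before the device runs dry."""
    return task_limit * minutes_per_task / concurrent_tasks / 60.0


def limit_for_hours(target_hours, minutes_per_task, concurrent_tasks=1):
    """Task count needed to cover target_hours (a performance-relative limit)."""
    return math.ceil(target_hours * 60.0 * concurrent_tasks / minutes_per_task)


# Hypothetical run times, for illustration only.
cpu_hours = buffer_hours(100, minutes_per_task=120)  # slow device, 100-WU cap
gpu_hours = buffer_hours(100, minutes_per_task=10)   # fast device, same cap

print(f"CPU core: {cpu_hours:.1f} h of cached work")
print(f"GPU:      {gpu_hours:.1f} h of cached work")
print(f"Tasks for a 48 h GPU buffer: {limit_for_hours(48, 10)}")
```

Under these assumed run times the same 100-task cap covers days of work on a slow device but well under a day on a fast one, which is the gap a performance-based limit would close. It says nothing about the server-side cost of holding more results in flight.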

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004	Message 1727244 - Posted: 20 Sep 2015, 4:52:33 UTC - in response to Message 1727238. It just isn't going to ever change back to the way it was. The project servers do better without maintaining mega WU caches on the users' rigs. And the project has enough user capacity on tap to crunch anything they are able to send out. So, other than massaging a user's wish to keep his own rigs running when the servers are down, there is no reason for the project to change this. I know....LOL. Used to be a pet peeve of my own, but I now just accept it as the way it is, and continue to contribute as much as I possibly can. If my rigs run dry during an outage, I guess I save a few pennies on the electric bill. "Freedom is just Chaos, with better lighting." Alan Dean Foster ID: 1727244 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1727245 - Posted: 20 Sep 2015, 4:53:59 UTC - in response to Message 1727244. If my rigs run dry during an outage, I guess I save a few pennies on the electric bill. That's how I look at it. Grant Darwin NT ID: 1727245 ·

Gary Charpentier Volunteer tester Send message Joined: 25 Dec 00 Posts: 30646 Credit: 53,134,872 RAC: 32	Message 1727258 - Posted: 20 Sep 2015, 6:03:47 UTC - in response to Message 1727238. Yeah, I got another reference to that in Email, (probably trying to keep us on-topic here) and I do appreciate it. But Richard's suggestion to look for other work, is very valid if my concern is 100% utilization of the hardware. My concern, though, is keeping SAH processing, so this doesn't address the issue that with the high performance of GPU processing, an absolute peg count limit of work is less effective than a relative limit based on performance in smoothing out the peaks and valleys of traffic based on outages and other factors. Don't get me wrong. I think Seti does a wonderful job, and I very much appreciate the efforts that make it happen. Just think that this limit is a legacy of older days that should be re-examined. Chances are, there are a ton of reasons my suggestion isn't valid ... Well, the limit was imposed IIRC because the result table was growing without bound. See the invalid host messaging thread for a good part of the reason. Bad GPU setups return thousands of invalids in seconds if there isn't a limit. All of those eat an entry in the table, until two results validate against each other or enough bad results come back. As many are still crunching on CPU those returns can take a while. There is only so much space on disk for the table. So they put a hard download limit in place so that they wouldn't run out of disk space. Now they are also running into limits on how much data they transfer to/from the disk in a reasonable amount of time. If you want to get them some kit that is a heck of a lot faster in I/O bandwidth and a pile of hard disk space, talk to them so you get it spec'd right. However, if you haven't been reading Matt's updates, we are now crunching data faster than they are collecting it. So we will run dry from time to time. 
Unlimited will just make us reach that point sooner. ID: 1727258 ·

Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20	Message 1727263 - Posted: 20 Sep 2015, 6:23:18 UTC - in response to Message 1727258. Yeah, I got another reference to that in Email, (probably trying to keep us on-topic here) and I do appreciate it. But Richard's suggestion to look for other work, is very valid if my concern is 100% utilization of the hardware. My concern, though, is keeping SAH processing, so this doesn't address the issue that with the high performance of GPU processing, an absolute peg count limit of work is less effective than a relative limit based on performance in smoothing out the peaks and valleys of traffic based on outages and other factors. Don't get me wrong. I think Seti does a wonderful job, and I very much appreciate the efforts that make it happen. Just think that this limit is a legacy of older days that should be re-examined. Chances are, there are a ton of reasons my suggestion isn't valid ... Well, the limit was imposed IIRC because the result table was growing without bound. See the invalid host messaging thread for a good part of the reason. Bad GPU setups return thousands of invalids in seconds if there isn't a limit. All of those eat an entry in the table, until two results validate against each other or enough bad results come back. As many are still crunching on CPU those returns can take a while. There is only so much space on disk for the table. So they put a hard download limit in place so that they wouldn't run out of disk space. Now they are also running into limits on how much data they transfer to/from the disk in a reasonable amount of time. If you want to get them some kit that is a heck of a lot faster in I/O bandwidth and a pile of hard disk space, talk to them so you get it spec'd right. However, if you haven't been reading Matt's updates, we are now crunching data faster than they are collecting it. So we will run dry from time to time. 
Unlimited will just make us reach that point sooner. Here's the message Matt posted right after the move to the Campus Data Center/CoLo, where he cites the results server I/O issues as the main reason to throttle data throughput, which I presume includes the limits on GPU tasks. And as Gary points out, until they get the splitters and applications written to crunch Green Bank data, we may, from time to time, actually RUN OUT of data to crunch. So no, I would NOT expect to see any changes in the GPU task limits in the near future. You can't always get what you want..... Donald Infernal Optimist / Submariner, retired ID: 1727263 ·

Jimbocous Volunteer tester Send message Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349	Message 1727646 - Posted: 21 Sep 2015, 23:16:50 UTC So SSP looks good, but wondering if I'm the only one getting uploads rejected? ID: 1727646 ·

Blurf Volunteer tester Send message Joined: 2 Sep 06 Posts: 8962 Credit: 12,678,685 RAC: 0	Message 1727648 - Posted: 21 Sep 2015, 23:18:02 UTC - in response to Message 1727646. Last modified: 21 Sep 2015, 23:19:09 UTC So SSP looks good, but wondering if I'm the only one getting uploads rejected? Getting it here too [EDIT] Mine just uploaded ID: 1727648 ·

Jimbocous Volunteer tester Send message Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349	Message 1727651 - Posted: 21 Sep 2015, 23:46:40 UTC Short-lived issue, whatever it was. Late afternoon traffic spike, perhaps? ID: 1727651 ·

WezH Volunteer tester Send message Joined: 19 Aug 99 Posts: 576 Credit: 67,033,957 RAC: 95	Message 1727917 - Posted: 22 Sep 2015, 15:13:25 UTC VLAR storm, trouble getting GPU work... ID: 1727917 ·

Brent Norman Volunteer tester Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1727922 - Posted: 22 Sep 2015, 15:37:30 UTC - in response to Message 1727917. I noticed that too, but just filled up before maintenance. ID: 1727922 ·

betreger Send message Joined: 29 Jun 99 Posts: 11361 Credit: 29,581,041 RAC: 66	Message 1727930 - Posted: 22 Sep 2015, 16:15:42 UTC Today's outrage is getting a late start and APs continue to not be split. ID: 1727930 ·

WezH Volunteer tester Send message Joined: 19 Aug 99 Posts: 576 Credit: 67,033,957 RAC: 95	Message 1727946 - Posted: 22 Sep 2015, 16:40:40 UTC - in response to Message 1727930. Last modified: 22 Sep 2015, 16:42:54 UTC Today's outrage is getting a late start and APs continue to not be split. Very soon; no MB files are being split right now... ID: 1727946 ·

betreger Send message Joined: 29 Jun 99 Posts: 11361 Credit: 29,581,041 RAC: 66	Message 1728237 - Posted: 23 Sep 2015, 17:31:35 UTC A day later MBs seem to be splitting well and APs continue to not be split. ID: 1728237 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1728404 - Posted: 24 Sep 2015, 4:07:47 UTC - in response to Message 1728237. Ready-to-send buffer isn't full, yet the MB splitters output has taken a dive. Grant Darwin NT ID: 1728404 ·

Iona Send message Joined: 12 Jul 07 Posts: 790 Credit: 22,438,118 RAC: 0	Message 1728584 - Posted: 24 Sep 2015, 17:57:35 UTC Anyone else getting slow downloads? Don't take life too seriously, as you'll never come out of it alive! ID: 1728584 ·

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004	Message 1728585 - Posted: 24 Sep 2015, 17:59:12 UTC Dunno... But the kitties have the caches full, so even if so, it's not really hampering workfetch here so far. "Freedom is just Chaos, with better lighting." Alan Dean Foster ID: 1728585 ·

Brent Norman Volunteer tester Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1728606 - Posted: 24 Sep 2015, 19:02:21 UTC - in response to Message 1728584. I have noticed some come in really slow, then the next request fast, and finish far before the first one, don't know why. ID: 1728606 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1728719 - Posted: 25 Sep 2015, 5:04:59 UTC - in response to Message 1728584. Anyone else getting slow downloads? Looking at my logs, a few hours ago I was getting some sticky downloads. Grant Darwin NT ID: 1728719 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1729055 - Posted: 26 Sep 2015, 5:53:56 UTC - in response to Message 1728719. Anyone else getting slow downloads? Looking at my logs, a few hours ago I was getting some sticky downloads. And looking at the Server Status page that may be due to one of the download servers being off line. And it appears the AP assimilators have given up- the backlog continues to grow. And the AP validators don't appear to be doing much, although the Server Status page shows them as working. Grant Darwin NT ID: 1729055 ·

Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20	Message 1729060 - Posted: 26 Sep 2015, 6:07:04 UTC - in response to Message 1729055. Last modified: 26 Sep 2015, 6:07:46 UTC Anyone else getting slow downloads? Looking at my logs, a few hours ago I was getting some sticky downloads. And looking at the Server Status page that may be due to one of the download servers being off line. And it appears the AP assimilators have given up- the backlog continues to grow. And the AP validators don't appear to be doing much, although the Server Status page shows them as working. Did you also notice that the Astropulse Science Database on marvin is disabled? Can't assimilate if the database is off-line. Whatever happened, it will get fixed soon enough, and APs will validate and get credit. Donald Infernal Optimist / Submariner, retired ID: 1729060 ·

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.