Panic Mode On (105) Server Problems?

Lionel

Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1857111 - Posted: 22 Mar 2017, 23:44:39 UTC - in response to Message 1857089.  

My machines were getting dribs and drabs up to about an hour ago. Since then they have all filled up, so I guess the bottleneck is slowly disappearing (Keith, Grant, I appreciate that you have issues that others do not appear to be having). My only issue with this outage was that I misjudged its depth and so didn't squirrel enough WUs aside to get all the way through without being affected by the "bun fight" at the back end. I wish we could get a 1 or 2 day buffer of WUs; that would be really helpful. As a side note, I also do POGS every now and then, and it runs with a standard 5 day buffer.
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1857113 - Posted: 22 Mar 2017, 23:55:04 UTC - in response to Message 1857111.  

I wish we could get a 1 or 2 day buffer of WUs; that would be really helpful.


That would require a change in the current limit of 100 work units per GPU/CPU. I run out of work in 7 hours...
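As a rough illustration of why the cap empties so fast, here is a back-of-the-envelope sketch; the runtimes and task counts are made-up assumptions, not figures from this thread:

```python
# Rough cache-duration estimate under a per-device work-unit cap.
# All numbers are illustrative assumptions, not project figures.

WU_LIMIT_PER_DEVICE = 100  # server-side cap per CPU and per GPU

def cache_hours(wu_limit: int, minutes_per_wu: float, concurrent_tasks: int) -> float:
    """Hours a full cache lasts for a device running
    concurrent_tasks at once, each taking minutes_per_wu."""
    tasks_per_hour = concurrent_tasks * 60.0 / minutes_per_wu
    return wu_limit / tasks_per_hour

# e.g. a GPU running 2 tasks at a time at ~8 minutes each:
print(f"{cache_hours(WU_LIMIT_PER_DEVICE, 8.0, 2):.1f} h")  # ~6.7 h
```

At those assumed rates a full 100-unit cache is gone in under 7 hours, which lines up with the experience above.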
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1857116 - Posted: 23 Mar 2017, 0:08:07 UTC

And correct me if I am wrong: the reason they won't do that is that the database would become too big and unmanageable. Right?
Seti@Home classic workunits: 20,676, CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1857123 - Posted: 23 Mar 2017, 0:20:28 UTC - in response to Message 1857116.  

The current limits were an accident. It really wasn't supposed to be 100 per GPU and CPU; it ended up that way because of however they coded it. The limit was supposed to be even less (at least that is how I remember it being told to me).

But if you allowed people to download as much as they want, then you get issues with thousands of lost work units (maybe even millions) having to time out if someone decides just not to return them.

So I think the idea was to restrict the number to what they thought was a manageable amount...

Let's see what insight some of the others provide...
Profile arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1857140 - Posted: 23 Mar 2017, 1:32:57 UTC

Both of my machines are back to full after spending most of the day empty.

The back-offs can be murder in a challenge.

Profile KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1857151 - Posted: 23 Mar 2017, 2:24:28 UTC - in response to Message 1857123.  

The current limits were an accident. It really wasn't supposed to be 100 per GPU and CPU; it ended up that way because of however they coded it. The limit was supposed to be even less (at least that is how I remember it being told to me).

But if you allowed people to download as much as they want, then you get issues with thousands of lost work units (maybe even millions) having to time out if someone decides just not to return them.

So I think the idea was to restrict the number to what they thought was a manageable amount...

Let's see what insight some of the others provide...


Maybe they could up that 100 to 200... I sometimes run through 100 WUs (particularly VLARs!) on my fastest (current) machine in less than 24 hours...

Hello, from Albany, CA!...
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1857155 - Posted: 23 Mar 2017, 3:00:13 UTC - in response to Message 1857151.  

The current limits were an accident...
Let's see what insight some of the others provide...


Maybe they could up that 100 to 200... I sometimes run through 100 WUs (particularly VLARs!) on my fastest (current) machine in less than 24 hours...

An increase to 200 would definitely be appreciated here. I often run out of CPU work on the heavy hitters during the current 8-11 hr maintenance outages each week. Hate to pay for electricity for machines that are twiddling their figurative thumbs ...
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1857161 - Posted: 23 Mar 2017, 3:44:09 UTC - in response to Message 1857155.  

When they aren't crunching, the electricity usage goes way down. Something to consider: SETI owes Einstein a lot of love for allowing us to use their Atlas cluster to run Nebula; that would be a very worthwhile project when you are out of work.
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1857166 - Posted: 23 Mar 2017, 3:59:41 UTC - in response to Message 1857155.  

The current limits were an accident...
Let's see what insight some of the others provide...


Maybe they could up that 100 to 200... I sometimes run through 100 WUs (particularly VLARs!) on my fastest (current) machine in less than 24 hours...

An increase to 200 would definitely be appreciated here. I often run out of CPU work on the heavy hitters during the current 8-11 hr maintenance outages each week. Hate to pay for electricity for machines that are twiddling their figurative thumbs ...

I almost never ran out of CPU work in the past... at least when the outages were less than 10-12 hours. It's a different situation now with the Ryzen 1700X machine. Man! That machine really chews through the CPU units now. I would definitely appreciate an increase to 200 units per CPU on that machine. 200 units per GPU would also be appreciated.

My machines never really go idle. When the SETI work dries up, they automatically move on to MilkyWay and Einstein work. That still keeps the CPU sort of busy, if only feeding the GPUs.
Seti@Home classic workunits: 20,676, CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1857169 - Posted: 23 Mar 2017, 5:13:11 UTC - in response to Message 1857151.  

The current limits were an accident. The limit was supposed to be even less (at least that is how I remember it being told to me).

So I think the idea was to restrict the number to what they thought was a manageable amount...

Let's see what insight some of the others provide...


Maybe they could up that 100 to 200... I sometimes run through 100 WUs (particularly VLARs!) on my fastest (current) machine in less than 24 hours...


. . 200 sounds pretty good to me; two of my three machines can chew through their present allotments in 6 to 8 hours. This is not very comforting, as the outages take longer and longer each week. But I don't think you will convince the decision makers to go along with that proposal.

Stephen

<shrug>
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1857172 - Posted: 23 Mar 2017, 5:18:39 UTC - in response to Message 1857155.  

The current limits were an accident...
Let's see what insight some of the others provide...


Maybe they could up that 100 to 200... I sometimes run through 100 WUs (particularly VLARs!) on my fastest (current) machine in less than 24 hours...

An increase to 200 would definitely be appreciated here. I often run out of CPU work on the heavy hitters during the current 8-11 hr maintenance outages each week. Hate to pay for electricity for machines that are twiddling their figurative thumbs ...


. . 8 to 11 hours? You've got it good. This week it was over 30 hours for me. After about 14 hours I could upload results AND report. But I did not regain access to the message boards or any new work until over 30 hours had passed. For me it started about 2am yesterday AEDT and did not finally end (return to some sort of normalcy) until approx 9am today AEDT. Very disheartening.

Stephen

:(
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1857173 - Posted: 23 Mar 2017, 5:25:19 UTC - in response to Message 1857155.  

The current limits were an accident...
Let's see what insight some of the others provide...


Maybe they could up that 100 to 200... I sometimes run through 100 WUs (particularly VLARs!) on my fastest (current) machine in less than 24 hours...

An increase to 200 would definitely be appreciated here. I often run out of CPU work on the heavy hitters during the current 8-11 hr maintenance outages each week.

And bump it up to 1,000 per GPU to get through these extra-long extended outages.


Unfortunately that would probably make the outages longer, as the database would require more maintenance.
Maybe it's time for some new hardware to meet the demands?
Grant
Darwin NT
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1857174 - Posted: 23 Mar 2017, 5:26:07 UTC

BTW- WTF happened?
Grant
Darwin NT
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1857175 - Posted: 23 Mar 2017, 5:33:11 UTC - in response to Message 1857161.  
Last modified: 23 Mar 2017, 5:39:05 UTC

When they aren't crunching, the electricity usage goes way down. Something to consider: SETI owes Einstein a lot of love for allowing us to use their Atlas cluster to run Nebula; that would be a very worthwhile project when you are out of work.


. . Yep, the power consumption goes down, that's true, but two or three machines sitting idle while still chewing up 100 W to 200 W of electrical power each, for anything from 4 to 12 hours or sometimes longer, is a pointless expenditure and waste.

. . Having said that, I had set up E@H as my backup project for when I run out of work from SETI. It took a while to get it right, but I did, and that was under Windows. After moving two machines to Linux last week that was gone, and I had neglected to redo it before this week's outage, so I was without work for many hours. When I tried to set it up again I could not remember my password (getting old sucks), so I had to sort all that out, and that took a few hours too. But I now have it working under Linux and can feel at ease again, knowing that when I fail to get work from SETI the machines can still be productive.

Stephen

:)
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1857176 - Posted: 23 Mar 2017, 5:33:47 UTC

Just had a look in my manager's Event Log; I'm getting the odd "Scheduler error: Couldn't connect to server".
When it's OK the response is nice and quick: within 3 seconds.
Grant
Darwin NT
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1857177 - Posted: 23 Mar 2017, 5:37:33 UTC - in response to Message 1857174.  

BTW- WTF happened?


. . The shadow knows!

. . But no-one else ...

Stephen

:(
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1857182 - Posted: 23 Mar 2017, 6:19:19 UTC - in response to Message 1857172.  

. . 8 to 11 hours? You've got it good. This week it was over 30 hours for me. After about 14 hours I could upload results AND report. But I did not regain access to the message boards or any new work until over 30 hours had passed. For me it started about 2am yesterday AEDT and did not finally end (return to some sort of normalcy) until approx 9am today AEDT. Very disheartening.

Stephen

:(

Totally different issues, I think. And yes, this week I saw the same ~30 hrs you did, and though I did manage to fill most caches right after maintenance and before the crash, I was again out of work in a few more hours.

It's reasonable to ask that there be enough work cached to allow people to remain productive during scheduled maintenance events, with enough cushion to support the oddball overtime of that event.

It's a very different discussion to talk about what happens when a mess-up like the other day occurs. Not reasonable to have enough cached to get through something like that. BUT, I do think it's only fair to ask that when an issue like that occurs we are given an explanation and that steps are taken to minimize recurrence.

Resources are always an issue, sure, but I don't think that enough consideration is given to keeping the systems up and running. For a while I was optimistic that some efforts were being made to improve communication with "the masses". Not so sure now, and there's no excuse for that.
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22199
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1857184 - Posted: 23 Mar 2017, 6:41:20 UTC

The project has NEVER promised to keep us fed with work; they have ALWAYS told us to have standby projects to fall back on in the event of (extended) outages.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1857185 - Posted: 23 Mar 2017, 6:55:28 UTC - in response to Message 1857184.  

The project has NEVER promised to keep us fed with work; they have ALWAYS told us to have standby projects to fall back on in the event of (extended) outages.

But it would be nice if it were possible to carry at least 24 hours of work, even 48. The option is there in the settings; it would be appreciated if it actually worked.
Why not raise the server-side limits, but restrict all users to a maximum 2-day cache? That would leave more work available to the faster crunchers to see them through these outages, while still limiting the load on the database (see the sketch below).

If there is no data available to crunch, then there's no work.
But if there is work available and the project has issues, it would be nice to be able to continue processing it.

It has been said that there will (eventually) be way more work than the present user base can process in a reasonable time, and that more crunchers are needed. But it doesn't matter how many crunchers there are if they can't get work to process. And even if the servers spend most of a week down, people will continue to crunch as long as they can pick up enough work during the uptime to tide them through to the next one.
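A minimal sketch of that proposal, with hypothetical numbers and names (this is not BOINC scheduler code): the server would send each host the smaller of a raised per-device limit and that host's estimated 2-day consumption, so fast hosts ride out long outages while slow hosts stop bloating the database.

```python
# Sketch of "raise the limit, cap the cache at 2 days".
# RAISED_LIMIT and the throughput figures are hypothetical.

RAISED_LIMIT = 1000    # proposed per-device cap (up from 100)
MAX_CACHE_DAYS = 2.0   # ceiling on any host's cache duration

def allowed_tasks(tasks_per_day: float) -> int:
    """Tasks a host may hold, given its measured daily throughput."""
    return min(RAISED_LIMIT, int(tasks_per_day * MAX_CACHE_DAYS))

print(allowed_tasks(350.0))  # fast GPU host -> 700 (starved at the old 100)
print(allowed_tasks(20.0))   # slow host     -> 40 (no extra database load)
```

Under this rule the database load would scale with real throughput rather than with the raw host count.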
Grant
Darwin NT
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22199
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1857189 - Posted: 23 Mar 2017, 8:06:59 UTC

There was an announcement around about Christmas that they were running out of the first batch of BLP data and were reprocessing selected Arecibo data using the latest apps. It would appear that some new BLP data has arrived, but who knows how much.

BTW Chris - there is an easier way to trigger back-up projects: set SETI's priority to 1000 and the other projects' to zero. This way SETI will get work whenever it is available, and the other projects will only deliver work when you have run out of SETI work. If you use the "locations" you can have sets of projects with different priorities.
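For what it's worth, the setting rob calls "priority" here is BOINC's resource share: a project with a resource share of 0 is treated as a backup project and is only asked for work when no positive-share project has any. A toy sketch of that rule, much simplified from the client's real work-fetch logic:

```python
# Toy model of BOINC's backup-project behaviour (greatly simplified).
# A zero-share project is only polled when every primary is dry.

def pick_project(projects):
    """projects: list of (name, resource_share, has_work)."""
    primaries = [p for p in projects if p[1] > 0 and p[2]]
    if primaries:
        return max(primaries, key=lambda p: p[1])[0]  # highest share first
    backups = [p for p in projects if p[1] == 0 and p[2]]
    return backups[0][0] if backups else None

my_projects = [("SETI@home", 1000, False),    # outage: nothing to send
               ("Einstein@Home", 0, True)]    # zero-share backup
print(pick_project(my_projects))              # -> Einstein@Home
```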
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?