Panic Mode On (107) Server Problems?

Author	Message
Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1892025 - Posted: 27 Sep 2017, 18:44:29 UTC - in response to Message 1892024.
I suspect, and it's only a suspicion, that the reason invoking the "ghost recovery" process is often successful in retrieving new tasks, even when no ghosts are present, is that a different timer is used, or at least a different, longer time interval. That "ghost recovery" process would, by necessity, require a database query in order to determine what tasks the server thinks are on hand for the requesting host. The results of that query then would have to be compared, task by task, against the tasks identified in the "<other_results>" section of the scheduler request, in order to see if any are missing and need to be resent. It would make sense to me (if making sense matters) that a longer response time might be allowed in order to accomplish that database retrieval and comparison, thus perhaps providing an extra cushion for normal scheduler operations.
I like your deduction. Makes sense to me that there is another timer mechanism in play for the "ghost recovery" protocol.
Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1892025 ·

HAL9000 Volunteer tester Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57	Message 1892080 - Posted: 27 Sep 2017, 23:37:23 UTC Another consideration is that the maximum number of tasks sent per request can be set on the server. Several years ago the max number of AP tasks per request was reduced so it would be harder for users to stockpile them. It went from being able to get 100 at a time to ~7. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu ID: 1892080 ·

Kiska Volunteer tester Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0	Message 1892128 - Posted: 28 Sep 2017, 8:32:50 UTC - in response to Message 1892024.
I suspect, and it's only a suspicion, that the reason invoking the "ghost recovery" process is often successful in retrieving new tasks, even when no ghosts are present, is that a different timer is used, or at least a different, longer time interval. That "ghost recovery" process would, by necessity, require a database query in order to determine what tasks the server thinks are on hand for the requesting host. The results of that query then would have to be compared, task by task, against the tasks identified in the "<other_results>" section of the scheduler request, in order to see if any are missing and need to be resent. It would make sense to me (if making sense matters) that a longer response time might be allowed in order to accomplish that database retrieval and comparison, thus perhaps providing an extra cushion for normal scheduler operations.
See, that's the thing: the "ghost recovery" procedure has the same timeouts as a normal scheduler request. In its scheduler request, the client sends a list of the tasks it is currently processing. The scheduler, using a count() SQL query against the db, sees that the host has reached the limit of 100 WU per CPU + 100 WU per GPU; when it sees this, it decides it is absolutely necessary to query the db. This is probably the statement that it would execute:
SELECT id, workunitid, name, hostid FROM result WHERE hostid='Your host id'
I've only skimmed the code, so that SQL statement is a guess. Parsing what it receives from that query takes up the majority of its time, resulting in you recovering only 20 WU. I'd hazard a guess that it times out.
Oh, the server that the scheduler is running on is synergy, which has this spec according to the SSP page: Intel Server (2 x hexa-core 2.53GHz Xeon, 96 GB RAM). I'd hazard a guess that the specs of this server are: dual Intel Xeon E5649 2.53 GHz with max turbo up to 2.93 GHz on the LGA1366 socket, and 96 GB of DDR3 1066 MHz memory.
ID: 1892128 ·
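The comparison discussed in the post above (the server's list of tasks for a host versus the task names reported in the "<other_results>" section of the request) can be pictured as a simple set difference. What follows is only a rough Python sketch under stated assumptions, not the actual BOINC scheduler code: the `ServerResult` record, the `find_ghosts` function and the sample task names are all hypothetical, with the fields borrowed from the guessed SQL statement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerResult:
    """One row the server believes is assigned to the host (hypothetical shape)."""
    id: int
    workunitid: int
    name: str
    hostid: int


def find_ghosts(server_rows, reported_names):
    """Return the rows the server has on record that the client did not report."""
    reported = set(reported_names)  # set gives fast membership tests
    return [row for row in server_rows if row.name not in reported]


# Hypothetical data: the server thinks host 42 holds three tasks,
# but the client's request only lists two of them.
server_rows = [
    ServerResult(1, 101, "task_a_0", 42),
    ServerResult(2, 102, "task_b_1", 42),
    ServerResult(3, 103, "task_c_0", 42),
]
reported_names = ["task_a_0", "task_c_0"]

ghosts = find_ghosts(server_rows, reported_names)
print([g.name for g in ghosts])
```

In this toy form the comparison itself is cheap; if the guess in the post is right, the cost would lie in fetching and parsing the rows for a host that sits at its task limit.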

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13746 Credit: 208,696,464 RAC: 304	Message 1892310 - Posted: 29 Sep 2017, 5:27:31 UTC Ah, we're back. The forums went from being slower than a year of Sundays, to the Web site vanishing completely for a while there. Grant Darwin NT ID: 1892310 ·

David@home Volunteer tester Send message Joined: 16 Jan 03 Posts: 755 Credit: 5,040,916 RAC: 28	Message 1892441 - Posted: 29 Sep 2017, 18:09:52 UTC Getting "Project is temporarily shut down for maintenance" messages in my event log. All looks OK on the web site. Anybody else experiencing this? ID: 1892441 ·

Brent Norman Volunteer tester Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1892443 - Posted: 29 Sep 2017, 18:12:55 UTC - in response to Message 1892441. Yes, here too. I thought I overslept by 4 days ;P ID: 1892443 ·

Iona Send message Joined: 12 Jul 07 Posts: 790 Credit: 22,438,118 RAC: 0	Message 1892446 - Posted: 29 Sep 2017, 18:18:21 UTC - in response to Message 1892441. Yes, just noticed a few finished tasks, going nowhere. Tried to 'update'.....no luck. Been a while since we had server trouble on a Friday afternoon (here, that is)! Don't take life too seriously, as you'll never come out of it alive! ID: 1892446 ·

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004	Message 1892449 - Posted: 29 Sep 2017, 18:35:00 UTC - in response to Message 1892441. Last modified: 29 Sep 2017, 18:37:28 UTC
Getting "Project is temporarily shut down for maintenance" messages in my event log. All looks OK on the web site. Anybody else experiencing this?
Yes.....it's not just you. I just noticed that all my rigs have not reported for about half an hour, which is about the time that server message and backoff was sent out. SSP looks OK as of this post. And I just hit the update button on my daily driver.......reported and got new tasks. So, it looks like the coast is clear again. I'll just let the rest of my rigs time out by themselves.
"Freedom is just Chaos, with better lighting." Alan Dean Foster ID: 1892449 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1892450 - Posted: 29 Sep 2017, 18:39:30 UTC I had a 30 minute backoff on all machines and I just updated and started getting tasks again. So a short-lived event. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1892450 ·

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1892598 - Posted: 30 Sep 2017, 10:40:38 UTC - in response to Message 1892450.
I had a 30 minute backoff on all machines and I just updated and started getting tasks again. So a short-lived event.
. . Well I was blissfully asleep through that event, but right now the servers are very reluctant to send this rig {i5-6600 with 2 x 970s} any work. I had to tickle its tonsils very hard to refill my cache to only almost full, still 20 or so down, and it won't send any more ...
Stephen ID: 1892598 ·

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004	Message 1892602 - Posted: 30 Sep 2017, 12:17:26 UTC Would appear to be a lack of nvidia GPU work being available. My caches are down a bit too. "Freedom is just Chaos, with better lighting." Alan Dean Foster ID: 1892602 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1892607 - Posted: 30 Sep 2017, 13:49:48 UTC Yes, seeing the same. Down about 250 tasks on the linux cruncher and it sent (4) after the ghost recovery protocol. Must not have any Nvidia tasks. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1892607 ·

Kiska Volunteer tester Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0	Message 1892634 - Posted: 30 Sep 2017, 15:11:55 UTC - in response to Message 1892607.
Yes, seeing the same. Down about 250 tasks on the linux cruncher and it sent (4) after the ghost recovery protocol. Must not have any Nvidia tasks.
That or the feeder is not fast enough to refill the buffer
ID: 1892634 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1892644 - Posted: 30 Sep 2017, 15:33:57 UTC - in response to Message 1892607. It seems the RTS is full of Arecibo VLARs. The only CPU tasks I'm getting are Arecibo VLARs and the numbers on the Arecibo splitters aren't changing much even though some of the files are small. Someone needs to add some more Arecibo files and hope a splitter jumps on a new file. ID: 1892644 ·

Jeff Buck Volunteer tester Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1892688 - Posted: 30 Sep 2017, 17:30:44 UTC - in response to Message 1892644. Yep, lots and lots of Arecibo VLARs. Even my machines are having difficulty getting new tasks, and my #1 cruncher actually got down to its last GPU task, the first time I can recall that happening. I just went ahead and rescheduled all the available guppis and non-VLAR Arecibo tasks from the CPU to the GPU queue. That freed up enough CPU queue space to be able to accept a bunch of Arecibo VLARs which, in turn, seemed to let a bunch of guppis and Arecibo non-VLAR tasks loose. It looks like that machine is slowly starting to rebuild the GPU queue but who knows how long that'll last. If I have to, I'll just move Arecibo VLARs over to the GPUs. I don't think they do all that badly with the Special App. ID: 1892688 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1892700 - Posted: 30 Sep 2017, 20:14:42 UTC The logjam of Arecibo VLARs may have finally broken up. I just got a 154 task slug of BLCs on the Ryzen system. Hope that occurs for the other crunchers. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1892700 ·

Brent Norman Volunteer tester Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1892706 - Posted: 30 Sep 2017, 20:39:57 UTC - in response to Message 1892700. I was just thinking ... damn that Keith for saying there was not enough BLC files for his Ryzen. I have more than enough to do my CPUs. ID: 1892706 ·

Wiggo Send message Joined: 24 Jan 00 Posts: 34869 Credit: 261,360,520 RAC: 489	Message 1892707 - Posted: 30 Sep 2017, 20:51:08 UTC Not very many Arecibo VLARs here, but I do have heaps of GBT and some normal Arecibo work. ;-) Cheers. ID: 1892707 ·

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80	Message 1892708 - Posted: 30 Sep 2017, 20:56:56 UTC Murphy's Law maybe. 90% Arecibo on my CPU and 90% GBT on my GPU, whilst I would like it vice versa. With each crime and every kindness we birth our future. ID: 1892708 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13746 Credit: 208,696,464 RAC: 304	Message 1892710 - Posted: 30 Sep 2017, 21:01:31 UTC - in response to Message 1892708.
Murphy's Law maybe. 90% Arecibo on my CPU and 90% GBT on my GPU, whilst I would like it vice versa.
For me it's generally that my slower system gets mostly GBT work, and my faster system mostly Arecibo.
Grant Darwin NT ID: 1892710 ·
