Panic Mode On (107) Server Problems?
Author | Message |
---|---|
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I suspect, and it's only a suspicion, that the reason invoking the "ghost recovery" process is often successful in retrieving new tasks, even when no ghosts are present, is that a different timer is used, or at least a different, longer time interval. That "ghost recovery" process would, by necessity, require a database query in order to determine what tasks the server thinks are on hand for the requesting host. The results of that query then would have to be compared, task by task, against the tasks identified in the "<other_results>" section of the scheduler request, in order to see if any are missing and need to be resent. It would make sense to me (if making sense matters) that a longer response time might be allowed in order to accomplish that database retrieval and comparison, thus perhaps providing an extra cushion for normal scheduler operations. I like your deduction. Makes sense to me that there is another timer mechanism in play for the "ghost recovery" protocol. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Another consideration is that the maximum number of tasks sent per request can be set on the server. Several years ago the max number of AP tasks per request was reduced so it would be harder for users to stockpile them. It went from being able to get 100 at a time to ~7. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu |
Kiska Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0 |
I suspect, and it's only a suspicion, that the reason invoking the "ghost recovery" process is often successful in retrieving new tasks, even when no ghosts are present, is that a different timer is used, or at least a different, longer time interval. That "ghost recovery" process would, by necessity, require a database query in order to determine what tasks the server thinks are on hand for the requesting host. The results of that query then would have to be compared, task by task, against the tasks identified in the "<other_results>" section of the scheduler request, in order to see if any are missing and need to be resent. It would make sense to me (if making sense matters) that a longer response time might be allowed in order to accomplish that database retrieval and comparison, thus perhaps providing an extra cushion for normal scheduler operations. See, that's the thing: the "ghost recovery" procedure has the same timeouts as a normal scheduler request. In its scheduler request, the client sends a list of the tasks it currently holds. The scheduler, using a count() SQL query against the db, sees that the host has reached the limit of 100 WU per CPU + 100 WU per GPU; when it sees this, the scheduler decides it is absolutely necessary to query the db. This is probably the statement it would execute: SELECT id, workunitid, name, hostid FROM result WHERE hostid='Your host id' I've only skimmed the code, so that SQL statement is a guess. Parsing what comes back from that query takes up the majority of its time, resulting in you recovering only 20 WU. I'd hazard a guess it times out. Oh, and the server that the scheduler is running on is synergy, whose spec according to the SSP page is: Intel Server (2 x hexa-core 2.53GHz Xeon, 96 GB RAM). I'd hazard a guess the specs of this server are: dual Intel Xeon E5649 2.53GHz with max turbo up to 2.93GHz on the LGA1366 socket, and 96GB of DDR3 1066MHz memory |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
Ah, we're back. The forums went from being slower than a year of Sundays, to the Web site vanishing completely for a while there. Grant Darwin NT |
David@home Send message Joined: 16 Jan 03 Posts: 755 Credit: 5,040,916 RAC: 28 |
Getting "Project is temporarily shut down for maintenance" messages in my event log. All looks OK on the web site. Anybody else experiencing this? |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Yes, here too. I thought I overslept by 4 days ;P |
Iona Send message Joined: 12 Jul 07 Posts: 790 Credit: 22,438,118 RAC: 0 |
Yes, just noticed a few finished tasks, going nowhere. Tried to 'update'.....no luck. Been a while since we had server trouble on a Friday afternoon (here, that is)! Don't take life too seriously, as you'll never come out of it alive! |
kittyman Send message Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
Getting "Project is temporarily shut down for maintenance" messages in my event log. All looks OK on the web site. Anybody else experiencing this? Yes.....it's not just you. I just noticed that all my rigs have not reported for about half an hour. Which is about the time that server message and backoff was sent out. SSP looks OK as of this post. And I just hit the update button on my daily driver.......reported and got new tasks. So, it looks like the coast is clear again. I'll just let the rest of my rigs time out by themselves. "Time is simply the mechanism that keeps everything from happening all at once." |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I had a 30-minute backoff on all machines and I just updated and started getting tasks again. So a short-lived event. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I had a 30-minute backoff on all machines and I just updated and started getting tasks again. So a short-lived event. . . Well I was blissfully asleep through that event, but right now the servers are very reluctant to send this rig {i5-6600 with 2 x 970s} any work. I had to tickle its tonsils very hard to refill my cache to only almost full, still 20 or so down, and it won't send any more ... Stephen |
kittyman Send message Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
Would appear to be a lack of Nvidia GPU work being available. My caches are down a bit too. "Time is simply the mechanism that keeps everything from happening all at once." |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Yes, seeing the same. Down about 250 tasks on the Linux cruncher and it sent (4) after the ghost recovery protocol. Must not have any Nvidia tasks. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Kiska Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0 |
Yes, seeing the same. Down about 250 tasks on the Linux cruncher and it sent (4) after the ghost recovery protocol. Must not have any Nvidia tasks. That, or the feeder is not fast enough to refill the buffer. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
It seems the RTS is full of Arecibo VLARs. The only CPU tasks I'm getting are Arecibo VLARs and the numbers on the Arecibo splitters aren't changing much even though some of the files are small. Someone needs to add some more Arecibo files and hope a splitter jumps on a new file. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Yep, lots and lots of Arecibo VLARs. Even my machines are having difficulty getting new tasks, and my #1 cruncher actually got down to its last GPU task, the first time I can recall that happening. I just went ahead and rescheduled all the available guppis and non-VLAR Arecibo tasks from the CPU to the GPU queue. That freed up enough CPU queue space to be able to accept a bunch of Arecibo VLARs which, in turn, seemed to let a bunch of guppis and Arecibo non-VLAR tasks loose. It looks like that machine is slowly starting to rebuild the GPU queue but who knows how long that'll last. If I have to, I'll just move Arecibo VLARs over to the GPUs. I don't think they do all that badly with the Special App. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
The logjam of Arecibo VLARs may have finally broken up. I just got a 154 task slug of BLCs on the Ryzen system. Hope that occurs for the other crunchers. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
I was just thinking ... damn that Keith for saying there were not enough BLC files for his Ryzen. I have more than enough to do my CPUs. |
Wiggo Send message Joined: 24 Jan 00 Posts: 36355 Credit: 261,360,520 RAC: 489 |
Not very many Arecibo VLARs here, but I do have heaps of GBT and some normal Arecibo work. ;-) Cheers. |
Mike Send message Joined: 17 Feb 01 Posts: 34348 Credit: 79,922,639 RAC: 80 |
Murphy's Law maybe. 90% Arecibo on my CPU and 90% GBT on my GPU, whilst I would like it vice versa. With each crime and every kindness we birth our future. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
Murphy's Law maybe. For me it's generally my slower system gets mostly GBT work, and my faster system mostly Arecibo. Grant Darwin NT |
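
For anyone following the "ghost recovery" discussion earlier in the thread, here is a minimal sketch of the comparison being described: look up what the database thinks a host has, then compare that against the task names the client reported in its scheduler request. Everything in it is an assumption for illustration only. The toy `result` table just mirrors the columns in the SQL statement guessed at in the thread, the `ORDER BY` is added only to make the toy output predictable, the function names `tasks_on_server` and `find_ghosts` are made up, and none of this is taken from the actual scheduler code.

```python
import sqlite3


def tasks_on_server(conn, host_id):
    """Names of every result the toy database has assigned to host_id."""
    rows = conn.execute(
        "SELECT id, workunitid, name, hostid FROM result "
        "WHERE hostid = ? ORDER BY id",
        (host_id,),
    ).fetchall()
    return [name for _id, _workunitid, name, _hostid in rows]


def find_ghosts(server_names, client_names):
    """Tasks the server lists for the host that the client did not report."""
    reported = set(client_names)  # set lookup avoids a nested task-by-task scan
    return [name for name in server_names if name not in reported]


# Hypothetical schema and data, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE result ("
    "id INTEGER PRIMARY KEY, workunitid INTEGER, name TEXT, hostid INTEGER)"
)
conn.executemany(
    "INSERT INTO result (workunitid, name, hostid) VALUES (?, ?, ?)",
    [
        (101, "task_a", 42),
        (102, "task_b", 42),
        (103, "task_c", 42),
        (104, "task_d", 7),  # belongs to a different host
    ],
)

# What the client says it has on hand (stand-in for the <other_results> list).
client_reported = ["task_a", "task_c"]

ghosts = find_ghosts(tasks_on_server(conn, 42), client_reported)
print(ghosts)  # ['task_b']
```

The point of the sketch is only that the lookup grows with the number of tasks the server has on file for the host, which fits the guess in the thread that the database retrieval and comparison is where the time goes.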
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.