Message boards : Number crunching : Panic Mode On (93) Server Problems?
Zalster · Joined: 27 May 99 · Posts: 5517 · Credit: 528,817,460 · RAC: 242

They ever move tapes from Main to Beta for crunching? Just asking....
Cosmic_Ocean · Joined: 23 Dec 00 · Posts: 3027 · Credit: 13,516,867 · RAC: 13

> They ever move tapes from Main to Beta for crunching? Just asking....

I would think the same few tapes would be permanently re-used on Beta.. if the same data has been processed a few hundred times, you KNOW what the results should be when testing new apps. I could be wrong though.

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
JaundicedEye · Joined: 14 Mar 12 · Posts: 5375 · Credit: 30,870,693 · RAC: 1

Eight AP's left, stocking the cache with MB since a fix is obviously not in the works for a few days(?). Still worthwhile science to be done without AP.

May we all live in interesting times.........

"Sour Grapes make a bitter Whine." <(0)>
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768

Well, if the slapping around doesn't work, it may be time to take it out behind the shed... Here is a 26my14ab from 3 days ago;

name: ap_26my14ab_B0_P1_00238_20141216_05061.wu
application: AstroPulse v7
created: 16 Dec 2014, 14:44:12 UTC
http://setiathome.berkeley.edu/workunit.php?wuid=1652212000

It's been around for a while.
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13736 · Credit: 208,696,464 · RAC: 304

AP in progress & returned per hour continue to decline.
AP Awaiting validation continues to grow.
AP assimilators are disabled.

Grant
Darwin NT
Jimbocous · Joined: 1 Apr 13 · Posts: 1853 · Credit: 268,616,081 · RAC: 1,349

> AP in progress & returned per hour continue to decline.

And now back up and poking along. I think I've about given up on trying to read the tea leaves on the SSP. Seems like it doesn't resemble reality when things go sideways ...
Jord · Joined: 9 Jun 99 · Posts: 15184 · Credit: 4,362,181 · RAC: 3

I just watched Inside a Google data center. Talk about 'server problems'... Wouldn't we all wish that the colo looked like that?
Jord · Joined: 9 Jun 99 · Posts: 15184 · Credit: 4,362,181 · RAC: 3

Oh no, I have loads of "*no14ac*" tasks again, of which the 13 to 20 range cause severe stuttering when they run on my AMD HD7870. What is it with these November tasks that causes this problem? The older tasks, 21 to 26, give no problems.
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768

Well, AP started flowing here pretty good, but that stopped the moment three splitters started working on the same file: 25my14ae. Now it's up to four splitters on 26se14as, with the creation rate down to 0.5585/sec and (6) completed channels. I think it was up to .8 or .9 when there were just 2 splitters on a single file earlier.

Last night someone was testing things and disabled 4 of 7 splitters. The three remaining still jumped on the same file, but the creation rate was near .5, close to where it is now. It seems that when 3 or 4 splitters jump on the same file, the creation rate is the same as if the other splitters were disabled.

Now my one machine that was building a cache is out of work again. The other 2 still have work available... for now.
WezH · Joined: 19 Aug 99 · Posts: 576 · Credit: 67,033,957 · RAC: 95

> Well, AP started flowing here pretty good, but that stopped at the moment when three splitters started working on the same file: 25my14ae

That one is gone now, but we have a new troublemaker, 26se14as. 4 splitters working on it...

Edit: TBar was faster.

"Please keep Your signature under four lines so Internet traffic doesn't go up too much" - In 1992 when I had my first e-mail address -
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14650 · Credit: 200,643,578 · RAC: 874

> Well, AP started flowing here pretty good, but that stopped at the moment when three splitters started working on the same file: 25my14ae

At 16:30:05 UTC, the display for 26se14as shows four channels in progress, and the current result creation rate as 1.1107/sec.

I think it's too simplistic to equate 'channel in progress' with 'splitter is working', and to equate multiple splitters with slow working in every case.
Cosmic_Ocean · Joined: 23 Dec 00 · Posts: 3027 · Credit: 13,516,867 · RAC: 13

Despite the barely-crawling-along AP creation rate, my cache continues to slowly grow (I guess it also helps that I complete ~6 APs/day, which is molasses-slow compared to you guys that do them in 30 minutes on a GPU).

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
cliff · Joined: 16 Dec 07 · Posts: 625 · Credit: 3,590,440 · RAC: 0

Hi Folks,

Well, the S@H server problems seem to be contagious :-( Just had a 2nd CPU drop dead on me.. Now on my 3rd FX9370, so now I have to RMA another one.. So definitely NO CPU tasks for some time to come.

It's damn strange: I can run my GPUs at 50C or more, but the dratted CPU croaks at less than 40C core temps.. and for limited periods.

As for APs, since they became available I've had 4 total; all the rest are MB..

Regards,
Cliff,

Been there, Done that, Still no damm T shirt!
JohnDK · Joined: 28 May 00 · Posts: 1222 · Credit: 451,243,443 · RAC: 1,127

> Despite the barely-crawling-along AP creation rate, my cache continues to slowly grow (I guess it also helps that I complete ~6 APs/day, which is molasses-slow compared to you guys that do them in 30 minutes on a GPU).

Even on my notebook, where APs take about 2h 25m running 2 at a time, I keep running out of APs.
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14650 · Credit: 200,643,578 · RAC: 874

OK, how's this for a theory about the 'many splitters work the same tape' issue?

I think the algorithm in practice is: "When a splitter finishes a tape, start work on what was the next tape in sequence when the split started." It should be: "start work on what is the next tape now."

Example: I've been watching the MB column for my data distribution charts. MB has just completed 26no14af, which was the last tape in the last batch loaded. There are currently two splitters working on 03oc14aa, which is the first tape in the current batch; we would have preferred it to start on 24oc14ac, which is the next unstarted tape in the batch.

Corroboration, or contradiction, anyone?
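Richard's theory amounts to the splitter caching a "next tape" pointer at the moment a split begins, instead of re-reading the queue when the split ends. A minimal sketch of the two behaviors, in Python; all names here are hypothetical for illustration (the real splitter is server-side code not shown in this thread):

```python
# Stale-pointer variant (the suspected bug): a splitter remembers which tape
# was "next" when it STARTED its previous tape, and goes there when it
# finishes, even if another splitter has since claimed that tape.
def next_tape_stale(tapes, snapshot_index):
    return tapes[snapshot_index] if snapshot_index < len(tapes) else None

# Fresh variant (the expected behavior): scan for the first tape that no
# splitter has started yet, at the moment the splitter becomes free.
def next_tape_fresh(tapes, started):
    for t in tapes:
        if t not in started:
            return t
    return None

# Tape names taken from the example in the post above.
tapes = ["03oc14aa", "24oc14ac", "26no14af"]

# Two splitters finish at nearly the same time. Both captured index 0
# ("03oc14aa") back when they began, so the stale rule piles both onto the
# same tape...
started = set()
a = next_tape_stale(tapes, 0); started.add(a)
b = next_tape_stale(tapes, 0)          # same stale snapshot
print(a, b)                            # 03oc14aa 03oc14aa

# ...while the fresh rule spreads them across unstarted tapes.
started = set()
c = next_tape_fresh(tapes, started); started.add(c)
d = next_tape_fresh(tapes, started); started.add(d)
print(c, d)                            # 03oc14aa 24oc14ac
```

Under this model, the longest-running tape naturally "attracts" splitters: every splitter that started while it was at the head of the queue will converge on it when it frees up, which matches TBar's observation below.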
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768

> OK, how's this for a theory about the 'many splitters work the same tape' issue?

That would agree with what I've been able to determine. Seems the longest-running file is the one that attracts the splitters. I have a WU from ap_26se14as dated the 19th; as far as I can tell, that makes it the longest-running file. Same as with the previous splitter attractor, it had been running for 3 days.

Still kinda strange the way the creation rate seems to slow down after a while without any noticeable changes; it's back down to .5/sec now.
WezH · Joined: 19 Aug 99 · Posts: 576 · Credit: 67,033,957 · RAC: 95

I don't have answers to your questions, Richard, but I do have my own question: did this 3/4 splitters working on the same tape happen before the AP database crash?

"Please keep Your signature under four lines so Internet traffic doesn't go up too much" - In 1992 when I had my first e-mail address -
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14650 · Credit: 200,643,578 · RAC: 874

> I don't have answers to Your questions, Richard, but I do have my own question:

I monitor MB rather than AP. But I think the answer is yes.
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14650 · Credit: 200,643,578 · RAC: 874

And then, when the surplus splitter finished a channel (rather than the whole tape) on 03oc14aa, it started the first channel on 24oc14ac as required.
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768

Just a little reminder. I've been running the cache at 2 days because that is less than the amount of CPU VLARs I have. I was hoping I would be sent enough GPU APs to run my machine. That isn't happening. So, I raised the cache up a day and was immediately sent CPU APs that I don't need. I have days of CPU work and about an hour of GPU work, so the server sends me CPU work. Brilliant.

Here are the numbers the server was working with;

Sat Dec 20 16:28:01 2014 | SETI@home | [sched_op] Starting scheduler request

It SHOULD have sent those to the GPUs, but instead sent them to the CPUs. When I looked at the Project tab, I found that instead of asking for GPU tasks, which I'm just about out of, it was just sitting there without any time deferral. As soon as I raised the cache so CPU work was needed, BOOM, it requested work. It sure would be nice if the scheduler was concerned about the GPU work as much as it's concerned about the CPU work.

More CPU work while I have 2 GPU tasks left;

Sat Dec 20 17:03:59 2014 | SETI@home | Sending scheduler request: To fetch work.
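The behavior TBar describes is what you'd see if work fetch triggered on the combined buffer rather than per resource. A toy illustration of that difference, with all names and thresholds invented here (this is not the actual BOINC work-fetch code):

```python
# Illustrative only: if a work request fires only when the AGGREGATE buffered
# work falls below the cache setting, a dry GPU queue can sit idle as long as
# the CPU queue is full. A per-resource check sees the GPU shortfall directly.

def should_fetch_aggregate(cpu_secs, gpu_secs, cache_secs):
    # Hypothetical combined rule: total buffer below total target.
    return (cpu_secs + gpu_secs) < cache_secs * 2

def should_fetch_per_resource(cpu_secs, gpu_secs, cache_secs):
    # Hypothetical per-resource rule: any one resource below target.
    return cpu_secs < cache_secs or gpu_secs < cache_secs

cache = 2 * 86400            # a 2-day cache setting, in seconds
cpu, gpu = 5 * 86400, 3600   # days of CPU work, but ~1 hour of GPU work

print(should_fetch_aggregate(cpu, gpu, cache))     # False: CPU surplus masks the dry GPU
print(should_fetch_per_resource(cpu, gpu, cache))  # True: GPU shortfall seen on its own
```

This would also explain why raising the cache setting made the client ask for work immediately: the bigger target finally pushed the combined total under the threshold, even though the GPU had been starved all along.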
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.