Message boards : Number crunching : Panic Mode On (93) Server Problems?
Zalster · Joined: 27 May 99 · Posts: 5517 · Credit: 528,817,460 · RAC: 242

They ever move tapes from Main to Beta for crunching? Just asking....
Cosmic_Ocean · Joined: 23 Dec 00 · Posts: 3027 · Credit: 13,516,867 · RAC: 13

> They ever move tapes from Main to Beta for crunching? Just asking....

I would think the same few tapes would be permanently re-used on Beta.. if the same data has been processed a few hundred times, you KNOW what the results should be when testing new apps. I could be wrong though.

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
JaundicedEye · Joined: 14 Mar 12 · Posts: 5375 · Credit: 30,870,693 · RAC: 1

Eight AP's left, stocking the cache with MB since a fix is obviously not in the works for a few days(?). Still worthwhile science to be done without AP.

May we all live in interesting times.........

"Sour Grapes make a bitter Whine." <(0)>
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768

Well, if the slapping around doesn't work, it may be time to take it out behind the shed... Here is a 26my14ab from 3 days ago;

name: ap_26my14ab_B0_P1_00238_20141216_05061.wu
application: AstroPulse v7
created: 16 Dec 2014, 14:44:12 UTC
http://setiathome.berkeley.edu/workunit.php?wuid=1652212000

It's been around for a while.
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13736 · Credit: 208,696,464 · RAC: 304

AP in progress & returned per hour continue to decline.
AP Awaiting validation continues to grow.
AP assimilators are disabled.

Grant
Darwin NT
Jimbocous · Joined: 1 Apr 13 · Posts: 1853 · Credit: 268,616,081 · RAC: 1,349

> AP in progress & returned per hour continue to decline.

And now back up and poking along. I think I've about given up on trying to read the tea leaves on the SSP. Seems like it doesn't resemble reality when things go sideways ...
Jord · Joined: 9 Jun 99 · Posts: 15184 · Credit: 4,362,181 · RAC: 3

I just watched Inside a Google data center. Talk about 'server problems'... Wouldn't we all wish that the colo looked like that?
Jord · Joined: 9 Jun 99 · Posts: 15184 · Credit: 4,362,181 · RAC: 3

Oh no, I have loads of "*no14ac*" tasks again, of which the 13 to 20 range cause severe stuttering when they run on my AMD HD7870. What is it with these November tasks that causes this problem? The older tasks, 21 to 26, give no problems.
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768

Well, AP started flowing here pretty good, but that stopped the moment three splitters started working on the same file: 25my14ae. Now it's up to four splitters on 26se14as, with the creation rate down to 0.5585/sec and (6) completed channels. I think it was up to .8 or .9 when there were just 2 splitters on a single file earlier.

Last night someone was testing things and disabled 4 of 7 splitters. The three remaining still jumped on the same file, but the creation rate was near .5, close to where it is now. It seems that when 3 or 4 splitters jump on the same file, the creation rate is the same as if the other splitters were disabled.

Now my one machine that was building a cache is out of work again. The other 2 still have work available... for now.
WezH · Joined: 19 Aug 99 · Posts: 576 · Credit: 67,033,957 · RAC: 95

> Well, AP started flowing here pretty good, but that stopped at the moment when three splitters started working on the same file: 25my14ae

That one is gone now, but we have a new troublemaker, 26se14as. 4 splitters working on it...

Edit: TBar was faster.

"Please keep Your signature under four lines so Internet traffic doesn't go up too much" - In 1992 when I had my first e-mail address -
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14650 · Credit: 200,643,578 · RAC: 874

> Well, AP started flowing here pretty good, but that stopped at the moment when three splitters started working on the same file: 25my14ae

At 16:30:05 UTC, the display for 26se14as shows four channels in progress, and the current result creation rate as 1.1107/sec.

I think it's too simplistic to equate 'channel in progress' with 'splitter is working', and to equate multiple splitters with slow working in every case.
Cosmic_Ocean · Joined: 23 Dec 00 · Posts: 3027 · Credit: 13,516,867 · RAC: 13

Despite the barely-crawling-along AP creation rate, my cache continues to slowly grow (I guess it also helps that I complete ~6 APs/day, which is molasses-slow compared to you guys that do them in 30 minutes on a GPU).

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
cliff · Joined: 16 Dec 07 · Posts: 625 · Credit: 3,590,440 · RAC: 0

Hi Folks,

Well, the S@H server problems seem to be contagious :-( Just had a 2nd CPU drop dead on me.. Now on my 3rd FX9370, so now I have to RMA another one.. So definitely NO CPU tasks for some time to come.

It's damn strange: I can run my GPUs at 50C or more, but the dratted CPU croaks at less than 40C core temps.. and for limited periods.

As for APs, since they became available I've had 4 total; all the rest are MB..

Regards,
Cliff,

Been there, Done that, Still no damm T shirt!
JohnDK · Joined: 28 May 00 · Posts: 1222 · Credit: 451,243,443 · RAC: 1,127

> Despite the barely-crawling-along AP creation rate, my cache continues to slowly grow (I guess it also helps that I complete ~6 APs/day, which is molasses-slow compared to you guys that do them in 30 minutes on a GPU).

Even on my notebook, where APs take about 2h 25m running 2 at a time, I keep running out of APs.
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14650 · Credit: 200,643,578 · RAC: 874

OK, how's this for a theory about the 'many splitters work the same tape' issue?

I think the algorithm in practice is: "When a splitter finishes a tape, start work on what was the next tape in sequence when the split started." It should be: "start work on what is the next tape now."

Example: I've been watching the MB column for my data distribution charts. MB has just completed 26no14af, which was the last tape in the last batch loaded. There are currently two splitters working on 03oc14aa, which is the first tape in the current batch; we would have preferred it to start on 24oc14ac, which is the next unstarted tape in the batch.

Corroboration, or contradiction, anyone?
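Richard's theory amounts to the splitter caching a "next tape" pointer at the moment a split begins, instead of re-reading the queue when the split ends. A minimal sketch of the two behaviors, in Python; all names here are hypothetical for illustration (the real splitter is server-side code not shown in this thread):

```python
# Stale-pointer variant (the suspected bug): a splitter remembers which tape
# was "next" when it STARTED its previous tape, and goes there when it
# finishes, even if another splitter has since claimed that tape.
def next_tape_stale(tapes, snapshot_index):
    return tapes[snapshot_index] if snapshot_index < len(tapes) else None

# Fresh variant (the expected behavior): scan for the first tape that no
# splitter has started yet, at the moment the splitter becomes free.
def next_tape_fresh(tapes, started):
    for t in tapes:
        if t not in started:
            return t
    return None

# Tape names taken from the example in the post above.
tapes = ["03oc14aa", "24oc14ac", "26no14af"]

# Two splitters finish at nearly the same time. Both captured index 0
# ("03oc14aa") back when they began, so the stale rule piles both onto the
# same tape...
started = set()
a = next_tape_stale(tapes, 0); started.add(a)
b = next_tape_stale(tapes, 0)          # same stale snapshot
print(a, b)                            # 03oc14aa 03oc14aa

# ...while the fresh rule spreads them across unstarted tapes.
started = set()
c = next_tape_fresh(tapes, started); started.add(c)
d = next_tape_fresh(tapes, started); started.add(d)
print(c, d)                            # 03oc14aa 24oc14ac
```

Under this model, the longest-running tape naturally "attracts" splitters: every splitter that started while it was at the head of the queue will converge on it when it frees up, which matches TBar's observation below.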
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768

> OK, how's this for a theory about the 'many splitters work the same tape' issue?

That would agree with what I've been able to determine. Seems the longest-running file is the one that attracts the splitters. I have a WU from ap_26se14as dated the 19th; as far as I can tell, that makes it the longest-running file. Same as with the previous splitter attractor, it had been running for 3 days.

Still kinda strange the way the creation rate seems to slow down after a while without any noticeable changes; it's back down to .5/sec now.
WezH · Joined: 19 Aug 99 · Posts: 576 · Credit: 67,033,957 · RAC: 95

I don't have answers to your questions, Richard, but I do have my own question: did this 3/4 splitters working on the same tape happen before the AP database crash?

"Please keep Your signature under four lines so Internet traffic doesn't go up too much" - In 1992 when I had my first e-mail address -
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14650 · Credit: 200,643,578 · RAC: 874

> I don't have answers to Your questions, Richard, but I do have my own question:

I monitor MB rather than AP. But I think the answer is yes.
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14650 · Credit: 200,643,578 · RAC: 874

And then, when the surplus splitter finished a channel (rather than the whole tape) on 03oc14aa, it started the first channel on 24oc14ac as required.
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768

Just a little reminder. I've been running the cache at 2 days because that is less than the amount of CPU VLARs I have. I was hoping I would be sent enough GPU APs to run my machine. That isn't happening. So, I raised the cache up a day and was immediately sent CPU APs that I don't need. I have days of CPU work and about an hour of GPU work, so the server sends me CPU work. Brilliant.

Here are the numbers the server was working with;

Sat Dec 20 16:28:01 2014 | SETI@home | [sched_op] Starting scheduler request

It SHOULD have sent those to the GPUs, but instead sent them to the CPUs. When I looked at the Project tab, I found that instead of asking for GPU tasks, which I'm just about out of, it was just sitting there without any time deferral. As soon as I raised the cache so CPU work was needed, BOOM, it requested work. It sure would be nice if the scheduler was concerned about the GPU work as much as it's concerned about the CPU work.

More CPU work while I have 2 GPU tasks left;

Sat Dec 20 17:03:59 2014 | SETI@home | Sending scheduler request: To fetch work.
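The behavior TBar describes is what you'd see if work fetch triggered on the combined buffer rather than per resource. A toy illustration of that difference, with all names and thresholds invented here (this is not the actual BOINC work-fetch code):

```python
# Illustrative only: if a work request fires only when the AGGREGATE buffered
# work falls below the cache setting, a dry GPU queue can sit idle as long as
# the CPU queue is full. A per-resource check sees the GPU shortfall directly.

def should_fetch_aggregate(cpu_secs, gpu_secs, cache_secs):
    # Hypothetical combined rule: total buffer below total target.
    return (cpu_secs + gpu_secs) < cache_secs * 2

def should_fetch_per_resource(cpu_secs, gpu_secs, cache_secs):
    # Hypothetical per-resource rule: any one resource below target.
    return cpu_secs < cache_secs or gpu_secs < cache_secs

cache = 2 * 86400            # a 2-day cache setting, in seconds
cpu, gpu = 5 * 86400, 3600   # days of CPU work, but ~1 hour of GPU work

print(should_fetch_aggregate(cpu, gpu, cache))     # False: CPU surplus masks the dry GPU
print(should_fetch_per_resource(cpu, gpu, cache))  # True: GPU shortfall seen on its own
```

This would also explain why raising the cache setting made the client ask for work immediately: the bigger target finally pushed the combined total under the threshold, even though the GPU had been starved all along.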
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.