Message boards : Number crunching : Panic Mode On (116) Server Problems?
Author | Message |
---|---|
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Apparently Keith checked the results and they are shorties with ARs around 0.55. That would explain the consistency in run times I am seeing, usually noisy overflows are more erratic in times. So the super long ones are the actual VLAR tasks. . . I have now had a look at the stderr for a few of the blc41/42 tasks and the shorter run time tasks do not have an AR of 0.55 but rather of 0.055 so that is not the explanation. :( Stephen :( |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I am getting "aborted runtime limit" on blc41/blc42 tasks on what appear to be gpu tasks. They are running 41 odd minutes and then hitting this error. . . As TBar pointed out to me there is a limit built into BOINC that will restrict tasks that run overly long. If a task runs for more than 10 or 20 times as long as the device's APR indicates it should, then it is aborted. So if the blc41 tasks have achieved an APR that says they should complete in 10 mins, then one that runs for 100 mins will get the chop. And these blc41/42 tasks are showing extreme variation in run times on all my rigs, taking as little as 6 mins then suddenly one will take 30 mins. That is not enough to trigger this limit but perhaps some are much worse and will. Stephen :( |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Well it's been a month now and that's enough time for things to settle out. I have a verdict. . . A good score dude. A nice cheapy upgrade. Now just find a sweet GTX750/ti or maybe a Gtx9 series card and watch it climb :) Stephen |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Apologies for the upload issue yesterday. As many here properly guessed, this was fallout from the shortie / fast runner / noise bomb file set that was being split. I moved this file set out of the way but it took a few hours to work through the already split data. . . When this happened before with a similar set of 2 blc25 tape series the noise bomb problem was with the 58340 series but the 58405/6 series was OK, so I would guess it is much the same case this time. . . And again, thanks for the update and the news ... looking forward to the upgrade. It might even cope with Keith and Ian/Steve :) Stephen . . |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13765 Credit: 208,696,464 RAC: 304 |
We are hoping to replace the upload server (bruno) before too long with a machine that is both faster and will store the results on SSDs. Yay! Thanks for the update. It is nice to know why things happened & what's going on behind the scenes. Grant Darwin NT |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13765 Credit: 208,696,464 RAC: 304 |
WU_awaiting_deletion continues to climb, splitter output continues to decline, as does the Ready_to_send buffer. Grant Darwin NT |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13765 Credit: 208,696,464 RAC: 304 |
Splitter output has fallen further, Ready_to_send buffer continues to empty, and the WU_awaiting_deletion backlog continues to grow. Get work while you still can (about 5hrs worth left at the present rate of consumption & supply). Grant Darwin NT |
Unixchick Send message Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22 |
Thanks, Grant, for sounding the early warning. Something is definitely wrong. I was hoping the problem would fix itself, but the RTS is down to 500k and the returned per hour is 142K, so the slow splitting will help, but we will hit empty sometime during the night if something doesn't change. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
As long as the workunit assimilation and deletions keep climbing, the splitter output is going to fall. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
As predicted, as soon as the WU deletions and assimilations began to fall, the splitter output picked up to normal. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Unixchick Send message Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22 |
Thanks for explaining it Keith. I'll wait until RTS is in the low 400K before panicking , as this is just normal behavior. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
If you look at the weekly graphs at Haveland, you can see the correlation clearly for splitter output versus WU deletions/assimilations. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13765 Credit: 208,696,464 RAC: 304 |
Thanks for explaining it Keith. I'll wait until RTS is in the low 400K before panicking , as this is just normal behavior. It's not really normal (although it has become the new normal); it's a sign of server issues. It basically shows that the servers have reached their present limits of input/output (I/O) load. Once the load reaches a certain point, the system just jams up and everything suffers as a result, till the load backs off again & the backlogs can clear. It's good that the upload server is to be replaced with more powerful hardware, and better yet SSD storage, which will (hopefully) sort out upload issues once and for all. If only we could get the live database on SSD storage as well (and of course more powerful servers to remove what would then become the next bottleneck), that would once and for all stop the issues with the splitter output falling at the times we need its greatest possible output the most. It wouldn't matter if a couple of dozen files of noise bombs were loaded; we'd be able to process them, and the servers would be able to deal with the returning load. Grant Darwin NT |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Thanks for explaining it Keith. I'll wait until RTS is in the low 400K before panicking , as this is just normal behavior. +1 Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13765 Credit: 208,696,464 RAC: 304 |
Nice to see all this new Arecibo work, even while 18dc09aa sits there mocking us with its refusal to split. Grant Darwin NT |
Stargate (SA) Send message Joined: 4 Mar 10 Posts: 1854 Credit: 2,258,721 RAC: 0 |
I can at least say I got 3x 18dc09aa in last 24 hrs |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I can at least say I got 3x 18dc09aa in last 24 hrs . . Were they resends or new tasks? (that is did they end in a 0 or 1, or was it 2 or up) Stephen ? ? |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13765 Credit: 208,696,464 RAC: 304 |
I can at least say I got 3x 18dc09aa in last 24 hrs I have had a few 18se10aa and a couple of 28s. But I haven't seen a WU from 18dc09aa (other than a resend) since it came to a grinding halt, what, 3 or 4 weeks ago? Grant Darwin NT |
Stargate (SA) Send message Joined: 4 Mar 10 Posts: 1854 Credit: 2,258,721 RAC: 0 |
My Bad too much excitement my way, after a while all these WU's look the same...Moooooove along :P |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
My Bad too much excitement my way, after a while all these WU's look the same...Moooooove along :P . . :) |
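
Two bits of arithmetic from the thread can be sketched in a few lines of Python: the run time limit Stephen describes (a task is aborted once it runs more than roughly 10 or 20 times longer than the device's APR suggests) and the "how long until the Ready_to_send buffer is empty" estimate. This is only a rough illustration of the posts above, not BOINC's actual code. The function names, the `multiplier` default, and the `split_per_hour` parameter are all hypothetical, and the returned-per-hour figure is used as a stand-in for the rate at which work is being consumed.

```python
def exceeds_runtime_limit(elapsed_min, expected_min, multiplier=10.0):
    """Return True if a task has run for more than `multiplier` times
    the run time its APR suggests (hypothetical helper, not BOINC code)."""
    if expected_min <= 0:
        raise ValueError("expected_min must be positive")
    return elapsed_min > multiplier * expected_min


def hours_until_empty(ready_to_send, returned_per_hour, split_per_hour=0.0):
    """Estimate hours until the Ready_to_send buffer drains.

    Assumes work leaves the buffer at `returned_per_hour` and is refilled
    at `split_per_hour` (an assumed parameter; the thread gives no figure).
    """
    net_drain = returned_per_hour - split_per_hour
    if net_drain <= 0:
        return float("inf")  # buffer is holding steady or growing
    return ready_to_send / net_drain


# A task expected to take 10 minutes, with an assumed 10x multiplier:
print(exceeds_runtime_limit(30, 10))   # 30 min is well inside the limit
print(exceeds_runtime_limit(120, 10))  # 120 min is past the limit

# RTS of 500k and 142K returned per hour, with no splitter output at all:
print(round(hours_until_empty(500_000, 142_000), 1))
```

With the splitters contributing nothing, the 500k buffer would last about three and a half hours; any splitter output at all stretches that, which is in line with the "sometime during the night" and "about 5hrs" estimates in the posts.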