The Server Issues / Outages Thread - Panic Mode On! (118)
Author | Message |
---|---|
Grant (SSSF) Joined: 19 Aug 99 Posts: 13893 Credit: 208,696,464 RAC: 304 |
A large volume of shorties, with many noise bombs thrown in, is going through the system at the moment. In progress is climbing, Ready to send is falling, and the past couple of weeks of clearing the Validation & Assimilation backlogs have been undone. The backlogs are growing, and growing fast. Grant Darwin NT |
Speedy Joined: 26 Jun 04 Posts: 1644 Credit: 12,921,799 RAC: 89 |
"A large volume of shorties, with many noise bombs thrown in, is going through the system at the moment. In progress is climbing, Ready to send is falling, and the past couple of weeks of clearing the Validation & Assimilation backlogs have been undone. The backlogs are growing, and growing fast." As I type, the splitters seem to be coping quite well, splitting at over 80 a second. RTS is over 2500; it could be higher, but it could also be a lot lower. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
The assimilation backlog had been going down at a steady rate for several days, but a couple of days ago it reversed direction. It was 2.2 million at its lowest point but has now grown to almost 2.8 million. And each workunit in the assimilation queue is keeping about 2.2 results stuck in the database. They seem to be shown in the validation queue on the SSP. This has pushed the total number of results to a level that forces their generation to be throttled again by stopping the splitters :( |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
And the replica is again unable to keep up: as of now it is 5,478 seconds (1.5 hours) behind. I’m starting to think this is a weekend problem. This is the 3rd Saturday in a row that it’s happened, and it usually catches back up by Monday morning. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
W-K 666 Joined: 18 May 99 Posts: 19520 Credit: 40,757,560 RAC: 67 |
"And the replica is again unable to keep up: as of now it is 5,478 seconds (1.5 hours) behind." Looks like someone has applied a touch of TLC, or a size 10 boot, to the problem. The replica has caught up. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13893 Credit: 208,696,464 RAC: 304 |
The Validation / Assimilation backlog continues to grow. It shouldn't be long before all splitter output ceases again until the backlog starts going down. The number of shorties & noise bombs has dropped off, so we might get lucky. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13893 Credit: 208,696,464 RAC: 304 |
"The number of shorties & noise bombs has dropped off, so we might get lucky." Looks like I spoke too soon. Another batch of shorties is going through. Grant Darwin NT |
W-K 666 Joined: 18 May 99 Posts: 19520 Credit: 40,757,560 RAC: 67 |
Panic Mode; ON The replica is falling behind again. As of 16 Feb 2020, 0:00:04 UTC Replica seconds behind master 1 0m LOL With that it's Good Night. C U Domani |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13893 Credit: 208,696,464 RAC: 304 |
Getting long delays in Scheduler response, and it's not supplying as many WUs as are being reported. Hopefully just a brief passing glitch. Grant Darwin NT |
Joined: 23 May 99 Posts: 7380 Credit: 44,181,323 RAC: 238 |
Greetings, I do believe that we have finally made it beyond the major slow down we've been experiencing over the past 5 or 6 Sundays. :) I do hope I didn't just jinx the whole operation! ;) Have a great day! :) Siran CAPT Siran d'Vel'nahr - L L & P _\\// Winders 11 OS? "What a piece of junk!" - L. Skywalker "Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13893 Credit: 208,696,464 RAC: 304 |
Got home from work to find my Linux system out of GPU work. Heaps of work to be downloaded, all in super extended extreme backoff mode. Grant Darwin NT |
Joined: 1 Apr 13 Posts: 1858 Credit: 268,616,081 RAC: 1,349 |
@Grant: "Got home from work to find my Linux system out of GPU work. Heaps of work to be downloaded, all in super extended extreme backoff mode." Throw this in a cron job, executed from within the BOINC directory, every 15 minutes or so to solve that: `./boinccmd --project http://setiathome.berkeley.edu update` Note that you can also execute this from outside the BOINC directory, but you will have to add the password from gui_rpc_auth.cfg: `./BOINC/boinccmd --passwd [password] --project http://setiathome.berkeley.edu update` |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13893 Credit: 208,696,464 RAC: 304 |
Thanks, I'll put a copy of this aside for when I feel like wrestling with something new. After today I need a shower, a bit of a read, then a good lie down. :-) Grant Darwin NT |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
The solution doesn't match the problem. 'Update' will (amongst other things) try to get more work allocated by the scheduler, but this work is already allocated; it just needs to be downloaded. What's needed is some variation on `--file_transfer URL filename {retry | abort}`, but as it stands that needs a filename. You could use `--get_file_transfers` and pipe the output to a file, then pick filenames from that. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
Also, running update unconditionally will force a scheduler request to happen while you are still within the cooldown from the previous request. This makes the scheduler refuse to send you any work and resets the cooldown back to the full five minutes. When that happens every 15 minutes, you may lose up to 33% of your opportunities to get more work, which puts you at a disadvantage in a situation where work generation is being throttled. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
`for i in $(boinccmd --get_file_transfers | sed -n -e 's/^.*name: //p'); do boinccmd --file_transfer http://setiathome.berkeley.edu "$i" retry; done` Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
"Also, running update unconditionally will force a scheduler request to happen while you are still within the cooldown from the previous request. This makes the scheduler refuse to send you any work and resets the cooldown back to the full five minutes. When that happens every 15 minutes, you may lose up to 33% of your opportunities to get more work, which puts you at a disadvantage in a situation where work generation is being throttled." I run it on an interval of 930 s. While you might miss a couple of times, it doesn't seem to have any real adverse effects. My systems still get topped up all the way. It's a lot better than going into extended backoffs while you're sleeping and waking up to a cold system. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"I run it on an interval of 930 s. While you might miss a couple of times, it doesn't seem to have any real adverse effects. My systems still get topped up all the way. It's a lot better than going into extended backoffs while you're sleeping and waking up to a cold system." Your cron can use smaller units than minutes? |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
I don’t run it in cron. I just open a terminal window, use the watch command, and let it run. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Harri Liljeroos Joined: 29 May 99 Posts: 5150 Credit: 85,281,665 RAC: 126 |
BoincTasks can do the upload/download retries for you automatically. There is an example config.xml file under "Program Files\eFMer\BoincTasks\examples". Copy this file to the folder where the BoincTasks exe files are, edit it, and restart BoincTasks. By default it checks the upload/download queue every 4000 seconds and retries the transfers. If file transfers still fail, it shortens the interval automatically by 360 seconds (or something like that, I don't remember exactly). The interval is decreased after each retry while file transfers keep failing, until it reaches 180/360 seconds (sorry, I don't remember the exact value). If file transfers are successful, the interval is increased by the same step until it is back at 4000 seconds. The config.xml also allows you to control work requests, but I haven't experimented with those parameters. The file has comments in it explaining the different parameters and their purpose. You can see how it has operated under the BoincTasks menu Show->Log. |
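
Putting Richard Haselgrove's suggestion and Ian&Steve C.'s one-liner together, a slightly more defensive version of the transfer retry might look like the sketch below. It assumes, as the `sed` pattern in the thread does, that `boinccmd --get_file_transfers` prints one `name: <file>` line per pending transfer. The `BOINCCMD` variable and the function names are made up for illustration.

```shell
# Sketch: retry every pending BOINC file transfer for one project.
# Assumption: each pending transfer appears in the output of
# `boinccmd --get_file_transfers` as a line containing "name: <file>".

PROJECT_URL="http://setiathome.berkeley.edu"
BOINCCMD="${BOINCCMD:-boinccmd}"   # hypothetical override for the binary path

# Read --get_file_transfers output on stdin, print one file name per line.
list_transfer_names() {
    sed -n -e 's/^.*name: //p'
}

# Ask the client to retry each pending transfer.
retry_transfers() {
    "$BOINCCMD" --get_file_transfers | list_transfer_names |
    while IFS= read -r name; do
        # Skip empty names rather than passing an empty argument.
        [ -n "$name" ] || continue
        # </dev/null keeps the command from consuming the list of names.
        "$BOINCCMD" --file_transfer "$PROJECT_URL" "$name" retry </dev/null
    done
}
```

Saved to a script that ends with a call to `retry_transfers`, this can be left running under `watch -n 930` in a terminal, as described above, which avoids cron's one-minute granularity. Because it only retries transfers and never issues a project `update`, it should not reset the scheduler cooldown that Ville Saari describes.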
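
The adaptive interval Harri Liljeroos describes for BoincTasks can be illustrated with a small sketch. This is not BoincTasks code: the function name is hypothetical, and the 360-second step and floor are assumptions based on the approximate figures in the post.

```shell
# Sketch of an adaptive retry interval (illustration only, not BoincTasks code).
MAX_INTERVAL=4000   # seconds between checks while transfers succeed (from the post)
MIN_INTERVAL=360    # assumed floor; the post gives it as "180/360"
STEP=360            # assumed step; the post says "360 seconds (or something like that)"

# next_interval CURRENT RESULT
# Prints the next interval in seconds. RESULT is "ok" when the last retry
# succeeded; any other value is treated as a failure.
next_interval() {
    current=$1
    if [ "$2" = "ok" ]; then
        # Success: lengthen the interval, but never beyond the maximum.
        next=$((current + STEP))
        if [ "$next" -gt "$MAX_INTERVAL" ]; then next=$MAX_INTERVAL; fi
    else
        # Failure: shorten the interval, but never below the floor.
        next=$((current - STEP))
        if [ "$next" -lt "$MIN_INTERVAL" ]; then next=$MIN_INTERVAL; fi
    fi
    echo "$next"
}
```

With these assumed values, repeated failures take the interval from 4000 down through 3640, 3280 and so on until it stops at 360, and each success afterwards adds 360 seconds until it is back at 4000.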
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.