The Server Issues / Outages Thread - Panic Mode On! (118)
Author | Message |
---|---|
Joined: 1 Apr 13 Posts: 1858 Credit: 268,616,081 RAC: 1,349 |
The solution doesn't match the problem. 'Update' will (amongst other things) try to get more work allocated by the scheduler: but this work is already allocated - it just needs to be downloaded. That's what I get for posting so late (early) at night. Sorry, I was conflating two different issues. What I was addressing there was the issue of BOINC "forgetting" to even go asking for work for extended periods when there was none. NVM. Sounds like the BoincTasks option may be worth exploring ... appreciate it. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
``for i in `boinccmd --get_file_transfers | sed -n -e 's/^.*name: //p'`; do boinccmd --file_transfer http://setiathome.berkeley.edu $i retry; done`` This command will give the boot to stuck transfers if you don’t want to run additional software. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Joined: 19 Jun 00 Posts: 173 Credit: 54,916,209 RAC: 833 |
``for i in `boinccmd --get_file_transfers | sed -n -e 's/^.*name: //p'`; do boinccmd --file_transfer http://setiathome.berkeley.edu $i retry; done`` Do you just run this every time you power up the machine, or do you wait for an issue to come up? I am new to Linux as well; what options, if any, do you set for the watch command? |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I would only run the watch command when you expect or are already getting issues and aren't going to be monitoring the machine, such as while you are away or asleep. That is because it can preempt the normal 305 second scheduler connection and force a "too soon" timeout backoff interval. Generally that doesn't cause an issue, because the collision is infrequent; in the worst case the command synchronizes with the stock interval and keeps preempting it. If you choose an interval much longer than 305 seconds, and an odd number, that won't happen very often. Either way, it is safe to use. Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
Speedy Joined: 26 Jun 04 Posts: 1646 Credit: 12,921,799 RAC: 89 |
Do you need to panic? I'm not panicking in the slightest. Results ready to send: 0; 0; 391,092 (9m). Current result creation rate **: 0/sec; 0.0184/sec; 3.9030/sec (5m). Results out in the field: 0; 32,248; 6,382,603 (9m). |
Niteryder Joined: 1 Mar 99 Posts: 64 Credit: 22,663,988 RAC: 18 |
Are we back? |
Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22 |
Are we back? yup, but expect some recovery lag as the system gets pounded by EVERYONE all returning the finished WUs and asking for fresh data. The system will be wonky while dealing with the volume for a while. |
Cruncher-American Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
hello? |
AllgoodGuy Joined: 29 May 01 Posts: 293 Credit: 16,348,499 RAC: 266 |
Are we back? Yep. Much Wonkiness. That's the computer equivalent to Jerkiness in Calculus. |
Speedy Joined: 26 Jun 04 Posts: 1646 Credit: 12,921,799 RAC: 89 |
hello? Good evening cruncher America |
AllgoodGuy Joined: 29 May 01 Posts: 293 Credit: 16,348,499 RAC: 266 |
hello? Good Afternoon in NZ! Not a lot of crunching going on right now. Seems things are a little stuck at the moment. Validations are slow, downloads aren't happening despite a growing number of ready to send. |
Speedy Joined: 26 Jun 04 Posts: 1646 Credit: 12,921,799 RAC: 89 |
hello? All part of the recovery process |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
Splitters stopped too late. There are now over 1.2 million tasks in the RTS queue, and the total number of tasks in the db is way past the 20 mil stability limit :( |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Yes, typical lately of the recent long outages. They let the splitters run rampant with no throttling and don't allow any downloads till the RTS buffer gets filled to over a million ready to send. Then they allow downloads. You should have been able to report all finished work by now by setting NNT and can now unset NNT and just wait for the floodgates to open on tasks going out to all the hungry hosts. Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
NNT doesn't seem to make much difference. For the first couple of hours after the outage, when all my scheduler requests failed, they failed with and without NNT, and when they started working, they also worked with and without NNT. I had one computer running with NNT and another without. Some hosts somewhere seemed to be able to get new work already when I was receiving just timeouts and HTTP errors. There were 852 Astropulse tasks in the RTS queue shortly after the server status page started updating after the outage. Those 852 tasks melted away fast - over half of them were gone by the next SSP update 20 minutes later. But I haven't received any new tasks for any of my computers yet. |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
It does in my case. Everyone's farm and internet connection is different. Yes, the scheduler connection always times out right after the outage finishes and then eventually starts connecting. But in my case, if I don't set NNT, it does not connect even after connections start going through. Set NNT and the connection can finally report finished work. Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13903 Credit: 208,696,464 RAC: 304 |
So, has anyone actually been able to get some work? Not a one here, even after all completed work was returned. In progress has fallen by 1.5 million, and is still falling. Looks like work is being reported, but nothing is going out. Grant Darwin NT |
Alien Seeker Joined: 23 May 99 Posts: 57 Credit: 511,652 RAC: 32 |
Since the recovery I got one task on one of my computers, and three on the other. So not zero, but it's really only a trickle. Gazing at the skies, hoping for contact... Unlikely, but it would be such a fantastic opportunity to learn. My alternative profile |
Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22 |
Replica is 30 minutes behind, so hard to see what is going on. There were a bunch of APs that went out, I can see that there is an increased return rate for APs, but out in the field (for AP) should be higher, so it is still wonky. I have a slow machine, so I just did a test ask for some work and it said it had no tasks available. Usually my machine will get tasks before all you fast machines. It is still in recovery mode, hard to tell if it just needs more time, or a swift kick. I'm guessing it will resolve in time when the total results falls below the magic 20 million number. Honestly that is just a wild guess on my part though. edit: RTS is falling. the dam is broken. I think work is going out |
Joined: 1 Apr 13 Posts: 1858 Credit: 268,616,081 RAC: 1,349 |
So, has anyone actually been able to get some work? Across 4 clients, I've gotten 10 total tasks so far :) But at least with the boinccmd update kluge launched every 15 minutes, I do ask for work rather than sitting there stupid ... once the tasks start flowing, it will be interesting to see if putting stuck transfer retries under the control of BoincTasks, as previously mentioned, benefits that issue. |
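
The stuck-transfer one-liner and the interval advice from the thread can be combined into a small script. This is only a minimal sketch, not something posted in the thread: the variable names `BOINC_CMD`, `PROJECT_URL` and `INTERVAL` and the three function names are hypothetical, it uses a plain `sleep` loop in place of `watch`, and the 907 second default is just one example of an odd interval well above the 305 second scheduler interval mentioned above.

```shell
#!/bin/sh
# Sketch: retry stuck BOINC file transfers, optionally at a fixed interval.
# Assumptions (not from the thread): BOINC_CMD, PROJECT_URL and INTERVAL are
# hypothetical names; 907 s is an arbitrary odd value well above 305 s.

BOINC_CMD="${BOINC_CMD:-boinccmd}"
PROJECT_URL="${PROJECT_URL:-http://setiathome.berkeley.edu}"
INTERVAL="${INTERVAL:-907}"

# Read `boinccmd --get_file_transfers` output on stdin and print one
# transfer file name per line (same sed expression as the one-liner).
transfer_names() {
    sed -n -e 's/^.*name: //p'
}

# Ask the client to retry every transfer it currently lists.
retry_transfers() {
    "$BOINC_CMD" --get_file_transfers | transfer_names |
    while IFS= read -r name; do
        "$BOINC_CMD" --file_transfer "$PROJECT_URL" "$name" retry
    done
}

# Repeat the retry pass until interrupted (Ctrl-C).
retry_loop() {
    while :; do
        retry_transfers
        sleep "$INTERVAL"
    done
}
```

To try it, save the functions to a file, source it, and call `retry_transfers` once, or `retry_loop` to keep retrying until interrupted. Reading the names line by line, rather than word-splitting a backtick substitution, also copes with file names that contain spaces.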