The Server Issues / Outages Thread - Panic Mode On! (118)

Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2032957 - Posted: 19 Feb 2020, 3:29:15 UTC

Yes, typical lately of the recent long outages. They let the splitters run rampant with no throttling and don't allow any downloads till the RTS buffer gets filled to over a million ready to send. Then they allow downloads. You should have been able to report all finished work by now by setting NNT and can now unset NNT and just wait for the floodgates to open on tasks going out to all the hungry hosts.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2032957 · Report as offensive
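
The NNT (No New Tasks) routine described above can also be driven from the command line. This is a minimal sketch, assuming `boinccmd` is on the PATH and that the client is attached under the master URL shown; both are assumptions, not details from the post. `nomorework`, `allowmorework` and `update` are standard `boinccmd --project` operations; the Python function names are made up for illustration.

```python
import subprocess

# Assumption: the master URL this client is attached with; adjust to match your setup.
PROJECT_URL = "http://setiathome.berkeley.edu/"


def project_command(operation, url=PROJECT_URL):
    """Build the boinccmd argument list for one per-project operation."""
    if operation not in {"nomorework", "allowmorework", "update"}:
        raise ValueError(f"unsupported operation: {operation}")
    return ["boinccmd", "--project", url, operation]


def report_only_commands(url=PROJECT_URL):
    """Set NNT, then ask the client to contact the scheduler to report finished work."""
    return [project_command("nomorework", url), project_command("update", url)]


def resume_fetch_commands(url=PROJECT_URL):
    """Unset NNT so later scheduler requests may ask for new tasks again."""
    return [project_command("allowmorework", url)]


def run_all(commands, runner=subprocess.run):
    """Run each command in order; `runner` is injectable so the plan can be dry-run."""
    for command in commands:
        runner(command, check=False)
```

A cautious way to use it is `run_all(report_only_commands())` first, then `run_all(resume_fetch_commands())` only once the Event Log shows the finished work was reported, since `update` just asks the client to contact the scheduler and does not wait for the result.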
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2032959 - Posted: 19 Feb 2020, 3:39:56 UTC
Last modified: 19 Feb 2020, 3:53:59 UTC

NNT doesn't seem to make much difference. For the first couple of hours after the outage, when all my scheduler requests failed, they failed with and without NNT, and when they started working, they also worked with and without NNT. I had one computer running with NNT and another without.

Some hosts somewhere seemed to be able to get new work already when I was receiving just timeouts and HTTP errors. There were 852 Astropulse tasks in the RTS queue shortly after the server status page started updating after the outage. Those 852 tasks melted away fast: over half of them were gone by the next SSP update 20 minutes later. But I haven't received any new tasks for any of my computers yet.
ID: 2032959 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2032961 - Posted: 19 Feb 2020, 3:49:09 UTC - in response to Message 2032959.  

It does in my case. Everyone's farm and internet connection is different. Yes, the scheduler connection always times out right after the outage finishes and then eventually starts connecting. But in my case, if I don't set NNT, it does not connect even after connections start going through. Set NNT and the connection can finally report finished work.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2032961 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 2032966 - Posted: 19 Feb 2020, 4:38:06 UTC
Last modified: 19 Feb 2020, 4:39:43 UTC

So, has anyone actually been able to get some work?
Not a one here, even after all completed work was returned.

In progress has fallen by 1.5 million, and is still falling. Looks like work is being reported, but nothing is going out.
Grant
Darwin NT
ID: 2032966 · Report as offensive
Alien Seeker
Joined: 23 May 99
Posts: 57
Credit: 511,652
RAC: 32
France
Message 2032967 - Posted: 19 Feb 2020, 4:46:08 UTC - in response to Message 2032966.  

Since the recovery I got one task on one of my computers, and three on the other. So not zero, but it's really only a trickle.
Gazing at the skies, hoping for contact... Unlikely, but it would be such a fantastic opportunity to learn.

My alternative profile
ID: 2032967 · Report as offensive
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2032968 - Posted: 19 Feb 2020, 4:50:35 UTC
Last modified: 19 Feb 2020, 4:52:54 UTC

Replica is 30 minutes behind, so hard to see what is going on.
There were a bunch of APs that went out, I can see that there is an increased return rate for APs, but out in the field (for AP) should be higher, so it is still wonky.
I have a slow machine, so I just did a test request for some work, and it said it had no tasks available. Usually my machine will get tasks before all you fast machines. It is still in recovery mode; hard to tell if it just needs more time, or a swift kick. I'm guessing it will resolve in time when the total results fall below the magic 20 million number. Honestly, that is just a wild guess on my part though.

Edit: RTS is falling. The dam is broken. I think work is going out.
ID: 2032968 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1856
Credit: 268,616,081
RAC: 1,349
United States
Message 2032970 - Posted: 19 Feb 2020, 4:59:24 UTC - in response to Message 2032966.  
Last modified: 19 Feb 2020, 5:04:04 UTC

So, has anyone actually been able to get some work?
Across 4 clients, I've gotten 10 total tasks so far :)
But at least with the boinccmd update kluge launched every 15 minutes, I do ask for work rather than sitting there stupid ... Once the tasks start flowing, it will be interesting to see whether putting stuck transfer retries under the control of boinctasks, as previously mentioned, helps with that issue.
ID: 2032970 · Report as offensive
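
For readers who want to try the "boinccmd update every 15 minutes" approach mentioned above, here is a rough sketch, not the poster's actual script. It assumes `boinccmd` is on the PATH and that the project is attached under the URL shown (an assumption); `poll` and its parameters are hypothetical names.

```python
import subprocess
import time

# Assumption: the master URL this client is attached with.
PROJECT_URL = "http://setiathome.berkeley.edu/"
INTERVAL_SECONDS = 15 * 60  # the 15-minute interval mentioned in the post


def update_command(url=PROJECT_URL):
    """Argument list that asks the client to contact the project scheduler."""
    return ["boinccmd", "--project", url, "update"]


def poll(rounds, interval=INTERVAL_SECONDS, runner=subprocess.run, sleep=time.sleep):
    """Issue `rounds` update requests, sleeping `interval` seconds between them.

    `runner` and `sleep` are injectable so the loop can be dry-run without
    boinccmd installed. Returns the number of requests issued.
    """
    issued = 0
    for index in range(rounds):
        runner(update_command(), check=False)
        issued += 1
        if index < rounds - 1:
            sleep(interval)
    return issued
```

Calling `poll(4)` would cover roughly the first 45 minutes of a recovery. A cron job or Task Scheduler entry running the single `update` command does the same thing without a long-lived process.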
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 2032972 - Posted: 19 Feb 2020, 5:11:39 UTC

Linux system has managed to score some work on 2 requests. Still nothing on the Windows system.
Looks like it's going to be a very rough recovery.
Grant
Darwin NT
ID: 2032972 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2032973 - Posted: 19 Feb 2020, 5:22:35 UTC - in response to Message 2032972.  

The last couple of outages I have not received anything but 1 or 2 tasks after the project comes back and before I go to bed. I awake in the morning with full caches.

Expect the same to occur tonight.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2032973 · Report as offensive
Profile Shannon Lester
Joined: 27 Jul 09
Posts: 83
Credit: 12,388,119
RAC: 140
United States
Message 2032976 - Posted: 19 Feb 2020, 5:45:55 UTC - in response to Message 2032973.  

Same... I’m hoping by morning I’ll be back up and running.

All of this has happened before and will happen again -Battlestar Galactica
ID: 2032976 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 2032977 - Posted: 19 Feb 2020, 5:49:14 UTC

Hmm, seem to be getting work every 20 or so requests. The Windows system has only picked up a few WUs in total; the Linux system is out of GPU work again and soon to be out of CPU work again.
Grant
Darwin NT
ID: 2032977 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1856
Credit: 268,616,081
RAC: 1,349
United States
Message 2032978 - Posted: 19 Feb 2020, 5:50:41 UTC

As seems to always be the case, the low producer systems now have work, the workhorses are yet to get anything.
ID: 2032978 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19319
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2032980 - Posted: 19 Feb 2020, 6:15:41 UTC
Last modified: 19 Feb 2020, 7:11:14 UTC

Finally, got some work.

I've never seen this before, a split in reporting.
My computer crunched all 150 tasks during the outage and, as far as the "Event Log" records, didn't make contact until 02:41 to report said 150.
But https://setiathome.berkeley.edu/results.php?userid=8083616&offset=120&show_names=0&state=0&appid= says 20 of the tasks were reported at 00:26.

How does that happen?

[edit] Those APs that went out were _2s; I got three.
ID: 2032980 · Report as offensive
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2032986 - Posted: 19 Feb 2020, 7:13:09 UTC - in response to Message 2032980.  

Finally, got some work.

I've never seen this before, a split in reporting.
My computer crunched all 150 tasks during the outage and, as far as the "Event Log" records, didn't make contact until 02:41 to report said 150.
But https://setiathome.berkeley.edu/results.php?userid=8083616&offset=120&show_names=0&state=0&appid= says 20 of the tasks were reported at 00:26.

How does that happen?

[edit] Those APs that went out were _2s; I got three.


The Event Log will be in local time, and the results on the server will be in UTC. Does that explain it, or is there some other issue?
ID: 2032986 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19319
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2032989 - Posted: 19 Feb 2020, 7:54:42 UTC - in response to Message 2032986.  
Last modified: 19 Feb 2020, 7:56:29 UTC

Finally, got some work.

I've never seen this before, a split in reporting.
My computer crunched all 150 tasks during the outage and, as far as the "Event Log" records, didn't make contact until 02:41 to report said 150.
But https://setiathome.berkeley.edu/results.php?userid=8083616&offset=120&show_names=0&state=0&appid= says 20 of the tasks were reported at 00:26.

How does that happen?

[edit] Those APs that went out were _2s; I got three.


The Event Log will be in local time, and the results on the server will be in UTC. Does that explain it, or is there some other issue?

No, because I am in the UK and my local winter time is UTC. This is the 02:41 Event Log entry. Note that it still says it is reporting 150 tasks.

19/02/2020 02:41:41 | SETI@home | Sending scheduler request: To report completed tasks.
19/02/2020 02:41:41 | SETI@home | Reporting 150 completed tasks
19/02/2020 02:41:41 | SETI@home | Not requesting tasks: don't need (CPU: ; NVIDIA GPU: job cache full)
19/02/2020 02:41:54 | SETI@home | Scheduler request completed
ID: 2032989 · Report as offensive
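
The local-time versus UTC question in this exchange can be checked mechanically. The sketch below parses Event Log lines of the shape quoted above and converts the timestamp to UTC. The line format (day/month/year, fields separated by ` | `) is inferred from those four lines only, and `utc_offset_hours` is a hypothetical parameter standing in for the host's time zone.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed layout, inferred from the quoted lines: "DD/MM/YYYY HH:MM:SS | project | message"
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \| ([^|]+) \| (.*)$")


def parse_line(line, utc_offset_hours=0):
    """Return (timestamp converted to UTC, project name, message) for one log line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    stamp, project, message = match.groups()
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local_time = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S").replace(tzinfo=local_zone)
    return local_time.astimezone(timezone.utc), project.strip(), message


def reported_count(message):
    """Number of tasks in a 'Reporting N completed tasks' message, else None."""
    match = re.search(r"Reporting (\d+) completed tasks", message)
    return int(match.group(1)) if match else None
```

With an offset of 0 (UK winter time) the 02:41:41 entry stays at 02:41:41 UTC, so a time-zone difference alone would not account for tasks shown as reported at 00:26. On a host at UTC-8 the same log line would correspond to 10:41:41 UTC.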
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 2032996 - Posted: 19 Feb 2020, 9:10:30 UTC

Both systems are now starting to pick up work.
Usual response is "Project has no tasks available", but every so often it gets work, and more than just a few WUs. Hopefully enough to get through the remainder of the recovery period.


Unfortunately the forums have now slowed to a crawl. Any slower, and they'd be non-functional.
Grant
Darwin NT
ID: 2032996 · Report as offensive
Cruncher-American Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 2032998 - Posted: 19 Feb 2020, 9:38:31 UTC
Last modified: 19 Feb 2020, 9:39:09 UTC

Downloads stuck, 120 on one machine, 180 on the other. Uploads working OK.
Can't shake them loose. Any suggestions?
WTF?
ID: 2032998 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 2032999 - Posted: 19 Feb 2020, 9:44:53 UTC - in response to Message 2032998.  
Last modified: 19 Feb 2020, 9:50:38 UTC

Downloads stuck, 120 on one machine, 180 on the other. Uploads working OK.
Can't shake them loose.
Yep. Sticky downloads seem to be a part of the after outage recovery these days.
Even suspending & enabling network access won't get them to start.
Elapsed time ticks away while nothing actually happens...

Edit: that's on the Windows system with the download server set in the Hosts file. On the Linux system with no Hosts file setting, it's an instant timeout. After a few retries you'll get a download or two that becomes sticky while all the others time out instantly.


At least the forums are back to normal.
Grant
Darwin NT
ID: 2032999 · Report as offensive
Cruncher-American Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 2033002 - Posted: 19 Feb 2020, 9:55:17 UTC - in response to Message 2032999.  

My complaining seems to have shaken both machines loose: they are now downloading the backlogs. BUT at <10% of normal speed. Aaaaarrrrggghhhhh!
ID: 2033002 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 2033003 - Posted: 19 Feb 2020, 10:07:42 UTC - in response to Message 2033002.  
Last modified: 19 Feb 2020, 10:09:30 UTC

My complaining seems to have shaken both machines loose: they are now downloading the backlogs. BUT at <10% of normal speed. Aaaaarrrrggghhhhh!
Likewise.
But at much slower speeds than you.

Posting about a problem is generally the best way to fix it.


Edit- shouldn't be too long now and the Ready-to-send buffer will be empty. Splitters still haven't shown much in the way of output.
Grant
Darwin NT
ID: 2033003 · Report as offensive


 
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.