Panic Mode On (115) Server Problems?

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1983017 - Posted: 1 Mar 2019, 23:25:12 UTC - in response to Message 1982978.  

B) The download list supports multi-select. Click, shift-click, Retry Now, put the kettle on. (Once selected, they stay selected, so it's just 'Retry' next time.)


. . A I had noticed (and used) but I was unaware of B. But in this present dilemma I did not think it would be of much avail.

Stephen

?
If you're in the middle of re-caching, you might have 100 tasks waiting to download. You can multi-select, then leave the mouse pointer hovering over 'Retry'. Click as much as you want, and the retries will be continuous. Eventually, one will latch on, and you can walk away.
Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1849
Credit: 268,616,081
RAC: 1,349
United States
Message 1983023 - Posted: 1 Mar 2019, 23:44:40 UTC - in response to Message 1982996.  

Whatever has been done has helped, but the underlying issue is still there and needs to be addressed.


. . OR the download servers are fixed but are being swamped ...

Which happens after every outage, even the extended ones, but with completely different symptoms.

Definitely getting sick again, from what I can see. It may be a long weekend ...
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1983028 - Posted: 2 Mar 2019, 0:01:33 UTC - in response to Message 1983023.  

Definitely getting sick again, from what I can see. It may be a long weekend ...

It certainly affects the faster systems the most.

If only returning a few WUs at a time, even when the downloads time out on their first attempt, a couple of times in a row the backoffs aren't too silly. However if you've got a lot of work to return and the first few tries don't succeed while the backoffs aren't too bad, if you miss on the next attempt then things degrade quickly.
And even only trying to download 1 WU at a time, after those first couple of failed attempts things can quickly spiral to multi hour backoffs.
And even when things go well, the more WUs you have to download, the greater the chance that even after getting them downloading there will be one (or more) that doesn't download & just times out & blows out the project backoffs even more.
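The batch-size point can be put into rough numbers. This is a minimal sketch, assuming every download times out independently with the same probability; the 5% figure in the example loop is a made-up illustration, not a measured rate for these servers, and `p_any_timeout` is a hypothetical helper name.

```python
def p_any_timeout(n_downloads: int, p_timeout: float) -> float:
    """Chance that at least one of n downloads times out.

    Assumes each download fails independently with probability
    p_timeout (a simplification; real failures may be correlated).
    """
    if n_downloads < 0:
        raise ValueError("n_downloads must be non-negative")
    if not 0.0 <= p_timeout <= 1.0:
        raise ValueError("p_timeout must be between 0 and 1")
    # P(at least one fails) = 1 - P(all succeed)
    return 1.0 - (1.0 - p_timeout) ** n_downloads


if __name__ == "__main__":
    # Hypothetical 5% per-download timeout rate, for illustration only.
    for n in (1, 4, 50, 100):
        chance = p_any_timeout(n, 0.05)
        print(f"{n:>3} downloads: {chance:.1%} chance of at least one timeout")
```

Under that assumption the chance of a stuck download climbs quickly with batch size, which would fit the observation that hosts refilling a large cache are hit hardest.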
Grant
Darwin NT
Brent Norman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1983039 - Posted: 2 Mar 2019, 0:17:38 UTC

Yeah, it's definitely still a problem - Not a Glitch, LOL.
I went back to 1-task downloads with much better results than with 4, but a few still go to backoffs and I have to do retries every so often to keep things going.
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1983043 - Posted: 2 Mar 2019, 0:23:29 UTC - in response to Message 1983028.  

Definitely getting sick again, from what I can see. It may be a long weekend ...

It certainly affects the faster systems the most.

If only returning a few WUs at a time, even when the downloads time out on their first attempt, a couple of times in a row the backoffs aren't too silly. However if you've got a lot of work to return and the first few tries don't succeed while the backoffs aren't too bad, if you miss on the next attempt then things degrade quickly.
And even only trying to download 1 WU at a time, after those first couple of failed attempts things can quickly spiral to multi hour backoffs.
And even when things go well, the more WUs you have to download, the greater the chance that even after getting them downloading there will be one (or more) that doesn't download & just times out & blows out the project backoffs even more.

+1
Exactly the case I am observing.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1983044 - Posted: 2 Mar 2019, 0:26:04 UTC - in response to Message 1983039.  

Yeah, it's definitely still a problem - Not a Glitch, LOL.
I went back to 1-task downloads with much better results than with 4, but a few still go to backoffs and I have to do retries every so often to keep things going.

I set the cc_config back to the stock two tasks and am still having issues. On this daily driver I also tried temporarily reverting to last night's single-task download setting, and it isn't helping either. Just as yesterday, there is still the chance you will be left with at least one stuck download, which blows everything out.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Brent Norman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1983072 - Posted: 2 Mar 2019, 2:27:56 UTC - in response to Message 1982931.  

I see the AP Assimilator (synergy) kicked back in as well.
I was wondering before if a hang in that was causing issues.
IDK
I think the current round of download issues could be associated with synergy trying to clear the historical large AP assimilation.
Synergy is also the scheduling server, scheduler process, and feeder.
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1983075 - Posted: 2 Mar 2019, 2:57:11 UTC

Still getting stuck tasks on all hosts leading to multi hour backoffs.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1983077 - Posted: 2 Mar 2019, 3:00:50 UTC - in response to Message 1983075.  

Still getting stuck tasks on all hosts leading to multi hour backoffs.

Beat me to it- looks like we're almost back to where we were yesterday.
50+ retries to get a single WU to upload, a couple more follow, then it all grinds to a halt again with 6+ hour backoffs.
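To show how a handful of consecutive failures can reach multi-hour delays, here is a minimal sketch of a generic capped exponential backoff. It is not BOINC's actual backoff formula; the 60-second base, doubling factor, and 24-hour cap are assumptions picked only for illustration, and both function names are hypothetical.

```python
def backoff_seconds(consecutive_failures: int,
                    base: float = 60.0,
                    factor: float = 2.0,
                    cap: float = 24 * 3600.0) -> float:
    """Delay before the next retry under a generic capped exponential backoff.

    All defaults are illustrative assumptions, not BOINC's real values.
    """
    if consecutive_failures < 1:
        return 0.0  # no failures yet, so no delay
    return min(cap, base * factor ** (consecutive_failures - 1))


def failures_until(threshold_seconds: float,
                   base: float = 60.0,
                   factor: float = 2.0,
                   cap: float = 24 * 3600.0) -> int:
    """Number of consecutive failures needed before the delay reaches a threshold."""
    if factor <= 1.0 or base <= 0.0:
        raise ValueError("need base > 0 and factor > 1 for the delay to grow")
    if threshold_seconds > cap:
        raise ValueError("threshold is above the cap and can never be reached")
    n = 1
    while backoff_seconds(n, base, factor, cap) < threshold_seconds:
        n += 1
    return n


if __name__ == "__main__":
    for failures in range(1, 12):
        hours = backoff_seconds(failures) / 3600.0
        print(f"{failures:>2} consecutive failures -> wait {hours:.2f} h")
    print("failures needed for a 6 h backoff:", failures_until(6 * 3600.0))
```

With any doubling scheme like this, the first few misses cost minutes and the next few cost hours, which is why catching a stuck transfer early with a manual retry matters.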
Grant
Darwin NT
Brent Norman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1983080 - Posted: 2 Mar 2019, 3:06:51 UTC - in response to Message 1983075.  

I'm having really good luck here, but continually watching BoincTasks transfers.
Even with a 1 task limit, inevitably the first 1 or 2 downloads fail, but the remaining will start up. By clicking retry on those first 2 while the other downloads are in progress the entire batch goes through.
It's just establishing that first connection that's a pain. And seems even more so on my slower computers trying for 1 or 2 tasks. A pain to get them started ...
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1983082 - Posted: 2 Mar 2019, 3:14:13 UTC - in response to Message 1983080.  

It's just establishing that first connection that's a pain. And seems even more so on my slower computers trying for 1 or 2 tasks. A pain to get them started ...

Physically a pain.
I've given up as my wrist & hand are sore from the number of times I've clicked on Retry now with no luck.
Grant
Darwin NT
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1983084 - Posted: 2 Mar 2019, 3:31:59 UTC
Last modified: 2 Mar 2019, 3:32:17 UTC

I don't want to tempt fate, but maybe things are finally working again?

One of my systems managed to clear some of its downloads when the backoff time was reached, and a retry cleared the others. And 2 Scheduler requests since then have resulted in downloads not timing out (even though slower than usual, and taking several seconds to actually start, but at least it's not an instant timeout). Gave Retry pending transfers another go on my other system, and they too cleared. And the following Scheduler request resulted in WUs that downloaded themselves.
Maybe, maybe...?
Grant
Darwin NT
Brent Norman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1983085 - Posted: 2 Mar 2019, 3:37:08 UTC - in response to Message 1983084.  

Yes, it seems better over the last 15 minutes, but there's still the odd straggler that starts the backoffs if not caught.
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1983086 - Posted: 2 Mar 2019, 3:44:17 UTC - in response to Message 1983082.  

It's just establishing that first connection that's a pain. And seems even more so on my slower computers trying for 1 or 2 tasks. A pain to get them started ...

Physically a pain.
I've given up as my wrist & hand are sore from the number of times I've clicked on Retry now with no luck.

I know what you mean. My hand goes numb from resting on the table clicking the mouse for retries.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1983087 - Posted: 2 Mar 2019, 3:48:12 UTC - in response to Message 1983086.  

Mine isn't that bad... 50 at a time download, 43 complete, 7 in back off but then I manually retry those and all complete the download. So, for me it's getting better.
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1983088 - Posted: 2 Mar 2019, 4:02:17 UTC - in response to Message 1983087.  

Mine isn't that bad... 50 at a time download, 43 complete, 7 in back off but then I manually retry those and all complete the download. So, for me it's getting better.

Yes, doable. But it requires constant manual intervention. BOINC is supposed to be a hands-off program where everything runs on its own. Not for high-production hosts though, at least with the current state of the flaky servers.

I expect to be seeing cold iron tomorrow morning, like this morning, and to spend the day trying to unstick downloads to rebuild the caches.

They need to revisit the download servers again and see what changed since before the outage yesterday morning, when things were working normally.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1983090 - Posted: 2 Mar 2019, 4:17:31 UTC - in response to Message 1983088.  

They need to revisit the download servers again and see what changed since before the outage yesterday morning, when things were working normally.

Or what did they do during the outage that killed them so successfully?
Grant
Darwin NT
Stephen "Heretic" Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1983095 - Posted: 3 Mar 2019, 11:29:39 UTC
Last modified: 3 Mar 2019, 11:34:18 UTC

. . Well the site is back up, the forums are back up and I was able to report work. But! Back to the usual post outage "no tasks available" :(

. . What was that song? 2 out of 3 something or other :)

. . Hey! I just got 3 tasks, and 2 stalled in d/ls :( Talk about jinxing myself ...

Stephen

:)
Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 1983099 - Posted: 3 Mar 2019, 11:49:35 UTC

Greetings,

https://setiathome.berkeley.edu/forum_thread.php?id=82901&postid=1983098#1983098

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1983104 - Posted: 3 Mar 2019, 12:11:36 UTC

I reported work and was able to get 63 tasks on one system, and a few on another.

But it's the same problem as before with the downloads, requiring many, many retries before they finally download.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours



 