Panic Mode On (116) Server Problems?

Message boards : Number crunching : Panic Mode On (116) Server Problems?

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1991336 - Posted: 24 Apr 2019, 22:58:59 UTC - in response to Message 1991334.  

My issue occurred 15 hours after the outage recovery and before this morning's short outage. The RTS buffer was fully stocked by then. I also got every download I asked for without any issues, except for the six or so stuck tasks on each host. The common factor was that the stuck tasks on every host showed the same elapsed time, or were within a couple of minutes of each other, since the hosts normally sync up on their scheduler request timers. So they all hit the servers at approximately the same time and ended up with stuck tasks.

So I don't think it was because the servers were being hit with any unusual amount of traffic at that time.
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1991337 - Posted: 24 Apr 2019, 23:00:10 UTC - in response to Message 1991334.  

> Could the download problems be caused by too many people trying to get WUs all at the same time? Too many connections at once? Is it usually after an outage like we had today and yesterday?

I had a few connection problems after the second outage today, but they downloaded OK after a few retries.

I also had problems accessing this website, with, apparently, the Cloudflare service failing to get me a secure connection and giving me a 'forbidden' plain HTTP connection instead. So I went out to the pub, and it was fine when I got back.
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1991339 - Posted: 24 Apr 2019, 23:29:21 UTC - in response to Message 1991337.  
Last modified: 24 Apr 2019, 23:30:10 UTC

> I had a few connection problems after the second outage today, but they downloaded OK after a few retries.
>
> I also had problems accessing this website, with, apparently, the Cloudflare service failing to get me a secure connection and giving me a 'forbidden' plain HTTP connection instead. So I went out to the pub, and it was fine when I got back.


. . That seems to fall in with Wiggo's cure. :)

Stephen

:)
Wiggo
Joined: 24 Jan 00
Posts: 34837
Credit: 261,360,520
RAC: 489
Australia
Message 1991341 - Posted: 24 Apr 2019, 23:45:21 UTC

It works most of the time for me, but it's a bummer during Coffee O'clock like today. :-D

Cheers.
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1991349 - Posted: 25 Apr 2019, 0:21:19 UTC

I had a brief occurrence of backoffs and HTTP errors after this morning's brief outage. But they cleared rather fast once I started hitting the retry button in BoincTasks.

The running elapsed timer on stalled active downloads, with no backoffs and no progress, is a completely different problem from all the various download issues we have experienced in the last few months. It has only shown up in the past couple of weeks and always occurs in the wee hours of the morning.
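The "stuck" state described above can be pinned down as a simple rule. This is a hypothetical sketch, not BOINC client code: the `TransferSnapshot` type and `classify` helper are invented for illustration, and the 300-second idle limit is borrowed from the default of BOINC's `http_transfer_timeout` option. A transfer counts as stuck when no backoff is scheduled and no bytes have arrived for longer than the idle limit, which is exactly the case an idle timeout would normally be expected to abort.

```python
from dataclasses import dataclass


@dataclass
class TransferSnapshot:
    """Hypothetical monitor's view of one file transfer (not a BOINC type)."""
    name: str
    last_progress_s: float  # clock time (s) when bytes last arrived
    backoff_until_s: float  # clock time (s) of next retry; 0.0 if none


def classify(t: TransferSnapshot, now_s: float, idle_limit_s: float = 300.0) -> str:
    """Label a transfer 'backoff', 'progressing', or 'stuck'."""
    if t.backoff_until_s > now_s:
        # A retry is scheduled, so the failure was noticed and handled.
        return "backoff"
    if now_s - t.last_progress_s <= idle_limit_s:
        # Bytes arrived recently enough; the transfer is alive.
        return "progressing"
    # No backoff and idle past the limit: elapsed time keeps growing while
    # nothing moves. An idle timeout should have caught this case.
    return "stuck"


if __name__ == "__main__":
    now = 10_000.0
    for t in [
        TransferSnapshot("wu_a", now - 5, 0.0),
        TransferSnapshot("wu_b", now - 3_600, 0.0),
        TransferSnapshot("wu_c", now - 3_600, now + 600),
    ]:
        print(t.name, classify(t, now))
```

Run as a script, this prints `wu_a progressing`, `wu_b stuck` and `wu_c backoff`. Only the middle case matches what the screenshot-style symptoms describe.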
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1991350 - Posted: 25 Apr 2019, 0:30:13 UTC - in response to Message 1991349.  

> I had a brief occurrence of backoffs and HTTP errors after this morning's brief outage. But they cleared rather fast once I started hitting the retry button in BoincTasks.
>
> The running elapsed timer on stalled active downloads, with no backoffs and no progress, is a completely different problem from all the various download issues we have experienced in the last few months. It has only shown up in the past couple of weeks and always occurs in the wee hours of the morning.


. . The problem I noticed was on 2 of the slower machines, but as it self-corrected I didn't think to check the faster Linux rig. I should have! It had been completely out of work for about 2 hours with 90 stalled downloads. One click on retry got the downloads running immediately, but I didn't stop to see what the errors were. It must have started about 4 hours before that.

. . Oh well, they are back to working again now.

Stephen

:)
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1991416 - Posted: 25 Apr 2019, 10:52:08 UTC

This screenshot shows what I mean by "stuck" downloads:
[Screenshot: stuck_downloads.png]
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1991420 - Posted: 25 Apr 2019, 11:15:50 UTC - in response to Message 1991416.  

> This screenshot shows what I mean by "stuck" downloads:
> [Screenshot: stuck_downloads.png]
Interesting, and no, I've never seen that before either.

There should be a timeout in your TCP/IP stack somewhere. BOINC has this option in cc_config.xml:

<http_transfer_timeout>seconds</http_transfer_timeout>
Abort HTTP transfers if idle for this many seconds; default 300.

I've got that one turned down to 60 seconds, on the basis that if it ain't happened, it ain't gonna happen. ('Abort' doesn't mean throw the file away: just stop trying this time, back off, and try again later.)
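For anyone who hasn't edited cc_config.xml before, here is a minimal sketch that produces the file with the 60-second setting mentioned above. It assumes the usual layout in which client options sit inside `<cc_config><options>`; the helper name `build_cc_config` is illustrative only. If you already have a cc_config.xml with other options, add the single `<http_transfer_timeout>` line to its `<options>` section by hand instead of replacing the file. Afterwards, have the client reread its configuration (`boinccmd --read_cc_config`, or Read config files on BOINC Manager's Options menu), and restart the client if the change doesn't seem to take.

```python
import xml.etree.ElementTree as ET


def build_cc_config(http_transfer_timeout_s: int) -> str:
    """Return cc_config.xml text that sets only http_transfer_timeout."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    timeout = ET.SubElement(options, "http_transfer_timeout")
    # Idle seconds before a transfer is abandoned for now and backed off.
    timeout.text = str(http_transfer_timeout_s)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Save the printed text as cc_config.xml in the BOINC data directory.
    print(build_cc_config(60))
```

The printed text is `<cc_config><options><http_transfer_timeout>60</http_transfer_timeout></options></cc_config>`. Building it with ElementTree rather than pasting a string keeps the tags balanced if you later add more options.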

The thing I've tried in years past is to 'Suspend network activity' from the 'Activity' menu in BOINC Manager, count to ... ooh, some random number or other ... and turn it back on again. At least you can do that without interrupting the running tasks and wasting time while they start again from checkpoint.
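The same suspend/resume trick can be scripted for unattended hosts. This sketch assumes the `boinccmd` tool that ships with the BOINC client is on the PATH and can talk to the local client. `--set_network_mode never` corresponds to 'Suspend network activity'; switching back to `auto` (network activity based on preferences) is an assumption about what your normal setting is. The 30-second pause is arbitrary, in the spirit of counting to "some random number". `bounce_network` is a hypothetical helper, and its command runner and sleep function are injectable so the sequence can be checked without a live client.

```python
import subprocess
import time
from typing import Callable, List, Sequence

# Assumes boinccmd can reach the local client without extra --host/--passwd.
OFF = ["boinccmd", "--set_network_mode", "never"]  # like "Suspend network activity"
ON = ["boinccmd", "--set_network_mode", "auto"]    # assumption: your normal mode


def _run(cmd: Sequence[str]) -> None:
    subprocess.run(list(cmd), check=True)


def bounce_network(
    pause_s: float = 30.0,
    run: Callable[[Sequence[str]], None] = _run,
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[str]]:
    """Turn BOINC networking off, wait, then turn it back on.

    Only network activity is suspended, so running tasks keep computing.
    Returns the commands issued, in order.
    """
    issued: List[List[str]] = []
    run(OFF)
    issued.append(list(OFF))
    try:
        sleep(pause_s)  # the "count to some random number" step
    finally:
        # Restore networking even if the wait is interrupted (e.g. Ctrl-C).
        run(ON)
        issued.append(list(ON))
    return issued
```

Calling `bounce_network(60)` from a scheduled job would apply the workaround without touching running tasks. The `try`/`finally` is there so a host is never left with networking switched off.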
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1991421 - Posted: 25 Apr 2019, 11:22:06 UTC - in response to Message 1991420.  
Last modified: 25 Apr 2019, 11:26:34 UTC

I had turned that down to 90 seconds in the past. But when we started getting the download issues, I removed the setting so it reverted to the standard 300 seconds.

[Edit] Coulda, shoulda thought of that on my own. That does eventually wake up the download process on the stuck tasks. I just tried it on another machine with the same half dozen stuck downloads. It saves time, as you say, and doesn't inflict a restart on tasks.
Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1992001 - Posted: 30 Apr 2019, 14:54:25 UTC

Shortest Tuesday out(r)age ever today, I think. I don't know whether this forebodes anything, but I will try to stay positive. :^)
Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1992003 - Posted: 30 Apr 2019, 15:51:08 UTC - in response to Message 1992001.  

> Shortest Tuesday out(r)age ever today, I think. I don't know whether this forebodes anything, but I will try to stay positive. :^)


Very nice short outage. Good Job, seti team.
Boiler Paul
Joined: 4 May 00
Posts: 232
Credit: 4,965,771
RAC: 64
United States
Message 1992004 - Posted: 30 Apr 2019, 15:52:38 UTC

shockingly short outage
Gary Charpentier Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 30669
Credit: 53,134,872
RAC: 32
United States
Message 1992007 - Posted: 30 Apr 2019, 16:16:14 UTC

What did they forget to do?
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1992012 - Posted: 30 Apr 2019, 17:26:20 UTC - in response to Message 1992007.  

> What did they forget to do?

I wonder also. The one last week was short too, but it needed another outage later in the day. Will the pattern repeat?
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1992370 - Posted: 3 May 2019, 2:37:01 UTC
Last modified: 3 May 2019, 2:38:44 UTC

Is it only me, or are the site and the DLs very slow?
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19072
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1992373 - Posted: 3 May 2019, 3:03:13 UTC - in response to Message 1992370.  

> Is it only me, or are the site and the DLs very slow?

Looking from here, it's working fine: 4 to 5 secs/download.
Wiggo
Joined: 24 Jan 00
Posts: 34837
Credit: 261,360,520
RAC: 489
Australia
Message 1992375 - Posted: 3 May 2019, 3:05:35 UTC

No problems here either.

Cheers.
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1992381 - Posted: 3 May 2019, 3:52:52 UTC

I've been noticing that the website server has been slow to serve pages since the outage. No issues with downloads.
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1992386 - Posted: 3 May 2019, 4:36:52 UTC
Last modified: 3 May 2019, 4:38:35 UTC

Web pages are very slow to show: 5-10 secs to load a single page.

DLs are happening with no issues but very slowly (around 15 KBps instead of the normal > 256 KBps).

ISP speed test is normal (170 MBps).
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19072
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1992388 - Posted: 3 May 2019, 4:53:08 UTC - in response to Message 1992381.  

> I've been noticing that the website server has been slow to serve pages since the outage. No issues with downloads.

Neither of those issues seen here at any time since the outage.

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.