Panic Mode On (115) Server Problems?

juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1982828 - Posted: 1 Mar 2019, 13:38:26 UTC - in response to Message 1982824.  
Last modified: 1 Mar 2019, 13:39:19 UTC

Are you sure you didn't wake up still drunk?

Nothing different than the usual headache here.

On the DL side, all my stalled downloads have finished and the client is asking for more work. I will stop asking for more, since I still have a lot of WUs in my cache, and let the others download first. Back in a few hours, after everything returns to normal.
ID: 1982828 · Report as offensive
Sesson

Joined: 29 Feb 16
Posts: 43
Credit: 1,353,463
RAC: 3
Message 1982833 - Posted: 1 Mar 2019, 13:46:46 UTC
Last modified: 1 Mar 2019, 13:51:51 UTC

I just requested more work from SETI@home while capturing the network traffic with Wireshark. Both download servers sent me RSTs initially. 208.68.240.119 sent an RST immediately after my computer sent its ACK. 208.68.240.127 sent an RST after the HTTP request had been sent. Then I used my web browser to download one of my workunits from 208.68.240.127, and the file download dialog appeared. HTTP requests to 208.68.240.119 always result in RSTs. At the moment my workunit is still not downloaded.

This could suggest that the web server process on 208.68.240.119 has crashed, so nothing is listening on port 80, while the web server on 208.68.240.127 is overloaded and can only serve some of the TCP connections.
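The same distinction (refused or reset straight away versus reset after the request) can be checked without a packet capture. A minimal Python sketch, assuming only the two addresses and port 80 quoted in this post; `classify_error` and `probe` are names made up for illustration, and the interpretations in the returned strings are guesses of the same kind as above, not a confirmed diagnosis:

```python
import socket

def classify_error(exc):
    """Map a socket exception to a rough guess about the server state."""
    if isinstance(exc, ConnectionRefusedError):
        return "refused: probably nothing listening on the port"
    if isinstance(exc, ConnectionResetError):
        return "reset: connection accepted, then dropped by the server"
    if isinstance(exc, socket.timeout):
        return "timeout: no answer in time"
    return "other error: " + type(exc).__name__

def probe(host, port=80, timeout=5.0):
    """Open a TCP connection, send a minimal HTTP request, report what happens."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            request = "HEAD / HTTP/1.0\r\nHost: {}\r\n\r\n".format(host)
            sock.sendall(request.encode("ascii"))
            reply = sock.recv(256)
            if not reply:
                return "empty reply: server closed without answering"
            # Report only the HTTP status line of whatever came back.
            first_line = reply.split(b"\r\n", 1)[0]
            return "answered: " + first_line.decode("ascii", "replace")
    except OSError as exc:
        return classify_error(exc)
```

Calling `probe("208.68.240.119")` and `probe("208.68.240.127")` a few times in a row should show whether one address fails consistently while the other fails only some of the time.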
ID: 1982833 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14676
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1982836 - Posted: 1 Mar 2019, 13:53:22 UTC - in response to Message 1982827.  

I have a computer elsewhere, and that is doing "OK", but my main pile are not getting anything, so it might be that a DNS table has got screwed up somewhere, and that could take a good few hours to resolve....
The next machine I tried attempted the right stuff:

01/03/2019 13:42:09 | SETI@home | [http] [ID#2776] Info:  Hostname boinc2.ssl.berkeley.edu was found in DNS cache
01/03/2019 13:42:09 | SETI@home | [http] [ID#2776] Info:    Trying 208.68.240.119...
01/03/2019 13:42:11 | SETI@home | [http] [ID#2776] Info:  connect to 208.68.240.119 port 80 failed: Connection refused
01/03/2019 13:42:11 | SETI@home | [http] [ID#2776] Info:    Trying 208.68.240.127...
01/03/2019 13:42:11 | SETI@home | [http] [ID#2776] Info:  Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#3224)
but got the empty server reply, and no download. But at least Georgem is handing off properly, not leaving us to wait through a timeout.

And seven minutes later I got a connection and a dozen downloads on this machine too:

01/03/2019 13:49:06 | SETI@home | Started download of blc36_2bit_guppi_58405_84630_HIP86141_0024.13265.818.22.45.166.vlar
01/03/2019 13:49:06 | SETI@home | [http] [ID#2830] Info:  Re-using existing connection! (#3267) with host boinc2.ssl.berkeley.edu
01/03/2019 13:49:06 | SETI@home | [http] [ID#2830] Info:  Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#3267)
01/03/2019 13:49:09 | SETI@home | Finished download of blc36_2bit_guppi_58405_84630_HIP86141_0024.13265.818.22.45.166.vlar
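The address-by-address fallback in the first log excerpt can be pulled out mechanically. A rough sketch, assuming the `[http]` debug lines keep exactly the wording shown above; the regular expressions are written against these lines only, not against any documented log format, and `summarize_attempts` is a hypothetical helper:

```python
import re

# Patterns matched against the wording of the log lines quoted in this post.
FAILED = re.compile(r"connect to (\S+) port (\d+) failed: (.+)$")
CONNECTED = re.compile(r"Connected to (\S+) \(([^)]+)\) port (\d+)")

def summarize_attempts(lines):
    """Return (ip, outcome) tuples for each failed or successful connect line."""
    attempts = []
    for line in lines:
        failed = FAILED.search(line)
        if failed:
            attempts.append((failed.group(1), failed.group(3).strip()))
            continue
        connected = CONNECTED.search(line)
        if connected:
            attempts.append((connected.group(2), "connected"))
    return attempts

LOG = [
    "01/03/2019 13:42:09 | SETI@home | [http] [ID#2776] Info:  Hostname boinc2.ssl.berkeley.edu was found in DNS cache",
    "01/03/2019 13:42:09 | SETI@home | [http] [ID#2776] Info:    Trying 208.68.240.119...",
    "01/03/2019 13:42:11 | SETI@home | [http] [ID#2776] Info:  connect to 208.68.240.119 port 80 failed: Connection refused",
    "01/03/2019 13:42:11 | SETI@home | [http] [ID#2776] Info:    Trying 208.68.240.127...",
    "01/03/2019 13:42:11 | SETI@home | [http] [ID#2776] Info:  Connected to boinc2.ssl.berkeley.edu (208.68.240.127) port 80 (#3224)",
]

for ip, outcome in summarize_attempts(LOG):
    print(ip, "->", outcome)
```

Run over a longer event log, the same loop gives a quick count of how often each address refuses versus connects.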
ID: 1982836 · Report as offensive
Profile Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1982844 - Posted: 1 Mar 2019, 14:29:00 UTC
Last modified: 1 Mar 2019, 14:42:59 UTC

I was finally able to force feed my dl's one at a time, so my queue is now sitting with 90 CPU & 116 GPU tasks that I will now attempt to force feed.

[edit] Looks like the server is handling 1 dl at a time automatically so I don't have to force feed it or someone kicked the side of the bad server and got it working again. [/edit]

[edit2] Looks like I got ahead of myself and ended up having to force feed 1 at a time. As soon as I reverted to more than 1 d/l in the cc_config.xml, everything went to project backoff. So it's back to 1 and force feed. [/edit2]


I don't buy computers, I build them!!
ID: 1982844 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1982855 - Posted: 1 Mar 2019, 15:10:32 UTC
Last modified: 1 Mar 2019, 15:11:34 UTC

Woo Hoo Finally!
Retry #82 cleared a stuck download on my daily Win8 computer that had been there for 8 hours or more, LOL

STILL ongoing retry issues ... Eric owes me a mouse -The clicker is clunky now :(
ID: 1982855 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 1982858 - Posted: 1 Mar 2019, 15:31:32 UTC

I have a split decision.

About half of my DL's are on "backoff" and half are pending. The ones that are pending are downloading.

Either we hit the magic "everyone is in 25+ hour backoff", or something was magically fixed, or someone got in there early and rebooted the downed server.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1982858 · Report as offensive
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 1982862 - Posted: 1 Mar 2019, 15:48:10 UTC

It's going to be either a long or idle weekend if they don't get this fixed today.
ID: 1982862 · Report as offensive
Profile Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1982867 - Posted: 1 Mar 2019, 15:57:43 UTC - in response to Message 1982862.  
Last modified: 1 Mar 2019, 15:58:37 UTC

Lots of people have given their method of starting the stuck downloads. I have two computers with stuck downloads. I have tried everything people have listed as doing to get theirs started, and still nothing here. I have rebooted computers and am almost at the point of perhaps aborting the downloads.
(back off time keeps changing every time I try a retry)
Is there perhaps something that I have not tried?

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1982867 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1982868 - Posted: 1 Mar 2019, 16:01:54 UTC - in response to Message 1982867.  

i just keep clicking retry until it goes through.

sometimes 5 clicks, sometimes 30 clicks.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1982868 · Report as offensive
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24909
Credit: 3,081,182
RAC: 7
Ireland
Message 1982870 - Posted: 1 Mar 2019, 16:03:13 UTC

A couple of hours ago, topped up my cache for Main & Beta. Got 23 Beta & 15 Main.
4 stuck on Beta & 7 on Main. 4 main D/L without any interference. :-)
ID: 1982870 · Report as offensive
Profile Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1982873 - Posted: 1 Mar 2019, 16:08:01 UTC - in response to Message 1982868.  
Last modified: 1 Mar 2019, 16:08:50 UTC

i just keep clicking retry until it goes through.

sometimes 5 clicks, sometimes 30 clicks.

I have hit over 100 clicks...

Just added 50 more tries to one computer.

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1982873 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1982880 - Posted: 1 Mar 2019, 16:18:34 UTC - in response to Message 1982873.  

The only thing that seems to help is:
<max_file_xfers_per_project>1</max_file_xfers_per_project>

Having or not having DNS entries for 119/127 doesn't seem to help, although I seem to have better luck with 127.
It's more a matter of timing: once you get a connection, they start chugging through 200 with no problem.
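For anyone who has not edited `cc_config.xml` before: the option goes inside the `<options>` element. A small Python sketch that sets it in an existing file or creates a minimal one. The helper name `set_max_xfers_per_project` is an assumption for illustration, the path has to point into the BOINC data directory (which differs per platform), and the client must re-read its config files or be restarted before the change takes effect:

```python
import os
import xml.etree.ElementTree as ET

def set_max_xfers_per_project(path, value=1):
    """Set <max_file_xfers_per_project> in cc_config.xml, creating it if needed.

    Note: ElementTree drops XML comments when it rewrites an existing file.
    """
    if os.path.exists(path):
        tree = ET.parse(path)
        root = tree.getroot()
    else:
        root = ET.Element("cc_config")
        tree = ET.ElementTree(root)
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    setting = options.find("max_file_xfers_per_project")
    if setting is None:
        setting = ET.SubElement(options, "max_file_xfers_per_project")
    setting.text = str(value)
    tree.write(path, encoding="utf-8", xml_declaration=False)
    return ET.tostring(root, encoding="unicode")
```

Back up the existing file first; setting the value back to a higher number later is the same call with a different `value`.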
ID: 1982880 · Report as offensive
Sleepy
Volunteer tester
Joined: 21 May 99
Posts: 219
Credit: 98,947,784
RAC: 28,360
Italy
Message 1982881 - Posted: 1 Mar 2019, 16:19:02 UTC

On TBar Boinc 7.4.4, I get the usual (in this type of situation) client crash as soon as I enable network (immediate effect).
I have changed nothing. Crunching is regular as long as cache is not empty.
ID: 1982881 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 1982883 - Posted: 1 Mar 2019, 16:22:40 UTC - in response to Message 1982867.  

Lots of people have given their method of starting the stuck downloads. I have two computers with stuck downloads. I have tried everything people have listed as doing to get theirs started, and still nothing here. I have rebooted computers and am almost at the point of perhaps aborting the downloads.
(back off time keeps changing every time I try a retry)
Is there perhaps something that I have not tried?


You have tried restarting the last "stuck" download?
A proud member of the OFA (Old Farts Association).
ID: 1982883 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22491
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1982884 - Posted: 1 Mar 2019, 16:26:18 UTC

One at a time worked
Just as the last one in the log jam cleared over a hundred new ones arrived in the queue, and instantly stalled :-(
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1982884 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 1982891 - Posted: 1 Mar 2019, 16:41:12 UTC - in response to Message 1982884.  

One at a time worked
Just as the last one in the log jam cleared over a hundred new ones arrived in the queue, and instantly stalled :-(


But when I hit the retry on the last one, they started up again.....

Tom
A proud member of the OFA (Old Farts Association).
ID: 1982891 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1982893 - Posted: 1 Mar 2019, 16:48:53 UTC - in response to Message 1982891.  

There seem to be 2 different types of backoffs: "Retry in:", and "Project".

Selecting the last task when in Project backoff, it appears to try 3 tasks, before returning to Project.
Keep doing that and it eventually goes to individual Retries, where you can select which file to retry.

That is with v7.4, v7.8 may be different.
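Why the backoff time changes on every retry is not explained anywhere in this thread. One common pattern that would produce that behaviour is randomized exponential backoff. The sketch below illustrates that general technique only: it is not taken from the BOINC source, and the base delay, cap, and function name are invented for the example:

```python
import random

def backoff_delay(failures, base=60.0, cap=4 * 3600.0, rng=random):
    """Seconds to wait after `failures` consecutive failed attempts.

    All numbers here are illustrative assumptions, not BOINC's real values.
    """
    # The ceiling doubles with each failure until it reaches the cap.
    ceiling = min(cap, base * (2 ** failures))
    # Pick a random point in the upper half so clients do not retry in lockstep.
    return rng.uniform(ceiling / 2, ceiling)

for attempt in range(6):
    print(attempt, round(backoff_delay(attempt)), "seconds")
```

With a scheme like this, every manual retry that fails draws a fresh random delay from a larger range, which would look like a backoff time that keeps changing.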
ID: 1982893 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1982895 - Posted: 1 Mar 2019, 16:50:33 UTC - in response to Message 1982815.  

Yesterday I was able to manually retry one at a time until an AP task was stuck on each PC. It wouldn't download so I couldn't get any more. I aborted the DL on one and received 100 tasks. 10 were able to download but nothing more. Hopefully this is fixed soon.

That's been my experience. Everything is fine downloading one at a time UNTIL an AP gets stuck, and that is the end of it. No matter how many times that single AP task is retried (this morning I see an empty machine with one stuck AP task on retry count 36), the only way to get more work is to abort that AP task and then get another block of downloads. And hope that the AP snowstorm has ended.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1982895 · Report as offensive
Profile Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1982897 - Posted: 1 Mar 2019, 16:59:22 UTC - in response to Message 1982883.  

Lots of people have given their method of starting the stuck downloads. I have two computers with stuck downloads. I have tried everything people have listed as doing to get theirs started, and still nothing here. I have rebooted computers and am almost at the point of perhaps aborting the downloads.
(back off time keeps changing every time I try a retry)
Is there perhaps something that I have not tried?


You have tried restarting the last "stuck" download?

What do you mean by "restarting"? Do you mean Retry? If so, that is what I have been doing.

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1982897 · Report as offensive
Profile Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1982900 - Posted: 1 Mar 2019, 17:01:33 UTC - in response to Message 1982895.  

Yesterday I was able to manually retry one at a time until an AP task was stuck on each PC. It wouldn't download so I couldn't get any more. I aborted the DL on one and received 100 tasks. 10 were able to download but nothing more. Hopefully this is fixed soon.

That's been my experience. Everything is fine downloading one at a time UNTIL an AP gets stuck, and that is the end of it. No matter how many times that single AP task is retried (this morning I see an empty machine with one stuck AP task on retry count 36), the only way to get more work is to abort that AP task and then get another block of downloads. And hope that the AP snowstorm has ended.

I have some APs stuck on the one computer. If as you say they can not be downloaded I will try aborting them after lunch and see if that does anything for downloads on that computer. The other computer has no APs stuck.

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1982900 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.