Panic Mode On (115) Server Problems?

Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1982710 - Posted: 1 Mar 2019, 1:46:16 UTC

As I understand it from a post long ago, the download servers work like this: the client asks for work from boinc2.ssl.berkeley.edu.

That gets resolved to both Georgem and vader through a round-robin load balancer mechanism. When one of the servers is disabled, the single surviving download server has to service the entire download workload. It can't support the total number of requests on its own and the download mechanism falls over.
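A minimal sketch of the behaviour described above, in Python. It assumes a plain alternating rotation between the two download servers; the project's actual load balancer may work differently, and the `distribute` function and the request count of 1000 are made up for illustration.

```python
from collections import Counter
from itertools import cycle


def distribute(requests: int, enabled: list[str]) -> Counter:
    """Hand out `requests` downloads to the enabled servers in strict rotation.

    This is an illustrative model only, not the project's real balancer.
    """
    if not enabled:
        raise ValueError("no download server is enabled")
    rotation = cycle(enabled)
    return Counter(next(rotation) for _ in range(requests))


# Both servers enabled: the load is split evenly.
both = distribute(1000, ["georgem", "vader"])
# One server disabled: the survivor takes every request.
one = distribute(1000, ["vader"])

print(both)
print(one)
```

With both servers in the rotation each handles 500 of the 1000 requests; with Georgem disabled, Vader has to handle all 1000, which is double its normal share.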
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1982710 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 1982722 - Posted: 1 Mar 2019, 2:18:33 UTC - in response to Message 1982710.  

As I understand it from a post long ago, the download servers work like this: the client asks for work from boinc2.ssl.berkeley.edu.

That gets resolved to both Georgem and vader through a round-robin load balancer mechanism. When one of the servers is disabled, the single surviving download server has to service the entire download workload. It can't support the total number of requests on its own and the download mechanism falls over.


Is there any chance the server that is down can be rebooted remotely?

Tom
A proud member of the OFA (Old Farts Association).
ID: 1982722 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1982728 - Posted: 1 Mar 2019, 2:31:42 UTC

I'm getting work, but with every request I get some downloads that stick, and I have to refresh them about 10 times before they finally kick through. Sigh.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1982728 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1982729 - Posted: 1 Mar 2019, 2:35:41 UTC

I believe all the servers can be rebooted remotely. But that is not what happened today. They took the project down for maintenance earlier this morning, and when they brought the project back up, the replica database was at first disabled, then enabled but left offline, and then they set Georgem to disabled. And that is where it has stayed. For what reason, we can only guess, as the staff is not forthcoming with any current technical news except on rare occasions.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1982729 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1982730 - Posted: 1 Mar 2019, 2:48:36 UTC

The inability to receive work because of stuck downloads is really dropping the return rate.

Earlier this morning before the unexpected short maintenance event, we were returning tasks at about 150K/hr. Now we have fallen down to around 88K/hr.

Sure hope they get Georgem back running before the weekend. Would be nice to hear the technical reason why Georgem is disabled also. We mushrooms crave information.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1982730 · Report as offensive
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1982734 - Posted: 1 Mar 2019, 3:02:21 UTC - in response to Message 1982675.  

Scroll to the bottom task in the list on the download page in the Manager. Select it with the mouse and click the Retry button. I can usually get a dozen or so tasks cleared from the list that way before the download server begins to ignore me and gives me an increased backoff. Then I move to another host and try there until it too craps out. Then I move to another host, etc.


Ahh, the "bottom" has it. TY!


Honest, I have no idea. Bumped one and suddenly tons of CUDA90 tasks are downloading.

Tom

If I try that, will I get CUDA 9 on my machine? Tom, I suggest having a look at my host before you answer.
ID: 1982734 · Report as offensive
Profile Pierre A Renaud
Joined: 3 Apr 99
Posts: 998
Credit: 9,101,544
RAC: 65
Canada
Message 1982735 - Posted: 1 Mar 2019, 3:04:36 UTC - in response to Message 1982681.  

Is it worth using the following IP for the email system (in case of DNS service failure)? I haven't needed to use it in ages but have kept it as a (now possibly obsolete or erroneous) reference...

208.68.240.110 setiathome.berkeley.edu # IP address for the messages/email system(s)

If you want to modify your hosts list, then these are the current IP addresses.

208.68.240.118 setiboincdata.ssl.berkeley.edu # upload server Oct 2016
208.68.240.119 boinc2.ssl.berkeley.edu # Georgem download server Oct 2016
208.68.240.126 setiboinc.ssl.berkeley.edu # scheduler Oct 2016
208.68.240.127 vader.ssl.berkeley.edu # Vader download server Oct 2016
Well, they were current at the dates stated. My local reference set has the dates updated to August 2017, the last time we had to dust them off.

But you should not modify the final line like that. The purpose of the hosts file is to replace the DNS service when that fails (which is not the case in this outage).

So, when a program asks for a hostname in the second column, the hosts file returns the IP address in the first column. BOINC will never try to access Vader by name: our downloads all come from boinc2.ssl.berkeley.edu. Only the IP part changes.
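The lookup described above can be sketched in a few lines of Python. This is only an illustration of how a hosts-style table maps a name to an address, using the entries quoted in this post; the `parse_hosts` function is hypothetical, and real system resolvers have more rules than this.

```python
HOSTS_TEXT = """\
208.68.240.118 setiboincdata.ssl.berkeley.edu # upload server Oct 2016
208.68.240.119 boinc2.ssl.berkeley.edu # Georgem download server Oct 2016
208.68.240.126 setiboinc.ssl.berkeley.edu # scheduler Oct 2016
208.68.240.127 vader.ssl.berkeley.edu # Vader download server Oct 2016
"""


def parse_hosts(text: str) -> dict[str, str]:
    """Map each hostname to its IP address, ignoring comments and blank lines."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        # Everything after '#' is a comment; the rest is "IP name [name ...]".
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        ip, *names = fields
        for name in names:
            # Assumption: the first entry for a name wins.
            table.setdefault(name.lower(), ip)
    return table


hosts = parse_hosts(HOSTS_TEXT)
# The name is looked up; the IP address is what comes back.
print(hosts["boinc2.ssl.berkeley.edu"])
```

The name is the key and the address is the value, which is why changing the name on a line does nothing useful: the client still asks for boinc2.ssl.berkeley.edu and only the address it gets back can differ.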

Apr 3, 1999 - May 3, 2020
ID: 1982735 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1982738 - Posted: 1 Mar 2019, 4:02:36 UTC - in response to Message 1982734.  
Last modified: 1 Mar 2019, 4:08:15 UTC

If I try that, will I get CUDA 9 on my machine? Tom, I suggest having a look at my host before you answer.


Uh, no. You need to run Linux and the special app to get CUDA9 tasks.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1982738 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1982739 - Posted: 1 Mar 2019, 4:03:31 UTC - in response to Message 1982735.  

Is it worth using the following IP for the email system (in case of DNS service failure)? I haven't needed to use it in ages but have kept it as a (now possibly obsolete or erroneous) reference...

208.68.240.110 setiathome.berkeley.edu # IP address for the messages/email system(s)


Only if we have DNS issues and the web server is not being resolved.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1982739 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1982744 - Posted: 1 Mar 2019, 5:01:02 UTC - in response to Message 1982734.  

Scroll to the bottom task in the list on the download page in the Manager. Select it with the mouse and click the Retry button. I can usually get a dozen or so tasks cleared from the list that way before the download server begins to ignore me and gives me an increased backoff. Then I move to another host and try there until it too craps out. Then I move to another host, etc.


Ahh, the "bottom" has it. TY!


Honest, I have no idea. Bumped one and suddenly tons of CUDA90 tasks are downloading.

Tom

If I try that, will I get CUDA 9 on my machine? Tom, I suggest having a look at my host before you answer.


. . I can offer some guidance on changing over to Linux to run CUDA90 :)

Stephen

:)
ID: 1982744 · Report as offensive
Profile B. Ahmet KIRAN

Joined: 19 Oct 14
Posts: 77
Credit: 36,140,903
RAC: 140
Turkey
Message 1982745 - Posted: 1 Mar 2019, 5:03:20 UTC

Please, please, please, someone stop this torture of failed downloads... I have been trying on all my machines to get a decent download without any success for around 8 hours now... Why doesn't someone shut down the downloads until the problem is resolved? It is better to read "project has no tasks available" than "download retry in xx:xx:xx", which never manages to succeed in the retry...
ID: 1982745 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13841
Credit: 208,696,464
RAC: 304
Australia
Message 1982746 - Posted: 1 Mar 2019, 5:10:15 UTC

Got home to a relatively cool house, due to 1 system being out of all work & the other out of CPU work. Found lots of downloads, all in excessive backoff mode. Tried "Retry Pending transfers" with no joy.
From the looks of this thread, it's nice to know I'm not the only one.
Grant
Darwin NT
ID: 1982746 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1982749 - Posted: 1 Mar 2019, 5:22:40 UTC - in response to Message 1982746.  

Downloads have been fubared most of the day ever since the mini maintenance outage this morning.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1982749 · Report as offensive
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1982750 - Posted: 1 Mar 2019, 5:37:53 UTC

After multiple (very frustrating) retries... I finally got some WUs to download to the faster machine.

I sorted the files by size in the download tab and tried the bottom one as suggested (and threw a penny in a fountain and made a wish on a star) and finally got things to move. I didn't get all the stuck files, but at least the machine is crunching again.

good luck... as more machines go into lengthy time outs maybe the traffic jam won't be so bad and us crazy die-hards can get some WUs here and there.
ID: 1982750 · Report as offensive
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1982752 - Posted: 1 Mar 2019, 6:13:27 UTC - in response to Message 1982744.  

Scroll to the bottom task in the list on the download page in the Manager. Select it with the mouse and click the Retry button. I can usually get a dozen or so tasks cleared from the list that way before the download server begins to ignore me and gives me an increased backoff. Then I move to another host and try there until it too craps out. Then I move to another host, etc.


Ahh, the "bottom" has it. TY!


Honest, I have no idea. Bumped one and suddenly tons of CUDA90 tasks are downloading.

Tom

If I try that, will I get CUDA 9 on my machine? Tom, I suggest having a look at my host before you answer.


. . I can offer some guidance on changing over to Linux to run CUDA90 :)

Stephen

:)

Thanks for the offer, Stephen. I will stick with Windows.
ID: 1982752 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1982753 - Posted: 1 Mar 2019, 6:27:20 UTC - in response to Message 1982750.  

After multiple (very frustrating) retries... I finally got some WUs to download to the faster machine.

I sorted the files by size in the download tab and tried the bottom one as suggested (and threw a penny in a fountain and made a wish on a star) and finally got things to move. I didn't get all the stuck files, but at least the machine is crunching again.

good luck... as more machines go into lengthy time outs maybe the traffic jam won't be so bad and us crazy die-hards can get some WUs here and there.

If you can get at least one stuck download to clear, then you can start clearing one at a time. The download server seems to only respond to a single request to the database at a time for the stuck downloads. It may take dozens of tries to get the first one to start, but once it does, don't stop until you clear all the stuck ones, one at a time. Then you will finally be able to get more work and start the whole stuck download process over again.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1982753 · Report as offensive
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1982755 - Posted: 1 Mar 2019, 6:55:05 UTC

They threw another Arecibo file on the splitter. AP files are being handed out to those lucky enough to get through. Queries/second has spiked into the 5k range, so I don't think I'll get any more WUs tonight. I got lucky and got a few when the queries were less than 2k.

I hope tomorrow brings the Seti team some better luck and that they can solve this issue so we can all have a good weekend.
ID: 1982755 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13841
Credit: 208,696,464
RAC: 304
Australia
Message 1982756 - Posted: 1 Mar 2019, 6:57:01 UTC - in response to Message 1982753.  

If you can get at least one stuck download to clear, then you can start clearing one at a time.

Not here.
Regardless of what I try, it's 1 WU every 30-50 retries.

I think I'll wait for the servers to get sorted out.
Grant
Darwin NT
ID: 1982756 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 36584
Credit: 261,360,520
RAC: 489
Australia
Message 1982757 - Posted: 1 Mar 2019, 7:05:55 UTC

Everything is still OK here, and the AP count for today is into 3 figures. :-D

Cheers.
ID: 1982757 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1982762 - Posted: 1 Mar 2019, 7:30:02 UTC

I found a little trick that seems to be working once you can get downloads started.
In cc_config, set the maximum number of simultaneous file transfers to 1:
<max_file_xfers>1</max_file_xfers>
Have BOINC read the config file.
Once it gets going, it will only download one task at a time.
Seems to be working at the moment.
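For anyone who would rather script the change than edit the file by hand, here is a minimal sketch in Python. It assumes the setting belongs inside the `<options>` section of `cc_config.xml`; the `set_max_file_xfers` helper and the starting value of 8 in the sample are made up for illustration, and the sketch works on a string rather than touching a real file.

```python
import xml.etree.ElementTree as ET


def set_max_file_xfers(config_xml: str, limit: int = 1) -> str:
    """Return config_xml with <options><max_file_xfers> set to `limit`.

    Missing <options> or <max_file_xfers> elements are created.
    """
    root = ET.fromstring(config_xml)
    if root.tag != "cc_config":
        raise ValueError("expected a <cc_config> root element")
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    setting = options.find("max_file_xfers")
    if setting is None:
        setting = ET.SubElement(options, "max_file_xfers")
    setting.text = str(limit)
    return ET.tostring(root, encoding="unicode")


# Hypothetical existing config; the value 8 is only a placeholder.
existing = (
    "<cc_config><options>"
    "<max_file_xfers>8</max_file_xfers>"
    "</options></cc_config>"
)
updated = set_max_file_xfers(existing, 1)
empty = set_max_file_xfers("<cc_config/>", 1)
print(updated)
```

After saving the result as cc_config.xml in the BOINC data directory, the client still has to re-read it, either from the Manager menu or, if you use the command-line tool, with `boinccmd --read_cc_config`.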

Meow!
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1982762 · Report as offensive